---
base_model: Daffaadityp/AxonAI-MX4-2.0
language:
- en
- id
license: apache-2.0
tags:
- gguf
- quantized
- qwen3
- dora
- axonlabs
- reasoning
- local-llm
- chain-of-thought
- edge-ai
- ollama
- llama-cpp
- indonesian-ai
- text-generation
- 4b
- instruct
pipeline_tag: text-generation
library_name: gguf
---

# 🧠 Poterry AI: GGUF Quantized Edition

### *Reasoning-First Language Model · 4B Parameters · Chain-of-Thought Native*

### *Optimized for Local Inference · Edge Devices · Laptops · Offline AI*

[![Model](https://img.shields.io/badge/Base%20Model-AxonAI%20MX4%202.0-blueviolet?style=for-the-badge&logo=huggingface)](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0)
[![Format](https://img.shields.io/badge/Format-GGUF-orange?style=for-the-badge&logo=llvm)](https://github.com/ggerganov/llama.cpp)
[![Quantization](https://img.shields.io/badge/Quants-Q2__K%20%7C%20Q4__K__M%20%7C%20Q8__0-brightgreen?style=for-the-badge)](https://github.com/ggerganov/llama.cpp#quantization)
[![Ollama](https://img.shields.io/badge/Ollama-Compatible-informational?style=for-the-badge&logo=ollama)](https://ollama.com)
[![llama.cpp](https://img.shields.io/badge/llama.cpp-Compatible-success?style=for-the-badge)](https://github.com/ggerganov/llama.cpp)
[![LM Studio](https://img.shields.io/badge/LM%20Studio-Compatible-9cf?style=for-the-badge)](https://lmstudio.ai)
[![Parameters](https://img.shields.io/badge/Parameters-4B-blue?style=for-the-badge)](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0)
[![License](https://img.shields.io/badge/License-Apache%202.0-red?style=for-the-badge)](https://www.apache.org/licenses/LICENSE-2.0)
[![Made in Indonesia](https://img.shields.io/badge/Made%20in-Indonesia%20🇮🇩-red?style=for-the-badge)](https://github.com/Daffaadityp)

> **This repository contains the official GGUF quantized files for AxonAI MX4 2.0.**
> Run a full Chain-of-Thought reasoning LLM *entirely locally*: no GPU required, no internet connection, no API costs. Just pure, structured intelligence on your own hardware.

---

## 📌 Quick Navigation

| Section | Description |
|---|---|
| [🗂️ Available Files](#️-available-gguf-files--quantization-guide) | Q2_K, Q4_K_M, Q8_0: which one is right for you? |
| [🚀 Ollama Quickstart](#-ollama-quickstart-recommended) | Easiest way to run locally: one command |
| [⚙️ llama.cpp CLI](#️-llamacpp-cli) | For advanced users and scripting |
| [🖥️ LM Studio / GPT4All](#️-lm-studio--gpt4all) | GUI-based local inference |
| [🧬 Why Quantized Reasoning?](#-why-a-quantized-reasoning-model-is-so-powerful) | The secret sauce, explained for GGUF |
| [🛠️ Prompt Format](#️-prompt--system-format) | How to structure your prompts |
| [🇮🇩 Indonesian Community](#-untuk-developer-indonesia) | For developers in Indonesia |

---

## 🌐 What Is This Repository?

This is the **official GGUF release** of [AxonAI MX4 2.0](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0), a 4-billion-parameter reasoning-first language model built by **AxonLabs** (SMKN 26 Jakarta). The original model was trained using **DoRA (Weight-Decomposed Low-Rank Adaptation)** on top of the Qwen3 architecture, fine-tuned to produce structured, transparent Chain-of-Thought (`<think>`) reasoning before every final response.

These GGUF files were produced using `llama.cpp`'s official quantization pipeline, preserving the model's reasoning depth while dramatically reducing memory footprint, which makes **local LLM inference** accessible on consumer hardware.

**If you want the full-precision FP16/BF16 weights**, visit the original repository:
👉 [`Daffaadityp/AxonAI-MX4-2.0`](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0)

---

## 🗂️ Available GGUF Files & Quantization Guide

Choose the right quantization level for your hardware. As a general rule: **higher Q number (more bits per weight) = better quality, higher RAM requirement**.

| File | Quant Type | Size (Est.) | Min RAM | Quality | Use Case |
|---|---|---|---|---|---|
| `AxonAI-MX4-2.0-Q2_K.gguf` | Q2_K | ~1.7 GB | 4 GB | ⚡ Fast / Compressed | Raspberry Pi, very old laptops, extreme RAM constraints |
| `AxonAI-MX4-2.0-Q4_K_M.gguf` | Q4_K_M | ~2.7 GB | 6 GB | ⭐ **Recommended** | Mac M1/M2, standard laptops, WSL2, most modern CPUs |
| `AxonAI-MX4-2.0-Q8_0.gguf` | Q8_0 | ~4.5 GB | 8 GB | 🔬 Near-FP16 | Workstations, gaming PCs with ample RAM, power users |

### ⭐ Recommendation: Start with `Q4_K_M`

`Q4_K_M` is the usual sweet spot for local LLM inference. It delivers:

- **An estimated ~95% of the full-precision model quality** at less than 35% of the memory cost
- Excellent performance on **Apple Silicon (M1/M2/M3)**, standard x86 laptops, and cloud VMs
- The best balance of **inference speed**, **reasoning coherence**, and **RAM efficiency**

> 💡 For most users: **Q4_K_M is the right choice. Start here.**

---

## 🚀 Ollama Quickstart (Recommended)

[Ollama](https://ollama.com) is the fastest way to run AxonAI MX4 2.0 locally. No Python setup required.

### Step 1: Install Ollama

```bash
# Linux (install script)
curl -fsSL https://ollama.com/install.sh | sh

# macOS / Windows: download the installer from https://ollama.com/download
```

### Step 2: Create a Modelfile

Create a file named `Modelfile` (no extension) in your working directory:

```dockerfile
# Modelfile for AxonAI MX4 2.0 (Q4_K_M - Recommended)
FROM ./AxonAI-MX4-2.0-Q4_K_M.gguf

# --- Core Identity & Reasoning System Prompt ---
SYSTEM """
You are AxonAI, an advanced reasoning assistant developed by AxonLabs.
Before answering any question, you MUST use your internal scratchpad enclosed in <think>...</think> tags to reason step-by-step.
Only after completing your reasoning should you provide a clear, structured, and helpful final answer.
Be precise, thorough, and transparent in your logic.
"""

# --- Generation Parameters (Optimized for Reasoning) ---
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 8192
```

> 💡 **Why the `<think>` system prompt?** AxonAI MX4 2.0 was fine-tuned with Chain-of-Thought supervision. Including this system prompt *unlocks* the model's full reasoning capability. Without it, you may get direct answers without the structured deliberation the model was trained to produce.

### Step 3: Build and Run

```bash
# Build the local Ollama model from your Modelfile
ollama create axonai-mx4 -f ./Modelfile

# Run it interactively
ollama run axonai-mx4

# Or run with a direct prompt
ollama run axonai-mx4 "Explain the P vs NP problem and whether you think it will ever be solved."
```

### Using the Ollama REST API

Once running, Ollama exposes a local REST API on port `11434`, which is convenient for integrations:

```bash
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "axonai-mx4",
    "prompt": "What are the ethical implications of deploying AI in judicial systems?",
    "stream": false
  }'
```

---

## ⚙️ llama.cpp CLI

For advanced users, scripting pipelines, or maximum performance control.

### Install llama.cpp

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release -j$(nproc)
```

### Run Inference

```bash
# Basic interactive mode (Q4_K_M recommended)
./build/bin/llama-cli \
  -m ./AxonAI-MX4-2.0-Q4_K_M.gguf \
  -n 2048 \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --repeat-penalty 1.1 \
  --ctx-size 8192 \
  -i \
  -r "User:" \
  --in-prefix " " \
  -p "You are AxonAI, a reasoning assistant. Think step by step inside <think> tags before answering.\n\nUser:"
```

```bash
# Single-shot inference (batch/scripting)
./build/bin/llama-cli \
  -m ./AxonAI-MX4-2.0-Q8_0.gguf \
  -n 1024 \
  --temp 0.6 \
  --ctx-size 8192 \
  -p "<|im_start|>system\nYou are AxonAI. Reason carefully using <think> tags.<|im_end|>\n<|im_start|>user\nSolve: If a train travels 120km at 60km/h, then 80km at 40km/h, what is the average speed for the whole journey?<|im_end|>\n<|im_start|>assistant\n"
```

> 🔧 **Performance tip:** Add the `-ngl 99` flag if you have a GPU (NVIDIA/AMD/Metal) to offload layers. This can yield a **3-10x speedup** even with quantized GGUF files.

---

## 🖥️ LM Studio / GPT4All

Both LM Studio and GPT4All support direct GGUF loading with a graphical interface, which is ideal for non-technical users or demos.

**LM Studio:**

1. Download from [lmstudio.ai](https://lmstudio.ai)
2. Go to **Search** → search `AxonAI` or import GGUF manually via **My Models**
3. Load `AxonAI-MX4-2.0-Q4_K_M.gguf`
4. In the **System Prompt** field, paste the reasoning system prompt from the Modelfile above
5. Start chatting. LM Studio also exposes a local OpenAI-compatible API on port `1234`

**GPT4All:**

1. Download from [gpt4all.io](https://www.nomic.ai/gpt4all)
2. Under **Add Model** → choose **Import from file** and select your `.gguf` file
3. GPT4All works entirely offline after the initial load, which suits privacy-sensitive use cases

---

## 🧬 Why a Quantized Reasoning Model Is So Powerful

Most local LLMs are **answer-first**: they pattern-match to the most statistically likely response. AxonAI MX4 2.0 is fundamentally different. It was trained to **reason before it answers**, meaning every response is preceded by an internal deliberation process encoded inside `<think>...</think>` tags.

This is the Chain-of-Thought (CoT) paradigm, and when applied to a quantized local model, several powerful properties emerge:

### 🔒 Complete Privacy, Full Intelligence

Your prompts **never leave your machine**. Unlike cloud LLM APIs, there is no data sent to any server. You get structured reasoning capability that rivals much larger models, entirely offline.
This is essential for:

- Legal document analysis
- Medical note summarization
- Private financial reasoning
- Proprietary code review

### 📉 Quantization ≠ Reasoning Degradation

Unlike factual recall (where quantization can cause more hallucination), **structured reasoning is surprisingly robust** to quantization. The logical flow encoded during DoRA fine-tuning is preserved at 4-bit precision. The model still deliberates. It still checks its own steps. It still produces structured conclusions.

### 🧩 The DoRA Advantage

AxonAI MX4 2.0 was adapted using **DoRA (Weight-Decomposed Low-Rank Adaptation)**, which separates weight updates into magnitude and direction components. This produces **more stable, nuanced fine-tuning** than standard LoRA, and that stability carries through quantization. You get a model that reasons with fidelity even at Q4 compression.

### ⚡ The Efficiency Equation

A 4B parameter model at Q4_K_M runs at roughly **20-60 tokens/second** on Apple M-series chips and modern CPUs. That's fast enough for real-time, interactive reasoning. Think of it as having a thoughtful senior analyst available offline, on any machine, forever.

---

## 🛠️ Prompt & System Format

AxonAI MX4 2.0 uses the **ChatML** prompt template (inherited from Qwen3):

```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
<think>
{internal reasoning, generated by the model}
</think>
{final answer, generated by the model}
<|im_end|>
```

### Recommended System Prompt (Full Version)

```
You are AxonAI, an advanced reasoning language model developed by AxonLabs.
Your core capability is structured deliberation: before answering any question, you MUST think step-by-step inside <think>...</think> tags.

Guidelines:
- Use <think> to break down the problem, consider edge cases, and verify your logic.
- After </think>, give a clear, well-structured, and helpful final answer.
- Be honest about uncertainty. Never fabricate facts.
- For math and logic, show your work explicitly inside <think>.
- For creative or open-ended tasks, use <think> to plan your response structure.
```

### Minimal System Prompt (Fast / Lightweight)

```
You are AxonAI. Always reason inside <think>...</think> before your final answer.
```

---

## 📊 Model Architecture & Training Summary

| Property | Value |
|---|---|
| **Base Architecture** | Qwen3 (4B) |
| **Fine-Tuning Method** | DoRA (Weight-Decomposed Low-Rank Adaptation) |
| **Training Paradigm** | Chain-of-Thought Supervised Fine-Tuning |
| **Context Window** | 8,192 tokens |
| **Vocab Size** | 151,936 |
| **Attention Heads** | 32 |
| **Key-Value Heads** | 8 (Grouped Query Attention) |
| **Hidden Dimensions** | 2,048 |
| **GGUF Quantizer** | llama.cpp (official) |
| **Available Quants** | Q2_K, Q4_K_M, Q8_0 |
| **Language Support** | English (primary), Indonesian (strong) |
| **License** | Apache 2.0 |

---

## 🔬 Benchmark Context

> AxonAI MX4 2.0 is a research and educational model from AxonLabs. Formal benchmark results are forthcoming. The following reflects qualitative design targets based on the training methodology.

| Capability | Assessment |
|---|---|
| Structured Reasoning (CoT) | ✅ Strong (core training objective) |
| Mathematical Problem Solving | ✅ Good (benefits from step-by-step CoT) |
| Code Generation (Python/JS) | ✅ Good |
| Factual Q&A (English) | ✅ Good |
| Indonesian Language (id) | ✅ Good |
| Long-Context Coherence (8K) | ⚠️ Moderate (improves with Q8_0) |
| Complex Multi-Step Agentic Tasks | ⚠️ Moderate (use longer system prompts) |

*Community evaluations and PR-based benchmark additions are welcome.*

---

## 🇮🇩 Untuk Developer Indonesia

**Hello, Indonesian developers! 🙌**

This is the first local AI model from AxonLabs that you can run **100% offline on your own laptop or PC**, with no expensive GPU, no API fees, and no internet connection.
Imagine having an AI assistant that thinks step by step, understands context, and answers complex questions, all running on your own machine. That is the goal of AxonAI MX4 2.0 GGUF.

**Why does this matter to you?**

- 🔒 **Total privacy**: your data never leaves your device
- 💸 **Free forever**: no subscription or token fees
- 🌐 **Works offline**: even in areas with limited connectivity
- 🧠 **Reasoning-first**: the model *thinks first* before answering instead of guessing

Built by vocational high school (SMK) students, for everyone in Indonesia who wants to explore AI hands-on.

> *"The best AI is the AI you can control yourself."*
> (AxonLabs, SMKN 26 Jakarta)

**The fastest way to get started (5 minutes):**

```bash
# 1. Install Ollama (Linux script; on macOS/Windows use the installer from https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Create a Modelfile (see the guide above), then:
ollama create axonai-mx4 -f ./Modelfile

# 3. Run it!
ollama run axonai-mx4 "Jelaskan cara kerja transformer architecture dalam bahasa yang mudah dipahami."
```

---

## ⚖️ License & Usage

This model is released under the **Apache 2.0 License**.

- ✅ Free for personal, academic, and commercial use
- ✅ Modification and redistribution permitted with attribution
- ✅ Derivative models and fine-tunes welcome
- ❌ Must not be used to generate illegal, harmful, or deceptive content
- ❌ Derivative releases must not omit attribution to AxonLabs / `Daffaadityp/AxonAI-MX4-2.0`

---

## 🔗 Related Resources

| Resource | Link |
|---|---|
| 🧠 Original FP16 Model | [Daffaadityp/AxonAI-MX4-2.0](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0) |
| 📦 llama.cpp Repository | [github.com/ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) |
| 🦙 Ollama Documentation | [ollama.com/docs](https://ollama.com) |
| 🖥️ LM Studio | [lmstudio.ai](https://lmstudio.ai) |
| 🏫 AxonLabs / SMKN 26 Jakarta | [Daffaadityp on HuggingFace](https://huggingface.co/Daffaadityp) |

---

## 💬 Community & Feedback

Found a bug? Have a benchmark result to share? Want to contribute evaluation data?

- **Open a Discussion** on this HuggingFace repository
- **Open an Issue** on the [AxonAI GitHub](https://github.com/Daffaadityp) (if available)
- **Community evaluations are actively welcomed**, especially Indonesian-language benchmarks

---
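
For anyone scripting against the raw completion (for example the single-shot `llama-cli` call in the llama.cpp section), the ChatML layout from the Prompt & System Format section can be sketched in Python as below. This is a minimal illustration, not part of the official tooling: the function names `build_chatml_prompt` and `split_reasoning` are hypothetical, and the code assumes the reasoning block is wrapped in Qwen3-style `<think>...</think>` tags.

```python
import re

# Assumption: reasoning is wrapped in Qwen3-style <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def build_chatml_prompt(system_prompt: str, user_message: str) -> str:
    """Format one system + user turn in ChatML and open the assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, final_answer) from a raw assistant completion.

    If no <think> block is present, reasoning is an empty string and the
    whole completion is treated as the final answer.
    """
    text = completion.replace("<|im_end|>", "")
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical inputs, used only to show the call pattern.
    print(build_chatml_prompt("You are AxonAI.", "What is 2 + 2?"))
    sample = "<think>2 + 2 is 4.</think>\nThe answer is 4.<|im_end|>"
    print(split_reasoning(sample))
```

Keeping the reasoning and the final answer separate makes it easy to log the deliberation for debugging while showing only the answer to end users. Ollama and LM Studio normally apply the chat template for you, so a helper like this is mainly useful with raw-prompt interfaces.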

*Built with 🧠 by AxonLabs · SMKN 26 Jakarta · Indonesia 🇮🇩*

*"Intelligence is not about speed. It's about depth of thought."*

*"Michie Edition"*

[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-Daffaadityp-yellow?style=for-the-badge)](https://huggingface.co/Daffaadityp)