---
base_model: Daffaadityp/AxonAI-MX4-2.0
language:
- en
- id
license: apache-2.0
tags:
- gguf
- quantized
- qwen3
- dora
- axonlabs
- reasoning
- local-llm
- chain-of-thought
- edge-ai
- ollama
- llama-cpp
- indonesian-ai
- text-generation
- 4b
- instruct
pipeline_tag: text-generation
library_name: gguf
---
# Poterry AI: GGUF Quantized Edition
### *Reasoning-First Language Model · 4B Parameters · Chain-of-Thought Native*
### *Optimized for Local Inference · Edge Devices · Laptops · Offline AI*
> **This repository contains the official GGUF quantized files for AxonAI MX4 2.0.**
> Run a full Chain-of-Thought reasoning LLM *entirely locally*: no GPU required, no internet connection, no API costs. Just pure, structured intelligence on your own hardware.
---
## Quick Navigation
| Section | Description |
|---|---|
| [Available Files](#available-gguf-files--quantization-guide) | Q2_K, Q4_K_M, Q8_0: which one is right for you? |
| [Ollama Quickstart](#ollama-quickstart-recommended) | Easiest way to run locally |
| [llama.cpp CLI](#llamacpp-cli) | For advanced users and scripting |
| [LM Studio / GPT4All](#lm-studio--gpt4all) | GUI-based local inference |
| [Why Quantized Reasoning?](#why-a-quantized-reasoning-model-is-so-powerful) | The secret sauce, explained for GGUF |
| [Prompt Format](#prompt--system-format) | How to structure your prompts |
| [Indonesian Community](#for-indonesian-developers) | For developers in Indonesia |
---
## What Is This Repository?
This is the **official GGUF release** of [AxonAI MX4 2.0](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0), a 4-billion-parameter reasoning-first language model built by **AxonLabs** (SMKN 26 Jakarta). The original model was trained using **DoRA (Weight-Decomposed Low-Rank Adaptation)** on top of the Qwen3 architecture and fine-tuned to produce structured, transparent Chain-of-Thought reasoning (inside `<think>...</think>` tags) before every final response.
These GGUF files were produced using `llama.cpp`'s official quantization pipeline, preserving the model's reasoning depth while dramatically reducing its memory footprint, making **local LLM inference** accessible on consumer hardware.
**If you want the full-precision FP16/BF16 weights**, visit the original repository:
[`Daffaadityp/AxonAI-MX4-2.0`](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0)
---
## Available GGUF Files & Quantization Guide
Choose the right quantization level for your hardware. As a general rule: **higher Q = better quality, higher RAM requirement**.
| File | Quant Type | Size (Est.) | Min RAM | Quality | Use Case |
|---|---|---|---|---|---|
| `AxonAI-MX4-2.0-Q2_K.gguf` | Q2_K | ~1.7 GB | 4 GB | Fast / Compressed | Raspberry Pi, very old laptops, extreme RAM constraints |
| `AxonAI-MX4-2.0-Q4_K_M.gguf` | Q4_K_M | ~2.7 GB | 6 GB | **Recommended** | Mac M1/M2, standard laptops, WSL2, most modern CPUs |
| `AxonAI-MX4-2.0-Q8_0.gguf` | Q8_0 | ~4.5 GB | 8 GB | Near-FP16 | Workstations, gaming PCs with ample RAM, power users |
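The "Size (Est.)" column follows from simple arithmetic: parameter count times bits per weight, divided by 8. A minimal sketch, assuming a 4.0B parameter count; the 8.5 bits/weight for Q8_0 follows from its block layout (32 int8 weights plus one fp16 scale = 34 bytes), whereas K-quant mixes such as Q4_K_M use different types for different tensors, so any single bits-per-weight figure for them is only an approximation. `estimate_gguf_gb` is a hypothetical helper.

```bash
# Hypothetical helper: rough GGUF size in GB (10^9 bytes).
# params (billions) x bits per weight / 8 bits per byte.
# Counts weights only; metadata and tensors kept at higher precision
# make real files somewhat larger than this estimate.
estimate_gguf_gb() {
  local params_billions=$1 bits_per_weight=$2
  awk -v p="$params_billions" -v b="$bits_per_weight" \
    'BEGIN { printf "%.2f\n", p * b / 8 }'
}

estimate_gguf_gb 4.0 8.5   # Q8_0 → 4.25
```

The gap between 4.25 GB and the table's ~4.5 GB is the kind of overhead the comment above describes.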
### Recommendation: Start with `Q4_K_M`
`Q4_K_M` is the most widely recommended sweet spot for local LLM inference. It delivers:
- **Most of the full-precision model's quality** at roughly a third of the memory cost (~2.7 GB vs. ~8 GB for 4B parameters stored at 16 bits)
- Excellent performance on **Apple Silicon (M1/M2/M3)**, standard x86 laptops, and cloud VMs
- The best balance of **inference speed**, **reasoning coherence**, and **RAM efficiency**
> **Tip:** For most users, **Q4_K_M is the right choice. Start here.**
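The "Min RAM" column of the table above can be turned into a quick check. A minimal sketch, assuming you know your total RAM in whole gigabytes; `pick_quant` is a hypothetical helper (not part of Ollama or llama.cpp), and its thresholds are copied from that table. It returns the largest file whose minimum your machine meets.

```bash
# Hypothetical helper: map total RAM (whole GB) to the largest quant
# whose "Min RAM" value from the quantization table it satisfies.
pick_quant() {
  local ram_gb=$1
  case $ram_gb in
    ''|*[!0-9]*) echo "error: RAM must be a whole number of GB" >&2; return 1 ;;
  esac
  if [ "$ram_gb" -ge 8 ]; then
    echo "AxonAI-MX4-2.0-Q8_0.gguf"
  elif [ "$ram_gb" -ge 6 ]; then
    echo "AxonAI-MX4-2.0-Q4_K_M.gguf"
  elif [ "$ram_gb" -ge 4 ]; then
    echo "AxonAI-MX4-2.0-Q2_K.gguf"
  else
    echo "error: ${ram_gb} GB is below every listed minimum (4 GB)" >&2
    return 1
  fi
}

# On Linux, total RAM in whole GiB can be read from /proc/meminfo (reported in kB):
#   ram=$(awk '/MemTotal/ { print int($2 / 1048576) }' /proc/meminfo)
pick_quant 6   # → AxonAI-MX4-2.0-Q4_K_M.gguf
```

Because `MemTotal` excludes memory reserved by firmware and the kernel, an "8 GB" machine usually reads as 7 here and gets Q4_K_M, which matches the recommendation above.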
---
## Ollama Quickstart (Recommended)
[Ollama](https://ollama.com) is the fastest way to run AxonAI MX4 2.0 locally. No Python setup required.
### Step 1: Install Ollama
```bash
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows: Download installer from https://ollama.com/download
```
### Step 2: Create a Modelfile
Create a file named `Modelfile` (no extension) in the same directory as the downloaded `.gguf` file:
```dockerfile
# Modelfile for AxonAI MX4 2.0 (Q4_K_M - Recommended)
FROM ./AxonAI-MX4-2.0-Q4_K_M.gguf
# --- Core Identity & Reasoning System Prompt ---
SYSTEM """
You are AxonAI, an advanced reasoning assistant developed by AxonLabs.
Before answering any question, you MUST use your internal scratchpad enclosed in <think>...</think> tags to reason step-by-step.
Only after completing your reasoning should you provide a clear, structured, and helpful final answer.
Be precise, thorough, and transparent in your logic.
"""
# --- Generation Parameters (Optimized for Reasoning) ---
PARAMETER temperature 0.6
PARAMETER top_p 0.95
PARAMETER top_k 20
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 8192
```
> **Why the `<think>` system prompt?** AxonAI MX4 2.0 was fine-tuned with Chain-of-Thought supervision. Including this system prompt *unlocks* the model's full reasoning capability. Without it, you may get direct answers without the structured deliberation the model was trained to produce.
### Step 3: Build and Run
```bash
# Build the local Ollama model from your Modelfile
ollama create axonai-mx4 -f ./Modelfile
# Run it interactively
ollama run axonai-mx4
# Or run with a direct prompt
ollama run axonai-mx4 "Explain the P vs NP problem and whether you think it will ever be solved."
```
### Using the Ollama REST API
While Ollama is running, it exposes a local REST API on port `11434`, which is handy for integrations:
```bash
curl http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{
"model": "axonai-mx4",
"prompt": "What are the ethical implications of deploying AI in judicial systems?",
"stream": false
}'
```
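Hand-writing JSON inside `-d '...'` breaks as soon as a prompt contains a double quote, backslash, or newline. A minimal Bash sketch (no `jq` assumed) that escapes the text and builds the same `/api/generate` body; `json_escape` and `build_generate_payload` are hypothetical names, and the `options` object, which Ollama's API accepts for per-request parameters, here simply repeats values from the Modelfile.

```bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
json_escape() {
  local s=$1 bs='\' q='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash       -> \\
  s=${s//"$q"/"$bs$q"}     # double quote    -> \"
  s=${s//$'\n'/"${bs}n"}   # newline         -> \n
  s=${s//$'\r'/"${bs}r"}   # carriage return -> \r
  s=${s//$'\t'/"${bs}t"}   # tab             -> \t
  printf '%s' "$s"
}

# Hypothetical helper: build a non-streaming /api/generate request body.
build_generate_payload() {
  local model=$1 prompt=$2
  # "options" overrides Modelfile parameters for this request only.
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"temperature":0.6,"num_ctx":8192}}' \
    "$(json_escape "$model")" "$(json_escape "$prompt")"
}

# Usage (requires a running Ollama server):
# curl http://localhost:11434/api/generate \
#   -H "Content-Type: application/json" \
#   -d "$(build_generate_payload axonai-mx4 'Can "fairness" in sentencing be measured?')"
```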
---
## llama.cpp CLI
For advanced users, scripting pipelines, or maximum performance control.
### Install llama.cpp
```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
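# Optional: add -DGGML_CUDA=ON for NVIDIA GPU offload (Metal is enabled by default on Apple Silicon)
cmake -B build
# nproc is not available on stock macOS; fall back to sysctl there
cmake --build build --config Release -j "$(nproc 2>/dev/null || sysctl -n hw.ncpu)"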
```
### Run Inference
```bash
# Basic interactive mode (Q4_K_M recommended)
./build/bin/llama-cli \
-m ./AxonAI-MX4-2.0-Q4_K_M.gguf \
-n 2048 \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--repeat-penalty 1.1 \
--ctx-size 8192 \
-i \
-r "User:" \
--in-prefix " " \
-p "You are AxonAI, a reasoning assistant. Think step by step inside <think></think> tags before answering.\n\nUser:"
```
```bash
# Single-shot inference (batch/scripting)
./build/bin/llama-cli \
-m ./AxonAI-MX4-2.0-Q8_0.gguf \
-n 1024 \
--temp 0.6 \
--ctx-size 8192 \
-p "<|im_start|>system\nYou are AxonAI. Reason carefully using <think></think> tags.<|im_end|>\n<|im_start|>user\nSolve: If a train travels 120km at 60km/h, then 80km at 40km/h, what is the average speed for the whole journey?<|im_end|>\n<|im_start|>assistant\n"
```
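To check the model's answer to the single-shot prompt, the expected result can be computed directly: average speed is total distance divided by total time. A small awk-based sketch; `average_speed` is a hypothetical helper.

```bash
# Hypothetical helper: average speed over two legs.
# Arguments: distance1 speed1 distance2 speed2 (km and km/h).
average_speed() {
  awk -v d1="$1" -v v1="$2" -v d2="$3" -v v2="$4" \
    'BEGIN { printf "%.2f\n", (d1 + d2) / (d1 / v1 + d2 / v2) }'
}

average_speed 120 60 80 40   # → 50.00  (200 km over 2 h + 2 h)
```

Here both legs take 2 hours, so the answer happens to equal the plain mean of 60 and 40; with unequal leg times it would not (e.g. 60 km at 60 km/h then 60 km at 30 km/h averages 40 km/h, not 45). That distinction is exactly the kind of step a good reasoning trace should make explicit.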
> **Performance tip:** If you have a GPU (NVIDIA/AMD/Metal) and llama.cpp was built with support for it, add the `-ngl 99` flag to offload layers to the GPU; this can yield a **3-10x speedup** even with quantized GGUF files.
---
## LM Studio / GPT4All
Both LM Studio and GPT4All support direct GGUF loading with a graphical interface, which makes them ideal for non-technical users or demos.
**LM Studio:**
1. Download from [lmstudio.ai](https://lmstudio.ai)
2. Go to **Search** and search for `AxonAI`, or import the GGUF manually via **My Models**
3. Load `AxonAI-MX4-2.0-Q4_K_M.gguf`
4. In the **System Prompt** field, paste the reasoning system prompt from the Modelfile above
5. Start chatting. LM Studio can also serve a local OpenAI-compatible API on port `1234`
**GPT4All:**
1. Download from [nomic.ai/gpt4all](https://www.nomic.ai/gpt4all)
2. Under **Add Model**, choose **Import from file** and select your `.gguf` file
3. GPT4All works entirely offline once the model is loaded, which suits privacy-sensitive use cases
---
## Why a Quantized Reasoning Model Is So Powerful
Most local LLMs are **answer-first** โ they pattern-match to the most statistically likely response. AxonAI MX4 2.0 is fundamentally different.
It was trained to **reason before it answers**: every response is preceded by an internal deliberation process enclosed in `<think>...</think>` tags. This is the Chain-of-Thought (CoT) paradigm, and when it is applied to a quantized local model, several powerful properties emerge:
### Complete Privacy, Full Intelligence
Your prompts **never leave your machine**. Unlike with cloud LLM APIs, no data is sent to any server. You get structured, step-by-step reasoning entirely offline. This is essential for:
- Legal document analysis
- Medical note summarization
- Private financial reasoning
- Proprietary code review
### Quantization ≠ Reasoning Degradation
Factual recall can degrade under quantization (showing up as more hallucination), but **structured reasoning tends to be more robust**. The logical flow learned during DoRA fine-tuning is largely preserved at 4-bit precision: the model still deliberates, still checks its own steps, and still produces structured conclusions.
### The DoRA Advantage
AxonAI MX4 2.0 was adapted using **DoRA (Weight-Decomposed Low-Rank Adaptation)**, which separates weight updates into magnitude and direction components. This produces **more stable, nuanced fine-tuning** than standard LoRA โ and that stability carries through quantization. You get a model that reasons with fidelity even at Q4 compression.
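For readers new to DoRA, the method's published formulation contrasts with LoRA as shown below. $W_0$ is the frozen pretrained weight matrix, $BA$ the trainable low-rank update (as in LoRA), $m$ a trainable magnitude vector (initialized to $\lVert W_0 \rVert_c$), and $\lVert \cdot \rVert_c$ the column-wise vector norm. This sketches the general technique, not AxonLabs' specific training hyperparameters.

$$
\text{LoRA: } W' = W_0 + BA
\qquad
\text{DoRA: } W' = m \cdot \frac{W_0 + BA}{\lVert W_0 + BA \rVert_c}
$$

The fraction gives each column a unit-norm direction, and $m$ rescales each column independently: that is the magnitude/direction split described above.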
### The Efficiency Equation
A 4B-parameter model at Q4_K_M typically runs at **~20-60 tokens/second** on Apple M-series chips and modern CPUs. That's fast enough for real-time, interactive reasoning: think of it as having a thoughtful senior analyst available offline, on any machine, forever.
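Because the reasoning inside `<think>` counts toward generated tokens, reasoning answers are longer than direct ones, and latency scales with that length. A quick sketch of the arithmetic, assuming the 20-60 tokens/second range above and the `-n 2048` generation limit from the llama.cpp example (actual speed depends on your hardware); `generation_seconds` is a hypothetical helper.

```bash
# Hypothetical helper: seconds needed to generate N tokens at a given rate.
generation_seconds() {
  local tokens=$1 tokens_per_sec=$2
  awk -v n="$tokens" -v r="$tokens_per_sec" 'BEGIN { printf "%.1f\n", n / r }'
}

generation_seconds 2048 20   # → 102.4  (full 2048-token budget at the low end)
generation_seconds 2048 60   # → 34.1   (same budget at the high end)
```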
---
## Prompt & System Format
AxonAI MX4 2.0 uses the **ChatML** prompt template (inherited from Qwen3):
```
<|im_start|>system
{system_prompt}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
<think>
{internal reasoning: the model generates this}
</think>
{final answer: the model generates this}
<|im_end|>
```
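When driving llama.cpp with a raw prompt (as in the single-shot example earlier), the template can be filled in by a small helper instead of being typed by hand. A minimal sketch; `build_chatml` is a hypothetical name, and it stops at the assistant header so the model writes the `<think>` block and the final answer itself.

```bash
# Hypothetical helper: assemble a single-turn ChatML prompt that ends
# with the assistant header, following the template above.
build_chatml() {
  local system=$1 user=$2
  printf '<|im_start|>system\n%s<|im_end|>\n<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n' \
    "$system" "$user"
}

# Usage: write to a file (keeps the trailing newline that $(...) would strip)
# and pass it to llama-cli with -f:
# build_chatml "You are AxonAI. Reason carefully using <think></think> tags." \
#   "What is 17 * 23?" > prompt.txt
# ./build/bin/llama-cli -m ./AxonAI-MX4-2.0-Q8_0.gguf -n 1024 -f prompt.txt
```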
### Recommended System Prompt (Full Version)
```
You are AxonAI, an advanced reasoning language model developed by AxonLabs.
Your core capability is structured deliberation: before answering any question,
you MUST think step-by-step inside <think>...</think> tags.
Guidelines:
- Use <think> to break down the problem, consider edge cases, and verify your logic.
- After </think>, give a clear, well-structured, and helpful final answer.
- Be honest about uncertainty. Never fabricate facts.
- For math and logic, show your work explicitly inside <think>.
- For creative or open-ended tasks, use <think> to plan your response structure.
```
### Minimal System Prompt (Fast / Lightweight)
```
You are AxonAI. Always reason inside <think>...</think> before your final answer.
```
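Since the model emits its reasoning inside `<think>...</think>` before the final answer, a script that only needs the answer can drop that block. A minimal sketch; `final_answer` is a hypothetical helper, and whether the tags appear in raw output at all depends on the runtime (some front ends hide or separate them, in which case the text passes through unchanged).

```bash
# Hypothetical helper: print only the text after the last </think> tag.
# If no closing tag is present, the input is printed unchanged.
final_answer() {
  local text=$1
  case $text in
    *'</think>'*) text=${text##*'</think>'} ;;
  esac
  # Strip the leading whitespace that usually follows the closing tag.
  text=${text#"${text%%[![:space:]]*}"}
  printf '%s\n' "$text"
}

# Usage (requires the Ollama model built earlier):
# final_answer "$(ollama run axonai-mx4 'Why is the sky blue?')"
```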
---
## Model Architecture & Training Summary
| Property | Value |
|---|---|
| **Base Architecture** | Qwen3 (4B) |
| **Fine-Tuning Method** | DoRA (Weight-Decomposed Low-Rank Adaptation) |
| **Training Paradigm** | Chain-of-Thought Supervised Fine-Tuning |
| **Context Window** | 8,192 tokens |
| **Vocab Size** | 151,936 |
| **Attention Heads** | 32 |
| **Key-Value Heads** | 8 (Grouped Query Attention) |
| **Hidden Dimensions** | 2,048 |
| **GGUF Quantizer** | llama.cpp (official) |
| **Available Quants** | Q2_K, Q4_K_M, Q8_0 |
| **Language Support** | English (primary), Indonesian (strong) |
| **License** | Apache 2.0 |
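Grouped Query Attention matters for local RAM budgets: the KV cache scales with the number of KV heads (8), not attention heads (32), a 4x reduction. A back-of-envelope sketch; the layer count (36) and head dimension (128) are assumptions taken from the public Qwen3-4B configuration rather than from this card, the cache is assumed to be fp16, and llama.cpp or Ollama may quantize or allocate the cache differently. `kv_cache_bytes` and `to_gib` are hypothetical helpers.

```bash
# Hypothetical helper: KV-cache size in bytes.
# Factor 2: one key and one value vector per KV head, per layer, per token.
kv_cache_bytes() {
  local layers=$1 kv_heads=$2 head_dim=$3 bytes_per_value=$4 ctx_tokens=$5
  echo $(( 2 * layers * kv_heads * head_dim * bytes_per_value * ctx_tokens ))
}

# Convert bytes to GiB for display.
to_gib() { awk -v b="$1" 'BEGIN { printf "%.3f GiB\n", b / 1073741824 }'; }

# Assumed: 36 layers, head_dim 128, fp16 cache (2 bytes), 8,192-token context.
to_gib "$(kv_cache_bytes 36 8 128 2 8192)"    # → 1.125 GiB with 8 KV heads (GQA)
to_gib "$(kv_cache_bytes 36 32 128 2 8192)"   # → 4.500 GiB if all 32 heads kept their own K/V
```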
---
## Benchmark Context
> AxonAI MX4 2.0 is a research and educational model from AxonLabs. Formal benchmark results are forthcoming. The following reflects qualitative design targets based on the training methodology.
| Capability | Assessment |
|---|---|
| Structured Reasoning (CoT) | ✅ Strong: core training objective |
| Mathematical Problem Solving | ✅ Good: benefits from step-by-step CoT |
| Code Generation (Python/JS) | ✅ Good |
| Factual Q&A (English) | ✅ Good |
| Indonesian Language (id) | ✅ Good |
| Long-Context Coherence (8K) | ⚠️ Moderate: improves with Q8_0 |
| Complex Multi-Step Agentic Tasks | ⚠️ Moderate: use longer system prompts |
*Community evaluations and PR-based benchmark additions are welcome.*
---
## For Indonesian Developers
**Hello, Indonesian developers!**
This is AxonLabs' first local AI model that you can run **100% offline on your own laptop or PC**: no expensive GPU, no API costs, and no internet connection needed.
Imagine having an AI assistant that can think step by step, understand context, and answer complex questions, all running inside your own machine. That is the goal of AxonAI MX4 2.0 GGUF.
**Why does this matter to you?**
- **Total privacy**: your data never leaves your device
- **Free forever**: no subscription or token fees
- **Works offline**: even in areas with limited connectivity
- **Reasoning-first**: this model *thinks first* before answering, instead of guessing
Built by vocational high school (SMK) students, for everyone in Indonesia who wants to explore AI hands-on.
> *"The best AI is AI you can control yourself."*
> AxonLabs, SMKN 26 Jakarta
**Fastest way to get started (5 minutes):**
```bash
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
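# 2. Create the Modelfile (see the guide above), then:
ollama create axonai-mx4 -f ./Modelfile
# 3. Run it! (The prompt below asks, in Indonesian, for an easy-to-understand explanation of the transformer architecture.)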
ollama run axonai-mx4 "Jelaskan cara kerja transformer architecture dalam bahasa yang mudah dipahami."
```
---
## License & Usage
This model is released under the **Apache 2.0 License**.
- **Allowed:** personal, academic, and commercial use, free of charge
- **Allowed:** modification and redistribution, with attribution
- **Allowed:** derivative models and fine-tunes
- **Not allowed:** generating illegal, harmful, or deceptive content
- **Required:** attribution to AxonLabs / `Daffaadityp/AxonAI-MX4-2.0` for derivative releases
---
## Related Resources
| Resource | Link |
|---|---|
| Original FP16 Model | [Daffaadityp/AxonAI-MX4-2.0](https://huggingface.co/Daffaadityp/AxonAI-MX4-2.0) |
| llama.cpp Repository | [github.com/ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) |
| Ollama | [ollama.com](https://ollama.com) |
| LM Studio | [lmstudio.ai](https://lmstudio.ai) |
| AxonLabs / SMKN 26 Jakarta | [Daffaadityp on HuggingFace](https://huggingface.co/Daffaadityp) |
---
## Community & Feedback
Found a bug? Have a benchmark result to share? Want to contribute evaluation data?
- **Open a Discussion** on this HuggingFace repository
- **Open an Issue** on the [AxonAI GitHub](https://github.com/Daffaadityp) (if available)
- **Community evaluations are actively welcomed** โ especially Indonesian-language benchmarks
---
*Built with 🧠 by AxonLabs · SMKN 26 Jakarta · Indonesia 🇮🇩*
*"Intelligence is not about speed. It's about depth of thought."*
*"Michie Edition"*