Auden 12B
Auden turns receipts and invoices into structured tool calls. It is a LoRA fine-tune of Gemma 4 12B (instruction-tuned) that emits OpenAI-style tool_calls — extract_fields with the requested values, or ask_follow_up when the document genuinely doesn't contain what you asked for — instead of free text you have to parse. It ships as GGUF for llama.cpp and runs fully offline on a single consumer GPU or Apple Silicon Mac. Built by NuroAI Labs.
On real receipts (CORD-v2 test split, OCR text as input), Auden v1 reaches 98.95% field-level extraction accuracy and 99% end-to-end success, versus 11.6% and 14% for stock Gemma 4 12B under the identical harness. It also cuts hallucinated fields from 245 (v0) to 17 and reaches 100% abstention accuracy when a requested field is absent.
Table of contents
- Model details
- Intended uses & limitations
- How to use
- Evaluation
- Training
- Files & checksums
- License & attribution
Model details
| Property | Value |
|---|---|
| Base model | google/gemma-4-12B-it (12B, instruction-tuned, Apache 2.0) |
| Fine-tuning | LoRA SFT via Unsloth — r=16, alpha=32, attention + MLP projections (text and vision towers) |
| Task | Document field extraction as native Gemma 4 function calls |
| Input | OCR text of a receipt/invoice (see Limitations re: image input) |
| Output | Structured tool_calls (finish_reason: tool_calls) |
| Formats | GGUF Q4_K_M (7.4 GB) and Q8_0 (12.7 GB), each with a BF16 vision projector (mmproj) |
| Runtime | llama.cpp llama-server with --jinja (chat template + tool grammar embedded in the GGUF) |
| Languages | Trained/evaluated on English, Indonesian, Malay, Turkish receipt locales |
| Author | NuroAI Labs |
Tool interface
Auden is trained against a 15-tool document-agent schema rendered through Gemma 4's native <|tool> declarations. The evaluated core:
- extract_fields(schema, doc_ref?): return the requested fields from the active document
- ask_follow_up(question, missing_fields?): abstain and ask when a requested field is not in the document
- no_action_needed(reason, confidence?): decline when no tool action is warranted
The wider schema also covers query_document, classify_doc, extract_table, summarize, redact, compare_docs, convert, fill_form, search_folder, create_event_from_doc, set_reminder, and draft_email. Published benchmarks exercise extract_fields and ask_follow_up.
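The request example under How to use declares only extract_fields and ask_follow_up. If you also want to expose no_action_needed, its declaration can follow the same OpenAI function format. This is a sketch under assumptions: the parameter types (reason as a string, confidence as a number) are inferred from the signature no_action_needed(reason, confidence?) rather than taken from Auden's training-time schema, and no published benchmark covers this tool.

```python
import json

# Hypothetical declaration for no_action_needed(reason, confidence?).
# Parameter types are assumptions inferred from the signature, not the
# schema Auden was trained against.
NO_ACTION_NEEDED = {
    "type": "function",
    "function": {
        "name": "no_action_needed",
        "description": "Decline when no tool action is warranted.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {"type": "string"},
                "confidence": {"type": "number"},
            },
            # Only reason is required; confidence is marked optional (?) in the signature.
            "required": ["reason"],
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(NO_ACTION_NEEDED, indent=2))
```

To use it, append NO_ACTION_NEEDED to the TOOLS list before building the request payload.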
Intended uses & limitations
Intended uses
- Local/offline receipt and invoice field extraction (totals, dates, merchants, currencies) feeding bookkeeping, expense, or archival pipelines.
- Agent stacks that need machine-parseable extraction with an explicit abstention path instead of best-guess free text.
Limitations
- Feed OCR text, not images. Image-in extraction through the quantized llama.cpp multimodal path scores poorly for this model family (the unquantized bf16 base reads the same receipts correctly, so this is a quantized-runtime issue, not a model-capability ceiling). Run your own OCR step and pass the text.
- Evaluation covers Indonesian-market CORD receipts and synthetic receipts/invoices in four locales. Long multi-page invoices, handwriting, and very different document styles are uncharacterized.
- Published numbers assume the Auden system prompt and tool schemas (below). The model remains a general Gemma under other prompts, but the benchmarks don't transfer.
- Extraction confidence is not calibrated; the abstention behavior covers missing fields, not uncertain reads. Keep a human in the loop where errors are costly.
- Inherits the biases and failure modes of Gemma 4 12B.
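Because the OCR step sits in front of the model, something has to turn its output into a request. A minimal sketch of that hand-off, assuming your OCR engine already returns plain text: the hypothetical helper build_user_message wraps the text in the same user-message layout as the How to use example. Joining several field names with commas is an assumption; the card only shows a single-field instruction ("Extract total.").

```python
def build_user_message(ocr_text: str, fields: list[str]) -> dict:
    """Wrap OCR output in the user-message layout from the How to use example.

    Hypothetical helper: the multi-field form "Extract a, b." is an assumption.
    """
    if not fields:
        raise ValueError("request at least one field")
    instruction = "Extract " + ", ".join(fields) + "."
    return {
        "role": "user",
        "content": "OCR text from the active document:\n"
        + ocr_text.strip()  # drop trailing newlines the OCR engine may add
        + "\n\n"
        + instruction,
    }


if __name__ == "__main__":
    ocr = "RECEIPT\nCorner Cafe\nDate: 2025-03-02\nCoffee 2 x 3.50 7.00\nTotal: 7.00\n"
    print(build_user_message(ocr, ["total"])["content"])
```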
How to use
Serve (llama.cpp)
llama-server -m auden-12b-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
--jinja -ngl 999 --host 127.0.0.1 --port 8089
--jinja is required — it activates the embedded chat template, which renders tool declarations into Gemma 4's native function-calling format.
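Loading a multi-GB GGUF takes a while, and requests sent before the model is ready fail. A minimal sketch that waits for the server, assuming llama-server's /health endpoint (it answers HTTP 200 once the model has loaded) and the port 8089 from the command above. wait_for_server is a hypothetical helper that uses only the standard library.

```python
import time
import urllib.request


def wait_for_server(base_url: str = "http://127.0.0.1:8089",
                    timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll <base_url>/health until it returns 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (e.g. 503 while loading) and socket errors
            # are all OSError subclasses: the server is not ready yet.
            pass
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_server() else "server did not come up")
```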
Call (OpenAI-compatible)
import requests
TOOLS = [{
"type": "function",
"function": {
"name": "extract_fields",
"description": "Extract the requested fields from the active document.",
"parameters": {
"type": "object",
"properties": {
"fields": {"type": "object", "description": "Field name to extracted value."},
"schema": {"type": "object", "description": "Requested field schema."},
},
},
},
}, {
"type": "function",
"function": {
"name": "ask_follow_up",
"description": "Ask the user a question when required information is missing.",
"parameters": {
"type": "object",
"properties": {
"question": {"type": "string"},
"missing_fields": {"type": "array", "items": {"type": "string"}},
},
"required": ["question"],
},
},
}]
payload = {
"model": "auden",
"messages": [
{"role": "system", "content": "You are Auden, a precise local document agent. "
"Use tools only when the document evidence supports them."},
{"role": "user", "content": "OCR text from the active document:\n"
"RECEIPT\nCorner Cafe\nDate: 2025-03-02\n"
"Coffee 2 x 3.50 7.00\nTotal: 7.00\n\nExtract total."},
],
"tools": TOOLS,
"temperature": 0,
}
resp = requests.post("http://127.0.0.1:8089/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()  # surface HTTP errors instead of a KeyError below
r = resp.json()
print(r["choices"][0]["message"]["tool_calls"])
# -> [{"function": {"name": "extract_fields", "arguments": "{\"fields\":{\"total\":\"7.00\"}, ...}"}}]
If the document lacks the requested field, the model calls ask_follow_up with missing_fields instead of inventing a value.
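A minimal sketch of consuming the response, assuming the message shape printed in the example above: a list of tool_calls whose arguments field is a JSON-encoded string, as in the OpenAI chat completions format. handle_tool_calls and its status labels are hypothetical; only the first call is handled, and any tool other than the two evaluated ones is passed through untouched.

```python
import json


def handle_tool_calls(message: dict) -> dict:
    """Route the first tool call in an OpenAI-style assistant message (hypothetical helper)."""
    calls = message.get("tool_calls") or []
    if not calls:
        # The model answered in free text instead of calling a tool.
        return {"status": "no_call", "text": message.get("content")}
    fn = calls[0]["function"]
    args = fn.get("arguments") or "{}"
    if isinstance(args, str):
        # The OpenAI chat format encodes arguments as a JSON string.
        args = json.loads(args)
    if fn["name"] == "extract_fields":
        return {"status": "extracted", "fields": args.get("fields", {})}
    if fn["name"] == "ask_follow_up":
        # Abstention path: surface the question instead of a guessed value.
        return {
            "status": "needs_input",
            "question": args["question"],
            "missing_fields": args.get("missing_fields", []),
        }
    return {"status": "unhandled", "tool": fn["name"], "arguments": args}
```

With the request above, call it as handle_tool_calls(r["choices"][0]["message"]) and branch on the returned status.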
Evaluation
All three models below were run through the identical harness: same prompts, same tool schemas, same normalized scorer (auden-docbench-v1-normalized), same llama.cpp build (commit 76da2450), all at Q4_K_M, OCR-text input, greedy decoding. Full commands in repro_commands.md.
- CORD 100 — 100 real receipts from the CORD-v2 test split (held out from training; overlap-checked).
- Synthetic 500 — 500 held-out synthetic receipts/invoices across en/id/ms/tr locales.
- Abstention 120 — 120 documents where the requested field is absent; correct behavior is ask_follow_up.
| Eval | Metric | Stock Gemma 4 12B | Auden v0 | Auden v1 (this release) |
|---|---|---|---|---|
| CORD 100 (real receipts) | Tool-call accuracy | 0.5900 | 1.0000 | 1.0000 |
| CORD 100 (real receipts) | Field/argument match | 0.1158 | 0.8526 | 0.9895 |
| CORD 100 (real receipts) | End-to-end success | 0.1400 | 0.8600 | 0.9900 |
| Synthetic 500 | Tool-call accuracy | 0.9140 | 1.0000 | 1.0000 |
| Synthetic 500 | Field/argument match | 0.0900 | 0.9588 | 0.9638 |
| Synthetic 500 | End-to-end success | 0.0080 | 0.8700 | 0.8940 |
| Abstention 120 | Abstention accuracy | 0.9833 | 0.2833 | 1.0000 |
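The card does not spell out what the auden-docbench-v1-normalized scorer does. As a rough illustration only, here is one way a normalized field-level match and an end-to-end success flag could be computed and averaged per document. The normalization rules, the per-document averaging, and the "no extra fields" success criterion are all assumptions, so these hypothetical functions will not reproduce the published numbers.

```python
import re
from statistics import mean


def normalize(value) -> str:
    # Hypothetical normalization: lowercase, trim, collapse runs of whitespace.
    return re.sub(r"\s+", " ", str(value).strip().lower())


def field_match(pred: dict, gold: dict) -> float:
    """Share of reference fields whose predicted value matches after normalization."""
    if not gold:
        return 1.0 if not pred else 0.0
    hits = sum(1 for k, v in gold.items() if k in pred and normalize(pred[k]) == normalize(v))
    return hits / len(gold)


def end_to_end(pred: dict, gold: dict) -> bool:
    """Assumed criterion: every reference field matches and no extra fields are emitted."""
    return field_match(pred, gold) == 1.0 and pred.keys() <= gold.keys()


def score(docs: list) -> dict:
    """docs is a list of (pred, gold) dict pairs; returns per-document means."""
    return {
        "field_match": mean(field_match(p, g) for p, g in docs),
        "end_to_end": mean(1.0 if end_to_end(p, g) else 0.0 for p, g in docs),
    }
```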
Field-level failure events on CORD 100 (same analyzer):
| Category | Stock Gemma | Auden v0 | Auden v1 |
|---|---|---|---|
| Hallucinated field | 85 | 245 | 17 |
| Missing field | 84 | 0 | 1 |
| Wrong value | 0 | 14 | 0 |
| Formatting mismatch | 1 | 8 | 2 |
Reading the baseline honestly. Stock Gemma fails to emit any valid extract_fields call on 41/100 real receipts; when it does call, arguments are usually schema-echoes or partial. Its low hallucination count reflects failing to extract at all (84 missing-field events), not precision. Its abstention score is genuinely high (0.9833) because the eval prompt explicitly instructs ask-if-missing behavior and the instruction-tuned base follows instructions — the fine-tune's value is concentrated where it matters: extraction accuracy and hallucination control. v0's abstention regression (0.2833) was fixed in v1 (1.0000).
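The analyzer behind the failure-event table is not described either. A minimal sketch of how such categories could be tallied per document, under loudly stated assumptions: any emitted field outside the reference counts as hallucinated, any reference field not emitted counts as missing, and a value that matches only after stripping case, spaces, punctuation and symbols counts as a formatting mismatch. failure_events is hypothetical, and its "loose" comparison is deliberately crude (it would treat "7.00" and "700" as the same content).

```python
import re
from collections import Counter


def _loose(value) -> str:
    # Hypothetical "same content, different formatting" test:
    # keep only lowercase letters and digits.
    return re.sub(r"[^0-9a-z]", "", str(value).lower())


def failure_events(pred: dict, gold: dict) -> Counter:
    """Tally field-level failure events for one document (assumed category rules)."""
    events = Counter()
    events["hallucinated"] += len(pred.keys() - gold.keys())  # emitted, not in reference
    events["missing"] += len(gold.keys() - pred.keys())       # in reference, not emitted
    for key in pred.keys() & gold.keys():
        p, g = str(pred[key]).strip(), str(gold[key]).strip()
        if p == g:
            continue
        events["formatting" if _loose(p) == _loose(g) else "wrong_value"] += 1
    return +events  # unary plus drops zero counts
```

Summing the per-document Counters over a test set gives totals in the same shape as the table above.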
Training
- Data: 13,558 tool-call traces — synthetic receipts/invoices (procedural generator with missing-total, occlusion, scan-noise, multi-currency, and wrong-document negatives; en/id/ms/tr locales), ~8% abstention traces, ~11% image-grounded traces, ~11% real-receipt traces. Zero overlap with all three eval sets (checked by document ID).
- Procedure: LoRA SFT with Unsloth on Gemma 4 12B-it — r=16, alpha=32, dropout 0, targets q/k/v/o + gate/up/down projections in both text and vision towers. 2,500 steps, lr 2e-4, effective batch 4 (1 × 4 grad-accum), seed 3407, ~0.74 epochs, 91 minutes on 1× H100. Final train loss 0.393.
- Validation gates: every trace passed an argument-match validator (pass rate 1.0); training format locked to the official processor's apply_chat_template/parse_response round-trip.
- Export: LoRA merged into the bf16 base, converted to GGUF via llama.cpp (convert_hf_to_gguf), quantized to Q4_K_M and Q8_0.
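The ~0.74 epoch figure follows from the step count and the effective batch: 2,500 steps × 4 examples per optimizer step is 10,000 examples, against 13,558 traces. A quick check, assuming one trace per training example and no sequence packing (epochs_seen is a hypothetical helper):

```python
def epochs_seen(steps: int, per_device_batch: int, grad_accum: int,
                n_examples: int, n_devices: int = 1) -> float:
    """Fraction of the dataset seen: optimizer steps x effective batch / dataset size."""
    effective_batch = per_device_batch * grad_accum * n_devices
    return steps * effective_batch / n_examples


# Values from the training notes: batch 1, grad-accum 4, one H100, 13,558 traces.
print(round(epochs_seen(2500, 1, 4, 13_558), 2))  # 0.74
```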
Files & checksums
| File | Quant | Size | SHA256 |
|---|---|---|---|
| q4_k_m/auden-12b-Q4_K_M.gguf | Q4_K_M | 7.4 GB | 8522b4f696abc5721342f9bad6caeab73659aaf822d4a69931affc222c6fca6f |
| q4_k_m/mmproj-BF16.gguf | BF16 mmproj | 175 MB | df7d0454dfb7a31d02a3845478f655b991c16f5d9b76cd308a96bc2d07c17f7e |
| q8_0/auden-12b-Q8_0.gguf | Q8_0 | 12.7 GB | 156295ec4ebe3eee54a757b194a2bf565d88340c5e9f2b5a5f88b5614fac920c |
| q8_0/mmproj-BF16.gguf | BF16 mmproj | 175 MB | 8e198e532d9c7720232768947373e519d9bd59b7e011364340a90ed3f7049110 |
Q4_K_M is the evaluated release artifact (all published numbers). Q8_0 is provided for users who want headroom; it was not separately benchmarked. Machine-readable checksums: checksums.txt.
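A download can be checked against the SHA256 values in the table before serving. A minimal sketch using hashlib, streaming each file in 1 MiB chunks so multi-GB GGUFs are never held in memory. The assumption is that you keep the repo-relative paths under a local root directory; sha256_file and verify are hypothetical helpers, not part of this repository.

```python
import hashlib
from pathlib import Path

# SHA256 values copied from the Files & checksums table.
EXPECTED = {
    "q4_k_m/auden-12b-Q4_K_M.gguf": "8522b4f696abc5721342f9bad6caeab73659aaf822d4a69931affc222c6fca6f",
    "q4_k_m/mmproj-BF16.gguf": "df7d0454dfb7a31d02a3845478f655b991c16f5d9b76cd308a96bc2d07c17f7e",
    "q8_0/auden-12b-Q8_0.gguf": "156295ec4ebe3eee54a757b194a2bf565d88340c5e9f2b5a5f88b5614fac920c",
    "q8_0/mmproj-BF16.gguf": "8e198e532d9c7720232768947373e519d9bd59b7e011364340a90ed3f7049110",
}


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(root, expected=EXPECTED) -> dict:
    """Map each relative path to True (match), False (mismatch) or None (not downloaded)."""
    results = {}
    for rel, digest in expected.items():
        p = Path(root) / rel
        results[rel] = (sha256_file(p) == digest.lower()) if p.is_file() else None
    return results
```

For example, verify("path/to/local/auden-12b") reports which of the four files are present and intact; a False entry means the file should be re-downloaded.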
License & attribution
- License: Apache 2.0. Auden is a derivative of Gemma 4 12B by Google, released under the Apache License 2.0. See NOTICE. "Gemma" identifies the base model; this project is not affiliated with or endorsed by Google.
- CORD evaluation data: naver-clova-ix/cord-v2, CC-BY-4.0 — Park et al., "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" (Clova AI, NAVER). No CORD data is redistributed in this repository.
- Tooling: llama.cpp (MIT), Unsloth (Apache 2.0).
Citation
@misc{auden12b2026,
title = {Auden 12B: Receipt and Invoice Extraction as Structured Tool Calls},
author = {{NuroAI Labs}},
year = {2026},
url = {https://huggingface.co/nuroai/auden-12b},
publisher = {NuroAI Labs, https://nuroailabs.com}
}