Auden 12B

Auden turns receipts and invoices into structured tool calls. It is a LoRA fine-tune of Gemma 4 12B (instruction-tuned) that emits OpenAI-style tool_calls — extract_fields with the requested values, or ask_follow_up when the document genuinely doesn't contain what you asked for — instead of free text you have to parse. It ships as GGUF for llama.cpp and runs fully offline on a single consumer GPU or Apple Silicon Mac. Built by NuroAI Labs.

On real receipts (CORD test split, OCR text as input), Auden v1 reaches 98.95% field-level extraction accuracy and 99% end-to-end success, versus 11.6% and 14% for stock Gemma 4 12B under the identical harness. Hallucinated-field events on the same set fall from 245 (v0) to 17, and abstention accuracy is 100% when a requested field is absent.

Model details
Intended uses & limitations
How to use
Evaluation
Training
Files & checksums
License & attribution

Model details


Property	Value
Base model	`google/gemma-4-12B-it` (12B, instruction-tuned, Apache 2.0)
Fine-tuning	LoRA SFT via Unsloth — r=16, alpha=32, attention + MLP projections (text and vision towers)
Task	Document field extraction as native Gemma 4 function calls
Input	OCR text of a receipt/invoice (see Limitations re: image input)
Output	Structured `tool_calls` (`finish_reason: tool_calls`)
Formats	GGUF Q4_K_M (7.4 GB) and Q8_0 (12.7 GB), each with a BF16 vision projector (`mmproj`)
Runtime	llama.cpp `llama-server` with `--jinja` (chat template + tool grammar embedded in the GGUF)
Languages	Trained/evaluated on English, Indonesian, Malay, Turkish receipt locales
Author	NuroAI Labs

Tool interface

Auden is trained against a 15-tool document-agent schema rendered through Gemma 4's native <|tool> declarations. The evaluated core:

extract_fields(schema, doc_ref?) — return requested fields from the active document
ask_follow_up(question, missing_fields?) — abstain and ask when a requested field is not in the document
no_action_needed(reason, confidence?) — decline when no tool action is warranted

The wider schema also covers query_document, classify_doc, extract_table, summarize, redact, compare_docs, convert, fill_form, search_folder, create_event_from_doc, set_reminder, and draft_email. Published benchmarks exercise extract_fields and ask_follow_up.
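
The signatures above translate directly into OpenAI-style tool declarations. As a minimal sketch, `no_action_needed(reason, confidence?)` might be declared as follows. Only the parameter names and the optional marker come from the signature; the JSON Schema types and the constant name `NO_ACTION_NEEDED` are assumptions made for illustration.

```python
# Hypothetical declaration for no_action_needed(reason, confidence?).
# Names and optionality follow the signature above; the JSON Schema
# types are assumed, since the card does not state them.
NO_ACTION_NEEDED = {
    "type": "function",
    "function": {
        "name": "no_action_needed",
        "description": "Decline when no tool action is warranted.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {"type": "string"},      # assumed type
                "confidence": {"type": "number"},  # assumed type
            },
            "required": ["reason"],  # the "?" marks confidence as optional
        },
    },
}

print(NO_ACTION_NEEDED["function"]["name"])
```

A declaration like this would be appended to the `tools` list of a chat-completions request alongside the other tools.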

Intended uses & limitations

Intended uses

Local/offline receipt and invoice field extraction (totals, dates, merchants, currencies) feeding bookkeeping, expense, or archival pipelines.
Agent stacks that need machine-parseable extraction with an explicit abstention path instead of best-guess free text.

Limitations

Feed OCR text, not images. Image-in extraction through the quantized llama.cpp multimodal path scores poorly for this model family (the unquantized bf16 base reads the same receipts correctly, so this is a quantized-runtime issue, not a model-capability ceiling). Run your own OCR step and pass the text.
Evaluation covers Indonesian-market CORD receipts and synthetic receipts/invoices in four locales. Long multi-page invoices, handwriting, and very different document styles are uncharacterized.
Published numbers assume the Auden system prompt and tool schemas (below). The model remains a general Gemma under other prompts, but the benchmarks don't transfer.
Extraction confidence is not calibrated; the abstention behavior covers missing fields, not uncertain reads. Keep a human in the loop where errors are costly.
Inherits the biases and failure modes of Gemma 4 12B.
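
Because the model expects OCR text and the published numbers assume the Auden system prompt, it can help to build the chat messages in one place. The sketch below is one way to do that. The helper name `build_messages` is hypothetical, the prompt wording is copied from the usage example in "How to use", and the OCR step itself is left to whatever tool you already run.

```python
SYSTEM_PROMPT = (
    "You are Auden, a precise local document agent. "
    "Use tools only when the document evidence supports them."
)


def build_messages(ocr_text: str, request: str) -> list[dict]:
    """Wrap OCR text and an extraction request in chat messages."""
    text = ocr_text.strip()
    if not text:
        # Refuse to call the model on an empty page rather than invite guesses.
        raise ValueError("OCR text is empty; run OCR before calling the model")
    user = f"OCR text from the active document:\n{text}\n\n{request}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


messages = build_messages("RECEIPT\nCorner Cafe\nTotal: 7.00", "Extract total.")
print(messages[1]["content"])
```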

How to use

Serve (llama.cpp)

llama-server -m auden-12b-Q4_K_M.gguf --mmproj mmproj-BF16.gguf \
  --jinja -ngl 999 --host 127.0.0.1 --port 8089

--jinja is required — it activates the embedded chat template, which renders tool declarations into Gemma 4's native function-calling format.

Call (OpenAI-compatible)

import requests

TOOLS = [{
    "type": "function",
    "function": {
        "name": "extract_fields",
        "description": "Extract the requested fields from the active document.",
        "parameters": {
            "type": "object",
            "properties": {
                "fields": {"type": "object", "description": "Field name to extracted value."},
                "schema": {"type": "object", "description": "Requested field schema."},
            },
        },
    },
}, {
    "type": "function",
    "function": {
        "name": "ask_follow_up",
        "description": "Ask the user a question when required information is missing.",
        "parameters": {
            "type": "object",
            "properties": {
                "question": {"type": "string"},
                "missing_fields": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["question"],
        },
    },
}]

payload = {
    "model": "auden",
    "messages": [
        {"role": "system", "content": "You are Auden, a precise local document agent. "
                                       "Use tools only when the document evidence supports them."},
        {"role": "user", "content": "OCR text from the active document:\n"
                                    "RECEIPT\nCorner Cafe\nDate: 2025-03-02\n"
                                    "Coffee 2 x 3.50 7.00\nTotal: 7.00\n\nExtract total."},
    ],
    "tools": TOOLS,
    "temperature": 0,
}
resp = requests.post("http://127.0.0.1:8089/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()  # surface HTTP errors here instead of as a KeyError below
r = resp.json()
print(r["choices"][0]["message"]["tool_calls"])
# -> [{"function": {"name": "extract_fields", "arguments": "{\"fields\":{\"total\":\"7.00\"}, ...}"}}]

If the document lacks the requested field, the model calls ask_follow_up with missing_fields instead of inventing a value.
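
Downstream code therefore has two outcomes to handle: extracted fields or a follow-up question. The sketch below shows one possible dispatcher. The function name `handle_tool_calls` and the returned `status` values are hypothetical, and it only inspects the first tool call. The sample messages are hand-written in the response shape shown above, not real model output.

```python
import json


def handle_tool_calls(message: dict) -> dict:
    """Turn an assistant message into extracted fields or a follow-up request."""
    calls = message.get("tool_calls") or []
    if not calls:
        return {"status": "no_tool_call", "text": message.get("content")}
    fn = calls[0]["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    if fn["name"] == "extract_fields":
        return {"status": "extracted", "fields": args.get("fields", {})}
    if fn["name"] == "ask_follow_up":
        return {
            "status": "needs_input",
            "question": args.get("question"),
            "missing_fields": args.get("missing_fields", []),
        }
    return {"status": "unhandled", "tool": fn["name"]}


# Hand-written sample messages for illustration.
extracted = handle_tool_calls({"tool_calls": [{"function": {
    "name": "extract_fields",
    "arguments": json.dumps({"fields": {"total": "7.00"}}),
}}]})
follow_up = handle_tool_calls({"tool_calls": [{"function": {
    "name": "ask_follow_up",
    "arguments": json.dumps({"question": "Which tax field?", "missing_fields": ["tax"]}),
}}]})
print(extracted)
print(follow_up)
```

Routing `needs_input` results to a person or a retry queue keeps the abstention path explicit instead of silently dropping the document.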

Evaluation

All three models below were run through the identical harness: same prompts, same tool schemas, same normalized scorer (auden-docbench-v1-normalized), same llama.cpp build (commit 76da2450), all at Q4_K_M, OCR-text input, greedy decoding. Full commands in repro_commands.md.

CORD 100 — 100 real receipts from the CORD-v2 test split (held out from training; overlap-checked).
Synthetic 500 — 500 held-out synthetic receipts/invoices across en/id/ms/tr locales.
Abstention 120 — 120 documents where the requested field is absent; correct behavior is ask_follow_up.

Eval	Metric	Stock Gemma 4 12B	Auden v0	Auden v1 (this release)
CORD 100 (real receipts)	Tool-call accuracy	0.5900	1.0000	1.0000
CORD 100 (real receipts)	Field/argument match	0.1158	0.8526	0.9895
CORD 100 (real receipts)	End-to-end success	0.1400	0.8600	0.9900
Synthetic 500	Tool-call accuracy	0.9140	1.0000	1.0000
Synthetic 500	Field/argument match	0.0900	0.9588	0.9638
Synthetic 500	End-to-end success	0.0080	0.8700	0.8940
Abstention 120	Abstention accuracy	0.9833	0.2833	1.0000
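
Since the eval-set sizes are stated, the per-document rates can be read as counts. The sketch below does that conversion, assuming end-to-end success and abstention accuracy are scored once per document. Field/argument match is left out because its denominator (the number of fields) is not given.

```python
# Eval-set sizes and rates copied from the table above.
SET_SIZES = {"CORD 100": 100, "Synthetic 500": 500, "Abstention 120": 120}
RATES = {
    ("CORD 100", "End-to-end success"): (0.1400, 0.8600, 0.9900),
    ("Synthetic 500", "End-to-end success"): (0.0080, 0.8700, 0.8940),
    ("Abstention 120", "Abstention accuracy"): (0.9833, 0.2833, 1.0000),
}


def to_counts(eval_name, rates):
    """Convert per-document rates into whole-document counts."""
    n = SET_SIZES[eval_name]
    return tuple(round(rate * n) for rate in rates)


for (eval_name, metric), rates in RATES.items():
    stock, v0, v1 = to_counts(eval_name, rates)
    print(f"{eval_name} | {metric}: stock={stock}, v0={v0}, v1={v1} of {SET_SIZES[eval_name]}")
```

Read this way, Abstention 120 corresponds to 118, 34, and 120 correct abstentions for stock Gemma, v0, and v1, and v1's end-to-end success on Synthetic 500 corresponds to 447 of 500 documents.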

Field-level failure events on CORD 100 (same analyzer):

Category	Stock Gemma	Auden v0	Auden v1
Hallucinated field	85	245	17
Missing field	84	0	1
Wrong value	0	14	0
Formatting mismatch	1	8	2
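
To make the four categories concrete, here is an illustrative classifier. It is not the analyzer used for the table: the category definitions, the `normalize` rule, and the sample documents are all assumptions chosen to show how such events could be told apart.

```python
def normalize(value: str) -> str:
    """Assumed normalization: lowercase, drop whitespace and thousands commas."""
    return "".join(ch for ch in value.lower() if not ch.isspace() and ch != ",")


def classify(gold: dict, pred: dict) -> dict:
    """Bucket each field-level disagreement between gold and predicted values."""
    events = {"hallucinated": [], "missing": [], "wrong_value": [], "formatting": []}
    for name in sorted(set(gold) | set(pred)):
        g, p = gold.get(name), pred.get(name)
        if g is None and p is not None:
            events["hallucinated"].append(name)  # predicted, but absent from gold
        elif g is not None and p is None:
            events["missing"].append(name)       # in gold, but not predicted
        elif g != p:
            # Same content after normalization counts as a formatting-only miss.
            kind = "formatting" if normalize(g) == normalize(p) else "wrong_value"
            events[kind].append(name)
    return events


# Made-up example, not taken from the eval sets.
gold = {"total": "7.00", "date": "2025-03-02", "merchant": "Corner Cafe"}
pred = {"total": "7.00", "merchant": "corner cafe", "tax": "0.70"}
print(classify(gold, pred))
```

In this made-up case `tax` is hallucinated, `date` is missing, and `merchant` differs only in formatting.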

Reading the baseline honestly. Stock Gemma fails to emit any valid extract_fields call on 41/100 real receipts (tool-call accuracy 0.5900); when it does call, the arguments are usually schema echoes or partial. Its hallucination count is lower than v0's (85 versus 245) because it often fails to extract at all (84 missing-field events), not because it is precise. Its abstention score is genuinely high (0.9833) because the eval prompt explicitly instructs ask-if-missing behavior and the instruction-tuned base follows that instruction. The fine-tune's value is therefore concentrated in extraction accuracy and hallucination control. v0's abstention regression (0.2833) was fixed in v1 (1.0000).

Training

Data: 13,558 tool-call traces — synthetic receipts/invoices (procedural generator with missing-total, occlusion, scan-noise, multi-currency, and wrong-document negatives; en/id/ms/tr locales), ~8% abstention traces, ~11% image-grounded traces, ~11% real-receipt traces. Zero overlap with all three eval sets (checked by document ID).
Procedure: LoRA SFT with Unsloth on Gemma 4 12B-it — r=16, alpha=32, dropout 0, targets q/k/v/o + gate/up/down projections in both text and vision towers. 2,500 steps, lr 2e-4, effective batch 4 (1 × 4 grad-accum), seed 3407, ~0.74 epochs, 91 minutes on 1× H100. Final train loss 0.393.
Validation gates: every trace passed an argument-match validator (pass rate 1.0); training format locked to the official processor's apply_chat_template/parse_response round-trip.
Export: LoRA merged into the bf16 base, converted to GGUF via llama.cpp (convert_hf_to_gguf), quantized to Q4_K_M and Q8_0.
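
The epoch figure follows from the other numbers in the procedure. The short check below reproduces it, assuming a single device (so no data-parallel multiplier) and that all 13,558 traces are in the training split.

```python
# Figures copied from the training description above.
num_traces = 13_558
steps = 2_500
per_device_batch = 1
grad_accum = 4

effective_batch = per_device_batch * grad_accum  # 1 x 4 grad-accum
examples_seen = steps * effective_batch
epochs = examples_seen / num_traces

print(f"effective batch = {effective_batch}, "
      f"examples seen = {examples_seen}, epochs = {epochs:.2f}")
# -> effective batch = 4, examples seen = 10000, epochs = 0.74
```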

Files & checksums

File	Quant	Size	SHA256
`q4_k_m/auden-12b-Q4_K_M.gguf`	Q4_K_M	7.4 GB	`8522b4f696abc5721342f9bad6caeab73659aaf822d4a69931affc222c6fca6f`
`q4_k_m/mmproj-BF16.gguf`	BF16 mmproj	175 MB	`df7d0454dfb7a31d02a3845478f655b991c16f5d9b76cd308a96bc2d07c17f7e`
`q8_0/auden-12b-Q8_0.gguf`	Q8_0	12.7 GB	`156295ec4ebe3eee54a757b194a2bf565d88340c5e9f2b5a5f88b5614fac920c`
`q8_0/mmproj-BF16.gguf`	BF16 mmproj	175 MB	`8e198e532d9c7720232768947373e519d9bd59b7e011364340a90ed3f7049110`

Q4_K_M is the evaluated release artifact (all published numbers). Q8_0 is provided for users who want headroom; it was not separately benchmarked. Machine-readable checksums: checksums.txt.
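
A download can be checked against the table with a few lines of standard-library Python. This is a minimal sketch: the helper names are hypothetical and the local path assumes the file sits in the same layout as the repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte GGUFs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Expected digest copied from the table above; the local path is an assumption.
EXPECTED_Q4 = "8522b4f696abc5721342f9bad6caeab73659aaf822d4a69931affc222c6fca6f"
model_path = Path("q4_k_m/auden-12b-Q4_K_M.gguf")
if model_path.exists():
    print("checksum ok" if verify(model_path, EXPECTED_Q4) else "checksum MISMATCH")
```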

License & attribution

License: Apache 2.0. Auden is a derivative of Gemma 4 12B by Google, released under the Apache License 2.0. See NOTICE. "Gemma" identifies the base model; this project is not affiliated with or endorsed by Google.
CORD evaluation data: naver-clova-ix/cord-v2, CC-BY-4.0 — Park et al., "CORD: A Consolidated Receipt Dataset for Post-OCR Parsing" (Clova AI, NAVER). No CORD data is redistributed in this repository.
Tooling: llama.cpp (MIT), Unsloth (Apache 2.0).

Citation

@misc{auden12b2026,
  title     = {Auden 12B: Receipt and Invoice Extraction as Structured Tool Calls},
  author    = {{NuroAI Labs}},
  year      = {2026},
  url       = {https://huggingface.co/nuroai/auden-12b},
  publisher = {NuroAI Labs, https://nuroailabs.com}
}
