Nesso-0.4B-Agentic-MLX

Nesso-0.4B-Agentic-MLX is the Apple Silicon-optimized version of Nesso-0.4B-Agentic, converted to the MLX format for high-performance inference on Apple M-series chips.

It is a bilingual English/Italian Small Language Model (SLM) optimized for function calling, structured output generation, and agentic execution patterns. It is post-trained on top of Zagreus-0.4B-ita, a foundation model trained from scratch by the mii-llm (Made in Italy – Large Language Model) community on the Seeweb HPC infrastructure.

Designed for sovereign edge inference, Nesso-0.4B-Agentic targets deployment scenarios that require reliable tool use, structured JSON output, and multi-step agentic reasoning — all within a compact ~400M parameter footprint.

⚠️ This model is currently at the SFT (Supervised Fine-Tuning) stage. DPO (Direct Preference Optimization) training is planned and updated results will be published upon completion.


Model Details

| Property | Value |
|---|---|
| Architecture | Modified Llama-3.2 (fully dense) |
| Parameters | ~400M |
| Hidden size | 960 |
| Layers | 32 |
| Attention heads | 15 (KV heads: 5) |
| Context length | 4096 tokens |
| Tokenizer | Llama-3.2 (vocab_size: 128,256) |
| Format | MLX |
| Languages | English, Italian |
| Base model | mii-llm/nesso-0.4B-agentic |
| Post-training framework | Axolotl + FSDP |
| Chat template | ChatML |
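
The architecture values above can be double-checked directly against the repository's config.json. The sketch below is illustrative only and assumes the MLX conversion keeps the standard Llama-style field names:

import json

from huggingface_hub import hf_hub_download

# Download config.json from the Hub and print the architecture fields
# referenced in the table above (field names are an assumption based on
# the standard Llama-style config layout).
config_path = hf_hub_download(
    repo_id="mlx-community/nesso-0.4B-agentic-mlx",
    filename="config.json",
)

with open(config_path) as f:
    config = json.load(f)

for key in (
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "num_key_value_heads",
    "max_position_embeddings",
    "vocab_size",
):
    print(f"{key}: {config.get(key)}")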

Chat Template

This model uses the ChatML format:


<|im_start|>system
You are a helpful assistant with access to tools.<|im_end|>
<|im_start|>user
What is the weather in Rome today?<|im_end|>
<|im_start|>assistant

Special tokens:

  • pad_token: <|im_end|>
  • eos_token: <|im_end|>
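
A prompt in this format can also be assembled by hand, which is useful for runtimes that do not expose the tokenizer's chat template. A minimal sketch (verify the result against tokenizer.apply_chat_template, shown in the Usage section below):

def build_chatml_prompt(messages):
    # Assemble a ChatML prompt string from a list of
    # {"role": ..., "content": ...} messages, ending with an open
    # assistant turn so the model continues from there.
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What is the weather in Rome today?"},
]
print(build_chatml_prompt(messages))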

Usage

Installation

pip install mlx-lm

Inference via Python

from mlx_lm import load, generate

model_id = "mlx-community/nesso-0.4B-agentic-mlx" 

model, tokenizer = load(model_id)

# The system prompt below is in Italian; in English it reads:
# "You are an assistant that can use tools. When external information
# is needed, call a function. Use EXACTLY the expected <tool_call> format."
system_prompt = (
    "Sei un assistente che può usare strumenti.\n"
    "Quando servono informazioni esterne, chiama una funzione.\n"
    "Usa ESATTAMENTE il formato <tool_call> previsto."
)

messages = [
    {"role": "system", "content": system_prompt},
    # "Che tempo fa a Milano?" = "What's the weather like in Milan?"
    {"role": "user", "content": "Che tempo fa a Milano?"}
]

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=prompt, verbose=True, temp=0.3, max_tokens=256)
print(response)
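
The system prompt above asks the model to reply with a <tool_call> block when it needs external information. The exact payload layout is not documented here, so the parser below is only a sketch built on the response from the example above: it assumes a JSON object with name and arguments fields inside <tool_call>...</tool_call> tags, and should be adapted to whatever the model actually emits.

import json
import re

def extract_tool_call(text):
    # Look for a <tool_call>...</tool_call> block and parse its body as JSON.
    # The {"name": ..., "arguments": {...}} payload shape is an assumption,
    # not a documented contract of this model.
    match = re.search(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None

tool_call = extract_tool_call(response)
if tool_call is not None:
    print("Function:", tool_call.get("name"))
    print("Arguments:", tool_call.get("arguments"))
else:
    print("No tool call found; raw response:", response)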

Inference via Terminal

# $'...' quoting lets the shell expand \n into real newlines, matching the ChatML template
python -m mlx_lm.generate --model mlx-community/nesso-0.4B-agentic-mlx \
  --prompt $'<|im_start|>system\nSei un assistente che può usare strumenti.<|im_end|>\n<|im_start|>user\nChe tempo fa a Milano?<|im_end|>\n<|im_start|>assistant\n' \
  --temp 0.3 --max-tokens 256

💡 Tip: For function calling and structured output tasks, we recommend using a lower temperature (0.1–0.3) to improve JSON validity and output consistency.


Training Details

Base Model Pre-training

Nesso-0.4B-Agentic is built on Zagreus-0.4B-ita, which was pre-trained on approximately 1 trillion tokens using the following data mix:

| Dataset | Description |
|---|---|
| FineWeb (350BT sample) | ~350B tokens of English web text |
| FineWeb-2 (ita_Latn) | Italian web text |
| FinePDFs (ita_Latn) | Italian PDF documents |
| StarCoder Data | ~250B tokens of code |

Token distribution: ~400B English + ~400B Italian + ~200B Code

Infrastructure: 64× NVIDIA A100 GPUs (8 nodes × 8 GPUs) on Seeweb HPC

Framework: Nanotron (mii-llm fork)

Post-training (SFT)

Post-training was performed using Axolotl with FSDP across 4 nodes (32× A100 GPUs).

The instruction dataset is a proprietary bilingual (English/Italian) corpus curated by the mii-llm team, with a dedicated focus on function calling, structured JSON output, tool orchestration, and agentic execution patterns. It was built through years of iteration across domains including finance, cybersecurity, and multi-step agentic workflows, and is considered a strategic research asset; it is not released as open source.

Key hyperparameters:

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW (fused) |
| Learning rate | 1e-3 |
| LR scheduler | Cosine (constant ratio: 0.8, min ratio: 0.3) |
| Epochs | 3 |
| Micro batch size | 1 |
| Gradient accumulation steps | 8 |
| Sequence length | 4096 |
| Max grad norm | 1.0 |
| Precision | BF16 + Flash Attention |
| FSDP strategy | FULL_SHARD |
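
For reference, a quick back-of-the-envelope calculation of the effective batch size implied by these settings, assuming one data-parallel rank per GPU across the 32 A100s mentioned above (FULL_SHARD shards parameters, not the batch dimension):

# Effective batch size implied by the SFT hyperparameters above.
# The one-rank-per-GPU layout is an assumption; the exact parallelism
# configuration is not documented here.
micro_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 32            # 4 nodes x 8 A100s
sequence_length = 4096

sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus
tokens_per_step = sequences_per_step * sequence_length  # upper bound, if every sequence is packed to 4096 tokens

print(f"Sequences per optimizer step: {sequences_per_step}")   # 256
print(f"Tokens per optimizer step: ~{tokens_per_step:,}")      # ~1,048,576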

Evaluation

We used our fork of lm-evaluation-harness for multilingual evaluation.

Evaluation Commands

# Italian benchmarks
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks m_mmlu_it --num_fewshot 5 --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks hellaswag_it,arc_it --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks ifeval-ita --device cuda:0 --batch_size 1

# English benchmarks
lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks mmlu --num_fewshot 5 --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks hellaswag,arc --device cuda:0 --batch_size 1

lm-eval --model hf --model_args pretrained=mii-llm/nesso-0.4B-agentic \
  --tasks ifeval --device cuda:0 --batch_size 1
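
The same evaluations can also be driven from Python. A minimal sketch using the harness's simple_evaluate entry point, mirroring the first Italian command above and assuming the mii-llm fork keeps the upstream API and the Italian task names:

import lm_eval

# Run the 5-shot Italian MMLU benchmark programmatically; this assumes the
# fork exposes the same simple_evaluate() interface as upstream
# lm-evaluation-harness and registers the m_mmlu_it task.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mii-llm/nesso-0.4B-agentic",
    tasks=["m_mmlu_it"],
    num_fewshot=5,
    batch_size=1,
    device="cuda:0",
)

for task, metrics in results["results"].items():
    print(task, metrics)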

Results

English Benchmarks

| Model | IFEval EN ↑ | ARC EN ↑ | HellaSwag EN ↑ | MMLU EN ↑ | Avg EN |
|---|---|---|---|---|---|
| Qwen/Qwen3-0.6B | 0.2758 | 0.3430 | 0.4742 | 0.4013 | 0.3736 |
| Nesso-0.4B-instruct | 0.3465 | 0.3003 | 0.4629 | 0.2871 | 0.3492 |
| Nesso-0.4B-agentic | 0.2962 | 0.2534 | 0.4062 | 0.2889 | 0.3112 |
| LiquidAI/LFM2-350M | 0.1595 | 0.2457 | 0.3092 | 0.3445 | 0.2647 |

Italian Benchmarks

| Model | IFEval IT ↑ | ARC IT ↑ | HellaSwag IT ↑ | MMLU IT ↑ | Avg IT |
|---|---|---|---|---|---|
| Qwen/Qwen3-0.6B | 0.3058 | 0.2729 | 0.3598 | 0.4025 | 0.3353 |
| Nesso-0.4B-instruct | 0.2962 | 0.2874 | 0.4076 | 0.2875 | 0.3197 |
| Nesso-0.4B-agentic | 0.2914 | 0.2541 | 0.3673 | 0.2730 | 0.2965 |
| LiquidAI/LFM2-350M | 0.1427 | 0.2464 | 0.2994 | 0.3132 | 0.2504 |

Overall

| Model | Avg EN | Avg IT | Overall |
|---|---|---|---|
| Qwen/Qwen3-0.6B | 0.3736 | 0.3353 | 0.3545 |
| Nesso-0.4B-instruct | 0.3492 | 0.3197 | 0.3345 |
| Nesso-0.4B-agentic | 0.3112 | 0.2965 | 0.3039 |
| LiquidAI/LFM2-350M | 0.2647 | 0.2504 | 0.2576 |

Discussion

Nesso-0.4B-Agentic is trained with a specialization trade-off: its post-training data prioritizes structured output fidelity, tool calling accuracy, and agentic planning over general benchmark performance. As a result, scores on standard academic benchmarks (IFEval, MMLU, ARC) are lower than the instruct variant, which is expected behavior for a task-specialized model.

Nesso-0.4B-Agentic still outperforms LiquidAI/LFM2-350M across all benchmarks in both languages, confirming its quality as a competitive small model. Its real-world advantage over general-purpose models of similar size is best assessed on agentic and function-calling tasks rather than academic benchmarks.


Related Models

| Model | Description |
|---|---|
| Zagreus-0.4B-ita | Base pre-trained model (this model's foundation) |
| Nesso-0.4B-instruct | Optimized for conversational and instruction-following tasks |
| Open-Zagreus-0.4B | Fully open-source SFT variant |

Citation

If you use this model in your research, please cite:

@misc{nesso2025,
  title        = {The Joy and Pain of Training an LLM from Scratch:
                  A Technical Report on the Zagreus and Nesso Model Families},
  author       = {mii-llm community},
  year         = {2025},
  howpublished = {\url{https://github.com/mii-llm/zagreus-nesso-slm}},
}

Acknowledgements

  • Antonio Baldassarra (CEO, Seeweb) and Marco Cristofanilli (Head of AI, Seeweb) for infrastructure sponsorship
  • The Hugging Face team for Nanotron, datatrove, FineWeb, and FineWeb-2
  • The mii-llm open-source community

License

Released under the Apache 2.0 license.

Made with ❤️ in Italy by mii-llm

