# MagTina350m – instruct (SFT + DPO + tool-use)
MagTina350m-instruct is the instruction-tuned, tool-using, DPO-aligned version of
`dataseek/magtina350m-base`, a 354.6 M-parameter Brazilian-Portuguese assistant trained
from scratch by Dataseek under the Magestic.ai brand.
## What's in this checkpoint
Training pipeline applied to the base model, in order:
- SFT v4 – ~13 K synthetic Q&A pairs (Qwen3-distilled) + 505 hand-authored identity pairs
- DPO – 1 312 preference pairs (Qwen3-judged) for politeness / refusal calibration
- Tool-use SFT – 3 010 `calc` + `now` tool-call examples + 200 hardened NO_TOOL refusals
Final validation loss 0.79; identity probe 17/17 verbatim correct; tool-use probe: calc 7/10, now 5/5; NO_TOOL refusal 5/5.
| Spec | Value |
|---|---|
| Parameters | 354,591,744 |
| Architecture | Llama2-mini, re-exported as `LlamaForCausalLM` |
| Context | 2 048 tokens |
| Tools | `calc` (safe-eval arithmetic), `now` (date/time with PT-BR fields) |
| License | CC-BY-NC 4.0 |
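
To verify the exported architecture, the standard `transformers` config fields can be read straight from the checkpoint; a minimal sketch (the values in the comment are the ones quoted on this card, see "Identity facts baked in" below):

```python
# Inspect the exported LlamaConfig; field names are the stock transformers attributes.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("dataseek/magtina350m-instruct")
print(cfg.num_hidden_layers, cfg.num_attention_heads, cfg.hidden_size,
      cfg.vocab_size, cfg.max_position_embeddings)
# Expected per this card: 20 layers, 16 heads, 1024 hidden, ~40 K vocab, 2 048 context
```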
## Chat template
```jinja
{% for message in messages -%}
{% if loop.first %}<|chat_bos|>{% endif -%}
{% if message['role'] == 'system' %}<|system|>{{ message['content'] | trim }}{% endif -%}
{% if message['role'] == 'user' %}<|user|>{{ message['content'] | trim }}{% endif -%}
{% if message['role'] == 'assistant' %}<|assistant|>{{ message['content'] | trim }}<|chat_eos|>{% endif -%}
{% endfor -%}
{% if add_generation_prompt %}<|assistant|>{% endif %}
```
The template is shipped in `tokenizer_config.json` and used automatically by
`tokenizer.apply_chat_template(...)`. The model was trained single-turn; multi-turn works
in practice but was not in the SFT distribution.
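
To see exactly what the template produces, it can be rendered as a string without tokenizing; a quick sketch (the messages are illustrative, and the expected output follows directly from the template above):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("dataseek/magtina350m-instruct")
text = tok.apply_chat_template(
    [{"role": "system", "content": "Você é a MagTina."},   # "You are MagTina."
     {"role": "user", "content": "Olá!"}],                  # "Hello!"
    tokenize=False, add_generation_prompt=True)
print(text)
# Expected shape per the template:
# <|chat_bos|><|system|>Você é a MagTina.<|user|>Olá!<|assistant|>
```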
## Use – pure chat (no tools)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("dataseek/magtina350m-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "dataseek/magtina350m-instruct", torch_dtype=torch.float16).to("cuda")

prompt = tok.apply_chat_template(
    [{"role": "user", "content": "Quem é você?"}],  # "Who are you?"
    add_generation_prompt=True, return_tensors="pt").to("cuda")

out = model.generate(prompt, max_new_tokens=128, do_sample=True,
                     temperature=0.3, top_p=0.9, repetition_penalty=1.05,
                     eos_token_id=5)  # <|chat_eos|>
print(tok.decode(out[0, prompt.shape[1]:], skip_special_tokens=True))
```
## Use – with tools
`generate()` won't intercept tool calls on its own; use the runtime shipped in
`inference_with_tools.py`:
```python
from inference_with_tools import generate_with_tools
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("dataseek/magtina350m-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "dataseek/magtina350m-instruct", torch_dtype=torch.float16).to("cuda")

print(generate_with_tools(model, tok, "Quanto é 25 + 37?",  # "How much is 25 + 37?"
                          temperature=0.3, repetition_penalty=1.05))
# → "25 + 37 = 62."
```
## Tool protocol
The model emits reserved tokens; the runtime intercepts them and injects the result back:
| Token | ID | Role |
|---|---|---|
| `<\|tool_call\|>` | 9 | Model emits this, then a JSON object `{"tool": "...", ...}` |
| `<\|tool_result\|>` | 10 | Runtime emits this, then a JSON object with the result |
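
Putting the two tokens together, a runtime loop along these lines does the interception. This is a hedged sketch only: the shipped logic lives in `inference_with_tools.py` and may differ in detail, it assumes the model stops generating after the tool-call JSON, and `run_tool()` is a hypothetical dispatcher, not part of the published API:

```python
import json
import torch

TOOL_CALL_ID, TOOL_RESULT_ID, EOS_ID = 9, 10, 5  # <|tool_call|>, <|tool_result|>, <|chat_eos|>

def run_tool(payload: dict) -> dict:
    """Hypothetical dispatcher: route the parsed call to calc / now, return its JSON result."""
    raise NotImplementedError

@torch.no_grad()
def chat_with_tools(model, tok, prompt_ids, max_rounds=3):
    ids = prompt_ids
    out = ids
    for _ in range(max_rounds):
        out = model.generate(ids, max_new_tokens=128, eos_token_id=EOS_ID)
        new_tokens = out[0, ids.shape[1]:].tolist()
        if TOOL_CALL_ID not in new_tokens:
            break  # plain answer, no tool requested
        # Everything after <|tool_call|> is taken as the JSON payload of the call.
        payload = json.loads(tok.decode(new_tokens[new_tokens.index(TOOL_CALL_ID) + 1:],
                                        skip_special_tokens=True))
        result = run_tool(payload)
        # Inject <|tool_result|> + JSON and let the model keep generating.
        feedback = tok.encode("<|tool_result|>" + json.dumps(result, ensure_ascii=False),
                              add_special_tokens=False, return_tensors="pt").to(out.device)
        ids = torch.cat([out, feedback], dim=1)
    return tok.decode(out[0, prompt_ids.shape[1]:], skip_special_tokens=True)
```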
Tool schemas:

```
{"tool": "calc", "expr": "25 + 37"}               → {"result": 62}
{"tool": "now", "timezone": "America/Sao_Paulo"}  → {"datetime": "...", "data": "11/05/2026",
                                                     "hora": "14:32", "weekday": "domingo", ...}
```
`calc` runs under an AST-whitelist sandbox: arithmetic operators, the math functions sqrt, abs, round, min, max, sum, log, log2, log10, exp, sin, cos, tan, ceil, floor, and the constants pi and e. Attribute access, other names, comprehensions, function definitions, keyword arguments, and any unlisted callable all raise.
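
A minimal sketch of an evaluator in that spirit, assuming nothing about the actual code in `inference_with_tools.py` beyond the whitelist described above:

```python
import ast
import math
import operator as op

_FUNCS = {"sqrt": math.sqrt, "abs": abs, "round": round, "min": min, "max": max,
          "sum": sum, "log": math.log, "log2": math.log2, "log10": math.log10,
          "exp": math.exp, "sin": math.sin, "cos": math.cos, "tan": math.tan,
          "ceil": math.ceil, "floor": math.floor}
_CONSTS = {"pi": math.pi, "e": math.e}
_BINOPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv,
           ast.FloorDiv: op.floordiv, ast.Mod: op.mod, ast.Pow: op.pow}
_UNARY = {ast.UAdd: op.pos, ast.USub: op.neg}

def safe_calc(expr: str):
    """Evaluate an arithmetic expression; anything outside the whitelist raises."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[_eval(a) for a in node.args])
        raise ValueError(f"disallowed expression element: {ast.dump(node)}")
    return _eval(ast.parse(expr, mode="eval"))

print(safe_calc("25 + 37"))  # 62
```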
## Recommended decoding
| Goal | temperature | top_p | repetition_penalty |
|---|---|---|---|
| Factual / identity / tool-use | 0.3 | 0.9 | 1.05 |
| Creative writing | 0.7–0.9 | 0.9 | 1.10 |
EOS is `<|chat_eos|>` (id 5). PAD is `<pad>` (id 3).
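
For creative writing, the defaults shipped in `generation_config.json` can be overridden at call time; a minimal sketch reusing `model`, `tok`, and `prompt` from the pure-chat example above:

```python
# Override the shipped defaults (temp 0.3, top_p 0.9, rep_pen 1.05)
# with the creative-writing settings from the table above.
out = model.generate(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.8,          # creative range 0.7-0.9
    top_p=0.9,
    repetition_penalty=1.10,
    eos_token_id=5,           # <|chat_eos|>
    pad_token_id=3,           # <pad>
)
print(tok.decode(out[0, prompt.shape[1]:], skip_special_tokens=True))
```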
## Evaluation
| Benchmark | Score | Notes |
|---|---|---|
| Identity probe (17 hand-authored Qs) | 17 / 17 | verbatim recall of seeded facts |
| Tool-use – calc probe | 7 / 10 | failures on multi-step word problems |
| Tool-use – now probe | 5 / 5 | weekday + Portuguese date |
| NO_TOOL refusal probe | 5 / 5 | refuses tool-call when no tool needed |
| ENEM-PT (5-option MCQ, 200 q) | 25 % | +5 pp over chance; +5 pp vs base |
| BPB-news vs base | +0.022 | well under +0.05 ship gate (no big regression) |
| Final SFT-stage val loss | 0.79 | down from 1.59 (SFT v2) → 0.97 (v4) → 0.79 (v5) |
## Identity facts baked in
The model knows the following about itself and will recall them verbatim:
- Created by Dataseek (dataseek.com.br) under the Magestic.ai brand
- Lead: Ricardo Frasson
- 354.6 M parameters, Llama2-mini, 20 L × 16 H × 1024 d, vocab 40 K, context 2 048
- Pretrained on 17.39 B PT-BR tokens over 15.77 h on 2 × H200 SXM
- Cost US$ 126.47 / R$ 632.35; energy ~23 kWh, ~5.7 kg CO₂eq
## Intended use & limitations
**Intended use.** Brazilian-Portuguese chat assistant for research, demos, and derivative work. Useful as a small-footprint reasoner with tool-augmented arithmetic and date queries.

**Limitations**
- General world knowledge is fragile. The model hallucinates on facts not in its identity corpus.
- Math without a tool is unreliable; math with the `calc` tool is correct.
- Brazilian geography knowledge is ~50 % reliable; specifics are often invented.
- Monolingual: PT-BR only. English/Spanish responses degrade fast.
- Trained single-turn: long multi-turn conversations may lose persona coherence.
- No safety RLHF beyond DPO: adversarial prompts may elicit unhelpful or biased output. Don't deploy to end-users without an additional safety filter.
## Files in this repo
| File | Purpose |
|---|---|
| `config.json` | `LlamaConfig` |
| `model.safetensors` | FP16 weights (~709 MB) |
| `generation_config.json` | Default decoding (temp 0.3, top_p 0.9, rep_pen 1.05) |
| `tokenizer.json` | v3 BPE, 40 K vocab, 28 special tokens |
| `tokenizer_config.json` | Chat template, special-token map |
| `special_tokens_map.json` | BOS/EOS/PAD/UNK + all 28 added tokens |
| `inference_with_tools.py` | Tool-calling generation loop (`calc` + `now`) |
| `_conversion_report.json` | Top-1 agreement diff vs original Mag350m |
## Citation
```bibtex
@misc{magtina350m2026,
  author    = {Frasson, Ricardo and {Dataseek Team}},
  title     = {MagTina350m: A 354 M-parameter Brazilian Portuguese instruction-tuned language model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/dataseek/magtina350m-instruct}
}
```
## License
CC-BY-NC 4.0: free for research and non-commercial derivative work; commercial use requires written permission from Dataseek (contact via dataseek.com.br).