MagTina350m — base
MagTina350m-base is a 354.6 M-parameter Brazilian-Portuguese foundation model trained
from scratch by Dataseek under the **Magestic.ai** brand.
This is the pretraining checkpoint — see
dataseek/magtina350m-instruct
for the instruction-tuned version.
Model summary
| | |
|---|---|
| Parameters | 354,591,744 (~354.6 M) |
| Architecture | Llama2-mini (pre-norm RMSNorm + RoPE + SwiGLU + untied embeddings) |
| Hidden / intermediate / layers / heads | 1024 / 3072 / 20 / 16 |
| KV heads | 16 (no GQA) |
| Vocab | 40 000 (custom v3 BPE, 0 % UNK on out-of-domain text) |
| Context | 2 048 tokens |
| Pretrain tokens | 17.39 B (PT-BR only) |
| License | CC-BY-NC 4.0 |
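For reference, the dimensions above map onto a stock `LlamaConfig` roughly as in the sketch below. The exact config shipped in the repo may differ in unstated details (e.g. `rope_theta`, `rms_norm_eps`), so treat this as an illustration rather than the released file.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of the table above as a stock Llama config; rope_theta / rms_norm_eps
# are left at library defaults because they are not stated in this card.
config = LlamaConfig(
    vocab_size=40_000,             # custom v3 BPE
    hidden_size=1024,
    intermediate_size=3072,        # SwiGLU MLP width
    num_hidden_layers=20,
    num_attention_heads=16,
    num_key_value_heads=16,        # no GQA
    max_position_embeddings=2048,
    tie_word_embeddings=False,     # untied embeddings
)

model = LlamaForCausalLM(config)
# Should print 354,591,744, matching the parameter count in the table.
print(sum(p.numel() for p in model.parameters()))
```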
Note on `logit_softcap`. The original Mag350m model applied `tanh(x / 15) * 15` to the output logits during training. To stay compatible with stock `LlamaForCausalLM` (and thus vLLM / TGI / transformers without `trust_remote_code`), this release drops the softcap. On 629 random positions the conversion produced 100 % top-1 token agreement with the original model in FP32. Effects on sampling-temperature behavior are negligible.
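For concreteness, the dropped softcap is just a scaled tanh applied to the output logits; a minimal sketch (not shipped with this checkpoint) is shown below.

```python
import torch

# Sketch of the softcap described above; this release returns raw logits instead.
def softcap(logits: torch.Tensor, cap: float = 15.0) -> torch.Tensor:
    return torch.tanh(logits / cap) * cap

# Because tanh is strictly monotonic, softcap(x) and x share the same argmax,
# which is consistent with the reported 100 % top-1 agreement after removal.
```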
Training
| | |
|---|---|
| Hardware | 2 × NVIDIA H200 SXM (RunPod US-CA-2) |
| Wall clock | 15.77 h |
| Throughput | ~308 K tok/s |
| Cost | US$ 126.47 / R$ 632.35 (FX 5.00) |
| Energy | ~23 kWh, ~5.7 kg CO₂eq (California grid, 250 g/kWh) |
| Effective batch | 524 288 tok/step |
| Optimizer | AdamW, β=(0.9, 0.95), wd=0.1, grad-clip=1.0 |
| LR schedule | cosine, peak 3 × 10⁻⁴, min 3 × 10⁻⁵, warmup 1 000 steps |
| Precision | bf16 + SDPA flash backend |
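The training code is not part of this release; the sketch below only restates the optimizer and LR schedule from the table in plain PyTorch. The `model` object and the ~33 k total steps (implied by 17.39 B tokens at 524 288 tok/step) are placeholders, not released artifacts.

```python
import math
import torch

# Hedged sketch of the stated recipe: AdamW β=(0.9, 0.95), wd=0.1, grad-clip 1.0,
# 1 000-step linear warmup, cosine decay from 3e-4 to 3e-5. Not the released code.
model = torch.nn.Linear(8, 8)          # placeholder for the 354.6 M-param LM
peak_lr, min_lr, warmup = 3e-4, 3e-5, 1_000
total_steps = 33_000                   # ≈ 17.39 B tokens / 524 288 tok per step

optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr,
                              betas=(0.9, 0.95), weight_decay=0.1)

def lr_lambda(step: int) -> float:
    if step < warmup:
        return step / warmup                                   # linear warmup
    t = (step - warmup) / (total_steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(t, 1.0)))     # 1 → 0
    return (min_lr + (peak_lr - min_lr) * cosine) / peak_lr    # decay to min_lr

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Per optimization step (loss from the bf16 forward/backward pass):
# torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```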
Corpus mix (PT-BR only)
| Source | Tokens % |
|---|---|
| Web (cleaned commoncrawl-class) | 56.5 % |
| Acadêmico (open-access papers, theses) | 12.5 % |
| News (PT-BR newspapers, archived) | 11.5 % |
| Wikipedia PT | 9.2 % |
| Government / legal | 7.7 % |
| Livros (public-domain books + literature) | 2.7 % |
No private corpora, no proprietary subscriptions. Per-source dedup → cross-source dedup → quality filter → 17.39 B unique tokens.
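The exact cleaning stack is not published; the snippet below is only a minimal sketch of the per-source → cross-source exact-dedup idea using content hashes. The document lists are hypothetical, and real pipelines typically add near-duplicate detection (e.g. MinHash) on top.

```python
import hashlib

def dedup(docs, seen=None):
    """Exact dedup by SHA-256 of the text; pass `seen` to share hashes across sources."""
    seen = set() if seen is None else seen
    unique = []
    for text in docs:
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(text)
    return unique, seen

# Per-source pass first (each source deduped against itself) ...
web, _ = dedup(["doc a", "doc a", "doc b"])
news, _ = dedup(["doc b", "doc c"])

# ... then a cross-source pass over the already-deduped sources.
seen = set()
web, seen = dedup(web, seen)
news, seen = dedup(news, seen)   # "doc b" is dropped here, leaving 3 unique docs
```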
Evaluation
Evaluated on a 200-example sample of each benchmark, with the protocol matched to the Tucano reference runs:
| Benchmark | MagTina350m-base | Tucano-160m | Tucano-630m |
|---|---|---|---|
| BPB-news (lower is better) | 0.981 | 0.905 | 0.819 |
| Calame-PT (acc-NLL) | 0.39 | 0.365 | 0.39 |
| Lambada-PT (acc-NLL) | 0.595 | 0.495 | 0.575 |
| ARC-PT (acc) | 0.235 | 0.275 | 0.295 |
Lambada-PT (long-context coherence) and Calame-PT (cloze) are at or above Tucano-630m despite
~1.8 × fewer parameters and roughly half the pretrain tokens; credit goes to the v3 BPE tokenizer
(-8.1 % total fertility vs Tucano) and the 2 048-token training context. ARC-PT and BPB-news still trail.
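The BPB-news harness itself is not reproduced here; for readers unfamiliar with the metric, the sketch below shows bits-per-byte in its generic form (total token NLL in bits divided by UTF-8 byte length), assuming a `model` / `tok` pair like the one loaded in the Use section below.

```python
import math
import torch

# Hedged sketch of bits-per-byte: total token NLL in bits over UTF-8 bytes.
# Illustrates the metric only; not the exact BPB-news evaluation protocol.
def bits_per_byte(model, tok, text: str, device: str = "cuda") -> float:
    ids = tok(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss      # mean NLL (nats) per predicted token
    n_predicted = ids.shape[1] - 1              # labels are shifted by one position
    total_bits = loss.item() * n_predicted / math.log(2)
    return total_bits / len(text.encode("utf-8"))
```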
Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("dataseek/magtina350m-base")
model = AutoModelForCausalLM.from_pretrained(
    "dataseek/magtina350m-base", torch_dtype=torch.float16).to("cuda")

prompt = "O Brasil é um país"
ids = tok(prompt, return_tensors="pt").input_ids.to("cuda")
out = model.generate(ids, max_new_tokens=80, do_sample=True,
                     temperature=0.8, top_p=0.9, repetition_penalty=1.1)
print(tok.decode(out[0], skip_special_tokens=True))
```
This is a completion model — no chat template, no special tokens needed at inference.
For chat / assistant use, switch to dataseek/magtina350m-instruct.
Intended use & limitations
Intended use. Research, derivative fine-tunes, PT-BR language-modeling baselines.
Out of scope. Production deployment without further alignment, non-Portuguese tasks, factual question-answering requiring up-to-date or specialised knowledge.
Limitations.
- 354 M params is small — expect frequent factual errors, weak multi-step reasoning, and brittle code/math.
- PT-BR only — minimal exposure to English (~1 % of pretrain), zero exposure to other languages.
- Knowledge cutoff: early 2026.
- Public-data only; biases of CommonCrawl, Wikipedia PT, and PT-BR news media are present and unaudited.
Citation
```bibtex
@misc{magtina350m2026,
  author    = {Frasson, Ricardo and {Dataseek Team}},
  title     = {MagTina350m: A 354 M-parameter Brazilian Portuguese language model},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/dataseek/magtina350m-base}
}
```
License
CC-BY-NC 4.0 — free for research and non-commercial derivative work; commercial use requires written permission from Dataseek (contact via dataseek.com.br).