MagTina350m — base

MagTina350m-base is the 354.6 M-parameter Brazilian-Portuguese foundation model trained from scratch by Dataseek under the **Magestic.ai** brand. This is the pretraining checkpoint — see dataseek/magtina350m-instruct for the instruction-tuned version.

Model summary

| Field | Value |
|---|---|
| Parameters | 354,591,744 (~354.6 M) |
| Architecture | Llama2-mini (pre-norm RMSNorm + RoPE + SwiGLU + untied embeddings) |
| Hidden / intermediate / layers / heads | 1024 / 3072 / 20 / 16 |
| KV heads | 16 (no GQA) |
| Vocab | 40 000 (custom v3 BPE, 0 % UNK on out-of-domain text) |
| Context | 2 048 tokens |
| Pretrain tokens | 17.39 B (PT-BR only) |
| License | CC-BY-NC 4.0 |
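
The summary maps onto a stock LlamaConfig roughly as sketched below; the listed values come from the table, while unlisted fields (RoPE base, norm epsilon, etc.) are left at transformers defaults and may differ from the released config.json.

```python
from transformers import LlamaConfig

# Approximate reconstruction of the architecture from the summary table.
# Anything not listed in the table is a transformers default, not a confirmed value.
config = LlamaConfig(
    vocab_size=40_000,             # custom v3 BPE
    hidden_size=1024,
    intermediate_size=3072,        # SwiGLU MLP width
    num_hidden_layers=20,
    num_attention_heads=16,
    num_key_value_heads=16,        # no GQA: KV heads == attention heads
    max_position_embeddings=2048,  # training context
    tie_word_embeddings=False,     # untied input/output embeddings
)
```

With untied embeddings, this layout works out to the 354,591,744 parameters quoted above.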

Note on logit_softcap. The original Mag350m model applied tanh(x/15)*15 to the output logits during training. To stay compatible with stock LlamaForCausalLM (and thus vLLM / TGI / transformers without trust_remote_code), this release drops the softcap. On 629 random positions the conversion produced 100 % top-1 token agreement with the original model in FP32. Effects on sampling-temperature behavior are negligible.
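
For reference, the dropped operation is a standard tanh logit softcap; a minimal sketch of what the original checkpoint applied at the output head (the cap value of 15 comes from the note above):

```python
import torch

def softcap_logits(logits: torch.Tensor, cap: float = 15.0) -> torch.Tensor:
    """Smoothly squash logits into (-cap, cap): tanh(x / cap) * cap."""
    return torch.tanh(logits / cap) * cap
```

The released weights skip this step entirely, so stock LlamaForCausalLM loaders need no custom code.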

Training

| Setting | Value |
|---|---|
| Hardware | 2 × NVIDIA H200 SXM (RunPod US-CA-2) |
| Wall clock | 15.77 h |
| Throughput | ~308 K tok/s |
| Cost | US$ 126.47 / R$ 632.35 (FX 5.00) |
| Energy | ~23 kWh, ~5.7 kg CO₂eq (California grid, 250 g/kWh) |
| Effective batch | 524 288 tok/step |
| Optimizer | AdamW, β = (0.9, 0.95), wd = 0.1, grad-clip = 1.0 |
| LR schedule | cosine, peak 3 × 10⁻⁴, min 3 × 10⁻⁵, warmup 1 000 steps |
| Precision | bf16 + SDPA flash backend |
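
A minimal PyTorch sketch of the optimizer and learning-rate schedule described above; the hyperparameters come from the table, while the stand-in model, the step count, and the scheduling helper are illustrative assumptions rather than the actual training code.

```python
import math
import torch

model = torch.nn.Linear(8, 8)   # stand-in; the real network is the 354.6 M model
peak_lr, min_lr, warmup_steps = 3e-4, 3e-5, 1_000
total_steps = 17_390_000_000 // 524_288   # ≈ 33 K steps, assuming a single pass

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Each step clips gradients before the update, matching grad-clip = 1.0 above:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```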

Corpus mix (PT-BR only)

| Source | % of tokens |
|---|---|
| Web (cleaned CommonCrawl-class) | 56.5 % |
| Academic (open-access papers, theses) | 12.5 % |
| News (PT-BR newspapers, archived) | 11.5 % |
| Wikipedia PT | 9.2 % |
| Government / legal | 7.7 % |
| Books (public-domain books + literature) | 2.7 % |

No private corpora, no proprietary subscriptions. Per-source dedup → cross-source dedup → quality filter → 17.39 B unique tokens.
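
The exact cleaning stack is not published here; purely as an illustration, a per-source exact-hash dedup pass could look like the sketch below (the hashing choice and the `docs` iterable are assumptions, and the real pipeline may well use fuzzy dedup and richer quality heuristics).

```python
import hashlib
from typing import Iterable, Iterator

def dedup_exact(docs: Iterable[str], seen: set[str] | None = None) -> Iterator[str]:
    """Yield each document once, dropping byte-identical repeats (illustrative)."""
    seen = set() if seen is None else seen
    for text in docs:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield text

# Per-source: call dedup_exact on each corpus with its own `seen` set.
# Cross-source: run a second pass with a single shared `seen` set,
# then apply quality filtering before counting the final 17.39 B tokens.
```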

Evaluation

Scores on a 200-example sample of each benchmark, using a protocol matched to the Tucano reference runs:

| Benchmark | MagTina350m-base | Tucano-160m | Tucano-630m |
|---|---|---|---|
| BPB-news (lower is better) | 0.981 | 0.905 | 0.819 |
| Calame-PT (acc-NLL) | 0.39 | 0.365 | 0.39 |
| Lambada-PT (acc-NLL) | 0.595 | 0.495 | 0.575 |
| ARC-PT (acc) | 0.235 | 0.275 | 0.295 |

Lambada-PT (long-context coherence) and Calame-PT (cloze) are at-or-above Tucano-630m despite 1.8 × fewer parameters and ~half the pretrain tokens — credit to the v3 BPE tokenizer (-8.1 % total fertility vs Tucano) and 2 048-token training context. ARC-PT and BPB still trail.
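
For readers unfamiliar with the metric, bits-per-byte (BPB) is the model's total negative log-likelihood converted to base 2 and normalised by the UTF-8 byte length of the text, which makes scores comparable across tokenizers. A minimal sketch, where `model` and `tok` are a causal LM and tokenizer loaded as in the Use section below and the single-sequence scoring is illustrative rather than the actual harness:

```python
import math
import torch

def bits_per_byte(model, tok, text: str, device: str = "cuda") -> float:
    """Total token NLL in bits, divided by the UTF-8 byte count of `text`."""
    ids = tok(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss         # mean NLL per predicted token (nats)
    total_nats = loss.item() * (ids.shape[1] - 1)  # undo the mean over shifted positions
    return total_nats / (math.log(2) * len(text.encode("utf-8")))
```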

Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("dataseek/magtina350m-base")
model = AutoModelForCausalLM.from_pretrained(
    "dataseek/magtina350m-base", torch_dtype=torch.float16).to("cuda")

prompt = "O Brasil é um país"
ids = tok(prompt, return_tensors="pt").input_ids.to("cuda")

# Nucleus sampling with a light repetition penalty; tune for your use case.
out = model.generate(ids, max_new_tokens=80, do_sample=True,
                     temperature=0.8, top_p=0.9, repetition_penalty=1.1)
print(tok.decode(out[0], skip_special_tokens=True))
```

This is a completion model — no chat template, no special tokens needed at inference. For chat / assistant use, switch to dataseek/magtina350m-instruct.
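
Because there is no chat template, downstream tasks are typically phrased as plain-text continuations. An illustrative few-shot prompt, reusing `tok` and `model` from the snippet above (the task, labels, and wording are made up for the example):

```python
# Few-shot sentiment labelling as plain completion ("Review: ... Sentiment: ...").
few_shot = (
    "Avaliação: Adorei o produto, chegou rápido. Sentimento: positivo\n"
    "Avaliação: Veio quebrado e ninguém responde. Sentimento: negativo\n"
    "Avaliação: O atendimento foi excelente. Sentimento:"
)
ids = tok(few_shot, return_tensors="pt").input_ids.to("cuda")
out = model.generate(ids, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```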

Intended use & limitations

Intended use. Research, derivative fine-tunes, PT-BR language-modeling baselines.

Out of scope. Production deployment without further alignment, non-Portuguese tasks, factual question-answering requiring up-to-date or specialised knowledge.

Limitations.

  • 354 M params is small — expect frequent factual errors, weak multi-step reasoning, and brittle code/math.
  • PT-BR only — minimal exposure to English (~1 % of pretrain), zero exposure to other languages.
  • Knowledge cutoff: early 2026.
  • Public-data only; biases of CommonCrawl, Wikipedia PT, and PT-BR news media are present and unaudited.

Citation

@misc{magtina350m2026,
  author = {Frasson, Ricardo and {Dataseek Team}},
  title  = {MagTina350m: A 354 M-parameter Brazilian Portuguese language model},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/dataseek/magtina350m-base}
}

License

CC-BY-NC 4.0 — free for research and non-commercial derivative work; commercial use requires written permission from Dataseek (contact via dataseek.com.br).
