MagTina350m — base

MagTina350m-base is the 354.6 M-parameter Brazilian-Portuguese foundation model trained from scratch by Dataseek under the **Magestic.ai** brand. This is the pretraining checkpoint — see dataseek/magtina350m-instruct for the instruction-tuned version.

Model summary

| Field | Value |
|---|---|
| Parameters | 354,591,744 (~354.6 M) |
| Architecture | Llama2-mini (pre-norm RMSNorm + RoPE + SwiGLU + untied embeddings) |
| Hidden / intermediate / layers / heads | 1024 / 3072 / 20 / 16 |
| KV heads | 16 (no GQA) |
| Vocab | 40 000 (custom v3 BPE, 0 % UNK on out-of-domain text) |
| Context | 2 048 tokens |
| Pretrain tokens | 17.39 B (PT-BR only) |
| License | CC-BY-NC 4.0 |
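
The summary maps onto a stock LlamaConfig roughly as sketched below; the listed values come from the table, while unlisted fields (RoPE base, norm epsilon, etc.) are left at transformers defaults and may differ from the released config.json.

```python
from transformers import LlamaConfig

# Approximate reconstruction of the architecture from the summary table.
# Anything not listed in the table is a transformers default, not a confirmed value.
config = LlamaConfig(
    vocab_size=40_000,             # custom v3 BPE
    hidden_size=1024,
    intermediate_size=3072,        # SwiGLU MLP width
    num_hidden_layers=20,
    num_attention_heads=16,
    num_key_value_heads=16,        # no GQA: KV heads == attention heads
    max_position_embeddings=2048,  # training context
    tie_word_embeddings=False,     # untied input/output embeddings
)
```

With untied embeddings, this layout works out to the 354,591,744 parameters quoted above.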

Note on logit_softcap. The original Mag350m model applied tanh(x/15)*15 to the output logits during training. To stay compatible with stock LlamaForCausalLM (and thus vLLM / TGI / transformers without trust_remote_code), this release drops the softcap. On 629 random positions the conversion produced 100 % top-1 token agreement with the original model in FP32. Effects on sampling-temperature behavior are negligible.
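
For reference, the dropped operation is a standard tanh logit softcap; a minimal sketch of what the original checkpoint applied at the output head (the cap value of 15 comes from the note above):

```python
import torch

def softcap_logits(logits: torch.Tensor, cap: float = 15.0) -> torch.Tensor:
    """Smoothly squash logits into (-cap, cap): tanh(x / cap) * cap."""
    return torch.tanh(logits / cap) * cap
```

The released weights skip this step entirely, so stock LlamaForCausalLM loaders need no custom code.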

Training

| Setting | Value |
|---|---|
| Hardware | 2 × NVIDIA H200 SXM (RunPod US-CA-2) |
| Wall clock | 15.77 h |
| Throughput | ~308 K tok/s |
| Cost | US$ 126.47 / R$ 632.35 (FX 5.00) |
| Energy | ~23 kWh, ~5.7 kg CO₂eq (California grid, 250 g/kWh) |
| Effective batch | 524 288 tok/step |
| Optimizer | AdamW, β = (0.9, 0.95), wd = 0.1, grad-clip = 1.0 |
| LR schedule | cosine, peak 3 × 10⁻⁴, min 3 × 10⁻⁵, warmup 1 000 steps |
| Precision | bf16 + SDPA flash backend |
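
A minimal PyTorch sketch of the optimizer and learning-rate schedule described above; the hyperparameters come from the table, while the stand-in model, the step count, and the scheduling helper are illustrative assumptions rather than the actual training code.

```python
import math
import torch

model = torch.nn.Linear(8, 8)   # stand-in; the real network is the 354.6 M model
peak_lr, min_lr, warmup_steps = 3e-4, 3e-5, 1_000
total_steps = 17_390_000_000 // 524_288   # ≈ 33 K steps, assuming a single pass

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.95), weight_decay=0.1
)

def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Each step clips gradients before the update, matching grad-clip = 1.0 above:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```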

Corpus mix (PT-BR only)

| Source | % of tokens |
|---|---|
| Web (cleaned CommonCrawl-class) | 56.5 % |
| Academic (open-access papers, theses) | 12.5 % |
| News (PT-BR newspapers, archived) | 11.5 % |
| Wikipedia PT | 9.2 % |
| Government / legal | 7.7 % |
| Books (public-domain books + literature) | 2.7 % |

No private corpora, no proprietary subscriptions. Per-source dedup → cross-source dedup → quality filter → 17.39 B unique tokens.
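
The exact cleaning stack is not published here; purely as an illustration, a per-source exact-hash dedup pass could look like the sketch below (the hashing choice and the `docs` iterable are assumptions, and the real pipeline may well use fuzzy dedup and richer quality heuristics).

```python
import hashlib
from typing import Iterable, Iterator

def dedup_exact(docs: Iterable[str], seen: set[str] | None = None) -> Iterator[str]:
    """Yield each document once, dropping byte-identical repeats (illustrative)."""
    seen = set() if seen is None else seen
    for text in docs:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield text

# Per-source: call dedup_exact on each corpus with its own `seen` set.
# Cross-source: run a second pass with a single shared `seen` set,
# then apply quality filtering before counting the final 17.39 B tokens.
```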

Evaluation

Scores on a 200-example sample of each benchmark, using a protocol matched to the Tucano reference runs:

| Benchmark | MagTina350m-base | Tucano-160m | Tucano-630m |
|---|---|---|---|
| BPB-news (lower is better) | 0.981 | 0.905 | 0.819 |
| Calame-PT (acc-NLL) | 0.39 | 0.365 | 0.39 |
| Lambada-PT (acc-NLL) | 0.595 | 0.495 | 0.575 |
| ARC-PT (acc) | 0.235 | 0.275 | 0.295 |

Lambada-PT (long-context coherence) and Calame-PT (cloze) are at-or-above Tucano-630m despite 1.8 × fewer parameters and ~half the pretrain tokens — credit to the v3 BPE tokenizer (-8.1 % total fertility vs Tucano) and 2 048-token training context. ARC-PT and BPB still trail.
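
For readers unfamiliar with the metric, bits-per-byte (BPB) is the model's total negative log-likelihood converted to base 2 and normalised by the UTF-8 byte length of the text, which makes scores comparable across tokenizers. A minimal sketch, where `model` and `tok` are a causal LM and tokenizer loaded as in the Use section below and the single-sequence scoring is illustrative rather than the actual harness:

```python
import math
import torch

def bits_per_byte(model, tok, text: str, device: str = "cuda") -> float:
    """Total token NLL in bits, divided by the UTF-8 byte count of `text`."""
    ids = tok(text, return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        loss = model(ids, labels=ids).loss         # mean NLL per predicted token (nats)
    total_nats = loss.item() * (ids.shape[1] - 1)  # undo the mean over shifted positions
    return total_nats / (math.log(2) * len(text.encode("utf-8")))
```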

Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("dataseek/magtina350m-base")
model = AutoModelForCausalLM.from_pretrained(
    "dataseek/magtina350m-base", torch_dtype=torch.float16).to("cuda")

prompt = "O Brasil é um país"
ids = tok(prompt, return_tensors="pt").input_ids.to("cuda")

# Nucleus sampling with a light repetition penalty; tune for your use case.
out = model.generate(ids, max_new_tokens=80, do_sample=True,
                     temperature=0.8, top_p=0.9, repetition_penalty=1.1)
print(tok.decode(out[0], skip_special_tokens=True))
```

This is a completion model — no chat template, no special tokens needed at inference. For chat / assistant use, switch to dataseek/magtina350m-instruct.
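
Because there is no chat template, downstream tasks are typically phrased as plain-text continuations. An illustrative few-shot prompt, reusing `tok` and `model` from the snippet above (the task, labels, and wording are made up for the example):

```python
# Few-shot sentiment labelling as plain completion ("Review: ... Sentiment: ...").
few_shot = (
    "Avaliação: Adorei o produto, chegou rápido. Sentimento: positivo\n"
    "Avaliação: Veio quebrado e ninguém responde. Sentimento: negativo\n"
    "Avaliação: O atendimento foi excelente. Sentimento:"
)
ids = tok(few_shot, return_tensors="pt").input_ids.to("cuda")
out = model.generate(ids, max_new_tokens=3, do_sample=False)
print(tok.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
```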

Intended use & limitations

Intended use. Research, derivative fine-tunes, PT-BR language-modeling baselines.

Out of scope. Production deployment without further alignment, non-Portuguese tasks, factual question-answering requiring up-to-date or specialised knowledge.

Limitations.

  • 354 M params is small — expect frequent factual errors, weak multi-step reasoning, and brittle code/math.
  • PT-BR only — minimal exposure to English (~1 % of pretrain), zero exposure to other languages.
  • Knowledge cutoff: early 2026.
  • Public-data only; biases of CommonCrawl, Wikipedia PT, and PT-BR news media are present and unaudited.

Citation

@misc{magtina350m2026,
  author = {Frasson, Ricardo and {Dataseek Team}},
  title  = {MagTina350m: A 354 M-parameter Brazilian Portuguese language model},
  year   = {2026},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/dataseek/magtina350m-base}
}

License

CC-BY-NC 4.0 — free for research and non-commercial derivative work; commercial use requires written permission from Dataseek (contact via dataseek.com.br).
