Biatron Model Card
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Install the package that provides the custom Biatron classes:

```shell
pip install git+https://github.com/Fazzioni/Biatron.git
```

Then register the custom architecture with the Auto classes and generate:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from biatron import BiatronConfig, BiatronForCausalLM

# Register the custom architecture so the Auto* classes can resolve it
AutoConfig.register("Biatron", BiatronConfig)
AutoModelForCausalLM.register(BiatronConfig, BiatronForCausalLM)

tokenizer = AutoTokenizer.from_pretrained("Fazzioni/biatron-345m")
model = AutoModelForCausalLM.from_pretrained(
    "Fazzioni/biatron-345m", dtype="bfloat16", device_map="auto"
)

prompt = " O Brasil é "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(
    input_ids, max_new_tokens=128, do_sample=True,
    temperature=0.7, top_k=50, top_p=0.95,
)
print(tokenizer.batch_decode(outputs))
```
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
How to Get Started with the Model
Use the code in the Uses section above to get started with the model.
Training Details
Training Data
The training data is a mixture of datasets aimed at improving the model's Portuguese language understanding and mathematical reasoning:
| Batch Proportion | Dataset | Number of Tokens |
|---|---|---|
| 60% | TucanoBR/GigaVerbo | 135B |
| 30% | cnmoro/reasoning-v1-20m-portuguese | 45B |
| 5% | HuggingFaceTB/finemath | 13.2B |
| 5% | Infiwebmath-4plus | 11.8B |
Note: for the TucanoBR/GigaVerbo dataset, only the highest-quality split was used for training.
Batch Proportion indicates each dataset's share of the training batches.
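The batch proportions above can be read as sampling weights: each example in a batch is drawn from one of the corpora with probability equal to its proportion. The sketch below illustrates this idea; it is a hypothetical illustration, not the actual training code (which uses Megatron-LM's data blending).

```python
import random

# Hypothetical sketch of proportion-based batch mixing (not the actual
# training pipeline). Each example's source corpus is drawn with
# probability equal to its batch proportion from the table above.
MIXTURE = {
    "TucanoBR/GigaVerbo": 0.60,
    "cnmoro/reasoning-v1-20m-portuguese": 0.30,
    "HuggingFaceTB/finemath": 0.05,
    "Infiwebmath-4plus": 0.05,
}

def sample_sources(batch_size: int, rng=random) -> list[str]:
    """Pick a source dataset for each example in a batch."""
    names = list(MIXTURE)
    weights = [MIXTURE[n] for n in names]
    return rng.choices(names, weights=weights, k=batch_size)

# One batch of 512 examples: roughly 60% GigaVerbo, 30% reasoning, etc.
batch = sample_sources(512)
```

In expectation, a 512-example batch contains about 307 GigaVerbo examples, 154 reasoning examples, and 26 from each math corpus, though individual batches vary.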
Training Procedure
Training Hyperparameters
- Batch size: 512
- Context length: 4096 tokens
- Precision: bf16
- Framework: Megatron-LM
- Total updates: 152,000 (more than 1 epoch)
All training hyperparameters are available in the training script on GitHub.
- Total training time: 792.72 hours
- Total training tokens: 300 billion
- Throughput: 112,129 tokens/second/GPU
- Hardware: NVIDIA H100
- Cloud provider: Centro de Excelência em Inteligência Artificial (CEIA)
- Compute region: Brazil
The Wandb training report is also available.
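The reported figures can be cross-checked with a back-of-the-envelope calculation, assuming the batch size of 512 counts full 4096-token sequences (an assumption, not stated in the card):

```python
# Rough consistency check of the training figures above.
# Assumes batch size counts sequences, each at the full 4096-token context.
tokens_per_update = 512 * 4096               # 2,097,152 tokens per update
total_tokens = tokens_per_update * 152_000   # 318,767,104,000 (~318.8B)

# Time implied by the reported per-GPU throughput:
seconds = total_tokens / 112_129
hours = seconds / 3600                       # ~789.7 hours
```

The ~318.8B token count is of the same order as the reported 300 billion (the round figure may reflect a different counting convention), and the implied ~790 hours is close to the reported 792.72, which suggests, though it does not confirm, that the latter is measured in GPU-hours.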
Evaluation
The evaluation was performed using LightEval; the evaluation code is available on GitHub.
General Results
| Model | oab | enem | openai_mmlu | exams | all |
|---|---|---|---|---|---|
| google/gemma-3-1B-pt | 0.243 | 0.199 | 0.262 | 0.25 | 0.257 |
| TucanoBR/Tucano-630m | 0.247 | 0.197 | 0.254 | 0.226 | 0.249 |
| Fazzioni/biatron-345m | 0.245 | 0.216 | 0.248 | 0.224 | 0.245 |
| HuggingFaceTB/SmolLM2-360M | 0.231 | 0.201 | 0.239 | 0.213 | 0.235 |
| TucanoBR/Tucano-160m | 0.229 | 0.209 | 0.234 | 0.222 | 0.231 |
| Qwen/Qwen3-0.6B-Base | 0.23 | 0.207 | 0.231 | 0.222 | 0.229 |
| google/gemma-3-270m | 0.23 | 0.203 | 0.231 | 0.22 | 0.229 |
Detailed Results
| Benchmark | Fazzioni/biatron-345m | google/gemma-3-270m | google/gemma-3-1B-pt | TucanoBR/Tucano-630m | HuggingFaceTB/SmolLM2-360M | Qwen/Qwen3-0.6B-Base | TucanoBR/Tucano-160m |
|---|---|---|---|---|---|---|---|
| all | 0.245 | 0.229 | 0.257 | 0.249 | 0.235 | 0.229 | 0.231 |
| enem_por_mcf:_average:0 | 0.216 | 0.203 | 0.199 | 0.197 | 0.201 | 0.207 | 0.209 |
| openai_mmlu_por_mcf:_average:0 | 0.248 | 0.231 | 0.262 | 0.254 | 0.239 | 0.231 | 0.234 |
| exams_por_mcf:_average:0 | 0.224 | 0.22 | 0.25 | 0.226 | 0.213 | 0.222 | 0.222 |
| m3exams_por_mcf:0 | 0.225 | 0.202 | 0.197 | 0.201 | 0.198 | 0.198 | 0.191 |
| enem_por_mcf:2022:0 | 0.212 | 0.196 | 0.207 | 0.179 | 0.196 | 0.207 | 0.212 |
| enem_por_mcf:2023:0 | 0.246 | 0.246 | 0.173 | 0.207 | 0.235 | 0.24 | 0.24 |
| enem_por_mcf:2024:0 | 0.19 | 0.168 | 0.218 | 0.207 | 0.173 | 0.173 | 0.173 |
| openai_mmlu_por_mcf:abstract_algebra:0 | 0.31 | 0.22 | 0.31 | 0.2 | 0.24 | 0.22 | 0.22 |
| openai_mmlu_por_mcf:anatomy:0 | 0.237 | 0.185 | 0.289 | 0.244 | 0.222 | 0.185 | 0.237 |
| openai_mmlu_por_mcf:astronomy:0 | 0.224 | 0.191 | 0.329 | 0.23 | 0.197 | 0.178 | 0.184 |
| openai_mmlu_por_mcf:business_ethics:0 | 0.23 | 0.3 | 0.19 | 0.25 | 0.22 | 0.3 | 0.31 |
| openai_mmlu_por_mcf:clinical_knowledge:0 | 0.245 | 0.211 | 0.242 | 0.264 | 0.204 | 0.215 | 0.223 |
| openai_mmlu_por_mcf:college_biology:0 | 0.264 | 0.243 | 0.243 | 0.243 | 0.236 | 0.257 | 0.271 |
| openai_mmlu_por_mcf:college_chemistry:0 | 0.28 | 0.21 | 0.36 | 0.33 | 0.22 | 0.2 | 0.18 |
| openai_mmlu_por_mcf:college_computer_science:0 | 0.32 | 0.27 | 0.32 | 0.28 | 0.25 | 0.26 | 0.25 |
| openai_mmlu_por_mcf:college_mathematics:0 | 0.24 | 0.21 | 0.27 | 0.26 | 0.25 | 0.21 | 0.18 |
| openai_mmlu_por_mcf:college_medicine:0 | 0.231 | 0.202 | 0.312 | 0.243 | 0.197 | 0.208 | 0.214 |
| openai_mmlu_por_mcf:college_physics:0 | 0.225 | 0.216 | 0.363 | 0.314 | 0.206 | 0.216 | 0.225 |
| openai_mmlu_por_mcf:computer_security:0 | 0.27 | 0.28 | 0.22 | 0.21 | 0.28 | 0.28 | 0.27 |
| openai_mmlu_por_mcf:conceptual_physics:0 | 0.264 | 0.272 | 0.187 | 0.179 | 0.247 | 0.264 | 0.281 |
| openai_mmlu_por_mcf:econometrics:0 | 0.193 | 0.228 | 0.219 | 0.184 | 0.219 | 0.237 | 0.246 |
| openai_mmlu_por_mcf:electrical_engineering:0 | 0.262 | 0.228 | 0.276 | 0.186 | 0.262 | 0.241 | 0.255 |
| openai_mmlu_por_mcf:elementary_mathematics:0 | 0.233 | 0.198 | 0.238 | 0.251 | 0.225 | 0.209 | 0.228 |
| openai_mmlu_por_mcf:formal_logic:0 | 0.222 | 0.27 | 0.302 | 0.254 | 0.294 | 0.286 | 0.254 |
| openai_mmlu_por_mcf:global_facts:0 | 0.26 | 0.19 | 0.34 | 0.21 | 0.24 | 0.18 | 0.18 |
| openai_mmlu_por_mcf:high_school_biology:0 | 0.245 | 0.187 | 0.252 | 0.3 | 0.187 | 0.177 | 0.174 |
| openai_mmlu_por_mcf:high_school_chemistry:0 | 0.202 | 0.153 | 0.286 | 0.256 | 0.177 | 0.153 | 0.212 |
| openai_mmlu_por_mcf:high_school_computer_science:0 | 0.35 | 0.25 | 0.3 | 0.22 | 0.23 | 0.25 | 0.25 |
| openai_mmlu_por_mcf:high_school_european_history:0 | 0.212 | 0.218 | 0.206 | 0.23 | 0.23 | 0.218 | 0.218 |
| openai_mmlu_por_mcf:high_school_geography:0 | 0.348 | 0.182 | 0.273 | 0.303 | 0.202 | 0.177 | 0.187 |
| openai_mmlu_por_mcf:high_school_government_and_politics:0 | 0.275 | 0.197 | 0.301 | 0.306 | 0.228 | 0.197 | 0.197 |
| openai_mmlu_por_mcf:high_school_macroeconomics:0 | 0.279 | 0.21 | 0.226 | 0.287 | 0.231 | 0.203 | 0.205 |
| openai_mmlu_por_mcf:high_school_mathematics:0 | 0.248 | 0.226 | 0.281 | 0.281 | 0.233 | 0.211 | 0.222 |
| openai_mmlu_por_mcf:high_school_microeconomics:0 | 0.252 | 0.206 | 0.248 | 0.324 | 0.223 | 0.21 | 0.206 |
| openai_mmlu_por_mcf:high_school_physics:0 | 0.219 | 0.205 | 0.258 | 0.278 | 0.199 | 0.199 | 0.219 |
| openai_mmlu_por_mcf:high_school_psychology:0 | 0.246 | 0.196 | 0.261 | 0.239 | 0.202 | 0.193 | 0.189 |
| openai_mmlu_por_mcf:high_school_statistics:0 | 0.199 | 0.181 | 0.269 | 0.426 | 0.157 | 0.153 | 0.144 |
| openai_mmlu_por_mcf:high_school_us_history:0 | 0.275 | 0.24 | 0.245 | 0.265 | 0.25 | 0.25 | 0.25 |
| openai_mmlu_por_mcf:high_school_world_history:0 | 0.224 | 0.266 | 0.291 | 0.245 | 0.266 | 0.27 | 0.262 |
| openai_mmlu_por_mcf:human_aging:0 | 0.256 | 0.291 | 0.26 | 0.314 | 0.287 | 0.314 | 0.318 |
| openai_mmlu_por_mcf:human_sexuality:0 | 0.29 | 0.237 | 0.206 | 0.198 | 0.26 | 0.26 | 0.26 |
| openai_mmlu_por_mcf:international_law:0 | 0.198 | 0.24 | 0.14 | 0.215 | 0.207 | 0.24 | 0.24 |
| openai_mmlu_por_mcf:jurisprudence:0 | 0.231 | 0.25 | 0.176 | 0.259 | 0.269 | 0.259 | 0.259 |
| openai_mmlu_por_mcf:logical_fallacies:0 | 0.233 | 0.215 | 0.294 | 0.233 | 0.215 | 0.221 | 0.221 |
| openai_mmlu_por_mcf:machine_learning:0 | 0.259 | 0.277 | 0.214 | 0.205 | 0.33 | 0.312 | 0.33 |
| openai_mmlu_por_mcf:management:0 | 0.233 | 0.184 | 0.252 | 0.204 | 0.204 | 0.175 | 0.175 |
| openai_mmlu_por_mcf:marketing:0 | 0.197 | 0.286 | 0.248 | 0.244 | 0.269 | 0.291 | 0.261 |
| openai_mmlu_por_mcf:medical_genetics:0 | 0.23 | 0.33 | 0.32 | 0.29 | 0.35 | 0.3 | 0.28 |
| openai_mmlu_por_mcf:miscellaneous:0 | 0.258 | 0.24 | 0.268 | 0.208 | 0.262 | 0.238 | 0.25 |
| openai_mmlu_por_mcf:moral_disputes:0 | 0.269 | 0.254 | 0.266 | 0.223 | 0.28 | 0.249 | 0.246 |
| openai_mmlu_por_mcf:moral_scenarios:0 | 0.273 | 0.246 | 0.247 | 0.258 | 0.247 | 0.238 | 0.238 |
| openai_mmlu_por_mcf:nutrition:0 | 0.248 | 0.225 | 0.261 | 0.242 | 0.235 | 0.225 | 0.219 |
| openai_mmlu_por_mcf:philosophy:0 | 0.238 | 0.19 | 0.267 | 0.193 | 0.203 | 0.186 | 0.196 |
| openai_mmlu_por_mcf:prehistory:0 | 0.231 | 0.213 | 0.235 | 0.25 | 0.244 | 0.216 | 0.222 |
| openai_mmlu_por_mcf:professional_accounting:0 | 0.238 | 0.23 | 0.27 | 0.284 | 0.28 | 0.234 | 0.234 |
| openai_mmlu_por_mcf:professional_law:0 | 0.231 | 0.246 | 0.254 | 0.246 | 0.252 | 0.246 | 0.247 |
| openai_mmlu_por_mcf:professional_medicine:0 | 0.228 | 0.188 | 0.272 | 0.25 | 0.199 | 0.184 | 0.18 |
| openai_mmlu_por_mcf:professional_psychology:0 | 0.235 | 0.24 | 0.245 | 0.217 | 0.255 | 0.25 | 0.248 |
| openai_mmlu_por_mcf:public_relations:0 | 0.227 | 0.218 | 0.209 | 0.282 | 0.218 | 0.218 | 0.227 |
| openai_mmlu_por_mcf:security_studies:0 | 0.241 | 0.192 | 0.249 | 0.31 | 0.196 | 0.188 | 0.188 |
| openai_mmlu_por_mcf:sociology:0 | 0.244 | 0.264 | 0.274 | 0.269 | 0.244 | 0.244 | 0.244 |
| openai_mmlu_por_mcf:us_foreign_policy:0 | 0.24 | 0.28 | 0.26 | 0.27 | 0.33 | 0.28 | 0.28 |
| openai_mmlu_por_mcf:virology:0 | 0.259 | 0.283 | 0.223 | 0.211 | 0.259 | 0.283 | 0.289 |
| openai_mmlu_por_mcf:world_religions:0 | 0.281 | 0.31 | 0.298 | 0.316 | 0.287 | 0.322 | 0.333 |
| exams_por_mcf:biology:0 | 0.239 | 0.227 | 0.256 | 0.295 | 0.193 | 0.233 | 0.233 |
| exams_por_mcf:economics:0 | 0.18 | 0.279 | 0.252 | 0.207 | 0.261 | 0.279 | 0.279 |
| exams_por_mcf:geology:0 | 0.276 | 0.241 | 0.224 | 0.233 | 0.233 | 0.241 | 0.241 |
| exams_por_mcf:philosophy:0 | 0.2 | 0.133 | 0.267 | 0.167 | 0.167 | 0.133 | 0.133 |
| oab_exams_por_mcf:0 | 0.245 | 0.23 | 0.243 | 0.247 | 0.231 | 0.23 | 0.229 |
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]
