Biatron Model Card
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- Developed by: [More Information Needed]
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: [More Information Needed]
- Model type: [More Information Needed]
- Language(s) (NLP): [More Information Needed]
- License: [More Information Needed]
- Finetuned from model [optional]: [More Information Needed]
Model Sources [optional]
- Repository: [More Information Needed]
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Install the package that provides the custom Biatron classes:

```shell
pip install git+https://github.com/Fazzioni/Biatron.git
```

Then register the custom architecture with the Auto classes and generate:

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from biatron import BiatronConfig, BiatronForCausalLM

# Register the custom architecture so the Auto* classes can resolve it
AutoConfig.register("Biatron", BiatronConfig)
AutoModelForCausalLM.register(BiatronConfig, BiatronForCausalLM)

tokenizer = AutoTokenizer.from_pretrained("Fazzioni/biatron-345m")
model = AutoModelForCausalLM.from_pretrained(
    "Fazzioni/biatron-345m", dtype="bfloat16", device_map="auto"
)

prompt = " O Brasil é "
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(
    input_ids, max_new_tokens=128, do_sample=True,
    temperature=0.7, top_k=50, top_p=0.95,
)
print(tokenizer.batch_decode(outputs))
```
Direct Use
[More Information Needed]
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.
How to Get Started with the Model
Use the code in the Uses section above to get started with the model.
Training Details
Training Data
The training data is a mixture of datasets aimed at improving the model's Portuguese language understanding and mathematical reasoning:
| Batch Proportion | Dataset | Number of Tokens |
|---|---|---|
| 60% | TucanoBR/GigaVerbo | 135B |
| 30% | cnmoro/reasoning-v1-20m-portuguese | 45B |
| 5% | HuggingFaceTB/finemath | 13.2B |
| 5% | Infiwebmath-4plus | 11.8B |
Note: for the TucanoBR/GigaVerbo dataset, only the highest-quality split was used for training.
Batch Proportion indicates each dataset's share of the training batches.
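The batch proportions above can be read as sampling weights: each example in a batch is drawn from one of the corpora with probability equal to its proportion. The sketch below illustrates this idea; it is a hypothetical illustration, not the actual training code (which uses Megatron-LM's data blending).

```python
import random

# Hypothetical sketch of proportion-based batch mixing (not the actual
# training pipeline). Each example's source corpus is drawn with
# probability equal to its batch proportion from the table above.
MIXTURE = {
    "TucanoBR/GigaVerbo": 0.60,
    "cnmoro/reasoning-v1-20m-portuguese": 0.30,
    "HuggingFaceTB/finemath": 0.05,
    "Infiwebmath-4plus": 0.05,
}

def sample_sources(batch_size: int, rng=random) -> list[str]:
    """Pick a source dataset for each example in a batch."""
    names = list(MIXTURE)
    weights = [MIXTURE[n] for n in names]
    return rng.choices(names, weights=weights, k=batch_size)

# One batch of 512 examples: roughly 60% GigaVerbo, 30% reasoning, etc.
batch = sample_sources(512)
```

In expectation, a 512-example batch contains about 307 GigaVerbo examples, 154 reasoning examples, and 26 from each math corpus, though individual batches vary.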
Training Procedure
Training Hyperparameters
- Batch size: 512
- Context length: 4096 tokens
- Precision: bf16
- Framework: Megatron-LM
- Total updates: 152,000 (more than 1 epoch)
All training hyperparameters are available in the training script on GitHub.
- Total training time: 792.72 hours
- Total training tokens: 300 billion
- Throughput: 112,129 tokens/second/GPU
- Hardware: NVIDIA H100
- Cloud provider: Centro de Excelência em Inteligência Artificial (CEIA)
- Compute region: Brazil
The Wandb training report is also available.
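The reported figures can be cross-checked with a back-of-the-envelope calculation, assuming the batch size of 512 counts full 4096-token sequences (an assumption, not stated in the card):

```python
# Rough consistency check of the training figures above.
# Assumes batch size counts sequences, each at the full 4096-token context.
tokens_per_update = 512 * 4096               # 2,097,152 tokens per update
total_tokens = tokens_per_update * 152_000   # 318,767,104,000 (~318.8B)

# Time implied by the reported per-GPU throughput:
seconds = total_tokens / 112_129
hours = seconds / 3600                       # ~789.7 hours
```

The ~318.8B token count is of the same order as the reported 300 billion (the round figure may reflect a different counting convention), and the implied ~790 hours is close to the reported 792.72, which suggests, though it does not confirm, that the latter is measured in GPU-hours.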
Evaluation
The evaluation was performed using LightEval; the evaluation code is available on GitHub.
General Results
| Model | oab | enem | openai_mmlu | exams | all |
|---|---|---|---|---|---|
| google/gemma-3-1B-pt | 0.243 | 0.199 | 0.262 | 0.25 | 0.257 |
| TucanoBR/Tucano-630m | 0.247 | 0.197 | 0.254 | 0.226 | 0.249 |
| Fazzioni/biatron-345m | 0.245 | 0.216 | 0.248 | 0.224 | 0.245 |
| HuggingFaceTB/SmolLM2-360M | 0.231 | 0.201 | 0.239 | 0.213 | 0.235 |
| TucanoBR/Tucano-160m | 0.229 | 0.209 | 0.234 | 0.222 | 0.231 |
| Qwen/Qwen3-0.6B-Base | 0.23 | 0.207 | 0.231 | 0.222 | 0.229 |
| google/gemma-3-270m | 0.23 | 0.203 | 0.231 | 0.22 | 0.229 |
Detailed Results
| Benchmark | Fazzioni/biatron-345m | google/gemma-3-270m | google/gemma-3-1B-pt | TucanoBR/Tucano-630m | HuggingFaceTB/SmolLM2-360M | Qwen/Qwen3-0.6B-Base | TucanoBR/Tucano-160m |
|---|---|---|---|---|---|---|---|
| all | 0.245 | 0.229 | 0.257 | 0.249 | 0.235 | 0.229 | 0.231 |
| enem_por_mcf:_average:0 | 0.216 | 0.203 | 0.199 | 0.197 | 0.201 | 0.207 | 0.209 |
| openai_mmlu_por_mcf:_average:0 | 0.248 | 0.231 | 0.262 | 0.254 | 0.239 | 0.231 | 0.234 |
| exams_por_mcf:_average:0 | 0.224 | 0.22 | 0.25 | 0.226 | 0.213 | 0.222 | 0.222 |
| m3exams_por_mcf:0 | 0.225 | 0.202 | 0.197 | 0.201 | 0.198 | 0.198 | 0.191 |
| enem_por_mcf:2022:0 | 0.212 | 0.196 | 0.207 | 0.179 | 0.196 | 0.207 | 0.212 |
| enem_por_mcf:2023:0 | 0.246 | 0.246 | 0.173 | 0.207 | 0.235 | 0.24 | 0.24 |
| enem_por_mcf:2024:0 | 0.19 | 0.168 | 0.218 | 0.207 | 0.173 | 0.173 | 0.173 |
| openai_mmlu_por_mcf:abstract_algebra:0 | 0.31 | 0.22 | 0.31 | 0.2 | 0.24 | 0.22 | 0.22 |
| openai_mmlu_por_mcf:anatomy:0 | 0.237 | 0.185 | 0.289 | 0.244 | 0.222 | 0.185 | 0.237 |
| openai_mmlu_por_mcf:astronomy:0 | 0.224 | 0.191 | 0.329 | 0.23 | 0.197 | 0.178 | 0.184 |
| openai_mmlu_por_mcf:business_ethics:0 | 0.23 | 0.3 | 0.19 | 0.25 | 0.22 | 0.3 | 0.31 |
| openai_mmlu_por_mcf:clinical_knowledge:0 | 0.245 | 0.211 | 0.242 | 0.264 | 0.204 | 0.215 | 0.223 |
| openai_mmlu_por_mcf:college_biology:0 | 0.264 | 0.243 | 0.243 | 0.243 | 0.236 | 0.257 | 0.271 |
| openai_mmlu_por_mcf:college_chemistry:0 | 0.28 | 0.21 | 0.36 | 0.33 | 0.22 | 0.2 | 0.18 |
| openai_mmlu_por_mcf:college_computer_science:0 | 0.32 | 0.27 | 0.32 | 0.28 | 0.25 | 0.26 | 0.25 |
| openai_mmlu_por_mcf:college_mathematics:0 | 0.24 | 0.21 | 0.27 | 0.26 | 0.25 | 0.21 | 0.18 |
| openai_mmlu_por_mcf:college_medicine:0 | 0.231 | 0.202 | 0.312 | 0.243 | 0.197 | 0.208 | 0.214 |
| openai_mmlu_por_mcf:college_physics:0 | 0.225 | 0.216 | 0.363 | 0.314 | 0.206 | 0.216 | 0.225 |
| openai_mmlu_por_mcf:computer_security:0 | 0.27 | 0.28 | 0.22 | 0.21 | 0.28 | 0.28 | 0.27 |
| openai_mmlu_por_mcf:conceptual_physics:0 | 0.264 | 0.272 | 0.187 | 0.179 | 0.247 | 0.264 | 0.281 |
| openai_mmlu_por_mcf:econometrics:0 | 0.193 | 0.228 | 0.219 | 0.184 | 0.219 | 0.237 | 0.246 |
| openai_mmlu_por_mcf:electrical_engineering:0 | 0.262 | 0.228 | 0.276 | 0.186 | 0.262 | 0.241 | 0.255 |
| openai_mmlu_por_mcf:elementary_mathematics:0 | 0.233 | 0.198 | 0.238 | 0.251 | 0.225 | 0.209 | 0.228 |
| openai_mmlu_por_mcf:formal_logic:0 | 0.222 | 0.27 | 0.302 | 0.254 | 0.294 | 0.286 | 0.254 |
| openai_mmlu_por_mcf:global_facts:0 | 0.26 | 0.19 | 0.34 | 0.21 | 0.24 | 0.18 | 0.18 |
| openai_mmlu_por_mcf:high_school_biology:0 | 0.245 | 0.187 | 0.252 | 0.3 | 0.187 | 0.177 | 0.174 |
| openai_mmlu_por_mcf:high_school_chemistry:0 | 0.202 | 0.153 | 0.286 | 0.256 | 0.177 | 0.153 | 0.212 |
| openai_mmlu_por_mcf:high_school_computer_science:0 | 0.35 | 0.25 | 0.3 | 0.22 | 0.23 | 0.25 | 0.25 |
| openai_mmlu_por_mcf:high_school_european_history:0 | 0.212 | 0.218 | 0.206 | 0.23 | 0.23 | 0.218 | 0.218 |
| openai_mmlu_por_mcf:high_school_geography:0 | 0.348 | 0.182 | 0.273 | 0.303 | 0.202 | 0.177 | 0.187 |
| openai_mmlu_por_mcf:high_school_government_and_politics:0 | 0.275 | 0.197 | 0.301 | 0.306 | 0.228 | 0.197 | 0.197 |
| openai_mmlu_por_mcf:high_school_macroeconomics:0 | 0.279 | 0.21 | 0.226 | 0.287 | 0.231 | 0.203 | 0.205 |
| openai_mmlu_por_mcf:high_school_mathematics:0 | 0.248 | 0.226 | 0.281 | 0.281 | 0.233 | 0.211 | 0.222 |
| openai_mmlu_por_mcf:high_school_microeconomics:0 | 0.252 | 0.206 | 0.248 | 0.324 | 0.223 | 0.21 | 0.206 |
| openai_mmlu_por_mcf:high_school_physics:0 | 0.219 | 0.205 | 0.258 | 0.278 | 0.199 | 0.199 | 0.219 |
| openai_mmlu_por_mcf:high_school_psychology:0 | 0.246 | 0.196 | 0.261 | 0.239 | 0.202 | 0.193 | 0.189 |
| openai_mmlu_por_mcf:high_school_statistics:0 | 0.199 | 0.181 | 0.269 | 0.426 | 0.157 | 0.153 | 0.144 |
| openai_mmlu_por_mcf:high_school_us_history:0 | 0.275 | 0.24 | 0.245 | 0.265 | 0.25 | 0.25 | 0.25 |
| openai_mmlu_por_mcf:high_school_world_history:0 | 0.224 | 0.266 | 0.291 | 0.245 | 0.266 | 0.27 | 0.262 |
| openai_mmlu_por_mcf:human_aging:0 | 0.256 | 0.291 | 0.26 | 0.314 | 0.287 | 0.314 | 0.318 |
| openai_mmlu_por_mcf:human_sexuality:0 | 0.29 | 0.237 | 0.206 | 0.198 | 0.26 | 0.26 | 0.26 |
| openai_mmlu_por_mcf:international_law:0 | 0.198 | 0.24 | 0.14 | 0.215 | 0.207 | 0.24 | 0.24 |
| openai_mmlu_por_mcf:jurisprudence:0 | 0.231 | 0.25 | 0.176 | 0.259 | 0.269 | 0.259 | 0.259 |
| openai_mmlu_por_mcf:logical_fallacies:0 | 0.233 | 0.215 | 0.294 | 0.233 | 0.215 | 0.221 | 0.221 |
| openai_mmlu_por_mcf:machine_learning:0 | 0.259 | 0.277 | 0.214 | 0.205 | 0.33 | 0.312 | 0.33 |
| openai_mmlu_por_mcf:management:0 | 0.233 | 0.184 | 0.252 | 0.204 | 0.204 | 0.175 | 0.175 |
| openai_mmlu_por_mcf:marketing:0 | 0.197 | 0.286 | 0.248 | 0.244 | 0.269 | 0.291 | 0.261 |
| openai_mmlu_por_mcf:medical_genetics:0 | 0.23 | 0.33 | 0.32 | 0.29 | 0.35 | 0.3 | 0.28 |
| openai_mmlu_por_mcf:miscellaneous:0 | 0.258 | 0.24 | 0.268 | 0.208 | 0.262 | 0.238 | 0.25 |
| openai_mmlu_por_mcf:moral_disputes:0 | 0.269 | 0.254 | 0.266 | 0.223 | 0.28 | 0.249 | 0.246 |
| openai_mmlu_por_mcf:moral_scenarios:0 | 0.273 | 0.246 | 0.247 | 0.258 | 0.247 | 0.238 | 0.238 |
| openai_mmlu_por_mcf:nutrition:0 | 0.248 | 0.225 | 0.261 | 0.242 | 0.235 | 0.225 | 0.219 |
| openai_mmlu_por_mcf:philosophy:0 | 0.238 | 0.19 | 0.267 | 0.193 | 0.203 | 0.186 | 0.196 |
| openai_mmlu_por_mcf:prehistory:0 | 0.231 | 0.213 | 0.235 | 0.25 | 0.244 | 0.216 | 0.222 |
| openai_mmlu_por_mcf:professional_accounting:0 | 0.238 | 0.23 | 0.27 | 0.284 | 0.28 | 0.234 | 0.234 |
| openai_mmlu_por_mcf:professional_law:0 | 0.231 | 0.246 | 0.254 | 0.246 | 0.252 | 0.246 | 0.247 |
| openai_mmlu_por_mcf:professional_medicine:0 | 0.228 | 0.188 | 0.272 | 0.25 | 0.199 | 0.184 | 0.18 |
| openai_mmlu_por_mcf:professional_psychology:0 | 0.235 | 0.24 | 0.245 | 0.217 | 0.255 | 0.25 | 0.248 |
| openai_mmlu_por_mcf:public_relations:0 | 0.227 | 0.218 | 0.209 | 0.282 | 0.218 | 0.218 | 0.227 |
| openai_mmlu_por_mcf:security_studies:0 | 0.241 | 0.192 | 0.249 | 0.31 | 0.196 | 0.188 | 0.188 |
| openai_mmlu_por_mcf:sociology:0 | 0.244 | 0.264 | 0.274 | 0.269 | 0.244 | 0.244 | 0.244 |
| openai_mmlu_por_mcf:us_foreign_policy:0 | 0.24 | 0.28 | 0.26 | 0.27 | 0.33 | 0.28 | 0.28 |
| openai_mmlu_por_mcf:virology:0 | 0.259 | 0.283 | 0.223 | 0.211 | 0.259 | 0.283 | 0.289 |
| openai_mmlu_por_mcf:world_religions:0 | 0.281 | 0.31 | 0.298 | 0.316 | 0.287 | 0.322 | 0.333 |
| exams_por_mcf:biology:0 | 0.239 | 0.227 | 0.256 | 0.295 | 0.193 | 0.233 | 0.233 |
| exams_por_mcf:economics:0 | 0.18 | 0.279 | 0.252 | 0.207 | 0.261 | 0.279 | 0.279 |
| exams_por_mcf:geology:0 | 0.276 | 0.241 | 0.224 | 0.233 | 0.233 | 0.241 | 0.241 |
| exams_por_mcf:philosophy:0 | 0.2 | 0.133 | 0.267 | 0.167 | 0.167 | 0.133 | 0.133 |
| oab_exams_por_mcf:0 | 0.245 | 0.23 | 0.243 | 0.247 | 0.231 | 0.23 | 0.229 |
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]
