
Biatron Model Card


Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Developed by: [More Information Needed]
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [More Information Needed]
  • Model type: [More Information Needed]
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]
  • Finetuned from model [optional]: [More Information Needed]

Model Sources [optional]

  • Repository: [More Information Needed]
  • Paper [optional]: [More Information Needed]
  • Demo [optional]: [More Information Needed]

Uses

First, install the custom modeling code:

pip install git+https://github.com/Fazzioni/Biatron.git

Then register the Biatron architecture with the Auto classes and run generation:

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from biatron import BiatronConfig, BiatronForCausalLM

# Register the custom architecture so the Auto classes can resolve it.
AutoConfig.register("Biatron", BiatronConfig)
AutoModelForCausalLM.register(BiatronConfig, BiatronForCausalLM)

tokenizer = AutoTokenizer.from_pretrained("Fazzioni/biatron-345m")
model = AutoModelForCausalLM.from_pretrained(
    "Fazzioni/biatron-345m",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = ' O Brasil é '
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

Training Details

Training Data

The training data used for this model includes a mixture of datasets aimed at enhancing the model's performance in Portuguese language understanding and mathematical reasoning. The datasets used are as follows:

| Batch Proportion | Dataset | Number of Tokens |
|---|---|---|
| 60% | *TucanoBR/GigaVerbo | 135B |
| 30% | cnmoro/reasoning-v1-20m-portuguese | 45B |
| 5% | HuggingFaceTB/finemath | 13.2B |
| 5% | Infiwebmath-4plus | 11.8B |

Note: For the TucanoBR/GigaVerbo dataset, only the highest quality split was utilized for training.

Batch Proportion indicates the proportion of each dataset in the training batches.
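As an illustration of what these batch proportions mean, the sketch below samples a source dataset for each sequence in a batch according to the mixture weights. This is a hypothetical example for intuition only; the actual training used Megatron-LM's own data-blending configuration, not this code.

```python
import random

# Mixture weights from the training-data table above.
MIXTURE = {
    "TucanoBR/GigaVerbo": 0.60,
    "cnmoro/reasoning-v1-20m-portuguese": 0.30,
    "HuggingFaceTB/finemath": 0.05,
    "Infiwebmath-4plus": 0.05,
}

def sample_sources(batch_size: int, rng: random.Random) -> list:
    """Pick a source dataset for each sequence in a batch, weighted by proportion."""
    names = list(MIXTURE)
    weights = list(MIXTURE.values())
    return rng.choices(names, weights=weights, k=batch_size)

# With a batch size of 512, roughly 60% of sequences come from GigaVerbo.
batch = sample_sources(512, random.Random(0))
```

In expectation, each batch of 512 sequences contains about 307 GigaVerbo sequences, 154 reasoning sequences, and 26 from each math corpus.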

Training Procedure

Training Hyperparameters

  • Batch size: 512
  • Context length: 4096 tokens
  • Precision: bf16
  • Framework: Megatron-LM
  • Total updates: 152,000 (more than one epoch)

All training hyperparameters are available in the training script on GitHub.

The Wandb report is also available here
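The hyperparameters above make the "more than 1 epoch" claim easy to check. Assuming every update processes a full batch of full-length sequences (an upper bound), the total number of tokens seen is:

```python
# Back-of-the-envelope token count from the hyperparameters above.
BATCH_SIZE = 512          # sequences per update
CONTEXT_LENGTH = 4096     # tokens per sequence
TOTAL_UPDATES = 152_000

tokens_per_update = BATCH_SIZE * CONTEXT_LENGTH        # 2,097,152 tokens
total_tokens = tokens_per_update * TOTAL_UPDATES       # ~318.8B tokens

# The dataset mixture sums to roughly 135 + 45 + 13.2 + 11.8 = 205B tokens,
# so ~318.8B seen tokens corresponds to about 1.55 epochs.
corpus_tokens = (135 + 45 + 13.2 + 11.8) * 1e9
epochs = total_tokens / corpus_tokens
print(f"{total_tokens / 1e9:.1f}B tokens seen, ~{epochs:.2f} epochs")
```

This is only an estimate: sequence packing, document boundaries, and the actual blending schedule can change the effective count.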

Evaluation

The evaluation was performed using LightEval, and the code is available on GitHub.

General Results

| Model | oab | enem | openai_mmlu | exams | all |
|---|---|---|---|---|---|
| google/gemma-3-1B-pt | 0.243 | 0.199 | 0.262 | 0.25 | 0.257 |
| TucanoBR/Tucano-630m | 0.247 | 0.197 | 0.254 | 0.226 | 0.249 |
| Fazzioni/biatron-345m | 0.245 | 0.216 | 0.248 | 0.224 | 0.245 |
| HuggingFaceTB/SmolLM2-360M | 0.231 | 0.201 | 0.239 | 0.213 | 0.235 |
| TucanoBR/Tucano-160m | 0.229 | 0.209 | 0.234 | 0.222 | 0.231 |
| Qwen/Qwen3-0.6B-Base | 0.23 | 0.207 | 0.231 | 0.222 | 0.229 |
| google/gemma-3-270m | 0.23 | 0.203 | 0.231 | 0.22 | 0.229 |

Detailed Results

| Task | Fazzioni/biatron-345m | google/gemma-3-270m | google/gemma-3-1B-pt | TucanoBR/Tucano-630m | HuggingFaceTB/SmolLM2-360M | Qwen/Qwen3-0.6B-Base | TucanoBR/Tucano-160m |
|---|---|---|---|---|---|---|---|
| all | 0.245 | 0.229 | 0.257 | 0.249 | 0.235 | 0.229 | 0.231 |
| enem_por_mcf:_average:0 | 0.216 | 0.203 | 0.199 | 0.197 | 0.201 | 0.207 | 0.209 |
| openai_mmlu_por_mcf:_average:0 | 0.248 | 0.231 | 0.262 | 0.254 | 0.239 | 0.231 | 0.234 |
| exams_por_mcf:_average:0 | 0.224 | 0.22 | 0.25 | 0.226 | 0.213 | 0.222 | 0.222 |
| m3exams_por_mcf:0 | 0.225 | 0.202 | 0.197 | 0.201 | 0.198 | 0.198 | 0.191 |
| enem_por_mcf:2022:0 | 0.212 | 0.196 | 0.207 | 0.179 | 0.196 | 0.207 | 0.212 |
| enem_por_mcf:2023:0 | 0.246 | 0.246 | 0.173 | 0.207 | 0.235 | 0.24 | 0.24 |
| enem_por_mcf:2024:0 | 0.19 | 0.168 | 0.218 | 0.207 | 0.173 | 0.173 | 0.173 |
| openai_mmlu_por_mcf:abstract_algebra:0 | 0.31 | 0.22 | 0.31 | 0.2 | 0.24 | 0.22 | 0.22 |
| openai_mmlu_por_mcf:anatomy:0 | 0.237 | 0.185 | 0.289 | 0.244 | 0.222 | 0.185 | 0.237 |
| openai_mmlu_por_mcf:astronomy:0 | 0.224 | 0.191 | 0.329 | 0.23 | 0.197 | 0.178 | 0.184 |
| openai_mmlu_por_mcf:business_ethics:0 | 0.23 | 0.3 | 0.19 | 0.25 | 0.22 | 0.3 | 0.31 |
| openai_mmlu_por_mcf:clinical_knowledge:0 | 0.245 | 0.211 | 0.242 | 0.264 | 0.204 | 0.215 | 0.223 |
| openai_mmlu_por_mcf:college_biology:0 | 0.264 | 0.243 | 0.243 | 0.243 | 0.236 | 0.257 | 0.271 |
| openai_mmlu_por_mcf:college_chemistry:0 | 0.28 | 0.21 | 0.36 | 0.33 | 0.22 | 0.2 | 0.18 |
| openai_mmlu_por_mcf:college_computer_science:0 | 0.32 | 0.27 | 0.32 | 0.28 | 0.25 | 0.26 | 0.25 |
| openai_mmlu_por_mcf:college_mathematics:0 | 0.24 | 0.21 | 0.27 | 0.26 | 0.25 | 0.21 | 0.18 |
| openai_mmlu_por_mcf:college_medicine:0 | 0.231 | 0.202 | 0.312 | 0.243 | 0.197 | 0.208 | 0.214 |
| openai_mmlu_por_mcf:college_physics:0 | 0.225 | 0.216 | 0.363 | 0.314 | 0.206 | 0.216 | 0.225 |
| openai_mmlu_por_mcf:computer_security:0 | 0.27 | 0.28 | 0.22 | 0.21 | 0.28 | 0.28 | 0.27 |
| openai_mmlu_por_mcf:conceptual_physics:0 | 0.264 | 0.272 | 0.187 | 0.179 | 0.247 | 0.264 | 0.281 |
| openai_mmlu_por_mcf:econometrics:0 | 0.193 | 0.228 | 0.219 | 0.184 | 0.219 | 0.237 | 0.246 |
| openai_mmlu_por_mcf:electrical_engineering:0 | 0.262 | 0.228 | 0.276 | 0.186 | 0.262 | 0.241 | 0.255 |
| openai_mmlu_por_mcf:elementary_mathematics:0 | 0.233 | 0.198 | 0.238 | 0.251 | 0.225 | 0.209 | 0.228 |
| openai_mmlu_por_mcf:formal_logic:0 | 0.222 | 0.27 | 0.302 | 0.254 | 0.294 | 0.286 | 0.254 |
| openai_mmlu_por_mcf:global_facts:0 | 0.26 | 0.19 | 0.34 | 0.21 | 0.24 | 0.18 | 0.18 |
| openai_mmlu_por_mcf:high_school_biology:0 | 0.245 | 0.187 | 0.252 | 0.3 | 0.187 | 0.177 | 0.174 |
| openai_mmlu_por_mcf:high_school_chemistry:0 | 0.202 | 0.153 | 0.286 | 0.256 | 0.177 | 0.153 | 0.212 |
| openai_mmlu_por_mcf:high_school_computer_science:0 | 0.35 | 0.25 | 0.3 | 0.22 | 0.23 | 0.25 | 0.25 |
| openai_mmlu_por_mcf:high_school_european_history:0 | 0.212 | 0.218 | 0.206 | 0.23 | 0.23 | 0.218 | 0.218 |
| openai_mmlu_por_mcf:high_school_geography:0 | 0.348 | 0.182 | 0.273 | 0.303 | 0.202 | 0.177 | 0.187 |
| openai_mmlu_por_mcf:high_school_government_and_politics:0 | 0.275 | 0.197 | 0.301 | 0.306 | 0.228 | 0.197 | 0.197 |
| openai_mmlu_por_mcf:high_school_macroeconomics:0 | 0.279 | 0.21 | 0.226 | 0.287 | 0.231 | 0.203 | 0.205 |
| openai_mmlu_por_mcf:high_school_mathematics:0 | 0.248 | 0.226 | 0.281 | 0.281 | 0.233 | 0.211 | 0.222 |
| openai_mmlu_por_mcf:high_school_microeconomics:0 | 0.252 | 0.206 | 0.248 | 0.324 | 0.223 | 0.21 | 0.206 |
| openai_mmlu_por_mcf:high_school_physics:0 | 0.219 | 0.205 | 0.258 | 0.278 | 0.199 | 0.199 | 0.219 |
| openai_mmlu_por_mcf:high_school_psychology:0 | 0.246 | 0.196 | 0.261 | 0.239 | 0.202 | 0.193 | 0.189 |
| openai_mmlu_por_mcf:high_school_statistics:0 | 0.199 | 0.181 | 0.269 | 0.426 | 0.157 | 0.153 | 0.144 |
| openai_mmlu_por_mcf:high_school_us_history:0 | 0.275 | 0.24 | 0.245 | 0.265 | 0.25 | 0.25 | 0.25 |
| openai_mmlu_por_mcf:high_school_world_history:0 | 0.224 | 0.266 | 0.291 | 0.245 | 0.266 | 0.27 | 0.262 |
| openai_mmlu_por_mcf:human_aging:0 | 0.256 | 0.291 | 0.26 | 0.314 | 0.287 | 0.314 | 0.318 |
| openai_mmlu_por_mcf:human_sexuality:0 | 0.29 | 0.237 | 0.206 | 0.198 | 0.26 | 0.26 | 0.26 |
| openai_mmlu_por_mcf:international_law:0 | 0.198 | 0.24 | 0.14 | 0.215 | 0.207 | 0.24 | 0.24 |
| openai_mmlu_por_mcf:jurisprudence:0 | 0.231 | 0.25 | 0.176 | 0.259 | 0.269 | 0.259 | 0.259 |
| openai_mmlu_por_mcf:logical_fallacies:0 | 0.233 | 0.215 | 0.294 | 0.233 | 0.215 | 0.221 | 0.221 |
| openai_mmlu_por_mcf:machine_learning:0 | 0.259 | 0.277 | 0.214 | 0.205 | 0.33 | 0.312 | 0.33 |
| openai_mmlu_por_mcf:management:0 | 0.233 | 0.184 | 0.252 | 0.204 | 0.204 | 0.175 | 0.175 |
| openai_mmlu_por_mcf:marketing:0 | 0.197 | 0.286 | 0.248 | 0.244 | 0.269 | 0.291 | 0.261 |
| openai_mmlu_por_mcf:medical_genetics:0 | 0.23 | 0.33 | 0.32 | 0.29 | 0.35 | 0.3 | 0.28 |
| openai_mmlu_por_mcf:miscellaneous:0 | 0.258 | 0.24 | 0.268 | 0.208 | 0.262 | 0.238 | 0.25 |
| openai_mmlu_por_mcf:moral_disputes:0 | 0.269 | 0.254 | 0.266 | 0.223 | 0.28 | 0.249 | 0.246 |
| openai_mmlu_por_mcf:moral_scenarios:0 | 0.273 | 0.246 | 0.247 | 0.258 | 0.247 | 0.238 | 0.238 |
| openai_mmlu_por_mcf:nutrition:0 | 0.248 | 0.225 | 0.261 | 0.242 | 0.235 | 0.225 | 0.219 |
| openai_mmlu_por_mcf:philosophy:0 | 0.238 | 0.19 | 0.267 | 0.193 | 0.203 | 0.186 | 0.196 |
| openai_mmlu_por_mcf:prehistory:0 | 0.231 | 0.213 | 0.235 | 0.25 | 0.244 | 0.216 | 0.222 |
| openai_mmlu_por_mcf:professional_accounting:0 | 0.238 | 0.23 | 0.27 | 0.284 | 0.28 | 0.234 | 0.234 |
| openai_mmlu_por_mcf:professional_law:0 | 0.231 | 0.246 | 0.254 | 0.246 | 0.252 | 0.246 | 0.247 |
| openai_mmlu_por_mcf:professional_medicine:0 | 0.228 | 0.188 | 0.272 | 0.25 | 0.199 | 0.184 | 0.18 |
| openai_mmlu_por_mcf:professional_psychology:0 | 0.235 | 0.24 | 0.245 | 0.217 | 0.255 | 0.25 | 0.248 |
| openai_mmlu_por_mcf:public_relations:0 | 0.227 | 0.218 | 0.209 | 0.282 | 0.218 | 0.218 | 0.227 |
| openai_mmlu_por_mcf:security_studies:0 | 0.241 | 0.192 | 0.249 | 0.31 | 0.196 | 0.188 | 0.188 |
| openai_mmlu_por_mcf:sociology:0 | 0.244 | 0.264 | 0.274 | 0.269 | 0.244 | 0.244 | 0.244 |
| openai_mmlu_por_mcf:us_foreign_policy:0 | 0.24 | 0.28 | 0.26 | 0.27 | 0.33 | 0.28 | 0.28 |
| openai_mmlu_por_mcf:virology:0 | 0.259 | 0.283 | 0.223 | 0.211 | 0.259 | 0.283 | 0.289 |
| openai_mmlu_por_mcf:world_religions:0 | 0.281 | 0.31 | 0.298 | 0.316 | 0.287 | 0.322 | 0.333 |
| exams_por_mcf:biology:0 | 0.239 | 0.227 | 0.256 | 0.295 | 0.193 | 0.233 | 0.233 |
| exams_por_mcf:economics:0 | 0.18 | 0.279 | 0.252 | 0.207 | 0.261 | 0.279 | 0.279 |
| exams_por_mcf:geology:0 | 0.276 | 0.241 | 0.224 | 0.233 | 0.233 | 0.241 | 0.241 |
| exams_por_mcf:philosophy:0 | 0.2 | 0.133 | 0.267 | 0.167 | 0.167 | 0.133 | 0.133 |
| oab_exams_por_mcf:0 | 0.245 | 0.23 | 0.243 | 0.247 | 0.231 | 0.23 | 0.229 |

Citation [optional]

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

[More Information Needed]

Model Card Contact

[More Information Needed]
