---
language:
- pt
license: apache-2.0
library_name: transformers
tags:
- text-generation-inference
datasets:
- Polygl0t/gigaverbo-v2
- Polygl0t/gigaverbo-v2-synth
- allenai/big-reasoning-traces
- HuggingFaceTB/smollm-corpus
- HuggingFaceTB/finemath
- HuggingFaceFW/fineweb-edu
- allenai/math-meta-reasoning-filtered
- nvidia/OpenScience
metrics:
- perplexity
pipeline_tag: text-generation
widget:
- text: "A floresta da Amazônia é conhecida por sua"
example_title: Exemplo
- text: "Uma das coisas que Portugal, Angola, Brasil e Moçambique tem em comum é o"
example_title: Exemplo
- text: "O Carnaval do Rio de Janeiro é"
example_title: Exemplo
inference:
parameters:
repetition_penalty: 1.2
temperature: 0.1
top_k: 50
top_p: 1.0
max_new_tokens: 150
co2_eq_emissions:
emissions: 332000
source: CodeCarbon
training_type: pre-training
geographical_location: Germany
hardware_used: NVIDIA A100-SXM4-80GB
model-index:
- name: Tucano2-0.6B-Base
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: ARC Challenge
type: Polygl0t/ARC-poly
split: test
args:
num_few_shot: 5
metrics:
- type: acc_norm
value: 37.01
name: Acc-norm
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: arc_challenge_poly_pt
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag
type: Polygl0t/HellaSwag-poly
split: validation
args:
num_few_shot: 5
metrics:
- type: acc_norm
value: 47.74
name: Acc-norm
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: hellaswag_poly_pt
- task:
type: text-generation
name: Text Generation
dataset:
name: Calame
type: Polygl0t/CALAME-PT
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 57.61
name: Acc
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: calame_pt
- task:
type: text-generation
name: Text Generation
dataset:
name: Lambada
type: Polygl0t/LAMBADA-poly
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 39.45
name: Acc
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: lambada_poly_pt
- task:
type: text-generation
name: Text Generation
dataset:
name: Global PIQA
type: mrlbenchmarks/global-piqa-nonparallel
split: test
args:
num_few_shot: 5
metrics:
- type: acc_norm
value: 79
name: Acc-norm
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: global_piqa_completions_por_latn_braz
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU
type: Polygl0t/MMLU-poly
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 27.18
name: Acc
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: mmlu_poly_pt
- task:
type: text-generation
name: Text Generation
dataset:
name: BELEBELE
type: facebook/belebele
split: test
args:
num_few_shot: 5
metrics:
- type: acc_norm
value: 26.22
name: Acc-norm
source:
url: https://github.com/Polygl0t/lm-evaluation-harness/tree/polyglot_harness_portuguese
name: belebele_poly_pt
- task:
type: text-generation
name: Text Generation
dataset:
name: BLUEX
type: eduagarcia-temp/BLUEX_without_images
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 21.14
name: Acc
source:
url: https://github.com/eduagarcia/lm-evaluation-harness-pt
name: bluex
- task:
type: text-generation
name: Text Generation
dataset:
name: ENEM Challenge
type: eduagarcia/enem_challenge
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 23.58
name: Acc
source:
url: https://github.com/eduagarcia/lm-evaluation-harness-pt
name: enem_challenge
- task:
type: text-generation
name: Text Generation
dataset:
name: OAB Exams
type: eduagarcia/oab_exams
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 23.28
name: Acc
source:
url: https://github.com/eduagarcia/lm-evaluation-harness-pt
name: oab_exams
---
# Tucano2-0.6B-Base
<img src="./logo.png" alt="An illustration of a Tucano bird showing vibrant colors like yellow, orange, blue, green, and black." height="200">
## Model Summary
**[Tucano2-0.6B-Base](https://huggingface.co/Polygl0t/Tucano2-0.6B-Base)** is a decoder-only Transformer natively pretrained in Portuguese and English. Tucano2 is part of the [Polygl0t](https://huggingface.co/Polygl0t) initiative, which aims to advance language models for low-resource languages.
The model was pretrained on approximately 400 billion tokens and achieves state-of-the-art performance across several benchmarks designed to evaluate Portuguese language models. **All data, source code, and recipes used to develop the Tucano2 series are open and fully reproducible.**
## Details
- **Architecture:** a Transformer-based model ([`llama`](https://huggingface.co/docs/transformers/main/en/model_doc/llama))
- **Size:** 670,127,616 parameters
- **Context length:** 4096 tokens
- **Dataset(s):**
- [Polygl0t/gigaverbo-v2](https://huggingface.co/datasets/Polygl0t/gigaverbo-v2)
- [Polygl0t/gigaverbo-v2-synth](https://huggingface.co/datasets/Polygl0t/gigaverbo-v2-synth)
- [allenai/big-reasoning-traces](https://huggingface.co/datasets/allenai/big-reasoning-traces)
- [HuggingFaceTB/smollm-corpus](https://huggingface.co/datasets/HuggingFaceTB/smollm-corpus)
- [HuggingFaceTB/finemath](https://huggingface.co/datasets/HuggingFaceTB/finemath)
- [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu)
- [allenai/math-meta-reasoning-filtered](https://huggingface.co/datasets/allenai/math-meta-reasoning-filtered)
- [nvidia/OpenScience](https://huggingface.co/datasets/nvidia/OpenScience)
- **Language(s):** Portuguese (primary) and English
- **Batch size:** 2,097,152 tokens
- **Number of steps:** 195,000
- **GPU:** 8 NVIDIA A100-SXM4-80GB
- **Training time:** ~213 hours
- **Emissions:** 332 kg CO₂eq (Germany)
- **Total energy consumption:** 872 kWh
The [source code](https://github.com/Polygl0t/llm-foundry) used to train this model is openly available. The full configuration used for training is available in the following config files:
- Stage 1 (warmup + stable): [config_stage_1.yaml](config_stage_1.yaml)
- Stage 2 (stable): [config_stage_2.yaml](config_stage_2.yaml)
- Stage 3 (1-sqrt decay): [config_stage_3.yaml](config_stage_3.yaml)
### Checkpoints
Checkpoints were saved every 5,000 steps, which equates to approximately 10 billion tokens. The main branch of this repository contains the final checkpoint saved at step 195,000. All other checkpoints are available as separate branches. To load a specific checkpoint, you can use the following code snippet:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "Polygl0t/Tucano2-0.6B-Base"
revision = "step-160000-end-of-stage-2" # Change this to the desired checkpoint branch
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
```
Alternatively, you can list all available revisions for the model via the following code snippet:
```python
from huggingface_hub import list_repo_refs
out = list_repo_refs("Polygl0t/Tucano2-0.6B-Base")
branches = [b.name for b in out.branches]
print(branches)
```
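The checkpoint interval stated above follows directly from the batch size listed in the Details section: at 2,097,152 tokens per step, 5,000 steps correspond to roughly 10.5 billion tokens, and the full 195,000-step run to roughly 409 billion tokens. A quick sanity check:

```python
# Tokens per optimizer step, i.e., the batch size in tokens (see Details above)
TOKENS_PER_STEP = 2_097_152

# Checkpoints are saved every 5,000 steps; the full run is 195,000 steps
tokens_per_checkpoint = TOKENS_PER_STEP * 5_000
total_tokens = TOKENS_PER_STEP * 195_000

print(f"{tokens_per_checkpoint / 1e9:.1f}B tokens per checkpoint")  # ~10.5B
print(f"{total_tokens / 1e9:.1f}B tokens total")                    # ~408.9B
```

This is consistent with the "approximately 400 billion tokens" figure in the model summary.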
<details>
<summary><b>Learning Curves</b></summary>
![Learning Curves](./.plots/learning_curve.png)
This plot illustrates the evolution of model performance (measured by loss) as a function of training time, measured in tokens seen during training.
</details>
<details>
<summary><b>Gradient Norms (L2)</b></summary>
![Gradient Norms](./.plots/gradient_norm.png)
This plot illustrates the evolution of gradient norms as a function of training time, measured in tokens seen during training.
</details>
## Intended Uses
The primary intended use of Tucano2-0.6B-Base is to serve as a foundation for research and development involving Portuguese language modeling. The checkpoints saved during training provide a controlled setting for comparative experiments, particularly on how performance on currently available benchmarks evolves over the course of pretraining. You may also fine-tune and adapt Tucano2-0.6B-Base for deployment, provided your use complies with the Apache 2.0 license. If you use Tucano2-0.6B-Base as a basis for a fine-tuned model, please conduct your own risk and bias assessment.
## Out-of-scope Use
- Tucano2-0.6B-Base is **not intended for deployment**. It is not an out-of-the-box product and should not be used for human-facing interactions.
- Tucano2-0.6B-Base is for **the Portuguese language only** and is unsuitable for text generation tasks in other languages.
- Tucano2-0.6B-Base has **not been fine-tuned** for downstream tasks.
## Basic usage
```python
from transformers import GenerationConfig, TextGenerationPipeline, AutoTokenizer, AutoModelForCausalLM
import torch
# Specify the model and tokenizer
model_id = "Polygl0t/Tucano2-0.6B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# Specify the generation parameters as you like
generation_config = GenerationConfig(
**{
"do_sample": True,
"max_new_tokens": 150,
"renormalize_logits": True,
"repetition_penalty": 1.2,
"temperature": 0.1,
"top_k": 50,
"top_p": 1.0,
"use_cache": True,
}
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
generator = TextGenerationPipeline(model=model, task="text-generation", tokenizer=tokenizer, device=device)
# Generate text
prompt = "# A floresta da Amazônia: um lugar de Magia\n\n"
completion = generator(prompt, generation_config=generation_config)
print(completion[0]['generated_text'])
```
## Limitations
Like almost all language models trained on large text datasets scraped from the web, Tucano2-0.6B-Base exhibits behavior that makes it unsuitable as an out-of-the-box solution for many real-world applications, especially those requiring factual, reliable, and nontoxic text generation. Tucano2-0.6B-Base is subject to the following:
- **Hallucinations:** Tucano2-0.6B-Base can produce content that can be mistaken for fact but is misleading or entirely false, i.e., hallucinations.
- **Biases and Toxicity:** Tucano2-0.6B-Base inherits the social and historical stereotypes from the data used to train it. Given these biases, the model can produce toxic content, i.e., harmful, offensive, or detrimental to individuals, groups, or communities.
- **Language Limitations:** Tucano2-0.6B-Base is primarily designed to interact with Portuguese. Other languages might challenge its comprehension, leading to potential misinterpretations or errors in response.
- **Repetition and Verbosity:** Tucano2-0.6B-Base may get stuck in repetition loops (especially if the repetition penalty is set too low during generation) or produce verbose responses unrelated to the given prompt.
Hence, even though Tucano2-0.6B-Base is released under a permissive license, we urge users to perform their own risk analysis before using it for real-world applications.
## Evaluations
The table below compares the Tucano2 series against other base models of similar size. We divide our evaluations into two sets:
- **Easy Set**: CALAME, GlobalPIQA, LAMBADA, ARC-Challenge, HellaSwag
- **Hard Set**: ENEM, BLUEX, OAB Exams, BELEBELE, MMLU
The NPM (Normalized Performance Metric) provides a balanced view of model performance across tasks, accounting for each task's inherent difficulty by normalizing its evaluation score relative to its random baseline.
| | Total Avg. | Easy Set (NPM) | Hard Set (NPM) |
| -------------------------- | ---------- | -------------- | -------------- |
| **Tucano2-qwen-3.7B-Base** | 59.21 | 57.41 | 61 |
| Qwen2.5-7B | 57.97 | 54.12 | 61.83 |
| Qwen3-4B-Base | 57.86 | 52.52 | 63.2 |
| SmolLM3-3B-Base | 50.25 | 54.06 | 46.44 |
| Qwen2.5-3B | 50.16 | 47.69 | 52.62 |
| **Tucano2-qwen-1.5B-Base** | 47.9 | 47.97 | 47.82 |
| Curio-edu-7b | 45.66 | 57.46 | 33.87 |
| Qwen3-1.7B-Base | 44.48 | 40.94 | 48.03 |
| Curio-7b | 42.79 | 58.97 | 26.6 |
| Llama-3.2-3B | 40.5 | 43.79 | 37.21 |
| granite-3.3-2b-base | 39.97 | 45.31 | 34.63 |
| **Tucano2-qwen-0.5B-Base** | 35.36 | 39.93 | 30.79 |
| Qwen3-0.6B-Base | 29.4 | 26.41 | 32.38 |
| Llama-2-7b-hf | 29.36 | 42.69 | 16.03 |
| **Tucano2-0.6B-Base** | 20.64 | 40.28 | 0.99 |
| Qwen2.5-0.5B | 19.89 | 18.7 | 21.09 |
| Curio-1.1b | 19.23 | 39.16 | -0.69 |
| Tucano-2b4 | 17.88 | 33.55 | 2.2 |
| Curio-edu-1b1 | 17.72 | 34.77 | 0.67 |
| Llama-3.2-1B | 16.57 | 28.32 | 4.83 |
| Tucano-1b1 | 15.44 | 29.12 | 1.76 |
| Tucano-630m | 14.9 | 26.99 | 2.8 |
| Carvalho_pt-gl-1.3B | 12.54 | 26.75 | -1.66 |
| TeenyTinyLlama-460m | 11.18 | 19.65 | 2.72 |
| Tucano-160m | 8.78 | 19.12 | -1.56 |
| TeenyTinyLlama-160m | 7.72 | 15.75 | -0.31 |
| GlorIA-1.3B | 5.93 | 27.27 | -15.42 |
<details>
<summary><b>Evaluation Suite</b></summary>
| **Benchmark** | **n-shot** | **Type** | **Baseline** | **Metric** |
| --------------- | ---------- | ------------------ | ------------ | ---------- |
| **Easy Set** | | | | |
| CALAME | 5-shot | Completion | 0 | `acc` |
| GlobalPIQA | 5-shot | Completion | 50 | `acc_norm` |
| LAMBADA | 5-shot | Completion | 0 | `acc` |
| ARC-Challenge | 5-shot | MC-Q&A | 25 | `acc_norm` |
| HellaSwag | 5-shot | Completion | 25 | `acc_norm` |
| **Hard Set** | | | | |
| ENEM | 3-shot | MC-Q&A | 20 | `acc` |
| BLUEX | 3-shot | MC-Q&A | 22.5 | `acc` |
| OAB Exams | 3-shot | MC-Q&A | 25 | `acc` |
| BELEBELE | 5-shot | MC-Q&A | 25 | `acc_norm` |
| MMLU | 5-shot | MC-Q&A | 25 | `acc` |
</details>
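As a concrete check, the easy-set NPM for Tucano2-0.6B-Base can be reproduced from its per-benchmark scores and the random baselines in the evaluation-suite table above, assuming NPM linearly rescales each score so that the random baseline maps to 0 and a perfect score to 100 (a common formulation, consistent with the numbers reported here):

```python
def npm(score: float, baseline: float) -> float:
    # Rescale so the random baseline maps to 0 and a perfect score (100) maps to 100.
    return (score - baseline) / (100 - baseline) * 100

# Easy-set scores for Tucano2-0.6B-Base, paired with each task's random baseline:
# CALAME, Global PIQA, LAMBADA, ARC-Challenge, HellaSwag
easy = [(57.61, 0), (79.0, 50), (39.45, 0), (37.01, 25), (47.74, 25)]

easy_npm = sum(npm(s, b) for s, b in easy) / len(easy)
print(round(easy_npm, 2))  # 40.28, matching the results table
```

The same normalization applied to the hard set yields the reported 0.99, since several hard-set scores fall at or below their random baselines.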
<details>
<summary><b>Individual Benchmarks</b></summary>
| | BLUEX | ENEM | OAB | ARC Challenge | BELEBELE | CALAME | Global PIQA | HellaSwag | LAMBADA | MMLU |
| -------------------------- | ----- | ----- | ----- | ------------- | -------- | ------ | ----------- | --------- | ------- | ----- |
| **Tucano2-qwen-3.7B-Base** | 66.2 | 77.54 | 58.45 | 57.78 | 83.67 | 61.08 | 83 | 65.32 | 62.53 | 65.4 |
| Qwen2.5-7B | 65.92 | 75.02 | 55.03 | 54.19 | 89.67 | 58.96 | 78 | 67.92 | 59.52 | 68.55 |
| Qwen3-4B-Base | 69.96 | 77.61 | 55.58 | 54.53 | 87.89 | 57.95 | 77 | 63.19 | 60.37 | 68.59 |
| SmolLM3-3B-Base | 54.52 | 61.37 | 45.51 | 51.37 | 77.67 | 59.15 | 81 | 65.57 | 59.89 | 56.19 |
| Qwen2.5-3B | 58.28 | 67.32 | 50.34 | 45.21 | 83.22 | 58.38 | 75 | 59.44 | 57.17 | 59.79 |
| **Tucano2-qwen-1.5B-Base** | 55.91 | 68.72 | 48.29 | 48.21 | 74 | 59.06 | 77 | 56.25 | 54.2 | 54.04 |
| Curio-edu-7b | 47.15 | 58.64 | 43.78 | 50.94 | 53 | 60.79 | 86 | 66.48 | 64.62 | 45.14 |
| Qwen3-1.7B-Base | 57.16 | 65.22 | 45.79 | 47.18 | 77.89 | 53.56 | 67 | 52.55 | 50.81 | 55.49 |
| Curio-7b | 43.39 | 50.59 | 39.68 | 48.03 | 45.33 | 63.44 | 89 | 67.58 | 65.94 | 40.83 |
| Llama-3.2-3B | 50.35 | 53.04 | 39.45 | 41.11 | 68.89 | 54.48 | 69 | 59.14 | 59.48 | 48.28 |
| granite-3.3-2b-base | 45.34 | 54.02 | 39.54 | 41.37 | 65.67 | 58.77 | 70 | 60.81 | 58.22 | 45.63 |
| **Tucano2-qwen-0.5B-Base** | 46.87 | 55.14 | 40.36 | 37.44 | 53.89 | 58.67 | 74 | 48.43 | 45.14 | 39.68 |
| Qwen3-0.6B-Base | 42.98 | 49.48 | 40.46 | 36.92 | 65 | 45.95 | 54 | 40.33 | 41.78 | 43.54 |
| Llama-2-7b-hf | 31.29 | 31.77 | 35.49 | 42.14 | 41.44 | 54.53 | 67 | 56.76 | 59.73 | 38.64 |
| **Tucano2-0.6B-Base** | 21.14 | 23.58 | 23.28 | 37.01 | 26.22 | 57.61 | 79 | 47.74 | 39.45 | 27.18 |
| Qwen2.5-0.5B | 32.55 | 38.91 | 35.9 | 28.46 | 49.56 | 44.89 | 44 | 37.7 | 39.08 | 41.17 |
| Curio-1.1b | 21.56 | 21.06 | 23.1 | 30.43 | 22.89 | 59.25 | 75 | 49.45 | 46.69 | 26.35 |
| Tucano-2b4 | 25.45 | 21.62 | 26.74 | 30.43 | 25.89 | 50.34 | 73 | 48.85 | 32.39 | 26.24 |
| Curio-edu-1b1 | 23.5 | 19.87 | 25.01 | 32.22 | 26.22 | 54.91 | 69 | 46.3 | 42.93 | 25.43 |
| Llama-3.2-1B | 24.06 | 23.93 | 26.06 | 31.71 | 33.33 | 50 | 55 | 45.27 | 45.6 | 28.51 |
| Tucano-1b1 | 25.45 | 21.55 | 26.38 | 30.09 | 25.67 | 48.94 | 68 | 44.1 | 28.43 | 25.26 |
| Tucano-630m | 26.7 | 21.69 | 26.92 | 28.72 | 27.33 | 47.3 | 68 | 40.37 | 26.2 | 25.6 |
| Carvalho_pt-gl-1.3B | 19.33 | 18.12 | 22.32 | 27.01 | 26.44 | 53.42 | 63 | 38.53 | 33.59 | 24.82 |
| TeenyTinyLlama-460m | 25.87 | 20.15 | 27.02 | 27.35 | 28.11 | 42.49 | 59 | 34.81 | 21.56 | 26.65 |
| Tucano-160m | 24.76 | 20.57 | 17.22 | 25.56 | 23.44 | 43.59 | 59 | 33.73 | 21.64 | 25.77 |
| TeenyTinyLlama-160m | 22.53 | 18.89 | 22.32 | 24.02 | 26.78 | 39.79 | 58 | 29.89 | 17.74 | 25.74 |
| GlorIA-1.3B | 4.31 | 2.52 | 4.69 | 26.41 | 22.78 | 54.67 | 64 | 36.35 | 36.68 | 23.69 |
</details>
<details>
<summary><b>NPM Evolution During Pretraining</b></summary>
Below, we display the performance of Tucano2-0.6B-Base across all benchmarks in our evaluation suite, aggregated using an NPM (Normalized Performance Metric) score. Tucano2-0.6B-Base is compared against two baseline models: Qwen2.5-0.5B and Qwen3-0.6B-Base, which are state-of-the-art multilingual models.
All individual benchmark scores and their evolution across training time can be found in the [.plots](https://huggingface.co/Polygl0t/Tucano2-0.6B-Base/tree/main/.plots/) folder.
**Easy Set: CALAME, GlobalPIQA, LAMBADA, ARC-Challenge, HellaSwag**
![Benchmarks Easy](./.plots/npm_easy.png)
**Hard Set: ENEM, BLUEX, OAB Exams, BELEBELE, MMLU**
![Benchmarks Hard](./.plots/npm_hard.png)
</details>
## Performance and Compute
Below, we compare the performance of Tucano2-0.6B-Base with Curio-edu-1b1 (continually pretrained) and Tucano-1b1 (pretrained), two baselines in the 1B-parameter range. All other plots can be found in the [.plots](https://huggingface.co/Polygl0t/Tucano2-0.6B-Base/tree/main/.plots/) folder.
**Tucano2-0.6B-Base vs Curio-edu-1b1 vs Tucano-1b1**
![Performance Comparison](./.plots/model_comparison.png)
This plot compares the compute requirements (measured as C = 6 \* N \* D, where N is the number of parameters and D is the number of tokens processed) against the performance of each model (measured by the NPM score).
![NPM vs Compute](./.plots/performance_vs_compute.png)
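The C = 6 · N · D estimate used in the plot above can be checked directly against this model's training run. A minimal sketch (parameter and token counts taken from the Details section; the 6·N·D approximation counts forward plus backward FLOPs per token):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    # Standard approximation: ~6 FLOPs per parameter per token seen in training.
    return 6 * n_params * n_tokens

# Tucano2-0.6B-Base: ~0.6B parameters, ~408B pretraining tokens
c = training_flops(0.6e9, 408e9)
print(f"{c:.2e}")  # 1.47e+21, matching the compute table
```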
<details>
<summary><b>Performance and Compute Details</b></summary>
| | Parameters (B) | Pretraining Tokens (B) | Continual Pretraining Tokens (B) | Total Tokens (B) | Pretraining Compute (FLOPs) | Continual Pretraining Compute (FLOPs) | Total Compute (FLOPs) | NPM Score |
|----------------------------|----------------|------------------------|----------------------------------|------------------|-----------------------------|---------------------------------------|-----------------------|-----------|
| **Tucano2-qwen-3.7B-Base** | 3.7 | 36000 | 50 | 36050 | 8.64e+23 | 1.11e+21 | 8.65e+23 | 59.2 |
| Qwen2.5-7B | 7 | 18000 | - | 18000 | 7.56e+23 | - | 7.56e+23 | 57.97 |
| Qwen3-4B-Base | 4 | 36000 | - | 36000 | 8.64e+23 | - | 8.64e+23 | 57.86 |
| SmolLM3-3B-Base | 3 | 11200 | - | 11200 | 2.02e+23 | - | 2.02e+23 | 50.25 |
| Qwen2.5-3B | 3 | 18000 | - | 18000 | 3.24e+23 | - | 3.24e+23 | 50.15 |
| **Tucano2-qwen-1.5B-Base** | 1.5 | 36000 | 100 | 36100 | 3.67e+23 | 9e+20 | 3.68e+23 | 47.89 |
| Curio-edu-7b | 7 | 2000 | 20 | 2020 | 8.4e+22 | 8.4e+20 | 8.48e+22 | 45.66 |
| Qwen3-1.7B-Base | 1.7 | 36000 | - | 36000 | 3.67e+23 | - | 3.67e+23 | 44.48 |
| Curio-7b | 7 | 2000 | 150 | 2150 | 8.4e+22 | 6.3e+21 | 9.03e+22 | 42.78 |
| Llama-3.2-3B | 3 | 9000 | - | 9000 | 1.62e+23 | - | 1.62e+23 | 40.5 |
| granite-3.3-2b-base | 2 | 12000 | - | 12000 | 1.44e+23 | - | 1.44e+23 | 39.96 |
| **Tucano2-qwen-0.5B-Base** | 0.5 | 36000 | 50 | 36050 | 1.3e+23 | 1.5e+20 | 1.3e+23 | 35.35 |
| Qwen3-0.6B-Base | 0.6 | 36000 | - | 36000 | 1.3e+23 | - | 1.3e+23 | 29.39 |
| Llama-2-7b-hf | 7 | 2000 | - | 2000 | 8.4e+22 | - | 8.4e+22 | 29.36 |
| **Tucano2-0.6B-Base** | 0.6 | 408 | - | 408 | 1.47e+21 | - | 1.47e+21 | 20.63 |
| Qwen2.5-0.5B | 0.5 | 18000 | - | 18000 | 5.4e+22 | - | 5.4e+22 | 19.89 |
| Curio-1.1b | 1.1 | 1000 | 150 | 1150 | 6.6e+21 | 9.9e+20 | 7.59e+21 | 19.23 |
| Tucano-2b4 | 2.4 | 515 | - | 515 | 7.42e+21 | - | 7.42e+21 | 17.87 |
| Curio-edu-1b1 | 1.1 | 1000 | 20 | 1020 | 6.6e+21 | 1.32e+20 | 6.73e+21 | 17.72 |
| Llama-3.2-1B | 1 | 9000 | - | 9000 | 5.4e+22 | - | 5.4e+22 | 16.57 |
| Tucano-1b1 | 1.1 | 250 | - | 250 | 1.65e+21 | - | 1.65e+21 | 15.44 |
| Tucano-630m | 0.63 | 211 | - | 211 | 7.98e+20 | - | 7.98e+20 | 14.89 |
| Carvalho_pt-gl-1.3B | 1.3 | 26 | 5 | 31 | 2.03e+20 | 3.9e+19 | 2.42e+20 | 12.54 |
| TeenyTinyLlama-460m | 0.46 | 6.2 | - | 6.2 | 1.71e+19 | - | 1.71e+19 | 11.18 |
| Tucano-160m | 0.16 | 169 | - | 169 | 1.62e+20 | - | 1.62e+20 | 8.78 |
| TeenyTinyLlama-160m | 0.16 | 6.2 | - | 6.2 | 5.95e+18 | - | 5.95e+18 | 7.71 |
| GlorIA-1.3B | 1.3 | 35 | - | 35 | 2.73e+20 | - | 2.73e+20 | 5.92 |
</details>
## Cite as 🤗
```latex
@misc{correa2026tucano2cool,
title={{Tucano 2 Cool: Better Open Source LLMs for Portuguese}},
author={Nicholas Kluge Corr{\^e}a and Aniket Sen and Shiza Fatimah and Sophia Falk and Lennard Landgraf and Julia Kastner and Lucie Flek},
year={2026},
eprint={2603.03543},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.03543},
}
```
## Acknowledgments
Polyglot is a project funded by the Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MWK) as part of TRA Sustainable Futures (University of Bonn) and the Excellence Strategy of the federal and state governments.
We also gratefully acknowledge the granted access to the [Marvin cluster](https://www.hpc.uni-bonn.de/en/systems/marvin) hosted by [University of Bonn](https://www.uni-bonn.de/en) along with the support provided by its High Performance Computing & Analytics Lab.
## License
Tucano2-0.6B-Base is licensed under the Apache License, Version 2.0. For more details, see the [LICENSE](LICENSE) file.