metadata
language:
- pt
license: cc
model-index:
- name: CabraLlama3-8b
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: ENEM Challenge (No Images)
type: eduagarcia/enem_challenge
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 74.67
name: accuracy
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BLUEX (No Images)
type: eduagarcia-temp/BLUEX_without_images
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 56.88
name: accuracy
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: OAB Exams
type: eduagarcia/oab_exams
split: train
args:
num_few_shot: 3
metrics:
- type: acc
value: 49.29
name: accuracy
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 RTE
type: assin2
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 90.44
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Assin2 STS
type: eduagarcia/portuguese_benchmark
split: test
args:
num_few_shot: 15
metrics:
- type: pearson
value: 69.85
name: pearson
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: FaQuAD NLI
type: ruanchaves/faquad-nli
split: test
args:
num_few_shot: 15
metrics:
- type: f1_macro
value: 70.38
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HateBR Binary
type: ruanchaves/hatebr
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 85.05
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: PT Hate Speech Binary
type: hate_speech_portuguese
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 60.1
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: tweetSentBR
type: eduagarcia/tweetsentbr_fewshot
split: test
args:
num_few_shot: 25
metrics:
- type: f1_macro
value: 68.08
name: f1-macro
source:
url: >-
https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=botbot-ai/CabraLlama3-8b
name: Open Portuguese LLM Leaderboard
Llama 3 8b Instruct finetuned with Cabra 30k.
Evals
| Tasks | Metric | LLAMA3 Base Instruct Value | Stderr | Finetune Cabra 1 Value | Stderr |
|---|---|---|---|---|---|
| assin2_rte | f1_macro | 0.9091 | 0.0041 | 0.9036 | 0.0042 |
| acc | 0.9093 | 0.0041 | 0.9036 | 0.0042 | |
| assin2_sts | pearson | 0.7517 | 0.0074 | 0.6989 | 0.0082 |
| mse | 0.5985 | N/A | 0.6958 | N/A | |
| bluex | acc | 0.5786 | 0.0106 | 0.5786 | 0.0106 |
| ... | ... | ... | ... | ... | |
| enem | acc | 0.7012 | 0.0070 | 0.7439 | 0.0067 |
| faquad_nli | f1_macro | 0.7516 | 0.0132 | 0.6988 | 0.0139 |
| acc | 0.7938 | 0.0112 | 0.7508 | 0.0120 | |
| hatebr_offensive_binary | f1_macro | 0.8699 | 0.0064 | 0.8528 | 0.0067 |
| acc | 0.8700 | 0.0064 | 0.8536 | 0.0067 | |
| oab_exams | acc | 0.5062 | 0.0062 | 0.4911 | 0.0062 |
| portuguese_hate_speech_binary | f1_macro | 0.5982 | 0.0120 | 0.5954 | 0.0120 |
| acc | 0.5993 | 0.0119 | 0.5993 | 0.0119 |
Open Portuguese LLM Leaderboard Evaluation Results
Detailed results can be found here and on the 🚀 Open Portuguese LLM Leaderboard
| Metric | Value |
|---|---|
| Average | 69.42 |
| ENEM Challenge (No Images) | 74.67 |
| BLUEX (No Images) | 56.88 |
| OAB Exams | 49.29 |
| Assin2 RTE | 90.44 |
| Assin2 STS | 69.85 |
| FaQuAD NLI | 70.38 |
| HateBR Binary | 85.05 |
| PT Hate Speech Binary | 60.10 |
| tweetSentBR | 68.08 |