Raphael Scheible-Schmitt committed
Commit 94f6cf2 · verified · Parent: d5b58cc

Update README.md

---
license: mit
language:
- pt
tags:
- roberta
- masked-language-modeling
- portuguese
- portbert
- portbert-base
- downstream-evaluation
- extraGLUE
datasets:
- extraGLUE
- uonlp/CulturaX
pipeline_tag: fill-mask
---

# PortBERT: Navigating the Depths of Portuguese Language Models

**PortBERT** is a family of RoBERTa-based language models pre-trained from scratch on the Portuguese portion of OSCAR 23 and mC4 (the deduplicated variants distributed in CulturaX). The models are designed to offer strong downstream performance on Portuguese NLP tasks, while providing insight into the cost-performance tradeoffs of training across hardware backends.

We release two variants:

- `PortBERT-base`: 126M parameters, trained on 8× A40 GPUs (fp32)
- `PortBERT-large`: 357M parameters, trained on a TPUv4-128 pod (fp32)

---

## Model Details

| Detail             | PortBERT-base                               | PortBERT-large               |
|--------------------|---------------------------------------------|------------------------------|
| Architecture       | RoBERTa-base                                | RoBERTa-large                |
| Parameters         | ~126M                                       | ~357M                        |
| Tokenizer          | GPT-2 style (52k vocab)                     | Same                         |
| Pretraining corpus | Deduplicated mC4 and OSCAR 23 from CulturaX | Same                         |
| Objective          | Masked language modeling                    | Same                         |
| Training time      | ~27 days on 8× A40                          | ~6.2 days on a TPUv4-128 pod |
| Precision          | fp32                                        | fp32                         |
| Framework          | fairseq                                     | fairseq                      |

---
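Since the models carry the `fill-mask` pipeline tag, they can be queried with the HuggingFace Transformers `fill-mask` pipeline. The sketch below makes one assumption: `MODEL_ID` is a placeholder to be replaced with this repository's actual Hub id. RoBERTa-style checkpoints expect the `<mask>` token:

```python
# Minimal fill-mask sketch for a RoBERTa-style checkpoint.
# MODEL_ID is a placeholder -- replace it with the Hub id of the
# PortBERT variant you want to use.
MODEL_ID = "<hub-repo-id>"


def top_fills(text: str, model_id: str = MODEL_ID, k: int = 5):
    """Return the top-k (token, score) predictions for the `<mask>` token."""
    from transformers import pipeline  # imported lazily; requires `transformers`

    fill = pipeline("fill-mask", model=model_id, top_k=k)
    return [(r["token_str"], r["score"]) for r in fill(text)]


# Example call (downloads the checkpoint from the Hub):
# top_fills("A capital de Portugal é <mask>.")
```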

## Downstream Evaluation (ExtraGLUE)

We evaluate PortBERT on **ExtraGLUE**, a Portuguese adaptation of the GLUE benchmark. Fine-tuning was conducted with HuggingFace Transformers, using an NNI-based grid search over batch size and learning rate (28 configurations per task). Each task was fine-tuned for up to 10 epochs. Metrics were computed on the validation sets because no held-out test sets are available.

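The search space above can be sketched as a plain grid. The concrete batch sizes and learning rates below are illustrative assumptions; the README only fixes the count of 28 configurations per task and the 10-epoch cap:

```python
from itertools import product

# Hypothetical grid: 4 batch sizes x 7 learning rates = 28 configurations,
# matching the count stated above; the specific values are illustrative.
batch_sizes = [8, 16, 32, 64]
learning_rates = [5e-6, 1e-5, 2e-5, 3e-5, 5e-5, 7e-5, 1e-4]

grid = [
    {"batch_size": bs, "learning_rate": lr, "max_epochs": 10}
    for bs, lr in product(batch_sizes, learning_rates)
]
print(len(grid))  # 28
```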
The **AVG** score is the mean of the following six metrics:

- STSB Spearman
- STSB Pearson
- RTE Accuracy
- WNLI Accuracy
- MRPC Accuracy
- MRPC F1

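As a worked check, the AVG column can be reproduced from a model's six metric values. Using PortBERT_large's scores from the results table:

```python
# Recompute the AVG score for PortBERT_large from its six metrics.
# Note that STSB_Mean is itself an average and is *not* part of AVG.
metrics = {
    "STSB Spearman": 88.53,
    "STSB Pearson": 88.68,
    "RTE Accuracy": 72.56,
    "WNLI Accuracy": 61.97,
    "MRPC Accuracy": 89.46,
    "MRPC F1": 92.39,
}
avg = sum(metrics.values()) / len(metrics)
print(avg)  # ~82.265, reported as 82.26 in the table
```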
### 🧪 Evaluation Results

**Legend**: *Bold = best*, *italic = second-best* per model size.

| Model                  | STSB_Sp  | STSB_Pe  | STSB_Mean  | RTE_Acc  | WNLI_Acc | MRPC_Acc | MRPC_F1  | AVG       |
|------------------------|----------|----------|------------|----------|----------|----------|----------|-----------|
| **Large models**       |          |          |            |          |          |          |          |           |
| XLM-RoBERTa_large      | **90.00**| **90.27**| **90.14**  | **82.31**| 57.75    | *90.44*  | *93.31*  | **84.01** |
| EuroBERT-610m          | 88.46    | 88.59    | 88.52      | *78.34*  | *59.15*  | **91.91**| **94.20**| *83.44*   |
| PortBERT_large         | 88.53    | 88.68    | 88.60      | 72.56    | **61.97**| 89.46    | 92.39    | 82.26     |
| BERTimbau_large        | *89.40*  | *89.61*  | *89.50*    | 75.45    | *59.15*  | 88.24    | 91.55    | 82.23     |
| **Base models**        |          |          |            |          |          |          |          |           |
| RoBERTaLexPT_base      | 86.68    | 86.86    | 86.77      | 69.31    | *59.15*  | **89.46**| **92.34**| **80.63** |
| PortBERT_base          | *87.39*  | *87.65*  | *87.52*    | 68.95    | **60.56**| 87.75    | *91.13*  | *80.57*   |
| RoBERTaCrawlPT_base    | 87.34    | 87.45    | 87.39      | **72.56**| 56.34    | *87.99*  | 91.20    | 80.48     |
| BERTimbau_base         | **88.39**| **88.60**| **88.50**  | *70.40*  | 56.34    | 87.25    | 90.97    | 80.32     |
| XLM-RoBERTa_base       | 85.75    | 86.09    | 85.92      | 68.23    | **60.56**| 87.75    | 91.32    | 79.95     |
| EuroBERT-210m          | 86.54    | 86.62    | 86.58      | 65.70    | 57.75    | 87.25    | 91.00    | 79.14     |
| AlBERTina 100M PTPT    | 86.52    | 86.51    | 86.52      | 70.04    | 56.34    | 85.05    | 89.57    | 79.01     |
| AlBERTina 100M PTBR    | 85.97    | 85.99    | 85.98      | 68.59    | 56.34    | 85.78    | 89.82    | 78.75     |
| AiBERTa                | 83.56    | 83.73    | 83.65      | 64.98    | 56.34    | 82.11    | 86.99    | 76.29     |
| roBERTa PT             | 48.06    | 48.51    | 48.29      | 56.68    | *59.15*  | 72.06    | 81.79    | 61.04     |

---

## 📜 License

MIT License