# albert-base-v2-sq
---
license: mit
datasets:
  - uonlp/CulturaX
language:
  - sq
pipeline_tag: fill-mask
---

Albanian ALBERT model pretrained on roughly 16 GB of text (the `sq` configuration of uonlp/CulturaX) for 1.1 million training steps, using only the masked language modelling objective. Training ran on a TPU v4-32 pod, made possible through the Google TPU Research Cloud.
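Masked language modelling corrupts a fraction of the input tokens and trains the model to recover them. A minimal sketch of BERT/ALBERT-style dynamic masking (15% of tokens selected; of those, 80% replaced by the `[MASK]` id, 10% by a random token, 10% left unchanged). The token ids, vocabulary size, and `[MASK]` id below are illustrative placeholders, not this model's actual values:

```python
import random

MASK_ID = 4          # hypothetical [MASK] token id
VOCAB_SIZE = 30000   # hypothetical vocabulary size
IGNORE_INDEX = -100  # label for unmasked positions (ignored by the loss)

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """Return (corrupted_ids, labels) for one sequence, BERT-style."""
    rng = rng or random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok          # predict the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK_ID  # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels

corrupted, labels = mask_tokens(range(100), rng=random.Random(0))
```

Masking is applied per batch at training time, so the same sentence can be corrupted differently across epochs.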

## Hyperparameters

- Optimizer: LAMB
- Learning rate: 0.0006
- $\beta_1$: 0.9
- $\beta_2$: 0.999
- $\epsilon$: 1e-8
- Batch size: 1024
- Number of steps: 1.1 million
- dtype: bfloat16
- Max. sequence length: 512
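LAMB combines Adam-style adaptive moments with a layerwise trust ratio, which is what lets large-batch pretraining (batch size 1024 here) keep a stable learning rate. A simplified pure-Python sketch of one LAMB update using the hyperparameters above; real implementations operate on per-layer tensors and often clip the trust ratio:

```python
import math

def lamb_step(w, g, m, v, t, lr=6e-4, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    """One LAMB update for a single weight vector (lists of floats)."""
    # Adam-style first and second moment estimates with bias correction
    m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    # update direction, optionally with decoupled weight decay
    r = [mh / (math.sqrt(vh) + eps) + wd * wi
         for mh, vh, wi in zip(m_hat, v_hat, w)]
    # layerwise trust ratio: ||w|| / ||r|| scales the step per layer
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    r_norm = math.sqrt(sum(ri * ri for ri in r))
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    w = [wi - lr * trust * ri for wi, ri in zip(w, r)]
    return w, m, v

# one step on a toy 2-parameter "layer"
w, m, v = lamb_step([1.0, 2.0], [0.1, -0.2], [0.0, 0.0], [0.0, 0.0], t=1)
```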

Performance on further Albanian downstream tasks will be posted as evaluation completes.

## Classification Tasks

| Task | Learning rate | Epochs | Accuracy | Precision | Recall | F1 score |
|------|---------------|--------|----------|-----------|--------|----------|
| AlbMoRe [1] | 1e-05 | 10 | 0.98 | 0.97 | 0.99 | 0.98 |
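The reported F1 score is the harmonic mean of precision and recall; a quick consistency check of the row above:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1 = f1_score(0.97, 0.99)  # precision and recall from the table above
```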

## Regression Tasks

TODO

## References

[1] Çano, E. (2023). AlbMoRe: A corpus of movie reviews for sentiment analysis in Albanian. arXiv preprint arXiv:2306.08526.