# chakma-bert
This model is a fine-tuned version of [bert-base-multilingual-cased](https://huggingface.co/google-bert/bert-base-multilingual-cased) on an unspecified dataset. It achieves the following results on the evaluation set:
- Loss: 1.8670
## Model description
More information needed
## Intended uses & limitations
More information needed
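The card gives no usage details, so the following is a minimal sketch that assumes the checkpoint keeps the masked-language-modeling head of the base model (the training objective is not stated here). The repository id `adity12345/chakma-bert` is the one this card belongs to; the input sentence is a placeholder rather than real Chakma text.

```python
# Minimal fill-mask usage sketch; assumes a masked-language-modeling head.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="adity12345/chakma-bert")

# Replace the placeholder with a Chakma sentence containing one [MASK] token.
for pred in fill_mask("This is a [MASK] sentence."):
    print(pred["token_str"], round(pred["score"], 4))
```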
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged configuration sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (`adamw_torch_fused`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 18
- mixed_precision_training: Native AMP
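Taken together, these settings map onto a `transformers` `TrainingArguments` object roughly as follows. This is a reconstruction from the list above under the assumption of a masked-language-modeling objective, not the author's actual training script; the output directory is a placeholder, and since the datasets are not released, the `Trainer` wiring is only indicated in comments.

```python
# Hedged reconstruction of the hyperparameters listed above.
from transformers import AutoModelForMaskedLM, AutoTokenizer, TrainingArguments

base = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)  # assumes an MLM objective

args = TrainingArguments(
    output_dir="chakma-bert",        # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # 2 x 4 = total train batch size of 8
    num_train_epochs=18,
    lr_scheduler_type="linear",
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
    fp16=True,                       # assumed fp16 for "Native AMP"
    eval_strategy="epoch",           # matches the per-epoch validation losses below
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=...,  # not provided in this card
#                   eval_dataset=...,
#                   data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=True))
# trainer.train()
```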
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.1287 | 1.0 | 59 | 2.8297 |
| 2.7974 | 2.0 | 118 | 2.5401 |
| 2.5044 | 3.0 | 177 | 2.4909 |
| 2.432 | 4.0 | 236 | 2.2127 |
| 2.3317 | 5.0 | 295 | 2.2474 |
| 2.1859 | 6.0 | 354 | 2.2014 |
| 2.1582 | 7.0 | 413 | 2.1784 |
| 2.0694 | 8.0 | 472 | 2.0472 |
| 2.0662 | 9.0 | 531 | 2.0296 |
| 2.0164 | 10.0 | 590 | 2.0319 |
| 1.9955 | 11.0 | 649 | 1.9045 |
| 1.8953 | 12.0 | 708 | 1.9912 |
| 1.9059 | 13.0 | 767 | 1.9668 |
| 1.8877 | 14.0 | 826 | 1.8459 |
| 1.8675 | 15.0 | 885 | 1.8975 |
| 1.8644 | 16.0 | 944 | 1.8578 |
| 1.8482 | 17.0 | 1003 | 1.9558 |
| 1.8276 | 18.0 | 1062 | 1.8670 |
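Assuming a masked-language-modeling objective (the card does not state the task), the final validation loss of 1.8670 corresponds to a pseudo-perplexity of roughly exp(1.8670) ≈ 6.5:

```python
import math

# Perplexity implied by the final validation cross-entropy loss.
final_val_loss = 1.8670
print(math.exp(final_val_loss))  # ~6.47
```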
### Framework versions
- Transformers 4.56.1
- PyTorch 2.8.0+cu126
- Datasets 4.0.0
- Tokenizers 0.22.0