SNAC-Denoiser-LLaMA-500M-snac_v3_test_1gpu
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 7.4929
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: adamw_torch_fused with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 124
- training_steps: 6248
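As a rough illustration of how these settings interact, the sketch below recomputes the total train batch size and the learning rate at a few steps. It assumes a single device (inferred only from "1gpu" in the model name) and a linear warmup followed by a half-cosine decay to zero, which is the usual behaviour of a `cosine` scheduler with warmup; the function names are hypothetical and are not taken from the training code.

```python
import math

# Values copied from the hyperparameter list above.
BASE_LR = 1e-05
WARMUP_STEPS = 124
TRAINING_STEPS = 6248
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1  # assumption: inferred from "1gpu" in the model name


def effective_batch_size(per_device: int, accum_steps: int, devices: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * devices


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS,
              total: int = TRAINING_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay to zero (assumed shape)."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print("total_train_batch_size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 62, 124, 3186, 6248):
    print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at 1e-05 when warmup ends at step 124 and decays to roughly zero by step 6248.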
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.2855 | 0.0160 | 100 | 9.2853 |
| 9.1344 | 0.0320 | 200 | 9.1912 |
| 9.0779 | 0.0480 | 300 | 9.1123 |
| 8.8923 | 0.0640 | 400 | 8.8849 |
| 8.6169 | 0.0800 | 500 | 8.6110 |
| 8.4296 | 0.0960 | 600 | 8.4493 |
| 8.3297 | 0.1120 | 700 | 8.3578 |
| 8.2512 | 0.1280 | 800 | 8.2896 |
| 8.1568 | 0.1440 | 900 | 8.2190 |
| 8.0756 | 0.1600 | 1000 | 8.1627 |
| 8.0368 | 0.1760 | 1100 | 8.1142 |
| 7.9843 | 0.1920 | 1200 | 8.0759 |
| 7.9684 | 0.2080 | 1300 | 8.0442 |
| 7.919 | 0.2240 | 1400 | 8.0067 |
| 7.897 | 0.2400 | 1500 | 7.9785 |
| 7.8377 | 0.2560 | 1600 | 7.9468 |
| 7.8362 | 0.2720 | 1700 | 7.9263 |
| 7.7771 | 0.2881 | 1800 | 7.8941 |
| 7.7576 | 0.3041 | 1900 | 7.8713 |
| 7.7211 | 0.3201 | 2000 | 7.8481 |
| 7.7182 | 0.3361 | 2100 | 7.8241 |
| 7.7132 | 0.3521 | 2200 | 7.8073 |
| 7.675 | 0.3681 | 2300 | 7.7893 |
| 7.6257 | 0.3841 | 2400 | 7.7664 |
| 7.6289 | 0.4001 | 2500 | 7.7529 |
| 7.6152 | 0.4161 | 2600 | 7.7354 |
| 7.5542 | 0.4321 | 2700 | 7.7168 |
| 7.551 | 0.4481 | 2800 | 7.7024 |
| 7.5289 | 0.4641 | 2900 | 7.6855 |
| 7.5265 | 0.4801 | 3000 | 7.6714 |
| 7.4856 | 0.4961 | 3100 | 7.6539 |
| 7.4539 | 0.5121 | 3200 | 7.6411 |
| 7.462 | 0.5281 | 3300 | 7.6277 |
| 7.4749 | 0.5441 | 3400 | 7.6173 |
| 7.4562 | 0.5601 | 3500 | 7.6065 |
| 7.4682 | 0.5761 | 3600 | 7.5937 |
| 7.4372 | 0.5921 | 3700 | 7.5834 |
| 7.389 | 0.6081 | 3800 | 7.5721 |
| 7.3654 | 0.6241 | 3900 | 7.5634 |
| 7.3942 | 0.6401 | 4000 | 7.5573 |
| 7.4089 | 0.6561 | 4100 | 7.5477 |
| 7.3928 | 0.6721 | 4200 | 7.5431 |
| 7.3939 | 0.6881 | 4300 | 7.5341 |
| 7.3677 | 0.7041 | 4400 | 7.5271 |
| 7.3579 | 0.7201 | 4500 | 7.5234 |
| 7.3494 | 0.7361 | 4600 | 7.5187 |
| 7.3404 | 0.7521 | 4700 | 7.5138 |
| 7.3378 | 0.7681 | 4800 | 7.5102 |
| 7.3622 | 0.7841 | 4900 | 7.5077 |
| 7.3294 | 0.8001 | 5000 | 7.5056 |
| 7.3326 | 0.8161 | 5100 | 7.5024 |
| 7.3444 | 0.8321 | 5200 | 7.4992 |
| 7.3385 | 0.8482 | 5300 | 7.4995 |
| 7.3636 | 0.8642 | 5400 | 7.4961 |
| 7.3138 | 0.8802 | 5500 | 7.4957 |
| 7.3213 | 0.8962 | 5600 | 7.4956 |
| 7.3541 | 0.9122 | 5700 | 7.4941 |
| 7.2924 | 0.9282 | 5800 | 7.4938 |
| 7.3449 | 0.9442 | 5900 | 7.4931 |
| 7.347 | 0.9602 | 6000 | 7.4931 |
| 7.2718 | 0.9762 | 6100 | 7.4930 |
| 7.3641 | 0.9922 | 6200 | 7.4929 |
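To put the numbers in the table in context, the following sketch converts the final validation loss to a perplexity and estimates the number of optimizer steps per epoch from a few logged rows. It assumes the reported loss is a mean per-token cross-entropy in nats, and that each optimizer step consumes 16 examples (the total train batch size listed under the hyperparameters); neither is confirmed by the card, and the helper names are hypothetical.

```python
import math

# Values copied from the results table above.
FINAL_VAL_LOSS = 7.4929
LOGGED_ROWS = [(100, 0.0160), (3100, 0.4961), (6200, 0.9922)]  # (step, epoch)
TOTAL_TRAIN_BATCH_SIZE = 16


def perplexity(loss: float) -> float:
    """exp(loss); meaningful only if loss is mean cross-entropy in nats (assumption)."""
    return math.exp(loss)


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimate optimizer steps per epoch from one logged (step, epoch) pair."""
    return step / epoch


print(f"perplexity at final validation loss: {perplexity(FINAL_VAL_LOSS):.0f}")
for step, epoch in LOGGED_ROWS:
    est = steps_per_epoch(step, epoch)
    # Approximate examples per epoch, assuming 16 examples per optimizer step.
    print(f"step {step}: ~{est:.0f} steps/epoch, "
          f"~{est * TOTAL_TRAIN_BATCH_SIZE:.0f} examples/epoch")
```

The estimates land close to the configured 6248 training steps, which is consistent with the run covering about one epoch.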
Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.0