SNAC-Denoiser-LLaMA-500M-snac_v2_test_1gpu
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 7.5450
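A raw loss of 7.5450 is easier to interpret as a perplexity. The sketch below makes one assumption, which the card does not confirm: the reported value is the mean token-level cross-entropy in nats. That is the default for a LLaMA-style causal-LM head trained with the Transformers `Trainer`. Under that assumption, perplexity is simply `exp(loss)`. The helper name `perplexity` is illustrative only.

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy).

    Assumes the loss is a natural-log (nats) cross-entropy averaged over tokens.
    """
    return math.exp(mean_ce_loss_nats)


if __name__ == "__main__":
    # Final evaluation loss reported in this card, and the first logged validation loss.
    print(f"final eval perplexity: {perplexity(7.5450):.0f}")
    print(f"step-50 eval perplexity: {perplexity(9.3244):.0f}")
```

Under this assumption, the final loss corresponds to a perplexity of roughly 1.9 thousand. For comparison, the first logged validation loss corresponds to a perplexity of roughly 11 thousand.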
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 62
- training_steps: 3124
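The listed values fit together as follows. First, the effective batch size is `train_batch_size × gradient_accumulation_steps × number of devices`, which gives 2 × 16 × 1 = 32. This matches `total_train_batch_size`. The single device is inferred from the `_1gpu` suffix in the model name.

Second, in Transformers, `lr_scheduler_type: cosine` with `warmup_steps` selects `get_cosine_schedule_with_warmup`. That schedule ramps linearly from 0 to the peak rate, then follows a half-cosine decay to 0 at the final step.

The sketch below reimplements that shape in plain Python, assuming the library's default of half a cosine cycle. The function names are hypothetical and are not part of any library API.

```python
import math

# Values from the hyperparameter list above.
PEAK_LR = 1e-5
WARMUP_STEPS = 62
TOTAL_STEPS = 3124
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 16
NUM_DEVICES = 1  # assumption: single GPU, per the "_1gpu" suffix in the model name


def effective_batch_size(per_device=PER_DEVICE_BATCH, accum=GRAD_ACCUM, devices=NUM_DEVICES):
    """Number of examples contributing to one optimizer update."""
    return per_device * accum * devices


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup to `peak`, then half-cosine decay to 0 at `total` steps."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    # Start, mid-warmup, end of warmup, midpoint of decay, final step.
    for s in (0, 31, 62, 1593, 3124):
        print(f"step {s:5d}: lr = {lr_at(s):.3e}")
```

Warmup covers only 62 of 3,124 steps (about 2%). For nearly the whole run, the learning rate is therefore on the cosine-decay portion of the schedule.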
Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.3455 | 0.0160 | 50 | 9.3244 |
| 9.2001 | 0.0320 | 100 | 9.1875 |
| 9.1377 | 0.0480 | 150 | 9.1356 |
| 9.1092 | 0.0640 | 200 | 9.0910 |
| 9.0062 | 0.0800 | 250 | 8.9681 |
| 8.8554 | 0.0960 | 300 | 8.8329 |
| 8.718 | 0.1120 | 350 | 8.6753 |
| 8.5591 | 0.1280 | 400 | 8.5258 |
| 8.4367 | 0.1440 | 450 | 8.4062 |
| 8.349 | 0.1600 | 500 | 8.3305 |
| 8.2861 | 0.1760 | 550 | 8.2693 |
| 8.2332 | 0.1920 | 600 | 8.2187 |
| 8.2062 | 0.2080 | 650 | 8.1720 |
| 8.1408 | 0.2240 | 700 | 8.1226 |
| 8.1249 | 0.2400 | 750 | 8.0816 |
| 8.0804 | 0.2560 | 800 | 8.0395 |
| 8.0492 | 0.2720 | 850 | 8.0060 |
| 8.0164 | 0.2881 | 900 | 7.9712 |
| 7.9843 | 0.3041 | 950 | 7.9449 |
| 7.9496 | 0.3201 | 1000 | 7.9181 |
| 7.9486 | 0.3361 | 1050 | 7.8922 |
| 7.9317 | 0.3521 | 1100 | 7.8683 |
| 7.913 | 0.3681 | 1150 | 7.8490 |
| 7.8754 | 0.3841 | 1200 | 7.8260 |
| 7.8618 | 0.4001 | 1250 | 7.8044 |
| 7.8301 | 0.4161 | 1300 | 7.7877 |
| 7.7919 | 0.4321 | 1350 | 7.7684 |
| 7.7967 | 0.4481 | 1400 | 7.7496 |
| 7.7759 | 0.4641 | 1450 | 7.7350 |
| 7.7685 | 0.4801 | 1500 | 7.7192 |
| 7.7523 | 0.4961 | 1550 | 7.7041 |
| 7.7205 | 0.5121 | 1600 | 7.6902 |
| 7.7153 | 0.5281 | 1650 | 7.6767 |
| 7.7194 | 0.5441 | 1700 | 7.6648 |
| 7.702 | 0.5601 | 1750 | 7.6552 |
| 7.7038 | 0.5761 | 1800 | 7.6431 |
| 7.694 | 0.5921 | 1850 | 7.6317 |
| 7.6717 | 0.6081 | 1900 | 7.6254 |
| 7.6509 | 0.6241 | 1950 | 7.6156 |
| 7.6552 | 0.6401 | 2000 | 7.6098 |
| 7.669 | 0.6561 | 2050 | 7.6022 |
| 7.663 | 0.6721 | 2100 | 7.5952 |
| 7.6476 | 0.6881 | 2150 | 7.5876 |
| 7.6415 | 0.7041 | 2200 | 7.5823 |
| 7.6386 | 0.7201 | 2250 | 7.5776 |
| 7.6233 | 0.7361 | 2300 | 7.5731 |
| 7.6342 | 0.7521 | 2350 | 7.5678 |
| 7.6028 | 0.7681 | 2400 | 7.5634 |
| 7.6125 | 0.7841 | 2450 | 7.5607 |
| 7.6175 | 0.8001 | 2500 | 7.5571 |
| 7.6081 | 0.8161 | 2550 | 7.5561 |
| 7.6117 | 0.8321 | 2600 | 7.5529 |
| 7.5922 | 0.8482 | 2650 | 7.5512 |
| 7.6261 | 0.8642 | 2700 | 7.5498 |
| 7.5985 | 0.8802 | 2750 | 7.5485 |
| 7.6093 | 0.8962 | 2800 | 7.5474 |
| 7.6015 | 0.9122 | 2850 | 7.5466 |
| 7.5797 | 0.9282 | 2900 | 7.5460 |
| 7.621 | 0.9442 | 2950 | 7.5456 |
| 7.6041 | 0.9602 | 3000 | 7.5452 |
| 7.5733 | 0.9762 | 3050 | 7.5451 |
| 7.613 | 0.9922 | 3100 | 7.5450 |
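The Epoch and Step columns can be used to estimate how long one pass over the training data takes. Dividing a step count by its epoch value gives the number of optimizer steps per epoch. Multiplying that by `total_train_batch_size` gives a rough count of training examples.

The sketch below assumes the epoch value is computed as step divided by steps per epoch, as the Transformers `Trainer` does. The estimate is approximate because the table rounds epochs to four decimals. The row subset and helper names are illustrative.

```python
# A few (step, epoch, train_loss, val_loss) rows copied from the table above.
ROWS = [
    (50, 0.0160, 9.3455, 9.3244),
    (1550, 0.4961, 7.7523, 7.7041),
    (3100, 0.9922, 7.6130, 7.5450),
]
TOTAL_TRAIN_BATCH = 32  # total_train_batch_size from the hyperparameter list


def steps_per_epoch(step, epoch):
    """Optimizer steps in one full pass, inferred from a (step, epoch) pair."""
    return step / epoch


def estimated_train_examples(step, epoch, batch=TOTAL_TRAIN_BATCH):
    """Rough training-set size; epoch values are rounded, so treat as approximate."""
    return steps_per_epoch(step, epoch) * batch


if __name__ == "__main__":
    for step, epoch, train_loss, val_loss in ROWS:
        print(
            f"step {step:5d}: ~{steps_per_epoch(step, epoch):.0f} steps/epoch, "
            f"~{estimated_train_examples(step, epoch):,.0f} examples, "
            f"train - val = {train_loss - val_loss:+.4f}"
        )
```

With `training_steps: 3124`, the estimate implies that the run covers roughly one pass over about 100k training examples.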
Framework versions
- Transformers 4.56.1
- Pytorch 2.8.0+cu128
- Datasets 4.0.0
- Tokenizers 0.22.0
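When reproducing the run, it helps to confirm that the local environment matches the versions listed above. The sketch below uses only the standard library's `importlib.metadata`. It assumes the distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. The helper `compare_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card's "Framework versions" section.
EXPECTED = {
    "transformers": "4.56.1",
    "torch": "2.8.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.22.0",
}


def compare_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed in this environment
        report[pkg] = (want, have)
    return report


if __name__ == "__main__":
    for pkg, (want, have) in compare_versions().items():
        status = "ok" if have == want else "differs"
        print(f"{pkg:12s} expected {want:14s} installed {have or 'missing':14s} {status}")
```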