
SNAC-Denoiser-LLaMA-500M-snac_v3_test_1gpu

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.4929

Model description

More information needed
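
No official description is provided; the name suggests a ~500M-parameter LLaMA-style model operating on SNAC codec tokens. As a placeholder, a minimal loading sketch (the repo id and the causal-LM head are assumptions inferred from the name, not confirmed by this card):

```python
# Minimal loading sketch. Assumptions not confirmed by this card: the repo id
# below is hypothetical, and the causal-LM head is inferred from "LLaMA" in
# the model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "SNAC-Denoiser-LLaMA-500M-snac_v3_test_1gpu"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()
```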

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 124
  • training_steps: 6248
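
Taken together, these settings map onto `transformers.TrainingArguments` roughly as in the sketch below. Only the values listed above come from this card; `output_dir` is a placeholder, and the evaluation cadence is inferred from the 100-step intervals in the results table that follows.

```python
# Sketch of the listed hyperparameters as TrainingArguments.
# output_dir is a placeholder; eval cadence is inferred from the results table.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="snac-denoiser-llama-500m",  # placeholder, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,  # 1 per device x 16 steps = total batch 16
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=124,
    max_steps=6248,
    eval_strategy="steps",
    eval_steps=100,  # matches the 100-step cadence in the results table below
)
```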

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 9.2855        | 0.0160 | 100  | 9.2853          |
| 9.1344        | 0.0320 | 200  | 9.1912          |
| 9.0779        | 0.0480 | 300  | 9.1123          |
| 8.8923        | 0.0640 | 400  | 8.8849          |
| 8.6169        | 0.0800 | 500  | 8.6110          |
| 8.4296        | 0.0960 | 600  | 8.4493          |
| 8.3297        | 0.1120 | 700  | 8.3578          |
| 8.2512        | 0.1280 | 800  | 8.2896          |
| 8.1568        | 0.1440 | 900  | 8.2190          |
| 8.0756        | 0.1600 | 1000 | 8.1627          |
| 8.0368        | 0.1760 | 1100 | 8.1142          |
| 7.9843        | 0.1920 | 1200 | 8.0759          |
| 7.9684        | 0.2080 | 1300 | 8.0442          |
| 7.919         | 0.2240 | 1400 | 8.0067          |
| 7.897         | 0.2400 | 1500 | 7.9785          |
| 7.8377        | 0.2560 | 1600 | 7.9468          |
| 7.8362        | 0.2720 | 1700 | 7.9263          |
| 7.7771        | 0.2881 | 1800 | 7.8941          |
| 7.7576        | 0.3041 | 1900 | 7.8713          |
| 7.7211        | 0.3201 | 2000 | 7.8481          |
| 7.7182        | 0.3361 | 2100 | 7.8241          |
| 7.7132        | 0.3521 | 2200 | 7.8073          |
| 7.675         | 0.3681 | 2300 | 7.7893          |
| 7.6257        | 0.3841 | 2400 | 7.7664          |
| 7.6289        | 0.4001 | 2500 | 7.7529          |
| 7.6152        | 0.4161 | 2600 | 7.7354          |
| 7.5542        | 0.4321 | 2700 | 7.7168          |
| 7.551         | 0.4481 | 2800 | 7.7024          |
| 7.5289        | 0.4641 | 2900 | 7.6855          |
| 7.5265        | 0.4801 | 3000 | 7.6714          |
| 7.4856        | 0.4961 | 3100 | 7.6539          |
| 7.4539        | 0.5121 | 3200 | 7.6411          |
| 7.462         | 0.5281 | 3300 | 7.6277          |
| 7.4749        | 0.5441 | 3400 | 7.6173          |
| 7.4562        | 0.5601 | 3500 | 7.6065          |
| 7.4682        | 0.5761 | 3600 | 7.5937          |
| 7.4372        | 0.5921 | 3700 | 7.5834          |
| 7.389         | 0.6081 | 3800 | 7.5721          |
| 7.3654        | 0.6241 | 3900 | 7.5634          |
| 7.3942        | 0.6401 | 4000 | 7.5573          |
| 7.4089        | 0.6561 | 4100 | 7.5477          |
| 7.3928        | 0.6721 | 4200 | 7.5431          |
| 7.3939        | 0.6881 | 4300 | 7.5341          |
| 7.3677        | 0.7041 | 4400 | 7.5271          |
| 7.3579        | 0.7201 | 4500 | 7.5234          |
| 7.3494        | 0.7361 | 4600 | 7.5187          |
| 7.3404        | 0.7521 | 4700 | 7.5138          |
| 7.3378        | 0.7681 | 4800 | 7.5102          |
| 7.3622        | 0.7841 | 4900 | 7.5077          |
| 7.3294        | 0.8001 | 5000 | 7.5056          |
| 7.3326        | 0.8161 | 5100 | 7.5024          |
| 7.3444        | 0.8321 | 5200 | 7.4992          |
| 7.3385        | 0.8482 | 5300 | 7.4995          |
| 7.3636        | 0.8642 | 5400 | 7.4961          |
| 7.3138        | 0.8802 | 5500 | 7.4957          |
| 7.3213        | 0.8962 | 5600 | 7.4956          |
| 7.3541        | 0.9122 | 5700 | 7.4941          |
| 7.2924        | 0.9282 | 5800 | 7.4938          |
| 7.3449        | 0.9442 | 5900 | 7.4931          |
| 7.347         | 0.9602 | 6000 | 7.4931          |
| 7.2718        | 0.9762 | 6100 | 7.4930          |
| 7.3641        | 0.9922 | 6200 | 7.4929          |
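
For reference, if these losses are mean per-token cross-entropy in nats (the usual Trainer convention, though this card does not state it), the final validation loss corresponds to a perplexity of roughly exp(7.4929) ≈ 1795:

```python
# Convert the final validation loss to perplexity, assuming it is a mean
# per-token cross-entropy in nats (an assumption; not confirmed by the card).
import math

final_eval_loss = 7.4929
print(f"perplexity ~= {math.exp(final_eval_loss):.1f}")  # ~= 1795.2
```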

Framework versions

  • Transformers 4.56.1
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.0