SNAC-Denoiser-LLaMA-500M-snac_v2_test_1gpu

This model is a fine-tuned version of an unspecified base model on an unknown dataset. The last validation loss logged during training is 7.5450 (step 3100).

Model description, intended uses and limitations, and training and evaluation data: more information needed.

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 32
optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 62
training_steps: 3124
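
The relationship between these values can be checked with a short sketch. It assumes a single device (suggested by "1gpu" in the model name, not stated in the list) and the common linear-warmup, half-cosine decay to zero; the actual scheduler implementation used for this run is not given here. The function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRAD_ACCUM_STEPS = 16
WARMUP_STEPS = 62
TRAINING_STEPS = 3124
NUM_DEVICES = 1  # assumption: one GPU, inferred from the model name


def effective_batch_size(per_device, accum_steps, num_devices):
    """Samples consumed per optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_lr(step, peak_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("effective batch:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 31, 62, 1593, 3124):
        print(step, cosine_lr(step))
```

Under these assumptions, 2 x 16 x 1 reproduces the listed total_train_batch_size of 32, and the learning rate peaks at step 62 before decaying toward zero at step 3124.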

Training Loss	Epoch	Step	Validation Loss
9.3455	0.0160	50	9.3244
9.2001	0.0320	100	9.1875
9.1377	0.0480	150	9.1356
9.1092	0.0640	200	9.0910
9.0062	0.0800	250	8.9681
8.8554	0.0960	300	8.8329
8.718	0.1120	350	8.6753
8.5591	0.1280	400	8.5258
8.4367	0.1440	450	8.4062
8.349	0.1600	500	8.3305
8.2861	0.1760	550	8.2693
8.2332	0.1920	600	8.2187
8.2062	0.2080	650	8.1720
8.1408	0.2240	700	8.1226
8.1249	0.2400	750	8.0816
8.0804	0.2560	800	8.0395
8.0492	0.2720	850	8.0060
8.0164	0.2881	900	7.9712
7.9843	0.3041	950	7.9449
7.9496	0.3201	1000	7.9181
7.9486	0.3361	1050	7.8922
7.9317	0.3521	1100	7.8683
7.913	0.3681	1150	7.8490
7.8754	0.3841	1200	7.8260
7.8618	0.4001	1250	7.8044
7.8301	0.4161	1300	7.7877
7.7919	0.4321	1350	7.7684
7.7967	0.4481	1400	7.7496
7.7759	0.4641	1450	7.7350
7.7685	0.4801	1500	7.7192
7.7523	0.4961	1550	7.7041
7.7205	0.5121	1600	7.6902
7.7153	0.5281	1650	7.6767
7.7194	0.5441	1700	7.6648
7.702	0.5601	1750	7.6552
7.7038	0.5761	1800	7.6431
7.694	0.5921	1850	7.6317
7.6717	0.6081	1900	7.6254
7.6509	0.6241	1950	7.6156
7.6552	0.6401	2000	7.6098
7.669	0.6561	2050	7.6022
7.663	0.6721	2100	7.5952
7.6476	0.6881	2150	7.5876
7.6415	0.7041	2200	7.5823
7.6386	0.7201	2250	7.5776
7.6233	0.7361	2300	7.5731
7.6342	0.7521	2350	7.5678
7.6028	0.7681	2400	7.5634
7.6125	0.7841	2450	7.5607
7.6175	0.8001	2500	7.5571
7.6081	0.8161	2550	7.5561
7.6117	0.8321	2600	7.5529
7.5922	0.8482	2650	7.5512
7.6261	0.8642	2700	7.5498
7.5985	0.8802	2750	7.5485
7.6093	0.8962	2800	7.5474
7.6015	0.9122	2850	7.5466
7.5797	0.9282	2900	7.5460
7.621	0.9442	2950	7.5456
7.6041	0.9602	3000	7.5452
7.5733	0.9762	3050	7.5451
7.613	0.9922	3100	7.5450
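
Two quantities can be derived from the table as a rough sanity check. The sketch below assumes the reported loss is a mean cross-entropy in nats per token, which the card does not state; if so, exp(loss) gives a perplexity. It also estimates optimizer steps per epoch from the Step and Epoch columns, which are rounded, so the result is approximate.

```python
import math

# (step, epoch, validation_loss) rows copied from the table above.
ROWS = [
    (50, 0.0160, 9.3244),
    (1550, 0.4961, 7.7041),
    (3100, 0.9922, 7.5450),
]


def steps_per_epoch(step, epoch):
    """Estimate optimizer steps in one full epoch from a logged row."""
    return step / epoch


def perplexity(loss):
    """exp(loss); meaningful only if loss is mean cross-entropy in nats (assumption)."""
    return math.exp(loss)


if __name__ == "__main__":
    for step, epoch, val_loss in ROWS:
        print(step, round(steps_per_epoch(step, epoch)), round(perplexity(val_loss)))
```

The estimate comes out close to 3124 steps per epoch, which matches training_steps and suggests the run covers roughly one pass over the data.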

Safetensors model size: 0.5B params
Tensor type: F32
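
As a rough guide to the storage these figures imply, the sketch below multiplies the parameter count by the bytes per element. The 0.5B figure is rounded, so the exact count (and therefore the result) is an approximation; the helper name and the extra dtype entries are illustrative.

```python
# Bytes per element for a few safetensors dtype labels.
BYTES_PER_DTYPE = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params, tensor_type):
    """Approximate size of the raw weights, ignoring file metadata."""
    return num_params * BYTES_PER_DTYPE[tensor_type]


if __name__ == "__main__":
    approx_params = 500_000_000  # "0.5B params" as listed; rounded value
    size = weight_bytes(approx_params, "F32")
    print(f"{size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
```

At F32 this comes to about 2 GB of weights; a half-precision copy would need about half that.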