dense_hom_100m

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.5102
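
If this loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling with the Hugging Face Trainer; the card does not say so explicitly), it corresponds to a perplexity of roughly exp(4.5102) ≈ 90.9:

```python
import math

# Eval loss reported on the card; assumed to be mean per-token
# cross-entropy in nats, the Trainer's default for LM heads.
eval_loss = 4.5102
print(f"perplexity ≈ {math.exp(eval_loss):.1f}")  # ≈ 90.9
```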

Model description

More information needed

Intended uses & limitations

More information needed
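
Until the authors document usage, here is a minimal loading sketch. It assumes the checkpoint is a causal language model and uses a hypothetical repo id; neither is stated on the card, so adjust both to match the actual model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/dense_hom_100m"  # hypothetical repo id; replace with the real one

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)  # assumes a causal LM head

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```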

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-06; no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 66788
  • training_steps: 667880
  • mixed_precision_training: Native AMP
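
As a sketch, these settings map onto transformers.TrainingArguments roughly as follows. The output_dir is a placeholder and the actual training script is not shown on the card, so treat this as a reconstruction, not the authors' code:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="dense_hom_100m",     # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,   # 8 x 4 = 32, the reported total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=66_788,
    max_steps=667_880,
    fp16=True,                       # "Native AMP" mixed precision
)
```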

Training results

| Training Loss | Epoch  | Step   | Validation Loss |
|:-------------:|:------:|:------:|:---------------:|
| 8.4525        | 0.1497 | 10000  | 8.4304          |
| 7.3336        | 0.2995 | 20000  | 7.3022          |
| 6.4132        | 0.4492 | 30000  | 6.3862          |
| 5.8988        | 0.5989 | 40000  | 5.8711          |
| 5.6228        | 0.7486 | 50000  | 5.6064          |
| 5.4744        | 0.8984 | 60000  | 5.4458          |
| 5.2808        | 1.0481 | 70000  | 5.3160          |
| 5.1593        | 1.1978 | 80000  | 5.1888          |
| 5.1095        | 1.3475 | 90000  | 5.0874          |
| 5.0067        | 1.4973 | 100000 | 5.0072          |
| 4.9448        | 1.6470 | 110000 | 4.9405          |
| 4.8901        | 1.7967 | 120000 | 4.8872          |
| 4.8371        | 1.9464 | 130000 | 4.8377          |
| 4.6843        | 2.0962 | 140000 | 4.8066          |
| 4.6858        | 2.2459 | 150000 | 4.7772          |
| 4.654         | 2.3956 | 160000 | 4.7471          |
| 4.6345        | 2.5453 | 170000 | 4.7199          |
| 4.6339        | 2.6951 | 180000 | 4.6928          |
| 4.6157        | 2.8448 | 190000 | 4.6695          |
| 4.5953        | 2.9945 | 200000 | 4.6452          |
| 4.433         | 3.1442 | 210000 | 4.6433          |
| 4.4471        | 3.2940 | 220000 | 4.6301          |
| 4.4507        | 3.4437 | 230000 | 4.6134          |
| 4.462         | 3.5934 | 240000 | 4.5953          |
| 4.4476        | 3.7431 | 250000 | 4.5798          |
| 4.4127        | 3.8929 | 260000 | 4.5641          |
| 4.221         | 4.0426 | 270000 | 4.5716          |
| 4.264         | 4.1923 | 280000 | 4.5673          |
| 4.2815        | 4.3420 | 290000 | 4.5543          |
| 4.2952        | 4.4918 | 300000 | 4.5408          |
| 4.3095        | 4.6415 | 310000 | 4.5279          |
| 4.3148        | 4.7912 | 320000 | 4.5176          |
| 4.3125        | 4.9409 | 330000 | 4.5053          |
| 4.09          | 5.0907 | 340000 | 4.5283          |
| 4.1335        | 5.2405 | 350000 | 4.5244          |
| 4.1502        | 5.3902 | 360000 | 4.5136          |
| 4.1655        | 5.5399 | 370000 | 4.5057          |
| 4.1605        | 5.6896 | 380000 | 4.4929          |
| 4.177         | 5.8394 | 390000 | 4.4838          |
| 4.1474        | 5.9891 | 400000 | 4.4757          |
| 3.9881        | 6.1388 | 410000 | 4.5119          |
| 4.0034        | 6.2886 | 420000 | 4.5069          |
| 4.0274        | 6.4383 | 430000 | 4.4966          |
| 4.0535        | 6.5880 | 440000 | 4.4878          |
| 4.0514        | 6.7377 | 450000 | 4.4785          |
| 4.0476        | 6.8875 | 460000 | 4.4674          |
| 3.8266        | 7.0372 | 470000 | 4.5037          |
| 3.8644        | 7.1869 | 480000 | 4.5106          |
| 3.9039        | 7.3366 | 490000 | 4.5029          |
| 3.9142        | 7.4864 | 500000 | 4.4955          |
| 3.9112        | 7.6361 | 510000 | 4.4856          |
| 3.9333        | 7.7858 | 520000 | 4.4762          |
| 3.9188        | 7.9355 | 530000 | 4.4689          |
| 3.7217        | 8.0853 | 540000 | 4.5152          |
| 3.7674        | 8.2350 | 550000 | 4.5160          |
| 3.7844        | 8.3847 | 560000 | 4.5106          |
| 3.7862        | 8.5345 | 570000 | 4.5055          |
| 3.7891        | 8.6842 | 580000 | 4.4996          |
| 3.7912        | 8.8339 | 590000 | 4.4929          |
| 3.7521        | 8.9836 | 600000 | 4.4885          |
| 3.6301        | 9.1334 | 610000 | 4.5250          |
| 3.6341        | 9.2831 | 620000 | 4.5243          |
| 3.6515        | 9.4328 | 630000 | 4.5208          |
| 3.6546        | 9.5826 | 640000 | 4.5171          |
| 3.6662        | 9.7323 | 650000 | 4.5132          |
| 3.6615        | 9.8820 | 660000 | 4.5115          |

Framework versions

  • Transformers 4.51.0
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
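
To check that a local environment matches these pins before trying to reproduce results, a small assertion script works; this assumes all four packages are importable:

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on the card; mismatches can change training/eval behavior.
expected = {
    transformers: "4.51.0",
    torch: "2.7.0+cu126",
    datasets: "3.6.0",
    tokenizers: "0.21.1",
}
for module, version in expected.items():
    assert module.__version__ == version, (
        f"{module.__name__}: found {module.__version__}, card reports {version}"
    )
```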

Model size

  • 0.2B params (F32 tensors, Safetensors format)