---
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
library_name: peft
---

## Training Details

### Training Data

Training used the [gretelai/synthetic_text_to_sql](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) dataset, a collection of high-quality synthetic Text-to-SQL samples. The dataset contains 105,851 records, partitioned into 100,000 train and 5,851 test records. I used only 50,000 of these records for training.

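The card does not say how the 50k records were chosen from the 100,000-record train split. One reproducible option, shown here as a minimal sketch, is to draw a fixed-seed random sample of row indices. The seed value and the helper name `pick_subset_indices` are hypothetical and are not taken from the original training run.

```python
import random

TRAIN_SPLIT_SIZE = 100_000  # size of the train split stated above
SUBSET_SIZE = 50_000        # number of records used for training


def pick_subset_indices(total: int, k: int, seed: int = 42) -> list[int]:
    """Return k distinct row indices in [0, total), sorted, chosen with a fixed seed."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return sorted(rng.sample(range(total), k))


# ASSUMPTION: a random subset; the original run may have used a different selection.
indices = pick_subset_indices(TRAIN_SPLIT_SIZE, SUBSET_SIZE)
print(len(indices), indices[:3])
```

With the Hugging Face `datasets` library, an index list like this can be passed to `Dataset.select` to materialize the subset.
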
### Training Results

| Step | Training Loss |
| ---: | ---: |
| 10 | 1.296000 |
| 20 | 1.331600 |
| 30 | 1.279400 |
| 40 | 1.312900 |
| 50 | 1.274100 |
| 60 | 1.271700 |
| 70 | 1.209100 |
| 80 | 1.192600 |
| 90 | 1.176700 |
| 100 | 1.118300 |
| 110 | 1.086800 |
| 120 | 1.048000 |
| 130 | 1.019500 |
| 140 | 1.001400 |
| 150 | 0.994300 |
| 160 | 0.934900 |
| 170 | 0.904500 |
| 180 | 0.879900 |
| 190 | 0.850400 |
| 200 | 0.828000 |
| 210 | 0.811400 |
| 220 | 0.846000 |
| 230 | 0.791100 |
| 240 | 0.766900 |
| 250 | 0.782000 |
| 260 | 0.718300 |
| 270 | 0.701800 |
| 280 | 0.720000 |
| 290 | 0.693600 |
| 300 | 0.676500 |
| 310 | 0.679900 |
| 320 | 0.673200 |
| 330 | 0.669500 |
| 340 | 0.692800 |
| 350 | 0.662200 |
| 360 | 0.761200 |
| 370 | 0.659600 |
| 380 | 0.683700 |
| 390 | 0.681200 |
| 400 | 0.674000 |
| 410 | 0.651800 |
| 420 | 0.641800 |
| 430 | 0.646500 |
| 440 | 0.664200 |
| 450 | 0.633600 |
| 460 | 0.646900 |
| 470 | 0.643400 |
| 480 | 0.658800 |
| 490 | 0.631500 |
| 500 | 0.678200 |
| 510 | 0.633400 |
| 520 | 0.623300 |
| 530 | 0.655700 |
| 540 | 0.631500 |
| 550 | 0.617700 |
| 560 | 0.644000 |
| 570 | 0.650200 |
| 580 | 0.618500 |
| 590 | 0.615400 |
| 600 | 0.614000 |
| 610 | 0.612800 |
| 620 | 0.616900 |
| 630 | 0.640200 |
| 640 | 0.613000 |
| 650 | 0.611400 |
| 660 | 0.617000 |
| 670 | 0.629800 |
| 680 | 0.648800 |
| 690 | 0.608800 |
| 700 | 0.603200 |
| 710 | 0.628200 |
| 720 | 0.629700 |
| 730 | 0.604400 |
| 740 | 0.610700 |
| 750 | 0.621300 |
| 760 | 0.617900 |
| 770 | 0.596500 |
| 780 | 0.612800 |
| 790 | 0.611700 |
| 800 | 0.618600 |
| 810 | 0.590900 |
| 820 | 0.590300 |
| 830 | 0.592900 |
| 840 | 0.611700 |
| 850 | 0.628300 |
| 860 | 0.590100 |
| 870 | 0.584800 |
| 880 | 0.591200 |
| 890 | 0.585900 |
| 900 | 0.607000 |
| 910 | 0.578800 |
| 920 | 0.576600 |
| 930 | 0.597600 |
| 940 | 0.602100 |
| 950 | 0.579000 |
| 960 | 0.597900 |
| 970 | 0.590600 |
| 980 | 0.606100 |
| 990 | 0.577600 |
| 1000 | 0.584000 |
| 1010 | 0.569300 |
| 1020 | 0.594000 |
| 1030 | 0.596100 |
| 1040 | 0.590600 |
| 1050 | 0.570300 |
| 1060 | 0.572800 |
| 1070 | 0.572200 |
| 1080 | 0.569900 |
| 1090 | 0.587200 |
| 1100 | 0.572200 |
| 1110 | 0.569700 |
| 1120 | 0.612500 |
| 1130 | 0.587800 |
| 1140 | 0.568100 |
| 1150 | 0.573100 |
| 1160 | 0.568300 |
| 1170 | 0.620800 |
| 1180 | 0.570600 |
| 1190 | 0.561500 |
| 1200 | 0.560200 |
| 1210 | 0.592400 |
| 1220 | 0.580500 |
| 1230 | 0.578300 |
| 1240 | 0.573400 |
| 1250 | 0.568800 |
| 1260 | 0.600500 |
| 1270 | 0.578800 |
| 1280 | 0.561300 |
| 1290 | 0.570900 |
| 1300 | 0.567700 |
| 1310 | 0.589800 |
| 1320 | 0.598200 |
| 1330 | 0.564900 |
| 1340 | 0.577500 |
| 1350 | 0.565700 |
| 1360 | 0.581400 |
| 1370 | 0.562000 |
| 1380 | 0.588200 |
| 1390 | 0.603800 |
| 1400 | 0.560300 |
| 1410 | 0.559600 |
| 1420 | 0.567000 |
| 1430 | 0.562700 |
| 1440 | 0.564200 |
| 1450 | 0.563700 |
| 1460 | 0.561100 |
| 1470 | 0.561100 |
| 1480 | 0.561600 |
| 1490 | 0.564800 |
| 1500 | 0.579100 |
| 1510 | 0.564100 |
| 1520 | 0.562900 |
| 1530 | 0.569800 |
| 1540 | 0.566200 |
| 1550 | 0.599100 |
| 1560 | 0.562000 |
| 1570 | 0.580600 |
| 1580 | 0.564900 |
| 1590 | 0.571900 |
| 1600 | 0.580000 |
| 1610 | 0.559200 |
| 1620 | 0.566900 |
| 1630 | 0.556100 |

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6478bd6193cc44969ee0b6a6/ocS5zKRHwJ1BRcfz9AFep.png)

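Individual log entries are noisy (step 360, for example, jumps to 0.7612 between two values near 0.66), so a windowed mean gives a steadier picture than any single row. The sketch below averages the first ten and the last ten logged losses from the training log. The window size of ten and the helper name `mean` are arbitrary choices made for illustration.

```python
# Losses copied from the training log: steps 10-100 and steps 1540-1630.
first_ten = [1.2960, 1.3316, 1.2794, 1.3129, 1.2741,
             1.2717, 1.2091, 1.1926, 1.1767, 1.1183]
last_ten = [0.5662, 0.5991, 0.5620, 0.5806, 0.5649,
            0.5719, 0.5800, 0.5592, 0.5669, 0.5561]


def mean(values):
    """Arithmetic mean of a non-empty list of floats."""
    return sum(values) / len(values)


start_loss = mean(first_ten)
end_loss = mean(last_ten)
relative_drop = (start_loss - end_loss) / start_loss

print(f"mean loss, steps 10-100:    {start_loss:.4f}")
print(f"mean loss, steps 1540-1630: {end_loss:.4f}")
print(f"relative reduction:         {relative_drop:.1%}")
```

These are training-loss figures only; the card reports no evaluation metrics, so they say nothing about held-out performance.
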
#### Training Hyperparameters

The following hyperparameters were used during training:

- `num_train_epochs=3`
- `per_device_train_batch_size=2`
- `gradient_accumulation_steps=4`
- `optim="adamw_torch_fused"`
- `learning_rate=2e-4`
- `max_grad_norm=0.3`
- `weight_decay=0.01`
- `lr_scheduler_type="cosine"`
- `warmup_steps=50`
- `bf16=True`
- `tf32=True`

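To make two of these settings concrete, the sketch below works out the effective batch size per optimizer step and the general shape of a linear-warmup, cosine-decay learning-rate curve. It is an illustration under stated assumptions, not the trainer's exact implementation: the device count and the total number of optimizer steps are not given in this card, so `NUM_DEVICES` and `TOTAL_STEPS` are placeholders.

```python
import math

PER_DEVICE_BATCH = 2   # per_device_train_batch_size
GRAD_ACCUM_STEPS = 4   # gradient_accumulation_steps
NUM_DEVICES = 1        # ASSUMPTION: the card does not state the device count
PEAK_LR = 2e-4         # learning_rate
WARMUP_STEPS = 50      # warmup_steps
TOTAL_STEPS = 1630     # ASSUMPTION: last logged step, used only as a placeholder

# Examples consumed per optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def cosine_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = min(1.0, (step - warmup) / max(1, total - warmup))
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", effective_batch)
for step in (0, 25, 50, 840, 1630):
    print(step, f"{cosine_lr(step):.2e}")
```

Under the single-device assumption, each optimizer step consumes 8 examples. If the real run used more devices or a different total step count, both the effective batch size and the decay curve change accordingly.
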