Fine-tuned TinyLlama-1.1B for the HolyC Task
- Model Description
This document provides a detailed analysis of a fine-tuned TinyLlama-1.1B model adapted for a task designated "HolyC" (presumably generation of HolyC, the systems programming language of TempleOS). The following sections detail the model's training procedure, give a quantitative analysis of its performance metrics, and summarize the key outcomes of the fine-tuning process.
The base model, TinyLlama-1.1B, is a compact language model chosen for its computational efficiency. Fine-tuning this foundation produced a model specialized for the HolyC task. The training process, outlined below, shaped the model's final performance characteristics and is explored in detail.
- Training Procedure
A well-defined training procedure is essential for reproducible and effective model performance. This section outlines the key parameters and dynamics of the fine-tuning process.
The run was configured for a total of 1915 steps. The available logs cover steps 10 through 560 (about 29% of the planned run), corresponding to roughly 1.46 epochs, or about 384 steps per epoch. This provides a clear view of the model's learning trajectory from initialization through a significant portion of the training cycle.
A key component of this strategy was the learning rate schedule: a linear warm-up followed by cosine decay. The warm-up ramped the learning rate to its peak of approximately 2e-4 by around step 60; starting small avoids destabilizing updates while the optimizer's statistics are still uninformed, and ramping to a high peak lets the model make rapid early progress. The subsequent cosine decay enables smaller, fine-grained weight adjustments as training proceeds, promoting stable convergence. This balance between aggressive early exploration and late-stage refinement is reflected in the stability metrics discussed later.
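The schedule can be sketched as follows. The peak (2e-4), warm-up length (~60 steps), and total step count (1915) are read from the logs; the exact scheduler implementation used in training is not documented, so this is an illustrative approximation:

```python
import math

def lr_at(step: int,
          total_steps: int = 1915,
          peak_lr: float = 2e-4,
          warmup_steps: int = 60) -> float:
    """Linear warm-up to peak_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(60)` returns the peak of 2e-4, and the value decays smoothly toward zero as `step` approaches 1915, matching the shape of the logged learning_rate column.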
The described procedure resulted in a consistent and measurable improvement in model performance, which is quantified in the following section.
- Training Performance & Metrics
This section provides a quantitative analysis of the model's performance during the fine-tuning process. Tracking these metrics is crucial for validating the effectiveness of the training methodology and for developing a clear understanding of the model's final capabilities.
3.1. Key Performance Summary
The following table summarizes the most critical performance indicators, capturing the model's state at the beginning and end of the logged training run, as well as the best performance achieved.
| Metric | Value |
|---|---|
| Initial Loss (step 10) | 1.4824 |
| Final Loss (step 560) | 0.6263 |
| Best Loss (step 530) | 0.4445 |
| Initial Accuracy (step 10) | 0.6967 |
| Final Accuracy (step 560) | 0.8483 |
| Best Accuracy (step 530) | 0.8850 |
3.2. Loss and Accuracy Analysis
The inverse relationship between training loss and mean token accuracy serves as a primary indicator of successful learning. During this run, the model exhibited a strong and consistent learning curve, though the rate of improvement evolved over time.
As expected, training loss decreased from 1.4824 to a final 0.6263, while mean token accuracy rose from 0.6967 to 0.8483. The largest gains occurred early in the run, particularly before epoch 0.6, where the loss curve descends most steeply: the model learned the dominant patterns in the data quickly, then entered a slower refinement phase for the remainder of the logged steps. The best loss (0.4445) and peak accuracy (0.8850) were both recorded at step 530; note that since both metrics are computed on the same training batches, a single favorable batch naturally improves both together, so this coincidence should not be over-interpreted.
3.3. Model Confidence and Stability
Beyond primary metrics, entropy and gradient norm provide deeper insights into the training dynamics and model's internal state.
Entropy, a measure of the model's predictive uncertainty, trended consistently downward. Since these are training-set measurements, falling entropy indicates that the model became progressively more confident in its predictions on the training data (it does not by itself demonstrate generalization). This growing certainty tracked correctness: as entropy decreased, mean token accuracy increased in near lockstep, suggesting the confidence was well-founded.
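For reference, the entropy column is (presumably) the Shannon entropy, in nats, of the model's next-token distribution, averaged over tokens. A minimal pure-Python sketch for a single token's logits:

```python
import math

def token_entropy(logits):
    """Shannon entropy (nats) of softmax(logits); lower = more confident."""
    m = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps)
```

A uniform distribution over four candidates gives ln(4) ≈ 1.386 nats, close to the entropy logged at the start of training; a sharply peaked distribution gives a value near zero, as seen toward step 530.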
The gradient norm, a proxy for training stability, fluctuated and spiked, which is typical. An early spike (about 2.65 at step 50, epoch 0.13) coincided with a volatile stretch in the loss and accuracy curves during the warm-up phase, and the largest spike (about 3.44 at step 250) also passed without derailing training. The model's recovery from these spikes, with the gradient norm otherwise remaining in a stable range, prevented training collapse and allowed the consistent convergence seen in the loss and accuracy metrics.
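The grad_norm column is presumably the global L2 norm over all parameter gradients. A common guard against such spikes is global-norm clipping; a dependency-free sketch (not necessarily the configuration used in this run):

```python
import math

def global_norm(grads):
    """Global L2 norm across all gradient vectors (lists of floats)."""
    return math.sqrt(sum(g * g for vec in grads for g in vec))

def clip_by_global_norm(grads, max_norm=1.0):
    """Uniformly scale all gradients down if their global norm exceeds max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [[g * scale for g in vec] for vec in grads]
```

Uniform scaling preserves the gradient's direction while bounding the step size, which is why clipped runs can absorb occasional spikes without diverging.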
The high-level metrics analyzed above are derived from the detailed, step-by-step training data presented in the next section.
- Detailed Training Logs
This section provides the granular, step-by-step data from the training process for full transparency and deeper analysis. The table below contains a comprehensive log of key metrics recorded at 10-step intervals during the fine-tuning run.
percentage step total_steps loss grad_norm learning_rate entropy num_tokens mean_token_accuracy epoch
1 10 1915 1.4824 0.45101499557495117 3.103448275862069e-05 1.321715521812439 81920.0 0.6967375308275223 0.03
1 20 1915 1.2781 0.39225730299949646 6.551724137931034e-05 1.2251619577407837 163840.0 0.7306818246841431 0.05
2 30 1915 1.3996 0.6645015478134155 0.0001 1.430495947599411 242396.0 0.7060131698846817 0.08
2 40 1915 1.5931 0.9289945960044861 0.00013448275862068965 1.5960611820220947 288417.0 0.6799979716539383 0.1
3 50 1915 1.7604 2.645050287246704 0.00016896551724137932 1.7315880894660949 305226.0 0.648864497244358 0.13
3 60 1915 1.2603 0.39722567796707153 0.00019999985689795547 1.237766793370247 387146.0 0.7256842613220215 0.16
4 70 1915 1.1531 0.3355954885482788 0.00019998268514817867 1.1732302904129028 469066.0 0.7462243437767029 0.18
4 80 1915 1.138 0.6724109053611755 0.00019993689862073447 1.138554748892784 542578.0 0.7465290904045105 0.21
5 90 1915 1.3104 0.687850832939148 0.00019986251041960483 1.3193124264478684 581743.0 0.7222554683685303 0.23
5 100 1915 1.3607 2.1803719997406006 0.00019975954183449446 1.3636296361684799 598988.0 0.7111111372709275 0.26
6 110 1915 1.0416 0.39899200201034546 0.00019962802233473753 1.0378663748502732 680908.0 0.7626832842826843 0.29
6 120 1915 1.0051 0.3992024064064026 0.000199467989560864 1.0225067436695099 762828.0 0.7680718511343002 0.31
7 130 1915 1.0186 0.5392997860908508 0.00019927948931382656 1.0269844591617585 832213.0 0.7732914119958878 0.34
7 140 1915 1.128 1.0151463747024536 0.00019906257554189297 1.1342894285917282 870177.0 0.7567992866039276 0.37
8 150 1915 1.1032 1.460560917854309 0.0001988173103252058 1.1415788918733596 884666.0 0.7649768382310868 0.39
8 160 1915 1.1305 0.4438685178756714 0.00019854376385801558 1.0764753699302674 966586.0 0.7493401795625687 0.42
9 170 1915 0.916 0.3598199188709259 0.0001982420144285912 0.9625475347042084 1048506.0 0.7849951088428497 0.44
9 180 1915 0.8729 0.5134291052818298 0.00019791214839681408 0.8617467492818832 1123815.0 0.7960918992757797 0.47
10 190 1915 1.0748 0.7894044518470764 0.0001975542601694622 1.1130005449056626 1166919.0 0.7604668140411377 0.5
10 200 1915 1.0278 1.1817182302474976 0.00019716845217319118 1.0831171095371246 1182872.0 0.7665993243455886 0.52
11 210 1915 0.9612 0.4200129806995392 0.00019675483482521993 0.9298439174890518 1264792.0 0.7776515126228333 0.55
11 220 1915 0.8514 0.4009917974472046 0.0001963135265017296 0.8816195160150528 1346712.0 0.797873905301094 0.57
12 230 1915 0.9058 0.5374516248703003 0.00019584465350398465 0.9023019880056381 1418607.0 0.7881518036127091 0.6
13 240 1915 0.9315 0.9440836906433105 0.00019534835002218585 0.9686785072088242 1459648.0 0.7887694537639618 0.63
13 250 1915 0.8801 3.436066150665283 0.00019482475809706512 0.9381696835160256 1476475.0 0.7897573739290238 0.65
14 260 1915 0.9189 0.4213131368160248 0.0001942740275792342 0.886342903971672 1558395.0 0.7877566039562225 0.68
14 270 1915 0.8936 0.4579828679561615 0.0001936963160862975 0.9317364454269409 1640315.0 0.78873410820961 0.7
15 280 1915 0.8966 0.5316987633705139 0.00019309178895774261 0.9148368030786515 1709554.0 0.7925915122032166 0.73
15 290 1915 0.8024 0.8919654488563538 0.00019246061920762046 0.8256901234388352 1749867.0 0.8177113652229309 0.76
16 300 1915 0.8152 1.2278205156326294 0.00019180298747502908 0.90315712839365 1766864.0 0.8097490608692169 0.78
16 310 1915 0.893 0.45286622643470764 0.00019111908197241536 0.8510008156299591 1848784.0 0.7878421276807785 0.81
17 320 1915 0.8047 0.41520369052886963 0.00019040909843170902 0.8477382659912109 1930704.0 0.8079545468091964 0.84
17 330 1915 0.8271 0.6131967902183533 0.00018967324004830468 0.8447065323591232 2006582.0 0.8029349774122239 0.86
18 340 1915 0.8217 0.9027134776115417 0.00018891171742290794 0.8573419392108917 2048981.0 0.8086616486310959 0.89
18 350 1915 0.7487 1.9160085916519165 0.00018812474850126188 0.842904993891716 2063934.0 0.8266607314348221 0.91
19 360 1915 0.782 0.5186044573783875 0.0001873125585117717 0.7414158508181572 2145854.0 0.8104838639497757 0.94
19 370 1915 0.8292 0.578879177570343 0.00018647537990104494 0.8780058741569519 2221436.0 0.8054068237543106 0.97
20 380 1915 0.7878 1.40608811378479 0.0001856134522673658 0.8799103111028671 2254509.0 0.8155989795923233 0.99
20 390 1915 0.7148 0.6034770011901855 0.00018472702229212288 0.7155006736516952 2314497.0 0.8257474452257156 1.02
21 400 1915 0.7255 0.4438634216785431 0.00018381634366920947 0.7692175805568695 2396417.0 0.8245845586061478 1.04
21 410 1915 0.6972 0.5070385336875916 0.0001828816770324171 0.7631408378481865 2478174.0 0.8336369872093201 1.07
22 420 1915 0.6937 0.7040484547615051 0.0001819232898808429 0.7366810888051987 2530997.0 0.8383352130651474 1.1
22 430 1915 0.5948 1.170653223991394 0.00018094145650233208 0.7168630555272102 2556310.0 0.8503484964370728 1.12
23 440 1915 0.5954 0.5066709518432617 0.00017993645789497734 0.5565064057707787 2615977.0 0.8543228834867478 1.15
23 450 1915 0.696 0.4153716266155243 0.00017890858168669802 0.7587181478738785 2697897.0 0.8316959947347641 1.17
24 460 1915 0.5781 0.7581884860992432 0.00017785812205292195 0.6157770216464996 2777788.0 0.8544550746679306 1.2
25 470 1915 0.6985 0.681121289730072 0.0001767853796323929 0.7583412379026413 2828315.0 0.8360416829586029 1.23
25 480 1915 0.513 1.6988823413848877 0.00017569066144112886 0.6617145895957947 2853288.0 0.8719803839921951 1.25
26 490 1915 0.5311 0.5656732320785522 0.0001745742807845547 0.48689740225672723 2912719.0 0.8682951658964158 1.28
26 500 1915 0.6274 0.5274895429611206 0.00017343655716783505 0.6705890193581581 2994639.0 0.8467130988836289 1.31
27 510 1915 0.6227 0.9059549570083618 0.00017227781620443282 0.6891606166958809 3071356.0 0.8481803476810456 1.33
27 520 1915 0.6094 0.8234562277793884 0.0001710983895229194 0.6939398571848869 3117773.0 0.8540870279073716 1.36
28 530 1915 0.4445 1.322658896446228 0.00016989861467206355 0.5844679683446884 3141027.0 0.8850283622741699 1.38
28 540 1915 0.5367 0.6937307119369507 0.00016867883502422575 0.5174223408102989 3200604.0 0.868792188167572 1.41
29 550 1915 0.6702 0.5078147649765015 0.000167439399677086 0.7037177085876465 3282524.0 0.8386730194091797 1.44
29 560 1915 0.6263 0.6944918036460876 0.000166180663353733 0.6936090737581253 3361104.0 0.8482726097106934 1.46
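Because the log above is plain whitespace-delimited text, it can be analyzed directly. A small sketch that parses rows in this format and recovers the best-loss step (three sample rows are copied from the log; full rows carry all ten columns):

```python
# step, loss, mean_token_accuracy (subset of the columns above, for brevity)
SAMPLE = """\
10 1.4824 0.6967
530 0.4445 0.8850
560 0.6263 0.8483
"""

rows = [line.split() for line in SAMPLE.splitlines()]
parsed = [(int(s), float(loss), float(acc)) for s, loss, acc in rows]

# Find the row with the lowest loss.
best_step, best_loss, best_acc = min(parsed, key=lambda r: r[1])
print(best_step)  # -> 530
```

The same pattern extended to all ten columns reproduces the summary figures in section 3.1.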
- Model Tree
Fine-tuned model: Aptlantis/TinyLlama-1.1B-HolyC
Base model: TinyLlama/TinyLlama-1.1B-Chat-v1.0