TinyLlama

Fine-tuned TinyLlama-1.1B for the HolyC Task

  1. Model Description

This document provides a detailed analysis of a fine-tuned TinyLlama-1.1B model, which has been specifically adapted for a task designated as "HolyC". The following sections detail the model's training procedure, offer a quantitative analysis of its performance metrics, and present the key outcomes derived from the fine-tuning process.

The base model, TinyLlama-1.1B, is a compact yet powerful language model known for its computational efficiency. By fine-tuning this foundation, we have engineered a specialized tool tailored to the unique demands of the HolyC task. The methodical training process, outlined below, was instrumental in shaping the model's final performance characteristics and will be explored in detail.

  2. Training Procedure

A well-defined training procedure is essential for reproducible and effective model performance. This section outlines the key parameters and dynamics of the fine-tuning process, which was designed to systematically enhance the model's capabilities on the target task.

The training run was configured for a total of 1915 steps. The available logs cover the process up to step 560, representing the completion of approximately 1.46 epochs. This data provides a clear view into the model's learning trajectory from initialization through a significant portion of the training cycle.
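The arithmetic implied by these figures can be checked directly; a minimal Python sketch using only the values quoted above (the steps-per-epoch count is derived, not taken from the logs):

```python
# Derive the run's shape from the logged figures (1915 total steps,
# 560 logged steps covering roughly 1.46 epochs).
total_steps = 1915
logged_steps = 560
logged_epochs = 1.46

steps_per_epoch = logged_steps / logged_epochs   # roughly 383.6
planned_epochs = total_steps / steps_per_epoch   # roughly 5.0

assert 380 < steps_per_epoch < 390
assert 4.9 < planned_epochs < 5.1
```

This suggests the full 1915-step configuration corresponds to roughly five epochs over the training set.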

A key component of this strategy was the learning rate schedule, which employed an initial warm-up phase followed by a cosine decay. This design is intentional: the warm-up, which ramped the learning rate to its peak of approximately 0.0002 around step 60, keeps early updates small while the optimizer's statistics are still poorly calibrated, preventing destabilizing steps at the very start of training. The subsequent gradual decay enables ever finer weight adjustments as the model approaches a good solution, promoting stable convergence. This approach balances aggressive mid-run exploration with later-stage refinement, a balance whose effectiveness is reflected in the stability metrics discussed later.
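The schedule described above can be sketched as a small function. The peak learning rate (~0.0002), warm-up length (~60 steps), and total step count (1915) are read off the logs; the exact formula used by the trainer is an assumption:

```python
import math

# Sketch of a linear warm-up followed by cosine decay. Constants are
# taken from the logs; the precise trainer formula is assumed.
PEAK_LR = 2e-4
WARMUP_STEPS = 60
TOTAL_STEPS = 1915

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down toward 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Evaluating `lr_at` at the logged steps closely tracks the `learning_rate` column in the detailed logs (e.g. ~1.9999e-4 at step 70, ~1.662e-4 at step 560).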

The described procedure resulted in a consistent and measurable improvement in model performance, which is quantified in the following section.

  3. Training Performance & Metrics

This section provides a quantitative analysis of the model's performance during the fine-tuning process. Tracking these metrics is crucial for validating the effectiveness of the training methodology and for developing a clear understanding of the model's final capabilities.

3.1. Key Performance Summary


The following table summarizes the most critical performance indicators, capturing the model's state at the beginning and end of the logged training run, as well as the best performance achieved.

| Metric           | Value  |
|------------------|--------|
| Initial Loss     | 1.4824 |
| Final Loss       | 0.6263 |
| Best Loss        | 0.4445 |
| Initial Accuracy | 0.6967 |
| Final Accuracy   | 0.8483 |
| Best Accuracy    | 0.8850 |

3.2. Loss and Accuracy Analysis

The inverse relationship between training loss and mean token accuracy serves as a primary indicator of successful learning. During this run, the model exhibited a strong and consistent learning curve, though the rate of improvement evolved over time.

As expected, training loss decreased consistently from 1.4824 to a final value of 0.6263, while mean token accuracy rose from 0.6967 to 0.8483. A closer look at the trends reveals that the largest gains occurred early in the run, before roughly epoch 0.6, where the loss curve is at its steepest. This indicates the model learned the primary patterns in the data quickly; the subsequent, more marginal improvements suggest it entered a refinement phase for the remainder of the logged steps. Encouragingly, the best loss (0.4445) and peak accuracy (0.8850) were both recorded at the same point, step 530.
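The coincidence of best loss and peak accuracy is easy to verify mechanically; a minimal sketch over a hand-picked subset of rows from the training log:

```python
# (step, loss, mean_token_accuracy) triples, hand-copied from the log table.
records = [
    (10, 1.4824, 0.6967),
    (230, 0.9058, 0.7882),
    (480, 0.5130, 0.8720),
    (530, 0.4445, 0.8850),
    (560, 0.6263, 0.8483),
]

# The step with minimum loss and the step with maximum accuracy coincide.
best_loss_step = min(records, key=lambda r: r[1])[0]
best_acc_step = max(records, key=lambda r: r[2])[0]
assert best_loss_step == best_acc_step == 530
```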

3.3. Model Confidence and Stability

Beyond the primary metrics, entropy and gradient norm provide deeper insight into the training dynamics and the model's internal state.

Entropy, a measure of the model's predictive uncertainty, showed a consistent downward trend, falling from roughly 1.32 to 0.69 nats over the logged run. This indicates the model became progressively more confident in its predictions on the training data (falling training entropy is evidence of confidence, not of generalization, which would require a held-out evaluation). This growing certainty tracked correctness: as entropy decreased, mean token accuracy increased in lockstep, suggesting the model's confidence was well-founded.
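For reference, the entropy reported in the logs is most plausibly the Shannon entropy of the model's next-token distribution, in nats; a minimal sketch (the probability vectors are illustrative, not taken from the model):

```python
import math

def token_entropy(probs: list[float]) -> float:
    # Shannon entropy H = -sum(p * ln p), in nats.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A confident (peaked) prediction has low entropy...
confident = [0.97, 0.01, 0.01, 0.01]
# ...while a flat distribution over 4 tokens has entropy ln(4), about 1.386.
uncertain = [0.25, 0.25, 0.25, 0.25]

assert token_entropy(confident) < 0.2
assert abs(token_entropy(uncertain) - math.log(4)) < 1e-12
```

On this scale, the early-run values near 1.3 in the log correspond to uncertainty roughly comparable to a flat choice among four candidate tokens.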

The gradient norm, a proxy for training stability, exhibited a fluctuating, spiky pattern, which is typical of fine-tuning runs. An early spike to 2.65 around epoch 0.13 coincided with a volatile stretch in the loss and accuracy curves as the learning rate climbed toward its peak, and the largest spike of the run, 3.44, occurred near epoch 0.65. In both cases the model recovered within a few logging intervals without divergence, demonstrating the robustness of the training configuration. Outside these spikes the gradient norm stayed within a stable range (roughly 0.3 to 1.0), which prevented training collapse and allowed the consistent convergence seen in the loss and accuracy metrics.
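The `grad_norm` column is the global L2 norm over all parameter gradients. A minimal sketch of that computation, together with the norm-based clipping trainers commonly apply (the 1.0 threshold is an assumption; logged values are typically the pre-clipping norm):

```python
import math

def global_norm(grads: list[list[float]]) -> float:
    # L2 norm over every gradient entry, across all parameter groups.
    return math.sqrt(sum(g * g for group in grads for g in group))

def clip_by_global_norm(grads: list[list[float]], max_norm: float = 1.0):
    # Rescale all gradients uniformly if the global norm exceeds max_norm.
    norm = global_norm(grads)
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [[g * scale for g in group] for group in grads]

assert global_norm([[3.0, 4.0]]) == 5.0
```

Uniform rescaling preserves the gradient's direction while bounding its magnitude, which is why clipping tames spikes like the 3.44 seen here without distorting the update.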

The high-level metrics analyzed above are derived from the detailed, step-by-step training data presented in the next section.

  4. Detailed Training Logs

This section provides the granular, step-by-step data from the training process for full transparency and deeper analysis. The table below contains a comprehensive log of key metrics recorded at 10-step intervals during the fine-tuning run.

percentage	step	total_steps	loss	grad_norm	learning_rate	entropy	num_tokens	mean_token_accuracy	epoch

1	10	1915	1.4824	0.45101499557495117	3.103448275862069e-05	1.321715521812439	81920.0	0.6967375308275223	0.03
1	20	1915	1.2781	0.39225730299949646	6.551724137931034e-05	1.2251619577407837	163840.0	0.7306818246841431	0.05
2	30	1915	1.3996	0.6645015478134155	0.0001	1.430495947599411	242396.0	0.7060131698846817	0.08
2	40	1915	1.5931	0.9289945960044861	0.00013448275862068965	1.5960611820220947	288417.0	0.6799979716539383	0.1
3	50	1915	1.7604	2.645050287246704	0.00016896551724137932	1.7315880894660949	305226.0	0.648864497244358	0.13
3	60	1915	1.2603	0.39722567796707153	0.00019999985689795547	1.237766793370247	387146.0	0.7256842613220215	0.16
4	70	1915	1.1531	0.3355954885482788	0.00019998268514817867	1.1732302904129028	469066.0	0.7462243437767029	0.18
4	80	1915	1.138	0.6724109053611755	0.00019993689862073447	1.138554748892784	542578.0	0.7465290904045105	0.21
5	90	1915	1.3104	0.687850832939148	0.00019986251041960483	1.3193124264478684	581743.0	0.7222554683685303	0.23
5	100	1915	1.3607	2.1803719997406006	0.00019975954183449446	1.3636296361684799	598988.0	0.7111111372709275	0.26
6	110	1915	1.0416	0.39899200201034546	0.00019962802233473753	1.0378663748502732	680908.0	0.7626832842826843	0.29
6	120	1915	1.0051	0.3992024064064026	0.000199467989560864	1.0225067436695099	762828.0	0.7680718511343002	0.31
7	130	1915	1.0186	0.5392997860908508	0.00019927948931382656	1.0269844591617585	832213.0	0.7732914119958878	0.34
7	140	1915	1.128	1.0151463747024536	0.00019906257554189297	1.1342894285917282	870177.0	0.7567992866039276	0.37
8	150	1915	1.1032	1.460560917854309	0.0001988173103252058	1.1415788918733596	884666.0	0.7649768382310868	0.39
8	160	1915	1.1305	0.4438685178756714	0.00019854376385801558	1.0764753699302674	966586.0	0.7493401795625687	0.42
9	170	1915	0.916	0.3598199188709259	0.0001982420144285912	0.9625475347042084	1048506.0	0.7849951088428497	0.44
9	180	1915	0.8729	0.5134291052818298	0.00019791214839681408	0.8617467492818832	1123815.0	0.7960918992757797	0.47
10	190	1915	1.0748	0.7894044518470764	0.0001975542601694622	1.1130005449056626	1166919.0	0.7604668140411377	0.5
10	200	1915	1.0278	1.1817182302474976	0.00019716845217319118	1.0831171095371246	1182872.0	0.7665993243455886	0.52
11	210	1915	0.9612	0.4200129806995392	0.00019675483482521993	0.9298439174890518	1264792.0	0.7776515126228333	0.55
11	220	1915	0.8514	0.4009917974472046	0.0001963135265017296	0.8816195160150528	1346712.0	0.797873905301094	0.57
12	230	1915	0.9058	0.5374516248703003	0.00019584465350398465	0.9023019880056381	1418607.0	0.7881518036127091	0.6
13	240	1915	0.9315	0.9440836906433105	0.00019534835002218585	0.9686785072088242	1459648.0	0.7887694537639618	0.63
13	250	1915	0.8801	3.436066150665283	0.00019482475809706512	0.9381696835160256	1476475.0	0.7897573739290238	0.65
14	260	1915	0.9189	0.4213131368160248	0.0001942740275792342	0.886342903971672	1558395.0	0.7877566039562225	0.68
14	270	1915	0.8936	0.4579828679561615	0.0001936963160862975	0.9317364454269409	1640315.0	0.78873410820961	0.7
15	280	1915	0.8966	0.5316987633705139	0.00019309178895774261	0.9148368030786515	1709554.0	0.7925915122032166	0.73
15	290	1915	0.8024	0.8919654488563538	0.00019246061920762046	0.8256901234388352	1749867.0	0.8177113652229309	0.76
16	300	1915	0.8152	1.2278205156326294	0.00019180298747502908	0.90315712839365	1766864.0	0.8097490608692169	0.78
16	310	1915	0.893	0.45286622643470764	0.00019111908197241536	0.8510008156299591	1848784.0	0.7878421276807785	0.81
17	320	1915	0.8047	0.41520369052886963	0.00019040909843170902	0.8477382659912109	1930704.0	0.8079545468091964	0.84
17	330	1915	0.8271	0.6131967902183533	0.00018967324004830468	0.8447065323591232	2006582.0	0.8029349774122239	0.86
18	340	1915	0.8217	0.9027134776115417	0.00018891171742290794	0.8573419392108917	2048981.0	0.8086616486310959	0.89
18	350	1915	0.7487	1.9160085916519165	0.00018812474850126188	0.842904993891716	2063934.0	0.8266607314348221	0.91
19	360	1915	0.782	0.5186044573783875	0.0001873125585117717	0.7414158508181572	2145854.0	0.8104838639497757	0.94
19	370	1915	0.8292	0.578879177570343	0.00018647537990104494	0.8780058741569519	2221436.0	0.8054068237543106	0.97
20	380	1915	0.7878	1.40608811378479	0.0001856134522673658	0.8799103111028671	2254509.0	0.8155989795923233	0.99
20	390	1915	0.7148	0.6034770011901855	0.00018472702229212288	0.7155006736516952	2314497.0	0.8257474452257156	1.02
21	400	1915	0.7255	0.4438634216785431	0.00018381634366920947	0.7692175805568695	2396417.0	0.8245845586061478	1.04
21	410	1915	0.6972	0.5070385336875916	0.0001828816770324171	0.7631408378481865	2478174.0	0.8336369872093201	1.07
22	420	1915	0.6937	0.7040484547615051	0.0001819232898808429	0.7366810888051987	2530997.0	0.8383352130651474	1.1
22	430	1915	0.5948	1.170653223991394	0.00018094145650233208	0.7168630555272102	2556310.0	0.8503484964370728	1.12
23	440	1915	0.5954	0.5066709518432617	0.00017993645789497734	0.5565064057707787	2615977.0	0.8543228834867478	1.15
23	450	1915	0.696	0.4153716266155243	0.00017890858168669802	0.7587181478738785	2697897.0	0.8316959947347641	1.17
24	460	1915	0.5781	0.7581884860992432	0.00017785812205292195	0.6157770216464996	2777788.0	0.8544550746679306	1.2
25	470	1915	0.6985	0.681121289730072	0.0001767853796323929	0.7583412379026413	2828315.0	0.8360416829586029	1.23
25	480	1915	0.513	1.6988823413848877	0.00017569066144112886	0.6617145895957947	2853288.0	0.8719803839921951	1.25
26	490	1915	0.5311	0.5656732320785522	0.0001745742807845547	0.48689740225672723	2912719.0	0.8682951658964158	1.28
26	500	1915	0.6274	0.5274895429611206	0.00017343655716783505	0.6705890193581581	2994639.0	0.8467130988836289	1.31
27	510	1915	0.6227	0.9059549570083618	0.00017227781620443282	0.6891606166958809	3071356.0	0.8481803476810456	1.33
27	520	1915	0.6094	0.8234562277793884	0.0001710983895229194	0.6939398571848869	3117773.0	0.8540870279073716	1.36
28	530	1915	0.4445	1.322658896446228	0.00016989861467206355	0.5844679683446884	3141027.0	0.8850283622741699	1.38
28	540	1915	0.5367	0.6937307119369507	0.00016867883502422575	0.5174223408102989	3200604.0	0.868792188167572	1.41
29	550	1915	0.6702	0.5078147649765015	0.000167439399677086	0.7037177085876465	3282524.0	0.8386730194091797	1.44
29	560	1915	0.6263	0.6944918036460876	0.000166180663353733	0.6936090737581253	3361104.0	0.8482726097106934	1.46
Model repository: Aptlantis/TinyLlama-1.1B-HolyC