haykgrigorian/TimeCapsuleLLM-v2-London-1800-1875: Llama-Architecture 1.2B Model
Model Overview
This v2 model was trained from scratch on 112 GB of London texts from 1800–1875, using a Llama-architecture causal language model.
| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~1.22B |
| Training Type | Trained from Scratch (Random Initialization) |
| Tokenizer | Custom BPE, Vocab Size 32,000 |
| Sequence Length | 2048 tokens |
| Attention Type | Grouped Query Attention (GQA) |
Configuration Details
This model uses a custom size and configuration based on the Llama architecture:
| Parameter | Value |
|---|---|
| Number of Layers | 22 |
| Hidden Size (d) | 2048 |
| Intermediate Size ($d_{\text{ff}}$) | 5504 |
| Attention Heads | 16 (Query) / 8 (Key/Value) |
| Activation Function | SiLU (silu) |
| Normalization | RMS Norm (rms_norm_eps: 1e-06) |
| Position Embeddings | Rotary Positional Embeddings (RoPE) |
Training Info
This model was trained for 182,000 steps, about 0.5 epochs.
Training Metrics:
- Final Training Loss: 3.3951
- Starting Training Loss: 10.7932
- Training Steps: 182,000
- Epochs: 0.4997
- Gradient Norm Stability: consistently stable between 0.50 and 0.60 in later stages
- Training Time: 117 hours 51 minutes
Cost
This model was trained on an H100 SXM rented from RunPod.
Total: $340.97 (roughly $2.89/hour over the 117-hour-51-minute run)
How to Load and Run the Model
Download all of the model files into a local folder and run the test script. You will need to make some adjustments to the run script, such as updating the config/file paths and the test prompts.
Test Script
A run file for testing and evaluating this model is available on the main project repository:
- Test Script Link: run_v2.py on GitHub