haykgrigorian/TimeCapsuleLLM-v2-London-1800-1875: Llama-Architecture 1.2B Model
Model Overview
This v2 model was trained from scratch on 112 GB of London texts from 1800–1875, using a Llama-architecture causal language model.
| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~1.22B |
| Training Type | Trained from Scratch (Random Initialization) |
| Tokenizer | Custom BPE, Vocab Size 32,000 |
| Sequence Length | 2048 tokens |
| Attention Type | Grouped Query Attention (GQA) |
Configuration Details
This model uses a custom size and configuration based on the Llama architecture:
| Parameter | Value |
|---|---|
| Number of Layers | 22 |
| Hidden Size (d) | 2048 |
| Intermediate Size ($d_{\text{ff}}$) | 5504 |
| Attention Heads | 16 (Query) / 8 (Key/Value) |
| Activation Function | SiLU (silu) |
| Normalization | RMS Norm (rms_norm_eps: 1e-06) |
| Position Embeddings | Rotary Positional Embeddings (RoPE) |
Training Info
This model was trained for 182,000 steps, about 0.5 epochs.
Training Metrics:
- Final Training Loss: 3.3951
- Starting Training Loss: 10.7932
- Training Steps: 182,000
- Epochs: 0.4997
- Gradient Norm Stability: consistently stable between 0.50 and 0.60 in later stages
- Training Time: 117 hours 51 minutes
Cost
This model was trained on an H100 SXM rented from RunPod.
Total: $340.97 (roughly $2.89/hour over the 117-hour-51-minute run)
How to Load and Run the Model
Download all of the model files into a local folder and run the test script. You will need to make some adjustments to the run script, such as updating the config/file paths and the test prompts.
Test Script
A run file for testing and evaluating this model is available on the main project repository:
- Test Script Link: run_v2.py on GitHub