haykgrigorian/TimeCapsuleLLM-v2-London-1800-1875: Llama-Architecture 1.2B Model

Model Overview

The v2 model, trained from scratch on 112 GB of 1800-1875 London texts using a Llama-based causal language model architecture.

| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~1.22B |
| Training Type | Trained from Scratch (Random Initialization) |
| Tokenizer | Custom BPE, Vocab Size 32,000 |
| Sequence Length | 2048 tokens |
| Attention Type | Grouped Query Attention (GQA) |
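As a rough illustration of the tokenizer setup, the sketch below trains a 32,000-entry byte-level BPE tokenizer with the Hugging Face `tokenizers` library. The corpus file paths and special tokens are placeholders, not the actual tokenizer training code for this model.

```python
# Minimal sketch: training a 32k-vocab BPE tokenizer (paths/special tokens assumed).
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import ByteLevel

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = ByteLevel(add_prefix_space=False)

trainer = BpeTrainer(
    vocab_size=32000,
    special_tokens=["<unk>", "<s>", "</s>"],  # assumed special tokens
)

# Train on plain-text files from the 1800-1875 London corpus (illustrative paths).
tokenizer.train(files=["london_corpus_part1.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```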

Configuration Details

This model uses a custom size and configuration based on the Llama architecture:

| Parameter | Value |
|---|---|
| Number of Layers | 22 |
| Hidden Size ($d$) | 2048 |
| Intermediate Size ($d_{\text{ff}}$) | 5504 |
| Attention Heads | 16 (Query) / 8 (Key/Value) |
| Activation Function | SiLU (`silu`) |
| Normalization | RMSNorm (`rms_norm_eps`: 1e-06) |
| Position Embeddings | Rotary Positional Embeddings (RoPE) |
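The configuration above can be expressed as a Hugging Face `transformers` `LlamaConfig`. This is a minimal sketch assembled from the tables in this card; anything not listed there (e.g. `max_position_embeddings` matching the 2048-token sequence length) is an assumption.

```python
# Minimal sketch of the architecture described above (values from this card).
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=32000,              # custom BPE tokenizer
    hidden_size=2048,              # d
    intermediate_size=5504,        # d_ff
    num_hidden_layers=22,
    num_attention_heads=16,        # query heads
    num_key_value_heads=8,         # GQA: 8 key/value heads
    hidden_act="silu",
    rms_norm_eps=1e-6,
    max_position_embeddings=2048,  # assumed to match the training sequence length
)

model = LlamaForCausalLM(config)  # random initialization, i.e. trained from scratch
print(sum(p.numel() for p in model.parameters()))  # roughly 1.2B parameters
```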

Training Info

This model was trained for 182,000 steps, about 0.5 epochs.

Training Metrics:

- Final Training Loss: 3.3951
- Start Training Loss: 10.7932
- Training Steps: 182,000
- Epochs: 0.4997
- Gradient Norm Stability: consistently stable between 0.50 and 0.60 in the later stages of training
- Training Time: 117 hours 51 minutes

Cost

This model was trained on an H100 SXM rented from RunPod.

Total: $340.97
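For reference, $340.97 over roughly 118 GPU-hours works out to about $2.89 per H100 SXM hour.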

How to Load and Run the Model

Download all of the files into a local folder and run the test script. You will need to make a few adjustments in the run script, such as updating the config/file paths and the test prompts.
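Independently of the test script, a minimal loading sketch with the `transformers` library would look roughly like the following; the local folder name, prompt, and generation settings are placeholders.

```python
# Minimal sketch: load the checkpoint from a local folder and sample a completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "./TimeCapsuleLLM-v2-llama-1.2B"  # folder containing the downloaded files
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "On the morning of the 14th of June, the streets of London"  # example prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```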

Test script

A run file for testing and evaluating this model is available on the main project repository:
