---
license: mit
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model

## Model Overview

**v2mini-eval2** is a model trained from scratch on 15GB of London texts from 1800-1875, using the Llama architecture. It was trained to validate the tokenizer before scaling to 90GB.

| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~215 Million (214.8M)** |
| **Training Type** | Trained **from Scratch** (10,000 steps) |
| **Tokenizer** | Custom BPE, Vocab Size 32,003 |
| **Sequence Length** | 4096 tokens (4x increase from eval1) |
| **Attention Type** | Grouped Query Attention (GQA) |

## Configuration Details

eval2 uses a different configuration from eval1 to support the extended context length:

| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 24 |
| **Hidden Size (d)** | 768 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2048 |
| **Attention Heads** | 12 (Query) / 6 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | RoPE (Theta: 10,000) |

## Improvements & Fixes

* **Fixed Tokenization:** Output from `eval1` had a spacing issue (e.g., "D oes t ha t wor k"). This is fixed in eval2, and output should now be spaced normally.

## Cost

* Total time in RunPod VM: 7:28
* Total time spent on training: 5:52
* A100 SXM cost: $1.49/hour
* Total cost: $11.12

### How to Load and Run the Model

Install all the files locally in a folder and run the test script.
You will need to adjust the run script first, for example by updating the config/file paths and the test prompts.

### Test script

A run file for testing and evaluating this model is available on the main project repository:

* **Test Script Link:** [test_v2mini_eval2.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval2.py)
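
For a quick check without the test script, the model can probably also be loaded with the standard `transformers` auto classes. The following is a minimal sketch, not the project's test script. It assumes the downloaded folder contains the config, weights, and tokenizer files in a layout that `AutoTokenizer` and `AutoModelForCausalLM` can read. The folder name `./v2mini-eval2`, the helper names, and the sampling settings are all illustrative assumptions.

```python
from pathlib import Path

# Hypothetical local folder holding the downloaded model files (assumption).
MODEL_DIR = Path("./v2mini-eval2")

# Sequence length listed in the Model Overview table.
MAX_CONTEXT = 4096


def generation_budget(prompt_tokens: int, max_new_tokens: int,
                      context: int = MAX_CONTEXT) -> int:
    """Clamp max_new_tokens so prompt plus continuation fits in the context window."""
    if prompt_tokens >= context:
        raise ValueError("prompt already fills the context window")
    return min(max_new_tokens, context - prompt_tokens)


def generate(prompt: str, model_dir: Path = MODEL_DIR,
             max_new_tokens: int = 200) -> str:
    """Load the model from a local folder and sample one continuation."""
    # Imported lazily so generation_budget can be used without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
    model = AutoModelForCausalLM.from_pretrained(str(model_dir))
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    budget = generation_budget(inputs["input_ids"].shape[1], max_new_tokens)

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs.get("attention_mask"),
            max_new_tokens=budget,
            do_sample=True,      # illustrative sampling settings, not tuned
            temperature=0.8,
            top_p=0.95,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("It was a cold morning in London, and")` should return the prompt followed by a sampled continuation. Because this is a base model trained only on period text, prompts that read like the opening of a passage are likely to work better than questions or instructions.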