---
license: mit
datasets:
  - haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
language:
  - en
pipeline_tag: text-generation
library_name: transformers
---

# haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model

## Model Overview

v2mini-eval2 is a ~215M-parameter model trained from scratch on 15GB of 1800-1875 London texts using the Llama architecture. It was trained to validate the custom tokenizer before scaling to 90GB.

| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (decoder-only Transformer) |
| Parameter Count | ~215 million (214.8M) |
| Training Type | Trained from scratch (10,000 steps) |
| Tokenizer | Custom BPE, vocab size 32,003 |
| Sequence Length | 4096 tokens (4x increase from eval1) |
| Attention Type | Grouped Query Attention (GQA) |

## Configuration Details

eval2 uses a different configuration compared to eval1 to support the extended context length:

| Parameter | Value |
|---|---|
| Number of Layers | 24 |
| Hidden Size ($d$) | 768 |
| Intermediate Size ($d_{\text{ff}}$) | 2048 |
| Attention Heads | 12 (query) / 6 (key/value) |
| Activation Function | SiLU (`silu`) |
| Normalization | RMSNorm (`rms_norm_eps: 1e-05`) |
| Position Embeddings | RoPE (theta: 10,000) |
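As a sanity check on the configuration above, a back-of-the-envelope parameter count can be computed from the table values alone. This is a rough sketch: it assumes untied input/output embeddings and the standard Llama layer layout, and the table does not pin down every detail, so the estimate lands close to, but not exactly on, the reported 214.8M.

```python
# Rough parameter-count estimate from the eval2 configuration table.
# Assumes untied embeddings and no attention biases (standard Llama layout).
vocab, d, d_ff, layers = 32_003, 768, 2048, 24
n_q_heads, n_kv_heads = 12, 6
head_dim = d // n_q_heads  # 64

embed = vocab * d                                        # input embeddings
lm_head = vocab * d                                      # output projection (untied)
attn = d * d + 2 * d * (n_kv_heads * head_dim) + d * d   # Wq, Wk, Wv, Wo (GQA shrinks Wk/Wv)
mlp = 3 * d * d_ff                                       # gate, up, down projections (SwiGLU)
norms = 2 * d                                            # two RMSNorm weights per layer
per_layer = attn + mlp + norms

total = embed + lm_head + layers * per_layer + d         # + final norm
print(f"{total / 1e6:.1f}M parameters")                  # ~205M with these assumptions
```

The GQA saving is visible in the `attn` term: the key and value projections map to only 6 x 64 = 384 dimensions instead of the full 768.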

## Improvements & Fixes

- Fixed tokenization: eval1 output had a spacing issue (e.g., "D oes t ha t wor k"). This has been fixed, and output now looks normal.

## Cost

| Item | Value |
|---|---|
| Total time on the RunPod VM | 7:28 (h:mm) |
| Time spent on training | 5:52 (h:mm) |
| A100 SXM rate | $1.49/hour |
| Total cost | $11.12 |

## How to Load and Run the Model

Download all of the model files into a local folder and run the test script. You will need to make some adjustments to the run script, such as updating the config/file paths and the test prompts.
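For loading directly from the Hub instead of a local folder, a minimal sketch with transformers looks like the following. The repo id is taken from this card; the prompt and generation settings (sampling temperature, token budget) are illustrative, not the project's official test configuration.

```python
# Minimal sketch: load the model from the Hub and sample a continuation.
# Prompt and sampling settings below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haykgrigorian/v2mini-eval2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "In the year 1834, the streets of London"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
text = tokenizer.decode(output[0], skip_special_tokens=True)
print(text)
```

To load from a local folder instead, pass the folder path in place of `model_id`.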

### Test script

A run file for testing and evaluating this model is available in the main project repository: