haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model

Model Overview

v2mini-eval2 is a model trained from scratch on 15 GB of 1800-1875 London texts using the Llama architecture. It was trained to validate the tokenizer before scaling to 90 GB.

| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (decoder-only Transformer) |
| Parameter Count | ~215 million (214.8M) |
| Training Type | Trained from scratch (10,000 steps) |
| Tokenizer | Custom BPE, vocab size 32,003 |
| Sequence Length | 4096 tokens (4x increase over eval1) |
| Attention Type | Grouped Query Attention (GQA) |

Configuration Details

eval2 uses a different configuration from eval1 to support the extended context length:

| Parameter | Value |
|---|---|
| Number of Layers | 24 |
| Hidden Size ($d$) | 768 |
| Intermediate Size ($d_{\text{ff}}$) | 2048 |
| Attention Heads | 12 (query) / 6 (key/value) |
| Activation Function | SiLU (silu) |
| Normalization | RMS Norm (rms_norm_eps: 1e-05) |
| Position Embeddings | RoPE (theta: 10,000) |
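
For reference, here is a minimal sketch of how this configuration could be expressed as a Hugging Face LlamaConfig. The values come from the table above; any field not listed there is an assumption or a library default.

```python
from transformers import LlamaConfig

# Sketch of the eval2 configuration using the values from the table above.
# Fields not shown in the table are left at transformers' defaults.
config = LlamaConfig(
    vocab_size=32_003,             # custom BPE tokenizer
    hidden_size=768,               # d
    intermediate_size=2048,        # d_ff
    num_hidden_layers=24,
    num_attention_heads=12,        # query heads
    num_key_value_heads=6,         # GQA: 6 key/value heads
    hidden_act="silu",
    max_position_embeddings=4096,  # 4096-token context
    rms_norm_eps=1e-5,
    rope_theta=10_000.0,
)
```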

Improvements & Fixes

  • Fixed Tokenization: eval1's output had a spacing issue (e.g., "D oes t ha t wor k"). This is fixed in eval2, so decoded output now has normal spacing (a quick spot-check is sketched below).
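
A simple way to verify the decode behaviour is to round-trip a sentence through the tokenizer. This is only a sketch; it assumes the tokenizer files in this repository load with AutoTokenizer, and the test sentence is illustrative.

```python
from transformers import AutoTokenizer

# Round-trip a sentence through the custom BPE tokenizer and confirm the
# decoded text keeps normal spacing (eval1 produced output like "D oes t ha t wor k").
tokenizer = AutoTokenizer.from_pretrained("haykgrigorian/v2mini-eval2")
ids = tokenizer("Does that work in the streets of London?", return_tensors="pt")
print(tokenizer.decode(ids["input_ids"][0], skip_special_tokens=True))
```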

Cost

Total time on the RunPod VM: 7 h 28 min

Total time spent on training: 5 h 52 min

A100 SXM cost: $1.49/hour

Total cost: $11.12

How to Load and Run the Model

Download all the files locally into a folder and run the test script. You will need to make a few adjustments in the run script, such as updating the config/file paths and the test prompts. A sketch of loading the model directly is shown below.
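
The following is a minimal sketch of loading and sampling from the model with the transformers library. It assumes the repository's config, weights, and tokenizer files are in the local folder (or downloadable from the Hub under the model id); the prompt and generation settings are illustrative, not prescribed by the test script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the local folder with the downloaded files, or the Hub repo id.
model_id = "haykgrigorian/v2mini-eval2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

# Illustrative test prompt; replace with your own, as noted above.
prompt = "The streets of London in the year 1850"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```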

Test script

A run file for testing and evaluating this model is available on the main project repository:

