haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model

Model Overview

v2mini-eval2 is a model trained from scratch on 15 GB of 1800-1875 London texts using the Llama architecture. It was trained to validate the tokenizer before scaling to 90 GB.

| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (decoder-only Transformer) |
| Parameter Count | ~215 million (214.8M) |
| Training Type | Trained from scratch (10,000 steps) |
| Tokenizer | Custom BPE, vocab size 32,003 |
| Sequence Length | 4096 tokens (4x increase over eval1) |
| Attention Type | Grouped Query Attention (GQA) |

Configuration Details

eval2 uses a different configuration from eval1 to support the extended context length:

| Parameter | Value |
|---|---|
| Number of Layers | 24 |
| Hidden Size ($d$) | 768 |
| Intermediate Size ($d_{\text{ff}}$) | 2048 |
| Attention Heads | 12 (query) / 6 (key/value) |
| Activation Function | SiLU (silu) |
| Normalization | RMS Norm (rms_norm_eps: 1e-05) |
| Position Embeddings | RoPE (theta: 10,000) |
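
For reference, here is a minimal sketch of how this configuration could be expressed as a Hugging Face LlamaConfig. The values come from the table above; any field not listed there is an assumption or a library default.

```python
from transformers import LlamaConfig

# Sketch of the eval2 configuration using the values from the table above.
# Fields not shown in the table are left at transformers' defaults.
config = LlamaConfig(
    vocab_size=32_003,             # custom BPE tokenizer
    hidden_size=768,               # d
    intermediate_size=2048,        # d_ff
    num_hidden_layers=24,
    num_attention_heads=12,        # query heads
    num_key_value_heads=6,         # GQA: 6 key/value heads
    hidden_act="silu",
    max_position_embeddings=4096,  # 4096-token context
    rms_norm_eps=1e-5,
    rope_theta=10_000.0,
)
```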

Improvements & Fixes

  • Fixed Tokenization: eval1's output had a spacing issue (e.g., "D oes t ha t wor k"). This is fixed in eval2, so decoded output now has normal spacing (a quick spot-check is sketched below).
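
A simple way to verify the decode behaviour is to round-trip a sentence through the tokenizer. This is only a sketch; it assumes the tokenizer files in this repository load with AutoTokenizer, and the test sentence is illustrative.

```python
from transformers import AutoTokenizer

# Round-trip a sentence through the custom BPE tokenizer and confirm the
# decoded text keeps normal spacing (eval1 produced output like "D oes t ha t wor k").
tokenizer = AutoTokenizer.from_pretrained("haykgrigorian/v2mini-eval2")
ids = tokenizer("Does that work in the streets of London?", return_tensors="pt")
print(tokenizer.decode(ids["input_ids"][0], skip_special_tokens=True))
```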

Cost

Total time on the RunPod VM: 7 h 28 min

Total time spent on training: 5 h 52 min

A100 SXM cost: $1.49/hour

Total cost: $11.12

How to Load and Run the Model

Download all the files locally into a folder and run the test script. You will need to make a few adjustments in the run script, such as updating the config/file paths and the test prompts. A sketch of loading the model directly is shown below.
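
The following is a minimal sketch of loading and sampling from the model with the transformers library. It assumes the repository's config, weights, and tokenizer files are in the local folder (or downloadable from the Hub under the model id); the prompt and generation settings are illustrative, not prescribed by the test script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the local folder with the downloaded files, or the Hub repo id.
model_id = "haykgrigorian/v2mini-eval2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

# Illustrative test prompt; replace with your own, as noted above.
prompt = "The streets of London in the year 1850"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```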

Test script

A run file for testing and evaluating this model is available on the main project repository:

