# haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model

## Model Overview

v2mini-eval2 is a 215M-parameter model trained from scratch on 15 GB of London texts from 1800–1875 using the Llama architecture. It was trained to validate the tokenizer before scaling to the full 90 GB corpus.
| Detail | Value |
|---|---|
| Model Architecture | LlamaForCausalLM (Decoder-Only Transformer) |
| Parameter Count | ~215 Million (214.8M) |
| Training Type | Trained from Scratch (10,000 steps) |
| Tokenizer | Custom BPE, Vocab Size 32,003 |
| Sequence Length | 4096 tokens (4x increase from eval1) |
| Attention Type | Grouped Query Attention (GQA) |
## Configuration Details

eval2 uses a different configuration compared to eval1 to support the extended context length:
| Parameter | Value |
|---|---|
| Number of Layers | 24 |
| Hidden Size (d) | 768 |
| Intermediate Size ($\text{d}_{\text{ff}}$) | 2048 |
| Attention Heads | 12 (Query) / 6 (Key/Value) |
| Activation Function | SiLU (silu) |
| Normalization | RMS Norm (rms_norm_eps: 1e-05) |
| Position Embeddings | RoPE (Theta: 10,000) |
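For reference, the table above maps onto a Hugging Face `LlamaConfig` roughly as follows. This is a sketch based only on the values listed in the card, not the exact shipped `config.json`; field names follow the `transformers` API:

```python
from transformers import LlamaConfig

# Sketch of the eval2 configuration using the values from the table above.
config = LlamaConfig(
    vocab_size=32003,              # custom BPE vocabulary
    hidden_size=768,               # d
    intermediate_size=2048,        # d_ff
    num_hidden_layers=24,
    num_attention_heads=12,        # query heads
    num_key_value_heads=6,         # GQA: 6 KV heads shared by 12 query heads
    hidden_act="silu",
    max_position_embeddings=4096,  # sequence length
    rms_norm_eps=1e-05,
    rope_theta=10000.0,
)
```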
## Improvements & Fixes

- Fixed tokenization: eval1 output had a spacing issue (e.g., "D oes t ha t wor k"). This is fixed in eval2, and output now reads normally.
## Cost
- Total time in RunPod VM: 7 h 28 min
- Total time spent on training: 5 h 52 min
- A100 SXM cost: $1.49/hour
- Total cost: $11.12
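As a quick sanity check of the arithmetic, assuming the VM time of 7:28 means 7 hours 28 minutes and the whole session is billed at the A100 hourly rate:

```python
# Reproduce the total cost from the figures above.
vm_hours = 7 + 28 / 60      # 7 h 28 min of RunPod VM time
rate_usd_per_hour = 1.49    # A100 SXM hourly rate
total = vm_hours * rate_usd_per_hour
print(f"${total:.2f}")      # -> $11.13; matches the reported $11.12 up to
                            # cent-level rounding in billing
```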
## How to Load and Run the Model

Download all of the model files into a local folder and run the test script. You will need to make a few adjustments to the run script, such as updating the config/file paths and the test prompts.
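A minimal loading sketch, assuming the standard Hugging Face `transformers` API; the folder path and prompt below are placeholders to adjust for your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: the folder where you downloaded the model files
# (config.json, weights, and tokenizer files).
MODEL_DIR = "./v2mini-eval2"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
model.eval()

# Example prompt in the style of the training corpus (1800-1875 London texts).
prompt = "The streets of London were"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=True,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```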
### Test script
A run file for testing and evaluating this model is available on the main project repository:
- Test Script Link: test_v2mini_eval2.py on GitHub