---
license: mit
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
language:
- en
pipeline_tag: text-generation
library_name: transformers
---


# haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model

## Model Overview

**v2mini-eval2** is a ~215M-parameter model trained from scratch on 15GB of London texts from 1800-1875, using the Llama architecture. It was trained to validate the tokenizer before scaling to 90GB.

| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~215 Million (214.8M)** |
| **Training Type** | Trained **from Scratch** (10,000 steps) |
| **Tokenizer** | Custom BPE, Vocab Size 32,003 |
| **Sequence Length** | 4096 tokens (4x increase from eval1) |
| **Attention Type** | Grouped Query Attention (GQA) |

## Configuration Details

eval2 uses a different configuration from eval1 to support the extended context length:

| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 24 |
| **Hidden Size (d)** | 768 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2048 |
| **Attention Heads** | 12 (Query) / 6 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | RoPE (Theta: 10,000) |
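
To make the grouped-query setup concrete, the sketch below derives the per-head shapes implied by the table and estimates the KV-cache size at the full 4096-token context. It assumes the usual Llama convention that the head dimension equals hidden size divided by the number of query heads, and a 2-byte (fp16/bf16) cache; neither is stated in this card, so treat the numbers as an estimate rather than a measurement.

```python
# Values copied from the tables in this card.
hidden_size = 768
num_query_heads = 12
num_kv_heads = 6
num_layers = 24
max_seq_len = 4096

# Assumption: head_dim = hidden_size / num_query_heads (common Llama default).
head_dim = hidden_size // num_query_heads

# With GQA, several query heads share one key/value head.
queries_per_kv_head = num_query_heads // num_kv_heads
kv_proj_width = num_kv_heads * head_dim

# Cached values per token: one K and one V vector per layer.
gqa_values_per_token = 2 * num_layers * kv_proj_width
mha_values_per_token = 2 * num_layers * num_query_heads * head_dim

# Assumption: 2 bytes per cached value (fp16 or bf16).
bytes_per_value = 2
gqa_cache_mib = gqa_values_per_token * max_seq_len * bytes_per_value / 2**20
mha_cache_mib = mha_values_per_token * max_seq_len * bytes_per_value / 2**20

print(f"head_dim: {head_dim}")
print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache at {max_seq_len} tokens (GQA): {gqa_cache_mib:.0f} MiB")
print(f"KV cache at {max_seq_len} tokens (full MHA, for comparison): {mha_cache_mib:.0f} MiB")
```

Under these assumptions, halving the number of key/value heads halves the cache relative to full multi-head attention, which matters more at the longer 4096-token context.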

## Improvements & Fixes 

* **Fixed Tokenization:** Output from `eval1` had a spacing issue (e.g., "D oes t ha t wor k"). This is fixed in eval2, and output should now be spaced normally.
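
One rough way to spot this kind of spacing artifact in generated text is to measure how many words are a single letter. The helper below is a hypothetical heuristic written for illustration, not part of the project's test script, and the threshold is an arbitrary assumption.

```python
def single_letter_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated words that are one letter long."""
    words = text.split()
    if not words:
        return 0.0
    singles = sum(1 for w in words if len(w) == 1 and w.isalpha())
    return singles / len(words)


# Hypothetical cutoff: ordinary English has few one-letter words ("a", "I").
SPACING_THRESHOLD = 0.3

broken = "D oes t ha t wor k"   # example of the eval1 artifact quoted above
normal = "Does that work"

for sample in (broken, normal):
    ratio = single_letter_ratio(sample)
    flag = "suspicious" if ratio > SPACING_THRESHOLD else "ok"
    print(f"{sample!r}: {ratio:.2f} ({flag})")
```

A high ratio only suggests a problem; short samples or text with many initials can trip it too, so it is a quick sanity check rather than a test of tokenizer correctness.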

## Cost

Total time in RunPod VM: 7:28 (hours:minutes)

Total time spent on training: 5:52 (hours:minutes)

A100 SXM Cost: $1.49/Hour

Total cost: $11.12
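
The total follows from the VM time and the hourly rate. The short check below reads the times as hours:minutes, which is an interpretation of the figures above; with that reading the result lands within a cent of the reported total, and the training run itself accounts for roughly four fifths of the bill.

```python
def to_hours(hmm: str) -> float:
    """Convert an 'h:mm' string to fractional hours."""
    hours, minutes = hmm.split(":")
    return int(hours) + int(minutes) / 60


RATE_PER_HOUR = 1.49  # A100 SXM rate quoted above, in USD

vm_hours = to_hours("7:28")        # total time the VM was running
training_hours = to_hours("5:52")  # portion spent on training

total_cost = vm_hours * RATE_PER_HOUR
training_cost = training_hours * RATE_PER_HOUR
training_share = training_hours / vm_hours

print(f"VM cost:        ${total_cost:.3f}")
print(f"Training cost:  ${training_cost:.3f}")
print(f"Training share: {training_share:.0%}")
```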


## How to Load and Run the Model

Download all the model files into a local folder and run the test script. You will need to adjust the script first, for example by updating the config/file paths and the test prompts.
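
For a quick smoke test without the project script, a minimal sketch using the `transformers` auto classes might look like the following. It assumes the downloaded folder contains standard Hugging Face config, weight, and tokenizer files that `from_pretrained` can read; the function name, folder path, and sampling settings are all hypothetical.

```python
def generate_text(model_dir: str, prompt: str, max_new_tokens: int = 100) -> str:
    """Load the model from a local folder and return one sampled continuation."""
    # Imported here so the helper can be defined before the heavy libraries load.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: model_dir holds files in the standard Hugging Face layout.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=True)
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.8,  # hypothetical sampling settings, tune as needed
            top_p=0.95,
        )
    # Decode the full sequence (prompt plus continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("./v2mini-eval2", "It was a foggy morning in London,")` with the path of your local folder should return the prompt followed by the model's continuation. If the tokenizer files are not in a format the auto classes recognize, use the project's test script instead.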

### Test script

A run file for testing and evaluating this model is available on the main project repository:

* **Test Script Link:** [test_v2mini_eval2.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval2.py)