---
license: mit
language:
- en
pipeline_tag: text-generation
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
library_name: transformers
---

# haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model

## Model Overview

**v2mini-eval1** is a 318M-parameter model trained from scratch on 15GB of 1800-1875 London texts using the modern Llama architecture. It was trained to evaluate the v2 dataset.

| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~318 Million (318M)** |
| **Training Type** | Trained **from Scratch** (Random Initialization) |
| **Tokenizer** | Custom BPE, Vocab Size 32,000 |
| **Sequence Length** | 1024 tokens |
| **Attention Type** | Grouped Query Attention (GQA) |

## Configuration Details

This model uses a custom size and configuration based on Llama:

| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 20 |
| **Hidden Size (d)** | 1024 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2752 |
| **Attention Heads** | 16 (Query) / 8 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | Rotary Positional Embeddings (RoPE) |

## Model Issues 

This is an evaluation model: it was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. Due to a tokenization issue, raw output contains stray spaces inside words:

- default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"

- fixed: "Does that work more of his excellent stirring, in his plays"

This is purely a tokenizer issue; you can clean up the output yourself or, if you'd rather not, feed it to an LLM and have it fixed.
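
If you want a programmatic first pass, a crude heuristic can at least reattach punctuation and single-letter fragments ("D oes" → "Does"). This sketch is not part of the repository, and harder runs like "h ise x cell ent" still need the manual or LLM cleanup described above:

```python
def quick_fix(text: str) -> str:
    """Crude cleanup for space-riddled output: merge single-letter
    fragments into the next token and reattach stray punctuation.
    Note: legitimate one-letter words like "a" or "I" get merged too,
    so this is only a rough first pass."""
    out = []
    prefix = ""
    for tok in text.split():
        if tok in {",", ".", ";", ":", "!", "?"}:
            if out:
                out[-1] += tok  # reattach punctuation to the previous word
            continue
        if len(tok) == 1 and tok.isalpha():
            prefix += tok       # hold single-letter fragments for the next token
            continue
        out.append(prefix + tok)
        prefix = ""
    if prefix:
        out.append(prefix)
    return " ".join(out)


print(quick_fix("D oes that work"))        # -> "Does that work"
print(quick_fix("stirring , in his"))      # -> "stirring, in his"
```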

### How to Load and Run the Model

Download all the model files into a local folder and run the test script. You will need to make a few adjustments to the run script, such as updating the config/file paths and the test prompts.
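
As a starting point, a minimal run might look like the following. This is a sketch assuming the checkpoint loads with the standard `transformers` Auto classes; the folder path and prompt are placeholders to replace, and the repository's own test script is the reference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "./v2mini-eval1"  # placeholder: folder with the downloaded files
PROMPT = "In the year 1834, the streets of London"

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

inputs = tokenizer(PROMPT, return_tensors="pt")
# Keep prompt plus generation within the model's 1024-token context window.
output = model.generate(**inputs, max_new_tokens=100, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Remember that the raw output will show the spacing artifact described under Model Issues and needs a cleanup pass.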

### Test script

A run file for testing and evaluating this model is available on the main project repository:

* **Test Script Link:** [test_v2mini_eval1.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval1.py)