---
license: mit
language:
- en
pipeline_tag: text-generation
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
library_name: transformers
---
# haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model
## Model Overview
**v2mini-eval1** is a 318M-parameter model trained from scratch on 15GB of 1800-1875 London texts using the modern Llama architecture. It was trained to evaluate the v2 dataset.
| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~318 Million (318M)** |
| **Training Type** | Trained **from Scratch** (Random Initialization) |
| **Tokenizer** | Custom BPE, Vocab Size 32,000 |
| **Sequence Length** | 1024 tokens |
| **Attention Type** | Grouped Query Attention (GQA) |
## Configuration Details
This model uses a custom size and configuration based on the Llama architecture:
| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 20 |
| **Hidden Size (d)** | 1024 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2752 |
| **Attention Heads** | 16 (Query) / 8 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | Rotary Positional Embeddings (RoPE) |
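As a rough cross-check of the tables above, the corresponding `config.json` fields and a back-of-envelope parameter count can be sketched in Python. Field names follow the standard Hugging Face Llama config; the checkpoint's own `config.json` is authoritative, and the estimate ignores norm weights and depends on whether embeddings are tied, so it only roughly approaches the stated ~318M:

```python
# Llama-style config fields reconstructed from the tables above
# (the checkpoint's own config.json is the authoritative source).
config = {
    "architectures": ["LlamaForCausalLM"],
    "hidden_size": 1024,
    "intermediate_size": 2752,
    "num_hidden_layers": 20,
    "num_attention_heads": 16,
    "num_key_value_heads": 8,     # GQA: 8 KV heads serve 16 query heads
    "hidden_act": "silu",
    "rms_norm_eps": 1e-05,
    "max_position_embeddings": 1024,
    "vocab_size": 32000,
}

# Back-of-envelope parameter count (norm weights omitted).
d = config["hidden_size"]
head_dim = d // config["num_attention_heads"]          # 64
kv_dim = config["num_key_value_heads"] * head_dim      # 512, from GQA
attn = d * d + 2 * d * kv_dim + d * d                  # q, k, v, o projections
mlp = 3 * d * config["intermediate_size"]              # gate, up, down
embed = 2 * config["vocab_size"] * d                   # embeddings + LM head
total = config["num_hidden_layers"] * (attn + mlp) + embed
print(f"~{total / 1e6:.0f}M parameters")               # lands near 300M
```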
## Model Issues
This is an evaluation model: it was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. Due to a tokenization issue, raw output contains misplaced spaces:
- default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays"
- fixed: "Does that work more of his excellent stirring, in his plays"
This is purely a tokenizer issue; fix the spacing in the output yourself, or if you're lazy, feed it to an LLM and have it fixed.
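Since the damage is only misplaced spaces, one programmatic fix is to strip all spaces and re-segment the text against a word list (the classic word-break dynamic program). A minimal sketch, with a toy vocabulary standing in for a real dictionary:

```python
import re

# Toy vocabulary for the example above; in practice you would load a
# real English word list (e.g. /usr/share/dict/words).
VOCAB = {"does", "that", "work", "more", "of", "his", "excellent",
         "stirring", "in", "plays"}

def segment(s, vocab, max_len=20):
    """Word-break DP: split a space-free lowercase string into vocab words."""
    best = [None] * (len(s) + 1)
    best[0] = []
    for i in range(1, len(s) + 1):
        for j in range(max(0, i - max_len), i):
            if best[j] is not None and s[j:i] in vocab:
                best[i] = best[j] + [s[j:i]]
                break
    return best[-1]  # None if no segmentation exists

def fix_spacing(garbled, vocab=VOCAB):
    """Strip the misplaced spaces, then re-segment each alphabetic run."""
    words = []
    for piece in re.split(r"\s*([,.;:!?])\s*", garbled):
        if not piece:
            continue
        if piece in ",.;:!?":
            if words:
                words[-1] += piece          # re-attach punctuation
            continue
        seg = segment(piece.replace(" ", "").lower(), vocab)
        words.extend(seg if seg is not None else [piece.strip()])
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:]

print(fix_spacing(
    "D oes that work more of h ise x cell ent st ir ring , in his pl ays"))
# → Does that work more of his excellent stirring, in his plays
```

Note this loses the original casing (everything but the first letter is lowercased), so an LLM pass may still give cleaner results on long passages.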
### How to Load and Run the Model
Download all of the model files into a local folder and run the test script. You will need to make a few adjustments in the run script, such as updating the config/file paths and the test prompts.
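A minimal loading sketch using the standard `transformers` auto classes is below. The folder path and prompt are placeholders, and the project's own test script remains the reference:

```python
def generate_sample(model_dir: str, prompt: str, max_new_tokens: int = 50) -> str:
    """Load the checkpoint from a local folder and sample a continuation."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)

    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens,
                         do_sample=True, temperature=0.8)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Example call (requires the model files on disk; path and prompt are
# placeholders):
# print(generate_sample("./v2mini-eval1", "The streets of London in 1850"))
```

Remember that raw generations will show the spacing artifact described under Model Issues, so post-process the decoded text before reading it.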
### Test script
A run file for testing and evaluating this model is available on the main project repository:
* **Test Script Link:** [test_v2mini_eval1.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval1.py)