---
license: mit
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
language:
- en
pipeline_tag: text-generation
library_name: transformers
---
# haykgrigorian/v2mini-eval2: Llama-Architecture 215M Model
## Model Overview
**v2mini-eval2** is a model trained from scratch on 15GB of London texts from 1800-1875, using the Llama architecture. It was trained to validate the tokenizer before scaling to 90GB.
| Detail | Value |
| :--- | :--- |
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) |
| **Parameter Count** | **~215 Million (214.8M)** |
| **Training Type** | Trained **from Scratch** (10,000 steps) |
| **Tokenizer** | Custom BPE, Vocab Size 32,003 |
| **Sequence Length** | 4096 tokens (4x increase from eval1) |
| **Attention Type** | Grouped Query Attention (GQA) |
## Configuration Details
eval2 uses a different configuration from eval1 to support the extended context length:
| Parameter | Value |
| :--- | :--- |
| **Number of Layers** | 24 |
| **Hidden Size (d)** | 768 |
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2048 |
| **Attention Heads** | 12 (Query) / 6 (Key/Value) |
| **Activation Function** | SiLU (`silu`) |
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) |
| **Position Embeddings** | RoPE (Theta: 10,000) |
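
As a rough illustration of what the 12 query / 6 key-value head split means at the 4096-token context, the sketch below derives the head dimension and estimates the key/value cache size from the table values. It assumes the usual Llama convention that the head dimension is hidden size divided by query heads, a 16-bit cache, and batch size 1; none of these are stated in this card.

```python
# Values copied from the configuration tables in this card.
NUM_LAYERS = 24
HIDDEN_SIZE = 768
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 6
SEQ_LEN = 4096

# Assumptions, not stated in the card:
BYTES_PER_VALUE = 2  # 16-bit (fp16/bf16) cache entries
BATCH_SIZE = 1

# Assumed Llama default: each head gets an equal slice of the hidden size.
head_dim = HIDDEN_SIZE // NUM_QUERY_HEADS
# With GQA, several query heads share one key/value head.
queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_kv_heads: int, seq_len: int = SEQ_LEN) -> int:
    """Estimate the cache size: keys and values for every layer and position."""
    per_token = 2 * NUM_LAYERS * num_kv_heads * head_dim * BYTES_PER_VALUE
    return per_token * seq_len * BATCH_SIZE


gqa_bytes = kv_cache_bytes(NUM_KV_HEADS)
mha_bytes = kv_cache_bytes(NUM_QUERY_HEADS)  # hypothetical: one KV head per query head

print(f"head_dim: {head_dim}, query heads per KV head: {queries_per_kv_head}")
print(f"KV cache with GQA: {gqa_bytes / 2**20:.0f} MiB")
print(f"KV cache without GQA: {mha_bytes / 2**20:.0f} MiB")
```

Under these assumptions the full-context cache is about 144 MiB with GQA, half of the 288 MiB that 12 key/value heads would need.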
## Improvements & Fixes
* **Fixed Tokenization:** Output from `eval1` had a spacing issue (e.g., "D oes t ha t wor k"). This is fixed in `eval2`, so output should now have normal word spacing.
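
One quick way to check generated text for this kind of spacing problem is to measure how many "words" are stray single letters. The helper below is a hypothetical heuristic, not part of the project; the allowed-letter set and the 0.2 threshold are arbitrary choices for illustration.

```python
import string

# Single-letter words that are legitimate in English text (illustrative choice).
ALLOWED_SINGLE = {"a", "i", "o"}


def single_letter_ratio(text: str) -> float:
    """Return the fraction of words that are stray single letters."""
    words = [w.strip(string.punctuation) for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    stray = [
        w for w in words
        if len(w) == 1 and w.isalpha() and w.lower() not in ALLOWED_SINGLE
    ]
    return len(stray) / len(words)


def looks_fragmented(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose words appear to be split by spurious spaces."""
    return single_letter_ratio(text) > threshold


print(looks_fragmented("D oes t ha t wor k"))
print(looks_fragmented("Does that work"))
```

The heuristic is more reliable on longer samples, where a high share of stray letters is unlikely to occur by chance.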
## Cost
* Total time in RunPod VM: 7:28 (hours:minutes)
* Total time spent on training: 5:52 (hours:minutes)
* A100 SXM cost: $1.49/hour
* Total cost: $11.12
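
The total can be reproduced from the figures above. This sketch assumes the times are in hours:minutes and that billing covers the full VM time at the hourly rate, which is consistent with the reported total.

```python
def to_hours(hmm: str) -> float:
    """Convert an 'H:MM' string into fractional hours."""
    hours, minutes = hmm.split(":")
    return int(hours) + int(minutes) / 60


RATE_PER_HOUR = 1.49  # A100 SXM rate from the card

vm_hours = to_hours("7:28")
training_hours = to_hours("5:52")

total_cost = vm_hours * RATE_PER_HOUR
training_cost = training_hours * RATE_PER_HOUR
training_share = training_hours / vm_hours

print(f"Total VM cost: ${total_cost:.4f}")
print(f"Cost of training time alone: ${training_cost:.2f}")
print(f"Share of VM time spent training: {training_share:.1%}")
```

The computed total is about $11.125, which matches the reported $11.12 to within a cent; roughly 79% of the VM time went to training itself.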
## How to Load and Run the Model
Download all the model files into a local folder and run the test script. You will need to adjust the script first, for example by updating the config/file paths and the test prompts.
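
For orientation, here is a minimal sketch of loading and prompting a local copy with the `transformers` library. It is not the project's test script. It assumes the folder holds standard Transformers files (config, weights, tokenizer) that the `Auto*` classes can load; `MODEL_DIR`, `PROMPTS`, and the sampling settings are hypothetical placeholders.

```python
from pathlib import Path

MODEL_DIR = Path("./v2mini-eval2")  # hypothetical local folder with the model files
PROMPTS = ["It was a foggy morning in London, and"]  # hypothetical test prompt


def generation_settings(max_new_tokens: int = 100) -> dict:
    """Sampling settings chosen for illustration, not taken from the project."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": 0.8,
        "top_p": 0.95,
    }


def run_prompts(model_dir: Path, prompts: list[str]) -> list[str]:
    """Load the model and tokenizer from a local folder and complete each prompt."""
    # Imported here so the helpers above work without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()

    completions = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            output_ids = model.generate(
                input_ids=inputs["input_ids"],
                attention_mask=inputs["attention_mask"],
                **generation_settings(),
            )
        completions.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return completions


if MODEL_DIR.is_dir():
    for text in run_prompts(MODEL_DIR, PROMPTS):
        print(text)
else:
    print(f"Model folder {MODEL_DIR} not found; download the files first.")
```

If the custom tokenizer does not load through `AutoTokenizer`, follow the loading code in the project's own test script instead.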
### Test script
A run file for testing and evaluating this model is available on the main project repository:
* **Test Script Link:** [test_v2mini_eval2.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval2.py)