|
|
---
license: mit
language:
- en
pipeline_tag: text-generation
datasets:
- haykgrigorian/TimeCapsuleLLM-London-1800-1875-v2-15GB
library_name: transformers
---
|
|
|
|
|
# haykgrigorian/v2mini-eval1: Llama-Architecture 318M Model |
|
|
|
|
|
## Model Overview |
|
|
|
|
|
**v2mini-eval1** is a model trained from scratch on 15GB of 1800-1875 London texts using the modern Llama architecture. It was trained as part of the dataset evaluation for v2.
|
|
|
|
|
| Detail | Value | |
|
|
| :--- | :--- | |
|
|
| **Model Architecture** | LlamaForCausalLM (Decoder-Only Transformer) | |
|
|
| **Parameter Count** | **~318 Million (318M)** | |
|
|
| **Training Type** | Trained **from Scratch** (Random Initialization) | |
|
|
| **Tokenizer** | Custom BPE, Vocab Size 32,000 | |
|
|
| **Sequence Length** | 1024 tokens | |
|
|
| **Attention Type** | Grouped Query Attention (GQA) | |
|
|
|
|
|
## Configuration Details |
|
|
|
|
|
This model uses a custom size and configuration based on the Llama architecture:
|
|
|
|
|
| Parameter | Value | |
|
|
| :--- | :--- | |
|
|
| **Number of Layers** | 20 | |
|
|
| **Hidden Size (d)** | 1024 | |
|
|
| **Intermediate Size ($\text{d}_{\text{ff}}$)** | 2752 | |
|
|
| **Attention Heads** | 16 (Query) / 8 (Key/Value) | |
|
|
| **Activation Function** | SiLU (`silu`) | |
|
|
| **Normalization** | RMS Norm (`rms_norm_eps`: 1e-05) | |
|
|
| **Position Embeddings** | Rotary Positional Embeddings (RoPE) | |
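The configuration above is enough for a back-of-the-envelope parameter count. The sketch below is an estimate, not an exact accounting: it assumes untied input/output embeddings and the standard Llama gate/up/down SwiGLU MLP, so it lands slightly under the reported ~318M; the exact figure depends on implementation details.

```python
# Rough Llama-style parameter count from the config table above.
# Assumptions: untied embedding / lm_head, gate/up/down SwiGLU MLP,
# RoPE (no learned position embeddings), per-layer and final RMSNorm weights.
vocab, d, d_ff, layers = 32_000, 1024, 2752, 20
n_heads, n_kv_heads = 16, 8
head_dim = d // n_heads          # 64
kv_dim = n_kv_heads * head_dim   # 512 (GQA: K/V projections are narrower)

attn = d * d + 2 * d * kv_dim + d * d  # q, k, v, o projections
mlp = 3 * d * d_ff                     # gate, up, down
norms = 2 * d                          # two RMSNorms per layer
per_layer = attn + mlp + norms

total = layers * per_layer + 2 * vocab * d + d  # + embed, lm_head, final norm
print(f"{total / 1e6:.1f}M parameters")  # -> 297.6M, slightly under the reported ~318M
```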
|
|
|
|
|
## Model Issues |
|
|
|
|
|
This is an evaluation model: it was trained from scratch for 10k steps on a 15GB sample of a 90GB dataset. Due to a tokenization issue, raw output looks like this:
|
|
|
|
|
- default: "D oes that work more of h ise x cell ent st ir ring , in his pl ays" |
|
|
|
|
|
- fixed: "Does that work more of his excellent stirring, in his plays" |
|
|
|
|
|
This is purely a tokenizer issue; you can fix the output by hand, post-process it with a script, or feed it to an LLM and have it cleaned up.
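If you want to automate the cleanup, one option is to strip the stray spaces and re-segment the text against a word list. The sketch below is an illustrative dynamic-programming segmenter with a toy vocabulary, not part of the released code; for real use you would plug in a proper English word list or frequency table.

```python
def segment(text, vocab, max_word_len=20):
    """Split a string with broken spacing into known vocabulary words via DP."""
    s = text.replace(" ", "").lower()  # drop the spurious spaces first
    n = len(s)
    # best[i] = fewest-words segmentation of s[:i], or None if impossible
    best = [None] * (n + 1)
    best[0] = []
    for i in range(1, n + 1):
        for j in range(max(0, i - max_word_len), i):
            if best[j] is not None and s[j:i] in vocab:
                cand = best[j] + [s[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand
    return best[n]

# Toy vocabulary for illustration only -- use a real word list in practice.
vocab = {"does", "that", "work", "more", "of", "his"}
print(segment("D oes that work more of h is", vocab))
# -> ['does', 'that', 'work', 'more', 'of', 'his']
```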
|
|
|
|
|
### How to Load and Run the Model |
|
|
|
|
|
Download all of the model files into a local folder and run the test script. You will need to make some adjustments in the run script, such as updating the config/file paths and the test prompts.
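As a minimal sketch of what the run script does, assuming the files sit in a local folder and the checkpoint loads through the standard `transformers` auto classes (the folder name, prompt, and sampling settings below are placeholders to adjust):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_and_generate(model_dir, prompt, max_new_tokens=60):
    """Load the local checkpoint and sample a continuation for a prompt."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(model_dir)
    model.eval()
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.8
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Placeholder path and prompt -- point these at your local setup.
    print(load_and_generate("./v2mini-eval1", "In the streets of London,"))
```

Remember that the raw output will show the spacing issue described above, so post-process it before reading.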
|
|
|
|
|
### Test script |
|
|
|
|
|
A run file for testing and evaluating this model is available in the main project repository:
|
|
|
|
|
* **Test Script Link:** [test_v2mini_eval1.py on GitHub](https://github.com/haykgrigo3/TimeCapsuleLLM/blob/main/london_1800_1875_v2mini_eval1/test_v2mini_eval1.py) |