---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
new_version: GODELEV/Test-1-3000
---
# Test-1-2000: A 190M Parameter Narrative Engine

**Test-1-2000** is a compact, high-performance Transformer model based on the Llama architecture, trained specifically to master long-range narrative consistency and logical coherence in short-form storytelling.

Despite its modest parameter count, the model demonstrates sophisticated "world-modeling" capabilities: it tracks cause and effect, emotional nuance, and character persistence over extended contexts.

## 🚀 Model Highlights
*   **Architecture:** Llama-based Decoder-only Transformer
*   **Parameters:** 190.55 Million
*   **Context Window:** 2048 Tokens
*   **Final Training Loss:** 1.27 (at Step 2,000)
*   **Optimization:** Fully compiled via `torch.compile`, with Flash Attention 2 support (see the sketch below).
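
For reference, the same optimizations can be enabled at inference time. This is an illustrative sketch, not a requirement: `flash_attention_2` assumes a CUDA GPU with the `flash-attn` package installed, and the model also runs with the default attention backend.

```python
import torch
from transformers import AutoModelForCausalLM

# Optional speed-ups for inference; requires a CUDA GPU with flash-attn installed
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-2000",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
model = torch.compile(model)  # JIT-compile the forward pass for faster decoding
```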

---

## 🧠 Model Structure & Design

The model utilizes a modern LLM blueprint designed for training stability and inference speed. By implementing **Rotary Positional Embeddings (RoPE)**, the model maintains a precise understanding of token relationships across its full 2048-token window, twice the 1024-token context typical of models in this size class.
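
As a rough illustration of the mechanism (not the model's actual code, which relies on the standard Llama implementation in `transformers`), RoPE rotates each pair of feature dimensions by a position-dependent angle:

```python
import torch

def rope_rotate(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate (seq_len, num_heads, head_dim) features by position-dependent
    angles. Interleaved-pair variant shown for clarity; transformers' Llama
    code uses an equivalent rotate-half formulation."""
    seq_len, _, head_dim = x.shape
    # One rotation frequency per dimension pair
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2) / head_dim))
    # Angle for every (position, frequency) combination
    angles = torch.outer(torch.arange(seq_len), inv_freq)  # (seq_len, head_dim/2)
    cos = angles.cos()[:, None, :]
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # standard 2-D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = rope_rotate(torch.randn(2048, 12, 64))  # full window, 12 heads, 768/12 dims
```

Because only the relative angle between two positions affects their dot product, attention scores depend on token distance rather than absolute position, which is what keeps long-range references stable.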

### Technical Specifications
| Feature | Specification |
| :--- | :--- |
| **Hidden Dimension** | 768 |
| **Layers (Depth)** | 12 |
| **Attention Heads** | 12 |
| **Intermediate Size** | 3072 |
| **Activation Function** | SwiGLU |
| **Normalization** | RMSNorm |
| **Vocab Size** | 50,257 (GPT-2 Tokenizer) |
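
These values map directly onto a `transformers` `LlamaConfig`. The sketch below is a reconstruction from the table above, not the checkpoint's shipped `config.json`, which remains the source of truth (`rms_norm_eps` in particular is an assumed default):

```python
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=768,               # Hidden Dimension
    num_hidden_layers=12,          # Layers (Depth)
    num_attention_heads=12,        # Attention Heads
    intermediate_size=3072,        # MLP Intermediate Size
    hidden_act="silu",             # SiLU gate, i.e. the SwiGLU MLP
    rms_norm_eps=1e-5,             # assumed default, not confirmed
    vocab_size=50257,              # GPT-2 tokenizer
    max_position_embeddings=2048,  # Context Window
)
```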

---

## 📈 The Evolution of Learning

Training on the **TinyStories** dataset allows us to observe the model's cognitive development in distinct phases. **Test-1-2000** achieved high literacy by progressing through these stages:

### 1. The Lexical Phase (Steps 0 – 250)
The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.

### 2. The Relational Phase (Steps 250 – 1,000)
The model began connecting nouns with logical actions. It started to grasp spatial logic: if a character is in a park, they are likely playing or seeing trees. Loss dropped well below 1.5.

### 3. The Coherence Phase (Steps 1,000 – 2,000)
The final phase of this run focused on **Narrative Resolution**. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.

---

## 🛠 Training Configuration

### Hyperparameters
*   **Precision:** `bfloat16` Mixed Precision
*   **Optimizer:** AdamW ($\beta_1=0.9, \beta_2=0.95$)
*   **Learning Rate:** 5e-4, scheduled via `OneCycleLR` (see the sketch after this list)
*   **Total Batch Size:** ~262,144 tokens per step
*   **Weight Decay:** 0.01
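
A minimal sketch of an equivalent setup, assuming PyTorch's stock `AdamW` and `OneCycleLR`. The dummy module, data, and loss are placeholders for illustration, not the actual training code:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(8, 8)  # stand-in for the 190M Llama-style model

optimizer = AdamW(
    model.parameters(),
    lr=5e-4,                   # peak rate reached by the OneCycleLR schedule
    betas=(0.9, 0.95),
    weight_decay=0.01,
)
scheduler = OneCycleLR(optimizer, max_lr=5e-4, total_steps=2_000)

# One bfloat16 mixed-precision step (no GradScaler needed for bf16)
x = torch.randn(4, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()  # dummy loss for illustration
loss.backward()
optimizer.step()
scheduler.step()
optimizer.zero_grad()
```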

### Dataset
**TinyStories (2M):** A collection of synthetic stories focusing on a vocabulary a 3-year-old would understand, but with the structural complexity of professional writing. This allows the model to learn "reasoning" without the noise of the open web.
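
The corpus is public on the Hub (it is the dataset tagged in this card's metadata), so you can inspect the training data directly; the train split exposes a single `text` column:

```python
from datasets import load_dataset

stories = load_dataset("roneneldan/TinyStories", split="train")
print(len(stories))              # ~2M short stories
print(stories[0]["text"][:200])  # peek at the first story
```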

---

## 💻 Usage: Quick Start

You can load and run **Test-1-2000** using the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The GPT-2 tokenizer has no pad token; reuse EOS so generate() can pad
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,  # stop cleanly at <|endoftext|>
    pad_token_id=tokenizer.pad_token_id,
)

# Tip: decode once with skip_special_tokens=False to confirm the
# <|endoftext|> token appears, then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```