# Test-1-2000: A 190M Parameter Narrative Engine
Test-1-2000 is a compact, high-performance Transformer model based on the Llama architecture. It was specifically trained to master long-range narrative consistency and logical coherence within the domain of short-form storytelling.
Despite its efficient parameter count, the model demonstrates sophisticated "world-modeling" capabilities: understanding cause and effect, emotional nuance, and character persistence over extended contexts.
## Model Highlights
- Architecture: Llama-based Decoder-only Transformer
- Parameters: 190.55 Million
- Context Window: 2048 Tokens
- Final Training Loss: 1.27 (at Step 2,000)
- Optimization: Fully compiled via `torch.compile` with Flash Attention 2 support.
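As a rough sketch (not the actual training script), enabling both optimizations with recent versions of PyTorch and `transformers` might look like the following; the `attn_implementation` argument is the only assumption beyond this card:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: load the model, request a fused attention backend, and compile it.
# "sdpa" uses PyTorch's built-in scaled-dot-product attention, which dispatches
# to Flash Attention kernels when the hardware supports them; pass
# "flash_attention_2" instead if the flash-attn package is installed.
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-2000",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
model = torch.compile(model)  # first forward pass triggers compilation
```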
## Model Structure & Design
The model utilizes a modern LLM blueprint designed for training stability and inference speed. By implementing Rotary Positional Embeddings (RoPE), the model maintains a precise understanding of token relationships across its full 2048-token window, twice the 1024-token context typical of GPT-2-scale models of this size.
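To make the mechanism concrete, below is a minimal, illustrative RoPE implementation in the rotate-half style used by Llama-family models; it is a reference sketch, not the model's actual code, and the function name and tensor layout are assumptions:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Minimal RoPE sketch: rotate channel pairs by position-dependent angles.

    x: (batch, seq_len, n_heads, head_dim) with even head_dim.
    Illustrative reference only, not the model's exact kernel.
    """
    b, t, h, d = x.shape
    half = d // 2
    # One frequency per channel pair, decaying geometrically (as in RoPE/Llama)
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because the rotation angle depends only on token position, the attention score between two rotated vectors becomes a function of their relative offset, which is what preserves token relationships across the full window.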
### Technical Specifications
| Feature | Specification |
|---|---|
| Hidden Dimension | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocab Size | 50,257 (GPT-2 Tokenizer) |
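For reproducibility, the table maps onto a standard `transformers` `LlamaConfig` roughly as follows; values not listed in the table (such as `rms_norm_eps`) are assumptions:

```python
from transformers import LlamaConfig

# Sketch: a LlamaConfig mirroring the specification table above.
config = LlamaConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="silu",             # SwiGLU gating uses SiLU in Llama blocks
    vocab_size=50257,              # GPT-2 tokenizer
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,             # assumed; not stated in the table
)
```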
## The Evolution of Learning
Training on the TinyStories dataset allows us to observe the model's cognitive development in distinct phases. Test-1-2000 achieved high literacy by progressing through these stages:
### 1. The Lexical Phase (Steps 0–250)
The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.
### 2. The Relational Phase (Steps 250–1,000)
The model began connecting nouns with logical actions. It started grasping spatial logic: if a character is in a park, they are likely playing or seeing trees. Loss dropped well below 1.5.
### 3. The Coherence Phase (Steps 1,000–2,000)
The final phase of this run focused on Narrative Resolution. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.
## Training Configuration
### Hyperparameters
- Precision: `bfloat16` Mixed Precision
- Optimizer: AdamW ($\beta_1=0.9$, $\beta_2=0.95$)
- Learning Rate: 5e-4 (scheduled via `OneCycleLR`)
- Total Batch Size: ~262,144 tokens per step (equivalent to 128 full 2048-token sequences)
- Weight Decay: 0.01
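A minimal sketch of this optimizer and scheduler setup in PyTorch is shown below; `model` and `dataloader` are assumed to exist, and gradient clipping and logging are omitted:

```python
import torch

# Sketch of the configuration above: AdamW + OneCycleLR over the 2,000-step
# run, with bfloat16 autocast for mixed precision. `model` and `dataloader`
# are assumed context; this is not the author's actual training script.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-4, betas=(0.9, 0.95), weight_decay=0.01
)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=2000
)

for batch in dataloader:
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
```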
### Dataset
TinyStories (2M): A collection of synthetic stories restricted to the vocabulary a typical 3-year-old would understand, yet written with the structural complexity of professional prose. This lets the model learn "reasoning" without the noise of the open web.
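To inspect the training distribution yourself, the public TinyStories dataset can be loaded from the Hub; the dataset ID below and the exact 2M-story snapshot used for this run are assumptions:

```python
from datasets import load_dataset

# Assumption: the public TinyStories dataset on the Hugging Face Hub
# ("roneneldan/TinyStories"); this card's exact 2M-story subset may differ.
dataset = load_dataset("roneneldan/TinyStories", split="train")
print(dataset[0]["text"][:200])  # peek at the first story
```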
## Usage: Quick Start
You can load and run Test-1-2000 with the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare the prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    # Stop cleanly at end-of-text. The GPT-2 tokenizer has no pad token,
    # so we reuse the EOS token for padding.
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# Tip: decode once with skip_special_tokens=False to verify the <|endoftext|>
# token appears, then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```