# Test-1-2000: A 190M Parameter Narrative Engine
Test-1-2000 is a compact, high-performance Transformer model based on the Llama architecture. It was specifically trained to master long-range narrative consistency and logical coherence within the domain of short-form storytelling.
Despite its efficient parameter count, the model demonstrates sophisticated "world-modeling" capabilities: understanding cause and effect, emotional nuance, and character persistence over extended contexts.
## Model Highlights
- Architecture: Llama-based Decoder-only Transformer
- Parameters: 190.55 Million
- Context Window: 2048 Tokens
- Final Training Loss: 1.27 (at Step 2,000)
- Optimization: Fully compiled via `torch.compile` with Flash Attention 2 support.
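As a rough sketch (not the actual training script), enabling both optimizations with recent versions of PyTorch and `transformers` might look like the following; the `attn_implementation` argument is the only assumption beyond this card:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: load the model, request a fused attention backend, and compile it.
# "sdpa" uses PyTorch's built-in scaled-dot-product attention, which dispatches
# to Flash Attention kernels when the hardware supports them; pass
# "flash_attention_2" instead if the flash-attn package is installed.
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-2000",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
model = torch.compile(model)  # first forward pass triggers compilation
```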
## Model Structure & Design
The model utilizes a modern LLM blueprint designed for training stability and inference speed. By implementing Rotary Positional Embeddings (RoPE), the model maintains a precise understanding of token relationships across its full 2048-token window, twice the 1024-token context typical of GPT-2-scale models of this size.
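To make the mechanism concrete, below is a minimal, illustrative RoPE implementation in the rotate-half style used by Llama-family models; it is a reference sketch, not the model's actual code, and the function name and tensor layout are assumptions:

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Minimal RoPE sketch: rotate channel pairs by position-dependent angles.

    x: (batch, seq_len, n_heads, head_dim) with even head_dim.
    Illustrative reference only, not the model's exact kernel.
    """
    b, t, h, d = x.shape
    half = d // 2
    # One frequency per channel pair, decaying geometrically (as in RoPE/Llama)
    freqs = base ** (-torch.arange(0, half, dtype=torch.float32) / half)
    angles = torch.arange(t, dtype=torch.float32)[:, None] * freqs[None, :]  # (t, half)
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

Because the rotation angle depends only on token position, the attention score between two rotated vectors becomes a function of their relative offset, which is what preserves token relationships across the full window.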
### Technical Specifications
| Feature | Specification |
|---|---|
| Hidden Dimension | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocab Size | 50,257 (GPT-2 Tokenizer) |
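For reproducibility, the table maps onto a standard `transformers` `LlamaConfig` roughly as follows; values not listed in the table (such as `rms_norm_eps`) are assumptions:

```python
from transformers import LlamaConfig

# Sketch: a LlamaConfig mirroring the specification table above.
config = LlamaConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    hidden_act="silu",             # SwiGLU gating uses SiLU in Llama blocks
    vocab_size=50257,              # GPT-2 tokenizer
    max_position_embeddings=2048,
    rms_norm_eps=1e-5,             # assumed; not stated in the table
)
```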
## The Evolution of Learning
Training on the TinyStories dataset allows us to observe the model's cognitive development in distinct phases. Test-1-2000 achieved high literacy by progressing through these stages:
### 1. The Lexical Phase (Steps 0–250)
The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.
### 2. The Relational Phase (Steps 250–1,000)
The model began connecting nouns with logical actions. It started grasping spatial logic: if a character is in a park, they are likely playing or seeing trees. Loss dropped well below 1.5.
### 3. The Coherence Phase (Steps 1,000–2,000)
The final phase of this run focused on Narrative Resolution. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.
## Training Configuration
### Hyperparameters
- Precision: `bfloat16` Mixed Precision
- Optimizer: AdamW ($\beta_1=0.9$, $\beta_2=0.95$)
- Learning Rate: 5e-4 (scheduled via `OneCycleLR`)
- Total Batch Size: ~262,144 tokens per step (equivalent to 128 full 2048-token sequences)
- Weight Decay: 0.01
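A minimal sketch of this optimizer and scheduler setup in PyTorch is shown below; `model` and `dataloader` are assumed to exist, and gradient clipping and logging are omitted:

```python
import torch

# Sketch of the configuration above: AdamW + OneCycleLR over the 2,000-step
# run, with bfloat16 autocast for mixed precision. `model` and `dataloader`
# are assumed context; this is not the author's actual training script.
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-4, betas=(0.9, 0.95), weight_decay=0.01
)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=2000
)

for batch in dataloader:
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
```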
### Dataset
TinyStories (2M): A collection of synthetic stories restricted to the vocabulary a typical 3-year-old would understand, yet written with the structural complexity of professional prose. This lets the model learn "reasoning" without the noise of the open web.
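To inspect the training distribution yourself, the public TinyStories dataset can be loaded from the Hub; the dataset ID below and the exact 2M-story snapshot used for this run are assumptions:

```python
from datasets import load_dataset

# Assumption: the public TinyStories dataset on the Hugging Face Hub
# ("roneneldan/TinyStories"); this card's exact 2M-story subset may differ.
dataset = load_dataset("roneneldan/TinyStories", split="train")
print(dataset[0]["text"][:200])  # peek at the first story
```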
## Usage: Quick Start
You can load and run Test-1-2000 with the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare the prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    # Stop cleanly at end-of-text. The GPT-2 tokenizer has no pad token,
    # so we reuse the EOS token for padding.
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# Tip: decode once with skip_special_tokens=False to verify the <|endoftext|>
# token appears, then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```