Test-1-2000: A 190M Parameter Narrative Engine

Test-1-2000 is a compact, high-performance Transformer model based on the Llama architecture. It was specifically trained to master long-range narrative consistency and logical coherence within the domain of short-form storytelling.

Despite its small parameter count, the model demonstrates sophisticated "world-modeling" capabilities: understanding cause-and-effect, emotional nuance, and character persistence over extended contexts.

🚀 Model Highlights

  • Architecture: Llama-based Decoder-only Transformer
  • Parameters: 190.55 Million
  • Context Window: 2048 Tokens
  • Final Training Loss: 1.27 at step 2,000 (perplexity ≈ e^1.27 ≈ 3.6)
  • Optimization: Fully compiled via `torch.compile`, with Flash Attention 2 support (see the sketch below).
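
Both features are exposed through standard PyTorch / `transformers` APIs. A minimal inference-time sketch, assuming the `flash-attn` package and a supported GPU are available (neither is required for the plain usage example further below):

```python
import torch
from transformers import AutoModelForCausalLM

# Load with Flash Attention 2 kernels (requires the flash-attn package),
# then compile the forward pass; the first call triggers compilation.
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-2000",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
model = torch.compile(model)
```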

🧠 Model Structure & Design

The model follows a modern LLM blueprint designed for training stability and inference speed. Rotary Positional Embeddings (RoPE) let it track token relationships precisely across its full 2048-token window, double the 1024-token context typical of GPT-2-scale models.
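
For intuition, RoPE encodes position by rotating each query/key channel pair through a position-dependent angle, so relative offsets show up directly in the attention dot products. A minimal sketch using the interleaved-pair convention (the model's actual implementation may differ in layout):

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of a query/key tensor shaped (batch, seq, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    # One rotation frequency per channel pair, from fast to slow.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), inv_freq)  # (seq, dim/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]   # split into channel pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 2048, 12, 64)  # full 2048-token window, 12 heads of dim 64
q_rotated = apply_rope(q)
```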

Technical Specifications

| Feature | Specification |
| --- | --- |
| Hidden Dimension | 768 |
| Layers (Depth) | 12 |
| Attention Heads | 12 |
| Intermediate Size | 3072 |
| Activation Function | SwiGLU |
| Normalization | RMSNorm |
| Vocab Size | 50,257 (GPT-2 Tokenizer) |
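
These specifications map directly onto `transformers.LlamaConfig`. A hypothetical reconstruction (not the author's published config) lands within about 0.1M of the reported parameter count:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Assumed mapping of the table above; Llama's MLP is gated SiLU (SwiGLU)
# and it uses RMSNorm by default, matching the card.
config = LlamaConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    vocab_size=50257,              # GPT-2 tokenizer
    max_position_embeddings=2048,  # context window
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")  # ~190.5M
```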

📈 The Evolution of Learning

Training on the TinyStories dataset makes it possible to watch the model's development in distinct phases. Test-1-2000 reached fluent storytelling by progressing through three stages:

1. The Lexical Phase (Steps 0 – 250)

The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.

2. The Relational Phase (Steps 250 – 1,000)

The model began connecting nouns with logical actions. It started understanding "spatial" logic: if a character is in a park, they are likely playing or seeing trees. Training loss dropped well below 1.5.

3. The Coherence Phase (Steps 1,000 – 2,000)

The final phase of this run focused on Narrative Resolution. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.


🛠 Training Configuration

Hyperparameters

  • Precision: bfloat16 Mixed Precision
  • Optimizer: AdamW ($\beta_1=0.9, \beta_2=0.95$)
  • Learning Rate: 5e-4 (Scheduled via OneCycleLR)
  • Total Batch Size: ~262,144 tokens per step
  • Weight Decay: 0.01
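
Put together, this is a fairly standard PyTorch loop. The sketch below is illustrative only: the step count, optimizer settings, and schedule come from this card, while the dummy batch and CUDA device are assumptions. The ~262,144 tokens/step is, for example, 128 sequences × 2,048 tokens, typically reached via gradient accumulation:

```python
import torch

model.cuda()  # the 190M model from the config sketch above (assumption)
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-4, betas=(0.9, 0.95), weight_decay=0.01
)
total_steps = 2000
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=total_steps
)

# Dummy micro-batch standing in for the real TinyStories dataloader.
batch = torch.randint(0, 50257, (8, 2048), device="cuda")

for step in range(total_steps):
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        # Causal LM loss: labels are shifted internally by the model.
        loss = model(input_ids=batch, labels=batch).loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad(set_to_none=True)
```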

Dataset

TinyStories (2M): A collection of synthetic stories focusing on a vocabulary a 3-year-old would understand, but with the structural complexity of professional writing. This allows the model to learn "reasoning" without the noise of the open web.
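
The stories are available on the Hugging Face Hub; the sketch below assumes the public `roneneldan/TinyStories` dataset (the exact 2M-story snapshot used for this run is not stated):

```python
from datasets import load_dataset

# Each example is a dict with a single "text" field holding one short story.
ds = load_dataset("roneneldan/TinyStories", split="train")
print(len(ds), "stories")
print(ds[0]["text"][:120])
```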


💻 Usage: Quick Start

You can load and run Test-1-2000 using the Hugging Face `transformers` library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The GPT-2 tokenizer has no pad token, so reuse EOS for padding.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

# Prepare prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    eos_token_id=tokenizer.eos_token_id,  # stop cleanly at end-of-text
    pad_token_id=tokenizer.pad_token_id,
)

# Set skip_special_tokens=False to check for the raw <|endoftext|> marker,
# then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
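
The sampling settings above trade a little determinism for variety; lowering `temperature` or passing `do_sample=False` (greedy decoding) makes continuations more repeatable.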