---
license: mit
datasets:
  - roneneldan/TinyStories
language:
  - en
pipeline_tag: text-generation
new_version: GODELEV/Test-1-3000
---

# Test-1-2000: A 190M Parameter Narrative Engine

**Test-1-2000** is a compact, high-performance Transformer model based on the Llama architecture. It was specifically trained to master long-range narrative consistency and logical coherence within the domain of short-form storytelling. Despite its efficient parameter count, the model demonstrates sophisticated "world-modeling" capabilities: understanding cause-and-effect, emotional nuance, and character persistence over extended contexts.

## 🚀 Model Highlights

* **Architecture:** Llama-based decoder-only Transformer
* **Parameters:** 190.55 million
* **Context Window:** 2,048 tokens
* **Final Training Loss:** 1.27 (at step 2,000)
* **Optimization:** Fully compiled via `torch.compile` with Flash Attention 2 support

---

## 🧠 Model Structure & Design

The model utilizes a modern LLM blueprint designed for training stability and inference speed. By implementing **Rotary Positional Embeddings (RoPE)**, the model maintains a precise understanding of token relationships across its full 2,048-token window, doubling the standard context length of most models in this size class.

### Technical Specifications

| Feature | Specification |
| :--- | :--- |
| **Hidden Dimension** | 768 |
| **Layers (Depth)** | 12 |
| **Attention Heads** | 12 |
| **Intermediate Size** | 3072 |
| **Activation Function** | SwiGLU |
| **Normalization** | RMSNorm |
| **Vocab Size** | 50,257 (GPT-2 tokenizer) |

---

## 📈 The Evolution of Learning

Training on the **TinyStories** dataset allows us to observe the model's cognitive development in distinct phases. **Test-1-2000** achieved high literacy by progressing through these stages:

### 1. The Lexical Phase (Steps 0 – 250)
The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.

### 2. The Relational Phase (Steps 250 – 1,000)
The model began connecting nouns with logical actions. It started understanding "spatial" logic: if a character is in a park, they are likely playing or seeing trees. Loss dipped significantly below 1.5.

### 3. The Coherence Phase (Steps 1,000 – 2,000)
The final phase of this run focused on **narrative resolution**. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.

---

## 🛠 Training Configuration

### Hyperparameters

* **Precision:** `bfloat16` mixed precision
* **Optimizer:** AdamW ($\beta_1=0.9, \beta_2=0.95$)
* **Learning Rate:** 5e-4 (scheduled via `OneCycleLR`)
* **Total Batch Size:** ~262,144 tokens per step
* **Weight Decay:** 0.01

### Dataset

**TinyStories (2M):** A collection of synthetic stories restricted to a vocabulary a 3-year-old would understand, but with the structural complexity of professional writing. This allows the model to learn "reasoning" without the noise of the open web.

---

## 💻 Usage: Quick Start

You can load and run **Test-1-2000** using the Hugging Face `transformers` library:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare Prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The GPT-2 tokenizer defines no pad token, so reuse the EOS token for padding
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

# Generate Story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    # Explicit stop and padding token IDs so generation terminates cleanly
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

# Tip: decode once with skip_special_tokens=False to verify that the
# <|endoftext|> token is emitted, then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
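
---

## 🧩 Configuration Sketch

For readers who want to see how the Technical Specifications table maps onto the Hugging Face `transformers` API, the snippet below is a minimal sketch using `LlamaConfig`. Only the values listed in the table come from this card; every other field (norm epsilon, RoPE base frequency, weight tying, etc.) is left at the library default and is an assumption rather than a confirmed detail of the original training run.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Values below come from the Technical Specifications table; any field not
# listed there is left at the transformers default and may differ from the
# original model.
config = LlamaConfig(
    vocab_size=50257,              # GPT-2 tokenizer
    hidden_size=768,               # Hidden Dimension
    num_hidden_layers=12,          # Layers (Depth)
    num_attention_heads=12,        # Attention Heads
    intermediate_size=3072,        # Intermediate Size
    hidden_act="silu",             # SwiGLU gating in the Llama MLP
    max_position_embeddings=2048,  # Context Window
)

model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")
```

With untied input/output embeddings (the `transformers` default for `LlamaConfig`), this configuration comes out at roughly 190M parameters, in line with the count reported above.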
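
---

## 📚 Loading the Dataset

The training corpus referenced above is available on the Hub as `roneneldan/TinyStories`. The snippet below is only an illustrative loading example with the `datasets` library; the tokenization and sequence-packing strategy used for the actual run is not documented in this card.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# TinyStories ships as plain-text stories in a single "text" column
dataset = load_dataset("roneneldan/TinyStories", split="train")
tokenizer = AutoTokenizer.from_pretrained("GODELEV/Test-1-2000")

# Illustrative tokenization only; how stories were packed into 2,048-token
# training sequences is not specified in this card.
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"]),
    batched=True,
    remove_columns=["text"],
)

print(dataset[0]["text"][:100])
```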
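
---

## 🔁 Optimizer & Schedule Sketch

The hyperparameters listed under Training Configuration translate into the `torch` setup sketched below. This is not the original training script: the stand-in model reuses the configuration sketch above, the schedule is assumed to span the 2,000 reported steps, and `OneCycleLR`'s warm-up fraction (`pct_start`) is left at its default because it is not stated in this card. At ~262,144 tokens per optimizer step, one step corresponds to, for instance, 128 sequences of 2,048 tokens, possibly accumulated over several micro-batches.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Stand-in model matching the configuration sketch above (an assumption,
# not the original training script).
model = LlamaForCausalLM(LlamaConfig(
    vocab_size=50257, hidden_size=768, num_hidden_layers=12,
    num_attention_heads=12, intermediate_size=3072,
    hidden_act="silu", max_position_embeddings=2048,
))

# AdamW with the betas and weight decay listed under Hyperparameters
optimizer = torch.optim.AdamW(
    model.parameters(), lr=5e-4, betas=(0.9, 0.95), weight_decay=0.01
)

# OneCycleLR peaking at the reported 5e-4 over the 2,000-step run;
# pct_start is a library default, not a documented value.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=2_000
)

# In the training loop, each forward/backward pass would run under
# torch.autocast(device_type="cuda", dtype=torch.bfloat16) to match the
# bfloat16 mixed-precision setting, with scheduler.step() called once per
# optimizer step.
```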