---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
new_version: GODELEV/Test-1-3000
---
# Test-1-2000: A 190M Parameter Narrative Engine

**Test-1-2000** is a compact, high-performance Transformer model based on the Llama architecture. It was trained specifically to master long-range narrative consistency and logical coherence in short-form storytelling.

Despite its modest parameter count, the model demonstrates sophisticated "world-modeling" capabilities: it tracks cause and effect, emotional nuance, and character persistence over extended contexts.

## 🚀 Model Highlights
* **Architecture:** Llama-based decoder-only Transformer
* **Parameters:** 190.55 million
* **Context Window:** 2048 tokens
* **Final Training Loss:** 1.27 (at step 2,000)
* **Optimization:** Fully compiled via `torch.compile`, with Flash Attention 2 support (see the loading sketch below).
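
As a minimal sketch using the standard `transformers` flags (the `attn_implementation` switch requires a CUDA GPU with the `flash-attn` package installed, and whether this matches the exact training setup is an assumption), both optimizations can be enabled at inference time like this:

```python
import torch
from transformers import AutoModelForCausalLM

# Load in bfloat16 and request the Flash Attention 2 kernels
# (assumes flash-attn is installed and a CUDA device is available).
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-2000",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# Compile the forward pass; the first call pays a one-time graph-capture
# cost, and subsequent calls run the optimized kernels.
model = torch.compile(model)
```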

---

## 🧠 Model Structure & Design

The model follows a modern LLM blueprint designed for training stability and inference speed. By using **Rotary Positional Embeddings (RoPE)**, it maintains a precise understanding of token relationships across its full 2048-token window, twice the 1024-token context of GPT-2-class models of comparable size.
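
The core idea is easy to sketch: RoPE encodes position by rotating each pair of query/key channels through a position-dependent angle, so attention scores depend on relative distance rather than absolute position. Below is a minimal, illustrative Python sketch; the interleaved pairing follows the original RoFormer formulation, while Llama implementations typically rotate half-blocks instead, so this is not the exact code behind this checkpoint:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, num_heads, head_dim) by position."""
    seq_len, _, head_dim = x.shape
    # One rotation frequency per channel pair, decaying geometrically
    inv_freq = base ** (-torch.arange(0, head_dim, 2) / head_dim)
    angles = torch.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq, dim/2)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Rotating q and k this way makes their dot product depend only on the
# distance between their positions, across the full 2048-token window.
q = rope(torch.randn(2048, 12, 64))  # 64 = 768 hidden / 12 heads
```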

### Technical Specifications
| Feature | Specification |
| :--- | :--- |
| **Hidden Dimension** | 768 |
| **Layers (Depth)** | 12 |
| **Attention Heads** | 12 |
| **Intermediate Size** | 3072 |
| **Activation Function** | SwiGLU |
| **Normalization** | RMSNorm |
| **Vocab Size** | 50,257 (GPT-2 Tokenizer) |
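
For reference, the table maps directly onto a standard `LlamaConfig`. The sketch below is a reconstruction under stated assumptions, not the checkpoint's actual config file; untied input/output embeddings in particular are assumed, since they are what brings the total near the reported 190.55M:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Rebuild the spec table as a Llama config; SwiGLU and RMSNorm are Llama
# defaults (hidden_act="silu" gates the MLP). tie_word_embeddings=False
# is an assumption needed to reach ~190M parameters.
config = LlamaConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=2048,
    vocab_size=50257,
    tie_word_embeddings=False,
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # ~190M
```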

---

## 📈 The Evolution of Learning

Training on the **TinyStories** dataset allows us to observe the model's cognitive development in distinct phases. **Test-1-2000** reached fluent, coherent output by progressing through these stages:

### 1. The Lexical Phase (Steps 0 – 250)
The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.

### 2. The Relational Phase (Steps 250 – 1,000)
The model began connecting nouns with logical actions. It started to grasp "spatial" logic: if a character is in a park, they are likely playing or seeing trees. Loss dropped well below 1.5.

### 3. The Coherence Phase (Steps 1,000 – 2,000)
The final phase of this run focused on **Narrative Resolution**. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.

---

## 🛠 Training Configuration

### Hyperparameters
* **Precision:** `bfloat16` mixed precision
* **Optimizer:** AdamW ($\beta_1=0.9, \beta_2=0.95$)
* **Learning Rate:** 5e-4 (scheduled via `OneCycleLR`; see the sketch after this list)
* **Total Batch Size:** ~262,144 tokens per step
* **Weight Decay:** 0.01
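
A minimal PyTorch sketch of this optimizer/scheduler pairing, applied to a stand-in parameter; only the hyperparameters and the 2,000-step horizon come from this card, so swap in `model.parameters()` and a real training loop for an actual run:

```python
import torch

# Stand-in parameter; replace with model.parameters() in a real run.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(
    params, lr=5e-4, betas=(0.9, 0.95), weight_decay=0.01
)

# OneCycleLR warms up to max_lr and then anneals; total_steps matches the
# 2,000 optimizer steps reported on this card.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=2000
)

for step in range(2000):
    optimizer.step()   # (forward/backward omitted in this sketch)
    scheduler.step()   # one scheduler step per optimizer step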

### Dataset
**TinyStories (~2M stories):** a collection of synthetic stories restricted to the vocabulary a typical 3-year-old would understand, but with the structural complexity of professional writing. This lets the model learn "reasoning" without the noise of the open web.
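
The corpus is available through the Hugging Face `datasets` library; a quick way to inspect it:

```python
from datasets import load_dataset

# Pull the TinyStories training split referenced in the card metadata.
dataset = load_dataset("roneneldan/TinyStories", split="train")
print(dataset[0]["text"][:100])  # each record is one short story in "text"
```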

---

## 💻 Usage: Quick Start

You can load and run **Test-1-2000** using the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    # Stop cleanly at end-of-text; the GPT-2 tokenizer defines no pad token,
    # so reuse the EOS token id for padding:
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id
)

# To verify that generation actually reached the <|endoftext|> token, decode
# once with skip_special_tokens=False, then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```