---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
pipeline_tag: text-generation
new_version: GODELEV/Test-1-3000
---
# Test-1-2000: A 190M Parameter Narrative Engine

**Test-1-2000** is a compact, high-performance Transformer model based on the Llama architecture. It was trained specifically to master long-range narrative consistency and logical coherence in short-form storytelling.

Despite its modest parameter count, the model demonstrates sophisticated "world-modeling" capabilities: it tracks cause and effect, emotional nuance, and character persistence over extended contexts.

## 🚀 Model Highlights
* **Architecture:** Llama-based decoder-only Transformer
* **Parameters:** 190.55 million
* **Context Window:** 2048 tokens
* **Final Training Loss:** 1.27 (at step 2,000)
* **Optimization:** Fully compiled via `torch.compile`, with Flash Attention 2 support (see the loading sketch below).
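
As a minimal sketch using the standard `transformers` flags (the `attn_implementation` switch requires a CUDA GPU with the `flash-attn` package installed, and whether this matches the exact training setup is an assumption), both optimizations can be enabled at inference time like this:

```python
import torch
from transformers import AutoModelForCausalLM

# Load in bfloat16 and request the Flash Attention 2 kernels
# (assumes flash-attn is installed and a CUDA device is available).
model = AutoModelForCausalLM.from_pretrained(
    "GODELEV/Test-1-2000",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

# Compile the forward pass; the first call pays a one-time graph-capture
# cost, and subsequent calls run the optimized kernels.
model = torch.compile(model)
```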

---

## 🧠 Model Structure & Design

The model follows a modern LLM blueprint designed for training stability and inference speed. By using **Rotary Positional Embeddings (RoPE)**, it maintains a precise understanding of token relationships across its full 2048-token window, twice the 1024-token context of GPT-2-class models of comparable size.
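
The core idea is easy to sketch: RoPE encodes position by rotating each pair of query/key channels through a position-dependent angle, so attention scores depend on relative distance rather than absolute position. Below is a minimal, illustrative Python sketch; the interleaved pairing follows the original RoFormer formulation, while Llama implementations typically rotate half-blocks instead, so this is not the exact code behind this checkpoint:

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x (seq_len, num_heads, head_dim) by position."""
    seq_len, _, head_dim = x.shape
    # One rotation frequency per channel pair, decaying geometrically
    inv_freq = base ** (-torch.arange(0, head_dim, 2) / head_dim)
    angles = torch.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq, dim/2)
    cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Rotating q and k this way makes their dot product depend only on the
# distance between their positions, across the full 2048-token window.
q = rope(torch.randn(2048, 12, 64))  # 64 = 768 hidden / 12 heads
```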

### Technical Specifications
| Feature | Specification |
| :--- | :--- |
| **Hidden Dimension** | 768 |
| **Layers (Depth)** | 12 |
| **Attention Heads** | 12 |
| **Intermediate Size** | 3072 |
| **Activation Function** | SwiGLU |
| **Normalization** | RMSNorm |
| **Vocab Size** | 50,257 (GPT-2 Tokenizer) |
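
For reference, the table maps directly onto a standard `LlamaConfig`. The sketch below is a reconstruction under stated assumptions, not the checkpoint's actual config file; untied input/output embeddings in particular are assumed, since they are what brings the total near the reported 190.55M:

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Rebuild the spec table as a Llama config; SwiGLU and RMSNorm are Llama
# defaults (hidden_act="silu" gates the MLP). tie_word_embeddings=False
# is an assumption needed to reach ~190M parameters.
config = LlamaConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    max_position_embeddings=2048,
    vocab_size=50257,
    tie_word_embeddings=False,
)

model = LlamaForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.2f}M parameters")  # ~190M
```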

---

## 📈 The Evolution of Learning

Training on the **TinyStories** dataset allows us to observe the model's cognitive development in distinct phases. **Test-1-2000** reached fluent, coherent output by progressing through these stages:

### 1. The Lexical Phase (Steps 0 – 250)
The model mastered basic English syntax and frequent patterns. It learned that "Once upon a time" is the standard anchor for its world.

### 2. The Relational Phase (Steps 250 – 1,000)
The model began connecting nouns with logical actions. It started to grasp "spatial" logic: if a character is in a park, they are likely playing or seeing trees. Loss dropped well below 1.5.

### 3. The Coherence Phase (Steps 1,000 – 2,000)
The final phase of this run focused on **Narrative Resolution**. The model learned to close the loops it opened, ensuring that a story starting with a problem (e.g., "Lily was bored") ends with a logical solution.

---

## 🛠 Training Configuration

### Hyperparameters
* **Precision:** `bfloat16` mixed precision
* **Optimizer:** AdamW ($\beta_1=0.9, \beta_2=0.95$)
* **Learning Rate:** 5e-4 (scheduled via `OneCycleLR`; see the sketch after this list)
* **Total Batch Size:** ~262,144 tokens per step
* **Weight Decay:** 0.01
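
A minimal PyTorch sketch of this optimizer/scheduler pairing, applied to a stand-in parameter; only the hyperparameters and the 2,000-step horizon come from this card, so swap in `model.parameters()` and a real training loop for an actual run:

```python
import torch

# Stand-in parameter; replace with model.parameters() in a real run.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(
    params, lr=5e-4, betas=(0.9, 0.95), weight_decay=0.01
)

# OneCycleLR warms up to max_lr and then anneals; total_steps matches the
# 2,000 optimizer steps reported on this card.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=5e-4, total_steps=2000
)

for step in range(2000):
    optimizer.step()   # (forward/backward omitted in this sketch)
    scheduler.step()   # one scheduler step per optimizer step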

### Dataset
**TinyStories (~2M stories):** a collection of synthetic stories restricted to the vocabulary a typical 3-year-old would understand, but with the structural complexity of professional writing. This lets the model learn "reasoning" without the noise of the open web.
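
The corpus is available through the Hugging Face `datasets` library; a quick way to inspect it:

```python
from datasets import load_dataset

# Pull the TinyStories training split referenced in the card metadata.
dataset = load_dataset("roneneldan/TinyStories", split="train")
print(dataset[0]["text"][:100])  # each record is one short story in "text"
```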

---

## 💻 Usage: Quick Start

You can load and run **Test-1-2000** using the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "GODELEV/Test-1-2000"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare prompt
prompt = "Once upon a time, Tom found a blue car."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate story
output = model.generate(
    **inputs,
    max_new_tokens=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    # Stop cleanly at end-of-text; the GPT-2 tokenizer defines no pad token,
    # so reuse the EOS token id for padding:
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id
)

# To verify that generation actually reached the <|endoftext|> token, decode
# once with skip_special_tokens=False, then switch back to True for clean output.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```