---
library_name: transformers
license: apache-2.0
datasets:
- roneneldan/TinyStories
language:
- en
---
# Tiny Recursive Model (TRM)
A compact language model featuring a recursive architecture designed for efficient text generation. This model uses a custom `TinyRecursiveModel` class with a ~7M parameter logic core [1].
## Model Details
- **Model Type**: Causal Language Model with Custom Recursive Architecture
- **Parameters**: ~40.21M total parameters (7.39M logic core, 32.82M vocabulary)
- **Architecture**: 3 physical layers, 8 recursive loops, 8 attention heads [1]
- **Vocabulary Size**: 50,257 tokens
- **Context Length**: 1024 tokens
- **Embedding Dimension**: 512
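The total above is simply the sum of the two stated components. A quick sanity check (figures in millions, copied from this card):

```python
# Sanity check on the parameter breakdown stated above (in millions):
# logic core + vocabulary parameters = total.
logic_m = 7.39   # recursive logic core
vocab_m = 32.82  # vocabulary / embedding parameters
total_m = logic_m + vocab_m
print(round(total_m, 2))  # 40.21
```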
## ⚠️ Important: Custom Model Class
This model uses a **custom `TinyRecursiveModel` class** that is not part of the standard transformers library [1]. You must use `trust_remote_code=True` when loading the model.
## Installation Requirements
```bash
pip install transformers torch
```
## Usage
### Method 1: Using trust_remote_code (Recommended)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the model and tokenizer (MUST use trust_remote_code=True)
model_name = "ainz/tiny-recursive-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True  # Required for custom model class
)

# Generate text
input_text = "Once upon a time"
inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        max_length=100,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id
    )
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
### Method 2: Manual Class Loading
If you prefer not to use `trust_remote_code`, you can manually download and use the model files:
```python
import torch
from huggingface_hub import hf_hub_download
# Download the model files
model_path = hf_hub_download(repo_id="ainz/tiny-recursive-model", filename="pytorch_model.bin")
config_path = hf_hub_download(repo_id="ainz/tiny-recursive-model", filename="config.json")
# You'll need to copy the TinyRecursiveModel class definition locally
# Then load manually:
# model = TinyRecursiveModel.from_pretrained("ainz/tiny-recursive-model")
```
### Batch Generation Example
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model with trust_remote_code
tokenizer = AutoTokenizer.from_pretrained("ainz/tiny-recursive-model")
model = AutoModelForCausalLM.from_pretrained(
    "ainz/tiny-recursive-model",
    trust_remote_code=True
)

# GPT-2-style tokenizers have no pad token by default; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Generate for multiple prompts
prompts = [
    "The future of artificial intelligence",
    "In a distant galaxy",
    "The secret to happiness"
]
inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    outputs = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=80,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )
for i, output in enumerate(outputs):
    text = tokenizer.decode(output, skip_special_tokens=True)
    print(f"Prompt {i+1}: {text}\n")
```
### Advanced Generation Parameters
```python
# More creative generation
outputs = model.generate(
    inputs["input_ids"],
    max_length=150,
    do_sample=True,
    temperature=0.8,         # Higher = more creative
    top_k=50,                # Consider only the top 50 tokens
    top_p=0.95,              # Nucleus sampling
    repetition_penalty=1.1,  # Reduce repetition
    pad_token_id=tokenizer.eos_token_id
)

# Deterministic generation
outputs = model.generate(
    inputs["input_ids"],
    max_length=100,
    do_sample=False,  # Greedy decoding
    pad_token_id=tokenizer.eos_token_id
)
```
## Architecture Overview
This model implements a novel recursive architecture where layers are reused multiple times through loops [1]. Key features:
- **Recursive Layers**: 3 physical transformer layers recursively applied 8 times
- **Parameter Efficiency**: Achieves 7.39M logic parameters through recursive design
- **Custom Implementation**: Uses `TinyRecursiveModel` class with `TRMConfig`
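The loop-reuse idea can be illustrated as follows. This is a minimal sketch built on `nn.TransformerEncoderLayer`, not the repository's actual `TinyRecursiveModel` implementation; the class and hyperparameter names are illustrative, only the layer/loop counts come from this card:

```python
import torch
import torch.nn as nn

# Hedged sketch (not the actual TinyRecursiveModel code): 3 physical
# transformer layers re-applied across 8 loops, giving 24 effective
# layer applications while storing only 3 layers' worth of weights.
class RecursiveBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=3, n_loops=8):
        super().__init__()
        self.n_loops = n_loops
        # Only n_layers weight sets are allocated...
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x):
        # ...but the stack is traversed n_loops times.
        for _ in range(self.n_loops):
            for layer in self.layers:
                x = layer(x)
        return x

block = RecursiveBlock()
hidden = torch.randn(1, 16, 512)  # (batch, seq_len, embedding dim)
out = block(hidden)
print(out.shape)  # torch.Size([1, 16, 512])
```

With the card's numbers (3 layers × 8 loops), a forward pass performs 24 layer applications while paying the parameter cost of only 3.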
## Model Performance
Training completed with:
- **Final Training Loss**: ~2.0
- **Training Steps**: 7,032 (1 epoch)
- **Parameter Breakdown**: 7.39M logic core + 32.82M vocabulary
## Security Note
This model requires `trust_remote_code=True` because it uses custom model architecture code. Only use this if you trust the model source.
## Troubleshooting
**Error loading model?**
- Make sure you're using `trust_remote_code=True`
- Ensure you have the latest transformers version: `pip install --upgrade transformers`
**Generation issues?**
- The model is small (7.39M logic parameters), so outputs can be repetitive or incoherent; adjust temperature and other sampling parameters
- Try different prompt formats for better results
## Limitations
- Small model size (~7M logic parameters) may limit performance compared to larger models
- Custom architecture requires `trust_remote_code=True`
- Best suited for creative writing and simple text completion tasks
## Citation
```bibtex
@misc{tiny_recursive_model_2025,
  author    = {ainz},
  title     = {Tiny Recursive Model},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/ainz/tiny-recursive-model}
}
```