# Trouter-20B Quick Start Guide

Get up and running with Trouter-20B in minutes.
## Installation

```bash
pip install transformers torch accelerate bitsandbytes
```
## Basic Usage

### Option 1: Full Precision (Requires ~40GB VRAM)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

prompt = "Explain machine learning:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Option 2: 4-bit Quantization (Requires ~10GB VRAM) ⭐ Recommended

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

prompt = "Explain machine learning:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Chat Interface

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Load model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

# Create conversation
messages = [
    {"role": "user", "content": "What is quantum computing?"}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

# Continue conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Can you explain it more simply?"})
```
## Generation Parameters

Adjust these for different use cases:
### Creative Writing (More Random)

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.9,  # Higher = more creative
    top_p=0.95,
    top_k=50,
    do_sample=True
)
```
### Factual/Technical (More Deterministic)

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    temperature=0.3,  # Lower = more focused
    top_p=0.9,
    do_sample=True
)
```
### Code Generation (Precise)

```python
outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    temperature=0.2,
    top_p=0.95,
    repetition_penalty=1.1,
    do_sample=True
)
```
## Memory Requirements

| Configuration | VRAM Required | Setup |
|---|---|---|
| Full (BF16) | ~40GB | `torch_dtype=torch.bfloat16` |
| 8-bit | ~20GB | `load_in_8bit=True` |
| 4-bit | ~10GB | 4-bit quantization config |
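The 8-bit middle ground is loaded the same way as the 4-bit setup. A minimal sketch (recent transformers versions expect the flag inside `BitsAndBytesConfig` rather than as a direct `load_in_8bit=True` keyword argument):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit weights: roughly half the memory of BF16
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=bnb_config,
    device_map="auto"
)
```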
## Common Issues

### Out of Memory

- Use 4-bit quantization
- Reduce `max_new_tokens`
- Clear GPU cache: `torch.cuda.empty_cache()` (see the snippet below)
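For example, to fully release a loaded model before reloading with a smaller configuration, something like this works:

```python
import gc
import torch

# Drop references, run the garbage collector, then release cached GPU memory
del model
gc.collect()
torch.cuda.empty_cache()
```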
### Slow Generation

- Use a smaller `max_new_tokens`
- Set `do_sample=False` for greedy decoding (see the sketch below)
- Reduce batch size
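A minimal greedy-decoding sketch; it skips sampling entirely, so no temperature or top-p parameters are needed:

```python
# Greedy decoding: deterministic and typically faster than sampling
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```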
### Poor Quality

- Adjust temperature (0.7-0.9 works well for most tasks)
- Increase `max_new_tokens`
- Try different prompts
## Next Steps

- See `USAGE_GUIDE.md` for advanced examples
- Check `examples.py` for code samples
- Read `EVALUATION.md` for benchmark results
## Simple Copy-Paste Example

```python
# Install first: pip install transformers torch accelerate bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Load model (4-bit for efficiency)
model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

# Generate text (do_sample=True so temperature actually takes effect)
prompt = "Write a Python function to calculate factorial:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
That's it! You're ready to use Trouter-20B.