Trouter-Library committed 85c988d (verified, parent: c11b4a4): Create QUICKSTART.md

Files changed: QUICKSTART.md (added, +200 lines)
# Trouter-20B Quick Start Guide

Get up and running with Trouter-20B in minutes.

## Installation

```bash
pip install transformers torch accelerate bitsandbytes
```
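
Before loading the model, it can help to confirm that PyTorch actually sees a GPU. A quick check (the device name printed is whatever your hardware reports):

```python
import torch

# A 20B model needs a GPU even with 4-bit quantization
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```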

## Basic Usage

### Option 1: Full Precision (Requires ~40GB VRAM)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

prompt = "Explain machine learning:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Option 2: 4-bit Quantization (Requires ~10GB VRAM) ⭐ Recommended

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

prompt = "Explain machine learning:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Chat Interface

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Load model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=bnb_config,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

# Create conversation
messages = [
    {"role": "user", "content": "What is quantum computing?"}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    temperature=0.7,
    top_p=0.95,
    do_sample=True
)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)

# Continue conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Can you explain it more simply?"})
```
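
To get the model's answer to the follow-up question, repeat the same template-and-generate steps on the extended `messages` list. A minimal sketch, reusing the objects defined above:

```python
# Re-apply the chat template to the full history and generate the next turn
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.7, top_p=0.95, do_sample=True)
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
```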

## Generation Parameters

Adjust these for different use cases:

### Creative Writing (More Random)
```python
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.9,  # Higher = more creative
    top_p=0.95,
    top_k=50,
    do_sample=True
)
```

### Factual/Technical (More Deterministic)
```python
outputs = model.generate(
    **inputs,
    max_new_tokens=300,
    temperature=0.3,  # Lower = more focused
    top_p=0.9,
    do_sample=True
)
```

### Code Generation (Precise)
```python
outputs = model.generate(
    **inputs,
    max_new_tokens=400,
    temperature=0.2,
    top_p=0.95,
    repetition_penalty=1.1,
    do_sample=True
)
```
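
With `do_sample=True`, outputs vary from run to run. If you need repeatable generations while tuning these parameters, you can fix the seed; a minimal sketch using the `set_seed` helper from `transformers`:

```python
from transformers import set_seed

set_seed(42)  # makes sampled generations repeatable across runs
outputs = model.generate(**inputs, max_new_tokens=300, temperature=0.7, do_sample=True)
```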

## Memory Requirements

| Configuration | VRAM Required | Setup |
|---------------|---------------|-------|
| **Full (BF16)** | ~40GB | `torch_dtype=torch.bfloat16` |
| **8-bit** | ~20GB | `load_in_8bit=True` |
| **4-bit** | ~10GB | 4-bit quantization config |
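
The 8-bit option sits between the two examples shown earlier. A minimal sketch of loading in 8-bit via `BitsAndBytesConfig` (assumes your installed `bitsandbytes` and `transformers` versions support 8-bit loading):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 8-bit quantization: roughly half the memory of BF16
model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")
```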

## Common Issues

### Out of Memory
- Use 4-bit quantization
- Reduce `max_new_tokens`
- Clear the GPU cache with `torch.cuda.empty_cache()`, as sketched below
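
A minimal sketch of freeing GPU memory between generations (assumes a CUDA device; `outputs` here is a tensor left over from a previous run):

```python
import gc
import torch

del outputs               # drop references to large tensors first
gc.collect()              # let Python reclaim them
torch.cuda.empty_cache()  # return cached blocks to the driver
```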

### Slow Generation
- Use a smaller `max_new_tokens`
- Set `do_sample=False` for greedy decoding (see the sketch below)
- Reduce the batch size
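
Greedy decoding skips the sampling step and is deterministic:

```python
# Greedy decoding: deterministic, no temperature/top_p needed
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
```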

### Poor Quality
- Adjust `temperature` (0.7-0.9 for most tasks)
- Increase `max_new_tokens`
- Try different prompts

## Next Steps

- See [USAGE_GUIDE.md](./USAGE_GUIDE.md) for advanced examples
- Check [examples.py](./examples.py) for code samples
- Read [EVALUATION.md](./EVALUATION.md) for benchmark results

## Simple Copy-Paste Example

```python
# Install first: pip install transformers torch accelerate bitsandbytes

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Load model (4-bit for efficiency)
model = AutoModelForCausalLM.from_pretrained(
    "Trouter-Library/Trouter-20B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    ),
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Trouter-Library/Trouter-20B")

# Generate text
prompt = "Write a Python function to calculate factorial:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, temperature=0.7, do_sample=True)  # do_sample is needed for temperature to take effect
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

That's it! You're ready to use Trouter-20B.