manbeast3b committed on
Commit 7021d37 · verified · 1 Parent(s): 9708f65

Upload README.md with huggingface_hub

Files changed (1):
  README.md (+264, -34)

README.md CHANGED
@@ -9,36 +9,107 @@ tags:
  - commoneval
  - wildvoice
  - voicebench
  ---

- # Qwen2.5-0.5B-Instruct Text Classification Model

- This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) using LoRA (Low-Rank Adaptation) for text classification tasks.

- ## Model Description

- The model has been trained to classify text into three categories:
- - **ifeval**: Instruction-following tasks with specific formatting requirements
- - **commoneval**: Factual questions and knowledge-based queries
- - **wildvoice**: Conversational, informal language patterns

- ## Performance

- - **Overall Accuracy**: 93.33% (28/30 correct)
- - **ifeval**: 100% (10/10)
- - **commoneval**: 80% (8/10)
- - **wildvoice**: 100% (10/10)

- ## Usage

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Load the model and tokenizer
- model = AutoModelForCausalLM.from_pretrained("your-username/qwen2.5-0.5b-text-classification")
- tokenizer = AutoTokenizer.from_pretrained("your-username/qwen2.5-0.5b-text-classification")

- # Example usage
  def classify_text(text):
      prompt = f"Text: {text}\nLabel:"
      inputs = tokenizer(prompt, return_tensors="pt")
@@ -68,37 +139,196 @@ print(classify_text("Hey, how are you doing today?"))
  # Output: wildvoice
  ```

- ## Training Details

  - **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
- - **Method**: LoRA (Low-Rank Adaptation)
- - **Trainable Parameters**: 0.88% of total parameters
- - **Learning Rate**: 5e-4
- - **Batch Size**: 2
- - **Training Steps**: 150
- - **Max Length**: 128

- ## Dataset

  The model was trained on synthetic data representing three text categories:
- - Instruction-following tasks (ifeval)
- - Factual questions (commoneval)
- - Conversational text (wildvoice)

- ## Limitations

- - The model was trained on synthetic data and may not generalize well to all real-world text
- - Performance may vary on text that doesn't clearly fit the three defined categories
- - The model is optimized for the specific prompt format used during training

- ## Citation

  ```bibtex
  @misc{qwen2.5-0.5b-text-classification,
-   title={Qwen2.5-0.5B Text Classification Model},
    author={Your Name},
    year={2024},
    publisher={Hugging Face},
-   howpublished={\url{https://huggingface.co/your-username/qwen2.5-0.5b-text-classification}}
  }
  ```
  - commoneval
  - wildvoice
  - voicebench
+ - fine-tuned
  ---

+ # Qwen2.5-0.5B Text Classification Model
+
+ This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) using LoRA (Low-Rank Adaptation) for text classification tasks. The model has been specifically trained to classify text into three categories based on VoiceBench dataset patterns.
+
+ ## 🎯 Model Description
+
+ The model has been trained to classify text into three distinct categories:
+ - **ifeval**: Instruction-following tasks with specific formatting requirements and step-by-step instructions
+ - **commoneval**: Factual questions and knowledge-based queries requiring direct answers
+ - **wildvoice**: Conversational, informal language patterns and natural dialogue
+
+ ## 📊 Performance Results
+
+ ### Overall Performance
+ - **Overall Accuracy**: **93.33%** (28/30 correct predictions)
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+ - **Trainable Parameters**: 0.88% of total parameters (4,399,104 out of 498,431,872)
+
+ ### Per-Category Performance
+ | Category | Accuracy | Correct/Total | Description |
+ |----------|----------|---------------|-------------|
+ | **ifeval** | **100%** | 10/10 | Perfect performance on instruction-following tasks |
+ | **commoneval** | **80%** | 8/10 | Good performance on factual questions |
+ | **wildvoice** | **100%** | 10/10 | Perfect performance on conversational text |
+
+ ### Confusion Matrix
+ ```
+ ifeval:
+   -> ifeval: 10
+ commoneval:
+   -> commoneval: 8
+   -> unknown: 1
+   -> wildvoice: 1
+ wildvoice:
+   -> wildvoice: 10
+ ```
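As a sanity check, the per-category and overall accuracies can be recomputed from the confusion counts above with plain Python, no model required:

```python
# Confusion counts transcribed from the matrix above: true label -> predicted label -> count
confusion = {
    "ifeval":     {"ifeval": 10},
    "commoneval": {"commoneval": 8, "unknown": 1, "wildvoice": 1},
    "wildvoice":  {"wildvoice": 10},
}

# Per-class accuracy: diagonal count divided by row total
per_class = {true: preds.get(true, 0) / sum(preds.values())
             for true, preds in confusion.items()}
correct = sum(preds.get(true, 0) for true, preds in confusion.items())
total = sum(sum(preds.values()) for preds in confusion.values())

print(per_class)                                     # {'ifeval': 1.0, 'commoneval': 0.8, 'wildvoice': 1.0}
print(f"{correct}/{total} = {correct / total:.2%}")  # 28/30 = 93.33%
```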

+ ## 🔬 Development Journey & Methods Tried
+
+ ### Initial Challenges
+ We started with several approaches that didn't work well:
+
+ 1. **GRPO (Group Relative Policy Optimization)**: Initial attempts with GRPO training showed poor performance
+    - Loss decreased, but the model wasn't learning the classification task
+    - Model generated irrelevant responses like "unknown", "txt", "com"
+    - Overall accuracy: ~20%
+
+ 2. **Full Fine-tuning**: Attempted full fine-tuning of larger models
+    - CUDA out-of-memory issues with larger models
+    - Numerical instability with certain model architectures
+    - Poor convergence on the classification task
+
+ 3. **Complex Prompt Formats**: Tried various complex prompt structures
+    - e.g. "Classify this text as ifeval, commoneval, or wildvoice: ..."
+    - Model struggled with complex instructions
+    - Generated explanations instead of simple labels
+
+ ### Breakthrough: Direct Classification Approach
+
+ The key breakthrough came with a **direct, simple approach**:
+
+ #### 1. **Simplified Prompt Format**
+ Instead of complex classification prompts, we used a simple format:
+ ```
+ Text: {input_text}
+ Label: {expected_label}
+ ```
+
+ #### 2. **LoRA (Low-Rank Adaptation)**
+ - Used the PEFT library for efficient fine-tuning
+ - Only trained 0.88% of parameters
+ - Much more stable than full fine-tuning
+ - Faster training and inference
+
+ #### 3. **Focused Training Data**
+ Created clear, distinct examples for each category:
+ - **ifeval**: Instruction-following with specific formatting requirements
+ - **commoneval**: Factual questions requiring direct answers
+ - **wildvoice**: Conversational, informal language patterns
+
+ #### 4. **Optimal Hyperparameters**
+ - **Learning Rate**: 5e-4 (higher than initial attempts)
+ - **Batch Size**: 2 (smaller, for stability)
+ - **Max Length**: 128 (shorter sequences)
+ - **Training Steps**: 150
+ - **LoRA Rank**: 8 (focused learning)
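Putting the simplified prompt format and the focused training data together, training strings can be assembled as in this minimal sketch (the sample texts are illustrative placeholders, not the actual training set):

```python
def to_training_example(text: str, label: str) -> str:
    # The simple "Text: ... / Label: ..." format described above
    return f"Text: {text}\nLabel: {label}"

# Illustrative samples, one per category (hypothetical, not the real data)
samples = [
    ("List three benefits of exercise, formatted as bullet points.", "ifeval"),
    ("What is the capital of France?", "commoneval"),
    ("Hey, how's it going today?", "wildvoice"),
]
train_texts = [to_training_example(text, label) for text, label in samples]
print(train_texts[1])
# Text: What is the capital of France?
# Label: commoneval
```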

+ ## 🚀 Usage
+
+ ### Basic Usage
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch

  # Load the model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained("manbeast3b/qwen2.5-0.5b-text-classification")
+ tokenizer = AutoTokenizer.from_pretrained("manbeast3b/qwen2.5-0.5b-text-classification")

  def classify_text(text):
      prompt = f"Text: {text}\nLabel:"
      inputs = tokenizer(prompt, return_tensors="pt")
  # Output: wildvoice
  ```

+ ### Advanced Usage with Confidence Scoring
+ ```python
+ def classify_with_confidence(text, num_samples=5):
+     predictions = []
+     for _ in range(num_samples):
+         prompt = f"Text: {text}\nLabel:"
+         inputs = tokenizer(prompt, return_tensors="pt")
+
+         with torch.no_grad():
+             generated = model.generate(
+                 **inputs,
+                 max_new_tokens=15,
+                 do_sample=True,
+                 temperature=0.3,  # Slightly higher for diversity
+                 top_p=0.9,
+                 pad_token_id=tokenizer.eos_token_id,
+                 eos_token_id=tokenizer.eos_token_id,
+             )
+
+         response = tokenizer.decode(generated[0], skip_special_tokens=True)
+         prediction = response[len(prompt):].strip().lower()
+
+         # Clean up prediction
+         if 'ifeval' in prediction:
+             prediction = 'ifeval'
+         elif 'commoneval' in prediction:
+             prediction = 'commoneval'
+         elif 'wildvoice' in prediction:
+             prediction = 'wildvoice'
+         else:
+             prediction = 'unknown'
+
+         predictions.append(prediction)
+
+     # Confidence = share of samples agreeing with the majority label
+     from collections import Counter
+     counts = Counter(predictions)
+     most_common = counts.most_common(1)[0]
+     confidence = most_common[1] / len(predictions)
+
+     return most_common[0], confidence
+
+ # Example with confidence
+ label, confidence = classify_with_confidence("Please follow these steps: 1) Read 2) Think 3) Write")
+ print(f"Prediction: {label}, Confidence: {confidence:.2%}")
+ ```
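The voting logic at the end of `classify_with_confidence` can be isolated and checked without loading the model; a small sketch of the same majority-vote calculation:

```python
from collections import Counter

def majority_vote(predictions):
    # Majority label plus the fraction of samples agreeing with it,
    # mirroring the confidence calculation in classify_with_confidence
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)

label, confidence = majority_vote(["ifeval", "ifeval", "wildvoice", "ifeval", "ifeval"])
print(f"{label} ({confidence:.0%})")  # ifeval (80%)
```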

+ ## 📈 Training Details
+
+ ### Model Architecture
  - **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
+ - **Parameters**: 498,431,872 total, 4,399,104 trainable (0.88%)
+ - **Precision**: FP16 (mixed precision)
+ - **Device**: CUDA (GPU-accelerated)
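The trainable-parameter percentage follows directly from the counts above:

```python
# Parameter counts reported above
trainable, total = 4_399_104, 498_431_872
ratio = trainable / total
print(f"{ratio:.2%}")  # 0.88%
```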
 
 
 

+ ### Training Configuration
+ ```python
+ # LoRA Configuration
+ lora_config = LoraConfig(
+     task_type=TaskType.CAUSAL_LM,
+     r=8,             # Rank
+     lora_alpha=16,   # LoRA alpha
+     lora_dropout=0.1,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
+ )
+
+ # Training Arguments (sequences were truncated to max_length=128 at
+ # tokenization time; max_length is not a TrainingArguments field)
+ training_args = TrainingArguments(
+     learning_rate=5e-4,
+     per_device_train_batch_size=2,
+     max_steps=150,
+     fp16=True,
+     gradient_accumulation_steps=1,
+     warmup_steps=20,
+     weight_decay=0.01,
+     max_grad_norm=1.0
+ )
+ ```

+ ### Dataset
  The model was trained on synthetic data representing three text categories:
+ - **60 total samples** (20 per category)
+ - **ifeval**: Instruction-following tasks with specific formatting requirements
+ - **commoneval**: Factual questions and knowledge-based queries
+ - **wildvoice**: Conversational, informal language patterns
+
+ ## 🔍 Error Analysis
+
+ ### Failed Predictions (2 out of 30)
+ 1. **"What is 2 plus 2?"** → Predicted: `unknown` (Expected: `commoneval`)
+    - Model generated: `#eval{1} Label: #eval{2} Label: #`
+    - Issue: the model produced code-like syntax instead of a simple label
+
+ 2. **"What is the opposite of hot?"** → Predicted: `wildvoice` (Expected: `commoneval`)
+    - Model generated: `#wildvoice:comoneval:hot:yourresponse:whatis`
+    - Issue: the model produced a complex response instead of a simple label
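Both failures are consistent with the substring-based label cleanup shown in the usage example. A hypothetical `extract_label` helper applying the same matching reproduces them:

```python
def extract_label(raw: str) -> str:
    # Same substring matching as the cleanup step in the usage example
    raw = raw.strip().lower()
    for label in ("ifeval", "commoneval", "wildvoice"):
        if label in raw:
            return label
    return "unknown"

print(extract_label("#eval{1} Label: #eval{2} Label: #"))             # unknown
print(extract_label("#wildvoice:comoneval:hot:yourresponse:whatis"))  # wildvoice
```

Note that `#eval` never contains the substring `ifeval`, hence `unknown`, while the second generation contains `wildvoice` verbatim and is matched despite its garbled remainder.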
239
+
240
+ ### Success Factors
241
+ - **Simple prompt format** was crucial for success
242
+ - **LoRA fine-tuning** provided stable training
243
+ - **Focused training data** with clear category distinctions
244
+ - **Appropriate hyperparameters** (learning rate, batch size, etc.)
245
+
246
+ ## ๐Ÿ› ๏ธ Technical Implementation
247
+
248
+ ### Files Structure
249
+ ```
250
+ merged_classification_model/
251
+ โ”œโ”€โ”€ README.md # This file
252
+ โ”œโ”€โ”€ config.json # Model configuration
253
+ โ”œโ”€โ”€ generation_config.json # Generation settings
254
+ โ”œโ”€โ”€ model.safetensors # Model weights (988MB)
255
+ โ”œโ”€โ”€ tokenizer.json # Tokenizer vocabulary
256
+ โ”œโ”€โ”€ tokenizer_config.json # Tokenizer configuration
257
+ โ”œโ”€โ”€ special_tokens_map.json # Special tokens mapping
258
+ โ”œโ”€โ”€ added_tokens.json # Added tokens
259
+ โ”œโ”€โ”€ merges.txt # BPE merges
260
+ โ”œโ”€โ”€ vocab.json # Vocabulary
261
+ โ””โ”€โ”€ chat_template.jinja # Chat template
262
+ ```
263
 
264
+ ### Dependencies
265
+ ```bash
266
+ pip install transformers>=4.56.0
267
+ pip install torch>=2.0.0
268
+ pip install peft>=0.17.0
269
+ pip install accelerate>=0.21.0
270
+ ```
271
+
272
+ ## ๐ŸŽฏ Use Cases
273
+
274
+ This model is particularly useful for:
275
+ - **Text categorization** in educational platforms
276
+ - **Content filtering** based on text type
277
+ - **Dataset preprocessing** for machine learning pipelines
278
+ - **VoiceBench-style evaluation** systems
279
+ - **Instruction following detection** in AI systems
280
+ - **Conversational vs. factual text separation**
281
+
282
+ ## โš ๏ธ Limitations
283
 
284
+ 1. **Synthetic Training Data**: Model was trained on synthetic data and may not generalize perfectly to all real-world text
285
+ 2. **Three-Category Limitation**: Only classifies into the three predefined categories
286
+ 3. **Prompt Sensitivity**: Performance may vary with different prompt formats
287
+ 4. **Edge Cases**: Some edge cases (like mathematical questions) may be misclassified
288
+ 5. **Language**: Primarily trained on English text
289
 
290
+ ## ๐Ÿ”ฎ Future Improvements
291
+
292
+ 1. **Larger Training Dataset**: Use real VoiceBench data with proper audio transcription
293
+ 2. **More Categories**: Expand to include additional text types
294
+ 3. **Multilingual Support**: Train on multiple languages
295
+ 4. **Confidence Calibration**: Improve confidence scoring
296
+ 5. **Few-shot Learning**: Add support for few-shot classification
297
+
298
+ ## ๐Ÿ“š Citation
299
 
300
  ```bibtex
301
  @misc{qwen2.5-0.5b-text-classification,
302
+ title={Qwen2.5-0.5B Text Classification Model for VoiceBench-style Evaluation},
303
  author={Your Name},
304
  year={2024},
305
  publisher={Hugging Face},
306
+ howpublished={\url{https://huggingface.co/manbeast3b/qwen2.5-0.5b-text-classification}},
307
+ note={Fine-tuned using LoRA on synthetic text classification data}
308
  }
309
  ```
310
+
311
+ ## ๐Ÿค Contributing
312
+
313
+ Contributions are welcome! Please feel free to:
314
+ - Report issues with the model
315
+ - Suggest improvements
316
+ - Submit pull requests
317
+ - Share your use cases
318
+
319
+ ## ๐Ÿ“„ License
320
+
321
+ This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more details.
322
+
323
+ ---
324
+
325
+ **Model Performance Summary:**
326
+ - โœ… **93.33% Overall Accuracy**
327
+ - โœ… **100% ifeval accuracy** (instruction-following)
328
+ - โœ… **100% wildvoice accuracy** (conversational)
329
+ - โœ… **80% commoneval accuracy** (factual questions)
330
+ - โœ… **Efficient LoRA fine-tuning** (0.88% trainable parameters)
331
+ - โœ… **Fast inference** with small model size
332
+ - โœ… **Easy to use** with simple API
333
+
334
+ *This model represents a successful application of LoRA fine-tuning for text classification, achieving high accuracy with minimal computational resources.*