- commoneval
- wildvoice
- voicebench
- fine-tuned
---

# Qwen2.5-0.5B Text Classification Model

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) using LoRA (Low-Rank Adaptation) for text classification tasks. It has been trained to classify text into three categories based on VoiceBench dataset patterns.

## 🎯 Model Description

The model has been trained to classify text into three distinct categories:

- **ifeval**: Instruction-following tasks with specific formatting requirements and step-by-step instructions
- **commoneval**: Factual questions and knowledge-based queries requiring direct answers
- **wildvoice**: Conversational, informal language patterns and natural dialogue

## 📊 Performance Results

### Overall Performance

- **Overall Accuracy**: **93.33%** (28/30 correct predictions)
- **Training Method**: LoRA (Low-Rank Adaptation)
- **Trainable Parameters**: 0.88% of total parameters (4,399,104 out of 498,431,872)

### Per-Category Performance

| Category | Accuracy | Correct/Total | Description |
|----------|----------|---------------|-------------|
| **ifeval** | **100%** | 10/10 | Perfect performance on instruction-following tasks |
| **commoneval** | **80%** | 8/10 | Good performance on factual questions |
| **wildvoice** | **100%** | 10/10 | Perfect performance on conversational text |

### Confusion Matrix

```
ifeval:
  -> ifeval: 10
commoneval:
  -> commoneval: 8
  -> unknown: 1
  -> wildvoice: 1
wildvoice:
  -> wildvoice: 10
```
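
The headline accuracy can be recomputed from the confusion matrix as a quick sanity check:

```python
# Confusion matrix from the evaluation above: true label -> predicted label -> count
confusion = {
    "ifeval":     {"ifeval": 10},
    "commoneval": {"commoneval": 8, "unknown": 1, "wildvoice": 1},
    "wildvoice":  {"wildvoice": 10},
}

# Correct predictions sit on the diagonal (predicted == true)
correct = sum(preds.get(true, 0) for true, preds in confusion.items())
total = sum(sum(preds.values()) for preds in confusion.values())
print(f"{correct}/{total} = {correct / total:.2%}")  # -> 28/30 = 93.33%
```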

## 🔬 Development Journey & Methods Tried

### Initial Challenges

We started with several approaches that didn't work well:

1. **GRPO (Group Relative Policy Optimization)**: Initial attempts with GRPO training showed poor performance
   - Loss decreased, but the model wasn't learning the classification task
   - The model generated irrelevant responses like "unknown", "txt", "com"
   - Overall accuracy: ~20%

2. **Full Fine-tuning**: Attempted full fine-tuning of larger models
   - CUDA out-of-memory issues with larger models
   - Numerical instability with certain model architectures
   - Poor convergence on the classification task

3. **Complex Prompt Formats**: Tried various complex prompt structures
   - "Classify this text as ifeval, commoneval, or wildvoice: ..."
   - The model struggled with complex instructions
   - It generated explanations instead of simple labels

### Breakthrough: Direct Classification Approach

The key breakthrough came with a **direct, simple approach**:

#### 1. **Simplified Prompt Format**

Instead of complex classification prompts, we used a simple format:

```
Text: {input_text}
Label: {expected_label}
```

#### 2. **LoRA (Low-Rank Adaptation)**

- Used the PEFT library for efficient fine-tuning
- Only trained 0.88% of the parameters
- Much more stable than full fine-tuning
- Faster training and inference

#### 3. **Focused Training Data**

Created clear, distinct examples for each category:

- **ifeval**: Instruction-following with specific formatting requirements
- **commoneval**: Factual questions requiring direct answers
- **wildvoice**: Conversational, informal language patterns

#### 4. **Optimal Hyperparameters**

- **Learning Rate**: 5e-4 (higher than initial attempts)
- **Batch Size**: 2 (smaller, for stability)
- **Max Length**: 128 (shorter sequences)
- **Training Steps**: 150
- **LoRA Rank**: 8 (focused learning)
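
The simplified prompt format is easy to capture in a pair of helpers; this is an illustrative sketch (the function names are not from the training code):

```python
def format_training_example(text: str, label: str) -> str:
    # Training examples pair the raw text with its expected label.
    return f"Text: {text}\nLabel: {label}"

def format_inference_prompt(text: str) -> str:
    # At inference time the label is left blank for the model to complete.
    return f"Text: {text}\nLabel:"

print(format_training_example("What is the capital of France?", "commoneval"))
```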

## 🚀 Usage

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("manbeast3b/qwen2.5-0.5b-text-classification")
tokenizer = AutoTokenizer.from_pretrained("manbeast3b/qwen2.5-0.5b-text-classification")

def classify_text(text):
    prompt = f"Text: {text}\nLabel:"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        generated = model.generate(
            **inputs,
            max_new_tokens=10,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,
        )

    response = tokenizer.decode(generated[0], skip_special_tokens=True)
    prediction = response[len(prompt):].strip().lower()

    # Map the raw generation onto one of the three labels
    for label in ("ifeval", "commoneval", "wildvoice"):
        if label in prediction:
            return label
    return "unknown"

print(classify_text("Hey, how are you doing today?"))
# Output: wildvoice
```

### Advanced Usage with Confidence Scoring
```python
from collections import Counter

import torch

def classify_with_confidence(text, num_samples=5):
    predictions = []
    for _ in range(num_samples):
        prompt = f"Text: {text}\nLabel:"
        inputs = tokenizer(prompt, return_tensors="pt")

        with torch.no_grad():
            generated = model.generate(
                **inputs,
                max_new_tokens=15,
                do_sample=True,
                temperature=0.3,  # slightly higher for diversity
                top_p=0.9,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id,
            )

        response = tokenizer.decode(generated[0], skip_special_tokens=True)
        prediction = response[len(prompt):].strip().lower()

        # Clean up the prediction
        if "ifeval" in prediction:
            prediction = "ifeval"
        elif "commoneval" in prediction:
            prediction = "commoneval"
        elif "wildvoice" in prediction:
            prediction = "wildvoice"
        else:
            prediction = "unknown"

        predictions.append(prediction)

    # Confidence = share of samples agreeing with the majority label
    counts = Counter(predictions)
    most_common = counts.most_common(1)[0]
    confidence = most_common[1] / len(predictions)

    return most_common[0], confidence

# Example with confidence
label, confidence = classify_with_confidence("Please follow these steps: 1) Read 2) Think 3) Write")
print(f"Prediction: {label}, Confidence: {confidence:.2%}")
```
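
The majority-vote confidence calculation can be exercised on its own with made-up sample predictions:

```python
from collections import Counter

# Hypothetical predictions from five sampled generations
predictions = ["wildvoice", "wildvoice", "commoneval", "wildvoice", "wildvoice"]

# Confidence is the fraction of samples agreeing with the majority label
counts = Counter(predictions)
label, votes = counts.most_common(1)[0]
confidence = votes / len(predictions)
print(label, confidence)  # -> wildvoice 0.8
```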

## 📋 Training Details

### Model Architecture

- **Base Model**: Qwen/Qwen2.5-0.5B-Instruct
- **Parameters**: 498,431,872 total, 4,399,104 trainable (0.88%)
- **Precision**: FP16 (mixed precision)
- **Device**: CUDA (GPU accelerated)
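
The 0.88% figure follows directly from the two parameter counts:

```python
total_params = 498_431_872
trainable_params = 4_399_104

# Fraction of the network actually updated by LoRA
fraction = trainable_params / total_params
print(f"{fraction:.2%} of parameters are trainable")  # -> 0.88% of parameters are trainable
```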

### Training Configuration
```python
from peft import LoraConfig, TaskType
from transformers import TrainingArguments

# LoRA configuration
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,             # rank
    lora_alpha=16,   # LoRA alpha
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
)

# Training arguments (sequences were truncated to 128 tokens at tokenization time;
# TrainingArguments itself has no max_length parameter)
training_args = TrainingArguments(
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    max_steps=150,
    fp16=True,
    gradient_accumulation_steps=1,
    warmup_steps=20,
    weight_decay=0.01,
    max_grad_norm=1.0
)
```

### Dataset

The model was trained on synthetic data representing three text categories:

- **60 total samples** (20 per category)
- **ifeval**: Instruction-following tasks with specific formatting requirements
- **commoneval**: Factual questions and knowledge-based queries
- **wildvoice**: Conversational, informal language patterns
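
A minimal sketch of what such synthetic data looks like in the `Text:`/`Label:` training format (the sample texts here are illustrative, not the actual training set):

```python
# One illustrative sample per category; the real set had 20 per category
samples = [
    ("Write your answer in exactly two sentences.", "ifeval"),
    ("What is the capital of France?", "commoneval"),
    ("Hey, how's it going today?", "wildvoice"),
]

# Render each pair into the simple prompt format used for training
training_texts = [f"Text: {text}\nLabel: {label}" for text, label in samples]
print(training_texts[0])
```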

## 🔍 Error Analysis

### Failed Predictions (2 out of 30)

1. **"What is 2 plus 2?"** → Predicted: `unknown` (Expected: `commoneval`)
   - Model generated: `#eval{1} Label: #eval{2} Label: #`
   - Issue: the model produced code-like syntax instead of a simple label

2. **"What is the opposite of hot?"** → Predicted: `wildvoice` (Expected: `commoneval`)
   - Model generated: `#wildvoice:comoneval:hot:yourresponse:whatis`
   - Issue: the model produced a complex response instead of a simple label

### Success Factors

- **Simple prompt format** was crucial for success
- **LoRA fine-tuning** provided stable training
- **Focused training data** with clear category distinctions
- **Appropriate hyperparameters** (learning rate, batch size, etc.)
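
The keyword-matching cleanup from the usage examples explains both failures: the first garbled generation contains none of the three label strings, while the second contains `wildvoice` verbatim (note the misspelled `comoneval`), hence the misclassification. A standalone check:

```python
def extract_label(prediction: str) -> str:
    # Same keyword matching used in the usage examples above
    prediction = prediction.strip().lower()
    for label in ("ifeval", "commoneval", "wildvoice"):
        if label in prediction:
            return label
    return "unknown"

# The two failure cases reported above:
print(extract_label("#eval{1} Label: #eval{2} Label: #"))             # -> unknown
print(extract_label("#wildvoice:comoneval:hot:yourresponse:whatis"))  # -> wildvoice
```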

## 🛠️ Technical Implementation

### Files Structure
```
merged_classification_model/
├── README.md                 # This file
├── config.json               # Model configuration
├── generation_config.json    # Generation settings
├── model.safetensors         # Model weights (988MB)
├── tokenizer.json            # Tokenizer vocabulary
├── tokenizer_config.json     # Tokenizer configuration
├── special_tokens_map.json   # Special tokens mapping
├── added_tokens.json         # Added tokens
├── merges.txt                # BPE merges
├── vocab.json                # Vocabulary
└── chat_template.jinja       # Chat template
```

### Dependencies
```bash
pip install "transformers>=4.56.0"
pip install "torch>=2.0.0"
pip install "peft>=0.17.0"
pip install "accelerate>=0.21.0"
```
## 🎯 Use Cases

This model is particularly useful for:

- **Text categorization** in educational platforms
- **Content filtering** based on text type
- **Dataset preprocessing** for machine learning pipelines
- **VoiceBench-style evaluation** systems
- **Instruction-following detection** in AI systems
- **Conversational vs. factual text separation**

## ⚠️ Limitations

1. **Synthetic Training Data**: The model was trained on synthetic data and may not generalize perfectly to all real-world text
2. **Three-Category Limitation**: It only classifies into the three predefined categories
3. **Prompt Sensitivity**: Performance may vary with different prompt formats
4. **Edge Cases**: Some edge cases (such as mathematical questions) may be misclassified
5. **Language**: Primarily trained on English text

## 🔮 Future Improvements

1. **Larger Training Dataset**: Use real VoiceBench data with proper audio transcription
2. **More Categories**: Expand to include additional text types
3. **Multilingual Support**: Train on multiple languages
4. **Confidence Calibration**: Improve confidence scoring
5. **Few-shot Learning**: Add support for few-shot classification

## 📚 Citation

```bibtex
@misc{qwen2.5-0.5b-text-classification,
  title={Qwen2.5-0.5B Text Classification Model for VoiceBench-style Evaluation},
  author={Your Name},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/manbeast3b/qwen2.5-0.5b-text-classification}},
  note={Fine-tuned using LoRA on synthetic text classification data}
}
```

## 🤝 Contributing

Contributions are welcome! Please feel free to:

- Report issues with the model
- Suggest improvements
- Submit pull requests
- Share your use cases

## 📄 License

This model is released under the Apache 2.0 License. See the [LICENSE](LICENSE) file for more details.

---

**Model Performance Summary:**

- ✅ **93.33% Overall Accuracy**
- ✅ **100% ifeval accuracy** (instruction-following)
- ✅ **100% wildvoice accuracy** (conversational)
- ✅ **80% commoneval accuracy** (factual questions)
- ✅ **Efficient LoRA fine-tuning** (0.88% trainable parameters)
- ✅ **Fast inference** with small model size
- ✅ **Easy to use** with simple API

*This model represents a successful application of LoRA fine-tuning for text classification, achieving high accuracy with minimal computational resources.*