# 🧪 Testing Guide for Multilingual Emotion Classifier
This guide explains how to test the `rmtariq/multilingual-emotion-classifier` model using the `test_model.py` script.
## 🚀 Quick Start
### Installation
```bash
# Install requirements
pip install -r requirements_testing.txt
# Or install manually
pip install torch transformers numpy pandas scikit-learn
```
### Basic Usage
```bash
# Quick test (recommended for first-time users)
python test_model.py --test-type quick
# Comprehensive test
python test_model.py --test-type comprehensive
# Interactive testing
python test_model.py --test-type interactive
# Performance benchmark
python test_model.py --test-type benchmark
# Run all tests
python test_model.py --test-type all
```
## 📋 Test Types
### 1. 🚀 Quick Test
**Purpose**: Fast validation of core functionality
**Duration**: ~30 seconds
**Coverage**: 13 essential test cases (English + Malay)
```bash
python test_model.py --test-type quick
```
**What it tests**:
- ✅ Basic English emotions (6 cases)
- ✅ Basic Malay emotions (4 cases)
- ✅ Previously problematic cases (3 cases)
**Expected Results**: >90% accuracy
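The accuracy figure is presumably the share of test cases whose predicted label matches the expected one. A minimal sketch of that calculation follows; the `accuracy` helper and the `fake_predict` stand-in are hypothetical names for illustration and are not taken from `test_model.py`.

```python
from typing import Callable, Iterable, Tuple


def accuracy(cases: Iterable[Tuple[str, str]], predict: Callable[[str], str]) -> float:
    """Return the fraction of (text, expected_label) cases that predict() gets right."""
    cases = list(cases)
    if not cases:
        return 0.0
    correct = sum(1 for text, expected in cases if predict(text) == expected)
    return correct / len(cases)


# Stand-in predictor for illustration only (an assumption, not the real model).
# A real run would wrap the classifier instead.
def fake_predict(text: str) -> str:
    lowered = text.lower()
    return "happy" if "happy" in lowered or "gembira" in lowered else "anger"


cases = [
    ("I am so happy today!", "happy"),
    ("Saya sangat gembira!", "happy"),
    ("This makes me really angry!", "anger"),
    ("I'm scared of spiders", "fear"),
]
score = accuracy(cases, fake_predict)
print(f"Accuracy: {score:.1%}")
```

Passing the predictor in as a callable keeps the scoring logic independent of how the model is loaded, so the same helper works with a stub or with the real classifier.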
### 2. 🔬 Comprehensive Test
**Purpose**: Thorough validation across all categories
**Duration**: ~2 minutes
**Coverage**: 24 test cases across multiple categories
```bash
python test_model.py --test-type comprehensive
```
**Test Categories**:
- **English Basic**: Core English emotion expressions
- **Malay Basic**: Core Malay emotion expressions
- **Malay Fixed Issues**: Previously problematic cases (now fixed)
- **Edge Cases**: Boundary and special cases
**Expected Results**: >85% overall accuracy
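Because the comprehensive test reports results by category, it can help to see how per-category and overall accuracy relate. The sketch below assumes results are available as `(category, expected, predicted)` tuples; that shape and the `accuracy_by_category` name are assumptions for illustration, and the sample outcomes are made up.

```python
from collections import defaultdict


def accuracy_by_category(results):
    """Group (category, expected, predicted) tuples and score each category."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, expected, predicted in results:
        totals[category] += 1
        correct[category] += int(expected == predicted)
    return {category: correct[category] / totals[category] for category in totals}


# Made-up outcomes for illustration; these are not real model results.
results = [
    ("English Basic", "happy", "happy"),
    ("English Basic", "anger", "anger"),
    ("Malay Basic", "fear", "fear"),
    ("Edge Cases", "surprise", "happy"),
]
per_category = accuracy_by_category(results)
overall = sum(expected == predicted for _, expected, predicted in results) / len(results)

for category, value in per_category.items():
    print(f"{category}: {value:.0%}")
print(f"Overall: {overall:.0%}")
```

Note that overall accuracy is weighted by the number of cases in each category, so a single weak category with few cases lowers the total only slightly.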
### 3. 🎮 Interactive Test
**Purpose**: Manual testing with custom inputs
**Duration**: User-controlled
**Coverage**: Unlimited custom test cases
```bash
python test_model.py --test-type interactive
```
**Features**:
- Real-time emotion classification
- Confidence scoring
- Emoji visualization
- Easy exit (type 'quit')
**Example Session**:
```
💬 Your text: I am so excited!
🎭 Result: 😊 happy
📊 Confidence: 99.8%
💪 High confidence!
💬 Your text: Saya gembira!
🎭 Result: 😊 happy
📊 Confidence: 99.9%
💪 High confidence!
```
### 4. ⚡ Benchmark Test
**Purpose**: Performance and speed evaluation
**Duration**: ~1 minute
**Coverage**: 100 predictions for timing analysis
```bash
python test_model.py --test-type benchmark
```
**Metrics Measured**:
- Total processing time
- Average time per prediction
- Predictions per second
- Performance classification
**Expected Results**: >5 predictions/second
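The listed metrics can all be derived from one timed loop. The following is a rough sketch, assuming predictions are made one at a time; the `benchmark` function here is a hypothetical helper, not the script's own implementation, and the lambda is a stand-in workload rather than the model.

```python
import time
from typing import Callable, Dict


def benchmark(predict: Callable[[str], object], text: str, runs: int = 100) -> Dict[str, float]:
    """Time `runs` sequential predictions and derive the throughput metrics."""
    start = time.perf_counter()
    for _ in range(runs):
        predict(text)
    total = time.perf_counter() - start
    return {
        "total_seconds": total,
        "seconds_per_prediction": total / runs,
        "predictions_per_second": runs / total if total > 0 else float("inf"),
    }


# Stand-in workload for illustration; swap in the real model call to measure it.
stats = benchmark(lambda text: text.upper(), "I am so happy today!", runs=100)
print(f"Total time: {stats['total_seconds']:.4f}s")
print(f"Throughput: {stats['predictions_per_second']:.1f} predictions/second")
```

When timing a real model, running a few warm-up predictions before the timed loop usually gives more stable numbers, since the first call often includes one-off setup costs.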
## 🎯 Supported Emotions
The model classifies text into 6 emotion categories:
| Emotion | Emoji | Description | Example (English) | Example (Malay) |
|---------|-------|-------------|-------------------|-----------------|
| **anger** | 😠 | Frustration, rage | "I'm so angry!" | "Marah betul!" |
| **fear** | 😨 | Anxiety, worry | "I'm scared!" | "Takut sangat!" |
| **happy** | 😊 | Joy, excitement | "I'm so happy!" | "Gembira sangat!" |
| **love** | ❤️ | Affection, care | "I love you!" | "Sayang kamu!" |
| **sadness** | 😢 | Sorrow, grief | "I'm so sad" | "Sedih betul" |
| **surprise** | 😲 | Amazement, shock | "What a surprise!" | "Terkejut betul!" |
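For custom scripts, the table above translates directly into a lookup. This is a small sketch; `EMOTION_EMOJI` and `format_result` are hypothetical names, and the output format only approximates what the interactive mode prints.

```python
# Emoji lookup copied from the table above.
EMOTION_EMOJI = {
    "anger": "😠",
    "fear": "😨",
    "happy": "😊",
    "love": "❤️",
    "sadness": "😢",
    "surprise": "😲",
}


def format_result(label: str, confidence: float) -> str:
    """Render a label and its confidence (0.0 to 1.0) as a single display line."""
    emoji = EMOTION_EMOJI.get(label, "❓")  # fallback for labels outside the table
    return f"{emoji} {label} ({confidence:.1%})"


print(format_result("happy", 0.998))
```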
## 🔧 Advanced Usage
### Custom Model Testing
```bash
# Test a different model
python test_model.py --model "your-model-name" --test-type quick
# Test local model
python test_model.py --model "./path/to/local/model" --test-type comprehensive
```
### Programmatic Usage
```python
from test_model import EmotionModelTester
# Initialize tester
tester = EmotionModelTester("rmtariq/multilingual-emotion-classifier")
# Run specific tests
quick_accuracy = tester.quick_test()
comprehensive_accuracy = tester.comprehensive_test()
speed = tester.benchmark_test()
print(f"Quick test accuracy: {quick_accuracy:.1%}")
print(f"Comprehensive accuracy: {comprehensive_accuracy:.1%}")
print(f"Speed: {speed:.1f} predictions/second")
```
## 📊 Expected Performance
### Accuracy Targets
- **Quick Test**: >90% accuracy
- **Comprehensive Test**: >85% accuracy
- **English Performance**: >95% accuracy
- **Malay Performance**: >85% accuracy
### Speed Targets
- **CPU Performance**: >5 predictions/second
- **GPU Performance**: >20 predictions/second
### Confidence Levels
- **High Confidence**: >90% (💪)
- **Good Confidence**: 70-90% (👍)
- **Low Confidence**: <70% (⚠️)
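These tiers amount to a simple threshold check. In the sketch below, confidence is assumed to be a probability between 0 and 1, and a value of exactly 90% is assigned to the "Good" tier; the guide does not say how that boundary is handled, so this is an assumption, as is the `confidence_level` name.

```python
def confidence_level(confidence: float) -> str:
    """Map a probability in [0, 1] to the confidence tiers listed above."""
    if confidence > 0.90:
        return "💪 High confidence"
    if confidence >= 0.70:
        # Assumption: the 70-90% range includes both endpoints.
        return "👍 Good confidence"
    return "⚠️ Low confidence"


for value in (0.998, 0.82, 0.41):
    print(f"{value:.1%}: {confidence_level(value)}")
```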
## 🐛 Troubleshooting
### Common Issues
#### 1. Model Loading Errors
```
❌ Error loading model: ...
```
**Solutions**:
- Check internet connection
- Verify model name spelling
- Try: `pip install --upgrade transformers`
#### 2. CUDA/GPU Issues
```
CUDA out of memory
```
**Solutions**:
- The test script automatically falls back to CPU
- Reduce batch size if using custom code
- Use `--device cpu` flag if available
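In custom code, the CPU fallback has to be handled explicitly. One way to do it is sketched below, assuming PyTorch is the backend; `pick_device` is a hypothetical helper, not something `test_model.py` is known to expose.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper still works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result can then be handed to whatever loads the model. With a `transformers` pipeline, for example, the `device` argument accepts `0` for the first GPU and `-1` for CPU.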
#### 3. Slow Performance
```
⚠️ SLOW. Consider optimization.
```
**Solutions**:
- Use GPU if available
- Close other applications
- Consider model quantization for production
### Getting Help
If you encounter issues:
1. **Check Requirements**: Ensure all dependencies are installed
2. **Update Libraries**: `pip install --upgrade transformers torch`
3. **Check Model Status**: Visit the [model page](https://huggingface.co/rmtariq/multilingual-emotion-classifier)
4. **Report Issues**: Create an issue on the repository
## 🎯 Test Case Examples
### English Test Cases
```python
# Basic emotions: (input text, expected label)
english_cases = [
    ("I am so happy today!", "happy"),
    ("This makes me really angry!", "anger"),
    ("I love you so much!", "love"),
    ("I'm scared of spiders", "fear"),
    ("This news makes me sad", "sadness"),
    ("What a surprise!", "surprise"),
]
```
### Malay Test Cases
```python
# Basic emotions: (input text, expected label)
malay_cases = [
    ("Saya sangat gembira!", "happy"),
    ("Aku marah dengan keadaan ini", "anger"),
    ("Aku sayang kamu", "love"),
    ("Saya takut dengan ini", "fear"),
    ("Sedih betul dengan berita", "sadness"),
    ("Terkejut dengan kejadian", "surprise"),
]
# Fixed issues (previously problematic)
malay_fixed_cases = [
    ("Ini adalah hari jadi terbaik", "happy"),  # was: anger
    ("Terbaik!", "happy"),                      # was: surprise
    ("Ini adalah hari yang baik", "happy"),     # was: anger
]
```
## 📈 Performance History
### Version 2.1 (Current)
- **Overall Accuracy**: 85.0%
- **English Performance**: 100%
- **Malay Performance**: 100% (fixed issues)
- **Speed**: 5-20 predictions/second
### Key Improvements
- 🔧 Fixed Malay birthday context classification
- 🔧 Fixed "baik/terbaik" positive expression recognition
- 🔧 Improved confidence scores
- 🔧 Enhanced robustness
## 🏆 Success Criteria
A successful test run should show:
- **Quick Test**: >90% accuracy
- **No Critical Failures**: All basic emotions working
- **Malay Fixes Verified**: Birthday/positive contexts → happy
- **Reasonable Speed**: >5 predictions/second
- **High Confidence**: Most predictions >90%
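The numeric criteria can be checked automatically after a run. The sketch below is one possible interpretation: "most predictions" is read as more than half, which is an assumption, and `check_success` is a hypothetical helper rather than part of the test script. The sample values passed in are made up for illustration.

```python
from typing import Dict, Sequence


def check_success(quick_accuracy: float, predictions_per_second: float,
                  confidences: Sequence[float]) -> Dict[str, bool]:
    """Evaluate the numeric success criteria; all inputs use 0.0 to 1.0 scales."""
    high = sum(1 for c in confidences if c > 0.90)
    return {
        "quick_test": quick_accuracy > 0.90,
        "reasonable_speed": predictions_per_second > 5,
        # Assumption: "most" means strictly more than half of the predictions.
        "high_confidence": bool(confidences) and high > len(confidences) / 2,
    }


# Made-up values for illustration; these are not measured results.
report = check_success(0.92, 12.5, [0.998, 0.999, 0.95, 0.64])
for name, passed in report.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
print("Overall:", "PASS" if all(report.values()) else "FAIL")
```

The two qualitative criteria (no critical failures, Malay fixes verified) still need to be confirmed from the individual test case results.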
---
**Model Repository**: https://huggingface.co/rmtariq/multilingual-emotion-classifier
**Author**: rmtariq
**Last Updated**: June 2024