	# 🧪 Testing Guide for Multilingual Emotion Classifier

	This guide explains how to test the `rmtariq/multilingual-emotion-classifier` model with the `test_model.py` script: which test types exist, what each one covers, and what results to expect.

	## 🚀 Quick Start

	### Installation
	```bash
	# Install requirements
	pip install -r requirements_testing.txt

	# Or install manually
	pip install torch transformers numpy pandas scikit-learn
	```
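
Before running any tests, it can help to confirm that the packages from the manual install command are importable. The snippet below is a minimal sketch, not part of `test_model.py`; the helper name `find_missing` is hypothetical. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names for the packages in the manual install command above.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["torch", "transformers", "numpy", "pandas", "sklearn"]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All testing dependencies are importable.")
```

An empty result means every dependency was found; anything listed still needs to be installed with `pip`.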

	### Basic Usage
	```bash
	# Quick test (recommended for first-time users)
	python test_model.py --test-type quick

	# Comprehensive test
	python test_model.py --test-type comprehensive

	# Interactive testing
	python test_model.py --test-type interactive

	# Performance benchmark
	python test_model.py --test-type benchmark

	# Run all tests
	python test_model.py --test-type all
	```

	## 📋 Test Types

	### 1. 🚀 Quick Test
	- Purpose: Fast validation of core functionality
	- Duration: ~30 seconds
	- Coverage: 13 essential test cases (English + Malay)

	```bash
	python test_model.py --test-type quick
	```

	What it tests:
	- ✅ Basic English emotions (6 cases)
	- ✅ Basic Malay emotions (4 cases)
	- ✅ Previously problematic cases (3 cases)

	Expected Results: >90% accuracy

	### 2. 🔬 Comprehensive Test
	- Purpose: Thorough validation across all categories
	- Duration: ~2 minutes
	- Coverage: 24 test cases across multiple categories

	```bash
	python test_model.py --test-type comprehensive
	```

	Test Categories:
	- English Basic: Core English emotion expressions
	- Malay Basic: Core Malay emotion expressions
	- Malay Fixed Issues: Previously problematic cases (now fixed)
	- Edge Cases: Boundary and special cases

	Expected Results: >85% overall accuracy
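
The category breakdown can be pictured as a per-category accuracy table plus one overall figure. The following is a sketch under stated assumptions: the four cases are a small hypothetical sample drawn from examples in this guide (not the script's actual 24 cases), and `stub_predict` stands in for the real model so the code runs offline.

```python
from collections import defaultdict

# Hypothetical labelled cases: (category, text, expected label).
CASES = [
    ("English Basic", "I am so happy today!", "happy"),
    ("English Basic", "This makes me really angry!", "anger"),
    ("Malay Basic", "Saya sangat gembira!", "happy"),
    ("Malay Fixed Issues", "Terbaik!", "happy"),
]


def accuracy_by_category(cases, predict):
    """Return ({category: accuracy}, overall accuracy) for a predict(text) callable."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, text, expected in cases:
        total[category] += 1
        if predict(text) == expected:
            correct[category] += 1
    per_category = {name: correct[name] / total[name] for name in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_category, overall


def stub_predict(text):
    """Stand-in for the real model so the sketch runs offline."""
    return "anger" if "angry" in text else "happy"


per_category, overall = accuracy_by_category(CASES, stub_predict)
for name, acc in per_category.items():
    print(f"{name}: {acc:.0%}")
print(f"Overall: {overall:.0%} (target: >85%)")
```

Replacing `stub_predict` with a function that calls the real classifier would give the actual per-category figures.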

	### 3. 🎮 Interactive Test
	- Purpose: Manual testing with custom inputs
	- Duration: User-controlled
	- Coverage: Unlimited custom test cases

	```bash
	python test_model.py --test-type interactive
	```

	Features:
	- Real-time emotion classification
	- Confidence scoring
	- Emoji visualization
	- Easy exit (type 'quit')

	Example Session:
	```
	💬 Your text: I am so excited!
	🎭 Result: 😊 happy
	📊 Confidence: 99.8%
	💪 High confidence!

	💬 Your text: Saya gembira!
	🎭 Result: 😊 happy
	📊 Confidence: 99.9%
	💪 High confidence!
	```
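
A loop like the one in the session above could be structured roughly as follows. This is an illustrative sketch, not the code in `test_model.py`: the function name `interactive_session` and its parameters are hypothetical, and `predict` is assumed to return a `(label, score)` pair. Input and output are injectable so the loop can be exercised without a terminal.

```python
def interactive_session(predict, read_line=input, write=print):
    """Classify lines until the user types 'quit'; return how many were classified."""
    count = 0
    while True:
        text = read_line("💬 Your text: ").strip()
        if text.lower() == "quit":
            break
        if not text:
            continue  # ignore empty input
        label, score = predict(text)
        write(f"🎭 Result: {label}")
        write(f"📊 Confidence: {score:.1%}")
        count += 1
    return count


# Scripted demo: a fixed predictor and canned input instead of a real model and keyboard.
scripted = iter(["I am so excited!", "quit"])
interactive_session(lambda text: ("happy", 0.998), read_line=lambda prompt: next(scripted))
```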

	### 4. ⚡ Benchmark Test
	- Purpose: Performance and speed evaluation
	- Duration: ~1 minute
	- Coverage: 100 predictions for timing analysis

	```bash
	python test_model.py --test-type benchmark
	```

	Metrics Measured:
	- Total processing time
	- Average time per prediction
	- Predictions per second
	- Performance classification

	Expected Results: >5 predictions/second on CPU (>20 predictions/second on GPU)
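
The three timing metrics follow directly from one measured total. The sketch below shows one way to compute them with `time.perf_counter`; it is an assumption about how such a benchmark might work, not the script's implementation, and `slow_stub` is a hypothetical stand-in that pretends each prediction takes about a millisecond.

```python
import time


def benchmark(predict, texts, runs=100):
    """Return (total seconds, seconds per prediction, predictions per second)."""
    start = time.perf_counter()
    for i in range(runs):
        predict(texts[i % len(texts)])  # cycle through the sample texts
    total = time.perf_counter() - start
    rate = runs / total if total > 0 else float("inf")
    return total, total / runs, rate


def slow_stub(text):
    """Stand-in for the model: pretends each prediction takes about 1 ms."""
    time.sleep(0.001)
    return "happy"


total, per_prediction, rate = benchmark(slow_stub, ["I am so happy today!", "Saya sangat gembira!"])
status = "OK" if rate > 5 else "SLOW"
print(f"{total:.2f}s total, {per_prediction * 1000:.1f} ms each, {rate:.0f}/s ({status})")
```

The `OK`/`SLOW` split uses the 5 predictions/second threshold from this section; the real script's performance classes may be finer-grained.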

	## 🎯 Supported Emotions

	The model classifies text into 6 emotion categories:

	| Emotion | Emoji | Description | Example (English) | Example (Malay) |
	|---------|-------|-------------|-------------------|-----------------|
	| anger | 😠 | Frustration, rage | "I'm so angry!" | "Marah betul!" |
	| fear | 😨 | Anxiety, worry | "I'm scared!" | "Takut sangat!" |
	| happy | 😊 | Joy, excitement | "I'm so happy!" | "Gembira sangat!" |
	| love | ❤️ | Affection, care | "I love you!" | "Sayang kamu!" |
	| sadness | 😢 | Sorrow, grief | "I'm so sad" | "Sedih betul" |
	| surprise | 😲 | Amazement, shock | "What a surprise!" | "Terkejut betul!" |
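
For custom reporting code, the label-to-emoji column of the table can be kept as a plain dictionary. This is a small sketch; the names `EMOTION_EMOJI` and `with_emoji` are hypothetical, and it assumes the model emits the six label strings exactly as they are spelled in the table.

```python
# Label-to-emoji mapping copied from the table above.
EMOTION_EMOJI = {
    "anger": "😠",
    "fear": "😨",
    "happy": "😊",
    "love": "❤️",
    "sadness": "😢",
    "surprise": "😲",
}


def with_emoji(label):
    """Prefix a label with its emoji; unknown labels get a question mark."""
    return f"{EMOTION_EMOJI.get(label, '❓')} {label}"


print(with_emoji("happy"))
```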

	## 🔧 Advanced Usage

	### Custom Model Testing
	```bash
	# Test a different model
	python test_model.py --model "your-model-name" --test-type quick

	# Test local model
	python test_model.py --model "./path/to/local/model" --test-type comprehensive
	```
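
The two flags used throughout this guide could be parsed along these lines. This is a guess at the shape of the CLI, not the actual parser in `test_model.py`: the defaults shown here (`quick` and the Hub model ID) are assumptions, and only the flag names and test-type values come from the commands in this guide.

```python
import argparse

# Test-type values taken from the commands in this guide.
TEST_TYPES = ["quick", "comprehensive", "interactive", "benchmark", "all"]


def build_parser():
    """Build a parser mirroring the flags used in this guide (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Test an emotion classification model")
    parser.add_argument("--model", default="rmtariq/multilingual-emotion-classifier",
                        help="Hub model ID or path to a local model directory")
    parser.add_argument("--test-type", choices=TEST_TYPES, default="quick",
                        help="Which test suite to run")
    return parser


args = build_parser().parse_args(["--model", "./path/to/local/model", "--test-type", "comprehensive"])
print(args.model, args.test_type)
```

Because `choices` is set, a misspelled test type is rejected with a usage message instead of silently running nothing.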

	### Programmatic Usage
	```python
	from test_model import EmotionModelTester

	# Initialize tester
	tester = EmotionModelTester("rmtariq/multilingual-emotion-classifier")

	# Run specific tests
	quick_accuracy = tester.quick_test()
	comprehensive_accuracy = tester.comprehensive_test()
	speed = tester.benchmark_test()

	print(f"Quick test accuracy: {quick_accuracy:.1%}")
	print(f"Comprehensive accuracy: {comprehensive_accuracy:.1%}")
	print(f"Speed: {speed:.1f} predictions/second")
	```
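
Without the tester class, the model can presumably also be called through the standard `transformers` text-classification pipeline. The sketch below assumes the model loads that way and that its labels are the emotion names listed in this guide; `load_classifier` and `top_prediction` are hypothetical helper names. The `transformers` import is deferred so that the result-handling helper can be tried offline.

```python
def load_classifier(model_name="rmtariq/multilingual-emotion-classifier"):
    """Create a text-classification pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily so the helper below works offline
    return pipeline("text-classification", model=model_name)


def top_prediction(outputs):
    """Return (label, score) for the highest-scoring entry in a pipeline result."""
    best = max(outputs, key=lambda item: item["score"])
    return best["label"], best["score"]


if __name__ == "__main__":
    # Sample in the shape the pipeline returns for one input: a list of label/score dicts.
    sample = [{"label": "happy", "score": 0.998}]
    print(top_prediction(sample))
    # With network access: top_prediction(load_classifier()("I am so excited!"))
```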

	## 📊 Expected Performance

	### Accuracy Targets
	- Quick Test: >90% accuracy
	- Comprehensive Test: >85% accuracy
	- English Performance: >95% accuracy
	- Malay Performance: >85% accuracy
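
With small test sets, these percentage targets translate into a concrete number of allowed misses. The sketch below works that out for the 13-case quick test and the 24-case comprehensive test; `min_correct` is a hypothetical helper and treats each target as a strict "greater than".

```python
def min_correct(total_cases, threshold):
    """Smallest number of correct answers whose accuracy strictly exceeds threshold."""
    for correct in range(total_cases + 1):
        if correct / total_cases > threshold:
            return correct
    return None  # a threshold of 1.0 or more can never be exceeded


print(min_correct(13, 0.90))  # quick test: 12 of 13 (92.3%)
print(min_correct(24, 0.85))  # comprehensive test: 21 of 24 (87.5%)
```

In other words, the quick test tolerates one wrong prediction and the comprehensive test tolerates three.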

	### Speed Targets
	- CPU Performance: >5 predictions/second
	- GPU Performance: >20 predictions/second

	### Confidence Levels
	- High Confidence: >90% (💪)
	- Good Confidence: 70-90% (👍)
	- Low Confidence: <70% (⚠️)
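
These bands map onto a simple threshold function. The sketch below is one reading of them, not the script's code: it assumes scores are probabilities in the range 0 to 1 and that a score of exactly 90% or exactly 70% belongs to the "good" band, which the list above does not specify.

```python
def confidence_label(score):
    """Map a probability in [0, 1] to the guide's confidence bands."""
    if score > 0.90:
        return "💪 High confidence"
    if score >= 0.70:  # boundary handling at 70% and 90% is an assumption
        return "👍 Good confidence"
    return "⚠️ Low confidence"


for score in (0.998, 0.85, 0.40):
    print(f"{score:.1%}: {confidence_label(score)}")
```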

	## 🐛 Troubleshooting

	### Common Issues

	#### 1. Model Loading Errors
	```
	❌ Error loading model: ...
	```
	Solutions:
	- Check internet connection
	- Verify model name spelling
	- Try: `pip install --upgrade transformers`

	#### 2. CUDA/GPU Issues
	```
	CUDA out of memory
	```
	Solutions:
	- The test script automatically falls back to CPU
	- Reduce batch size if using custom code
	- Use `--device cpu` flag if available
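
In custom code, a CPU fallback can be made explicit with `torch.cuda.is_available()`. The helper below is a minimal sketch with a hypothetical name, `pick_device`; it only chooses between a GPU and the CPU up front and does not recover from an out-of-memory error that occurs mid-run.

```python
def pick_device(prefer_gpu=True):
    """Return "cuda" when a GPU is usable, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        return "cpu"  # without PyTorch there is nothing to put on a GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
print(pick_device(prefer_gpu=False))
```

The returned string can then be used wherever the custom code places the model, for example `model.to(pick_device())` in plain PyTorch.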

	#### 3. Slow Performance
	```
	⚠️ SLOW. Consider optimization.
	```
	Solutions:
	- Use GPU if available
	- Close other applications
	- Consider model quantization for production

	### Getting Help

	If you encounter issues:

	1. Check Requirements: Ensure all dependencies are installed
	2. Update Libraries: `pip install --upgrade transformers torch`
	3. Check Model Status: Visit the [model page](https://huggingface.co/rmtariq/multilingual-emotion-classifier)
	4. Report Issues: Create an issue on the repository

	## 🎯 Test Case Examples

	### English Test Cases
	```python
	# Basic emotions
	"I am so happy today!" # → happy
	"This makes me really angry!" # → anger
	"I love you so much!" # → love
	"I'm scared of spiders" # → fear
	"This news makes me sad" # → sadness
	"What a surprise!" # → surprise
	```

	### Malay Test Cases
	```python
	# Basic emotions
	"Saya sangat gembira!" # → happy
	"Aku marah dengan keadaan ini" # → anger
	"Aku sayang kamu" # → love
	"Saya takut dengan ini" # → fear
	"Sedih betul dengan berita" # → sadness
	"Terkejut dengan kejadian" # → surprise

	# Fixed issues (previously problematic)
	"Ini adalah hari jadi terbaik" # → happy (was: anger)
	"Terbaik!" # → happy (was: surprise)
	"Ini adalah hari yang baik" # → happy (was: anger)
	```
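
The fixed cases lend themselves to a small regression check that also flags when a prediction has slipped back to its old wrong label. This is a sketch under stated assumptions: `find_regressions` is a hypothetical helper, and the lambda is a stub predictor standing in for the real model.

```python
# Previously misclassified Malay inputs from this guide: (text, expected, old wrong label).
FIXED_CASES = [
    ("Ini adalah hari jadi terbaik", "happy", "anger"),
    ("Terbaik!", "happy", "surprise"),
    ("Ini adalah hari yang baik", "happy", "anger"),
]


def find_regressions(cases, predict):
    """Return (text, predicted, relapsed) for every case that is wrong again."""
    failures = []
    for text, expected, old_label in cases:
        predicted = predict(text)
        if predicted != expected:
            # relapsed is True when the model returned the same wrong label as before
            failures.append((text, predicted, predicted == old_label))
    return failures


# Stub predictor standing in for the real model; it always answers "happy".
print(find_regressions(FIXED_CASES, lambda text: "happy"))
```

An empty list means all three fixes still hold for the predictor under test.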

	## 📈 Performance History

	### Version 2.1 (Current)
	- ✅ Overall Accuracy: 85.0%
	- ✅ English Performance: 100%
	- ✅ Malay Performance: 100% (on the fixed-issue cases)
	- ✅ Speed: 5-20 predictions/second

	### Key Improvements
	- 🔧 Fixed Malay birthday context classification
	- 🔧 Fixed "baik/terbaik" positive expression recognition
	- 🔧 Improved confidence scores
	- 🔧 Enhanced robustness

	## 🏆 Success Criteria

	A successful test run should show:

	- ✅ Quick Test: >90% accuracy
	- ✅ No Critical Failures: All basic emotions working
	- ✅ Malay Fixes Verified: Birthday/positive contexts → happy
	- ✅ Reasonable Speed: >5 predictions/second
	- ✅ High Confidence: Most predictions >90%
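
The numeric criteria in this list can be combined into a single pass/fail check. The sketch below is illustrative only: `meets_success_criteria` is a hypothetical function, "most predictions" is read as "more than half", and the two non-numeric criteria (no critical failures, Malay fixes verified) are left out.

```python
def meets_success_criteria(quick_accuracy, predictions_per_second, confidences):
    """Check the numeric criteria; return (passed, list of failed checks)."""
    failed = []
    if not quick_accuracy > 0.90:
        failed.append("quick test accuracy is not above 90%")
    if not predictions_per_second > 5:
        failed.append("speed is not above 5 predictions/second")
    high = sum(1 for score in confidences if score > 0.90)
    if not high > len(confidences) / 2:  # "most" read as more than half (assumption)
        failed.append("not more than half of predictions are above 90% confidence")
    return (not failed), failed


passed, failed = meets_success_criteria(12 / 13, 8.0, [0.99, 0.95, 0.60])
print("PASS" if passed else "FAIL", failed)
```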

	---

	- Model Repository: https://huggingface.co/rmtariq/multilingual-emotion-classifier
	- Author: rmtariq
	- Last Updated: June 2024