	# Testing Guide - OPTION A Ensemble

	This guide shows how to test the OPTION A system in different environments.

	## 🧪 Test Options

	### 1. Local Test (Fastest)

	Checks the project structure without loading the heavy models:

	```bash
	python test_local.py
	```

	Advantages:
	- ✅ Fast (~10 seconds)
	- ✅ No GPU needed
	- ✅ Tests imports and structure

	Limitations:
	- ❌ Does not load the real models
	- ❌ Does not test inference
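
A structure-only check of this kind can be sketched as follows. This is not the actual content of `test_local.py`: the function name `check_imports` and the module names passed to it are assumptions made for illustration. The idea is to confirm that required packages can be located without importing them, so no model weights are loaded.

```python
import importlib.util


def check_imports(module_names):
    """Report which top-level modules can be located, without importing them.

    For a top-level name, find_spec only searches the import path, so the
    module's own code does not run and no model weights are loaded.
    """
    results = {}
    for name in module_names:
        try:
            results[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for some broken or half-initialised modules.
            results[name] = False
    return results


if __name__ == "__main__":
    # Hypothetical list: replace with the packages the project really needs.
    for name, found in check_imports(["json", "torch"]).items():
        print(f"{name}: {'PASS' if found else 'FAIL'}")
```

A check like this stays fast because it never executes the heavy packages, which is also why it cannot say anything about inference.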

	---

	### 2. Full Local Test

	Runs the tests with real model loading (requires a ~3GB download):

	```bash
	python scripts/test/test_quick.py
	```

	Tests included:
	1. Model loading
	2. Single audio annotation
	3. Batch processing (5 samples)
	4. Balanced mode (OPTION A - 3 models)
	5. Performance benchmark

	Estimated time:
	- First run: ~10-15 min (model download)
	- Subsequent runs: ~2-5 min

	Requirements:
	- RAM: 8GB minimum (4GB may work in quick mode)
	- Disk: ~5GB for models
	- CPU: any (a GPU is not required for the test)
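
The RAM figures above can be turned into a small pre-flight check. The sketch below is an illustration, not part of the repository: `choose_mode` and `total_ram_gb` are hypothetical helpers, and the label `structure-only` simply stands for running `test_local.py`.

```python
import os


def choose_mode(ram_gb):
    """Map RAM (in GB) to a test tier, using the requirements listed above."""
    if ram_gb >= 8:
        return "balanced"        # full test, 3 models
    if ram_gb >= 4:
        return "quick"           # 2 models
    return "structure-only"      # run test_local.py instead


def total_ram_gb():
    """Total physical RAM in GB via POSIX sysconf (not available on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


if __name__ == "__main__":
    print(choose_mode(total_ram_gb()))
```

The total reported by the operating system is usually a little below the nominal size of the machine, so the thresholds may need some slack in practice.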

	---

	### 3. Docker (Isolated)

	Runs the tests in an isolated Docker container:

	```bash
	# Build
	docker build -f Dockerfile.test -t ensemble-test .

	# Run
	docker run ensemble-test
	```

	Advantages:
	- ✅ Clean, reproducible environment
	- ✅ Does not affect the local system
	- ✅ Easy to share

	---

	### 4. Google Cloud Spot Instance (Cheapest)

	Runs the tests on a low-cost cloud machine (~$0.01/hour):

	```bash
	bash scripts/test/launch_gcp_spot.sh
	```

	Estimated cost:
	- `e2-micro`: $0.0025/hr (1GB RAM) - Structure test only
	- `e2-medium`: $0.01/hr (4GB RAM) - ⭐ Recommended for the full test
	- `e2-standard-2`: $0.02/hr (8GB RAM) - For testing balanced mode

	What the script does:
	1. Finds the cheapest zone
	2. Launches a spot/preemptible instance
	3. Installs dependencies automatically
	4. Runs the tests
	5. Prints the commands for SSH and cleanup
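
Step 1 amounts to picking the minimum from a table of hourly prices. The sketch below shows only that selection. How `launch_gcp_spot.sh` actually obtains prices is not described here, and the function name and price values are placeholders invented for illustration.

```python
def cheapest_zone(prices_per_hour):
    """Return the (zone, price) pair with the lowest hourly price."""
    if not prices_per_hour:
        raise ValueError("no zones to choose from")
    zone = min(prices_per_hour, key=prices_per_hour.get)
    return zone, prices_per_hour[zone]


# Placeholder numbers for illustration only, not real quotes.
example_prices = {
    "us-central1-a": 0.010,
    "us-east1-b": 0.012,
    "us-west1-a": 0.011,
}
print(cheapest_zone(example_prices))
```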

	Useful commands:
	```bash
	# List instances
	gcloud compute instances list

	# SSH into the instance
	gcloud compute ssh ensemble-test-XXX --zone=us-central1-a

	# Delete the instance
	gcloud compute instances delete ensemble-test-XXX --zone=us-central1-a --quiet
	```

	---

	### 5. AWS Spot Instance

	An alternative using AWS (usually more expensive than GCP):

	```bash
	bash scripts/test/launch_spot_test.sh
	```

	Estimated cost:
	- `t3a.medium`: ~$0.009/hr (4GB RAM) ⭐ Cheapest
	- `t3.medium`: ~$0.01/hr (4GB RAM)
	- `t3a.large`: ~$0.018/hr (8GB RAM)

	---

	## 📊 Test Levels

	### Level 1: Structure (test_local.py)
	```
	✓ Imports
	✓ Annotator creation
	✓ Model structure
	```
	Time: ~10s | RAM: <1GB

	### Level 2: Quick Mode (test_quick.py --mode quick)
	```
	✓ Load 2 models (emotion2vec + SenseVoice)
	✓ Single annotation
	✓ Batch processing
	```
	Time: ~2-5min | RAM: ~4GB

	### Level 3: Balanced Mode (test_quick.py --mode balanced)
	```
	✓ Load 3 models (OPTION A)
	✓ Full annotation pipeline
	✓ Performance benchmark
	```
	Time: ~5-10min | RAM: ~6GB

	### Level 4: Production Test
	```
	✓ Annotate real dataset (100 samples)
	✓ Evaluation with ground truth
	✓ Performance metrics
	```
	Time: ~15-30min | RAM: ~8GB

	---

	## 🎯 Recommendations by Use Case

	### For Development
	```bash
	python test_local.py  # Validate changes quickly
	```

	### For CI/CD
	```bash
	docker run ensemble-test  # Automated tests
	```

	### For Pre-Production Validation
	```bash
	python scripts/test/test_quick.py --mode balanced
	```

	### For Performance Benchmarking
	```bash
	bash scripts/test/launch_gcp_spot.sh  # Clean, controlled environment
	```

	---

	## ❌ Troubleshooting

	### Out of Memory

	```bash
	# Use quick mode (2 models)
	python scripts/test/test_quick.py --mode quick

	# Or test without loading models
	python test_local.py
	```

	### Models Fail to Download

	```bash
	# Clear the cache (this removes every cached Hugging Face download,
	# not only the models used by this project)
	rm -rf ~/.cache/huggingface/

	# Try again
	python scripts/test/test_quick.py
	```

	### Timeout in the Cloud

	```bash
	# Use a larger machine type
	# On GCP: e2-medium → e2-standard-2
	# On AWS: t3.medium → t3.large
	```

	---

	## 📈 Expected Results

	### test_local.py
	```
	✅ ALL LOCAL TESTS PASSED!
	imports: ✅ PASS
	create_annotator: ✅ PASS
	model_structure: ✅ PASS
	```

	### test_quick.py (Quick Mode)
	```
	✅ ALL TESTS PASSED!
	model_loading: ✅ PASS (2 models)
	single_annotation: ✅ PASS (~2s)
	batch_processing: ✅ PASS (~8s for 5 samples)
	```

	### test_quick.py (Balanced Mode - OPTION A)
	```
	✅ ALL TESTS PASSED!
	model_loading: ✅ PASS (3 models)
	single_annotation: ✅ PASS (~3s)
	balanced_mode: ✅ PASS (3 predictions)
	benchmark: ✅ PASS
	```
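
For automated runs, summaries in the format shown above can be parsed into a dictionary. This is a minimal sketch that assumes each result appears on its own line as `name: <mark> PASS` or `name: <mark> FAIL`. The `FAIL` form is an assumption, since only passing output is shown here, and `parse_results` is a hypothetical helper.

```python
import re

# One result per line, e.g. "model_loading: ✅ PASS (2 models)".
# The mark before PASS/FAIL is optional, so plain "name: PASS" also matches.
RESULT_LINE = re.compile(r"^\s*(\w+):\s*\S*\s*(PASS|FAIL)\b")


def parse_results(output):
    """Return {test_name: passed} for every result line found in the output."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(1)] = match.group(2) == "PASS"
    return results


def all_passed(output):
    """True only if at least one result was found and none of them failed."""
    results = parse_results(output)
    return bool(results) and all(results.values())
```

Requiring at least one parsed result avoids reporting success when a script crashes before printing its summary.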

	---

	## 💰 Estimated Cost per Test

	| Environment | Cost/Test | Time | Notes |
	|-------------|-----------|------|-------|
	| Local | $0 | 2-10min | Best for development |
	| Docker | $0 | 5-15min | Initial build is slow |
	| GCP Spot (e2-medium) | $0.001-0.003 | 10-20min | ⭐ Best value |
	| AWS Spot (t3a.medium) | $0.003-0.005 | 10-20min | Alternative |

	Real cost example:
	- Full test on GCP e2-medium: $0.01/hr × 0.3hr = $0.003 (less than one cent!)
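
The arithmetic behind these estimates is just the hourly rate multiplied by the run time in hours. A minimal sketch, using a hypothetical helper name and the e2-medium rate quoted above:

```python
def run_cost_usd(hourly_rate_usd, minutes):
    """Cost of one test run: hourly rate times the duration in hours."""
    return hourly_rate_usd * minutes / 60


# 0.3 hr is 18 minutes, at the e2-medium rate of $0.01/hr.
print(f"${run_cost_usd(0.01, 18):.4f}")  # prints $0.0030
```

The same function can be used to compare machine types before launching an instance.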

	---

	## 🚀 Quick Start

	To get started quickly:

	```bash
	# 1. Local test (validate the structure)
	python test_local.py

	# 2. If it passed, run the full test
	python scripts/test/test_quick.py

	# 3. If you want to test on a cheap cloud machine
	bash scripts/test/launch_gcp_spot.sh
	```
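
The "run the next step only if the previous one passed" flow can also be scripted. The sketch below assumes the test scripts exit with a non-zero status on failure, which is not confirmed here, and `run_until_failure` is a hypothetical helper.

```python
import subprocess
import sys


def run_until_failure(commands):
    """Run each command in order and stop at the first non-zero exit code.

    Returns the index of the failing command, or None if all succeeded.
    """
    for index, command in enumerate(commands):
        print("$", " ".join(command))
        if subprocess.run(command).returncode != 0:
            return index
    return None


# Usage, from the repository root:
# failed = run_until_failure([
#     [sys.executable, "test_local.py"],
#     [sys.executable, "scripts/test/test_quick.py"],
# ])
# sys.exit(0 if failed is None else 1)
```

Stopping early avoids the multi-gigabyte model download when the cheap structure check already fails.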

	---

	## 📝 Logs and Debugging

	### View detailed logs
	```bash
	python test_local.py 2>&1 | tee test.log
	```

	### Errors only
	```bash
	python scripts/test/test_quick.py 2>&1 | grep -E "(ERROR|FAIL|❌)"
	```

	### With timestamps
	```bash
	# ts is part of the moreutils package
	python scripts/test/test_quick.py 2>&1 | ts
	```

	---

	## ✅ Pre-Production Checklist

	Before using the system in production, run:

	- [ ] `python test_local.py` → Passes
	- [ ] `python scripts/test/test_quick.py --mode quick` → Passes
	- [ ] `python scripts/test/test_quick.py --mode balanced` → Passes
	- [ ] Test with real (not synthetic) audio
	- [ ] Evaluation against ground truth
	- [ ] Performance benchmark

	---

	Developed for OPTION A - an optimized ensemble of 3 models 🎯