# Testing Guide - OPTION A Ensemble

This guide shows how to test the OPTION A system in different environments.

## 🧪 Testing Options

### 1. Local Test (Fastest)

Tests the project structure without loading the heavy models:

```bash
python test_local.py
```

**Advantages**:
- ✅ Fast (~10 seconds)
- ✅ No GPU required
- ✅ Tests imports and structure

**Limitations**:
- ❌ Does not load the real models
- ❌ Does not test inference

---

### 2. Full Local Test

Tests with model loading (requires a ~3GB download):

```bash
python scripts/test/test_quick.py
```

**Included tests**:
1. Model loading
2. Single audio annotation
3. Batch processing (5 samples)
4. Balanced mode (OPTION A - 3 models)
5. Performance benchmark

**Estimated time**:
- First run: ~10-15 min (model download)
- Subsequent runs: ~2-5 min

**Requirements**:
- RAM: 8GB minimum (4GB may work in quick mode)
- Disk: ~5GB for models
- CPU: any (no GPU needed for testing)
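The RAM figures can be turned into a quick pre-flight check that suggests which test level to run. A minimal sketch, assuming a POSIX system (Linux/macOS) where `os.sysconf` exposes physical memory; `total_ram_gb` and `suggest_command` are hypothetical helpers, not part of the project's scripts, and the thresholds are the rough per-level figures given later in this guide (~4GB quick, ~6GB balanced). The OS usually reports slightly less than the nominal RAM, so you may want to lower them a little:

```python
import os

# Rough peak RAM per test level, taken from the test levels in this guide (GB).
BALANCED_MIN_GB = 6.0  # Level 3: balanced mode, ~6GB
QUICK_MIN_GB = 4.0     # Level 2: quick mode, ~4GB


def total_ram_gb() -> float:
    """Total physical RAM in GB. POSIX only: os.sysconf does not exist on Windows."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def suggest_command(ram_gb: float) -> str:
    """Return the most complete test command that should fit in ram_gb."""
    if ram_gb >= BALANCED_MIN_GB:
        return "python scripts/test/test_quick.py --mode balanced"
    if ram_gb >= QUICK_MIN_GB:
        return "python scripts/test/test_quick.py --mode quick"
    return "python test_local.py"


if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"{ram:.1f} GB RAM -> {suggest_command(ram)}")
```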

---

### 3. Docker (Isolated)

Runs the tests in an isolated Docker container:

```bash
# Build
docker build -f Dockerfile.test -t ensemble-test .

# Run
docker run ensemble-test
```

**Advantages**:
- ✅ Clean, reproducible environment
- ✅ Does not affect the local system
- ✅ Easy to share

---

### 4. Google Cloud Spot Instance (Cheapest)

Runs the tests on a cheap cloud machine (~$0.01/hour):

```bash
bash scripts/test/launch_gcp_spot.sh
```

**Estimated cost**:
- `e2-micro`: $0.0025/hr (1GB RAM) - structure test only
- `e2-medium`: $0.01/hr (4GB RAM) - ⭐ Recommended for the full test
- `e2-standard-2`: $0.02/hr (8GB RAM) - For the balanced-mode test
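Choosing among these machine types comes down to "the cheapest one with enough RAM". A minimal sketch using the approximate spot prices listed here (they vary by region and over time, so treat them as placeholders); `pick_machine` is a hypothetical helper, not an option of `launch_gcp_spot.sh`:

```python
# Approximate spot price (USD/hour) and RAM (GB) from this guide; refresh before relying on them.
GCP_MACHINES = {
    "e2-micro": (0.0025, 1),
    "e2-medium": (0.01, 4),
    "e2-standard-2": (0.02, 8),
}


def pick_machine(min_ram_gb: float, machines: dict[str, tuple[float, int]] = GCP_MACHINES) -> str:
    """Return the cheapest machine type with at least min_ram_gb of RAM."""
    candidates = [(price, name) for name, (price, ram) in machines.items() if ram >= min_ram_gb]
    if not candidates:
        raise ValueError(f"no machine type has {min_ram_gb} GB of RAM")
    return min(candidates)[1]  # tuples compare by price first


print(pick_machine(1))  # structure test -> e2-micro
print(pick_machine(4))  # quick mode     -> e2-medium
print(pick_machine(6))  # balanced mode  -> e2-standard-2
```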

**What it does**:
1. Finds the cheapest zone
2. Launches a spot/preemptible instance
3. Installs dependencies automatically
4. Runs the tests
5. Prints the SSH and cleanup commands

**Useful commands**:
```bash
# List instances
gcloud compute instances list

# SSH into the instance
gcloud compute ssh ensemble-test-XXX --zone=us-central1-a

# Delete the instance
gcloud compute instances delete ensemble-test-XXX --zone=us-central1-a --quiet
```
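Forgotten spot instances keep billing, so it helps to find every test instance before cleaning up. A minimal sketch, assuming `gcloud` is installed and authenticated; `list_instances_json`, `parse_instances` and `delete_commands` are hypothetical helper names. It relies on `gcloud compute instances list --format=json`, whose `zone` field is a full resource URL, so only the last path segment is kept. The commands are only built, never executed:

```python
import json
import shlex
import subprocess


def list_instances_json() -> str:
    """Raw JSON from `gcloud compute instances list` (requires an authenticated gcloud)."""
    return subprocess.run(
        ["gcloud", "compute", "instances", "list", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_instances(json_text: str, prefix: str = "ensemble-test-") -> list[tuple[str, str]]:
    """Return (name, zone) pairs for instances whose name starts with prefix."""
    pairs = []
    for inst in json.loads(json_text):
        if inst["name"].startswith(prefix):
            # zone is a URL ending in ".../zones/us-central1-a"; keep only the last segment
            pairs.append((inst["name"], inst["zone"].rsplit("/", 1)[-1]))
    return pairs


def delete_commands(pairs: list[tuple[str, str]]) -> list[str]:
    """One shell-quoted gcloud delete command per instance (returned, not run)."""
    return [
        shlex.join(["gcloud", "compute", "instances", "delete", name, f"--zone={zone}", "--quiet"])
        for name, zone in pairs
    ]
```

For example, `for cmd in delete_commands(parse_instances(list_instances_json())): print(cmd)` prints the cleanup commands; review the list before pasting it into a shell.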

---

### 5. AWS Spot Instance

An alternative using AWS (usually more expensive than GCP):

```bash
bash scripts/test/launch_spot_test.sh
```

**Estimated cost**:
- `t3a.medium`: ~$0.009/hr (4GB RAM) ⭐ Cheapest
- `t3.medium`: ~$0.01/hr (4GB RAM)
- `t3a.large`: ~$0.018/hr (8GB RAM)

---

## 📊 Test Levels

### Level 1: Structure (test_local.py)
```
✓ Imports
✓ Annotator creation
✓ Model structure
```
**Time**: ~10s | **RAM**: <1GB

### Level 2: Quick Mode (test_quick.py --mode quick)
```
✓ Load 2 models (emotion2vec + SenseVoice)
✓ Single annotation
✓ Batch processing
```
**Time**: ~2-5min | **RAM**: ~4GB

### Level 3: Balanced Mode (test_quick.py --mode balanced)
```
✓ Load 3 models (OPTION A)
✓ Full annotation pipeline
✓ Performance benchmark
```
**Time**: ~5-10min | **RAM**: ~6GB

### Level 4: Production Test
```
✓ Annotate real dataset (100 samples)
✓ Evaluation with ground truth
✓ Performance metrics
```
**Time**: ~15-30min | **RAM**: ~8GB

---

## 🎯 Recommendations by Use Case

### For Development
```bash
python test_local.py  # Quickly validate changes
```

### For CI/CD
```bash
docker run ensemble-test  # Automated tests (build the image first)
```

### For Pre-Production Validation
```bash
python scripts/test/test_quick.py --mode balanced
```

### For Performance Benchmarks
```bash
bash scripts/test/launch_gcp_spot.sh  # Clean, controlled environment
```

---

## ❌ Troubleshooting

### Out of Memory

```bash
# Use quick mode (2 models)
python scripts/test/test_quick.py --mode quick

# Or test without loading models
python test_local.py
```

### Models fail to download

```bash
# Clear the Hugging Face cache (deletes every cached model)
rm -rf ~/.cache/huggingface/

# Try again
python scripts/test/test_quick.py
```
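`rm -rf ~/.cache/huggingface/` removes more than models: if you logged in with `huggingface-cli login`, the stored token lives in the same directory by default. A gentler option is to clear only the hub cache. A minimal sketch for locating it and reporting its size, assuming the models come from the Hugging Face Hub and following `huggingface_hub`'s documented defaults (`HF_HOME` defaults to `$XDG_CACHE_HOME/huggingface`, i.e. `~/.cache/huggingface`, and `HF_HUB_CACHE` to `$HF_HOME/hub`; legacy variables are ignored). `hf_hub_cache` and `dir_size_gb` are hypothetical helpers, and deleting is left to you:

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def hf_hub_cache(env: Mapping[str, str] | None = None) -> Path:
    """Resolve the Hugging Face Hub cache directory from environment variables."""
    env = os.environ if env is None else env
    cache_root = env.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    hf_home = env.get("HF_HOME", str(Path(cache_root) / "huggingface"))
    return Path(env.get("HF_HUB_CACHE", str(Path(hf_home) / "hub"))).expanduser()


def dir_size_gb(path: Path) -> float:
    """Size of regular files under path in GB, skipping symlinks (0.0 if path is missing)."""
    if not path.is_dir():
        return 0.0
    total = sum(p.stat().st_size for p in path.rglob("*") if p.is_file() and not p.is_symlink())
    return total / 1024**3


if __name__ == "__main__":
    cache = hf_hub_cache()
    print(f"{cache}: {dir_size_gb(cache):.2f} GB")
```

Symlinks are skipped because the hub cache stores files once under `blobs/` and links to them from `snapshots/`, so counting both would double the size.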

### Timeout in the cloud

```bash
# Use a larger machine type
# On GCP: e2-medium → e2-standard-2
# On AWS: t3.medium → t3.large
```

---

## 📈 Expected Results

### test_local.py
```
✅ ALL LOCAL TESTS PASSED!
  imports: ✅ PASS
  create_annotator: ✅ PASS
  model_structure: ✅ PASS
```

### test_quick.py (Quick Mode)
```
✅ ALL TESTS PASSED!
  model_loading: ✅ PASS (2 models)
  single_annotation: ✅ PASS (~2s)
  batch_processing: ✅ PASS (~8s for 5 samples)
```

### test_quick.py (Balanced Mode - OPTION A)
```
✅ ALL TESTS PASSED!
  model_loading: ✅ PASS (3 models)
  single_annotation: ✅ PASS (~3s)
  balanced_mode: ✅ PASS (3 predictions)
  benchmark: ✅ PASS
```
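When these runs are automated (for example in CI), the summary can be checked mechanically instead of by eye. A minimal sketch, assuming the summary lines keep the `name: ✅ PASS ...` shape shown above; the failing form `❌ FAIL` is an assumption, so check what the scripts actually print. `parse_summary` is a hypothetical helper:

```python
import re

# Matches lines like "  model_loading: ✅ PASS (2 models)" or "  imports: ❌ FAIL"
RESULT_LINE = re.compile(r"^\s*(\w+):\s*(✅ PASS|❌ FAIL)")


def parse_summary(output: str) -> dict[str, bool]:
    """Map each test name to True (PASS) or False (FAIL)."""
    results = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(1)] = match.group(2).endswith("PASS")
    return results


sample = """✅ ALL LOCAL TESTS PASSED!
  imports: ✅ PASS
  create_annotator: ✅ PASS
  model_structure: ✅ PASS
"""
results = parse_summary(sample)
print(results)               # → {'imports': True, 'create_annotator': True, 'model_structure': True}
print(all(results.values()))  # → True
```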

---

## 💰 Estimated Test Cost

| Environment | Cost/Test | Time | Notes |
|----------|-------------|-------|-------|
| **Local** | $0 | 2-10min | Best for dev |
| **Docker** | $0 | 5-15min | Initial build is slow |
| **GCP Spot (e2-medium)** | $0.001-0.003 | 10-20min | ⭐ Best value |
| **AWS Spot (t3a.medium)** | $0.003-0.005 | 10-20min | Alternative |

**Worked cost example**:
- Full test on GCP e2-medium: $0.01/hr × 0.3hr = **$0.003** (less than 1 cent!)
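The same arithmetic as a tiny helper, handy for plugging in other machine types or durations. The prices are the approximate spot rates quoted in this guide, not live quotes, and `run_cost` is a hypothetical name:

```python
def run_cost(price_per_hour_usd: float, minutes: float) -> float:
    """Cost in USD of running a machine at price_per_hour_usd for the given minutes."""
    return price_per_hour_usd * minutes / 60


# GCP e2-medium (~$0.01/hr) for 18 minutes (0.3 hr)
print(f"${run_cost(0.01, 18):.4f}")  # → $0.0030
# A 10-20 min run on the same machine
print(f"${run_cost(0.01, 10):.4f} - ${run_cost(0.01, 20):.4f}")  # → $0.0017 - $0.0033
```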

---

## 🚀 Quick Start

**To get started quickly**:

```bash
# 1. Local test (validate structure)
python test_local.py

# 2. If it passed, run the full test
python scripts/test/test_quick.py

# 3. To test on a cheap cloud machine
bash scripts/test/launch_gcp_spot.sh
```
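The "continue only if the previous step passed" flow can be scripted. A minimal sketch, assuming it runs from the repository root and that each script signals failure with a non-zero exit code (the guide doesn't state this, so verify it); `run_steps` and `QUICK_START` are hypothetical names. The cloud step is left out because it launches billable resources:

```python
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> bool:
    """Run each command in order and stop at the first one that exits non-zero."""
    for cmd in steps:
        print("$", " ".join(cmd), flush=True)
        if subprocess.run(cmd).returncode != 0:
            print("FAILED:", " ".join(cmd), file=sys.stderr)
            return False
    return True


# Steps 1 and 2 of the Quick Start, run with the current interpreter.
QUICK_START = [
    [sys.executable, "test_local.py"],
    [sys.executable, "scripts/test/test_quick.py"],
]
```

Calling `run_steps(QUICK_START)` returns `True` only if both scripts exit with status 0; the full test is never started when the local test fails.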

---

## 📝 Logs and Debugging

### View detailed logs
```bash
python test_local.py 2>&1 | tee test.log
```

### Errors only
```bash
python scripts/test/test_quick.py 2>&1 | grep -E "(ERROR|FAIL|❌)"
```

### With timestamps
```bash
# `ts` is provided by the moreutils package
python scripts/test/test_quick.py 2>&1 | ts
```

---

## ✅ Pre-Production Checklist

Before using the system in production, run:

- [ ] `python test_local.py` → passes
- [ ] `python scripts/test/test_quick.py --mode quick` → passes
- [ ] `python scripts/test/test_quick.py --mode balanced` → passes
- [ ] Test with real (non-synthetic) audio
- [ ] Evaluation against ground truth
- [ ] Performance benchmark

---

**Built for OPTION A - an optimized 3-model ensemble** 🎯