marcosremar
Fix emotion2vec loading - use wav2vec2 compatible model
d669352

🚀 Real-Time Progress - SkyPilot Fine-tuning

Status: ⏳ IN PROGRESS
Started: 2025-12-02 13:00 UTC
Cluster: sky-33ba-marcos


📊 Current Job: Fine-tuning

Machine Provisioned:

Provider: Vast.ai (Czechia, CZ, EU)
Instance: A100 SXM4
vCPUs: 32 cores
RAM: 64GB
GPU: A100 (1x)
Cost: $0.00/hr ✨ FREE!

What's Running:

  1. ✅ Machine provisioned
  2. ⏳ Installing dependencies (torch, transformers, librosa)
  3. ⏳ Cloning repository
  4. ⏳ Creating synthetic data (50 samples/emotion)
  5. ⏳ Preparing dataset
  6. ⏳ Fine-tuning emotion2vec (10 epochs)
  7. ⏳ Testing model

Estimated Time:

  • Setup: ~5min
  • Data generation: ~1min
  • Fine-tuning: ~20-30min
  • Testing: ~2min
  • Total: ~30-40min
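
As a sanity check on the total, the per-stage estimates above can be summed as (low, high) ranges in minutes. The sums come to 28 to 38 minutes, which the list rounds to ~30-40 min. This is a small illustrative sketch: the `ESTIMATES` dict and the `total_range` helper are hypothetical names and are not part of the repository.

```python
# Per-stage estimates in minutes, as (low, high) ranges copied from the list above.
# ESTIMATES and total_range are illustrative names, not part of the repository.
ESTIMATES = {
    "setup": (5, 5),
    "data_generation": (1, 1),
    "fine_tuning": (20, 30),
    "testing": (2, 2),
}


def total_range(estimates: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high bounds of every stage separately."""
    low = sum(lo for lo, _ in estimates.values())
    high = sum(hi for _, hi in estimates.values())
    return low, high


print(total_range(ESTIMATES))  # (28, 38)
```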

Expected Output:

✅ Fine-tuning complete!
Model saved to: models/emotion/emotion2vec_finetuned_synthetic/

📝 How to Monitor

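Stream the job logs in real time. By default, `sky logs` follows the latest job's output; pass `--no-follow` to print the existing output and exit:

```bash
sky logs sky-33ba-marcos
```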

Check cluster status:

```bash
sky status
```

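SSH into the machine while the cluster is up. SkyPilot adds each cluster to `~/.ssh/config`, so plain `ssh` works:

```bash
ssh sky-33ba-marcos
# Then, on the remote machine:
cd ensemble-tts-annotation
watch -n 1 nvidia-smi  # Refresh GPU usage every second
```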
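
`watch -n 1 nvidia-smi` is meant for a person watching the terminal. To log GPU usage from a script instead, `nvidia-smi` also offers a CSV query mode (`--query-gpu=... --format=csv,noheader,nounits`). The following is a minimal Python sketch. It assumes the code runs on the remote machine with `nvidia-smi` on the `PATH`. The function names are hypothetical helpers, not part of the repository.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits", memory is reported in MiB
# and utilization in percent.
FIELDS = "utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text: str) -> list[dict[str, int]]:
    """Parse one CSV row per GPU, e.g. '87, 30512, 40960'.

    Assumes every field is numeric; some GPUs report '[N/A]', which would
    raise ValueError here.
    """
    gpus = []
    for line in text.strip().splitlines():
        util, used, total = (int(field) for field in line.split(","))
        gpus.append({"util_pct": util, "mem_used_mib": used, "mem_total_mib": total})
    return gpus


def query_gpus() -> list[dict[str, int]]:
    """Run nvidia-smi once and return the parsed readings for every GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(out)
```

On the A100 node, `for gpu in query_gpus(): print(gpu)` prints one reading per GPU. Calling it in a loop with `time.sleep(1)` mirrors the one-second refresh of `watch -n 1`.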

💰 Cost Tracking

| Item | Cost |
|---|---|
| Validation test | $0.00 |
| Fine-tuning (current) | $0.00 (Vast.ai spot) |
| Total so far | $0.00 ✨ |
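
Every row above is $0.00 because the instance was listed at $0.00/hr. If a later run lands on a paid instance, the per-job cost is hourly rate × minutes ÷ 60. The sketch below is minimal and uses `Decimal` to avoid floating-point rounding in currency. `job_cost` is a hypothetical helper, and the $1.20/hr rate in the second example is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def job_cost(hourly_rate_usd: str, minutes: int) -> Decimal:
    """Cost of a job billed per hour, rounded to whole cents."""
    cost = Decimal(hourly_rate_usd) * minutes / 60
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# The fine-tuning job in this log: $0.00/hr for 3 minutes.
print(job_cost("0.00", 3))   # 0.00
# Hypothetical paid instance: $1.20/hr for a 40-minute run.
print(job_cost("1.20", 40))  # 0.80
```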

🎯 After This Completes

Next Steps:

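  1. Download the fine-tuned model. SkyPilot clusters are reachable over SSH, so `rsync` works:

    rsync -Pavz sky-33ba-marcos:~/ensemble-tts-annotation/models/emotion/emotion2vec_finetuned_synthetic/ ./models/emotion2vec_finetuned_synthetic/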
    
  2. Test locally:

    from ensemble_tts import EnsembleAnnotator
    
    # device='cuda' assumes the local machine has a GPU
    annotator = EnsembleAnnotator(mode='balanced', device='cuda')
    result = annotator.annotate('audio.wav')
    print(result)
    
  3. Cleanup:

    sky down sky-33ba-marcos
    
  4. Then run one of:

    • Multi-GPU test (optional)
    • Full Orpheus annotation (118k samples)
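
The full Orpheus run (118k samples) is long enough that a crash or a preempted spot instance partway through would be costly to redo from scratch. One way to make the run resumable is to append each result to a JSON Lines file and, on restart, skip paths that are already recorded. This is a minimal sketch under two assumptions. First, `annotate` is a callable such as `annotator.annotate` from step 2. Second, that callable returns a JSON-serializable result (not verified against the repository). `annotate_all` and the output format are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def annotate_all(
    paths: Iterable[str],
    annotate: Callable[[str], object],
    out_path: str,
) -> int:
    """Annotate each path at most once, appending one JSON line per result.

    Paths already recorded in out_path are skipped, so the job can be
    restarted after an interruption. Returns how many new paths were annotated.
    """
    out = Path(out_path)
    done: set[str] = set()
    if out.exists():
        with out.open() as f:
            done = {json.loads(line)["path"] for line in f if line.strip()}

    new = 0
    with out.open("a") as f:
        for path in paths:
            if path in done:
                continue
            # Assumes the annotator's result is JSON-serializable (hypothetical).
            record = {"path": path, "result": annotate(path)}
            f.write(json.dumps(record) + "\n")
            f.flush()  # persist each result before moving on
            done.add(path)
            new += 1
    return new
```

Usage would look like `annotate_all(wav_paths, annotator.annotate, "orpheus_annotations.jsonl")`. Rerunning the same command after an interruption only processes the files that are still missing.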

📈 Progress Updates

✅ Job Completed: Partial Success

Time: 2025-12-02 13:03 UTC
Duration: 3 minutes
Status: ✅ SUCCEEDED (with a model-loading error)

What Worked ✅

  • ✅ Machine provisioned (A100 SXM4, 32 vCPUs, 64GB RAM)
  • ✅ Dependencies installed (torch, transformers, librosa)
  • ✅ Repository cloned
  • ✅ 350 synthetic samples created (50 per emotion)
  • ✅ Dataset prepared (data/prepared/synthetic_prepared)
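
The counts above imply seven emotion classes (350 / 50 = 7). Before fine-tuning, a quick check that every class really has 50 samples would catch a class that was silently dropped or duplicated. The sketch below is minimal. The label names in the example are placeholders, because this log lists neither the actual emotions nor the prepared dataset's format.

```python
from collections import Counter


def class_imbalance(labels: list[str], expected_per_class: int) -> dict[str, int]:
    """Return the classes whose sample count differs from expected_per_class."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n != expected_per_class}


# Placeholder labels: 7 classes x 50 samples = 350, matching the log above.
labels = [f"emotion_{i}" for i in range(7) for _ in range(50)]
print(len(labels), class_imbalance(labels, 50))  # 350 {}
```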

Issues Found ❌

  • ❌ emotion2vec model loading failed
  • ❌ The emotion2vec model requires the funasr library and cannot be loaded with standard transformers
  • ❌ Fine-tuning did not run
  • ❌ Model testing failed

Next Steps 🔧

  1. Update the emotion2vec implementation to use a wav2vec2-compatible model
  2. Re-run fine-tuning with the corrected code
  3. Alternatively, install funasr for native emotion2vec support
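
Options 1 and 3 need not be mutually exclusive. The code can prefer native emotion2vec when `funasr` is installed and fall back to a wav2vec2 model through `transformers` otherwise. Below is a minimal sketch of that selection logic, using only the standard library's `importlib.util.find_spec`. `pick_backend` and the returned backend names are hypothetical, and the actual model-loading code for each branch is not shown.

```python
import importlib.util
from typing import Callable


def is_installed(module: str) -> bool:
    """True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(module) is not None


def pick_backend(installed: Callable[[str], bool] = is_installed) -> str:
    """Choose how to load the emotion model based on what is installed."""
    if installed("funasr"):
        return "emotion2vec-funasr"  # native emotion2vec support
    if installed("transformers"):
        return "wav2vec2-transformers"  # wav2vec2-compatible fallback
    raise RuntimeError("Install funasr or transformers to load an emotion model.")
```

Passing `installed` as a parameter keeps the choice testable without installing either library.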

Last update: 2025-12-02 13:07 UTC (completed with a model-loading error)