ensemble-tts-annotation / REALTIME_PROGRESS.md

marcosremar

Fix emotion2vec loading - use wav2vec2 compatible model

d669352 3 months ago

	# 🚀 Real-Time Progress - SkyPilot Fine-tuning

	Status: ⏳ IN PROGRESS
	Started: 2025-12-02 13:00 UTC
	Cluster: sky-33ba-marcos

	---

	## 📊 Current Job: Fine-tuning

	### Machine Provisioned:
	```
	Provider: Vast.ai (Czechia, CZ, EU)
	Instance: A100 SXM4
	vCPUs: 32 cores
	RAM: 64GB
	GPU: A100 (1x)
	Cost: $0.00/hr ✨ FREE!
	```

	### What's Running:
	1. ✅ Machine provisioned
	2. ⏳ Installing dependencies (torch, transformers, librosa)
	3. ⏳ Cloning repository
	4. ⏳ Creating synthetic data (50 samples/emotion)
	5. ⏳ Preparing dataset
	6. ⏳ Fine-tuning emotion2vec (10 epochs)
	7. ⏳ Testing model

	### Estimated Time:
	- Setup: ~5min
	- Data generation: ~1min
	- Fine-tuning: ~20-30min
	- Testing: ~2min
	- Total: ~30-40min
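As a quick check of the arithmetic, the per-stage estimates above can be summed. This is a minimal sketch: the stage names and the (low, high) pairs are only a transcription of the list, not values measured from the job.

```python
# Per-stage estimates in minutes, transcribed from the list above as (low, high).
stages = {
    "setup": (5, 5),
    "data_generation": (1, 1),
    "fine_tuning": (20, 30),
    "testing": (2, 2),
}

# Sum the optimistic and pessimistic bounds separately.
low = sum(lo for lo, _ in stages.values())
high = sum(hi for _, hi in stages.values())

print(f"~{low}-{high} min")  # prints "~28-38 min"
```

The stages add up to 28-38 minutes, which the list rounds to ~30-40 minutes.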

	### Expected Output:
	```
	✅ Fine-tuning complete!
	Model saved to: models/emotion/emotion2vec_finetuned_synthetic/
	```

	---

	## 📝 How to Monitor

	### Check logs in real-time:
	```bash
	sky logs sky-33ba-marcos # Follows the log stream by default
	```

	### Check status:
	```bash
	sky status
	```

	### SSH to machine (while running):
	```bash
	ssh sky-33ba-marcos # SkyPilot adds the cluster name to ~/.ssh/config
	# Inside:
	cd ensemble-tts-annotation
	watch -n 1 nvidia-smi # Monitor GPU usage
	```
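For logging GPU usage from a script rather than watching it interactively, `nvidia-smi` can be queried in CSV form and parsed. The sketch below is illustrative only: the function names `parse_gpu_stats` and `read_gpu_stats` and the dictionary keys are hypothetical and not part of this repository.

```python
import subprocess

# nvidia-smi query: one CSV row per GPU, numbers only (no header, no units).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse rows such as '87, 30500, 81920' into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (int(field.strip()) for field in line.split(","))
        stats.append(
            {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi on the current machine and parse its output."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

`read_gpu_stats()` only works on a machine with an NVIDIA driver installed; `parse_gpu_stats` can be exercised anywhere with a sample string.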

	---

	## 💰 Cost Tracking

	| Item | Cost |
	|------|------|
	| Validation test | $0.00 |
	| Fine-tuning (current) | $0.00 (Vast.ai spot) |
	| Total so far | $0.00 ✨ |

	---

	## 🎯 After This Completes

	### Next Steps:
	1. Download model:
	```bash
	rsync -Pavz sky-33ba-marcos:~/ensemble-tts-annotation/models/emotion/emotion2vec_finetuned_synthetic/ ./models/emotion2vec_finetuned_synthetic/
	```

	2. Test locally:
	```python
	from ensemble_tts import EnsembleAnnotator

	annotator = EnsembleAnnotator(mode='balanced', device='cuda')
	result = annotator.annotate('audio.wav')
	```

	3. Cleanup:
	```bash
	sky down sky-33ba-marcos
	```

	4. Then run:
	- Multi-GPU test (optional)
	- OR Full Orpheus annotation (118k samples)

	---

	## 📈 Progress Updates

	### ✅ Job Completed - Partial Success

	Time: 2025-12-02 13:03 UTC
	Duration: 3 minutes
	Status: ✅ SUCCEEDED (with a model loading error)

	#### What Worked ✅
	- ✅ Machine provisioned (A100 SXM4, 32 vCPUs, 64GB RAM)
	- ✅ Dependencies installed (torch, transformers, librosa)
	- ✅ Repository cloned
	- ✅ 350 synthetic samples created (50/emotion)
	- ✅ Dataset prepared (data/prepared/synthetic_prepared)

	#### Issues Found ❌
	- ❌ emotion2vec model loading failed
	- ❌ Model requires `funasr` library (not standard transformers)
	- ❌ Fine-tuning didn't execute
	- ❌ Model testing failed

	#### Next Steps 🔧
	1. Update the emotion2vec implementation to load a wav2vec2-compatible model through transformers
	2. Alternatively, install `funasr` for native emotion2vec support
	3. Re-run fine-tuning with the corrected code
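The two loading routes listed above (native emotion2vec through `funasr`, or a wav2vec2-compatible model through transformers) could be combined into a simple fallback. This is a hedged sketch, not the repository's implementation: the checkpoint names, the helper names `pick_backend` and `load_emotion_model`, and the label count of 7 (inferred from 350 samples at 50 per emotion) are all assumptions.

```python
import importlib.util

# Assumed checkpoint names for illustration; the project may use different ones.
FUNASR_MODEL = "iic/emotion2vec_base_finetuned"
WAV2VEC2_MODEL = "facebook/wav2vec2-base"
NUM_EMOTIONS = 7  # assumption: 350 synthetic samples / 50 per emotion


def pick_backend(available=None):
    """Return 'funasr' if installed, else 'transformers', else None."""
    if available is None:
        # Detect installed packages without importing them.
        available = {
            name
            for name in ("funasr", "transformers")
            if importlib.util.find_spec(name) is not None
        }
    for name in ("funasr", "transformers"):
        if name in available:
            return name
    return None


def load_emotion_model(backend):
    """Load a model for the chosen backend; heavy imports are deferred."""
    if backend == "funasr":
        from funasr import AutoModel

        return AutoModel(model=FUNASR_MODEL)
    if backend == "transformers":
        from transformers import Wav2Vec2ForSequenceClassification

        # Adds a freshly initialised classification head with NUM_EMOTIONS outputs.
        return Wav2Vec2ForSequenceClassification.from_pretrained(
            WAV2VEC2_MODEL, num_labels=NUM_EMOTIONS
        )
    raise RuntimeError("Install funasr or transformers before fine-tuning")
```

Calling `load_emotion_model(pick_backend())` would then fail early with a clear message when neither library is installed, instead of failing partway through the job.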

	Last update: 2025-12-02 13:07 UTC - Completed with model loading error