gijs committed on
Commit 5e01695 · verified · 1 Parent(s): 7ca7b55

README: list all 9 training datasets (expresso/vox1/vox2 were missing)

Files changed (1)
  1. README.md +5 -2
README.md CHANGED
@@ -38,8 +38,8 @@ the SigLIP sigmoid contrastive loss.
 
 ## Training data
 
-Trained for **1 epoch** on the open `voiceclap_10` mixture used in the
-VoiceNet paper:
+Trained for **1 epoch** on the open `voiceclap_10_safe` mixture (9 datasets)
+used in the VoiceNet paper:
 
 - `emolia-balanced-5M-subset` (annotated subset of [Emilia](https://huggingface.co/datasets/amphion/Emilia-Dataset))
 - `laions_got_talent_clean_with_captions`
@@ -47,6 +47,9 @@ VoiceNet paper:
 - `synthetic_vocal_bursts`
 - `improved_synthetic_vocal_bursts`
 - `ears`
+- `expresso`
+- `voxceleb1`
+- `voxceleb2`
 
 All clips are captioned with `MOSS-Audio-8B-Thinking`-derived dense vocal-style
 captions covering emotions, talking-style attributes, and demographics.