vantagewithai committed
Commit 6b20de8 · verified · 1 Parent(s): fd62bca

Update README.md

Files changed (1)
  1. README.md +63 -3
README.md CHANGED
@@ -1,3 +1,63 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: text-to-speech
+ library_name: transformers
+ ---
+ ## Step-Audio-EditX
+
+ **Files needed for the Vantage-Step-Audio-EditX ComfyUI node**
+
+ **Original Model Link:** [https://huggingface.co/stepfun-ai/Step-Audio-EditX](https://huggingface.co/stepfun-ai/Step-Audio-EditX)
+
+ **Watch us on YouTube:** [@VantageWithAI](https://www.youtube.com/@vantagewithai)
+
+ 🌟 [ComfyUI Node](https://github.com/stepfun-ai/Step-Audio-EditX)
+
+
+ After downloading the models, copy them into `ComfyUI/models`. You should end up with the following structure:
+ ```
+ ComfyUI/
+ ├── models/
+ │   ├── Step-Audio-EditX/
+ │   ├──── CosyVoice-300M-25Hz/
+ │   │   ├─── campplus.onnx
+ │   │   ├─── cosyvoice.yaml
+ │   │   ├─── flow.pt
+ │   │   └─── hift.pt
+ │   ├── model.safetensors
+ │   └── speech_tokenizer_v1.onnx
+ ```
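
The layout above can be checked with a short script before launching ComfyUI. This is a minimal sketch, not part of the node: the function name `missing_model_files` is hypothetical, and it assumes that `model.safetensors` and `speech_tokenizer_v1.onnx` sit inside `Step-Audio-EditX/` next to the `CosyVoice-300M-25Hz/` folder, which the tree does not state unambiguously.

```python
from pathlib import Path

# Relative paths read off the directory tree above.
# Assumption: everything lives under ComfyUI/models/Step-Audio-EditX/.
EXPECTED_FILES = [
    "CosyVoice-300M-25Hz/campplus.onnx",
    "CosyVoice-300M-25Hz/cosyvoice.yaml",
    "CosyVoice-300M-25Hz/flow.pt",
    "CosyVoice-300M-25Hz/hift.pt",
    "model.safetensors",
    "speech_tokenizer_v1.onnx",
]


def missing_model_files(comfyui_root):
    """Return the expected files that are absent under
    <comfyui_root>/models/Step-Audio-EditX, in the order listed above."""
    model_dir = Path(comfyui_root) / "models" / "Step-Audio-EditX"
    return [rel for rel in EXPECTED_FILES if not (model_dir / rel).is_file()]
```

Calling `missing_model_files("ComfyUI")` from the directory that contains your ComfyUI checkout returns an empty list when every file is in place, and otherwise the relative paths that still need to be copied.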
+
+ ## Features
+ - **Zero-Shot TTS**
+   - Excellent zero-shot TTS cloning for Mandarin, English, Sichuanese, and Cantonese.
+   - To use a dialect, add a **[Sichuanese]** or **[Cantonese]** tag before your text.
+
+ - **Emotion and Speaking Style Editing**
+   - Remarkably effective iterative control over emotions and styles, supporting **dozens** of editing options.
+   - Emotion Editing: [ *Angry*, *Happy*, *Sad*, *Excited*, *Fearful*, *Surprised*, *Disgusted*, etc. ]
+   - Speaking Style Editing: [ *Act_coy*, *Older*, *Child*, *Whisper*, *Serious*, *Generous*, *Exaggerated*, etc. ]
+   - More emotions and speaking styles are on the way. **Get Ready!** 🚀
+
+ - **Paralinguistic Editing**
+   - Precise control over 10 types of paralinguistic features for more natural, human-like, and expressive synthetic audio.
+   - Supported tags:
+     - [ *Breathing*, *Laughter*, *Suprise-oh*, *Confirmation-en*, *Uhm*, *Suprise-ah*, *Suprise-wa*, *Sigh*, *Question-ei*, *Dissatisfaction-hnn* ]
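
The dialect tagging described above amounts to prefixing the input text. The sketch below illustrates that idea only: `with_dialect` and `is_paralinguistic_tag` are hypothetical helper names, the single space between the tag and the text is an assumption, and the README does not say how paralinguistic tags are placed inside the text, so the second helper only checks a name against the list of ten.

```python
# Dialect tags named in the feature list above.
DIALECT_TAGS = ("Sichuanese", "Cantonese")

# The ten paralinguistic tag names, spelled exactly as listed above.
PARALINGUISTIC_TAGS = (
    "Breathing", "Laughter", "Suprise-oh", "Confirmation-en", "Uhm",
    "Suprise-ah", "Suprise-wa", "Sigh", "Question-ei", "Dissatisfaction-hnn",
)


def with_dialect(text, dialect=None):
    """Prefix text with a [Dialect] tag; return it unchanged when dialect is None.

    Assumption: one space separates the tag from the text.
    """
    if dialect is None:
        return text
    if dialect not in DIALECT_TAGS:
        raise ValueError(f"unsupported dialect tag: {dialect!r}")
    return f"[{dialect}] {text}"


def is_paralinguistic_tag(name):
    """Return True if name is one of the ten listed paralinguistic tags."""
    return name in PARALINGUISTIC_TAGS
```

For example, `with_dialect("你好", "Cantonese")` yields `[Cantonese] 你好`, while an unlisted dialect raises `ValueError` instead of silently producing an untagged prompt.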
+
+ For more examples, see the [demo page](https://stepaudiollm.github.io/step-audio-editx/).
+
+
+ ## Citation
+
+ ```
+ @misc{yan2025stepaudioeditxtechnicalreport,
+       title={Step-Audio-EditX Technical Report},
+       author={Chao Yan and Boyong Wu and Peng Yang and Pengfei Tan and Guoqiang Hu and Yuxin Zhang and Xiangyu and Zhang and Fei Tian and Xuerui Yang and Xiangyu Zhang and Daxin Jiang and Gang Yu},
+       year={2025},
+       eprint={2511.03601},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2511.03601},
+ }
+
+ ```