vantagewithai/Step-Fun-EditX-ComfyUI

vantagewithai committed on Nov 25, 2025

Commit 6b20de8 (verified) · 1 Parent(s): fd62bca

Update README.md

Files changed (1)

README.md +63 -3

@@ -1,3 +1,63 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+pipeline_tag: text-to-speech
+library_name: transformers
+---
+## Step-Audio-EditX
+**Files Needed for Vantage-Step-Audio-EditX ComfyUI node**
+**Original Model Link:** [https://huggingface.co/stepfun-ai/Step-Audio-EditX](https://huggingface.co/stepfun-ai/Step-Audio-EditX)
+**Watch us on YouTube:** [@VantageWithAI](https://www.youtube.com/@vantagewithai)
+🌟 [ComfyUI Node](https://github.com/stepfun-ai/Step-Audio-EditX)
+After downloading the models, copy them into `ComfyUI/models`. You should have the following structure:
+```
+ComfyUI/
+├── models/
+│   ├── Step-Audio-EditX/
+│   ├──── CosyVoice-300M-25Hz/
+│   │     ├─── campplus.onnx
+│   │     ├─── cosyvoice.yaml
+│   │     ├─── flow.pt
+│   │     └─── hift.pt
+│   ├── model.safetensors
+│   └── speech_tokenizer_v1.onnx
+```
+## Features
+- **Zero-Shot TTS**
+  - Excellent zero-shot TTS cloning for Mandarin, English, Sichuanese, and Cantonese.
+  - To use a dialect, just add a **[Sichuanese]** or **[Cantonese]** tag before your text.
+- **Emotion and Speaking Style Editing**
+  - Remarkably effective iterative control over emotions and styles, supporting **dozens** of options for editing.
+    - Emotion Editing: [ *Angry*, *Happy*, *Sad*, *Excited*, *Fearful*, *Surprised*, *Disgusted*, etc. ]
+    - Speaking Style Editing: [ *Act_coy*, *Older*, *Child*, *Whisper*, *Serious*, *Generous*, *Exaggerated*, etc. ]
+    - Support for more emotions and speaking styles is on the way. **Get Ready!** 🚀
+- **Paralinguistic Editing**
+  - Precise control over 10 types of paralinguistic features for more natural, human-like, and expressive synthetic audio.
+  - Supported tags:
+    - [ *Breathing*, *Laughter*, *Suprise-oh*, *Confirmation-en*, *Uhm*, *Suprise-ah*, *Suprise-wa*, *Sigh*, *Question-ei*, *Dissatisfaction-hnn* ]
+For more examples, see the [demo page](https://stepaudiollm.github.io/step-audio-editx/).
+## Citation
+```
+@misc{yan2025stepaudioeditxtechnicalreport,
+      title={Step-Audio-EditX Technical Report},
+      author={Chao Yan and Boyong Wu and Peng Yang and Pengfei Tan and Guoqiang Hu and Yuxin Zhang and Xiangyu Zhang and Fei Tian and Xuerui Yang and Xiangyu Zhang and Daxin Jiang and Gang Yu},
+      year={2025},
+      eprint={2511.03601},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2511.03601},
+}
+```
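
The directory tree in the README diff above lists six model files. A quick way to confirm that a download was copied correctly might look like the following minimal sketch. It is not part of the commit or of the ComfyUI node: the function name `missing_files` and the example path are hypothetical, and the script assumes only that the six files sit together under one model directory, as the tree suggests.

```python
from pathlib import Path

# File names taken from the tree in the README, relative to the directory
# that holds them. Which directory that is on your machine is an assumption
# you supply yourself when calling missing_files().
EXPECTED_FILES = (
    "CosyVoice-300M-25Hz/campplus.onnx",
    "CosyVoice-300M-25Hz/cosyvoice.yaml",
    "CosyVoice-300M-25Hz/flow.pt",
    "CosyVoice-300M-25Hz/hift.pt",
    "model.safetensors",
    "speech_tokenizer_v1.onnx",
)


def missing_files(model_dir):
    """Return the expected files that are not present under model_dir."""
    root = Path(model_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Hypothetical location; point this at your own ComfyUI install.
    missing = missing_files("ComfyUI/models/Step-Audio-EditX")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

If the node turns out to expect a different folder, only the path passed to `missing_files` needs to change; the relative file names stay the same.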