Xinfa Zhu

xfzhu

https://orcid.org/0000-0001-9275-523X

zxf-icpc

AI & ML interests

Speech Generation

Recent Activity

liked a model 13 days ago

Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice

Text-to-Speech • 2B • Updated 13 days ago • 476k • 916

liked a Space 19 days ago

Qwen3-TTS Demo

🎙

1.36k

Generate speech from text with voice design, cloning, or speakers

upvoted a collection 19 days ago

Qwen3-TTS

Collection

7 items • Updated 19 days ago • 284

commented a paper 11 months ago

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published Feb 18, 2025 • 86

liked a dataset 11 months ago

ASLP-lab/Emo-Emilia

Viewer • Updated Feb 27, 2025 • 1.4k • 635 • 8

authored a paper 11 months ago

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Paper • 2503.01710 • Published Mar 3, 2025 • 6

liked a Space 11 months ago

Spark TTS

🌖

230

A text-to-speech model powered by SparkAudio and Mobvoi.

liked a model 12 months ago

SparkAudio/Spark-TTS-0.5B

Text-to-Speech • Updated Mar 7, 2025 • 943 • 725

liked a dataset 12 months ago

HKUSTAudio/Audio-FLAN-Dataset

Preview • Updated Oct 6, 2025 • 285 • 40

upvoted a collection 12 months ago

Llasa

Collection

TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 11 items • Updated May 11, 2025 • 20

liked a Space 12 months ago

OSUM

💬

Demo of the OSUM project from the ASLP Lab at Northwestern Polytechnical University

upvoted an article 12 months ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

Feb 11, 2025

upvoted a paper about 1 year ago

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6, 2025 • 27

liked 2 models about 1 year ago

HKUSTAudio/Llasa-1B

Text-to-Speech • 1B • Updated May 10, 2025 • 14.5k • 102

HKUSTAudio/Llasa-3B

Text-to-Speech • 4B • Updated May 10, 2025 • 547 • 525

authored a paper about 1 year ago

Autoregressive Speech Synthesis with Next-Distribution Prediction

Paper • 2412.16846 • Published Dec 22, 2024 • 1

upvoted a paper about 1 year ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 87

liked a model almost 2 years ago

meta-llama/Meta-Llama-3-8B-Instruct

Text Generation • 8B • Updated Jun 18, 2025 • 1.46M • 4.38k

liked a dataset almost 2 years ago

Wenetspeech4TTS/WenetSpeech4TTS

Updated Jul 25, 2024 • 699 • 84

liked a Space almost 2 years ago

Whisper

📉

2.7k

Transcribe audio or YouTube videos into text with Whisper