ZhenYE

ZhenYe234

https://github.com/zhenye234

zhenye234

AI & ML interests

None yet

Recent Activity

authored a paper 22 days ago

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

upvoted a paper about 1 month ago

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

updated a model about 1 month ago

HKUSTAudio/Talker-T2AV

upvoted a paper about 1 month ago

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

Paper • 2606.02482 • Published Jun 1 • 36

upvoted a collection about 1 month ago

Talker-T2AV

Talker-T2AV • 3 items • Updated May 24 • 4

upvoted a paper about 1 month ago

Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

Paper • 2604.23586 • Published Apr 26 • 7

upvoted a paper 9 months ago

SpaceVista: All-Scale Visual Spatial Reasoning from mm to km

Paper • 2510.09606 • Published Oct 10, 2025 • 18

upvoted a collection about 1 year ago

Canary-TTS

10 items • Updated Mar 2 • 3

upvoted a collection over 1 year ago

Multimodal Reasoning

179 items • Updated Feb 7 • 41

upvoted a paper over 1 year ago

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published Mar 6, 2025 • 72

upvoted an article over 1 year ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

Steveeeeeeen • Feb 11, 2025 • 34

upvoted a paper over 1 year ago

Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis

Paper • 2502.04128 • Published Feb 6, 2025 • 28

upvoted a collection over 1 year ago

Llasa

TTS foundation model compatible with Llama framework (160k hours tokenized speech data released) • 12 items • Updated 8 days ago • 22