Zhisheng Zheng

zhisheng01

https://zhishengzheng.com/

zhisheng147

AI & ML interests

LLM, Speech and Audio Processing

Recent Activity

updated a dataset 22 days ago

zhisheng01/s2sisometric

published a dataset 22 days ago

zhisheng01/s2sisometric

updated a dataset 24 days ago

zhisheng01/mp3d-ambisonics

upvoted a paper 2 months ago

LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

Paper • 2603.03269 • Published Mar 3 • 63

upvoted 2 papers 3 months ago

Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception

Paper • 2602.11858 • Published Feb 12 • 63

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Paper • 2602.08794 • Published Feb 9 • 159

upvoted 2 papers 4 months ago

Qwen3-TTS Technical Report

Paper • 2601.15621 • Published Jan 22 • 75

MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization

Paper • 2601.01554 • Published Jan 4 • 60

upvoted a paper 6 months ago

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 112

upvoted 2 papers 7 months ago

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

Paper • 2510.24693 • Published Oct 28, 2025 • 19

Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Paper • 2510.00515 • Published Oct 1, 2025 • 42

upvoted a paper 8 months ago

StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs

Paper • 2509.22220 • Published Sep 26, 2025 • 66

upvoted 3 papers 9 months ago

upvoted a paper 11 months ago

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

Paper • 2506.17450 • Published Jun 20, 2025 • 64

upvoted 6 papers about 1 year ago

Kimi-Audio Technical Report

Paper • 2504.18425 • Published Apr 25, 2025 • 20

Charting and Navigating Hugging Face's Model Atlas

Paper • 2503.10633 • Published Mar 13, 2025 • 94

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

Paper • 2503.04724 • Published Mar 6, 2025 • 72

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25, 2025 • 74

Slamming: Training a Speech Language Model on One GPU in a Day

Paper • 2502.15814 • Published Feb 19, 2025 • 69

Soundwave: Less is More for Speech-Text Alignment in LLMs

Paper • 2502.12900 • Published Feb 18, 2025 • 86

upvoted a paper over 1 year ago

AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting

Paper • 2502.05176 • Published Feb 7, 2025 • 40
