ZhikangNiu-SII

zkniu

https://zhikangniu.github.io/

ZhikangNiu

AI & ML interests

PhD Student @ Shanghai Innovation Institute & SJTU

Recent Activity

liked a dataset 16 days ago

disco-eth/WorldSpeech

Updated May 18 • 29.2k • 19

upvoted a paper 17 days ago

Light-WAM: Efficient World Action Models with State-Fusion Action Decoding

Paper • 2606.08242 • Published 22 days ago • 10

upvoted a paper 18 days ago

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Paper • 2606.09669 • Published 20 days ago • 46

upvoted a paper 19 days ago

MMAE: A Massive Multitask Audio Editing Benchmark

Paper • 2606.07229 • Published 23 days ago • 46

updated a Space 2 months ago

ml-intern sandbox

🌍

liked a dataset 2 months ago

malaysia-ai/Multilingual-TTS

Updated about 1 hour ago • 5.07k • 21

upvoted a paper 3 months ago

BubbleRAG: Evidence-Driven Retrieval-Augmented Generation for Black-Box Knowledge Graphs

Paper • 2603.20309 • Published Mar 19 • 21

upvoted a paper 5 months ago

UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

Paper • 2602.02437 • Published Feb 2 • 80

liked a model 5 months ago

microsoft/VibeVoice-ASR

Automatic Speech Recognition • 9B • Updated Jan 27 • 733k • 1.19k

upvoted 2 papers 8 months ago

InteractComp: Evaluating Search Agents With Ambiguous Queries

Paper • 2510.24668 • Published Oct 28, 2025 • 100

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

Paper • 2510.24693 • Published Oct 28, 2025 • 19

updated a model 8 months ago

zkniu/Semantic-VAE

Updated Oct 26, 2025 • 3

published a model 9 months ago

zkniu/Semantic-VAE

Updated Oct 26, 2025 • 3

upvoted an article about 1 year ago

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

Article • Steveeeeeeen • Feb 11, 2025 • 34

upvoted a paper over 1 year ago

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published Feb 18, 2025 • 41

liked a model over 1 year ago

SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 739k • 1.18k

upvoted a paper over 1 year ago

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Paper • 2412.09596 • Published Dec 12, 2024 • 97

updated a model over 1 year ago

SWivid/F5-TTS

Text-to-Speech • Updated Mar 21, 2025 • 739k • 1.18k

liked a Space over 1 year ago

F5-TTS

🗣

2.88k

F5-TTS & E2-TTS: Zero-Shot Voice Cloning (Unofficial Demo)

liked a dataset over 2 years ago

Loie/VGGSound

Viewer • Updated Mar 26, 2023 • 1 • 2.51k • 54