RogerZhuo
AI & ML interests
None yet
Recent Activity
liked
a Space
about 5 hours ago
Qwen/Qwen3-TTS
liked
a dataset
14 days ago
MiniMaxAI/OctoCodingBench
liked
a model
about 1 month ago
microsoft/TRELLIS.2-4B
Organizations
Reading
Music
- ElectricAlexis/NotaGen
  Updated • 149
- ASLP-lab/LLaSE-G1
  Audio-to-Audio • Updated • 25
- Di♪♪Rhythm
  Space • Running on Zero • Featured • 676
  🎶 Blazingly Fast and Embarrassingly Simple Song Generation
- DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
  Paper • 2503.01183 • Published • 29
AI Arena
I2V
image-to-video
- Wan-AI/Wan2.1-T2V-1.3B
  Text-to-Video • Updated • 11.4k • 427
- VBench: Comprehensive Benchmark Suite for Video Generative Models
  Paper • 2311.17982 • Published • 9
- VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
  Paper • 2411.13503 • Published • 34
- tencent/HunyuanVideo-I2V
  Image-to-Video • Updated • 336 • 346
LLM
Related to foundation large models
must-read-papers
AI Papers
- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 8
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 12
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 33
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 169
OCR
images
images
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 805k • 12.2k
- cagliostrolab/animagine-xl-4.0
  Text-to-Image • Updated • 246k • 379
- Thera Arbitrary-Scale Super-Resolution
  Space • Running on L4 • Featured • 282
  🔥 Enhance images by increasing their resolution
- stepfun-ai/Step1X-Edit
  Image-to-Image • Updated • 92 • 327
TTS
Speech-related
- VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
  Paper • 2307.16430 • Published • 4
- Zyphra/Zonos-v0.1-transformer
  Text-to-Speech • Updated • 9.8k • 425
- IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
  Paper • 2502.05512 • Published • 7
- Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
  Paper • 2502.11946 • Published • 3
virtual try-on
Virtual makeover
- Learning Flow Fields in Attention for Controllable Person Image Generation
  Paper • 2412.08486 • Published • 36
- franciszzj/Leffa
  Image-to-Image • Updated • 340
- TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
  Paper • 2411.18350 • Published • 28
- TryOffDiff
  Space • Running on Zero • 62
  🔥 Extract garment images from everyday images!
Data