Frank

frank0125

AI & ML interests

Speech Modeling

Organizations

None yet

upvoted 3 papers about 11 hours ago

ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood

Paper • 2605.29257 • Published 1 day ago • 3

GenClaw: Code-Driven Agentic Image Generation

Paper • 2605.30248 • Published 1 day ago • 24

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Paper • 2605.29250 • Published 1 day ago • 50

upvoted 3 papers 3 months ago

Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data

Paper • 2603.07534 • Published Mar 8 • 5

DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning

Paper • 2603.12257 • Published Mar 12 • 31

ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation

Paper • 2603.11421 • Published Mar 12 • 34

upvoted 3 papers 4 months ago

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions

Paper • 2601.17640 • Published Jan 25 • 6

daVinci-Dev: Agent-native Mid-training for Software Engineering

Paper • 2601.18418 • Published Jan 26 • 126

Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis

Paper • 2601.14417 • Published Jan 20 • 5

liked 7 models 10 months ago

openbmb/MiniCPM-V-4

Image-Text-to-Text • 4B • Updated Sep 15, 2025 • 239k • 464

openai/gpt-oss-120b

Text Generation • 120B • Updated Aug 26, 2025 • 4.85M • 4.82k

openai/gpt-oss-20b

Text Generation • 22B • Updated Aug 26, 2025 • 8.11M • 4.65k

tiantiaf/voxlect-spanish-dialect-whisper-large-v3

Audio Classification • 2B • Updated Aug 10, 2025 • 76 • 5

tiantiaf/voxlect-english-dialect-whisper-small

Audio Classification • 90.4M • Updated Aug 10, 2025 • 20 • 2

tiantiaf/voxlect-arabic-dialect-whisper-small

Audio Classification • 90.4M • Updated Aug 10, 2025 • 5 • 2

KittenML/kitten-tts-nano-0.1

Updated Aug 30, 2025 • 39.8k • 514

upvoted 3 papers 10 months ago

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report

Paper • 2508.01059 • Published Aug 1, 2025 • 34

Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4, 2025 • 276

Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe

Paper • 2508.01691 • Published Aug 3, 2025 • 10

upvoted a paper about 1 year ago

Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22, 2025 • 58