Article Supercharge your OCR Pipelines with Open Models +5 merve, ariG23498, davanstrien, hynky, andito, reach-vb, pcuenq • Oct 21, 2025 • 315
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 517
VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 8 items • Updated Mar 2 • 247
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing Paper • 2506.21448 • Published Jun 26, 2025 • 9
Seedance 1.0: Exploring the Boundaries of Video Generation Models Paper • 2506.09113 • Published Jun 10, 2025 • 109
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding Paper • 2505.18079 • Published May 23, 2025 • 5
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published Apr 3, 2025 • 52
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity Paper • 2503.16418 • Published Mar 20, 2025 • 36
Phi-4 Collection Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated Jul 10, 2025 • 213
Cosmos-Preidct1 Collection ⚠️ This collection is archived. https://huggingface.co/collections/nvidia/cosmos3 • 14 items • Updated 16 days ago • 304