ReaderLM-v2: Small Language Model for HTML to Markdown and JSON Paper • 2503.01151 • Published Mar 3, 2025 • 3
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models Paper • 2401.03506 • Published Jan 7, 2024 • 16
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS Paper • 2604.11269 • Published 25 days ago • 1
M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset Paper • 2506.14427 • Published Jun 17, 2025 • 1
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization Paper • 2601.01554 • Published Jan 4 • 60
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models Paper • 2508.06372 • Published Aug 8, 2025 • 3
VideoPrism Collection VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks. • 5 items • Updated Mar 12 • 19
Gemma 4 Collection Gemma 4 is Google's new model family including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated 15 days ago • 176
Article Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines +2 Mar 5 • 51
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 Mar 10 • 143