
Mwangi PRO

Benson

AI & ML interests

None yet

Recent Activity

liked a dataset 4 days ago

zlab-princeton/Vero-600k


Organizations

None yet

upvoted a paper 2 days ago

SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning

Paper • 2606.10804 • Published 16 days ago • 49

upvoted a paper 3 days ago

Preference Learning Unlocks LLMs' Psycho-Counseling Skills

Paper • 2502.19731 • Published Feb 27, 2025 • 8

upvoted a paper 9 days ago

OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains

Paper • 2606.14702 • Published 13 days ago • 31

upvoted a paper 23 days ago

Bernini: Latent Semantic Planning for Video Diffusion

Paper • 2605.22344 • Published May 21 • 19

upvoted a paper 26 days ago

OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

Paper • 2605.29250 • Published 28 days ago • 78

upvoted 3 papers 28 days ago

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

Paper • 2605.27295 • Published about 1 month ago • 23

LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV

Paper • 2605.26244 • Published May 25 • 38

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Paper • 2605.27365 • Published about 1 month ago • 144

upvoted 4 papers about 1 month ago

Lance: Unified Multimodal Modeling by Multi-Task Synergy

Paper • 2605.18678 • Published May 18 • 79

APRES: An Agentic Paper Revision and Evaluation System

Paper • 2603.03142 • Published Mar 3 • 3

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

Paper • 2605.09874 • Published May 11 • 2

jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition

Paper • 2605.08384 • Published May 8 • 11

upvoted a collection about 1 month ago

jina-embeddings-v5-omni

Collection

Multimodal (text + image + video + audio) embedding models aligned with jina-embeddings-v5-text-*. Two sizes, four task variants each. • 27 items • Updated May 12 • 36

upvoted a paper about 1 month ago

CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models

Paper • 2605.08735 • Published May 9 • 71

upvoted a paper about 2 months ago

SkillOS: Learning Skill Curation for Self-Evolving Agents

Paper • 2605.06614 • Published May 7 • 46

upvoted an article about 2 months ago

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

Article • nvidia • Apr 28 • 62

upvoted 2 papers 2 months ago

Qwen3.5-Omni Technical Report

Paper • 2604.15804 • Published Apr 17 • 59

VimRAG: Navigating Massive Visual Context in Retrieval-Augmented Generation via Multimodal Memory Graph

Paper • 2602.12735 • Published Feb 13 • 8

upvoted 2 papers 3 months ago

WAVE: Learning Unified & Versatile Audio-Visual Embeddings with Multimodal LLM

Paper • 2509.21990 • Published Sep 26, 2025 • 1

A Simple Baseline for Streaming Video Understanding

Paper • 2604.02317 • Published Apr 2 • 74
