Article Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents 10 days ago • 45
TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment Paper • 2604.12012 • Published 25 days ago • 12
Article DeepSeek-V4: a million-token context that agents can actually use 14 days ago • 42
Article DenseOn with the LateOn: Open State-of-the-Art Single and Multi-Vector Models 17 days ago • 37
WildDet3D Collection This is the collection of WildDet3D artifacts, including demos, model checkpoints and data. https://github.com/allenai/WildDet3D • 8 items • Updated 25 days ago • 17
Article How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs about 1 month ago • 61
Falcon Perception Collection Falcon-Perception and Falcon-OCR model: early-fusion, natively multimodal, dense Autoregressive Transformer models. • 5 items • Updated Apr 6 • 14
Article SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation Mar 23 • 17
Article Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face Feb 11, 2025 • 122