Article How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs 30 days ago • 61
Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs Paper • 2604.18203 • Published 17 days ago • 6
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 17 days ago • 145
Article I Built a RAG System That Listens to Live BBC News and Answers Questions About "What Happened 10 Minutes Ago" Dec 9, 2025 • 14
Benchmarking Debiasing Methods for LLM-based Parameter Estimates Paper • 2506.09627 • Published Jun 11, 2025 • 1
Platonic Representations for Poverty Mapping: Unified Vision-Language Codes or Agent-Induced Novelty? Paper • 2508.01109 • Published Aug 1, 2025 • 4
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published Apr 30, 2025 • 49
Selecting Optimal Candidate Profiles in Adversarial Environments Using Conjoint Analysis and Machine Learning Paper • 2504.19043 • Published Apr 26, 2025 • 4
Degrees of Randomness in Rerandomization Procedures Paper • 2310.00861 • Published Oct 2, 2023 • 1
Can Large Language Models (or Humans) Distill Text? Paper • 2403.16584 • Published Mar 25, 2024 • 3