Article OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments 29 days ago • 31
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published Jan 30 • 216
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders Paper • 2602.05027 • Published Feb 4 • 60
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 34
Article Strand-Rust-Coder-v1: Rust Coding Model Fine-Tuned on Peer-Ranked Synthetic Data Dec 11, 2025 • 4
Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator Dec 17, 2025 • 47
Article AprielGuard: A Guardrail for Safety and Adversarial Robustness in Modern LLM Systems Dec 23, 2025 • 48
Health AI Developer Foundations (HAI-DEF) Collection Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 22 items • Updated about 21 hours ago • 203
Tiny-A2D Collection Small diffusion language models adapted from AR models • 4 items • Updated Dec 6, 2025 • 18
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens Paper • 2511.19418 • Published Nov 24, 2025 • 29