ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents Paper • 2410.06703 • Published Oct 9, 2024
MedSAM2: Segment Anything in 3D Medical Images and Videos Paper • 2504.03600 • Published Apr 4, 2025
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20, 2025
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19, 2025
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning Paper • 2409.20566 • Published Sep 30, 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published Oct 3, 2024