MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents Paper • 2603.09827 • Published 3 days ago • 25
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published Jan 17 • 34
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2, 2025 • 57
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 28 days ago • 54
You can now run MiniMax-2.5 locally! 🚀 At 230B parameters, MiniMax-2.5 is the strongest LLM under 700B params, delivering SOTA agentic coding & chat. Run Dynamic 3/4-bit on a 128GB Mac for 20 tokens/s.
Guide: https://unsloth.ai/docs/models/minimax-2.5
GGUF: unsloth/MiniMax-M2.5-GGUF