EvoClaw: Evaluating AI Agents on Continuous Software Evolution Paper • 2603.13428 • Published 11 days ago • 19
Rethinking the Harmonic Loss via Non-Euclidean Distance Layers Paper • 2603.10225 • Published 13 days ago
LLM2Vec-Gen: Generative Embeddings from Large Language Models Paper • 2603.10913 • Published 12 days ago • 43
Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 Paper • 2406.16777 • Published Jun 24, 2024 • 1
LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning Paper • 2602.07075 • Published Feb 6 • 18
PingPong: A Natural Benchmark for Multi-Turn Code-Switching Dialogues Paper • 2601.17277 • Published Jan 24 • 6
INTIMA: A Benchmark for Human-AI Companionship Behavior Paper • 2508.09998 • Published Aug 4, 2025 • 11
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 20