When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA Paper • 2510.04849 • Published Oct 6, 2025 • 115
Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA Paper • 2505.21115 • Published May 27, 2025 • 140
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published Feb 18, 2025 • 74
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding Paper • 2501.13200 • Published Jan 22, 2025 • 69
Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task Paper • 2406.14213 • Published Jun 20, 2024 • 21
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published Jun 14, 2024 • 52