Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation Paper • 2604.05083 • Published Apr 6
Contrastive Representation Learning: A Framework and Review Paper • 2010.05113 • Published Oct 10, 2020 • 1
NeurIPS 2025 E2LM Competition : Early Training Evaluation of Language Models Paper • 2506.07731 • Published Jun 9, 2025 • 2
Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance Paper • 2507.22448 • Published Jul 30, 2025 • 71
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published Mar 19 • 3
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time? Paper • 2603.19017 • Published Mar 19 • 3
A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents Paper • 2602.08964 • Published Feb 9 • 1
BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data Paper • 2510.10159 • Published Oct 11, 2025 • 3
Measuring what Matters: Construct Validity in Large Language Model Benchmarks Paper • 2511.04703 • Published Nov 3, 2025 • 8
MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control Paper • 2407.02736 • Published Jul 3, 2024
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture Paper • 2310.03052 • Published Oct 4, 2023 • 4
From RAG to Agentic RAG for Faithful Islamic Question Answering Paper • 2601.07528 • Published Jan 12 • 4
Prototypicality Bias Reveals Blindspots in Multimodal Evaluation Metrics Paper • 2601.04946 • Published Jan 8
Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem Paper • 2512.03073 • Published Nov 27, 2025 • 7
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models Paper • 2510.13996 • Published Oct 15, 2025 • 9