LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening Paper • 2605.19597 • Published May 19
DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training Paper • 2602.05890 • Published Feb 5
What Makes a Good Speech Tokenizer for LLM-Centric Speech Generation? A Systematic Study Paper • 2506.12537 • Published Jun 14, 2025
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training Paper • 2502.04066 • Published Feb 6, 2025
TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities Paper • 2407.21693 • Published Jul 31, 2024