FLV Collection: dataset and models for the paper "Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification in Language Models" • 5 items • Updated 1 day ago • 1
Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification • Paper • 2601.22642 • Published 4 days ago • 8
Measuring Hong Kong Massive Multi-Task Language Understanding • Paper • 2505.02177 • Published May 4, 2025 • 1
SafeLawBench: Towards Safe Alignment of Large Language Models • Paper • 2506.06636 • Published Jun 7, 2025 • 1
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving • Paper • 2506.17104 • Published Jun 20, 2025 • 2
Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility • Paper • 2601.17027 • Published 17 days ago • 41
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value • Paper • 2512.14051 • Published Dec 16, 2025 • 46
GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models • Paper • 2511.11134 • Published Nov 14, 2025 • 32