MIRA Collection Group-specific quality scorers from MIRA for mid-training data selection. • 12 items • Updated about 1 month ago • 2
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 12 days ago • 207
MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection Paper • 2605.30288 • Published 30 days ago • 23
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published May 20 • 207
ClawGym: A Scalable Framework for Building Effective Claw Agents Paper • 2604.26904 • Published Apr 29 • 54
Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing Paper • 2512.23611 • Published Dec 29, 2025 • 7
Context as a Tool: Context Management for Long-Horizon SWE-Agents Paper • 2512.22087 • Published Dec 26, 2025 • 4
Scaling Laws for Code: Every Programming Language Matters Paper • 2512.13472 • Published Dec 15, 2025 • 17
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Paper • 2604.19572 • Published Apr 21 • 23
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published Apr 3 • 239
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published Mar 17 • 312
McEval Collection McEval: Massively Multilingual Code Evaluation • 2 items • Updated Nov 11, 2024 • 1
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published Sep 24, 2024 • 41
TableBench Collection TableBench: A Comprehensive and Complex Benchmark for Table Question Answering • 7 items • Updated Mar 2 • 3
FuzzCoder: Byte-level Fuzzing Test via Large Language Model Paper • 2409.01944 • Published Sep 3, 2024 • 45