Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator • Dec 17, 2025 • 46
Agent READMEs: An Empirical Study of Context Files for Agentic Coding Paper • 2511.12884 • Published Nov 17, 2025 • 21
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models Paper • 2508.06471 • Published Aug 8, 2025 • 205
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings Paper • 2508.00632 • Published Aug 1, 2025 • 4
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm Paper • 2507.18553 • Published Jul 24, 2025 • 41
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement Paper • 2507.18742 • Published Jul 24, 2025 • 6
Article Automated Discovery of High-Performance GPU Kernels with OpenEvolve • Jun 27, 2025 • 25
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published Jul 8, 2025 • 45
Configurable Preference Tuning Collection CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17, 2025 • 1
Configurable Preference Tuning with Rubric-Guided Synthetic Data Paper • 2506.11702 • Published Jun 13, 2025 • 1
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Paper • 2506.06607 • Published Jun 7, 2025 • 2