GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving Paper • 2510.11769 • Published Oct 13, 2025 • 26
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published May 5, 2025 • 25
Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published Feb 26, 2025 • 82
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization Paper • 2410.04717 • Published Oct 7, 2024 • 18
Instruction Diversity Drives Generalization To Unseen Tasks Paper • 2402.10891 • Published Feb 16, 2024
PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis Paper • 2309.05833 • Published Sep 11, 2023
PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models Paper • 2406.06887 • Published Jun 11, 2024 • 2
Accelerated Convergence of Stochastic Heavy Ball Method under Anisotropic Gradient Noise Paper • 2312.14567 • Published Dec 22, 2023 • 1
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models Paper • 2306.12420 • Published Jun 21, 2023 • 2
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment Paper • 2304.06767 • Published Apr 13, 2023 • 2
AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets Paper • 2401.01916 • Published Jan 3, 2024 • 1
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning Paper • 2403.17919 • Published Mar 26, 2024 • 16
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion Paper • 2401.12947 • Published Jan 23, 2024 • 4