Daniel Kloimwieder
dkkloimwieder
AI & ML interests
None yet
Organizations
None yet
Mdl
- S*: Test Time Scaling for Code Generation
  Paper • 2502.14382 • Published • 63
- S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  Paper • 2502.12853 • Published • 29
- rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
  Paper • 2501.04519 • Published • 288
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
  Paper • 2502.02508 • Published • 22
Paper
- THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
  Paper • 2509.13761 • Published • 16
- Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
  Paper • 2509.25849 • Published • 48
- Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
  Paper • 2510.03561 • Published • 25
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 506
models
0
None public yet
datasets
0
None public yet