aishiknagar's Collections: RL and Agents
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
• 2505.14146
• Published • 20
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications
of Agentic AI
Paper
• 2505.19443
• Published • 15
ARM: Adaptive Reasoning Model
Paper
• 2505.20258
• Published • 45
Enigmata: Scaling Logical Reasoning in Large Language Models with
Synthetic Verifiable Puzzles
Paper
• 2505.19914
• Published • 46
The Entropy Mechanism of Reinforcement Learning for Reasoning Language
Models
Paper
• 2505.22617
• Published • 132
Active-O3: Empowering Multimodal Large Language Models with Active
Perception via GRPO
Paper
• 2505.21457
• Published • 16
DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural
Language and Reinforcement Learning
Paper
• 2505.23754
• Published • 15
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in
Large Language Models
Paper
• 2505.24864
• Published • 146
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective
Reinforcement Learning for LLM Reasoning
Paper
• 2506.01939
• Published • 190
Resa: Transparent Reasoning Models via SAEs
Paper
• 2506.09967
• Published • 22
Reasoning with Exploration: An Entropy Perspective
Paper
• 2506.14758
• Published • 30
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning
Attention
Paper
• 2506.13585
• Published • 274
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain
Perspective
Paper
• 2506.14965
• Published • 50
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning
in LLMs
Paper
• 2506.15211
• Published • 39
Reasoning or Memorization? Unreliable Results of Reinforcement Learning
Due to Data Contamination
Paper
• 2507.10532
• Published • 90
REST: Stress Testing Large Reasoning Models by Asking Multiple Problems
at Once
Paper
• 2507.10541
• Published • 30
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality,
Long Context, and Next Generation Agentic Capabilities
Paper
• 2507.06261
• Published • 67
LLMalMorph: On The Feasibility of Generating Variant Malware using
Large-Language-Models
Paper
• 2507.09411
• Published • 4
The Imitation Game: Turing Machine Imitator is Length Generalizable
Reasoner
Paper
• 2507.13332
• Published • 49