dfuhoiysOHSVFh82934gfjklb

huba-buba

227 448

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 8 days ago

World Action Models: A Survey

Paper • 2606.20781 • Published 14 days ago • 56

upvoted 2 papers 27 days ago

Trust Region On-Policy Distillation

Paper • 2606.01249 • Published May 31 • 45

Self-Distilled Policy Gradient

Paper • 2606.04036 • Published 30 days ago • 27

upvoted a paper 30 days ago

Trust-Region Behavior Blending for On-Policy Distillation

Paper • 2605.31159 • Published May 29 • 68

upvoted a paper 3 months ago

The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook

Paper • 2604.02029 • Published Apr 2 • 152

upvoted 3 papers 4 months ago

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 158

Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem

Paper • 2512.24873 • Published Dec 31, 2025 • 109

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 221

upvoted an article 4 months ago

Transformers v5: Simple model definitions powering the AI ecosystem

Article • lysandre, ArthurZ, cyrilvallez, reach-vb • Published Dec 1, 2025 • 312

upvoted 5 papers 4 months ago

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62

Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs

Paper • 2602.10388 • Published Feb 11 • 245

upvoted an article 5 months ago

Forge: Scalable Agent RL Framework and Algorithm

Article • MiniMax-AI • Published Feb 13 • 156

upvoted 5 papers 5 months ago

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

Paper • 2602.08222 • Published Feb 9 • 290

AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

Paper • 2602.06855 • Published Feb 6 • 83

QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining

Paper • 2602.07085 • Published Feb 6 • 191

F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare

Paper • 2602.06717 • Published Feb 6 • 75

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Paper • 2602.04634 • Published Feb 4 • 100