Sailor-Agents

community

Activity Feed

AI & ML interests

None defined yet.

ufotalent

authored 3 papers 4 months ago

submitted a paper to Daily Papers 4 months ago

Revisiting Parameter Server in LLM Post-Training

Paper • 2601.19362 • Published Jan 27 • 8

dreamerdeo

authored 2 papers 6 months ago

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132

Training Optimal Large Diffusion Language Models

Paper • 2510.03280 • Published Sep 28, 2025

lkevinzc

authored 6 papers 8 months ago

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Paper • 2206.10558 • Published Jun 21, 2022 • 2

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14, 2025 • 13

SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

Paper • 2506.02096 • Published Jun 2, 2025 • 52

Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70

Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69

GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 92

lkevinzc

authored a paper 11 months ago

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30, 2025 • 51

lkevinzc

authored 2 papers about 1 year ago

Reinforcing General Reasoning without Verifiers

Paper • 2505.21493 • Published May 27, 2025 • 27

Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19, 2025 • 36

dreamerdeo

authored 4 papers about 1 year ago

FlowReasoner: Reinforcing Query-Level Meta-Agents

Paper • 2504.15257 • Published Apr 21, 2025 • 47

NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

Paper • 2504.13055 • Published Apr 17, 2025 • 19

SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types

Paper • 2412.11757 • Published Dec 16, 2024

Efficient Process Reward Model Training via Active Learning

Paper • 2504.10559 • Published Apr 14, 2025 • 13

ufotalent

authored a paper about 1 year ago

PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization

Paper • 2503.01328 • Published Mar 3, 2025 • 16

Team members 9
