Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Organizations

None yet

liked a dataset 15 days ago

open-thoughts/OpenThoughts-Agent-v1-SFT

Viewer • Updated Jan 27 • 15.2k • 2.79k • 96

upvoted 3 papers about 1 month ago

TradingAgents: Multi-Agents LLM Financial Trading Framework

Paper • 2412.20138 • Published Dec 28, 2024 • 101

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Paper • 2605.21468 • Published May 20 • 51

MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Paper • 2605.14212 • Published May 14 • 18

liked a dataset about 1 month ago

agentica-org/DeepCoder-Preview-Dataset

Viewer • Updated Apr 9, 2025 • 25k • 1.98k • 106

liked a model about 1 month ago

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

Text Generation • 33B • Updated Feb 24, 2025 • 743k • 1.57k

updated a model about 1 month ago

stillarrow/qwen2.5-coder-1.5b-instruct__scpo_no_std_code_hidden_only_shortcut_guard

Text Generation • 2B • Updated May 9 • 84 • 1

upvoted a paper about 2 months ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 233

updated a model about 2 months ago

stillarrow/qwen2.5-coder-1.5b-instruct__grpo_no_std_code_hidden_only_shortcut_guard

Updated May 6

published 2 models about 2 months ago

stillarrow/qwen2.5-coder-1.5b-instruct__scpo_no_std_code_hidden_only_shortcut_guard

Text Generation • 2B • Updated May 9 • 84 • 1

stillarrow/qwen2.5-coder-1.5b-instruct__jspo_no_std_code_hidden_only_shortcut_guard

Updated May 7

updated a model about 2 months ago

stillarrow/qwen2.5-coder-1.5b-instruct__jspo_no_std_code_hidden_only_shortcut_guard

Updated May 7

published a model about 2 months ago

stillarrow/qwen2.5-coder-1.5b-instruct__grpo_no_std_code_hidden_only_shortcut_guard

Updated May 6

updated a model about 2 months ago

stillarrow/qwen2.5-math-7b__math_subject_proportional_cluster-246fecfa-et_mix_lambda_no_drift_off_ratio_100

Updated May 6

published a model about 2 months ago

stillarrow/qwen2.5-math-7b__math_subject_proportional_cluster-246fecfa-et_mix_lambda_no_drift_off_ratio_100

Updated May 6

updated a model about 2 months ago

stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-0939fc56-policy_lambda_no_drift_off_ratio_100

Updated May 6

published a model about 2 months ago

stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-0939fc56-policy_lambda_no_drift_off_ratio_100

Updated May 6

upvoted a paper about 2 months ago

Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

Paper • 2602.10090 • Published Feb 10 • 53

updated 2 models about 2 months ago

stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-6bc47709-et_mix_lambda_no_drift_off_ratio_100

Updated May 5

stillarrow/qwen2.5-math-7b__skill_accuracy_binning_max_entrop-aabaf976-policy_lambda_no_drift_off_ratio_100

Updated May 5
