Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 21
yujunzhou/SFT_Advanced_Risk_Self_Grading_Qwen3-4B-Base Text Generation • 4B • Updated Dec 17, 2025 • 5
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B Text Generation • 4B • Updated Dec 17, 2025 • 1
yujunzhou/SFT_Advanced_Risk_Reward_Tampering_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025 • 2
yujunzhou/SFT_Advanced_Risk_Situation_Aware_Qwen3-4B-Base Text Generation • 4B • Updated Dec 16, 2025