GRPO Reddit Posts Summarization (LFM & Qwen) Collection: GRPO reward-signal ablation on LFM-2.5-350M & Qwen2.5-0.5B for 50-word Reddit summarization. Trained on Apple Silicon via smolcluster (MLX). • 4 items • Updated 10 days ago
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-rouge Summarization • 0.5B • Updated 10 days ago • 157
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-meteor-rouge Summarization • 0.5B • Updated 10 days ago • 145
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-meteor-bleu Summarization • 0.5B • Updated 10 days ago • 123
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-meteor Summarization • 0.5B • Updated 10 days ago • 129
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-bleu-rouge Summarization • 0.5B • Updated 10 days ago • 131
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-bleu Summarization • 0.5B • Updated 10 days ago • 192
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-only Summarization • 0.5B • Updated 10 days ago • 904
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-rouge Summarization • 0.4B • Updated 10 days ago • 241
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-meteor-rouge Summarization • 0.4B • Updated 10 days ago • 158
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-meteor-bleu Summarization • 0.4B • Updated 10 days ago • 150
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-meteor Summarization • 0.4B • Updated 10 days ago • 143
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-bleu-rouge Summarization • 0.4B • Updated 10 days ago • 139
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-bleu Summarization • 0.4B • Updated 10 days ago • 123
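The model names above suggest the ablation varies which reward terms are active: a length term alone ("length-only") or a length term plus one or more quality metrics (ROUGE, METEOR, BLEU). The listing does not say how those rewards are computed, so the following is only a minimal sketch of how such a combined reward might look. Every function name, the linear length penalty, the 0.5 weighting, and the use of a plain unigram-overlap F1 as a stand-in for a ROUGE-1-style score are assumptions for illustration, not the author's implementation.

```python
from collections import Counter

TARGET_WORDS = 50  # target summary length, taken from the collection description


def length_reward(summary: str, target: int = TARGET_WORDS) -> float:
    """Hypothetical length reward: 1.0 at exactly `target` words, decaying linearly to 0.0."""
    n_words = len(summary.split())
    return max(0.0, 1.0 - abs(n_words - target) / target)


def unigram_f1(summary: str, reference: str) -> float:
    """Unigram-overlap F1, a simplified stand-in for a ROUGE-1-style quality score."""
    cand = Counter(summary.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped count of shared tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def combined_reward(summary: str, reference: str,
                    use_quality: bool = True,
                    quality_weight: float = 0.5) -> float:
    """Assumed ablation switch: length only, or a weighted mix of length and quality."""
    r_len = length_reward(summary)
    if not use_quality:
        return r_len
    r_quality = unigram_f1(summary, reference)
    return (1.0 - quality_weight) * r_len + quality_weight * r_quality
```

In a GRPO setup, a scalar like this would typically be computed for each sampled completion in a group and then normalized within the group to form advantages. Swapping `unigram_f1` for a real ROUGE, METEOR, or BLEU scorer would correspond to the different "length-quality-*" variants in the listing, under the assumptions stated above.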