GRPO Reddit Posts Summarization (LFM & Qwen)
Collection: GRPO reward-signal ablation on LFM2.5-350M and Qwen2.5-0.5B for 50-word Reddit post summarization, trained on Apple Silicon via smolcluster (MLX).
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-rouge • Summarization • 0.5B • 135 downloads
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-meteor-rouge • Summarization • 0.5B • 122 downloads
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-meteor-bleu • Summarization • 0.5B • 100 downloads
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-meteor • Summarization • 0.5B • 105 downloads
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-bleu-rouge • Summarization • 0.5B • 113 downloads
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-quality-bleu • Summarization • 0.5B • 137 downloads
YuvrajSingh9886/Qwen2.5-0.5B-grpo-summarization-length-only • Summarization • 0.5B • 809 downloads
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-rouge • Summarization • 0.4B • 201 downloads
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-meteor-rouge • Summarization • 0.4B • 136 downloads
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-meteor-bleu • Summarization • 0.4B • 127 downloads
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-meteor • Summarization • 0.4B • 122 downloads
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-bleu-rouge • Summarization • 0.4B • 114 downloads
YuvrajSingh9886/LFM2.5-350M-grpo-summarization-length-quality-bleu • Summarization • 0.4B • 102 downloads
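The model-name suffixes (length-only, length-quality-rouge, length-quality-meteor-bleu, ...) suggest which reward terms each run combined. As a rough illustration of what such a reward-signal ablation involves, here is a minimal Python sketch, assuming a reward built from a length term targeting 50 words plus an optional ROUGE-1 term, followed by GRPO's group-relative advantage (each sampled completion's reward normalized by the mean and standard deviation of its group). The length-penalty shape, the 0.5 weight, and all function names are hypothetical stand-ins, not the collection author's implementation; METEOR, BLEU, and whatever the "quality" suffix denotes are not modeled.

```python
from collections import Counter
import statistics

TARGET_WORDS = 50  # 50-word target taken from the collection description


def length_reward(summary: str, target: int = TARGET_WORDS) -> float:
    """Hypothetical shaping: 1.0 at exactly `target` words, falling linearly
    to 0.0 at 0 words or at 2 * target words (and clipped at 0.0 beyond)."""
    n = len(summary.split())
    return max(0.0, 1.0 - abs(n - target) / target)


def rouge1_f1(summary: str, reference: str) -> float:
    """ROUGE-1 F1 from clipped unigram overlap (lowercased whitespace tokens)."""
    cand = Counter(summary.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # min count per shared token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def combined_reward(summary: str, reference: str,
                    use_rouge: bool = True, w_len: float = 0.5) -> float:
    """Hypothetical ablation switch: length-only when use_rouge=False,
    otherwise a weighted sum of the length and ROUGE-1 terms."""
    if not use_rouge:
        return length_reward(summary)
    return w_len * length_reward(summary) + (1 - w_len) * rouge1_f1(summary, reference)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: (r - group mean) / (group std + eps).
    Uses the population std; some implementations use the sample std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = "the poster asks whether to confront a roommate about unpaid rent"
    # A toy "group" of completions sampled for the same Reddit post.
    group = [
        "poster asks whether to confront roommate about unpaid rent",
        " ".join(["filler"] * 50),
        "roommate owes rent",
    ]
    for use_rouge in (False, True):
        rewards = [combined_reward(s, reference, use_rouge=use_rouge) for s in group]
        print(use_rouge, [round(a, 3) for a in group_relative_advantages(rewards)])
```

Toggling `use_rouge` changes which completion in the group gets the largest advantage, which is the kind of effect a reward-signal ablation compares across runs.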