ldp72/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28 Text Generation • 8B • Updated 5 days ago • 19
The Ultra-Scale Playbook 🌌 • Running • 3.9k • The ultimate guide to training LLMs on large GPU clusters
DivMerge: A divergence-based model merging method for multi-tasking Paper • 2509.02108 • Published Sep 2, 2025 • 26