Qwen/Qwen3-Coder-480B-A35B-Instruct Text Generation • 480B • Updated Aug 21, 2025 • 27.4k • 1.29k
deepseek-ai/DeepSeek-Prover-V2-671B Text Generation • 685B • Updated Apr 30, 2025 • 320 • 817
Alibaba-NLP/gte-Qwen2-7B-instruct Sentence Similarity • 8B • Updated Mar 24, 2025 • 56.8k • 477
The Ultra-Scale Playbook 🌌 Running • 3.67k • The ultimate guide to training LLM on large GPU Clusters