RichardForests's Collections: Transformers & MoE
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Paper
• 2312.07987
• Published • 41
Interfacing Foundation Models' Embeddings
Paper
• 2312.07532
• Published • 11
Point Transformer V3: Simpler, Faster, Stronger
Paper
• 2312.10035
• Published • 23
TheBloke/quantum-v0.01-GPTQ
Text Generation
• 7B • Updated • 7
• 2
Text Generation
• 36B • Updated • 6
• 1
mobiuslabsgmbh/Mixtral-8x7B-Instruct-v0.1-hf-attn-4bit-moe-2bit-HQQ
Text Generation
• Updated • 39
• 38
Denoising Vision Transformers
Paper
• 2401.02957
• Published • 31
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper
• 2401.06066
• Published • 59
Buffer Overflow in Mixture of Experts
Paper
• 2402.05526
• Published • 9
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Paper
• 2405.08707
• Published • 34