Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 417
DFlash Collection Block Diffusion for Flash Speculative Decoding • 23 items • Updated 7 days ago • 142
Qwen-3.5-unsloth-mlx Collection AWQ-style pre-scaling using Unsloth's imatrix calibration data, then 3-6-bit affine quantization with the Unsloth mixed-precision recipe via MLX • 20 items • Updated Mar 29 • 20
Article WWDC 24: Running Mistral 7B with Core ML +2 pcuenq, FL33TW00D-HF, reach-vb, osanseviero • Jul 22, 2024 • 65
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 7 items • Updated Mar 7, 2024 • 3
Qwen1.5 Collection Qwen1.5 is the improved version of Qwen, the large language model series developed by Alibaba Cloud. • 53 items • Updated Mar 2 • 214