Switch to transformers inference (vLLM doesn't support KimiLinear architecture) 9905f0a aeb56 committed on Nov 10
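A minimal sketch of what plain transformers inference for this commit could look like. The model id, dtype, and prompt are illustrative assumptions; the commit only records that inference moved from vLLM to transformers because vLLM lacks KimiLinear support.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Kimi-Linear-48B-A3B-Instruct"  # assumption: actual model id not in the commit

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",        # spread layers across the available GPUs
    trust_remote_code=True,   # KimiLinear relies on custom modeling code
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```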
Improve vLLM startup with tensor parallelism, better logging, and 10min timeout a82de92 aeb56 committed on Nov 10
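A hedged sketch of the startup pattern this commit describes: launch the vLLM OpenAI-compatible server with tensor parallelism, let its logs stream to the console, and poll the /health endpoint for up to 10 minutes. The model id, port, and GPU count are assumptions, not taken from the commit.

```python
import subprocess
import sys
import time
import urllib.request

cmd = [
    sys.executable, "-m", "vllm.entrypoints.openai.api_server",
    "--model", "your-org/your-merged-model",  # assumption: real model id not in the commit
    "--tensor-parallel-size", "4",            # shard the model across 4 GPUs
    "--port", "8000",
]
# Inherit stdout/stderr so vLLM's own logs stream straight into the Space logs.
proc = subprocess.Popen(cmd)

deadline = time.time() + 600  # 10-minute startup timeout
while time.time() < deadline:
    if proc.poll() is not None:
        raise RuntimeError(f"vLLM exited early with code {proc.returncode}")
    try:
        urllib.request.urlopen("http://localhost:8000/health", timeout=2)
        print("vLLM server is up")
        break
    except OSError:
        time.sleep(5)  # not ready yet; keep polling
else:
    proc.terminate()
    raise TimeoutError("vLLM did not become healthy within 10 minutes")
```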
Use sequential device_map to fix key naming conflicts during LoRA merge d3d4339 aeb56 committed on Nov 10
Add safe_merge and better error handling for LoRA merge with MoE models 79334bc aeb56 committed on Nov 10
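A sketch combining the two merge fixes above: loading the base model with device_map="sequential" fills GPUs in order rather than balancing layers, which keeps module key names stable during the merge, and merge_and_unload(safe_merge=True) has peft validate the merged weights for NaNs before committing them, useful when some MoE expert weights are degenerate. Model and adapter paths are illustrative assumptions.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "base-model-id",            # assumption: real id not in the commit log
    torch_dtype=torch.bfloat16,
    device_map="sequential",    # fill GPU 0, then GPU 1, ... avoids key-name conflicts
    trust_remote_code=True,
)
peft_model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # assumed adapter path

try:
    # safe_merge=True makes peft raise ValueError if merged weights contain NaNs
    merged = peft_model.merge_and_unload(safe_merge=True)
except ValueError as err:
    print(f"LoRA merge failed weight validation: {err}")
    raise
merged.save_pretrained("merged-model")
```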
Add 8-bit quantization support and switch to L4x4 hardware for availability e32298d aeb56 committed on Nov 10
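A minimal sketch of the 8-bit loading path, assuming the standard transformers + bitsandbytes route: quantizing to 8-bit shrinks the weights enough to shard across the four 24 GB L4 GPUs on the L4x4 tier. The model id is an assumption.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_8bit=True)  # 8-bit weights via bitsandbytes

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-merged-model",     # assumption: real id not in the commit
    quantization_config=quant_config,
    device_map="auto",                # shard the quantized model across all four L4s
    trust_remote_code=True,
)
```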