Monkey-patch transformers to disable flash attention via wrapper script 2900b36 aeb56 committed on Nov 12, 2025
Workaround flash-attn: create fake module with PyTorch fallback attention b705945 aeb56 committed on Nov 12, 2025
Add live status table and improved logging with attn_implementation=eager fix 0b25a32 aeb56 committed on Nov 12, 2025
Fix multi-GPU: use parallelize=True instead of device_map, update env var 96b6724 aeb56 committed on Nov 12, 2025
Aggressive memory cleanup: 5s wait, env vars, optional model loading 3fb1215 aeb56 committed on Nov 12, 2025
Fix OOM: Unload model before evaluation to free VRAM for lm_eval 74f609c aeb56 committed on Nov 12, 2025
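The OOM fix above frees the inference model's VRAM before lm_eval loads its own copy. A minimal sketch of that cleanup, with the commit's 5-second settle wait exposed as a parameter (`unload_model` is a hypothetical name, not from the Space's code):

```python
import gc
import time

import torch


def unload_model(model, settle_seconds=5.0):
    """Drop a loaded model and return its VRAM before starting evaluation.

    Note: only the reference passed in is deleted here; the caller must
    also drop (or reassign) its own reference for memory to be freed.
    """
    model.to("cpu")               # move weights off the GPUs first
    del model                     # drop this function's reference
    gc.collect()                  # collect cycles still pinning tensors
    if torch.cuda.is_available():
        torch.cuda.synchronize()
        torch.cuda.empty_cache()  # release cached blocks back to the driver
    time.sleep(settle_seconds)    # the commit waits 5s before reloading
```

Typical use: `unload_model(model); model = None` right before invoking lm_eval.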
Add Evaluation tab with ARC-Challenge, TruthfulQA, and Winogrande benchmarks 29f5263 aeb56 committed on Nov 10, 2025
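A sketch of what the Evaluation tab's backend could look like with lm-evaluation-harness. The task names (`arc_challenge`, `truthfulqa_mc2`, `winogrande`) are as registered in recent harness versions, and `parallelize=True` matches the later multi-GPU fix; `run_benchmarks` is a hypothetical helper, not the Space's actual function:

```python
def run_benchmarks(model_id, tasks=("arc_challenge", "truthfulqa_mc2", "winogrande")):
    """Run the three Evaluation-tab benchmarks via lm-evaluation-harness."""
    import lm_eval  # imported lazily: heavy dependency

    return lm_eval.simple_evaluate(
        model="hf",
        # parallelize=True shards the model across all visible GPUs
        model_args=f"pretrained={model_id},dtype=bfloat16,parallelize=True",
        tasks=list(tasks),
        batch_size="auto",
    )
```

The returned dict carries per-task metrics under `results["results"]`, which the live status table can poll and render.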
Fix flash attention error by patching model config to use eager attention 2f60fd7 aeb56 committed on Nov 10, 2025
Fix flash attention error by using eager attention implementation 74fe23d aeb56 committed on Nov 10, 2025
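These two commits apply the same fix at different levels: forcing the pure-PyTorch ("eager") attention path so flash-attn is never needed. A hedged sketch of the load-time variant (`load_eager` is a hypothetical helper name); the config-patching variant instead sets `model.config._attn_implementation = "eager"` after loading, which relies on a private transformers attribute:

```python
def load_eager(model_id):
    """Load a model with the eager attention path (no flash-attn import)."""
    from transformers import AutoModelForCausalLM  # lazy: heavy dependency

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation="eager",  # instead of "flash_attention_2"
        trust_remote_code=True,       # KimiLinear ships custom modeling code
        torch_dtype="auto",
    )
```

Eager attention is slower than flash attention but removes the CUDA-extension dependency entirely.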
Add all missing Kimi model dependencies (einops, triton, flash-linear-attention) 3f7365e aeb56 committed on Nov 10, 2025
Switch to transformers inference (vLLM doesn't support KimiLinear architecture) 9905f0a aeb56 committed on Nov 10, 2025
Improve vLLM startup with tensor parallelism, better logging, and 10min timeout a82de92 aeb56 committed on Nov 10, 2025
Fix vLLM server start command to use python3 instead of python 75c2813 aeb56 committed on Nov 10, 2025
Transform Space into professional inference UI for fine-tuned model 5e458c4 aeb56 committed on Nov 10, 2025
Implement manual LoRA merging to fix PEFT key naming conflicts 3a259bc aeb56 committed on Nov 10, 2025
Use sequential device_map to fix key naming conflicts during LoRA merge d3d4339 aeb56 committed on Nov 10, 2025
Add safe_merge and better error handling for LoRA merge with MoE models 79334bc aeb56 committed on Nov 10, 2025
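The three LoRA-merge commits combine a sequential device map with PEFT's `safe_merge` option. A minimal sketch of that flow under those assumptions (`merge_lora` is a hypothetical helper; the "manual merging" commit ultimately replaced parts of this with hand-rolled key handling):

```python
def merge_lora(base_id, adapter_id, out_dir):
    """Merge a LoRA adapter into its base model and save full weights."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="sequential",  # fill GPUs in order; avoids the key-name clashes seen with "auto"
        trust_remote_code=True,
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    # safe_merge=True validates the merged weights (e.g. for NaNs) before unloading
    merged = model.merge_and_unload(safe_merge=True)
    merged.save_pretrained(out_dir)
    return merged
```

Merging ahead of time lets the inference UI load a single full-weight checkpoint instead of stacking an adapter at startup.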
Add 8-bit quantization support and switch to L4x4 hardware for availability e32298d aeb56 committed on Nov 10, 2025
Add L40Sx4 hardware requirement and optimize multi-GPU support 1443f5f aeb56 committed on Nov 10, 2025
Optimize app.py for 48B model on 4xL40S GPUs with multi-GPU support b51ac87 aeb56 committed on Nov 10, 2025
Update requirements.txt with comprehensive dependencies for Kimi model 9bb160e aeb56 committed on Nov 10, 2025
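8-bit loading roughly halves the footprint of the 48B model so it can shard across 4x L4 (24 GB each) when L40S capacity is unavailable. A sketch using bitsandbytes via transformers (`load_8bit` is a hypothetical helper name):

```python
def load_8bit(model_id):
    """Load the model in 8-bit, sharded across all visible GPUs."""
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",       # let accelerate place layers across GPUs
        trust_remote_code=True,
    )
```

This requires `bitsandbytes` and `accelerate` in requirements.txt alongside the model's own dependencies.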