Aggressive memory cleanup: 5s wait, env vars, optional model loading 3fb1215 aeb56 committed 30 days ago
Fix OOM: Unload model before evaluation to free VRAM for lm_eval 74f609c aeb56 committed 30 days ago
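One plausible shape for this OOM fix: drop every strong reference to the fine-tuned model and force garbage collection before lm_eval loads its own copy. The sketch below uses a lightweight stand-in class (`LoadedModel` is hypothetical) so the release pattern is runnable without a GPU; in the real Space the `del` + `gc.collect()` would be followed by `torch.cuda.empty_cache()` to hand VRAM back to the driver.

```python
import gc
import weakref

class LoadedModel:
    """Stand-in for the fine-tuned model held in memory (hypothetical)."""
    pass

model = LoadedModel()
probe = weakref.ref(model)  # lets us verify the object is really gone

# Drop the only strong reference, then force a collection pass. In the
# actual Space, torch.cuda.empty_cache() would follow so the freed VRAM
# is available to the lm_eval subprocess.
del model
gc.collect()

print(probe() is None)  # True once the model object has been reclaimed
```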
Add Evaluation tab with ARC-Challenge, TruthfulQA, and Winogrande benchmarks 29f5263 aeb56 committed on Nov 10
Fix flash attention error by patching model config to use eager attention 2f60fd7 aeb56 committed on Nov 10
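A hedged sketch of what this patch likely looks like in transformers: request the eager (pure-PyTorch) attention path at load time so the model never tries to import flash-attn kernels. The repo id below is a placeholder, not the actual checkpoint; this is a loading-configuration fragment rather than runnable code.

```python
from transformers import AutoModelForCausalLM

# Placeholder repo id; the Space loads its own merged checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "org/model",
    attn_implementation="eager",  # avoid the flash-attn dependency entirely
    trust_remote_code=True,       # needed for custom architectures like KimiLinear
)
```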
Switch to transformers inference (vLLM doesn't support KimiLinear architecture) 9905f0a aeb56 committed on Nov 10
Improve vLLM startup with tensor parallelism, better logging, and 10min timeout a82de92 aeb56 committed on Nov 10
Use sequential device_map to fix key naming conflicts during LoRA merge d3d4339 aeb56 committed on Nov 10
Add safe_merge and better error handling for LoRA merge with MoE models 79334bc aeb56 committed on Nov 10
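Assuming the merge goes through peft, this commit plausibly amounts to the pattern below: `merge_and_unload(safe_merge=True)` validates the merged weights (e.g. for NaNs) before committing them, and the merge is wrapped so a failure surfaces instead of producing a silently corrupted checkpoint. Repo ids are placeholders; this is a sketch, not the Space's actual code.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder repo ids for the base checkpoint and the LoRA adapter.
base = AutoModelForCausalLM.from_pretrained("org/base-model", trust_remote_code=True)
lora = PeftModel.from_pretrained(base, "org/lora-adapter")

try:
    # safe_merge=True sanity-checks the merged weights before they replace
    # the originals, which matters for MoE checkpoints with many experts.
    merged = lora.merge_and_unload(safe_merge=True)
except RuntimeError as err:
    # Fail loudly rather than ship a broken merge downstream.
    raise RuntimeError(f"LoRA merge failed: {err}") from err
```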
Add 8-bit quantization support and switch to L4x4 hardware for availability e32298d aeb56 committed on Nov 10
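The 8-bit path in transformers is typically enabled with a `BitsAndBytesConfig`; a minimal sketch, assuming that is what this commit wires up (repo id is a placeholder). This is a loading-configuration fragment, so it is shown without execution.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize weights to 8-bit via bitsandbytes so the merged checkpoint
# fits across the 4x NVIDIA L4 GPUs.
bnb = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "org/model",              # placeholder repo id
    quantization_config=bnb,
    device_map="auto",        # shard layers across the available GPUs
    trust_remote_code=True,
)
```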