Commit History

Monkey-patch transformers to disable flash attention via wrapper script
2900b36

aeb56 committed

Workaround flash-attn: create fake module with PyTorch fallback attention
b705945

aeb56 committed
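The two commits above (the wrapper script and the fake module) rely on the same trick: registering a stub `flash_attn` module in `sys.modules` before transformers imports it, so the availability check passes even though the CUDA kernels are not installed. A minimal sketch of that pattern; the commit's version substituted a plain-PyTorch attention fallback, while this simplified stub just fails loudly if the kernel is ever actually called (`flash_attn_func` is a name from the real flash-attn API, everything else is illustrative):

```python
import sys
import types

# Build a fake "flash_attn" module and register it before transformers
# (or the model's remote code) tries to import the real one.
fake = types.ModuleType("flash_attn")
fake.__version__ = "2.5.0"  # some availability checks read a version string

def _no_flash_attn(*args, **kwargs):
    raise RuntimeError("flash-attn is stubbed out; load the model with "
                       "attn_implementation='eager' instead")

fake.flash_attn_func = _no_flash_attn  # name taken from flash-attn's API
sys.modules["flash_attn"] = fake       # must run before any importer sees it

import flash_attn  # now resolves to the stub
```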

Add flash-attn dependency required by Kimi model
ef25cbe

aeb56 committed

Fix: Remove height parameter for Gradio 4.19.2 compatibility
0cefed5

aeb56 committed

Add live status table and improved logging with attn_implementation=eager fix
0b25a32

aeb56 committed

Trigger rebuild
d7f07c2

aeb56 committed

Fix multi-GPU: use parallelize=True instead of device_map, update env var
96b6724

aeb56 committed

Aggressive memory cleanup: 5s wait, env vars, optional model loading
3fb1215

aeb56 committed

Fix OOM: Unload model before evaluation to free VRAM for lm_eval
74f609c

aeb56 committed
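The OOM fix above amounts to dropping every Python reference to the model before lm_eval allocates its own copy, then asking CUDA to release its cached memory. A hedged sketch: the `state` dict is a stand-in for however the app holds the loaded objects, and the `torch` calls are guarded so the snippet also runs on CPU-only machines:

```python
import gc

def unload_model(state):
    # Drop all references so Python can actually free the weights;
    # 'state' is a hypothetical dict holding the loaded objects.
    state.pop("model", None)
    state.pop("tokenizer", None)
    gc.collect()  # collect cycles before touching the CUDA allocator
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # hand cached VRAM back to the driver
            torch.cuda.synchronize()
    except ImportError:
        pass  # CPU-only environment: nothing to release

state = {"model": object(), "tokenizer": object()}
unload_model(state)
```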

Disable chat/inference, focus on evaluation only
69cd0c5

aeb56 committed

Update README.md
3e60f36
verified

optiviseapp committed

Add Evaluation tab with ARC-Challenge, TruthfulQA, and Winogrande benchmarks
29f5263

aeb56 committed

Fix flash attention error by patching model config to use eager attention
2f60fd7

aeb56 committed

Fix flash attention error by using eager attention implementation
74fe23d

aeb56 committed
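Both eager-attention commits amount to the same change: making transformers load the checkpoint through the standard attention path instead of flash-attn. The documented knob is the `attn_implementation="eager"` argument to `from_pretrained`; patching the on-disk config, as the first of the two commits does, can be sketched like this (the `_attn_implementation` key name is an assumption about transformers' internals, and the temp directory stands in for the downloaded Kimi checkpoint so the snippet is self-contained):

```python
import json
import pathlib
import tempfile

# Stand-in checkpoint directory; in the Space this would be the
# downloaded Kimi checkpoint.
ckpt = pathlib.Path(tempfile.mkdtemp())
(ckpt / "config.json").write_text(json.dumps({"model_type": "kimi_linear"}))

# Rewrite the config so the model loads with eager attention.  The key
# name is an assumption; the supported route is
# AutoModelForCausalLM.from_pretrained(..., attn_implementation="eager").
cfg = json.loads((ckpt / "config.json").read_text())
cfg["_attn_implementation"] = "eager"
(ckpt / "config.json").write_text(json.dumps(cfg, indent=2))
```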

Add all missing Kimi model dependencies (einops, triton, flash-linear-attention)
3f7365e

aeb56 committed

Add tiktoken dependency for Kimi model tokenizer
5015bb9

aeb56 committed

Switch to transformers inference (vLLM doesn't support KimiLinear architecture)
9905f0a

aeb56 committed

Improve vLLM startup with tensor parallelism, better logging, and 10min timeout
a82de92

aeb56 committed

Fix vLLM server start command to use python3 instead of python
75c2813

aeb56 committed

Remove emoji avatars incompatible with Gradio 4.19.2
5f01a47

aeb56 committed

Fix Gradio version compatibility and enable share mode
d073f8b

aeb56 committed

Switch to vLLM for high-performance, stable inference
310eb95

aeb56 committed

Fix variable scope error causing Internal Server Error
e62c736

aeb56 committed

Transform Space into professional inference UI for fine-tuned model
5e458c4

aeb56 committed

Implement manual LoRA merging to fix PEFT key naming conflicts
3a259bc

aeb56 committed
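The manual merge referenced above is, per weight matrix, just W' = W + (alpha / r) · B · A, applied after renaming PEFT's prefixed keys (`base_model.model.…`) to match the base checkpoint. A toy, pure-Python sketch of the arithmetic; real code would do this tensor-by-tensor with torch, and all shapes and values here are illustrative:

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a base weight: W' = W + (alpha/r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)          # (out, r) @ (r, in) -> (out, in)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter (r=1), alpha=2 -> scale 2.0
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]                # (out, r)
A = [[0.5, 0.5]]                  # (r, in)
merged = merge_lora(W, A, B, alpha=2, r=1)
# merged == [[2.0, 1.0], [0.0, 1.0]]
```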

Use sequential device_map to fix key naming conflicts during LoRA merge
d3d4339

aeb56 committed

Add safe_merge and better error handling for LoRA merge with MoE models
79334bc

aeb56 committed

Fix 8-bit quantization CPU offload for large models
1a04e17

aeb56 committed

Fix UID 1000 user permission issue for Hugging Face Spaces
0dc6b10

aeb56 committed

Fix cache directory permissions issue
2352d4c

aeb56 committed
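Hugging Face Spaces run their containers as UID 1000, so any cache directory not writable by that user fails with permission errors. The two permission commits above correspond to the commonly documented fix: point the Hugging Face caches at a writable location. A sketch, with assumed paths:

```shell
# Point Hugging Face caches at a user-writable location (paths assumed).
export HF_HOME=/tmp/hf_home
export TRANSFORMERS_CACHE=/tmp/hf_home/transformers
mkdir -p "$TRANSFORMERS_CACHE"
```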

Add 8-bit quantization support and switch to L4x4 hardware for availability
e32298d

aeb56 committed

Add L40Sx4 hardware requirement and optimize multi-GPU support
1443f5f

aeb56 committed

Optimize app.py for 48B model on 4xL40S GPUs with multi-GPU support
b51ac87

aeb56 committed

Update requirements.txt with comprehensive dependencies for Kimi model
9bb160e

aeb56 committed

Initial commit: LoRA model merger
a951334

aeb56 committed

initial commit
7a80ad4
verified

optiviseapp committed