Commit History

Monkey-patch transformers to disable flash attention via wrapper script
2900b36

aeb56 committed

Workaround flash-attn: create fake module with PyTorch fallback attention
b705945

aeb56 committed
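The two commits above (the wrapper script and the fake module) rely on the same trick: registering a stub `flash_attn` module in `sys.modules` before transformers imports it, so the availability check passes even though the CUDA kernels are not installed. A minimal sketch of that pattern; the commit's version substituted a plain-PyTorch attention fallback, while this simplified stub just fails loudly if the kernel is ever actually called (`flash_attn_func` is a name from the real flash-attn API, everything else is illustrative):

```python
import sys
import types

# Build a fake "flash_attn" module and register it before transformers
# (or the model's remote code) tries to import the real one.
fake = types.ModuleType("flash_attn")
fake.__version__ = "2.5.0"  # some availability checks read a version string

def _no_flash_attn(*args, **kwargs):
    raise RuntimeError("flash-attn is stubbed out; load the model with "
                       "attn_implementation='eager' instead")

fake.flash_attn_func = _no_flash_attn  # name taken from flash-attn's API
sys.modules["flash_attn"] = fake       # must run before any importer sees it

import flash_attn  # now resolves to the stub
```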

Add flash-attn dependency required by Kimi model
ef25cbe

aeb56 committed

Fix: Remove height parameter for Gradio 4.19.2 compatibility
0cefed5

aeb56 committed

Add live status table and improved logging with attn_implementation=eager fix
0b25a32

aeb56 committed

Trigger rebuild
d7f07c2

aeb56 committed

Fix multi-GPU: use parallelize=True instead of device_map, update env var
96b6724

aeb56 committed

Aggressive memory cleanup: 5s wait, env vars, optional model loading
3fb1215

aeb56 committed

Fix OOM: Unload model before evaluation to free VRAM for lm_eval
74f609c

aeb56 committed
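The OOM fix above amounts to dropping every Python reference to the model before lm_eval allocates its own copy, then asking CUDA to release its cached memory. A hedged sketch: the `state` dict is a stand-in for however the app holds the loaded objects, and the `torch` calls are guarded so the snippet also runs on CPU-only machines:

```python
import gc

def unload_model(state):
    # Drop all references so Python can actually free the weights;
    # 'state' is a hypothetical dict holding the loaded objects.
    state.pop("model", None)
    state.pop("tokenizer", None)
    gc.collect()  # collect cycles before touching the CUDA allocator
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # hand cached VRAM back to the driver
            torch.cuda.synchronize()
    except ImportError:
        pass  # CPU-only environment: nothing to release

state = {"model": object(), "tokenizer": object()}
unload_model(state)
```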

Disable chat/inference, focus on evaluation only
69cd0c5

aeb56 committed

Update README.md
3e60f36
verified

optiviseapp committed

Add Evaluation tab with ARC-Challenge, TruthfulQA, and Winogrande benchmarks
29f5263

aeb56 committed

Fix flash attention error by patching model config to use eager attention
2f60fd7

aeb56 committed

Fix flash attention error by using eager attention implementation
74fe23d

aeb56 committed
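Both eager-attention commits amount to the same change: making transformers load the checkpoint through the standard attention path instead of flash-attn. The documented knob is the `attn_implementation="eager"` argument to `from_pretrained`; patching the on-disk config, as the first of the two commits does, can be sketched like this (the `_attn_implementation` key name is an assumption about transformers' internals, and the temp directory stands in for the downloaded Kimi checkpoint so the snippet is self-contained):

```python
import json
import pathlib
import tempfile

# Stand-in checkpoint directory; in the Space this would be the
# downloaded Kimi checkpoint.
ckpt = pathlib.Path(tempfile.mkdtemp())
(ckpt / "config.json").write_text(json.dumps({"model_type": "kimi_linear"}))

# Rewrite the config so the model loads with eager attention.  The key
# name is an assumption; the supported route is
# AutoModelForCausalLM.from_pretrained(..., attn_implementation="eager").
cfg = json.loads((ckpt / "config.json").read_text())
cfg["_attn_implementation"] = "eager"
(ckpt / "config.json").write_text(json.dumps(cfg, indent=2))
```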

Add all missing Kimi model dependencies (einops, triton, flash-linear-attention)
3f7365e

aeb56 committed

Add tiktoken dependency for Kimi model tokenizer
5015bb9

aeb56 committed

Switch to transformers inference (vLLM doesn't support KimiLinear architecture)
9905f0a

aeb56 committed

Improve vLLM startup with tensor parallelism, better logging, and 10min timeout
a82de92

aeb56 committed

Fix vLLM server start command to use python3 instead of python
75c2813

aeb56 committed

Remove emoji avatars incompatible with Gradio 4.19.2
5f01a47

aeb56 committed

Fix Gradio version compatibility and enable share mode
d073f8b

aeb56 committed

Switch to vLLM for high-performance, stable inference
310eb95

aeb56 committed

Fix variable scope error causing Internal Server Error
e62c736

aeb56 committed

Transform Space into professional inference UI for fine-tuned model
5e458c4

aeb56 committed

Implement manual LoRA merging to fix PEFT key naming conflicts
3a259bc

aeb56 committed
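The manual merge referenced above is, per weight matrix, just W' = W + (alpha / r) · B · A, applied after renaming PEFT's prefixed keys (`base_model.model.…`) to match the base checkpoint. A toy, pure-Python sketch of the arithmetic; real code would do this tensor-by-tensor with torch, and all shapes and values here are illustrative:

```python
def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def merge_lora(W, A, B, alpha, r):
    """Fold a LoRA update into a base weight: W' = W + (alpha/r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)          # (out, r) @ (r, in) -> (out, in)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter (r=1), alpha=2 -> scale 2.0
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [0.0]]                # (out, r)
A = [[0.5, 0.5]]                  # (r, in)
merged = merge_lora(W, A, B, alpha=2, r=1)
# merged == [[2.0, 1.0], [0.0, 1.0]]
```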

Use sequential device_map to fix key naming conflicts during LoRA merge
d3d4339

aeb56 committed

Add safe_merge and better error handling for LoRA merge with MoE models
79334bc

aeb56 committed

Fix 8-bit quantization CPU offload for large models
1a04e17

aeb56 committed

Fix UID 1000 user permission issue for Hugging Face Spaces
0dc6b10

aeb56 committed

Fix cache directory permissions issue
2352d4c

aeb56 committed
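Hugging Face Spaces run their containers as UID 1000, so any cache directory not writable by that user fails with permission errors. The two permission commits above correspond to the commonly documented fix: point the Hugging Face caches at a writable location. A sketch, with assumed paths:

```shell
# Point Hugging Face caches at a user-writable location (paths assumed).
export HF_HOME=/tmp/hf_home
export TRANSFORMERS_CACHE=/tmp/hf_home/transformers
mkdir -p "$TRANSFORMERS_CACHE"
```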

Add 8-bit quantization support and switch to L4x4 hardware for availability
e32298d

aeb56 committed

Add L40Sx4 hardware requirement and optimize multi-GPU support
1443f5f

aeb56 committed

Optimize app.py for 48B model on 4xL40S GPUs with multi-GPU support
b51ac87

aeb56 committed

Update requirements.txt with comprehensive dependencies for Kimi model
9bb160e

aeb56 committed

Initial commit: LoRA model merger
a951334

aeb56 committed

initial commit
7a80ad4
verified

optiviseapp committed