Add enforce_eager=True to fix KV cache memory issue · 1b7fbd7 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Reduce vLLM memory for A100 24GB compatibility · b85b1e0 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Fix Gradio 6.x Chatbot API: remove deprecated 'type' parameter · fbf7e70 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Switch to Qwen3-VL-4B-Thinking for single-GPU simplicity · 14c59e5 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Reduce context/memory to minimize NCCL overhead on L4s · 7d5c713 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Align vLLM config with official Qwen3-VL model card · b2fe3f4 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Pin vLLM <0.13.0 to fix V1 engine hang on multi-GPU · a65b765 · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Fix vLLM multi-GPU init: explicit dtype + higher mem util + eager mode · 71a896b · KinetoLabs and Claude Opus 4.5 committed 1 day ago
Force vLLM V0 engine + reduce max_model_len for stability · 3c9a722 · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Fix vLLM Engine core initialization failed on multi-GPU · ed575b1 · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Frontend simplification (4→2 tabs) + lazy imports for HF Spaces · 78caafb · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Replace dual 8B with single 30B-A3B FP8 vision model · 706520f · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Reduce thinking model max_new_tokens to fix slow inference · 0699c5f · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Replace 30B MoE with dual 8B models (Thinking + Instruct) · 333c083 · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Implement lazy model loading to prevent CUDA OOM on 4xL4 GPUs · 5f0db1e · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Add property accessors to RealModelStack for interface parity · c190082 · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Fix multi-GPU compatibility issues (6 locations) · d1901ae · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Fix multi-GPU support in vendored Qwen3-VL scripts · c4bfdfa · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Fix embedding/reranker loading with official Qwen3-VL classes · 455c786 · KinetoLabs and Claude Opus 4.5 committed 2 days ago
Fix critical model implementations and add sample scenarios · f3ebc82 · KinetoLabs and Claude Opus 4.5 committed 2 days ago