Add enforce_eager=True to fix KV cache memory issue 1b7fbd7 KinetoLabs Claude Opus 4.5 committed 2 days ago
Reduce vLLM memory for A100 24GB compatibility b85b1e0 KinetoLabs Claude Opus 4.5 committed 2 days ago
Switch to Qwen3-VL-4B-Thinking for single-GPU simplicity 14c59e5 KinetoLabs Claude Opus 4.5 committed 2 days ago
Reduce context/memory to minimize NCCL overhead on L4s 7d5c713 KinetoLabs Claude Opus 4.5 committed 3 days ago
Align vLLM config with official Qwen3-VL model card b2fe3f4 KinetoLabs Claude Opus 4.5 committed 3 days ago
Fix vLLM multi-GPU init: explicit dtype + higher mem util + eager mode 71a896b KinetoLabs Claude Opus 4.5 committed 3 days ago
Force vLLM V0 engine + reduce max_model_len for stability 3c9a722 KinetoLabs Claude Opus 4.5 committed 3 days ago
Fix vLLM Engine core initialization failed on multi-GPU ed575b1 KinetoLabs Claude Opus 4.5 committed 3 days ago
Replace dual 8B with single 30B-A3B FP8 vision model 706520f KinetoLabs Claude Opus 4.5 committed 3 days ago
Replace 30B MoE with dual 8B models (Thinking + Instruct) 333c083 KinetoLabs Claude Opus 4.5 committed 3 days ago
Implement lazy model loading to prevent CUDA OOM on 4xL4 GPUs 5f0db1e KinetoLabs Claude Opus 4.5 committed 3 days ago
Add property accessors to RealModelStack for interface parity c190082 KinetoLabs Claude Opus 4.5 committed 3 days ago
Fix multi-GPU compatibility issues (6 locations) d1901ae KinetoLabs Claude Opus 4.5 committed 3 days ago
Fix embedding/reranker loading with official Qwen3-VL classes 455c786 KinetoLabs Claude Opus 4.5 committed 3 days ago
Fix critical model implementations and add sample scenarios f3ebc82 KinetoLabs Claude Opus 4.5 committed 3 days ago