Switch to Qwen3-VL-4B-Thinking for single-GPU simplicity 14c59e5 KinetoLabs Claude Opus 4.5 committed 1 day ago
Reduce context/memory to minimize NCCL overhead on L4s 7d5c713 KinetoLabs Claude Opus 4.5 committed 2 days ago
Force vLLM V0 engine + reduce max_model_len for stability 3c9a722 KinetoLabs Claude Opus 4.5 committed 2 days ago
Frontend simplification (4→2 tabs) + lazy imports for HF Spaces 78caafb KinetoLabs Claude Opus 4.5 committed 2 days ago
Replace dual 8B with single 30B-A3B FP8 vision model 706520f KinetoLabs Claude Opus 4.5 committed 2 days ago
Reduce thinking model max_new_tokens to fix slow inference 0699c5f KinetoLabs Claude Opus 4.5 committed 2 days ago
Replace 30B MoE with dual 8B models (Thinking + Instruct) 333c083 KinetoLabs Claude Opus 4.5 committed 2 days ago
Fix critical model implementations and add sample scenarios f3ebc82 KinetoLabs Claude Opus 4.5 committed 2 days ago