Commit History

Add enforce_eager=True to fix KV cache memory issue
1b7fbd7

KinetoLabs Claude Opus 4.5 committed on

Reduce vLLM memory for A100 24GB compatibility
b85b1e0

KinetoLabs Claude Opus 4.5 committed on

Fix SessionState field name in samples.py
6fc2368

KinetoLabs Claude Opus 4.5 committed on

Fix Gradio 6.x Chatbot API: remove deprecated 'type' parameter
fbf7e70

KinetoLabs Claude Opus 4.5 committed on

Switch to Qwen3-VL-4B-Thinking for single-GPU simplicity
14c59e5

KinetoLabs Claude Opus 4.5 committed on

Reduce context/memory to minimize NCCL overhead on L4s
7d5c713

KinetoLabs Claude Opus 4.5 committed on

Align vLLM config with official Qwen3-VL model card
b2fe3f4

KinetoLabs Claude Opus 4.5 committed on

Pin vLLM <0.13.0 to fix V1 engine hang on multi-GPU
a65b765

KinetoLabs Claude Opus 4.5 committed on

Fix vLLM multi-GPU init: explicit dtype + higher mem util + eager mode
71a896b

KinetoLabs Claude Opus 4.5 committed on

Force vLLM V0 engine + reduce max_model_len for stability
3c9a722

KinetoLabs Claude Opus 4.5 committed on

Fix vLLM Engine core initialization failed on multi-GPU
ed575b1

KinetoLabs Claude Opus 4.5 committed on

Frontend simplification (4→2 tabs) + lazy imports for HF Spaces
78caafb

KinetoLabs Claude Opus 4.5 committed on

Replace dual 8B with single 30B-A3B FP8 vision model
706520f

KinetoLabs Claude Opus 4.5 committed on

Reduce thinking model max_new_tokens to fix slow inference
0699c5f

KinetoLabs Claude Opus 4.5 committed on

Replace 30B MoE with dual 8B models (Thinking + Instruct)
333c083

KinetoLabs Claude Opus 4.5 committed on

MVP UI simplification: single room, 4 tabs
3b08f11

KinetoLabs Claude Opus 4.5 committed on

Implement lazy model loading to prevent CUDA OOM on 4xL4 GPUs
5f0db1e

KinetoLabs Claude Opus 4.5 committed on

Add property accessors to RealModelStack for interface parity
c190082

KinetoLabs Claude Opus 4.5 committed on

Fix multi-GPU compatibility issues (6 locations)
d1901ae

KinetoLabs Claude Opus 4.5 committed on

Fix multi-GPU support in vendored Qwen3-VL scripts
c4bfdfa

KinetoLabs Claude Opus 4.5 committed on

Fix embedding/reranker loading with official Qwen3-VL classes
455c786

KinetoLabs Claude Opus 4.5 committed on

Fix critical model implementations and add sample scenarios
f3ebc82

KinetoLabs Claude Opus 4.5 committed on

Trigger rebuild
8771f89

KinetoLabs committed on

Initial commit: FDAM AI Pipeline v4.0.1
88bdcff

KinetoLabs Claude Opus 4.5 committed on