SmokeScan

Commit History

Add enforce_eager=True to fix KV cache memory issue

1b7fbd7

KinetoLabs Claude Opus 4.5 committed on Jan 11

Reduce vLLM memory for A100 24GB compatibility

b85b1e0

KinetoLabs Claude Opus 4.5 committed on Jan 11

Fix SessionState field name in samples.py

6fc2368

KinetoLabs Claude Opus 4.5 committed on Jan 11

Fix Gradio 6.x Chatbot API: remove deprecated 'type' parameter

fbf7e70

KinetoLabs Claude Opus 4.5 committed on Jan 11

Switch to Qwen3-VL-4B-Thinking for single-GPU simplicity

14c59e5

KinetoLabs Claude Opus 4.5 committed on Jan 11

Reduce context/memory to minimize NCCL overhead on L4s

7d5c713

KinetoLabs Claude Opus 4.5 committed on Jan 11

Align vLLM config with official Qwen3-VL model card

b2fe3f4

KinetoLabs Claude Opus 4.5 committed on Jan 11

Pin vLLM <0.13.0 to fix V1 engine hang on multi-GPU

a65b765

KinetoLabs Claude Opus 4.5 committed on Jan 11

Fix vLLM multi-GPU init: explicit dtype + higher mem util + eager mode

71a896b

KinetoLabs Claude Opus 4.5 committed on Jan 11

Force vLLM V0 engine + reduce max_model_len for stability

3c9a722

KinetoLabs Claude Opus 4.5 committed on Jan 11

Fix vLLM Engine core initialization failed on multi-GPU

ed575b1

KinetoLabs Claude Opus 4.5 committed on Jan 11

Frontend simplification (4→2 tabs) + lazy imports for HF Spaces

78caafb

KinetoLabs Claude Opus 4.5 committed on Jan 11

Replace dual 8B with single 30B-A3B FP8 vision model

706520f

KinetoLabs Claude Opus 4.5 committed on Jan 11

Reduce thinking model max_new_tokens to fix slow inference

0699c5f

KinetoLabs Claude Opus 4.5 committed on Jan 11

Replace 30B MoE with dual 8B models (Thinking + Instruct)

333c083

KinetoLabs Claude Opus 4.5 committed on Jan 11

MVP UI simplification: single room, 4 tabs

3b08f11

KinetoLabs Claude Opus 4.5 committed on Jan 10

Implement lazy model loading to prevent CUDA OOM on 4xL4 GPUs

5f0db1e

KinetoLabs Claude Opus 4.5 committed on Jan 10

Add property accessors to RealModelStack for interface parity

c190082

KinetoLabs Claude Opus 4.5 committed on Jan 10

Fix multi-GPU compatibility issues (6 locations)

d1901ae

KinetoLabs Claude Opus 4.5 committed on Jan 10

Fix multi-GPU support in vendored Qwen3-VL scripts

c4bfdfa

KinetoLabs Claude Opus 4.5 committed on Jan 10

Fix embedding/reranker loading with official Qwen3-VL classes

455c786

KinetoLabs Claude Opus 4.5 committed on Jan 10

Fix critical model implementations and add sample scenarios

f3ebc82

KinetoLabs Claude Opus 4.5 committed on Jan 10

Trigger rebuild

8771f89

KinetoLabs committed on Jan 10

Initial commit: FDAM AI Pipeline v4.0.1

88bdcff

KinetoLabs Claude Opus 4.5 committed on Jan 10
