Inference speed

by tintwotin - opened May 27

May 27

Using Find on a 2 min. 1920x832 video takes 459.15s on an RTX 4090. Can anything be done to speed it up, like downscaling the video beforehand? Or is a turbo version planned?

rethinkNow

Nemo Station org May 27

459s for a 2-min 1920×832 clip is on the slow end but expected at that resolution. Two things you can try:

1. Pre-downscale the video. 1920×832 is roughly 8× over the model's per-frame pixel budget (we cap at ~200K pixels via smart_resize internally). The internal resize handles it, but you still pay the cost of decoding the full-resolution frames. Downscaling to ~640×270 before sending to the model cuts the visual-encoder time substantially without hurting accuracy for grounding-style queries.
2. Quantize the weights. On a 4090, AWQ-quantized weights + bf16 KV-cache typically give 3-4× throughput vs vanilla bf16. We haven't shipped a quantized checkpoint ourselves yet, but you can do this in about half an hour with llm-compressor or AutoAWQ. If you do, we'd be curious what mIoU you get on TimeLens-Bench to compare against our bf16 numbers.
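For the first option, here is a rough sketch of how you might pick a target size that fits the pixel budget and build the matching ffmpeg call. The 200,000 figure is just the approximate cap quoted above, and `target_size` and `ffmpeg_command` are hypothetical helper names for illustration, not part of the model's code. It rounds down to even dimensions on the assumption that your encoder wants them.

```python
import math

# Approximate per-frame cap mentioned above; an assumption, not an exact constant.
PIXEL_BUDGET = 200_000


def target_size(width, height, budget=PIXEL_BUDGET, multiple=2):
    """Return (w, h) close to the source aspect ratio with w * h <= budget."""
    if width * height <= budget:
        scale = 1.0
    else:
        # Scale both sides by the same factor so the area lands on the budget.
        scale = math.sqrt(budget / (width * height))
    # Round down to a multiple of `multiple` (many encoders need even sizes).
    w = max(multiple, int(width * scale) // multiple * multiple)
    h = max(multiple, int(height * scale) // multiple * multiple)
    return w, h


def ffmpeg_command(src, dst, width, height):
    """Build (but do not run) an ffmpeg command that rescales src into dst."""
    w, h = target_size(width, height)
    return ["ffmpeg", "-i", src, "-vf", f"scale={w}:{h}", dst]


if __name__ == "__main__":
    print(target_size(1920, 832))
    print(" ".join(ffmpeg_command("in.mp4", "out.mp4", 1920, 832)))
```

For a 1920×832 source this lands on 678×294, just under the budget. The ~640×270 suggested above is a little smaller still, which is also fine.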

No "turbo" variant planned — the model is already 2B params, so the realistic speedup path is inference-side, not architectural.

tintwotin

May 28

I tried downscaling and it didn't help. I would like to add it to my Pallaidium AI add-on for Blender, but currently it is simply too slow for me. Will check in later to see if something has improved. Thank you.

tintwotin

24 days ago (edited 24 days ago)

Here is my attempt at SDNQ-int8 quantization (but I basically do not know what I'm doing). It seems to just hang for me during inference outside the built-in test (script is included): https://huggingface.co/tintwotin/Marlin-2B-SDNQ-int8

1280x70 - 361 frames:
Marlin: captioning completed in 498.6s

Looks like nothing has been gained speedwise.
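When a change like this makes no difference, it can help to time each stage separately to see where the seconds actually go. Below is a minimal sketch, not tied to the model's API: the stage names are hypothetical and the `time.sleep` calls are placeholders for the real decode and generation steps.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Accumulate wall-clock seconds spent inside the `with` block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


def seconds_per_frame(total_seconds, frames):
    """Average cost per frame for a whole run."""
    return total_seconds / frames


# Placeholders: swap the sleeps for the real video loading and inference calls.
with stage("decode_and_resize"):
    time.sleep(0.01)
with stage("generate"):
    time.sleep(0.02)

if __name__ == "__main__":
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f}s")
    # The run above: 361 frames in 498.6s
    print(f"{seconds_per_frame(498.6, 361):.2f}s per frame")
```

If most of the time turns out to sit outside the generation stage, quantizing the weights would not be expected to change the total much.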

tintwotin

10 days ago

Getting Triton to work on Windows seems to add a bit of speed to the SDNQ weights. But it is still very slow.
