ZeroGPU base image (torch 2.11+cu130): no nvcc, no prebuilt wheel for the causal-conv1d / mamba-ssm family
I'm running a Gradio Space on ZeroGPU (@spaces.GPU) for a hybrid linear-attention model (GatedDeltaNet path in Qwen3.5-MoE). Three related blockers:
1. Fast path can't be enabled
flash-linear-attention installs fine (it's Triton-only), but causal-conv1d is required for the fast prefill path in GatedDeltaNet. Without it, transformers logs:

```
The fast path is not available because one of the required library is not installed. Falling back to torch implementation.
```
Result: prefill on a 158-token prompt takes ~40 seconds on ZeroGPU vs ~250 ms for the same-vocab Qwen3.5-0.8B baseline.
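For anyone hitting the same fallback, a quick diagnostic is to check which optional kernel packages are actually importable in the Space's runtime. This is a minimal sketch (the exact package list transformers gates on is my assumption, inferred from the warning above):

```python
import importlib.util

def fast_path_deps_available() -> dict[str, bool]:
    """Report which optional kernel packages are importable.

    transformers only enables the fused prefill path when the CUDA
    kernels are present; otherwise it silently falls back to torch.
    """
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("causal_conv1d", "fla")  # fla = flash-linear-attention
    }

print(fast_path_deps_available())
```

On ZeroGPU this prints `causal_conv1d: False`, which matches the fallback warning and the ~40 s prefill.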
2. Source build fails: no nvcc in the build env
```
UserWarning: causal_conv1d was requested, but nvcc was not found.
torch.__version__ = 2.11.0+cu130
NameError: name 'bare_metal_version' is not defined
```
(That NameError is an upstream causal-conv1d setup.py bug: when nvcc is missing, it warns but then crashes at line 176 because `bare_metal_version` is only assigned inside the nvcc branch. Filed separately at Dao-AILab/causal-conv1d.)
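To make the failure mode concrete, here is a minimal reproduction of the bug pattern (not the upstream code verbatim; the version values and flag strings are placeholders), plus the obvious guarded fix:

```python
def buggy_cuda_flags(nvcc_found: bool) -> list[str]:
    # Mirrors the setup.py pattern: the version is probed only when nvcc exists.
    if nvcc_found:
        bare_metal_version = (13, 0)  # would be parsed from `nvcc --version`
    else:
        print("UserWarning: causal_conv1d was requested, but nvcc was not found.")
    flags = []
    # NameError here when nvcc is absent: the name was never bound.
    if bare_metal_version >= (11, 8):
        flags.append("-gencode=arch=compute_90,code=sm_90")
    return flags

def fixed_cuda_flags(nvcc_found: bool) -> list[str]:
    # Guarded variant: bind the name on both paths, skip arch flags without nvcc.
    bare_metal_version = (13, 0) if nvcc_found else None
    flags = []
    if bare_metal_version is not None and bare_metal_version >= (11, 8):
        flags.append("-gencode=arch=compute_90,code=sm_90")
    return flags
```

The warning-then-crash sequencing is exactly what shows up in the build log above: the UserWarning fires, then the unconditional version comparison raises.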
3. No prebuilt wheel for cu130 / torch 2.11 / py3.10
PyPI causal-conv1d doesn't ship a wheel for this combo, and upstream release wheels lag behind the bleeding-edge torch in the ZeroGPU base image.
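Given both failure modes, a Space can at least fail fast instead of tripping over the setup.py crash. A preflight sketch (the install command is illustrative, not a claim that it works on ZeroGPU today):

```shell
# Preflight: a causal-conv1d source build requires nvcc. On the ZeroGPU
# base image it is absent, so report that clearly instead of letting
# setup.py warn and then die with the bare_metal_version NameError.
if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc found: $(nvcc --version | tail -n1)"
  # pip install causal-conv1d --no-build-isolation
else
  echo "nvcc missing: causal-conv1d source build will fail; fast path stays disabled"
fi
```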
Ask
- Either ship `nvcc` in the ZeroGPU base image (or a `-devel` variant), or
- Coordinate with Dao-AILab on prebuilt wheels for the ZeroGPU torch/cuda combo, or
- Document the limitation prominently: currently any model using GatedDeltaNet / Mamba / Mamba2 / similar CUDA paths is silently slow on ZeroGPU.
Repro: https://huggingface.co/spaces/kshitijthakkar/tracegenix-playground (build log shows the NameError; runtime shows the fallback warning and 40s TTFT).
Filed the upstream bare_metal_version NameError separately on Dao-AILab/causal-conv1d: https://github.com/Dao-AILab/causal-conv1d/issues/108