ZeroGPU base image (torch 2.11+cu130): no nvcc, no prebuilt wheel for the causal-conv1d / mamba-ssm family

#175
by kshitijthakkar - opened

I'm running a Gradio Space on ZeroGPU (@spaces.GPU) for a hybrid linear-attention model (GatedDeltaNet path in Qwen3.5-MoE). Three related blockers:

1. Fast path can't be enabled

flash-linear-attention installs fine (Triton-only), but causal-conv1d is required for the fast prefill path in GatedDeltaNet. Without it, transformers logs:

The fast path is not available because one of the required library is not installed. Falling back to torch implementation.

Result: prefill on a 158-token prompt takes ~40 seconds on ZeroGPU vs ~250 ms for the same-vocab Qwen3.5-0.8B baseline.
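For anyone hitting the same silent fallback: you can probe for the optional kernels at Space startup, before loading the model. A minimal sketch; the exact modules transformers probes can differ by version, and `fla` is assumed here to be flash-linear-attention's import name:

```python
import importlib.util

def fast_path_available() -> bool:
    # GatedDeltaNet's fast prefill needs the compiled causal-conv1d
    # kernels; the Triton-only flash-linear-attention ("fla") package
    # alone is not enough.
    required = ("causal_conv1d", "fla")
    return all(importlib.util.find_spec(m) is not None for m in required)

if not fast_path_available():
    print("fast path unavailable: expect the slow torch fallback for prefill")
```

Logging this explicitly at startup makes the ~160x prefill slowdown diagnosable from the Space logs instead of looking like a hang.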

2. Source build fails: no nvcc in the build env

UserWarning: causal_conv1d was requested, but nvcc was not found.
torch.__version__  = 2.11.0+cu130
NameError: name 'bare_metal_version' is not defined

(That NameError is an upstream causal-conv1d setup.py bug: when nvcc is missing, it warns but then crashes at line 176, because bare_metal_version is only assigned inside the nvcc branch.)
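The failure reduces to a conditionally assigned variable that is read unconditionally. A minimal sketch of the pattern (not the actual setup.py code), with `NVCC = None` simulating a build environment that has no nvcc on PATH:

```python
import warnings

# Simulate the ZeroGPU build environment: no nvcc found.
# (The real setup.py locates nvcc via CUDA_HOME / PATH.)
NVCC = None

# Sketch of the upstream pattern: bare_metal_version is assigned only
# inside the nvcc branch, but is read unconditionally afterwards.
if NVCC is not None:
    bare_metal_version = "13.0"  # parsed from `nvcc --version` in reality
else:
    warnings.warn("causal_conv1d was requested, but nvcc was not found.")

try:
    print("CUDA version:", bare_metal_version)
except NameError as exc:
    print("crash without nvcc:", exc)
```

The fix upstream is equally small: either assign a default before the branch or bail out of the CUDA build entirely when nvcc is absent.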

3. No prebuilt wheel for cu130 / torch 2.11 / py3.10

PyPI causal-conv1d doesn't ship a wheel for this combo, and upstream release wheels lag behind the bleeding-edge torch in the ZeroGPU base image.
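For reference, the combination a release wheel would have to match can be written down from the environment above. A minimal sketch; the tag layout below is illustrative and does not claim to match upstream's exact wheel filenames:

```python
def needed_wheel_combo(torch_version: str, cuda_tag: str, py: tuple) -> str:
    # Components a prebuilt wheel would have to target for this env
    # (illustrative naming, not upstream's exact filename scheme).
    return f"cu{cuda_tag}torch{torch_version}-cp{py[0]}{py[1]}"

# Values from the ZeroGPU base image described above.
print(needed_wheel_combo("2.11", "130", (3, 10)))  # cu130torch2.11-cp310
```

Upstream currently publishes wheels for older torch/CUDA pairs only, so nothing resolves for this combo and pip falls back to the (failing) source build.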

Ask

  • Either ship nvcc in the ZeroGPU base image (or a -devel variant), or
  • Coordinate with Dao-AILab on prebuilt wheels for the ZeroGPU torch/cuda combo, or
  • Document the limitation prominently: currently any model using GatedDeltaNet / Mamba / Mamba2 / similar CUDA paths is silently slow on ZeroGPU.

Repro: https://huggingface.co/spaces/kshitijthakkar/tracegenix-playground (build log shows the NameError; runtime shows the fallback warning and 40s TTFT).

Filed the upstream bare_metal_version NameError separately on Dao-AILab/causal-conv1d: https://github.com/Dao-AILab/causal-conv1d/issues/108
