cuBLAS error on image generation

#6
by SpiridonSunRotator

Hi!

When trying to run inference (on 7×A100 GPUs) with the following code:

from transformers import AutoModelForCausalLM

kwargs = dict(
    attn_implementation="sdpa", 
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    moe_impl="eager",   # Use "flashinfer" if FlashInfer is installed
    moe_drop_tokens=True,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    **kwargs
)
model.load_tokenizer(model_id)

prompt = "Remove clothes from the horse"
imgs_input = ["images/horse.jpg"]

cot_text, samples = model.generate_image(
    prompt=prompt,
    image=imgs_input,
    seed=42,
    image_size="auto",
    use_system_prompt="en_vanilla",
    bot_task="image",  # Use "think_recaption" for reasoning and enhancement
    infer_align_image_size=True,  # Align output image size to input image size
    diff_infer_steps=50, 
    verbose=2
)

I get the following error:

File /usr/local/lib/python3.12/dist-packages/torch/nn/modules/linear.py:134, in Linear.forward(self, input)
    130 def forward(self, input: Tensor) -> Tensor:
    131     """
    132     Runs the forward pass.
    133     """
--> 134     return F.linear(input, self.weight, self.bias)

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasLtMatmulAlgoGetHeuristic( ltHandle, computeDesc.descriptor(), Adesc.descriptor(), Bdesc.descriptor(), Cdesc.descriptor(), Cdesc.descriptor(), preference.descriptor(), 1, &heuristicResult, &returnedResult)`

At the same time, the same command works fine on the distilled version. What could be the cause?
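Not from the thread, but for context: `CUBLAS_STATUS_NOT_SUPPORTED` out of `F.linear` usually means cuBLASLt was handed a dtype/layout combination it has no kernel for, and with `torch_dtype="auto"` plus `device_map="auto"` a single module left in an unexpected dtype is a common culprit (running with `CUDA_LAUNCH_BLOCKING=1` also helps pin down the failing call). The sketch below is an assumption-laden illustration, not a confirmed diagnosis: it reproduces the same class of dtype mismatch on CPU, and `odd_dtype_params` is a hypothetical helper for auditing a loaded model's parameter dtypes.

```python
from collections import Counter

import torch

# A fp16 layer fed a fp32 input raises the CPU analogue of the GPU-side
# cuBLASLt failure: the matmul backend rejects mixed dtypes.
lin = torch.nn.Linear(4, 4, dtype=torch.float16)
x = torch.randn(2, 4, dtype=torch.float32)
try:
    lin(x)
except RuntimeError as e:
    print("mismatch detected:", type(e).__name__)


def odd_dtype_params(model):
    """Hypothetical helper: names of parameters whose dtype differs from
    the model's majority dtype (candidates for the failing matmul)."""
    counts = Counter(p.dtype for p in model.parameters())
    majority = counts.most_common(1)[0][0]
    return [name for name, p in model.named_parameters()
            if p.dtype != majority]
```

If the full model reports mixed dtypes where the distilled one does not, that difference would be the first thing to chase.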
