Where can I find the inference.py mentioned in the CLI Inference section?

#29

by Zebin01 - opened 14 days ago

Hi,
I noticed that inference.py is mentioned in the CLI Inference section of the documentation, but I couldn’t find this file in the repository.
Could you please point me to its location?

kencwt

Motif Technologies org 8 days ago

inference.py is outdated, and we have since updated our documentation to reflect the latest changes.
Please refer to the latest model card or the Diffusers documentation for usage.

Zebin01

8 days ago

Thank you for your reply. I noticed that your work already supports Diffusers, and I really appreciate your effort.
I actually saw the usage of inference.py on this page:
https://huggingface.co/Motif-Technologies/Motif-Video-2B/blob/main/docs/gguf-sageattention.md#benchmark
I would like to know more about the logic behind use-sage-attention. It seems that this part has not yet been added to the Diffusers pipeline. If possible, could you share more details about how it is implemented?
Thanks again for your help!

kencwt

Motif Technologies org 8 days ago

That page is also outdated and we will update it later.

Sage Attention does not accept an attention_mask in its interface, so before the Diffusers integration we had to hot-swap the attention logic to remove the padding tokens before attention.
With the Diffusers integration, the attention backend is handled by dispatch_attention_fn, and you can simply swap the backend with pipeline.transformer.set_attention_backend(...).
If you observe inconsistent results between SDPA and Sage Attention, consider removing the padding tokens after tokenization.
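
To illustrate why dropping the padding tokens can stand in for an attention mask, here is a toy, pure-Python sketch of single-query scaled dot-product attention. It is not the code we used for the hot swap; the function names (`attention`, `strip_padding`) and the sample numbers are made up for this example, and it assumes at least one token is not padding.

```python
import math

def attention(query, keys, values, mask=None):
    """Single-query scaled dot-product attention.

    mask[i] == False hides key/value i (treated as padding).
    Assumes at least one position is kept.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    if mask is not None:
        # Masked positions get -inf so their softmax weight becomes 0.
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def strip_padding(rows, mask):
    """Keep only the rows whose mask entry is True."""
    return [row for row, keep in zip(rows, mask) if keep]

# Hypothetical data: two real tokens followed by two padding tokens.
query = [1.0, 2.0]
keys = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [2.0, -1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [-4.0, 2.0]]
mask = [True, True, False, False]

masked = attention(query, keys, values, mask)            # mask-aware backend
stripped = attention(query, strip_padding(keys, mask),   # mask-free backend,
                     strip_padding(values, mask))        # padding removed first
unmasked = attention(query, keys, values)                # mask-free, padding kept

print(masked, stripped, unmasked)
```

In this sketch `masked` and `stripped` agree up to floating-point error, while `unmasked` differs because the padding tokens still receive attention weight. That last case is the kind of mismatch you may see when a backend without mask support is given padded sequences.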

Documentation on Attention backends
https://huggingface.co/docs/diffusers/optimization/attention_backends
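
As a minimal sketch of the backend swap, assuming you already have a loaded Diffusers pipeline whose `transformer` exposes `set_attention_backend`: the helper name `use_attention_backend` is hypothetical, and the backend names `"sage"` and `"native"` are taken from the Diffusers attention backends documentation, so please check them against your installed version.

```python
def use_attention_backend(pipeline, backend="sage"):
    """Switch the attention backend of the pipeline's transformer.

    `pipeline` is assumed to be a loaded Diffusers pipeline; `backend` is a
    backend name as listed in the Diffusers attention backends documentation.
    Returns the same pipeline so calls can be chained.
    """
    pipeline.transformer.set_attention_backend(backend)
    return pipeline
```

With a loaded pipeline `pipe`, calling `use_attention_backend(pipe, "sage")` before generation and `use_attention_backend(pipe, "native")` afterwards should give you a Sage Attention run and an SDPA run to compare. Sage Attention needs the `sageattention` package installed.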
