Where can I find the inference.py mentioned in the CLI Inference section?

#29
by Zebin01 - opened

Hi,
I noticed that inference.py is mentioned in the CLI Inference section of the documentation, but I couldn’t find this file in the repository.
Could you please point me to its location?

Motif Technologies org

inference.py is outdated, and we have since updated our documentation to reflect the latest changes.
Please refer to the latest model card or the Diffusers documentation for usage.

Thank you for your reply. I noticed that your work already supports Diffusers, and I really appreciate your effort.
I actually saw the usage of inference.py on this page:
https://huggingface.co/Motif-Technologies/Motif-Video-2B/blob/main/docs/gguf-sageattention.md#benchmark
I would like to know more about the logic behind use-sage-attention. It seems that this part has not yet been added to the Diffusers pipeline. If possible, could you share more details about how it is implemented?
Thanks again for your help!

Motif Technologies org

That page is also outdated; we will update it later.

Sage Attention does not accept an attention_mask in its interface, so before the Diffusers integration we had to hot-swap the attention logic to remove the padding tokens before attention.
With the Diffusers integration, the attention backend is handled by dispatch_attention_fn, and you can simply switch backends with pipeline.transformer.set_attention_backend(...).
If you observe inconsistent results between SDPA and Sage Attention, consider removing the padding tokens after tokenization.
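
To illustrate why stripping padding matters for a backend without attention_mask support, here is a minimal pure-Python sketch. It is not the Motif or Diffusers implementation; the helpers `attention` and `strip_padding` and the toy tensors are hypothetical and exist only for this example. It shows that masked attention over a padded sequence matches unmasked attention over the trimmed sequence, while unmasked attention over the padded sequence does not.

```python
import math

def attention(q, k, v, mask=None):
    """Plain softmax attention over lists of vectors (hypothetical helper).

    mask[j] == False hides key j, mimicking an attention_mask.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        if mask is not None:
            scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # masked keys get weight 0.0
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out

def strip_padding(tokens, mask):
    """Keep only the real tokens (hypothetical helper)."""
    return [t for t, keep in zip(tokens, mask) if keep]

# Toy sequence: 3 real tokens followed by 2 zero-valued padding tokens.
tokens = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.0, 0.0], [0.0, 0.0]]
mask = [True, True, True, False, False]
n_real = sum(mask)

# Reference: a backend that honors the mask (as SDPA can).
masked = attention(tokens, tokens, tokens, mask)[:n_real]

# Mask-free backend fed the trimmed sequence.
real = strip_padding(tokens, mask)
trimmed = attention(real, real, real)

# Mask-free backend fed the padded sequence: padding keys dilute the softmax.
unmasked = attention(tokens, tokens, tokens)[:n_real]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

print("masked vs trimmed :", max_abs_diff(masked, trimmed))
print("masked vs unmasked:", max_abs_diff(masked, unmasked))
```

The first difference should be essentially zero and the second clearly nonzero, which is the kind of mismatch to expect if padded inputs reach a backend that ignores the mask.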

Documentation on attention backends:
https://huggingface.co/docs/diffusers/optimization/attention_backends

Related Code:
https://github.com/huggingface/diffusers/blob/68a4847768c9a4e5e39307635ff2762ef2ef5d13/src/diffusers/models/transformers/transformer_motif_video.py#L90
https://github.com/huggingface/diffusers/blob/68a4847768c9a4e5e39307635ff2762ef2ef5d13/src/diffusers/models/transformers/transformer_motif_video.py#L187
