NVFP4 / MXFP4 / FP8 quantizations for faster inference
The current Anima model is only available in BF16 format, and it's more than twice as slow as Illustrious SDXL, even on my RTX 5070 Ti.
With 4-bit and 8-bit quantization, the model would not only fit into less VRAM; on hardware that accelerates these formats (such as the RTX 50xx series, which supports NVFP4), generation could be up to 2x faster at 8-bit and up to 4x faster at 4-bit. There will always be some very minor quality loss, but unlike the Turbo LoRA, which nullifies negative prompts, quantization keeps them usable.
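To make the VRAM side of this concrete, here is a rough back-of-the-envelope sketch. It only counts weight storage (no activations, text encoder, or VAE), and the parameter count is a hypothetical placeholder, not Anima's actual size. The block sizes assume the usual layouts: MXFP4 with one 8-bit scale per 32 weights, and NVFP4 with one 8-bit scale per 16 weights (the extra per-tensor scale is ignored).

```python
def weight_bytes(num_params, bits_per_weight, block_size=None, scale_bits=0):
    """Estimate weight storage in bytes for a given number format.

    For block-scaled formats, each block of `block_size` weights shares one
    scale of `scale_bits` bits, which adds scale_bits / block_size bits per weight.
    """
    bits = bits_per_weight
    if block_size:
        bits += scale_bits / block_size
    return num_params * bits / 8


# Hypothetical parameter count, for illustration only (not taken from the model card).
NUM_PARAMS = 2_000_000_000

# Format name -> keyword arguments for weight_bytes (assumed layouts, see text above).
FORMATS = {
    "BF16": dict(bits_per_weight=16),
    "FP8": dict(bits_per_weight=8),
    "MXFP4": dict(bits_per_weight=4, block_size=32, scale_bits=8),
    "NVFP4": dict(bits_per_weight=4, block_size=16, scale_bits=8),
}

if __name__ == "__main__":
    for name, kwargs in FORMATS.items():
        gib = weight_bytes(NUM_PARAMS, **kwargs) / 2**30
        print(f"{name:>6}: {gib:.2f} GiB of weights")
```

The speed side depends on the GPU and kernels, so it is not something this kind of arithmetic can predict.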
FLUX.2 has such an approach officially available: https://huggingface.co/collections/black-forest-labs/flux2
It might also be worth working with Nunchaku on this. They have their own very effective FP4 quantization method for models like FLUX, Z-Image, and Qwen-Image: https://github.com/nunchaku-ai/nunchaku
As for the FP8/MXFP8 model, I couldn't use torch.compile with it, which means it ends up slower than BF16.
To use torch.compile on FP8/MXFP8 models, use the TorchCompileModelAdvanced node from KJNodes, with the mode set to max-autotune-no-cudagraphs and dynamic set to false.
Try the INT8 model with the INT8-Fast custom node. It seems to be the best way to boost generation speed, and it works on Turing and Ampere GPUs.
As for NVFP4, I wasn't satisfied with its quality.
Nunchaku may be worth trying because it calibrates, but I'm not sure ordinary users can easily do that (it takes resources, meaning money and time), and post-training means the artist tags work differently.
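For anyone wondering where the 4-bit quality loss comes from, here is a minimal sketch of block-scaled FP4 rounding in plain Python. It is not how NVFP4 or Nunchaku are actually implemented: the scale is kept in full precision here (real NVFP4 stores it in FP8), there is no calibration, and `fake_quantize` is just an illustrative name. It only shows that each weight gets snapped to one of eight magnitudes per block.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block_fp4(block):
    """Round one block of floats to scaled FP4 and return the dequantized values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Choose the scale so the largest magnitude in the block maps onto 6.0.
    scale = amax / FP4_GRID[-1]
    out = []
    for x in block:
        # Snap the scaled magnitude to the nearest representable grid value.
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append((mag if x >= 0 else -mag) * scale)
    return out


def fake_quantize(values, block_size=16):
    """Quantize and dequantize a flat list of weights block by block."""
    out = []
    for i in range(0, len(values), block_size):
        out.extend(quantize_block_fp4(values[i:i + block_size]))
    return out


if __name__ == "__main__":
    weights = [0.9, -0.31, 0.07, 0.55, -0.02, 0.4, -0.8, 0.12]
    restored = fake_quantize(weights, block_size=8)
    for w, r in zip(weights, restored):
        print(f"{w:+.3f} -> {r:+.3f} (error {abs(w - r):.3f})")
```

Small weights that share a block with one large outlier get the coarsest treatment, which is the kind of error calibration-based methods try to reduce.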