Plans to upscale the parameter count?
Hi! I really like the model; however, I wonder whether the parameter count might be a limitation, since it is even smaller than Z-Image.
In the LLM community there is an established practice of so-called RYS or "depth upscaling": duplicating existing layers to increase a model's parameter count without retraining from scratch, while preserving most of the model's knowledge. These upscaled models can then be trained further to get an even better grasp of their subjects.
Example depth upscales: https://dnhkng.github.io/posts/rys/, DavidAU's fine-tunes like https://huggingface.co/DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF
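At its core, a depth upscale is just a new layer ordering in which a span of existing layers appears twice. Below is a minimal sketch in plain Python, assuming the model is a simple sequential stack of layers. The helper names and the choice of which span to repeat are hypothetical and for illustration only; this is not the exact recipe from the linked posts.

```python
def depth_upscale_order(n_layers: int, start: int, end: int) -> list[int]:
    """Return a new layer order that repeats layers [start, end) once.

    The repeated span is inserted right after its original position, so a
    12-layer model with start=4, end=8 becomes 16 layers:
    0..7, then 4..7 again, then 8..11.
    """
    if not 0 <= start < end <= n_layers:
        raise ValueError("need 0 <= start < end <= n_layers")
    return list(range(end)) + list(range(start, end)) + list(range(end, n_layers))


def upscale_layers(layers: list, start: int, end: int) -> list:
    """Build the deeper stack by indexing the original layers with the new order."""
    return [layers[i] for i in depth_upscale_order(len(layers), start, end)]


order = depth_upscale_order(8, 2, 5)
print(order)       # [0, 1, 2, 3, 4, 2, 3, 4, 5, 6, 7]
print(len(order))  # 11
```

In a real model the duplicated layers would be weight copies, so the upscaled network starts close to the original and is then fine-tuned.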
Would you like to try this, or have you already tried it and found that the model broke?
Great idea, about as great as plugging in DeepSeek-V4-Pro 1.6T A49B as the text encoder.
So the thing is that even the tiniest changes to the text encoder (such as very-low-KLD abliteration, or even "high quality" quants like q8/fp8) degrade output quality, because the model was trained on the exact outputs the current text encoder produces. So even if we ended up with a significantly smarter text encoder, it would take very extensive and expensive training to take advantage of it.
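To make "tiny changes" concrete: even a careful 8-bit quantization shifts every output of a layer slightly. The minimal sketch below in plain Python assumes symmetric per-tensor int8 quantization of a random linear layer. Real q8/fp8 schemes differ (per-block scales, or a floating-point format in the fp8 case), and the sketch only measures how far the layer's outputs move, not how a downstream model reacts to that shift.

```python
import random


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale for all-zero input
    return [max(-127, min(127, round(v / scale))) for v in values], scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


def matvec(weights: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relative_error(a: list[float], b: list[float]) -> float:
    num = sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    den = sum(ai ** 2 for ai in a) ** 0.5
    return num / den


rng = random.Random(0)
dim = 64
W = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]
x = [rng.gauss(0, 1) for _ in range(dim)]

# Quantize the whole weight matrix with one scale, then restore its shape.
q, scale = quantize_int8([w for row in W for w in row])
deq = dequantize(q, scale)
W_q = [deq[i * dim:(i + 1) * dim] for i in range(dim)]

err = relative_error(matvec(W, x), matvec(W_q, x))
print(f"relative output error after int8 round trip: {err:.4%}")
```

The error is small but never zero, and a model trained on the exact unquantized outputs sees every such shift.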
Furthermore, tdrussell has said in the past that he considers the current text encoder good enough and believes the model is bottlenecked in other ways. (As far as I recall, he didn't elaborate further, but it sounds plausible to me.)
And lastly, Anima is a weird Frankenstein model: Qwen 0.6B outputs are mapped onto the underlying T5 embedding space, with a tiny LLM adapter duct-taping the two together. This further complicates any proposal to change the architecture.
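The simplest possible form of such a bridge is a per-token projection from one embedding width to another. The sketch below in plain Python is a hypothetical stand-in. The thread only calls Anima's bridge a "tiny LLM adapter", and the real one is likely more elaborate than a single linear layer. The toy widths are also arbitrary.

```python
import random


class LinearAdapter:
    """Hypothetical per-token linear bridge from one embedding space to another."""

    def __init__(self, d_in: int, d_out: int, seed: int = 0):
        rng = random.Random(seed)
        # Random init for the demo; a real adapter would be trained so its
        # outputs imitate the embeddings the downstream DiT was trained on.
        self.weight = [[rng.gauss(0, d_in ** -0.5) for _ in range(d_in)] for _ in range(d_out)]
        self.bias = [0.0] * d_out

    def __call__(self, tokens: list[list[float]]) -> list[list[float]]:
        # tokens: [seq_len][d_in] -> [seq_len][d_out], each token projected independently
        return [
            [sum(w * t for w, t in zip(row, tok)) + b for row, b in zip(self.weight, self.bias)]
            for tok in tokens
        ]


adapter = LinearAdapter(d_in=8, d_out=16)
qwen_like_states = [[0.1 * i] * 8 for i in range(3)]  # 3 toy tokens of width 8
t5_like_states = adapter(qwen_like_states)
print(len(t5_like_states), len(t5_like_states[0]))  # 3 16
```

Because the adapter's weights are fit to one specific encoder's outputs, swapping or perturbing that encoder changes everything the DiT receives downstream.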
@onixxexxd5555LOAF I meant increasing the layer count of Anima's Cosmos image transformer, without touching the text encoder.
In that case, don't get me wrong: I am interested in seeing weird experiments like these, such as whether a DiT equivalent of the layer-duplication trick can be done, whether the result would work with no or only minimal fine-tuning, and how much it would help. But this model has been training for months and is closer to the finish line than to the beginning, so it would be fairly crazy to do highly experimental architectural surgery at this point. Perhaps someone can tinker with this after the model has finished training.
@onixxexxd5555LOAF I think this has actually been done before, with city96's Flux.1-heavy model. He describes it as a self-merge over here, and it needed a bit of training to "recover" from the merge.
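For a DiT, a self-merge like this can be done purely on the checkpoint by renumbering the transformer-block keys. The minimal sketch below in Python assumes the blocks are stored under keys like `blocks.<i>.<param>`. That prefix and the toy key names are hypothetical, since Flux and Cosmos checkpoints use their own naming. It also assumes the model class is rebuilt with `len(order)` blocks before loading. This is not city96's actual procedure, and, as with Flux.1-heavy, the result would likely still need some training to recover.

```python
import re


def self_merge_state_dict(state_dict: dict, order: list[int], prefix: str = "blocks.") -> dict:
    """Renumber transformer blocks in a state dict according to `order`.

    `order[new_index] = old_index`, so repeating an old index duplicates that block.
    Keys outside the block prefix (embeddings, final layer, ...) are kept as-is.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)\.(.*)$")
    blocks: dict[int, dict[str, object]] = {}
    merged = {}
    for key, value in state_dict.items():
        m = pattern.match(key)
        if m:
            blocks.setdefault(int(m.group(1)), {})[m.group(2)] = value
        else:
            merged[key] = value
    for new_idx, old_idx in enumerate(order):
        for suffix, value in blocks[old_idx].items():
            # Duplicated blocks reuse the same value object; clone it
            # (e.g. tensor.clone()) if your framework needs independent storage.
            merged[f"{prefix}{new_idx}.{suffix}"] = value
    return merged


# Toy checkpoint with strings standing in for tensors.
sd = {
    "x_embedder.weight": "E",
    "blocks.0.attn.weight": "A0",
    "blocks.1.attn.weight": "A1",
    "blocks.2.attn.weight": "A2",
    "final_layer.weight": "F",
}
merged = self_merge_state_dict(sd, order=[0, 1, 1, 2])  # duplicate block 1
for k, v in merged.items():
    print(k, v)
```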