Anima
Anima is a text-to-image model that reuses the CosmosTransformer3DModel with a Qwen3 text encoder, a T5-token text conditioner, and the AutoencoderKLQwenImage VAE.
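```python
import torch
from diffusers import ModularPipeline

# Load the modular pipeline definition from the Hub repository.
pipe = ModularPipeline.from_pretrained("circlestone-labs/Anima-Base-v1.0-Diffusers")
# Load the model components in bfloat16, then move the pipeline to the GPU.
pipe.load_components(torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Generate one image from a tag-style prompt.
image = pipe(prompt="masterpiece, best quality, 1girl, solo, city lights").images[0]
```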
AnimaModularPipeline
diffusers.AnimaModularPipeline
A ModularPipeline for Anima.
> This is an experimental feature and is likely to change in the future.
AnimaAutoBlocks
diffusers.AnimaAutoBlocks
Auto Modular pipeline for text-to-image generation using Anima.
Supported workflows:
text2image: requires prompt
Components:
text_encoder (Qwen3Model)
tokenizer (Qwen2Tokenizer)
t5_tokenizer (T5TokenizerFast)
text_conditioner (AnimaTextConditioner)
guider (ClassifierFreeGuidance)
transformer (CosmosTransformer3DModel)
scheduler (FlowMatchEulerDiscreteScheduler)
vae (AutoencoderKLQwenImage)
image_processor (VaeImageProcessor)
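The component list above can be turned into a quick sanity check after loading. The following is a minimal sketch, assuming each component is reachable as an attribute of the pipeline under the name listed (for example `pipe.transformer`) and is `None` until loaded; `ANIMA_COMPONENTS`, `report_components`, and `missing_components` are hypothetical helper names, not part of diffusers.

```python
# Expected component names and class names, copied from the list above.
ANIMA_COMPONENTS = {
    "text_encoder": "Qwen3Model",
    "tokenizer": "Qwen2Tokenizer",
    "t5_tokenizer": "T5TokenizerFast",
    "text_conditioner": "AnimaTextConditioner",
    "guider": "ClassifierFreeGuidance",
    "transformer": "CosmosTransformer3DModel",
    "scheduler": "FlowMatchEulerDiscreteScheduler",
    "vae": "AutoencoderKLQwenImage",
    "image_processor": "VaeImageProcessor",
}


def report_components(pipe):
    """Map each component name to (expected class name, actual class name or None).

    Assumption: components are exposed as attributes on the pipeline object.
    """
    report = {}
    for name, expected in ANIMA_COMPONENTS.items():
        component = getattr(pipe, name, None)
        actual = None if component is None else type(component).__name__
        report[name] = (expected, actual)
    return report


def missing_components(pipe):
    """Return the names of components that are absent or still None."""
    return [name for name, (_, actual) in report_components(pipe).items() if actual is None]
```

Calling `missing_components(pipe)` after `pipe.load_components(...)` should return an empty list if every component was loaded; a non-empty list points at the names to investigate.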
Inputs:
prompt (str):
The prompt or prompts to guide image generation.
negative_prompt (str, optional):
The prompt or prompts that image generation should steer away from.
max_sequence_length (int, optional, defaults to 512):
Maximum sequence length for prompt encoding.
num_images_per_prompt (int, optional, defaults to 1):
The number of images to generate per prompt.
height (int, optional):
The height in pixels of the generated image.
width (int, optional):
The width in pixels of the generated image.
latents (Tensor, optional):
Pre-generated noisy latents for image generation.
generator (Generator, optional):
Torch generator for deterministic generation.
num_inference_steps (int, optional, defaults to 50):
The number of denoising steps.
sigmas (list, optional):
Custom sigmas for the denoising process.
**denoiser_input_fields (None, optional):
The conditional model inputs for the Anima denoiser.
output_type (str, optional, defaults to 'pil'):
Output format: 'pil', 'np', or 'pt'.
Outputs:
images (list):
Generated images.
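To show how the documented inputs fit together in one call, here is a minimal sketch. `build_anima_call_kwargs` and `generate` are hypothetical helpers, not diffusers APIs; they only use input names and defaults listed above, and they assume the pipeline call returns an object with an `images` attribute, as in the usage example at the top of this page.

```python
def build_anima_call_kwargs(prompt, negative_prompt=None, height=None, width=None,
                            num_inference_steps=50, num_images_per_prompt=1,
                            max_sequence_length=512, output_type="pil", generator=None):
    """Collect documented inputs into a dict (hypothetical helper).

    Optional inputs left as None are dropped so the pipeline's own defaults apply.
    """
    if output_type not in {"pil", "np", "pt"}:
        raise ValueError(f"output_type must be 'pil', 'np', or 'pt', got {output_type!r}")
    kwargs = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "num_images_per_prompt": num_images_per_prompt,
        "max_sequence_length": max_sequence_length,
        "output_type": output_type,
        "generator": generator,
    }
    return {name: value for name, value in kwargs.items() if value is not None}


def generate(pipe, seed=None, **inputs):
    """Call the pipeline with the collected inputs and return the list of images.

    torch is imported lazily, only when a seed is requested.
    """
    if seed is not None:
        import torch
        inputs["generator"] = torch.Generator().manual_seed(seed)
    return pipe(**build_anima_call_kwargs(**inputs)).images
```

With a loaded pipeline, `generate(pipe, seed=0, prompt="...", negative_prompt="...")` would return the `images` list; passing the same seed again is intended to make the run repeatable through the `generator` input.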
AnimaTextConditioner
diffusers.AnimaTextConditioner
Text conditioner used by Anima to map Qwen3 hidden states and T5 token ids to Cosmos text embeddings.
Anima reuses the Cosmos Predict2 DiT. The only model-specific conditioning module is this LLM adapter, which
cross-attends from learned T5 token embeddings to Qwen3 text encoder hidden states before the diffusion loop.
target_dim is the conditioner output dimension and must match the transformer's text_embed_dim.
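The dimension constraint can be checked before running inference. This is a minimal sketch, assuming both modules expose these values on a `config` object (`text_conditioner.config.target_dim` and `transformer.config.text_embed_dim`); the attribute locations are an assumption, and `check_text_dims` is a hypothetical helper.

```python
def check_text_dims(text_conditioner, transformer):
    """Raise if the conditioner output width does not match the transformer's text width.

    Assumption: both values live on each module's `config` object.
    """
    target_dim = text_conditioner.config.target_dim
    text_embed_dim = transformer.config.text_embed_dim
    if target_dim != text_embed_dim:
        raise ValueError(
            f"text_conditioner target_dim ({target_dim}) must match "
            f"transformer text_embed_dim ({text_embed_dim})"
        )
    return target_dim
```

A call such as `check_text_dims(pipe.text_conditioner, pipe.transformer)` right after loading would surface a mismatched checkpoint pairing early, before the mismatch shows up as a shape error inside the denoising loop.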
forward
diffusers.AnimaTextConditioner.forward(source_hidden_states: Tensor, target_input_ids: Tensor, target_attention_mask: torch.Tensor | None = None, source_attention_mask: torch.Tensor | None = None)
Source: https://github.com/huggingface/diffusers/blob/vr_13751/src/diffusers/models/condition_embedders/condition_embedder_anima.py#L285
Parameters:
source_hidden_states (torch.Tensor of shape (batch_size, source_sequence_length, source_dim)) : Qwen3 text encoder hidden states to condition on.
target_input_ids (torch.Tensor of shape (batch_size, target_sequence_length)) : T5 token ids used as learned query tokens.
target_attention_mask (torch.Tensor, optional) : Attention mask for the target T5 token ids.
source_attention_mask (torch.Tensor, optional) : Attention mask for the source Qwen3 hidden states.
Returns:
torch.Tensor
Text conditioning embeddings for the Cosmos transformer.
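The data flow described for this module (T5 token ids select learned query embeddings, which cross-attend to the Qwen3 hidden states) can be illustrated with a dependency-free toy. This is a sketch under strong simplifications, not the AnimaTextConditioner implementation: it handles one sample and one attention head, has no learned projections, requires the query and source vectors to share one width, and ignores `target_attention_mask`. The name `toy_text_conditioner` and its `embedding_table` argument are hypothetical.

```python
import math


def toy_text_conditioner(embedding_table, target_input_ids, source_hidden_states,
                         source_attention_mask=None):
    """Toy single-sample, single-head cross-attention.

    Queries are rows of `embedding_table` selected by the T5 token ids; keys and
    values are the source hidden states themselves (no projections are modeled).
    Returns (outputs, attention_weights), one row per target token.
    """
    queries = [embedding_table[token_id] for token_id in target_input_ids]
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)

    outputs, attention = [], []
    for query in queries:
        scores = []
        for position, key in enumerate(source_hidden_states):
            masked = source_attention_mask is not None and not source_attention_mask[position]
            dot = sum(q * k for q, k in zip(query, key))
            scores.append(float("-inf") if masked else scale * dot)
        # Softmax over source positions; assumes at least one position is unmasked.
        peak = max(scores)
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        weights = [value / total for value in exps]
        # The weighted sum of source hidden states gives one output row per query token.
        row = [sum(w * state[d] for w, state in zip(weights, source_hidden_states))
               for d in range(dim)]
        outputs.append(row)
        attention.append(weights)
    return outputs, attention
```

The output has one row per target token id, mirroring how the documented `forward` takes `target_input_ids` of length `target_sequence_length` and returns text conditioning embeddings. Masked source positions receive zero attention weight, which is the role `source_attention_mask` plays for padded Qwen3 tokens.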