How to use ferrotorch/sd-v1-5-generation-trajectory with Diffusers:

```
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("ferrotorch/sd-v1-5-generation-trajectory", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
ferrotorch/sd-v1-5-generation-trajectory
End-to-end SD-1.5 text-to-image generation trajectory pinned for the ferrotorch real-artifact parity harness (Phase F, #1163).
Provenance
- Upstream model: `runwayml/stable-diffusion-v1-5` (`StableDiffusionPipeline` composed from the `text_encoder/`, `unet/`, and `vae/` subfolders, plus `DDIMScheduler.from_config(pipe.scheduler.config)`).
- Conversion script: `scripts/pin_pretrained_sd_pipeline.py`.
- Ferrotorch issue: https://github.com/dollspace-gay/ferrotorch/issues/1163.
- SHA-256 of `bundle.tar` (pinned in `ferrotorch-hub/src/registry.rs`): `5fa7bd809e3aaa120a79c744801de44342a2e22ab82137cd5fe0d43302924c6e`.
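The pin can be checked locally before unpacking. A minimal sketch (the helper name is illustrative; the filename and expected digest come from the registry entry above):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()

# Compare against the digest pinned in ferrotorch-hub/src/registry.rs:
# assert sha256_of("bundle.tar") == (
#     "5fa7bd809e3aaa120a79c744801de44342a2e22ab82137cd5fe0d43302924c6e")
```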
Files
- `cond_embeds.bin`: `[1, 77, 768]` f32 CLIP text embedding of `PROMPT = "a photograph of an astronaut riding a horse"`.
- `uncond_embeds.bin`: `[1, 77, 768]` f32 CLIP text embedding of the empty negative prompt.
- `init_latent.bin`: `[1, 4, 64, 64]` f32 Gaussian noise drawn via `torch.Generator(device='cpu').manual_seed(42)` `randn`. The rust pipeline reads this file directly because the rust PRNG (`rand::StdRng`) does not match `torch.Generator`.
- `final_image.bin`: `[1, 3, 512, 512]` f32 decoded image in `[-1, 1]` from `pipe.vae.decode(latent / 0.18215).sample`.
- `step_K_noise_pred_uncond.bin`: `[1, 4, 64, 64]` f32 UNet forward pass with the unconditional embedding, for `K=0..3`.
- `step_K_noise_pred_cond.bin`: same, but with the conditional embedding.
- `step_K_guided_noise.bin`: `noise_uncond + 7.5 * (noise_cond - noise_uncond)`.
- `step_K_latent_after.bin`: latent after the scheduler step, i.e. the input to step `K+1` (or the VAE for the final step).
- `meta.json`: prompt, negative prompt, seed, step count, guidance scale, and the exact timestep list.
- `bundle.tar`: single-file convenience archive carrying every fixture above (so the registry pin has one SHA-256 to track).
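Assuming the `.bin` fixtures are raw little-endian float32 dumps in the shapes listed above (the authoritative layout is whatever `scripts/pin_pretrained_sd_pipeline.py` writes, so treat this as a sketch), one can be loaded with numpy:

```python
import numpy as np

def load_fixture(path, shape):
    """Load a fixture assumed to be raw little-endian float32 values."""
    data = np.fromfile(path, dtype="<f4")
    expected = int(np.prod(shape))
    if data.size != expected:
        raise ValueError(f"{path}: got {data.size} floats, expected {expected}")
    return data.reshape(shape)

# e.g. cond = load_fixture("cond_embeds.bin", (1, 77, 768))
```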
Settings
- `prompt = "a photograph of an astronaut riding a horse"`
- `negative_prompt = ""`
- `seed = 42`
- `num_inference_steps = 4`
- `guidance_scale = 7.5`
- `scheduler = DDIMScheduler(scaled_linear, beta_start=0.00085, beta_end=0.012, clip_sample=False, set_alpha_to_one=False, prediction_type="epsilon", timestep_spacing="leading", steps_offset=1)`
- `timesteps = [751, 501, 251, 1]`
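The timestep list follows from the DDIM "leading" spacing with `steps_offset=1`. A sketch of the arithmetic (the helper name is illustrative; the formula mirrors what diffusers' `DDIMScheduler.set_timesteps` computes for these settings):

```python
import numpy as np

def ddim_leading_timesteps(num_train_timesteps=1000, num_inference_steps=4, steps_offset=1):
    # "leading" spacing: multiples of the step ratio, reversed, shifted by steps_offset
    step_ratio = num_train_timesteps // num_inference_steps
    steps = (np.arange(num_inference_steps) * step_ratio).round()[::-1].astype(int)
    return (steps + steps_offset).tolist()

print(ddim_leading_timesteps())  # [751, 501, 251, 1]
```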
How the rust side consumes this
The rust dump example `ferrotorch-diffusion/examples/sd_pipeline_dump.rs` loads the three sub-models from `ferrotorch/sd-v1-5-{clip-text-encoder,unet,vae-decoder}`, and loads `init_latent.bin` and the two text embeddings from this mirror (so the rust-vs-torch PRNG mismatch and the absence of a tokenizer are routed around). It then runs the same 4-step CFG loop with a rust `DDIMScheduler` whose constants mirror diffusers byte-for-byte, and dumps the equivalent intermediates. The python harness `scripts/verify_sd_pipeline_inference.py` then compares each rust intermediate against the corresponding file shipped here, with per-stage tolerances.
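The CFG combine and the per-stage comparison can be sketched as follows (the function names and the tolerance value are illustrative; the real per-stage tolerances live in the verification harness):

```python
import numpy as np

def guided_noise(noise_uncond, noise_cond, guidance_scale=7.5):
    """CFG combine matching the step_K_guided_noise fixtures."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

def compare_stage(rust_arr, ref_arr, atol):
    """Max-abs-difference check with one tolerance per stage."""
    return float(np.max(np.abs(rust_arr - ref_arr))) <= atol

rng = np.random.default_rng(42)
uncond = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)
cond = rng.standard_normal((1, 4, 64, 64)).astype(np.float32)
guided = guided_noise(uncond, cond)
assert guided.shape == (1, 4, 64, 64)
```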
Upstream license
Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. This pipeline-trajectory bundle inherits that license; see https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE for the full terms.