ferrotorch/sd-v1-5-generation-trajectory

End-to-end SD-1.5 text-to-image generation trajectory pinned for the ferrotorch real-artifact parity harness (Phase F, #1163).

Provenance

Files

  • cond_embeds.bin — [1, 77, 768] f32 CLIP text embedding of PROMPT = "a photograph of an astronaut riding a horse".
  • uncond_embeds.bin — [1, 77, 768] f32 CLIP text embedding of the empty negative prompt.
  • init_latent.bin — [1, 4, 64, 64] f32 Gaussian noise drawn with torch.randn under torch.Generator(device='cpu').manual_seed(42). The rust pipeline reads this file directly because the rust PRNG (rand::StdRng) does not match torch.Generator.
  • final_image.bin — [1, 3, 512, 512] f32 decoded image in [-1, 1] from pipe.vae.decode(latent / 0.18215).sample.
  • step_K_noise_pred_uncond.bin — [1, 4, 64, 64] f32 UNet noise prediction with the unconditional embedding, for K = 0..3.
  • step_K_noise_pred_cond.bin — same, but with the conditional embedding.
  • step_K_guided_noise.bin — noise_uncond + 7.5 * (noise_cond - noise_uncond).
  • step_K_latent_after.bin — latent after the scheduler step, i.e. the input to step K+1 (or to the VAE for the final step).
  • meta.json — prompt, negative prompt, seed, step count, guidance scale, and the exact timestep list.
  • bundle.tar — single-file convenience archive carrying every fixture above, so the registry pin tracks one SHA-256.
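As a minimal sketch (assuming numpy; the shapes and the CFG formula come from the list above, and the file paths shown in comments are illustrative), the raw f32 dumps can be loaded and the guided-noise combination re-checked:

```python
import numpy as np

def load_bin(path, shape):
    """Load a raw little-endian f32 dump and reshape it."""
    return np.fromfile(path, dtype=np.float32).reshape(shape)

def guided_noise(uncond, cond, guidance_scale=7.5):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return uncond + guidance_scale * (cond - uncond)

# Illustrative usage against the step-0 fixtures:
# uncond = load_bin("step_0_noise_pred_uncond.bin", (1, 4, 64, 64))
# cond   = load_bin("step_0_noise_pred_cond.bin",   (1, 4, 64, 64))
# ref    = load_bin("step_0_guided_noise.bin",      (1, 4, 64, 64))
# np.testing.assert_allclose(guided_noise(uncond, cond), ref, atol=1e-6)
```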

Settings

  • prompt = "a photograph of an astronaut riding a horse"
  • negative_prompt = ""
  • seed = 42
  • num_inference_steps = 4
  • guidance_scale = 7.5
  • scheduler = DDIMScheduler (scaled_linear, beta_start=0.00085, beta_end=0.012, clip_sample=False, set_alpha_to_one=False, prediction_type="epsilon", timestep_spacing="leading", steps_offset=1)
  • timesteps = [751, 501, 251, 1]
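The timestep list follows from the settings above. A sketch of the "leading" spacing rule as diffusers implements it for DDIM (1000 training timesteps assumed, which is the SD-1.5 default):

```python
import numpy as np

num_train_timesteps = 1000   # SD-1.5 default
num_inference_steps = 4
steps_offset = 1

# "leading" spacing: evenly spaced from 0, reversed, then shifted by steps_offset
step_ratio = num_train_timesteps // num_inference_steps              # 250
timesteps = (np.arange(0, num_inference_steps) * step_ratio)[::-1].astype(np.int64)
timesteps = timesteps + steps_offset

print(timesteps.tolist())  # [751, 501, 251, 1]
```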

How the rust side consumes this

The rust dump example ferrotorch-diffusion/examples/sd_pipeline_dump.rs loads the three sub-models from ferrotorch/sd-v1-5-{clip-text-encoder,unet,vae-decoder}. It reads init_latent.bin and the two text embeddings from this mirror, which routes around both the rust↔torch PRNG mismatch and the absence of a tokenizer on the rust side. It then runs the same 4-step CFG loop with a rust DDIMScheduler whose constants mirror diffusers byte-for-byte, and dumps the equivalent intermediates. The python harness scripts/verify_sd_pipeline_inference.py compares each rust intermediate against the corresponding file shipped here, applying per-stage tolerances.
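The comparison step can be sketched as follows. This is not the actual verify_sd_pipeline_inference.py; the stage names mirror the fixture files above, and the tolerance values are hypothetical placeholders:

```python
import numpy as np

# Hypothetical per-stage absolute tolerances (illustrative values only).
STAGES = {
    "step_0_noise_pred_uncond.bin": 1e-3,
    "step_0_guided_noise.bin": 1e-3,
    "final_image.bin": 5e-3,
}

def compare(rust_dir, ref_dir, stages=STAGES):
    """Compare each rust-dumped intermediate against the pinned reference dump."""
    for name, atol in stages.items():
        rust = np.fromfile(f"{rust_dir}/{name}", dtype=np.float32)
        ref = np.fromfile(f"{ref_dir}/{name}", dtype=np.float32)
        max_err = np.abs(rust - ref).max()
        status = "OK" if max_err <= atol else "FAIL"
        print(f"{name}: max |diff| = {max_err:.2e} ({status})")
        assert max_err <= atol, f"{name} exceeds tolerance {atol}"
```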

Upstream license

Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. This pipeline-trajectory bundle inherits that license — see https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE for the full terms.
