ferrotorch/sd-v1-5-generation-trajectory

End-to-end SD-1.5 text-to-image generation trajectory pinned for the ferrotorch real-artifact parity harness (Phase F, #1163).

Provenance

Files

  • cond_embeds.bin — [1, 77, 768] f32 CLIP text embedding of PROMPT = "a photograph of an astronaut riding a horse".
  • uncond_embeds.bin — [1, 77, 768] f32 CLIP text embedding of the empty negative prompt.
  • init_latent.bin — [1, 4, 64, 64] f32 Gaussian noise drawn with torch.randn under torch.Generator(device='cpu').manual_seed(42). The rust pipeline reads this file directly because the rust PRNG (rand::StdRng) does not match torch.Generator.
  • final_image.bin — [1, 3, 512, 512] f32 decoded image in [-1, 1] from pipe.vae.decode(latent / 0.18215).sample.
  • step_K_noise_pred_uncond.bin — [1, 4, 64, 64] f32 UNet noise prediction with the unconditional embedding, for K = 0..3.
  • step_K_noise_pred_cond.bin — same, but with the conditional embedding.
  • step_K_guided_noise.bin — noise_uncond + 7.5 * (noise_cond - noise_uncond).
  • step_K_latent_after.bin — latent after the scheduler step, i.e. the input to step K+1 (or to the VAE for the final step).
  • meta.json — prompt, negative prompt, seed, step count, guidance scale, and the exact timestep list.
  • bundle.tar — single-file convenience archive carrying every fixture above, so the registry pin tracks one SHA-256.
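As a minimal sketch (assuming numpy; the shapes and the CFG formula come from the list above, and the file paths shown in comments are illustrative), the raw f32 dumps can be loaded and the guided-noise combination re-checked:

```python
import numpy as np

def load_bin(path, shape):
    """Load a raw little-endian f32 dump and reshape it."""
    return np.fromfile(path, dtype=np.float32).reshape(shape)

def guided_noise(uncond, cond, guidance_scale=7.5):
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return uncond + guidance_scale * (cond - uncond)

# Illustrative usage against the step-0 fixtures:
# uncond = load_bin("step_0_noise_pred_uncond.bin", (1, 4, 64, 64))
# cond   = load_bin("step_0_noise_pred_cond.bin",   (1, 4, 64, 64))
# ref    = load_bin("step_0_guided_noise.bin",      (1, 4, 64, 64))
# np.testing.assert_allclose(guided_noise(uncond, cond), ref, atol=1e-6)
```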

Settings

  • prompt = "a photograph of an astronaut riding a horse"
  • negative_prompt = ""
  • seed = 42
  • num_inference_steps = 4
  • guidance_scale = 7.5
  • scheduler = DDIMScheduler (scaled_linear, beta_start=0.00085, beta_end=0.012, clip_sample=False, set_alpha_to_one=False, prediction_type="epsilon", timestep_spacing="leading", steps_offset=1)
  • timesteps = [751, 501, 251, 1]
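The timestep list follows from the settings above. A sketch of the "leading" spacing rule as diffusers implements it for DDIM (1000 training timesteps assumed, which is the SD-1.5 default):

```python
import numpy as np

num_train_timesteps = 1000   # SD-1.5 default
num_inference_steps = 4
steps_offset = 1

# "leading" spacing: evenly spaced from 0, reversed, then shifted by steps_offset
step_ratio = num_train_timesteps // num_inference_steps              # 250
timesteps = (np.arange(0, num_inference_steps) * step_ratio)[::-1].astype(np.int64)
timesteps = timesteps + steps_offset

print(timesteps.tolist())  # [751, 501, 251, 1]
```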

How the rust side consumes this

The rust dump example ferrotorch-diffusion/examples/sd_pipeline_dump.rs loads the three sub-models from ferrotorch/sd-v1-5-{clip-text-encoder,unet,vae-decoder}. It reads init_latent.bin and the two text embeddings from this mirror, which routes around both the rust↔torch PRNG mismatch and the absence of a tokenizer on the rust side. It then runs the same 4-step CFG loop with a rust DDIMScheduler whose constants mirror diffusers byte-for-byte, and dumps the equivalent intermediates. The python harness scripts/verify_sd_pipeline_inference.py compares each rust intermediate against the corresponding file shipped here, applying per-stage tolerances.
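The comparison step can be sketched as follows. This is not the actual verify_sd_pipeline_inference.py; the stage names mirror the fixture files above, and the tolerance values are hypothetical placeholders:

```python
import numpy as np

# Hypothetical per-stage absolute tolerances (illustrative values only).
STAGES = {
    "step_0_noise_pred_uncond.bin": 1e-3,
    "step_0_guided_noise.bin": 1e-3,
    "final_image.bin": 5e-3,
}

def compare(rust_dir, ref_dir, stages=STAGES):
    """Compare each rust-dumped intermediate against the pinned reference dump."""
    for name, atol in stages.items():
        rust = np.fromfile(f"{rust_dir}/{name}", dtype=np.float32)
        ref = np.fromfile(f"{ref_dir}/{name}", dtype=np.float32)
        max_err = np.abs(rust - ref).max()
        status = "OK" if max_err <= atol else "FAIL"
        print(f"{name}: max |diff| = {max_err:.2e} ({status})")
        assert max_err <= atol, f"{name} exceeds tolerance {atol}"
```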

Upstream license

Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. This pipeline-trajectory bundle inherits that license — see https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE for the full terms.
