ControlNet — conditional diffusion 🚧 not trained yet

Steer Stable Diffusion with a structure map (edges / pose / depth).

Status: documented recipe (placeholder). A production-grade pipeline from Ropedia Academy for an advanced, GPU-heavy task. Everything below (base model, objective, dataset, config, and the exact evaluation) is specified; the weights, metrics, and figures land here once you run the notebook on a GPU and publish the results. Try the trained models live in the Ropedia demos Space.

At a glance


Base model	SD 1.5 / SDXL + a ControlNet (pretrained)
Task	structure-conditioned image generation
Training objective	Structure-conditioned generation (edges / depth / pose); inference with a pretrained ControlNet.
Track	LM · Language & multimodal
Built on	huggingface/diffusers
Notebook
Compute / storage / time	GPU required — see the Compute · storage · time table in the notebook

Dataset

Source: Your condition maps + prompts.
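
As one way to produce a condition map, the sketch below derives a binary edge map from a grayscale image using plain NumPy. It thresholds the gradient magnitude, which is a simple stand-in for a Canny detector and not necessarily what the notebook does. The function name `edge_condition_map` and the `threshold` default are illustrative assumptions.

```python
import numpy as np


def edge_condition_map(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Return a binary (0 or 255) uint8 edge map for a 2-D grayscale image.

    Assumption: thresholded gradient magnitude is used here as a simple
    stand-in for Canny edge detection.
    """
    gray = np.asarray(gray, dtype=np.float32)
    # np.gradient returns one array per axis: rows (y) first, then columns (x).
    grad_y, grad_x = np.gradient(gray)
    magnitude = np.hypot(grad_x, grad_y)
    peak = magnitude.max()
    if peak == 0:
        # A flat image has no edges.
        return np.zeros(gray.shape, dtype=np.uint8)
    edges = (magnitude / peak) >= threshold
    return edges.astype(np.uint8) * 255


def to_three_channel(edge_map: np.ndarray) -> np.ndarray:
    """Repeat a single-channel map into H x W x 3, the usual image layout."""
    return np.stack([edge_map] * 3, axis=-1)
```

Each resulting map is then paired with a text prompt to form one sample.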

Training config

GPU-scale — the notebook ships a demo profile (free Colab T4) and a full profile, with an exact Compute · storage · time table. Hyperparameters (optimizer, steps, batch, LoRA rank, …) are in the training cell.

Evaluation results

⏳ Pending: run the notebook on a GPU to fill this in. This lab reports condition fidelity (edge IoU / depth error) and CLIP score on a held-out split (see the notebook's Evaluate cell).
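
A minimal sketch of the two condition-fidelity metrics, assuming edge maps are compared as binary masks and depth error is measured as mean absolute relative error. The card does not specify the exact depth metric, so that choice and the function names are assumptions; the notebook's Evaluate cell is the reference.

```python
import numpy as np


def edge_iou(pred_edges, target_edges) -> float:
    """Intersection over union of two edge maps; any nonzero pixel is an edge."""
    pred = np.asarray(pred_edges) > 0
    target = np.asarray(target_edges) > 0
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Neither map has edges: treat as perfect agreement.
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def depth_abs_rel(pred_depth, target_depth, eps: float = 1e-6) -> float:
    """Mean absolute relative depth error over pixels with valid target depth.

    Assumption: "depth error" is taken to mean |pred - target| / target.
    """
    pred = np.asarray(pred_depth, dtype=np.float64)
    target = np.asarray(target_depth, dtype=np.float64)
    valid = target > eps  # skip pixels with missing or zero depth
    if not valid.any():
        return float("nan")
    return float(np.mean(np.abs(pred[valid] - target[valid]) / target[valid]))
```

For edge IoU, the edge map extracted from the generated image is compared against the condition map that was fed in; higher is better. For depth error, lower is better.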

Inference example

No weights are published yet. After a GPU run, load the checkpoint/adapter the notebook saves (it also has a ready inference cell). Base model: SD 1.5 / SDXL + a ControlNet (pretrained).
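
Until weights are published, inference with the pretrained base could look roughly like the sketch below, which uses the `diffusers` ControlNet pipeline for SD 1.5. The default model IDs (`stable-diffusion-v1-5/stable-diffusion-v1-5` and `lllyasviel/sd-controlnet-canny`) are assumptions, since this card does not name specific checkpoints; the helper names and default settings are likewise illustrative.

```python
def load_controlnet_pipeline(
    base_id: str = "stable-diffusion-v1-5/stable-diffusion-v1-5",  # assumed checkpoint
    controlnet_id: str = "lllyasviel/sd-controlnet-canny",  # assumed checkpoint
    device: str = "cuda",
):
    """Load SD 1.5 with a pretrained ControlNet attached."""
    # Heavy imports are kept local so this module imports without a GPU stack.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    dtype = torch.float16 if device == "cuda" else torch.float32
    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=dtype)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_id, controlnet=controlnet, torch_dtype=dtype
    )
    return pipe.to(device)


def generate(pipe, prompt, condition_image, steps=30, conditioning_scale=1.0,
             generator=None):
    """Generate one image whose structure follows `condition_image`."""
    result = pipe(
        prompt,
        image=condition_image,  # the edge / depth / pose map
        num_inference_steps=steps,
        controlnet_conditioning_scale=conditioning_scale,
        generator=generator,  # pass a seeded torch.Generator for reproducibility
    )
    return result.images[0]


# Example usage (needs a GPU, torch, diffusers, and network access):
# pipe = load_controlnet_pipeline()
# image = generate(pipe, "a brick house at sunset", condition_image)
# image.save("out.png")
```

Here `condition_image` is a PIL image of the structure map. Lowering `conditioning_scale` loosens how strictly the output follows the map.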

How to fill this repo

Open the notebook in Colab → Runtime → GPU → Run all (runs the real pipeline).
Run its Publish to the Hugging Face Hub step (or HfApi().upload_folder(...)) — the checkpoint + metrics.json + figures replace this placeholder.

[ ] train / run on a GPU · [ ] upload weights · [ ] add metrics.json · [ ] add figures · [ ] swap in the real results card
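
If you publish by hand rather than through the notebook's Publish step, the upload could be sketched as follows with `huggingface_hub`. The helper names, the pre-upload check, and the commit message are illustrative assumptions; only `metrics.json` is checked, because it is the one file name this card states.

```python
from pathlib import Path

REQUIRED_FILES = ("metrics.json",)  # the only artifact name given in this card


def missing_artifacts(folder, required=REQUIRED_FILES):
    """Return the required file names that are absent from `folder`."""
    root = Path(folder)
    return [name for name in required if not (root / name).is_file()]


def publish(folder, repo_id, token=None):
    """Upload the run's output folder to a model repo on the Hugging Face Hub."""
    missing = missing_artifacts(folder)
    if missing:
        raise FileNotFoundError(f"Missing before upload: {', '.join(missing)}")

    # Imported late so the check above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    api.create_repo(repo_id, repo_type="model", exist_ok=True)
    return api.upload_folder(
        folder_path=str(folder),
        repo_id=repo_id,
        repo_type="model",
        commit_message="Add trained weights, metrics, and figures",
    )


# Example usage (hypothetical folder and repo names; needs a write token):
# publish("outputs/controlnet_run", "your-username/your-repo")
```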

Limitations

Not yet trained — no numbers to report. The pipeline is GPU-heavy (see the compute table); on free Colab use the demo-scale settings. This is an educational, reproducible recipe, not a tuned production release.

License

Code: MIT (this repository). The base models (SD 1.5 / SDXL and the pretrained ControlNet), the huggingface/diffusers library, and the dataset are each under their own licenses; check the upstream source before redistribution.

Citation

@misc{ropedia_academy,
  title  = {Ropedia Academy: an interactive course on embodied \& spatial AI},
  author = {Ropedia Academy},
  year   = {2026},
  howpublished = {\url{https://chaoyue0307.github.io/ropedia-academy/}}
}

Method / original work: Zhang, Rao, and Agrawala, "Adding Conditional Control to Text-to-Image Diffusion Models" (ControlNet), ICCV 2023.