AtteConDA-SDE-Scratch-30K

Model Summary

AtteConDA-SDE-Scratch-30K is a checkpoint from the AtteConDA series (Attention-based Condition Disambiguation Architecture).

The AtteConDA series targets controllable image generation and synthetic data augmentation for autonomous-driving scenes using three local conditions:

  • semantic segmentation
  • depth
  • edge

SDE in the repository name denotes this Semantic-segmentation + Depth + Edge condition set.
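As an illustration of the SDE condition set, the three maps are typically supplied as per-pixel inputs aligned to one resolution. A minimal sketch of bundling them (function and argument names are hypothetical, not the project's actual API):

```python
def stack_sde_conditions(seg_map, depth_map, edge_map):
    """Bundle the three SDE local conditions for one image.

    Each argument is a 2-D list (H x W) of floats in [0, 1];
    the result is a 3 x H x W nested list ordered seg, depth, edge.
    """
    h, w = len(seg_map), len(seg_map[0])
    for m in (depth_map, edge_map):
        assert len(m) == h and len(m[0]) == w, "condition maps must share one resolution"
    return [seg_map, depth_map, edge_map]

# Tiny 2x2 example with made-up values
seg   = [[0.0, 0.5], [0.5, 1.0]]
depth = [[0.1, 0.2], [0.3, 0.4]]
edge  = [[0.0, 1.0], [1.0, 0.0]]
cond = stack_sde_conditions(seg, depth, edge)
```

In practice the segmentation input is typically a label map, the depth input a normalized depth render, and the edge input a binary edge-detection result, all resized to the training resolution.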

This repository contains the 30K-step scratch-initialized AtteConDA SDE variant.

Upstream Foundations and Provenance

This series is built on two upstream bases:

  1. Uni-ControlNet as the architectural/code reference for composable local/global control.
  2. Stable Diffusion v1.5 as the latent diffusion foundation model.

Checkpoint Status

  • Repository name: AtteConDA-SDE-Scratch-30K
  • Stage: fine-tuned for 30K steps with a scratch-initialized local-control branch
  • Condition set: SDE = semantic segmentation + depth + edge
  • PAM status: PAM is not used.

The trainable local-control branch was randomly initialized rather than using the AtteConDA UniCon initialization checkpoint. The architecture family still follows the Uni-ControlNet-style design and Stable Diffusion v1.5 backbone setting.

Files in This Repository

This repository is intended to contain:

  • model weight file(s), e.g. *.ckpt or *.safetensors
  • README.md (this model card)
  • LICENSE (repository-specific distribution notice)

Important:

  • This release ships model weights only.
  • The matching config file is not currently bundled here.
  • To run the checkpoint, use the companion project codebase and the matching config from that codebase.
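Because only weights are bundled, a quick sanity check before wiring up the companion codebase is to list the tensor keys in the weight file. A stdlib-only sketch for the `.safetensors` layout (the demo key name is illustrative, not this checkpoint's actual layout):

```python
import json
import os
import struct
import tempfile

def list_safetensors_keys(path):
    """List tensor names in a .safetensors file using only the stdlib.

    Layout: an 8-byte little-endian header length, then that many bytes
    of JSON mapping tensor names to dtype / shape / data offsets.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return sorted(k for k in header if k != "__metadata__")

# Demo on a tiny hand-built file; the key name is illustrative only.
demo_header = {"local_adapter.conv_in.weight":
               {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}
blob = json.dumps(demo_header).encode("utf-8")
path = os.path.join(tempfile.gettempdir(), "demo.safetensors")
with open(path, "wb") as f:
    f.write(struct.pack("<Q", len(blob)) + blob + b"\x00" * 4)
print(list_safetensors_keys(path))
```

For `.ckpt` files, the equivalent check requires the companion codebase's loading utilities rather than a stdlib parse.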

Training Data

The trained AtteConDA variants in this release use the following training datasets:

  • BDD10K semantic segmentation subset: 8,000 images (train 7,000 + val 1,000)
  • Cityscapes train/val: 3,475 images (train 2,975 + val 500)
  • GTA5: 24,966 images
  • nuImages (front camera subset): 18,368 images
  • BDD100K (excluding BDD10K overlap): 92,000 images

Total training images used by the trained variants: 146,809

Not used for training: Waymo, which serves only as evaluation data in this release series.
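The per-dataset counts can be cross-checked against the stated total:

```python
# Training-set sizes as listed above
train_counts = {
    "BDD10K seg subset": 8_000,
    "Cityscapes train+val": 3_475,
    "GTA5": 24_966,
    "nuImages front camera": 18_368,
    "BDD100K minus BDD10K overlap": 92_000,
}
total = sum(train_counts.values())
assert total == 146_809  # matches the reported total
```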

Evaluation Data

This release series uses a Waymo front-camera evaluation subset only for evaluation.

Evaluation-set notes:

  • Waymo images are not part of training
  • evaluation subset size: 3,048 images
  • construction policy in the project materials: front-camera images extracted from the first / middle / last positions of segments
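The stated first / middle / last extraction policy can be sketched as per-segment index selection (the frame count and 0-based indexing are assumptions, not the project's exact implementation):

```python
def pick_eval_indices(num_frames):
    """Return the first, middle, and last frame indices of one segment."""
    if num_frames < 3:
        return list(range(num_frames))
    return [0, num_frames // 2, num_frames - 1]

# e.g. a hypothetical 199-frame segment
print(pick_eval_indices(199))  # -> [0, 99, 198]
```

If exactly three frames were taken per segment, the 3,048-image subset would correspond to 1,016 segments.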

Training Procedure

  • Fine-tuning steps: 30K
  • Optimizer: AdamW
  • Learning rate: 1e-5
  • Batch size: 4
  • Resolution: 512 x 512
  • Frozen components: Stable Diffusion denoising backbone, VAE, and text encoder
  • Trainable focus: local control branch
  • Initialization difference: local control branch randomly initialized
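The freeze/train split above can be sketched as a parameter filter: only local-control-branch parameters reach the optimizer, while backbone, VAE, and text-encoder parameters are excluded. The module prefixes here are assumptions for illustration, not the project's actual names:

```python
FROZEN_PREFIXES = ("unet.", "vae.", "text_encoder.")  # assumed module names

def trainable_param_names(all_names):
    """Keep only local-control-branch parameters for the optimizer;
    the SD backbone, VAE, and text encoder stay frozen."""
    return [n for n in all_names if not n.startswith(FROZEN_PREFIXES)]

names = ["unet.down.0.weight", "vae.encoder.weight",
         "text_encoder.emb.weight", "local_adapter.conv_in.weight"]
print(trainable_param_names(names))  # -> ['local_adapter.conv_in.weight']
```

The surviving parameters would then be handed to AdamW at the listed learning rate of 1e-5.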

Common project-side generation/evaluation settings for trained variants:

  • guidance backbone family: Stable Diffusion 1.5 latent diffusion
  • conditioning family: Uni-ControlNet-style controllable diffusion design
  • inference sampler used in project evaluation: DDIM
  • DDIM steps used in project evaluation: 50
  • intended domain: autonomous-driving scene appearance modification while preserving scene structure
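For reference, the DDIM sampler named above performs a deterministic per-step update (eta = 0). A scalar sketch of one step, with `alpha_bar` denoting the cumulative noise-schedule product (values passed in would come from the actual schedule):

```python
import math

def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM step (eta = 0).

    x_t: current noisy sample, eps: predicted noise,
    alpha_bar_*: cumulative noise-schedule products.
    """
    # Predicted clean sample x0 from the noise estimate
    x0 = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise x0 toward the previous (less noisy) timestep
    return math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * eps
```

With `alpha_bar_prev = 1.0` the step returns the predicted clean sample directly; the project evaluation chains 50 such steps.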

Quantitative Results

The following quantitative results were reported for this 30K scratch-initialized variant under the project evaluation protocol:

Metric                             Value
Semantic Segmentation mIoU ↑       0.2445
Depth RMSE ↓                       40.41
Edge L1 Error ↓                    0.03759
Object Preservation F1 ↑           0.0455
Diversity (1 - MS-SSIM) ↑          0.8450
Reality (CLIP-CMMD) ↓              0.1827
Text Alignment (R-Precision@1) ↑   0.2894
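Of the metrics above, Text Alignment (R-Precision@1) is the fraction of generated images whose own prompt ranks first among candidate prompts by image-text similarity. A minimal sketch over a precomputed similarity matrix (the matrix values are illustrative, not project data):

```python
def r_precision_at_1(sim):
    """sim[i][j]: similarity of image i to prompt j; prompt i is the
    ground-truth match for image i. Returns the top-1 retrieval rate."""
    hits = sum(1 for i, row in enumerate(sim)
               if max(range(len(row)), key=row.__getitem__) == i)
    return hits / len(sim)

sim = [[0.9, 0.2, 0.1],   # image 0: correct prompt ranked first
       [0.3, 0.1, 0.8],   # image 1: wrong prompt ranked first
       [0.2, 0.1, 0.7]]   # image 2: correct prompt ranked first
print(r_precision_at_1(sim))  # -> 0.6666666666666666
```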

Intended Use

This repository is intended for:

  • research on controllable diffusion models
  • research on multi-condition generation
  • research on synthetic data augmentation for autonomous-driving perception and reasoning tasks
  • ablation studies on initialization, training steps, and PAM effects
  • reproducible comparison across AtteConDA variants

Out-of-Scope Use

This repository is not intended for:

  • commercial deployment
  • customer-facing or production systems
  • safety-critical decision making
  • real-world vehicle operation or vehicle assistance
  • any use that violates upstream model terms or dataset terms

Known Limitations

Known limitations of this release family include:

  • possible structural failures on small distant objects
  • possible distortion or disappearance of vehicles, traffic signs, or thin structures in difficult regions
  • possible imperfect preservation of text on signboards
  • evaluation is based on external projection models rather than full human relabeling
  • not yet a guarantee of downstream task improvement for every autonomous-driving task
  • current resolution and backbone scale may limit very fine-grained detail preservation

Bias, Domain Shift, and Generalization Notes

These checkpoints are trained on a mixture of road-scene datasets and should be treated as domain-dependent research artifacts. They may reflect:

  • geographic bias
  • weather / time imbalance
  • dataset-specific annotation conventions
  • camera viewpoint bias
  • urban-scene category bias

Generalization outside the project setting must not be assumed.

Licensing and Use Restrictions

Do not label this repository as MIT.

Why:

  • the Uni-ControlNet code repository is MIT-licensed, but
  • this checkpoint family is built on Stable Diffusion v1.5 and
  • Stable Diffusion v1.5 derivatives carry CreativeML Open RAIL-M obligations, while
  • multiple training datasets in this project are distributed under non-commercial and/or research-oriented terms.

Accordingly, this repository uses:

  • license: other in the Hugging Face metadata
  • a repository-root LICENSE file named AtteConDA Research-Only License

Practical summary:

  • non-commercial research, teaching, scientific publication, and personal experimentation only
  • preserve repository notices
  • do not relax restrictions when redistributing
  • comply with the upstream Stable Diffusion and dataset terms as well

Citation

If you use this repository, please cite the AtteConDA work and the upstream bases.

AtteConDA / thesis-level citation

@misc{noguchi2026atteconda,
  author = {Shogo Noguchi},
  title = {A Synthetic Data Augmentation Framework Based on a Multi-Condition Diffusion Model with an Attention Mechanism for Suppressing Condition Conflicts},
  year = {2026},
  note = {Bachelor thesis, Gunma University. Original title in Japanese: 条件競合を抑制する注意機構に基づく多条件拡散モデルによる合成データ拡張フレームワーク}
}

Upstream references

@inproceedings{zhao2023unicontrolnet,
  title={Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models},
  author={Zhao, Shihao and others},
  booktitle={NeurIPS},
  year={2023}
}

@inproceedings{rombach2022high,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
  booktitle={CVPR},
  year={2022}
}

Acknowledgements

This repository acknowledges the upstream foundations and datasets used in the AtteConDA project:

  • Uni-ControlNet
  • Stable Diffusion v1.5
  • BDD10K / BDD100K
  • Cityscapes
  • GTA5 (Playing for Data)
  • nuImages

Waymo is acknowledged as an evaluation dataset only for this release series and was not used for training.

Release Notes

This model card was written conservatively to avoid over-claiming. If you later publish exact benchmark tables, official project URLs, or bundled configs, update this card accordingly.
