AtteConDA-SDE-Scratch-30K
Model Summary
AtteConDA-SDE-Scratch-30K is a checkpoint from the AtteConDA series (Attention-based Condition Disambiguation Architecture).
The AtteConDA series targets controllable image generation and synthetic data augmentation for autonomous-driving scenes using three local conditions:
- semantic segmentation
- depth
- edge
SDE in the repository name denotes this Semantic-segmentation + Depth + Edge condition set.
This repository contains the 30K-step scratch-initialized AtteConDA SDE variant.
Upstream Foundations and Provenance
This series is built on two upstream bases:
- Uni-ControlNet as the architectural/code reference for composable local/global control.
- Stable Diffusion v1.5 as the latent diffusion foundation model.
Repository / upstream references:
- Uni-ControlNet: https://github.com/ShihaoZhaoZSH/Uni-ControlNet
- Stable Diffusion v1.5: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
- Cityscapes: https://www.cityscapes-dataset.com/
- GTA5 (Playing for Data): https://download.visinf.tu-darmstadt.de/data/from_games/
- nuImages: https://www.nuscenes.org/nuimages
- BDD100K / BDD10K portal: https://bdd-data.berkeley.edu/
Checkpoint Status
- Repository name: AtteConDA-SDE-Scratch-30K
- Stage: Fine-tuned scratch-initialized variant
- Condition set: SDE = semantic segmentation + depth + edge
- PAM status: PAM is not used.
The trainable local-control branch was randomly initialized rather than using the AtteConDA UniCon initialization checkpoint. The architecture family still follows the Uni-ControlNet-style design and Stable Diffusion v1.5 backbone setting.
Files in This Repository
This repository is intended to contain:
- model weight file(s), e.g. *.ckpt or *.safetensors
- README.md (this model card)
- LICENSE (repository-specific distribution notice)
Important:
- This is a weights-focused release.
- The matching config file is not bundled here at the moment.
- To run the checkpoint, use the companion project codebase and the matching config from that codebase.
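As a quick sanity check after downloading the repository, the file layout described above can be inspected with a few lines of Python. This is a minimal sketch, not part of the project codebase: the helper names `find_weight_files` and `has_bundled_config` are hypothetical, and treating `*.yaml` / `*.yml` as the config format is an assumption based on common practice for Stable Diffusion style codebases.

```python
from pathlib import Path

# Weight-file extensions named in this model card.
WEIGHT_SUFFIXES = (".ckpt", ".safetensors")
# Assumption: a bundled config, if present, would be a YAML file.
CONFIG_SUFFIXES = (".yaml", ".yml")


def find_weight_files(repo_dir):
    """Return weight files located directly under repo_dir, sorted by name."""
    root = Path(repo_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in WEIGHT_SUFFIXES
    )


def has_bundled_config(repo_dir):
    """Return True if a YAML config file sits next to the weights."""
    root = Path(repo_dir)
    return any(
        p.is_file() and p.suffix.lower() in CONFIG_SUFFIXES
        for p in root.iterdir()
    )
```

If `has_bundled_config` returns `False`, which is the expected state for this release, the matching config has to come from the companion project codebase.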
Training Data
The trained AtteConDA variants in this release use the following training datasets:
- BDD10K semantic segmentation subset: 8,000 images (train 7,000 + val 1,000)
- Cityscapes train/val: 3,475 images (train 2,975 + val 500)
- GTA5: 24,966 images
- nuImages (front camera subset): 18,368 images
- BDD100K (excluding BDD10K overlap): 92,000 images
Total training images used by the trained variants: 146,809
Not used for training: Waymo
Waymo is used only for evaluation in this release series.
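The per-dataset counts above can be cross-checked against the stated total with simple arithmetic. The sketch below only restates numbers from this card; the dictionary name `TRAINING_SETS` is illustrative.

```python
# Image counts copied from the training-data list in this model card.
TRAINING_SETS = {
    "BDD10K semantic segmentation subset": 7_000 + 1_000,  # train + val
    "Cityscapes train/val": 2_975 + 500,                   # train + val
    "GTA5": 24_966,
    "nuImages (front camera subset)": 18_368,
    "BDD100K (excluding BDD10K overlap)": 92_000,
}

total = sum(TRAINING_SETS.values())

# Fraction of the training mixture contributed by each dataset.
shares = {name: count / total for name, count in TRAINING_SETS.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"total: {total:,}")
```

The sum matches the 146,809 images reported above, and the shares show that BDD100K alone accounts for more than 60% of the mixture, which is relevant to the bias notes later in this card.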
Evaluation Data
This release series is evaluated on a Waymo front-camera subset.
Evaluation-set notes:
- Waymo images are not part of training
- evaluation subset size: 3,048 images
- construction policy in the project materials: front-camera images extracted from the first / middle / last positions of segments
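The first / middle / last selection policy can be illustrated as follows. This is a sketch under stated assumptions, not the project's extraction script: the function name `first_middle_last` is hypothetical, and "middle" is taken to mean index `len(frames) // 2`, which the project materials may define differently.

```python
def first_middle_last(frames):
    """Return the first, middle, and last items of a segment's frame list.

    Short segments (one or two frames) yield fewer than three items because
    repeated indices are dropped.
    """
    if not frames:
        return []
    indices = [0, len(frames) // 2, len(frames) - 1]
    # dict.fromkeys keeps the original order while removing duplicate indices.
    return [frames[i] for i in dict.fromkeys(indices)]


# Example with a hypothetical five-frame segment.
print(first_middle_last(["f0", "f1", "f2", "f3", "f4"]))
```

Note that 3,048 = 3 x 1,016, which would be consistent with three frames taken from each of 1,016 segments; the number of segments is not stated in this card, so treat that reading as an inference.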
Training Procedure
- Fine-tuning steps: 30K
- Optimizer: AdamW
- Learning rate: 1e-5
- Batch size: 4
- Resolution: 512 x 512
- Frozen components: Stable Diffusion denoising backbone, VAE, and text encoder
- Trainable focus: local control branch
- Initialization difference: local control branch randomly initialized
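To put the 30K-step budget in context, the hyperparameters above can be combined with the training-set size reported in this card. The sketch assumes that the batch size of 4 is the effective global batch per optimizer step (no gradient accumulation or multi-device scaling), which this card does not confirm; the class name `TrainingSetup` is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hyperparameters as listed in this model card."""
    steps: int = 30_000
    batch_size: int = 4
    learning_rate: float = 1e-5
    resolution: int = 512
    optimizer: str = "AdamW"

    @property
    def samples_seen(self):
        # Assumption: one optimizer step consumes exactly one batch.
        return self.steps * self.batch_size


setup = TrainingSetup()
TOTAL_TRAINING_IMAGES = 146_809  # total reported in this card

approx_epochs = setup.samples_seen / TOTAL_TRAINING_IMAGES
print(f"samples seen: {setup.samples_seen:,}")
print(f"approximate passes over the training set: {approx_epochs:.2f}")
```

Under that assumption the run sees 120,000 samples, which is less than one full pass over the training mixture. That is worth keeping in mind when comparing this scratch-initialized checkpoint with other AtteConDA variants.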
Common project-side generation/evaluation settings for trained variants:
- guidance backbone family: Stable Diffusion 1.5 latent diffusion
- conditioning family: Uni-ControlNet-style controllable diffusion design
- inference sampler used in project evaluation: DDIM
- DDIM steps used in project evaluation: 50
- intended domain: autonomous-driving scene appearance modification while preserving scene structure
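For readers unfamiliar with DDIM, the 50-step setting means that only a subset of the diffusion timesteps is visited at inference time. The sketch below shows uniform timestep selection, assuming the 1000 training timesteps used by Stable Diffusion v1.5; the function name is hypothetical, and the exact offset convention (for example, whether 1 is added to each index) depends on the sampler implementation in the project codebase.

```python
def uniform_ddim_timesteps(num_train_timesteps=1000, num_ddim_steps=50):
    """Return evenly spaced timesteps in ascending order.

    Sampling visits them in reverse, from the noisiest timestep down to 0.
    """
    stride = num_train_timesteps // num_ddim_steps
    return list(range(0, num_train_timesteps, stride))[:num_ddim_steps]


timesteps = uniform_ddim_timesteps()
print(len(timesteps), timesteps[:3], timesteps[-1])
```

With these defaults the stride is 20, so each DDIM step skips over 20 training timesteps.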
Quantitative Results
The following quantitative results were reported for this 30K scratch-initialized variant under the project evaluation protocol:
| Metric | Value |
|---|---|
| Semantic Segmentation mIoU ↑ | 0.2445 |
| Depth RMSE ↓ | 40.41 |
| Edge L1 Error ↓ | 0.03759 |
| Object Preservation F1 ↑ | 0.0455 |
| Diversity (1 - MS-SSIM) ↑ | 0.8450 |
| Reality (CLIP-CMMD) ↓ | 0.1827 |
| Text Alignment (R-Precision@1) ↑ | 0.2894 |
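For orientation, three of the structure-preservation metrics in the table have simple textbook definitions, sketched below on flat lists of values. These are generic formulas, not the project's evaluation code: the project protocol re-extracts conditions from generated images with external models, and details such as class sets, depth units, and normalization are not specified in this card.

```python
import math


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both label maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def rmse(pred, target):
    """Root-mean-square error between two equally sized value lists."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def l1_error(pred, target):
    """Mean absolute error between two equally sized value lists."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)
```

Higher mIoU and lower RMSE / L1 indicate that the generated image stays closer to the input segmentation, depth, and edge conditions, matching the arrows in the table.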
Intended Use
This repository is intended for:
- research on controllable diffusion models
- research on multi-condition generation
- research on synthetic data augmentation for autonomous-driving perception and reasoning tasks
- ablation studies on initialization, training steps, and PAM effects
- reproducible comparison across AtteConDA variants
Out-of-Scope Use
This repository is not intended for:
- commercial deployment
- customer-facing or production systems
- safety-critical decision making
- real-world vehicle operation or vehicle assistance
- any use that violates upstream model terms or dataset terms
Known Limitations
Known limitations of this release family include:
- possible structural failures on small distant objects
- possible distortion or disappearance of vehicles, traffic signs, or thin structures in difficult regions
- possible imperfect preservation of text on signboards
- evaluation is based on external projection models rather than full human relabeling
- no guarantee of downstream-task improvement for every autonomous-driving task
- current resolution and backbone scale may limit very fine-grained detail preservation
Bias, Domain Shift, and Generalization Notes
These checkpoints are trained on a mixture of road-scene datasets and should be treated as domain-dependent research artifacts. They may reflect:
- geographic bias
- weather / time imbalance
- dataset-specific annotation conventions
- camera viewpoint bias
- urban-scene category bias
Generalization outside the project setting must not be assumed.
Licensing and Use Restrictions
Do not label this repository as MIT.
Why:
- the Uni-ControlNet code repository is MIT-licensed, but
- this checkpoint family is built on Stable Diffusion v1.5 and
- Stable Diffusion v1.5 derivatives carry CreativeML Open RAIL-M obligations, while
- multiple training datasets in this project are distributed under non-commercial and/or research-oriented terms.
Accordingly, this repository uses:
- license: other in the Hugging Face metadata
- a repository-root LICENSE file named AtteConDA Research-Only License
Practical summary:
- non-commercial research, teaching, scientific publication, and personal experimentation only
- preserve repository notices
- do not relax restrictions when redistributing
- comply with the upstream Stable Diffusion and dataset terms as well
Citation
If you use this repository, please cite the AtteConDA work and the upstream bases.
AtteConDA / thesis-level citation
@misc{noguchi2026atteconda,
author = {Shogo Noguchi},
title = {条件競合を抑制する注意機構に基づく多条件拡散モデルによる合成データ拡張フレームワーク},
year = {2026},
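note = {Bachelor thesis, Gunma University. Unofficial English rendering of the title: A Synthetic Data Augmentation Framework Using a Multi-Condition Diffusion Model Based on an Attention Mechanism That Suppresses Condition Conflicts}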
}
Upstream references
@inproceedings{zhao2023unicontrolnet,
title={Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models},
author={Zhao, Shihao and others},
booktitle={NeurIPS},
year={2023}
}
@inproceedings{rombach2022high,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bjorn},
booktitle={CVPR},
year={2022}
}
Acknowledgements
This repository acknowledges the upstream foundations and datasets used in the AtteConDA project:
- Uni-ControlNet
- Stable Diffusion v1.5
- BDD10K / BDD100K
- Cityscapes
- GTA5 (Playing for Data)
- nuImages
Waymo is acknowledged as an evaluation dataset only for this release series and was not used for training.
Release Notes
This model card was written conservatively to avoid over-claiming. If you later publish exact benchmark tables, official project URLs, or bundled configs, update this card accordingly.