AtteConDA-SDE-UniCon-Init
Model Summary
AtteConDA-SDE-UniCon-Init is a checkpoint from the AtteConDA series (Attention-based Condition Disambiguation Architecture).
The AtteConDA series targets controllable image generation and synthetic data augmentation for autonomous-driving scenes using three local conditions:
- semantic segmentation
- depth
- edge
SDE in the repository name denotes this Semantic-segmentation + Depth + Edge condition set.
This repository contains the initialization checkpoint used before task-specific fine-tuning. It is the neutral starting point in the AtteConDA SDE-UniCon line.
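The three SDE conditions are typically produced by off-the-shelf extractors (a segmentation network, a depth estimator, and an edge detector). As an illustration of the simplest of the three, the sketch below computes a binary edge map with plain NumPy Sobel gradients; this is not the project's actual edge extractor, and the function name and threshold are illustrative assumptions.

```python
import numpy as np

def sobel_edges(gray: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Binary edge map from a [0, 1] grayscale image via Sobel gradients.
    Illustrative only; the AtteConDA pipeline's real extractor may differ."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(window * kx)
            gy[i, j] = np.sum(window * ky)
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold * magnitude.max()).astype(np.uint8)

# A synthetic image with a vertical brightness step yields a vertical edge band.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

The segmentation and depth conditions would come from pretrained perception models and are not sketched here.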
Upstream Foundations and Provenance
This series is built on two upstream bases:
- Uni-ControlNet as the architectural/code reference for composable local/global control.
- Stable Diffusion v1.5 as the latent diffusion foundation model.
Repository / upstream references:
- Uni-ControlNet: https://github.com/ShihaoZhaoZSH/Uni-ControlNet
- Stable Diffusion v1.5: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
- Cityscapes: https://www.cityscapes-dataset.com/
- GTA5 (Playing for Data): https://download.visinf.tu-darmstadt.de/data/from_games/
- nuImages: https://www.nuscenes.org/nuimages
- BDD100K / BDD10K portal: https://bdd-data.berkeley.edu/
Checkpoint Status
- Repository name: AtteConDA-SDE-UniCon-Init
- Stage: initialization-only checkpoint (pre-fine-tuning / epoch 0)
- Condition set: SDE = semantic segmentation + depth + edge
- PAM status: PAM is not present in this checkpoint.
This checkpoint was produced by mapping Stable Diffusion v1.5 weights into a Uni-ControlNet-style local/global adapter structure before downstream task-specific training. No autonomous-driving fine-tuning was performed for this repository.
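The mapping step can be sketched as copying base-model weights into the control branch by key prefix, mirroring the ControlNet-style convention of initializing the adapter as a clone of the base UNet encoder. The key names and prefixes below are hypothetical, for illustration only; the actual mapping lives in the companion codebase.

```python
import numpy as np

def init_adapter_from_base(base_state: dict, src_prefix: str, dst_prefix: str) -> dict:
    """Copy weights whose keys start with src_prefix into a new state dict
    under dst_prefix, so the control branch starts as a clone of the
    corresponding base-model blocks (ControlNet-style initialization)."""
    adapter_state = {}
    for key, tensor in base_state.items():
        if key.startswith(src_prefix):
            adapter_state[dst_prefix + key[len(src_prefix):]] = tensor.copy()
    return adapter_state

# Hypothetical key names, for illustration only.
base = {
    "model.diffusion_model.input_blocks.0.weight": np.ones((4, 4)),
    "model.diffusion_model.out.weight": np.zeros((2, 2)),
}
adapter = init_adapter_from_base(
    base, "model.diffusion_model.input_blocks.", "local_adapter.input_blocks."
)
```

Only encoder-side blocks are cloned in this sketch; output-layer weights stay untouched, as in typical control-adapter initializations.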
Files in This Repository
This repository is intended to contain:
- model weight file(s), e.g.
  *.ckpt or *.safetensors
- README.md (this model card)
- LICENSE (repository-specific distribution notice)
Important:
- This release distributes model weights only.
- The matching config file is not currently bundled here.
- To run the checkpoint, use the companion project codebase and the matching config from that codebase.
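Since no config is bundled, actual inference should go through the companion codebase. For a quick structural sanity check of a downloaded weight file, the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, and shapes) can be read with the standard library alone, without loading any weights. The demo file written below is a stand-in, not this repository's checkpoint.

```python
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Read only the JSON header of a .safetensors file (stdlib only),
    listing tensor names, dtypes, and shapes without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata block
    return header

# Build a tiny valid file on disk just to demonstrate the reader.
demo_header = {
    "adapter.weight": {"dtype": "F32", "shape": [2, 2],
                       "data_offsets": [0, 16]},
}
payload = json.dumps(demo_header).encode()
with open("demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(payload)))
    f.write(payload)
    f.write(b"\x00" * 16)  # 2x2 float32 tensor payload

info = read_safetensors_header("demo.safetensors")
```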
Training Data
This checkpoint itself was not fine-tuned on the autonomous-driving datasets listed below.
The trained AtteConDA variants in this release use the following training datasets:
- BDD10K semantic segmentation subset: 8,000 images (train 7,000 + val 1,000)
- Cityscapes train/val: 3,475 images (train 2,975 + val 500)
- GTA5: 24,966 images
- nuImages (front camera subset): 18,368 images
- BDD100K (excluding BDD10K overlap): 92,000 images
Total training images used by the trained variants: 146,809
Not used for training: Waymo
Waymo is used only for evaluation in this release series.
For strict accuracy, do not describe this repository as “trained on all datasets above”. It is an initialization-only checkpoint.
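The per-dataset counts above sum to the stated total, which can be checked directly (again: these datasets apply to the trained variants, not to this Init checkpoint):

```python
# Per-dataset training image counts for the trained AtteConDA variants,
# as listed in this model card.
train_counts = {
    "BDD10K seg subset": 8_000,
    "Cityscapes train/val": 3_475,
    "GTA5": 24_966,
    "nuImages front-camera": 18_368,
    "BDD100K (minus BDD10K overlap)": 92_000,
}
total = sum(train_counts.values())
```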
Evaluation Data
This release series uses a Waymo front-camera evaluation subset only for evaluation.
Evaluation-set notes:
- Waymo images are not part of training
- evaluation subset size: 3,048 images
- construction policy in the project materials: front-camera images extracted from the first / middle / last positions of segments
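The stated first/middle/last policy can be sketched as below. The exact indexing convention (in particular how "middle" is rounded, and how segments shorter than three frames are handled) is an assumption here, not a project specification.

```python
def select_eval_frames(segment_frames: list) -> list:
    """Pick the first, middle, and last frames of one segment, following
    the stated first/middle/last extraction policy. Rounding convention
    for 'middle' is an assumption."""
    n = len(segment_frames)
    if n == 0:
        return []
    picks = {0, n // 2, n - 1}  # a set de-duplicates very short segments
    return [segment_frames[i] for i in sorted(picks)]

# A Waymo segment of ~199 front-camera frames yields three picks.
frames = [f"frame_{i:03d}.jpg" for i in range(199)]
selected = select_eval_frames(frames)
```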
Training Procedure
- Foundation checkpoint source: Stable Diffusion v1.5
- Architecture family: Uni-ControlNet-style local/global control design
- Purpose of this checkpoint: provide a reproducible pre-fine-tuning starting point
- Fine-tuning steps applied in this repo: 0
- This repository currently distributes weights only
- Matching config files are not bundled in this repo at the moment
Common project-side generation/evaluation settings for trained variants:
- guidance backbone family: Stable Diffusion 1.5 latent diffusion
- conditioning family: Uni-ControlNet-style controllable diffusion design
- inference sampler used in project evaluation: DDIM
- DDIM steps used in project evaluation: 50
- intended domain: autonomous-driving scene appearance modification while preserving scene structure
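With DDIM at 50 steps, the sampler visits a uniformly strided, descending subset of the 1,000 training timesteps. The sketch below computes one common version of that schedule; the project's exact scheduler configuration (stride convention, offset) is not specified in this card, so treat this as an assumption.

```python
import numpy as np

def ddim_timesteps(num_train_timesteps: int = 1000,
                   num_inference_steps: int = 50) -> np.ndarray:
    """Uniformly strided subset of training timesteps, descending,
    as used by a standard DDIM sampler schedule."""
    stride = num_train_timesteps // num_inference_steps
    return (np.arange(0, num_inference_steps) * stride)[::-1]

steps = ddim_timesteps(1000, 50)
```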
Quantitative Results
No task-specific benchmark metrics are reported for this repository because it is the pre-fine-tuning initialization checkpoint.
Intended Use
This repository is intended for:
- research on controllable diffusion models
- research on multi-condition generation
- research on synthetic data augmentation for autonomous-driving perception and reasoning tasks
- ablation studies on initialization, training steps, and PAM effects
- reproducible comparison across AtteConDA variants
Out-of-Scope Use
This repository is not intended for:
- commercial deployment
- customer-facing or production systems
- safety-critical decision making
- real-world vehicle operation or vehicle assistance
- any use that violates upstream model terms or dataset terms
Known Limitations
Known limitations of this release family include:
- possible structural failures on small distant objects
- possible distortion or disappearance of vehicles, traffic signs, or thin structures in difficult regions
- possible imperfect preservation of text on signboards
- evaluation is based on external projection models rather than full human relabeling
- no guarantee yet of downstream task improvement for every autonomous-driving task
- current resolution and backbone scale may limit very fine-grained detail preservation
Bias, Domain Shift, and Generalization Notes
These checkpoints are trained on a mixture of road-scene datasets and should be treated as domain-dependent research artifacts. They may reflect:
- geographic bias
- weather / time imbalance
- dataset-specific annotation conventions
- camera viewpoint bias
- urban-scene category bias
Generalization outside the project setting must not be assumed.
Licensing and Use Restrictions
Do not label this repository as MIT.
Why:
- the Uni-ControlNet code repository is MIT-licensed, but
- this checkpoint family is built on Stable Diffusion v1.5 and
- Stable Diffusion v1.5 derivatives carry CreativeML Open RAIL-M obligations, while
- multiple training datasets in this project are distributed under non-commercial and/or research-oriented terms.
Accordingly, this repository uses:
- license: other in the Hugging Face metadata
- a repository-root LICENSE file named AtteConDA Research-Only License
Practical summary:
- non-commercial research, teaching, scientific publication, and personal experimentation only
- preserve repository notices
- do not relax restrictions when redistributing
- comply with the upstream Stable Diffusion and dataset terms as well
Citation
If you use this repository, please cite the AtteConDA work and the upstream bases.
AtteConDA / thesis-level citation
@misc{noguchi2026atteconda,
author = {Shogo Noguchi},
title = {A Synthetic Data Augmentation Framework Using a Multi-Condition Diffusion Model with an Attention Mechanism for Suppressing Condition Conflicts},
year = {2026},
note = {Bachelor thesis, Gunma University}
}
Upstream references
@inproceedings{zhao2023unicontrolnet,
title={Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models},
author={Zhao, Shihao and others},
booktitle={NeurIPS},
year={2023}
}
@inproceedings{rombach2022high,
title={High-Resolution Image Synthesis with Latent Diffusion Models},
author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bjorn},
booktitle={CVPR},
year={2022}
}
Acknowledgements
This repository acknowledges the upstream foundations and datasets used in the AtteConDA project:
- Uni-ControlNet
- Stable Diffusion v1.5
- BDD10K / BDD100K
- Cityscapes
- GTA5 (Playing for Data)
- nuImages
Waymo is acknowledged as an evaluation dataset only for this release series and was not used for training.
Release Notes
This model card was written conservatively to avoid over-claiming. If you later publish exact benchmark tables, official project URLs, or bundled configs, update this card accordingly.