Shogo-Noguchi committed on
Commit 8403186 · 0 parent(s)

Re-upload AtteConDA-SDE-Scratch-30K

Files changed (4)
  1. .gitattributes +5 -0
  2. AtteConDA-SDE-Scratch-30K.ckpt +3 -0
  3. LICENSE +49 -0
  4. README.md +237 -0
.gitattributes ADDED
@@ -0,0 +1,5 @@
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
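The `.gitattributes` rules above send common weight formats through Git LFS. The following is a rough sketch that approximates which file names those rules catch. It assumes that slash-free gitattributes patterns behave like case-sensitive shell globs matched against a file's base name at any directory depth. That holds for these simple `*.ext` rules, but the sketch is not a full gitattributes matcher, and `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes file in this commit.
LFS_PATTERNS = ["*.ckpt", "*.safetensors", "*.bin", "*.pt", "*.pth"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate gitattributes matching for slash-free glob patterns.

    A pattern without "/" can match at any directory level, so comparing
    the base name is enough for these simple extension rules.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


for f in ["AtteConDA-SDE-Scratch-30K.ckpt", "README.md", "LICENSE", "weights/model.safetensors"]:
    print(f"{f}: {'Git LFS' if is_lfs_tracked(f) else 'regular Git object'}")
```

Under these rules, only the checkpoint is stored via LFS. `README.md` and `LICENSE` remain ordinary Git blobs.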
AtteConDA-SDE-Scratch-30K.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c1487b1ead8522a4fca4398687972f943a51eccf8a56baf231632ad5ab3047fb
+ size 9775413770
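The `.ckpt` entry in this commit is a Git LFS pointer, not the weights themselves. The real file is 9,775,413,770 bytes (about 9.78 GB) and is identified by its SHA-256 digest. Below is a minimal standard-library sketch for checking a downloaded copy against the pointer. The helper names are illustrative, and the local path you pass to `verify_download` is up to you.

```python
import hashlib
from pathlib import Path

# Pointer text as committed for AtteConDA-SDE-Scratch-30K.ckpt.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:c1487b1ead8522a4fca4398687972f943a51eccf8a56baf231632ad5ab3047fb
size 9775413770
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def verify_download(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare file size first, then stream the file to check its digest."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


info = parse_lfs_pointer(POINTER)
print(info["size"] / 1e9, "GB")  # roughly 9.78 GB (decimal units)
```

Checking the size first is cheap and rejects truncated downloads before hashing nearly 10 GB.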
LICENSE ADDED
@@ -0,0 +1,49 @@
+ AtteConDA Research-Only License
+
+ Copyright (c) 2026 Shogo Noguchi
+
+ This repository distributes model weights from the AtteConDA series.
+ These weights are released for non-commercial research, teaching, scientific publication, and personal experimentation only.
+
+ 1. Upstream obligations
+ These weights are connected to workflows built on Stable Diffusion v1.5 and Uni-ControlNet-style control architecture.
+ Users must also comply with all applicable upstream and third-party terms, including:
+ - the CreativeML Open RAIL-M license applicable to Stable Diffusion v1.5 and derivatives thereof;
+ - the terms of the datasets used for training or fine-tuning.
+
+ 2. Permitted uses
+ You may download, reproduce, and share these weights only for:
+ - non-commercial research;
+ - teaching and academic instruction;
+ - scientific publication;
+ - personal experimentation.
+
+ 3. Prohibited uses
+ You may not:
+ - use these weights, in whole or in part, for commercial advantage or monetary compensation;
+ - sell, sublicense, or provide paid access to these weights;
+ - deploy these weights in a production system or customer-facing service;
+ - use these weights in connection with real-world vehicle operation or assistance;
+ - remove or obscure attribution, provenance, or restriction notices.
+
+ 4. Redistribution
+ If you redistribute this repository or modified versions of these weights, you must:
+ - retain this LICENSE file;
+ - preserve attribution and provenance notices;
+ - clearly indicate that you modified the weights or files;
+ - not relax the restrictions stated in this LICENSE;
+ - ensure downstream recipients are informed that additional upstream terms may apply.
+
+ 5. Dataset compliance
+ The training data for some variants in this series includes third-party datasets with non-commercial and/or research-only terms.
+ You are responsible for ensuring that your use complies with those dataset terms.
+
+ 6. No warranty
+ THE WEIGHTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NON-INFRINGEMENT.
+
+ 7. Limitation of liability
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY ARISING FROM, OUT OF, OR IN CONNECTION WITH THE WEIGHTS OR THE USE OR OTHER DEALINGS IN THE WEIGHTS.
+
+ 8. Important note
+ This LICENSE is intended as a conservative repository-level distribution notice for this AtteConDA release.
+ It does not replace or waive any applicable upstream or dataset-specific obligations.
README.md ADDED
@@ -0,0 +1,237 @@
+ ---
+ license: other
+ license_name: atteconda-research-only-license
+ license_link: LICENSE
+ base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
+ tags:
+ - AtteConDA
+ - stable-diffusion
+ - autonomous-driving
+ - controllable-generation
+ - conditional-image-generation
+ - semantic-segmentation
+ - depth
+ - edge
+ - research-only
+ ---
+
+ # AtteConDA-SDE-Scratch-30K
+
+ ## Model Summary
+
+ `AtteConDA-SDE-Scratch-30K` is a checkpoint from the **AtteConDA** series (**Atte**ntion-based **Con**dition **D**isambiguation **A**rchitecture).
+
+ The AtteConDA series targets controllable image generation and synthetic data augmentation for autonomous-driving scenes using three local conditions:
+
+ - semantic segmentation
+ - depth
+ - edge
+
+ SDE in the repository name denotes this **S**emantic-segmentation + **D**epth + **E**dge condition set.
+
+ This repository contains the 30K-step scratch-initialized AtteConDA SDE variant.
+
+ ## Upstream Foundations and Provenance
+
+ This series is built on two upstream bases:
+
+ 1. **Uni-ControlNet** as the architectural and code reference for composable local/global control.
+ 2. **Stable Diffusion v1.5** as the latent diffusion foundation model.
+
+ Upstream and dataset references:
+
+ - Uni-ControlNet: https://github.com/ShihaoZhaoZSH/Uni-ControlNet
+ - Stable Diffusion v1.5: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
+ - Cityscapes: https://www.cityscapes-dataset.com/
+ - GTA5 (Playing for Data): https://download.visinf.tu-darmstadt.de/data/from_games/
+ - nuImages: https://www.nuscenes.org/nuimages
+ - BDD100K / BDD10K portal: https://bdd-data.berkeley.edu/
+
+ ## Checkpoint Status
+
+ - **Repository name:** `AtteConDA-SDE-Scratch-30K`
+ - **Stage:** Fine-tuned, scratch-initialized variant
+ - **Condition set:** SDE = semantic segmentation + depth + edge
+ - **PAM status:** PAM is **not** used.
+
+ The trainable local-control branch was randomly initialized instead of being loaded from the AtteConDA UniCon initialization checkpoint. The architecture still follows the Uni-ControlNet-style design with a Stable Diffusion v1.5 backbone.
+
+ ## Files in This Repository
+
+ This repository contains:
+
+ - model weights: `AtteConDA-SDE-Scratch-30K.ckpt` (stored via Git LFS)
+ - `README.md` (this model card)
+ - `LICENSE` (repository-specific distribution notice)
+
+ Important:
+ - This is a **weights-focused** release; no code is included.
+ - The matching config file is **not currently bundled** here.
+ - To run the checkpoint, use the companion project codebase together with its matching config.
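No config is bundled, so one practical first step is to list the parameter groups stored in the checkpoint. The sketch below makes several assumptions.

- The `.ckpt` is a PyTorch (Lightning-style) pickle whose weights may be nested under a `state_dict` key.
- The Hub repo id is `Shogo-Noguchi/AtteConDA-SDE-Scratch-30K`. This is inferred from the committer and repository name and may not be correct.
- In LDM-derived codebases, prefixes such as `model.diffusion_model`, `first_stage_model`, and `cond_stage_model` usually correspond to the UNet, VAE, and text encoder. That mapping is not confirmed for this checkpoint.

All helper names are hypothetical.

```python
from collections import Counter


def load_state_dict(path: str) -> dict:
    """Load a checkpoint on CPU and return its name -> tensor mapping.

    Assumes a Lightning-style pickle that nests weights under "state_dict";
    falls back to the top-level dict otherwise. Only unpickle trusted files.
    """
    import torch  # lazy import so summarize_prefixes() works without torch

    ckpt = torch.load(path, map_location="cpu")
    return ckpt.get("state_dict", ckpt)


def summarize_prefixes(state_dict: dict, depth: int = 1) -> Counter:
    """Count state-dict entries per leading dotted-name prefix."""
    return Counter(".".join(key.split(".")[:depth]) for key in state_dict)


def download_checkpoint() -> str:
    """Fetch the ~9.8 GB .ckpt from the Hub and return its local path."""
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="Shogo-Noguchi/AtteConDA-SDE-Scratch-30K",  # assumed repo id
        filename="AtteConDA-SDE-Scratch-30K.ckpt",
    )
```

Usage: `summarize_prefixes(load_state_dict(download_checkpoint()), depth=2).most_common()`. Since PyTorch 2.6, `torch.load` defaults to `weights_only=True`, which can reject Lightning checkpoints that contain non-tensor objects. Passing `weights_only=False` restores the old behavior, but do this only for files you trust.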
71
+
72
+ ## Training Data
73
+
74
+ The trained AtteConDA variants in this release use the following training datasets:
75
+
76
+ - **BDD10K semantic segmentation subset:** 8,000 images (train 7,000 + val 1,000)
77
+ - **Cityscapes train/val:** 3,475 images (train 2,975 + val 500)
78
+ - **GTA5:** 24,966 images
79
+ - **nuImages (front camera subset):** 18,368 images
80
+ - **BDD100K (excluding BDD10K overlap):** 92,000 images
81
+
82
+ **Total training images used by the trained variants:** 146,809
83
+
84
+ **Not used for training:** Waymo
85
+ Waymo is used only for evaluation in this release series.
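The per-dataset counts above add up to the stated total. A quick check in Python, with each dataset's share of the image pool, follows. The shares describe the pool only. They equal per-batch sampling proportions only if images are drawn uniformly, which the card does not state.

```python
# Per-dataset image counts as listed in this model card.
TRAINING_IMAGES = {
    "BDD10K (seg subset)": 7_000 + 1_000,
    "Cityscapes train/val": 2_975 + 500,
    "GTA5": 24_966,
    "nuImages (front camera)": 18_368,
    "BDD100K (minus BDD10K)": 92_000,
}

total = sum(TRAINING_IMAGES.values())
assert total == 146_809, total  # matches the stated total

for name, count in TRAINING_IMAGES.items():
    print(f"{name:<26} {count:>7,}  {count / total:6.1%}")
print(f"{'Total':<26} {total:>7,}")
```

BDD100K dominates the pool at about 62.7%, followed by GTA5 at about 17.0%.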
+
+ ## Evaluation Data
+
+ This release series uses a **Waymo front-camera subset** for evaluation only.
+
+ Evaluation-set notes:
+ - Waymo images are **not part of training**.
+ - Evaluation subset size: **3,048 images**.
+ - Construction policy (per the project materials): front-camera images taken from the first, middle, and last positions of segments.
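The first / middle / last construction policy amounts to a small index-selection rule. The following is a hypothetical sketch, not project code. `pick_frames` and the 199-frame segment length are illustrative, and taking the middle as `num_frames // 2` is an assumption. The sketch also shows that, if every segment contributed exactly three distinct frames, 3,048 images would correspond to 1,016 segments. The card itself does not state the segment count.

```python
from typing import List


def pick_frames(num_frames: int) -> List[int]:
    """First, middle, and last frame indices of one segment (deduplicated).

    Defining the middle as num_frames // 2 is an assumption; the card does
    not specify how the middle position is chosen.
    """
    if num_frames <= 0:
        return []
    return sorted({0, num_frames // 2, num_frames - 1})


print(pick_frames(199))  # [0, 99, 198] for a hypothetical 199-frame segment

# Segments implied if each one contributed three distinct frames:
print(3_048 // 3)  # 1016
```

Deduplicating with a set keeps very short segments (one or two frames) from contributing repeated images.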
+
+ ## Training Procedure
+
+ - Fine-tuning steps: **30K**
+ - Optimizer: **AdamW**
+ - Learning rate: **1e-5**
+ - Batch size: **4**
+ - Resolution: **512 × 512**
+ - Frozen components: Stable Diffusion denoising backbone, VAE, and text encoder
+ - Trainable focus: local control branch
+ - Initialization: local control branch randomly initialized (the defining difference of this Scratch variant)
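The listed hyperparameters imply a few derived quantities. The sketch below makes two assumptions. First, the batch size of 4 is the effective global batch; the card mentions no gradient accumulation or multi-GPU scaling. Second, the VAE uses the standard Stable Diffusion v1.5 geometry: 8× spatial downsampling and 4 latent channels. Under those assumptions:

- 30K steps see 120,000 samples, roughly 0.82 passes over the 146,809-image training pool.
- Each 512 × 512 image is encoded to a 4 × 64 × 64 latent.

`TrainingSetup` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    steps: int = 30_000
    batch_size: int = 4           # assumed effective (global) batch size
    learning_rate: float = 1e-5   # AdamW, as listed in the card
    resolution: int = 512
    vae_downsample: int = 8       # SD v1.5 VAE spatial downsampling factor
    latent_channels: int = 4      # SD v1.5 latent channels

    def samples_seen(self) -> int:
        return self.steps * self.batch_size

    def epochs(self, dataset_size: int) -> float:
        return self.samples_seen() / dataset_size

    def latent_shape(self) -> tuple:
        side = self.resolution // self.vae_downsample
        return (self.latent_channels, side, side)


setup = TrainingSetup()
print(setup.samples_seen())             # 120000
print(round(setup.epochs(146_809), 3))  # 0.817
print(setup.latent_shape())             # (4, 64, 64)
```

If the real run used data-parallel training with a per-device batch of 4, the samples-seen figure would scale with the number of devices.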
+
+ Generation and evaluation settings shared by the trained variants in the project:
+ - backbone: Stable Diffusion 1.5 latent diffusion
+ - conditioning design: Uni-ControlNet-style controllable diffusion
+ - sampler used in project evaluation: DDIM
+ - DDIM steps used in project evaluation: 50
+ - intended domain: autonomous-driving scene appearance modification while preserving scene structure
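With 50 DDIM steps, the sampler evaluates the denoiser at only 50 of the 1,000 training timesteps used by Stable Diffusion v1.5. The sketch below follows the "uniform" `make_ddim_timesteps` helper from the CompVis latent-diffusion codebase. It is an assumption that the project evaluation uses exactly this discretization.

```python
from typing import List


def ddim_timesteps(num_ddim_steps: int = 50, num_train_steps: int = 1000) -> List[int]:
    """Uniform DDIM timestep schedule in the style of LDM's make_ddim_timesteps.

    Stride c = num_train_steps // num_ddim_steps gives 0, c, 2c, ...;
    LDM then adds 1 to every timestep.
    """
    stride = num_train_steps // num_ddim_steps
    return [t + 1 for t in range(0, num_train_steps, stride)]


steps = ddim_timesteps()
print(len(steps), steps[:3], steps[-3:])  # 50 [1, 21, 41] [941, 961, 981]
# Sampling visits these in reverse order, from t = 981 down to t = 1.
```

The stride of 20 means each DDIM update jumps over 19 training timesteps. This is why DDIM can trade a small amount of quality for a large speedup over the full 1,000-step chain.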
+
+ ## Quantitative Results
+
+ The following results were reported for this 30K-step scratch-initialized variant under the project evaluation protocol (↑ = higher is better, ↓ = lower is better):
+
+ | Metric | Value |
+ |---|---:|
+ | Semantic Segmentation mIoU ↑ | 0.2445 |
+ | Depth RMSE ↓ | 40.41 |
+ | Edge L1 Error ↓ | 0.03759 |
+ | Object Preservation F1 ↑ | 0.0455 |
+ | Diversity (1 - MS-SSIM) ↑ | 0.8450 |
+ | Reality (CLIP-CMMD) ↓ | 0.1827 |
+ | Text Alignment (R-Precision@1) ↑ | 0.2894 |
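The table does not define how each metric is computed. For reference only, here are the textbook forms of the structure-preservation metrics (mIoU, RMSE, mean L1) and of the Diversity transform, applied to flattened maps. In the project, these are presumably computed between maps that external models extract from generated images and the input conditions. The class set, depth units, edge normalization, and averaging scheme are not stated, so treat every detail here as an assumption. Object Preservation F1, CLIP-CMMD, and R-Precision are not covered.

```python
import math
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes present in either flattened label map."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between two flattened maps (e.g. depth)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


def l1_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two flattened maps (e.g. edge maps in [0, 1])."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def diversity(ms_ssim: float) -> float:
    """Diversity as labeled in the table: 1 - MS-SSIM."""
    return 1.0 - ms_ssim
```

Inverting the last function, the reported Diversity of 0.8450 corresponds to an MS-SSIM of 0.155.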
+
+ ## Intended Use
+
+ This repository is intended for:
+ - research on controllable diffusion models
+ - research on multi-condition generation
+ - research on synthetic data augmentation for autonomous-driving perception and reasoning tasks
+ - ablation studies on initialization, training steps, and PAM effects
+ - reproducible comparison across AtteConDA variants
+
+ ## Out-of-Scope Use
+
+ This repository is **not** intended for:
+ - commercial deployment
+ - customer-facing or production systems
+ - safety-critical decision making
+ - real-world vehicle operation or vehicle assistance
+ - any use that violates upstream model terms or dataset terms
+
+ ## Known Limitations
+
+ Known limitations of this release family include:
+ - possible structural failures on small, distant objects
+ - possible distortion or disappearance of vehicles, traffic signs, or thin structures in difficult regions
+ - possible imperfect preservation of text on signboards
+ - evaluation relies on external projection models rather than full human relabeling
+ - improvement is not guaranteed for every downstream autonomous-driving task
+ - the current resolution and backbone scale may limit preservation of very fine-grained detail
+
+ ## Bias, Domain Shift, and Generalization Notes
+
+ These checkpoints are trained on a mixture of road-scene datasets and should be treated as domain-dependent research artifacts.
+ They may reflect:
+ - geographic bias
+ - weather and time-of-day imbalance
+ - dataset-specific annotation conventions
+ - camera viewpoint bias
+ - urban-scene category bias
+
+ Generalization outside the project setting must not be assumed.
+
+ ## Licensing and Use Restrictions
+
+ **Do not label this repository as MIT.**
+
+ Why:
+ - the Uni-ControlNet code repository is MIT-licensed;
+ - however, this checkpoint family is built on Stable Diffusion v1.5;
+ - Stable Diffusion v1.5 derivatives carry CreativeML Open RAIL-M obligations;
+ - several training datasets in this project are distributed under non-commercial and/or research-oriented terms.
+
+ Accordingly, this repository uses:
+ - `license: other` in the Hugging Face metadata
+ - a repository-root `LICENSE` file named **AtteConDA Research-Only License**
+
+ Practical summary:
+ - non-commercial research, teaching, scientific publication, and personal experimentation only
+ - preserve repository notices
+ - do not relax restrictions when redistributing
+ - also comply with the upstream Stable Diffusion and dataset terms
+
+ ## Citation
+
+ If you use this repository, please cite the AtteConDA work and the upstream bases.
+
+ ### AtteConDA / thesis-level citation
+
+ English translation of the thesis title: *A Synthetic Data Augmentation Framework Using a Multi-Condition Diffusion Model Based on an Attention Mechanism for Suppressing Condition Conflicts*.
+
+ ```bibtex
+ @misc{noguchi2026atteconda,
+   author = {Shogo Noguchi},
+   title = {条件競合を抑制する注意機構に基づく多条件拡散モデルによる合成データ拡張フレームワーク},
+   year = {2026},
+   note = {Bachelor thesis, Gunma University}
+ }
+ ```
+
+ ### Upstream references
+
+ ```bibtex
+ @inproceedings{zhao2023unicontrolnet,
+   title={Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models},
+   author={Zhao, Shihao and others},
+   booktitle={NeurIPS},
+   year={2023}
+ }
+
+ @inproceedings{rombach2022high,
+   title={High-Resolution Image Synthesis with Latent Diffusion Models},
+   author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn},
+   booktitle={CVPR},
+   year={2022}
+ }
+ ```
+
+ ## Acknowledgements
+
+ This repository acknowledges the upstream foundations and datasets used in the AtteConDA project:
+
+ - Uni-ControlNet
+ - Stable Diffusion v1.5
+ - BDD10K / BDD100K
+ - Cityscapes
+ - GTA5 (Playing for Data)
+ - nuImages
+
+ Waymo is acknowledged as an evaluation-only dataset for this release series; it was not used for training.
+
+ ## Release Notes
+
+ This model card was written conservatively to avoid over-claiming.
+ If exact benchmark tables, official project URLs, or bundled configs are published later, this card should be updated accordingly.