Shogo-Noguchi/AtteConDA-SDE-Scratch-30K

+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
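The attribute lines above route common checkpoint extensions through Git LFS, which is why the `.ckpt` file below appears as a three-line pointer rather than binary data. As a rough illustration of which filenames these patterns cover, the sketch below matches basenames with Python's `fnmatch`. It only approximates Git's own attribute matching, and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes lines above.
LFS_PATTERNS = ["*.ckpt", "*.safetensors", "*.bin", "*.pt", "*.pth"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: does the file's basename match any LFS pattern?

    Git applies slash-free patterns to the basename at any directory depth;
    this sketch mimics only that simple case.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


if __name__ == "__main__":
    for candidate in ["AtteConDA-SDE-Scratch-30K.ckpt", "README.md", "LICENSE"]:
        print(candidate, is_lfs_tracked(candidate))
```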

AtteConDA-SDE-Scratch-30K.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1487b1ead8522a4fca4398687972f943a51eccf8a56baf231632ad5ab3047fb
+size 9775413770
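The three lines above are a Git LFS pointer: the real checkpoint (about 9.8 GB) is stored separately and identified by its SHA-256 digest and byte size. A minimal sketch of how a downloaded file could be checked against this pointer is shown below. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is something you would supply yourself.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c1487b1ead8522a4fca4398687972f943a51eccf8a56baf231632ad5ab3047fb
size 9775413770
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and SHA-256 digest match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Stream in chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer(Path("AtteConDA-SDE-Scratch-30K.ckpt"), pointer)` on a local copy would then confirm that the download is complete and unmodified.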

LICENSE ADDED

	@@ -0,0 +1,49 @@

+AtteConDA Research-Only License
+Copyright (c) 2026 Shogo Noguchi
+This repository distributes model weights from the AtteConDA series.
+These weights are released for non-commercial research, teaching, scientific publication, and personal experimentation only.
+1. Upstream obligations
+These weights are connected to workflows built on Stable Diffusion v1.5 and Uni-ControlNet-style control architecture.
+Users must also comply with all applicable upstream and third-party terms, including:
+- the CreativeML Open RAIL-M license applicable to Stable Diffusion v1.5 and derivatives thereof;
+- the terms of the datasets used for training or fine-tuning.
+2. Permitted uses
+You may download, reproduce, and share these weights only for:
+- non-commercial research;
+- teaching and academic instruction;
+- scientific publication;
+- personal experimentation.
+3. Prohibited uses
+You may not:
+- use these weights, in whole or in part, for commercial advantage or monetary compensation;
+- sell, sublicense, or provide paid access to these weights;
+- deploy these weights in a production system or customer-facing service;
+- use these weights in connection with real-world vehicle operation or assistance;
+- remove or obscure attribution, provenance, or restriction notices.
+4. Redistribution
+If you redistribute this repository or modified versions of these weights, you must:
+- retain this LICENSE file;
+- preserve attribution and provenance notices;
+- clearly indicate that you modified the weights or files;
+- not relax the restrictions stated in this LICENSE;
+- ensure downstream recipients are informed that additional upstream terms may apply.
+5. Dataset compliance
+The training data for some variants in this series includes third-party datasets with non-commercial and/or research-only terms.
+You are responsible for ensuring that your use complies with those dataset terms.
+6. No warranty
+THE WEIGHTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NON-INFRINGEMENT.
+7. Limitation of liability
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY ARISING FROM, OUT OF, OR IN CONNECTION WITH THE WEIGHTS OR THE USE OR OTHER DEALINGS IN THE WEIGHTS.
+8. Important note
+This LICENSE is intended as a conservative repository-level distribution notice for this AtteConDA release.
+It does not replace or waive any applicable upstream or dataset-specific obligations.

README.md ADDED

	@@ -0,0 +1,237 @@

+---
+license: other
+license_name: atteconda-research-only-license
+license_link: LICENSE
+base_model: stable-diffusion-v1-5/stable-diffusion-v1-5
+tags:
+- AtteConDA
+- stable-diffusion
+- autonomous-driving
+- controllable-generation
+- conditional-image-generation
+- semantic-segmentation
+- depth
+- edge
+- research-only
+---
+# AtteConDA-SDE-Scratch-30K
+## Model Summary
+`AtteConDA-SDE-Scratch-30K` is a checkpoint from the **AtteConDA** series (**Atte**ntion-based **Con**dition **D**isambiguation **A**rchitecture).
+The AtteConDA series targets controllable image generation and synthetic data augmentation for autonomous-driving scenes using three local conditions:
+- semantic segmentation
+- depth
+- edge
+SDE in the repository name denotes this **S**emantic-segmentation + **D**epth + **E**dge condition set.
+This repository contains the 30K-step scratch-initialized AtteConDA SDE variant.
+## Upstream Foundations and Provenance
+This series is built on two upstream bases:
+1. **Uni-ControlNet** as the architectural/code reference for composable local/global control.
+2. **Stable Diffusion v1.5** as the latent diffusion foundation model.
+Repository / upstream references:
+- Uni-ControlNet: https://github.com/ShihaoZhaoZSH/Uni-ControlNet
+- Stable Diffusion v1.5: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
+- Cityscapes: https://www.cityscapes-dataset.com/
+- GTA5 (Playing for Data): https://download.visinf.tu-darmstadt.de/data/from_games/
+- nuImages: https://www.nuscenes.org/nuimages
+- BDD100K / BDD10K portal: https://bdd-data.berkeley.edu/
+## Checkpoint Status
+- **Repository name:** `AtteConDA-SDE-Scratch-30K`
+- **Stage:** Fine-tuned scratch-initialized variant
+- **Condition set:** SDE = semantic segmentation + depth + edge
+- **PAM status:** PAM is **not** used.
+The trainable local-control branch was randomly initialized instead of being initialized from the AtteConDA UniCon checkpoint. The architecture still follows the Uni-ControlNet-style design with a Stable Diffusion v1.5 backbone.
+## Files in This Repository
+This repository is intended to contain:
+- model weight file(s), e.g. `*.ckpt` or `*.safetensors`
+- `README.md` (this model card)
+- `LICENSE` (repository-specific distribution notice)
+Important:
+- This is a **weights-focused** repository.
+- The matching config file is **not currently bundled** here.
+- To run the checkpoint, use the companion project codebase and the matching config from that codebase.
+## Training Data
+The trained AtteConDA variants in this release use the following training datasets:
+- **BDD10K semantic segmentation subset:** 8,000 images (train 7,000 + val 1,000)
+- **Cityscapes train/val:** 3,475 images (train 2,975 + val 500)
+- **GTA5:** 24,966 images
+- **nuImages (front camera subset):** 18,368 images
+- **BDD100K (excluding BDD10K overlap):** 92,000 images
+**Total training images used by the trained variants:** 146,809
+**Not used for training:** Waymo
+Waymo is used only for evaluation in this release series.
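The stated total can be checked directly from the per-dataset counts listed above. The short sketch below sums them and prints each dataset's share of the mixture; the dictionary name `TRAINING_IMAGES` is only illustrative.

```python
# Per-dataset image counts as listed in the Training Data section above.
TRAINING_IMAGES = {
    "BDD10K semantic segmentation subset": 7_000 + 1_000,
    "Cityscapes train/val": 2_975 + 500,
    "GTA5": 24_966,
    "nuImages (front camera subset)": 18_368,
    "BDD100K (excluding BDD10K overlap)": 92_000,
}

total = sum(TRAINING_IMAGES.values())
for name, count in TRAINING_IMAGES.items():
    # Share of the full training mixture contributed by this dataset.
    print(f"{name}: {count:,} ({count / total:.1%})")
print(f"total: {total:,}")
```

The sum reproduces the 146,809 figure. BDD100K alone contributes roughly 63% of the images, which is worth keeping in mind when reading the bias notes later in this card.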
+## Evaluation Data
+This release series uses a **Waymo front-camera evaluation subset** only for evaluation.
+Evaluation-set notes:
+- Waymo images are **not part of training**
+- evaluation subset size: **3,048 images**
+- construction policy (per the project materials): front-camera images extracted from the first, middle, and last positions of segments
+## Training Procedure
+- Fine-tuning steps: **30K**
+- Optimizer: **AdamW**
+- Learning rate: **1e-5**
+- Batch size: **4**
+- Resolution: **512 x 512**
+- Frozen components: Stable Diffusion denoising backbone, VAE, and text encoder
+- Trainable focus: local control branch
+- Initialization difference: local control branch randomly initialized
+Common project-side generation/evaluation settings for trained variants:
+- guidance backbone family: Stable Diffusion 1.5 latent diffusion
+- conditioning family: Uni-ControlNet-style controllable diffusion design
+- inference sampler used in project evaluation: DDIM
+- DDIM steps used in project evaluation: 50
+- intended domain: autonomous-driving scene appearance modification while preserving scene structure
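To put the step count in context, the sketch below estimates how many training samples the 30K-step run processes relative to the dataset size. It assumes that "30K" means 30,000 optimizer steps and that the batch size of 4 is the effective global batch (no gradient accumulation or multi-GPU scaling); neither assumption is confirmed by the card, and the class name `TrainingSetup` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    steps: int = 30_000          # "30K" fine-tuning steps (assumed exact)
    batch_size: int = 4          # assumed to be the effective global batch
    learning_rate: float = 1e-5  # AdamW learning rate from the card
    resolution: int = 512        # square training resolution in pixels
    dataset_size: int = 146_809  # total training images from the card

    @property
    def samples_seen(self) -> int:
        """Number of training samples processed over the whole run."""
        return self.steps * self.batch_size

    @property
    def approx_epochs(self) -> float:
        """Samples processed divided by dataset size."""
        return self.samples_seen / self.dataset_size


setup = TrainingSetup()
print(setup.samples_seen)             # 120000
print(round(setup.approx_epochs, 2))  # 0.82
```

Under these assumptions the run sees about 120,000 samples, which is slightly less than one full pass over the 146,809 training images.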
+## Quantitative Results
+The following quantitative results were reported for this 30K scratch-initialized variant under the project evaluation protocol:
+| Metric | Value |
+|---|---:|
+| Semantic Segmentation mIoU ↑ | 0.2445 |
+| Depth RMSE ↓ | 40.41 |
+| Edge L1 Error ↓ | 0.03759 |
+| Object Preservation F1 ↑ | 0.0455 |
+| Diversity (1 - MS-SSIM) ↑ | 0.8450 |
+| Reality (CLIP-CMMD) ↓ | 0.1827 |
+| Text Alignment (R-Precision@1) ↑ | 0.2894 |
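For readers unfamiliar with the first metric in the table, the sketch below shows the standard definition of mean intersection-over-union (mIoU) computed from a confusion matrix. It is a generic illustration on toy labels, not the project's evaluation code: the class set, ignore labels, and the segmentation model used to produce predictions are not specified in this card.

```python
from typing import List, Sequence


def confusion_matrix(
    target: Sequence[int], pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Count pixels: rows index the reference class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(target, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU, skipping classes absent from both inputs."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                  # reference pixels of class c missed
        fp = sum(row[c] for row in matrix) - tp   # pixels wrongly predicted as class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: flattened label maps with three classes.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]
print(mean_iou(confusion_matrix(reference, predicted, num_classes=3)))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. In a generation setting, the reference would typically be the conditioning segmentation map and the prediction would come from a segmentation model run on the generated image.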
+## Intended Use
+This repository is intended for:
+- research on controllable diffusion models
+- research on multi-condition generation
+- research on synthetic data augmentation for autonomous-driving perception and reasoning tasks
+- ablation studies on initialization, training steps, and PAM effects
+- reproducible comparison across AtteConDA variants
+## Out-of-Scope Use
+This repository is **not** intended for:
+- commercial deployment
+- customer-facing or production systems
+- safety-critical decision making
+- real-world vehicle operation or vehicle assistance
+- any use that violates upstream model terms or dataset terms
+## Known Limitations
+Known limitations of this release family include:
+- possible structural failures on small distant objects
+- possible distortion or disappearance of vehicles, traffic signs, or thin structures in difficult regions
+- possible imperfect preservation of text on signboards
+- evaluation is based on external projection models rather than full human relabeling
+- no guarantee of downstream-task improvement for every autonomous-driving task
+- current resolution and backbone scale may limit very fine-grained detail preservation
+## Bias, Domain Shift, and Generalization Notes
+These checkpoints are trained on a mixture of road-scene datasets and should be treated as domain-dependent research artifacts.
+They may reflect:
+- geographic bias
+- weather / time imbalance
+- dataset-specific annotation conventions
+- camera viewpoint bias
+- urban-scene category bias
+Generalization outside the project setting must not be assumed.
+## Licensing and Use Restrictions
+**Do not label this repository as MIT.**
+Why:
+- The Uni-ControlNet code repository is MIT-licensed.
+- This checkpoint family, however, is built on Stable Diffusion v1.5, and Stable Diffusion v1.5 derivatives carry CreativeML Open RAIL-M obligations.
+- Multiple training datasets in this project are distributed under non-commercial and/or research-oriented terms.
+Accordingly, this repository uses:
+- `license: other` in the Hugging Face metadata
+- a repository-root `LICENSE` file named **AtteConDA Research-Only License**
+Practical summary:
+- non-commercial research, teaching, scientific publication, and personal experimentation only
+- preserve repository notices
+- do not relax restrictions when redistributing
+- comply with the upstream Stable Diffusion and dataset terms as well
+## Citation
+If you use this repository, please cite the AtteConDA work and the upstream bases.
+### AtteConDA / thesis-level citation
+```bibtex
+@misc{noguchi2026atteconda,
+  author = {Shogo Noguchi},
+  title = {条件競合を抑制する注意機構に基づく多条件拡散モデルによる合成データ拡張フレームワーク},
+  year = {2026},
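+  note = {Bachelor thesis, Gunma University. English title (unofficial translation): A Synthetic Data Augmentation Framework Using a Multi-Condition Diffusion Model Based on an Attention Mechanism That Suppresses Condition Conflicts}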
+}
+```
+### Upstream references
+```bibtex
+@inproceedings{zhao2023unicontrolnet,
+  title={Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models},
+  author={Zhao, Shihao and others},
+  booktitle={NeurIPS},
+  year={2023}
+}
+@inproceedings{rombach2022high,
+  title={High-Resolution Image Synthesis with Latent Diffusion Models},
+  author={Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj{\"o}rn},
+  booktitle={CVPR},
+  year={2022}
+}
+```
+## Acknowledgements
+This repository acknowledges the upstream foundations and datasets used in the AtteConDA project:
+- Uni-ControlNet
+- Stable Diffusion v1.5
+- BDD10K / BDD100K
+- Cityscapes
+- GTA5 (Playing for Data)
+- nuImages
+Waymo is acknowledged as an evaluation dataset only for this release series and was not used for training.
+## Release Notes
+This model card was written conservatively to avoid over-claiming.
+If you later publish exact benchmark tables, official project URLs, or bundled configs, update this card accordingly.