---
library_name: diffusers
license: cc-by-nc-4.0
pipeline_tag: other
tags:
- 3d-scene-generation
- latent-diffusion
- autonomous-driving
- kitti-360
- primitives
- cvpr-2026
---
# PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Urban Scenes
<p align="center">
<a href="https://huggingface.co/papers/2506.19117">📄 Paper</a> &nbsp;|&nbsp;
<a href="https://raniatze.github.io/pritti/">🌐 Project Page</a> &nbsp;|&nbsp;
<a href="https://github.com/autonomousvision/pritti">💻 Code</a>
</p>
<p align="center">
<img src="https://huggingface.co/raniatze/pritti-checkpoints/resolve/main/teaser.png" alt="PrITTI teaser" width="95%">
</p>
This repository hosts the **pre-trained checkpoints** for **PrITTI** (CVPR 2026), a latent-diffusion framework for controllable and editable 3D semantic urban scene generation.
Existing approaches to 3D semantic urban scene generation predominantly rely on voxel-based representations. In contrast, PrITTI advocates for a primitive-based paradigm where urban scenes are represented using compact, semantically meaningful 3D elements that are easy to manipulate and compose. PrITTI achieves state-of-the-art 3D scene generation quality with lower memory requirements and faster inference than voxel-based methods.
## Released Checkpoints
The checkpoints below were trained on [KITTI-360](https://www.cvlibs.net/datasets/kitti-360/).
| File | Size | Description |
|------|------|-------------|
| `lvae.ckpt` | 1.1 GB | Layout Variational Autoencoder, trained for 300 epochs (`epoch=299, step=580200`). |
| `ldm_b/` | 773 MB | DiT-B Latent Diffusion Model in `diffusers`-pipeline format (`model_index.json` + `transformer/` + `decoder/` + `scheduler/`). |
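After downloading, it can be useful to confirm that the LDM directory has the layout listed in the table. The following is a minimal sketch, not part of the PrITTI codebase: `check_ldm_layout` is a hypothetical helper that only tests for the presence of `model_index.json` and the three subdirectories named above, without validating their contents.

```bash
# Hypothetical helper (an assumption, not shipped with PrITTI):
# checks that a directory contains the entries listed for ldm_b/ in the table.
# Usage: check_ldm_layout <path-to-ldm-dir>
check_ldm_layout() {
  _dir=$1
  _missing=0
  # The pipeline index file must exist as a regular file.
  if [ ! -f "$_dir/model_index.json" ]; then
    echo "missing: $_dir/model_index.json" >&2
    _missing=1
  fi
  # Each pipeline component is expected as a subdirectory.
  for _sub in transformer decoder scheduler; do
    if [ ! -d "$_dir/$_sub" ]; then
      echo "missing: $_dir/$_sub/" >&2
      _missing=1
    fi
  done
  # Returns 0 when everything is present, 1 otherwise.
  return "$_missing"
}
```

Calling `check_ldm_layout path/to/ldm_b` returns a non-zero status and prints each missing entry to stderr, so it can be chained with `&&` before running anything that depends on the checkpoint.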
## Quick Start
Full environment setup, preprocessing, training, inference, and evaluation instructions live in the [official GitHub repository](https://github.com/autonomousvision/pritti). The snippet below downloads both checkpoints into the locations the code expects and renames them to the expected file and directory names. It assumes `PRITTI_EXP_ROOT` is already set in your shell:
```bash
# PRITTI_EXP_ROOT must already be set; abort with a message if it is not
: "${PRITTI_EXP_ROOT:?Set PRITTI_EXP_ROOT first}"

# Identifiers of the released LVAE run (also documented in the main README)
export LVAE_TIMESTAMP="2025.06.03.17.23.30"
export LVAE_EPOCH="299"
export LVAE_STEP="580200"

# LVAE checkpoint: download, then rename to the epoch/step filename
LVAE_DIR="$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints"
mkdir -p "$LVAE_DIR"
huggingface-cli download raniatze/pritti-checkpoints lvae.ckpt --local-dir "$LVAE_DIR"
mv "$LVAE_DIR/lvae.ckpt" "$LVAE_DIR/epoch=$LVAE_EPOCH-step=$LVAE_STEP.ckpt"

# LDM (DiT-B) checkpoint: download ldm_b/, then rename it to checkpoint/
LDM_DIR="$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP"
mkdir -p "$LDM_DIR"
huggingface-cli download raniatze/pritti-checkpoints --include "ldm_b/*" --local-dir "$LDM_DIR"
mv "$LDM_DIR/ldm_b" "$LDM_DIR/checkpoint"
```
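To double-check the result, the target paths from the snippet above can be recomputed and tested for existence. The sketch below introduces two hypothetical helpers, `lvae_ckpt_path` and `ldm_ckpt_dir`, which are not part of the PrITTI codebase. They only rebuild the path strings from the same components used above and do not touch the file system themselves.

```bash
# Hypothetical helpers (assumptions for illustration, not shipped with PrITTI).

# Prints the expected LVAE checkpoint path.
# Args: <exp_root> <timestamp> <epoch> <step>
lvae_ckpt_path() {
  printf '%s/exp/training_lvae_model/training_lvae_model/%s/checkpoints/epoch=%s-step=%s.ckpt\n' \
    "$1" "$2" "$3" "$4"
}

# Prints the expected LDM (DiT-B) checkpoint directory.
# Args: <exp_root> <timestamp>
ldm_ckpt_dir() {
  printf '%s/exp/training_dit_model/training_dit_b_model/training_dit_b_model/%s/checkpoint\n' \
    "$1" "$2"
}

# Example usage once the variables from the download snippet are set:
#   [ -f "$(lvae_ckpt_path "$PRITTI_EXP_ROOT" "$LVAE_TIMESTAMP" "$LVAE_EPOCH" "$LVAE_STEP")" ] \
#     && echo "LVAE checkpoint in place"
#   [ -d "$(ldm_ckpt_dir "$PRITTI_EXP_ROOT" "$LVAE_TIMESTAMP")" ] \
#     && echo "LDM checkpoint in place"
```

Keeping the path construction in one place makes it easier to spot a mismatch between the timestamp, epoch, or step values and the files on disk.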
Once downloaded, follow the [Inference](https://github.com/autonomousvision/pritti#-inference) section of the main README to reconstruct and generate scenes.
## License
Released under **CC BY-NC 4.0**: free for academic and non-commercial research use. See [LICENSE](https://github.com/autonomousvision/pritti/blob/main/LICENSE) for full terms.
## Citation
If you find PrITTI useful, please cite:
```bibtex
@inproceedings{Tze2026PrITTI,
author = {Tze, Christina Ourania and Dauner, Daniel and Liao, Yiyi and Tsishkou, Dzmitry and Geiger, Andreas},
title = {PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Scenes},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026},
}
```