	---
	library_name: diffusers
	license: cc-by-nc-4.0
	pipeline_tag: other
	tags:
	- 3d-scene-generation
	- latent-diffusion
	- autonomous-driving
	- kitti-360
	- primitives
	- cvpr-2026
	---

	# PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Urban Scenes

	<p align="center">
	<a href="https://huggingface.co/papers/2506.19117">📄 Paper</a> |
	<a href="https://raniatze.github.io/pritti/">🌐 Project Page</a> |
	<a href="https://github.com/autonomousvision/pritti">💻 Code</a>
	</p>

	<p align="center">
	<img src="https://huggingface.co/raniatze/pritti-checkpoints/resolve/main/teaser.png" alt="PrITTI teaser" width="95%">
	</p>

	This repository hosts the pre-trained checkpoints for PrITTI (CVPR 2026), a latent-diffusion framework for controllable and editable 3D semantic urban scene generation.

	Existing approaches to 3D semantic urban scene generation predominantly rely on voxel-based representations. In contrast, PrITTI advocates for a primitive-based paradigm where urban scenes are represented using compact, semantically meaningful 3D elements that are easy to manipulate and compose. PrITTI achieves state-of-the-art 3D scene generation quality with lower memory requirements and faster inference than voxel-based methods.

	## Released Checkpoints

	The checkpoints below were trained on [KITTI-360](https://www.cvlibs.net/datasets/kitti-360/).

	| File | Size | Description |
	|------|------|-------------|
	| `lvae.ckpt` | 1.1 GB | Layout Variational Autoencoder, trained for 300 epochs (`epoch=299, step=580200`). |
	| `ldm_b/` | 773 MB | DiT-B Latent Diffusion Model in `diffusers`-pipeline format (`model_index.json` + `transformer/` + `decoder/` + `scheduler/`). |
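Once `ldm_b/` is on disk, its layout can be checked against the table above. The function below is a minimal sketch: `check_ldm_layout` is a hypothetical helper name, not part of the PrITTI code, and it only tests for the four entries listed in the table (it does not validate their contents).

```shell
# Hypothetical helper (not part of the PrITTI repository): checks that a
# directory contains the diffusers-style entries listed in the table:
# model_index.json, transformer/, decoder/, scheduler/.
check_ldm_layout() {
    ldm_dir="$1"
    ldm_missing=0

    # model_index.json must be a regular file
    if [ ! -f "$ldm_dir/model_index.json" ]; then
        echo "missing: model_index.json"
        ldm_missing=1
    fi

    # each component must be a sub-directory
    for ldm_sub in transformer decoder scheduler; do
        if [ ! -d "$ldm_dir/$ldm_sub" ]; then
            echo "missing: $ldm_sub/"
            ldm_missing=1
        fi
    done

    if [ "$ldm_missing" -eq 0 ]; then
        echo "layout ok"
    fi
    # exit status 0 when complete, 1 when anything is missing
    return "$ldm_missing"
}
```

For example, `check_ldm_layout path/to/ldm_b` prints `layout ok` when all four entries are present, and otherwise prints one `missing: ...` line per absent entry and returns a non-zero status.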

	## Quick Start

	Full environment setup, preprocessing, training, inference, and evaluation instructions live in the [official GitHub repository](https://github.com/autonomousvision/pritti). The snippet below downloads both checkpoints into the locations the code expects:

	```bash
	# Make sure these are set (also documented in the main README)
	export LVAE_TIMESTAMP="2025.06.03.17.23.30"
	export LVAE_EPOCH="299"
	export LVAE_STEP="580200"

	# PRITTI_EXP_ROOT must already point at your experiment root; stop here if it is unset
	: "${PRITTI_EXP_ROOT:?PRITTI_EXP_ROOT is not set}"

	# LVAE checkpoint: download lvae.ckpt, then rename it to the epoch/step filename
	LVAE_DIR="$PRITTI_EXP_ROOT/exp/training_lvae_model/training_lvae_model/$LVAE_TIMESTAMP/checkpoints"
	mkdir -p "$LVAE_DIR"
	huggingface-cli download raniatze/pritti-checkpoints lvae.ckpt --local-dir "$LVAE_DIR"
	mv "$LVAE_DIR/lvae.ckpt" "$LVAE_DIR/epoch=$LVAE_EPOCH-step=$LVAE_STEP.ckpt"

	# LDM (DiT-B) checkpoint: download ldm_b/, then rename the folder to checkpoint/
	LDM_DIR="$PRITTI_EXP_ROOT/exp/training_dit_model/training_dit_b_model/training_dit_b_model/$LVAE_TIMESTAMP"
	mkdir -p "$LDM_DIR"
	huggingface-cli download raniatze/pritti-checkpoints --include "ldm_b/*" --local-dir "$LDM_DIR"
	mv "$LDM_DIR/ldm_b" "$LDM_DIR/checkpoint"
	```
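To confirm where the two checkpoints should end up before or after downloading, the target paths can be printed without touching the network. This is a sketch only: `pritti_ckpt_paths` is a hypothetical helper name, and the path templates are copied from the snippet above rather than read from the PrITTI code.

```shell
# Hypothetical helper (not part of the PrITTI repository): prints the two
# final checkpoint locations produced by the Quick Start snippet.
# Arguments: <exp_root> <lvae_timestamp> <lvae_epoch> <lvae_step>
pritti_ckpt_paths() {
    p_root="$1"
    p_timestamp="$2"
    p_epoch="$3"
    p_step="$4"

    # Line 1: renamed LVAE checkpoint file
    echo "${p_root}/exp/training_lvae_model/training_lvae_model/${p_timestamp}/checkpoints/epoch=${p_epoch}-step=${p_step}.ckpt"
    # Line 2: renamed LDM (DiT-B) pipeline directory
    echo "${p_root}/exp/training_dit_model/training_dit_b_model/training_dit_b_model/${p_timestamp}/checkpoint"
}
```

Calling `pritti_ckpt_paths "$PRITTI_EXP_ROOT" "$LVAE_TIMESTAMP" "$LVAE_EPOCH" "$LVAE_STEP"` prints the LVAE file path on the first line and the LDM directory path on the second, which can then be checked with `ls`.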

	Once downloaded, follow the [Inference](https://github.com/autonomousvision/pritti#-inference) section of the main README to reconstruct and generate scenes.

	## License

	Released under CC BY-NC 4.0: free for academic and non-commercial research use. See [LICENSE](https://github.com/autonomousvision/pritti/blob/main/LICENSE) for full terms.

	## Citation

	If you find PrITTI useful, please cite:

	```bibtex
	@inproceedings{Tze2026PrITTI,
	author = {Tze, Christina Ourania and Dauner, Daniel and Liao, Yiyi and Tsishkou, Dzmitry and Geiger, Andreas},
	title = {PrITTI: Primitive-based Generation of Controllable and Editable 3D Semantic Scenes},
	booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
	year = {2026},
	}
	```