---
{}
---
# Model Card for DuoduoCLIP

In this model repo we provide the official pretrained models used in the paper **Duoduo CLIP: Efficient 3D Understanding with Multi-View Images**.
The model usage and code can be found in the [github repo](https://github.com/3dlg-hcvc/DuoduoCLIP).

***Note: We provide the main model in this initial release; the remaining models used in the paper will be uploaded soon.***

## Model Details

### Model Description

- **Finetuned from model:** OpenCLIP model ("ViT-B-32" architecture and checkpoint "laion2b_s34b_b79k")

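For reference, the base OpenCLIP model named above can be instantiated with the `open_clip` library. The snippet below is a minimal sketch of loading that base checkpoint, not DuoduoCLIP's own code; see the [github repo](https://github.com/3dlg-hcvc/DuoduoCLIP) for the actual model usage.

```python
# Minimal sketch (not DuoduoCLIP's own loading code): instantiate the base
# OpenCLIP model that Duoduo CLIP is fine-tuned from.
import torch
import open_clip

# "ViT-B-32" architecture with the "laion2b_s34b_b79k" pretrained weights.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Encode an example text prompt with the base model.
text = tokenizer(["a 3D model of a chair"])
with torch.no_grad():
    text_features = model.encode_text(text)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
print(text_features.shape)  # torch.Size([1, 512]) for ViT-B-32
```
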
### Model Sources

- **Repository:** https://github.com/3dlg-hcvc/DuoduoCLIP
- **Paper:** https://arxiv.org/abs/2406.11579

### Model Checkpoints

- **Four_1to6F_bs1600_LT6.ckpt:** The model trained on the Four dataset, with 1 to 6 frames sampled during training and the last 6 attention layers trainable.

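To download the released weights directly, the checkpoint file can be fetched with `huggingface_hub`. The sketch below assumes this model repo's Hub id is `3dlg-hcvc/DuoduoCLIP`; follow the [github repo](https://github.com/3dlg-hcvc/DuoduoCLIP) for the supported way to load and run the model.

```python
# Minimal sketch: fetch the released checkpoint from the Hugging Face Hub and
# inspect it. The repo id below is an assumption (this model repo); the github
# repo documents the supported way to load and run the model.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="3dlg-hcvc/DuoduoCLIP",         # assumed Hub id of this model repo
    filename="Four_1to6F_bs1600_LT6.ckpt",  # main checkpoint listed above
)

# The file is a regular PyTorch checkpoint; list its top-level keys.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)
print(list(checkpoint.keys()))
```
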
## Training Data

The dataset card can be found [here](https://huggingface.co/datasets/3dlg-hcvc/DuoduoCLIP-data).

## Citation

**BibTeX:**
```bibtex
@inproceedings{
  lee2025duoduo,
  title={Duoduo {CLIP}: Efficient 3D Understanding with Multi-View Images},
  author={Han-Hung Lee and Yiming Zhang and Angel X Chang},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=iGbuc9ekKK}
}
```

## Acknowledgement

This work was funded by a CIFAR AI Chair, an NSERC Discovery grant, and a CFI/BCKDF JELF grant.