Update README.md

The LDM3D model was proposed in ["LDM3D: Latent Diffusion Model for 3D"](https://arxiv.org/abs/2305.10853).

LDM3D got accepted to [CVPRW'23](https://cvpr2023.thecvf.com/).

This checkpoint was finetuned on two panoramic-image datasets, [polyhaven](https://polyhaven.com/) and [ihdri](https://www.ihdri.com/hdri-skies-outdoor/), detailed in the Finetuning section below.

These datasets were augmented using [Text2Light](https://frozenburning.github.io/projects/text2light/) to create a dataset containing 13852 training samples and 1606 validation samples.

Here is how to use this model to get the features of a given text in PyTorch:

```python
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained("Intel/ldm3d-pano")
pipe.to("cuda")

prompt = "360 view of a large bedroom"
name = "bedroom_pano"

output = pipe(prompt, width=1024, height=512)
rgb_image, depth_image = output.rgb, output.depth
rgb_image[0].save(name+"_ldm3d_rgb.jpg")
depth_image[0].save(name+"_ldm3d_depth.png")
```

This is the result:



### Limitations and bias

### Finetuning

This checkpoint finetunes the previous [ldm3d-4c](https://huggingface.co/Intel/ldm3d-4c) on two panoramic-image datasets:
- [polyhaven](https://polyhaven.com/): 585 images for the training set, 66 images for the validation set
- [ihdri](https://www.ihdri.com/hdri-skies-outdoor/): 57 outdoor images for the training set, 7 outdoor images for the validation set.

These datasets were augmented using [Text2Light](https://frozenburning.github.io/projects/text2light/) to create a dataset containing 13852 training samples and 1606 validation samples.

To generate the depth maps for these samples, we used [DPT-large](https://github.com/isl-org/MiDaS), and to generate the captions we used [BLIP-2](https://huggingface.co/docs/transformers/main/model_doc/blip-2).
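
For illustration only, here is a minimal sketch of this kind of preprocessing using the Hugging Face transformers pipelines. This is not the authors' exact setup: the checkpoints `Intel/dpt-large` and `Salesforce/blip2-opt-2.7b` and the input filename are assumptions.

```python
# Illustrative preprocessing sketch (not the authors' exact pipeline).
# The checkpoints "Intel/dpt-large" and "Salesforce/blip2-opt-2.7b" are
# assumptions; any DPT / BLIP-2 checkpoints could be substituted.
from PIL import Image
from transformers import pipeline

image = Image.open("sample_pano.jpg")  # hypothetical training image

# Depth map via monocular depth estimation with DPT
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
depth = depth_estimator(image)["depth"]  # PIL image of predicted depth
depth.save("sample_pano_depth.png")

# Caption via BLIP-2
captioner = pipeline("image-to-text", model="Salesforce/blip2-opt-2.7b")
caption = captioner(image)[0]["generated_text"]
print(caption)
```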

### BibTeX entry and citation info

```bibtex