---

license: mit
pipeline_tag: image-to-3d
library_name: diffusers
base_model: "dylanebert/LGM-full"
tags: ["image-to-3d", "text-to-3d", "3d-generation", "3d-gaussian-splatting", "gaussian-splatting", "multi-view-diffusion", "lgm", "diffusers", "safetensors", "objaverse", "research", "computer-graphics"]
arxiv: "2402.05054"
---


<div align="center">

# 🐙 WasabiOctopus / LGM

### Large Multi-View Gaussian Model for Fast 3D Asset Generation

<p>
  <img src="https://img.shields.io/badge/Task-Image--to--3D-blueviolet">
  <img src="https://img.shields.io/badge/Task-Text--to--3D-8A2BE2">
  <img src="https://img.shields.io/badge/Representation-3D%20Gaussian%20Splatting-orange">
  <img src="https://img.shields.io/badge/Library-Diffusers-yellow">
  <img src="https://img.shields.io/badge/License-MIT-green">
</p>

**A Diffusers-ready LGM pipeline for fast 3D content creation from text or a single image.**

</div>

## ✨ Highlights

- 🚀 **Fast 3D asset generation** powered by the LGM pipeline.
- 🧊 **3D Gaussian Splatting representation** for efficient high-resolution 3D content.
- 🖼️ **Text-to-3D and image-to-3D workflows** through multi-view diffusion.
- 🧩 **Diffusers-compatible model structure** with `LGMFullPipeline`.
- 🔬 Useful for **3D generation research, creative prototyping, course projects, and rapid experimentation**.

## 🖼️ Gallery

> Upload your own generated examples to an `assets/` folder and replace the placeholders below.

| Prompt / Input | Generated 3D Asset |
|---|---|
| `a cute robot, smooth toy material, studio lighting` | Coming soon |
| `a fantasy treasure chest with golden details` | Coming soon |
| `a stylized sci-fi helmet, clean hard-surface design` | Coming soon |

## 🧠 What is LGM?

**LGM**, short for **Large Multi-View Gaussian Model**, is a 3D generation framework designed for high-resolution 3D content creation.

Instead of generating a mesh directly, the pipeline first uses a multi-view diffusion model to produce several views of the object, then reconstructs a 3D Gaussian representation from those views in a single feed-forward pass. This makes it suitable for fast 3D asset generation from either a text prompt or a single input image.

This repository provides a convenient Hugging Face / Diffusers-style release of the full LGM pipeline.

## 🏗️ Pipeline Overview

```text
Text prompt or single image
            ↓
Multi-view diffusion generation
            ↓
Multi-view Gaussian features
            ↓
LGM reconstruction module
            ↓
3D Gaussian asset
            ↓
PLY export / downstream rendering
```

## 🚀 Quick Start

### 1. Install dependencies

```bash
pip install -U diffusers transformers accelerate safetensors
pip install torch torchvision torchaudio
pip install xformers trimesh kiui plyfile
```

For the full environment, check the repository `requirements.txt`.

### 2. Load the pipeline

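```python
import torch
from diffusers import DiffusionPipeline

repo_id = "WasabiOctopus/LGM"

# trust_remote_code=True allows Diffusers to run custom pipeline code from the Hub repository.
pipe = DiffusionPipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
    trust_remote_code=True,
)

pipe = pipe.to("cuda")  # requires a CUDA-capable GPU
```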

### 3. Text-to-3D generation

```python
prompt = "a cute robot, smooth toy material, studio lighting, clean geometry"

# Reuses `pipe` from step 2.
gaussians = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.0,
)

# Write the generated Gaussians to a PLY file.
pipe.save_ply(gaussians, "robot.ply")
```

### 4. Image-to-3D generation

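```python
import numpy as np
from PIL import Image

# Load the input as a 256x256 RGB image and scale pixel values to [0, 1].
image = Image.open("input.png").convert("RGB").resize((256, 256))
image = np.array(image).astype(np.float32) / 255.0

# Reuses `pipe` from step 2; the prompt is left empty for image-conditioned generation.
gaussians = pipe(
    prompt="",
    image=image,
    num_inference_steps=50,
    guidance_scale=7.0,
)

# Write the generated Gaussians to a PLY file.
pipe.save_ply(gaussians, "asset_from_image.ply")
```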
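To sanity-check a file written by `pipe.save_ply`, it can help to read the PLY header before opening the asset in a viewer. The sketch below uses only the standard library and relies on the generic PLY header layout (`format`, `element`, `property`, `end_header`). The helper name `read_ply_header` is hypothetical, and which properties the LGM exporter actually writes is not stated here, so the function simply reports whatever it finds.

```python
from pathlib import Path


def read_ply_header(path):
    """Return (format, element counts, vertex property names) from a PLY header."""
    fmt = None
    counts = {}
    vertex_props = []
    current = None
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} is not a PLY file")
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                break  # stop before the (possibly binary) body
            if parts[0] == "format":
                fmt = parts[1]
            elif parts[0] == "element":
                current = parts[1]
                counts[current] = int(parts[2])
            elif parts[0] == "property" and current == "vertex":
                vertex_props.append(parts[-1])  # last token is the property name
    return fmt, counts, vertex_props


# Only inspect the file if a previous step produced it.
if Path("robot.ply").exists():
    fmt, counts, props = read_ply_header("robot.ply")
    print(fmt, counts.get("vertex"), props)
```

An empty or very small vertex count usually points to a failed generation rather than a viewer problem.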

## 📦 Repository Contents

```text
WasabiOctopus/LGM
├── README.md
├── model_index.json
├── pipeline.py
├── requirements.txt
├── feature_extractor/
├── image_encoder/
├── text_encoder/
├── tokenizer/
├── scheduler/
├── vae/
├── unet/
└── lgm/
```
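In a Diffusers repository, `model_index.json` typically maps each component folder to a `[library, class]` pair, alongside metadata keys that start with an underscore. The sketch below lists those components from a parsed file. The helper name `list_components` and the values in `example_index` are hypothetical and for illustration only; the actual file in this repository may name different classes.

```python
import json
from pathlib import Path


def list_components(model_index):
    """Map component name -> (library, class name) from a parsed model_index.json."""
    components = {}
    for key, value in model_index.items():
        if key.startswith("_"):
            continue  # metadata such as _class_name or _diffusers_version
        if isinstance(value, (list, tuple)) and len(value) == 2:
            components[key] = tuple(value)
    return components


# Hypothetical excerpt, not copied from this repository.
example_index = {
    "_class_name": "LGMFullPipeline",
    "scheduler": ["diffusers", "DDIMScheduler"],
    "vae": ["diffusers", "AutoencoderKL"],
}
print(list_components(example_index))

# If a local copy of the repository is available, inspect the real file instead.
local_index = Path("model_index.json")
if local_index.exists():
    print(list_components(json.loads(local_index.read_text())))
```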

## 💡 Recommended Use Cases

This model release is useful for:

- Fast **single-image-to-3D** prototyping
- **Text-to-3D** creative asset generation
- 3D generation course projects
- Research demos around 3D Gaussian Splatting
- Benchmarking recent 3D asset generation pipelines
- Building lightweight demos for Blender, Unity, or web-based 3D viewers

## ⚠️ Limitations

This model is a research-oriented 3D generation pipeline. It may produce imperfect geometry or artifacts in the following cases:

- Thin structures, transparent objects, wires, fur, or complex topology
- Highly reflective or texture-heavy objects
- Ambiguous single-view inputs where the back side is not visible
- Prompt-only generation requiring precise physical dimensions
- Production workflows requiring clean quad meshes, rigging, or CAD-level topology

For professional 3D asset production, additional post-processing may be needed, such as mesh extraction, topology cleanup, UV unwrapping, material editing, or manual refinement.

## 🧪 Tips for Better Results

Good prompts usually describe:

```text
object category + style + material + lighting + geometry constraint
```

Examples:

```text
a cute robot, rounded toy design, smooth plastic material, studio lighting
a medieval treasure chest, golden metal details, wooden texture, clean geometry
a sci-fi helmet, hard-surface design, matte black material, sharp edges
a tiny house, stylized low-poly, warm colors, isometric game asset
```
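The prompt pattern above can be assembled programmatically when generating many assets. This is a minimal sketch; `build_prompt` is a hypothetical helper, not part of the pipeline, and it simply joins the non-empty parts with commas in the order the template lists them.

```python
def build_prompt(category, style=None, material=None, lighting=None, geometry=None):
    """Join the non-empty prompt parts in template order, separated by commas."""
    parts = [category, style, material, lighting, geometry]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a cute robot",
    style="rounded toy design",
    material="smooth plastic material",
    lighting="studio lighting",
)
print(prompt)
```

The resulting string can be passed as `prompt` in the text-to-3D call from the Quick Start.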

For image-to-3D, use images with:

- A single centered object
- Clean background
- Clear object silhouette
- Minimal occlusion
- Good lighting
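
The checklist above can be partly automated before calling the pipeline. The sketch below assumes the input may have an alpha channel (for example, after background removal with a separate tool), composites it onto a plain background, pads it to a square so the object stays centered, and converts it to the float array format used in the Quick Start. The helper name `prepare_image`, the white background, and the 256-pixel size are assumptions for illustration, not requirements confirmed by the pipeline.

```python
import numpy as np
from PIL import Image


def prepare_image(img, size=256, background=(255, 255, 255)):
    """Composite onto a plain background, pad to a square, resize, scale to [0, 1]."""
    rgba = img.convert("RGBA")
    side = max(rgba.size)
    # Square canvas filled with the (assumed) background color.
    canvas = Image.new("RGBA", (side, side), tuple(background) + (255,))
    offset = ((side - rgba.width) // 2, (side - rgba.height) // 2)
    canvas.paste(rgba, offset, mask=rgba)  # the alpha channel acts as the paste mask
    rgb = canvas.convert("RGB").resize((size, size))
    return np.asarray(rgb, dtype=np.float32) / 255.0


# Example with a synthetic, fully transparent input; replace with Image.open("input.png").
demo = prepare_image(Image.new("RGBA", (100, 50), (0, 0, 0, 0)))
print(demo.shape, demo.dtype)
```

Padding instead of stretching keeps the object's aspect ratio, which a plain `resize((256, 256))` on a non-square image would distort.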

## 🔗 Related Links

- Original paper: https://arxiv.org/abs/2402.05054
- Original project page: https://me.kiui.moe/lgm/
- Original GitHub repository: https://github.com/3DTopia/LGM
- Upstream Hugging Face model: https://huggingface.co/dylanebert/LGM-full

## 🙏 Acknowledgements

This repository is based on the LGM ecosystem and the upstream Hugging Face full pipeline release. Full credit for the original LGM method goes to the authors of:

**LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation**

This release is intended as a convenient Hugging Face / Diffusers-compatible resource for research, education, and rapid experimentation.


<div align="center">

### 🐙 Built for fast 3D generation experiments.

**From prompt or image to 3D Gaussian assets: clean, simple, and research-friendly.**

</div>