	---
	license: mit
	pipeline_tag: image-to-3d
	library_name: diffusers
	base_model: "dylanebert/LGM-full"
	tags: ["image-to-3d", "text-to-3d", "3d-generation", "3d-gaussian-splatting", "gaussian-splatting", "multi-view-diffusion", "lgm", "diffusers", "safetensors", "objaverse", "research", "computer-graphics"]
	arxiv: "2402.05054"
	---

	<div align="center">

	# 🐙 WasabiOctopus / LGM

	### Large Multi-View Gaussian Model for Fast 3D Asset Generation

	<p>
	<img src="https://img.shields.io/badge/Task-Image--to--3D-blueviolet">
	<img src="https://img.shields.io/badge/Task-Text--to--3D-8A2BE2">
	<img src="https://img.shields.io/badge/Representation-3D%20Gaussian%20Splatting-orange">
	<img src="https://img.shields.io/badge/Library-Diffusers-yellow">
	<img src="https://img.shields.io/badge/License-MIT-green">
	</p>

	A Diffusers-ready LGM pipeline for fast 3D content creation from text or a single image.

	</div>

	## ✨ Highlights

	- 🚀 Fast 3D asset generation powered by the LGM pipeline.
	- 🧊 3D Gaussian Splatting representation for efficient high-resolution 3D content.
	- 🖼️ Text-to-3D and image-to-3D workflows through multi-view diffusion.
	- 🧩 Diffusers-compatible model structure with `LGMFullPipeline`.
	- 🔬 Useful for 3D generation research, creative prototyping, course projects, and rapid experimentation.

	## 🖼️ Gallery

	> Upload your own generated examples to an `assets/` folder and replace the placeholders below.

	| Prompt / Input | Generated 3D Asset |
	|---|---|
	| `a cute robot, smooth toy material, studio lighting` | Coming soon |
	| `a fantasy treasure chest with golden details` | Coming soon |
	| `a stylized sci-fi helmet, clean hard-surface design` | Coming soon |

	## 🧠 What is LGM?

	LGM, short for Large Multi-View Gaussian Model, is a 3D generation framework designed for high-resolution 3D content creation.

	Instead of generating a mesh directly, the pipeline first uses a multi-view diffusion model to produce several views of the object, then reconstructs a set of 3D Gaussians from those views in a single feed-forward pass. This makes it suitable for fast 3D asset generation from either a text prompt or a single input image.

	This repository provides a convenient Hugging Face / Diffusers-style release of the full LGM pipeline.
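
To make "3D Gaussian representation" concrete, the sketch below counts the attributes stored per Gaussian and the number of Gaussians produced. It assumes the 14-channel layout described in the LGM paper, with one Gaussian per pixel of each view's output feature map. The view count and feature-map size passed at the bottom are illustrative values, not settings read from this repository.

```python
# Per-Gaussian attribute sizes (assumed layout, following the LGM paper).
GAUSSIAN_CHANNELS = {
    "position": 3,  # xyz center
    "opacity": 1,
    "scale": 3,     # per-axis extent
    "rotation": 4,  # unit quaternion
    "color": 3,     # RGB
}


def channels_per_gaussian() -> int:
    """Total number of values stored for one Gaussian."""
    return sum(GAUSSIAN_CHANNELS.values())


def gaussian_count(num_views: int, feature_size: int) -> int:
    """One Gaussian per pixel of each view's square output feature map."""
    return num_views * feature_size * feature_size


print(channels_per_gaussian())                        # 14
print(gaussian_count(num_views=4, feature_size=128))  # 65536
```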

	## 🏗️ Pipeline Overview

	```text
	Text prompt or single image
	↓
	Multi-view diffusion generation
	↓
	Multi-view Gaussian features
	↓
	LGM reconstruction module
	↓
	3D Gaussian asset
	↓
	PLY export / downstream rendering
	```

	## 🚀 Quick Start

	### 1. Install dependencies

	```bash
	pip install -U diffusers transformers accelerate safetensors
	pip install torch torchvision torchaudio
	pip install xformers trimesh kiui plyfile
	```

	For the full environment, check the repository `requirements.txt`.
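
Before loading the pipeline, it can help to confirm that the packages above are importable. The helper below is a minimal sketch using only the standard library. `missing_modules` and `REQUIRED_MODULES` are names introduced here for illustration, and the list assumes each package's import name matches its pip name.

```python
import importlib.util

# Import names for the packages installed above (assumed to match the pip names).
REQUIRED_MODULES = [
    "torch", "diffusers", "transformers", "accelerate",
    "safetensors", "xformers", "trimesh", "kiui", "plyfile",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are importable.")
```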

	### 2. Load the pipeline

	```python
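	import torch
	from diffusers import DiffusionPipeline

	repo_id = "WasabiOctopus/LGM"

	# trust_remote_code=True lets Diffusers run the custom pipeline code shipped in this repo.
	pipe = DiffusionPipeline.from_pretrained(
	    repo_id,
	    torch_dtype=torch.float16,
	    trust_remote_code=True,
	)

	# Move all pipeline components to the GPU.
	pipe = pipe.to("cuda")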
	```

	### 3. Text-to-3D generation

	```python
	prompt = "a cute robot, smooth toy material, studio lighting, clean geometry"

	# Generate 3D Gaussians from the text prompt.
	gaussians = pipe(
	    prompt=prompt,
	    num_inference_steps=50,
	    guidance_scale=7.0,
	)

	# Export the Gaussians as a PLY file.
	pipe.save_ply(gaussians, "robot.ply")
	```

	### 4. Image-to-3D generation

	```python
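	import numpy as np
	from PIL import Image

	# Load the input as 256x256 RGB and scale pixel values to [0, 1].
	image = Image.open("input.png").convert("RGB").resize((256, 256))
	image = np.array(image).astype(np.float32) / 255.0

	# An empty prompt makes the input image the only conditioning signal.
	gaussians = pipe(
	    prompt="",
	    image=image,
	    num_inference_steps=50,
	    guidance_scale=7.0,
	)

	# Export the Gaussians as a PLY file.
	pipe.save_ply(gaussians, "asset_from_image.ply")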
	```
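
Both workflows end with `save_ply`, so a quick check on the exported file can catch empty or truncated results. The sketch below reads only the PLY header using the standard library. It assumes the file follows the standard PLY layout with an `element vertex N` line, where Gaussian splat exports typically store one vertex per Gaussian. `read_ply_header` is a helper introduced here, not part of the pipeline.

```python
def read_ply_header(path):
    """Parse a PLY header and return (vertex_count, vertex_property_names)."""
    vertex_count = 0
    properties = []
    in_vertex = False
    with open(path, "rb") as f:
        if f.readline().strip() != b"ply":
            raise ValueError(f"{path} is not a PLY file")
        for raw in f:
            parts = raw.decode("ascii", errors="replace").split()
            if not parts:
                continue
            if parts[0] == "end_header":
                break  # binary payload starts after this line
            if parts[0] == "element":
                in_vertex = parts[1] == "vertex"
                if in_vertex:
                    vertex_count = int(parts[2])
            elif parts[0] == "property" and in_vertex:
                properties.append(parts[-1])  # last token is the property name
    return vertex_count, properties


# Example usage, assuming "robot.ply" was written by pipe.save_ply:
# count, props = read_ply_header("robot.ply")
# print(count, props)
```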

	## 📦 Repository Contents

	```text
	WasabiOctopus/LGM
	├── README.md
	├── model_index.json
	├── pipeline.py
	├── requirements.txt
	├── feature_extractor/
	├── image_encoder/
	├── text_encoder/
	├── tokenizer/
	├── scheduler/
	├── vae/
	├── unet/
	└── lgm/
	```

	## 💡 Recommended Use Cases

	This model release is useful for:

	- Fast single-image-to-3D prototyping
	- Text-to-3D creative asset generation
	- 3D generation course projects
	- Research demos around 3D Gaussian Splatting
	- Benchmarking recent 3D asset generation pipelines
	- Building lightweight demos for Blender, Unity, or web-based 3D viewers

	## ⚠️ Limitations

	This model is a research-oriented 3D generation pipeline. It may produce imperfect geometry or artifacts in the following cases:

	- Thin structures, transparent objects, wires, fur, or complex topology
	- Highly reflective or texture-heavy objects
	- Ambiguous single-view inputs where the back side is not visible
	- Prompt-only generation requiring precise physical dimensions
	- Production workflows requiring clean quad meshes, rigging, or CAD-level topology

	For professional 3D asset production, additional post-processing may be needed, such as mesh extraction, topology cleanup, UV unwrapping, material editing, or manual refinement.

	## 🧪 Tips for Better Results

	Good prompts usually describe:

	```text
	object category + style + material + lighting + geometry constraint
	```

	Examples:

	```text
	a cute robot, rounded toy design, smooth plastic material, studio lighting
	a medieval treasure chest, golden metal details, wooden texture, clean geometry
	a sci-fi helmet, hard-surface design, matte black material, sharp edges
	a tiny house, stylized low-poly, warm colors, isometric game asset
	```
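
The prompt pattern above can be assembled programmatically when generating many assets. This is a small sketch; `build_prompt` and its parameter names are introduced here for illustration and are not part of the pipeline API.

```python
def build_prompt(category, style="", material="", lighting="", geometry=""):
    """Join the non-empty prompt parts with commas, in the order of the pattern above."""
    parts = [category, style, material, lighting, geometry]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a cute robot",
    style="rounded toy design",
    material="smooth plastic material",
    lighting="studio lighting",
)
print(prompt)
# a cute robot, rounded toy design, smooth plastic material, studio lighting
```

Empty parts are skipped, so a prompt can omit any field other than the object category.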

	For image-to-3D, use images with:

	- A single centered object
	- Clean background
	- Clear object silhouette
	- Minimal occlusion
	- Good lighting
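
If an input image has a transparent background or is not square, it can be normalized before resizing. The sketch below composites an RGBA image onto a plain background and pads it to a centered square with NumPy. It assumes a float image in [0, 1], as in the Quick Start, and a white background, which is a guess at what works well rather than a documented requirement. `to_square_rgb` is a helper introduced here.

```python
import numpy as np


def to_square_rgb(image, background=1.0):
    """Composite an HxWx3 or HxWx4 float image in [0, 1] onto a centered square canvas."""
    image = np.asarray(image, dtype=np.float32)
    h, w, c = image.shape
    if c == 4:
        # Blend the object over the background using the alpha channel.
        alpha = image[..., 3:4]
        rgb = image[..., :3] * alpha + background * (1.0 - alpha)
    else:
        rgb = image[..., :3]
    side = max(h, w)
    canvas = np.full((side, side, 3), background, dtype=np.float32)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = rgb
    return canvas
```

The result can then be resized to 256x256 (for example with PIL) and passed as `image` to the pipeline.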

	## 🔗 Related Links

	- Original paper: https://arxiv.org/abs/2402.05054
	- Original project page: https://me.kiui.moe/lgm/
	- Original GitHub repository: https://github.com/3DTopia/LGM
	- Upstream Hugging Face model: https://huggingface.co/dylanebert/LGM-full

	## 🙏 Acknowledgements

	This repository is based on the LGM ecosystem and the upstream Hugging Face full pipeline release. Full credit for the original LGM method goes to the authors of:

	LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

	This release is intended as a convenient Hugging Face / Diffusers-compatible resource for research, education, and rapid experimentation.


	<div align="center">

	### 🐙 Built for fast 3D generation experiments.

	From prompt or image to 3D Gaussian assets — clean, simple, and research-friendly.

	</div>