---
license: mit
pipeline_tag: image-to-3d
library_name: diffusers
base_model: "dylanebert/LGM-full"
tags: ["image-to-3d", "text-to-3d", "3d-generation", "3d-gaussian-splatting", "gaussian-splatting", "multi-view-diffusion", "lgm", "diffusers", "safetensors", "objaverse", "research", "computer-graphics"]
arxiv: "2402.05054"
---
# 🐙 WasabiOctopus / LGM

### Large Multi-View Gaussian Model for Fast 3D Asset Generation

**A Diffusers-ready LGM pipeline for fast 3D content creation from text or a single image.**
## ✨ Highlights

- 🚀 **Fast 3D asset generation** powered by the LGM pipeline.
- 🧊 **3D Gaussian Splatting representation** for efficient high-resolution 3D content.
- 🖼️ **Text-to-3D and image-to-3D workflows** through multi-view diffusion.
- 🧩 **Diffusers-compatible model structure** with `LGMFullPipeline`.
- 🔬 Useful for **3D generation research, creative prototyping, course projects, and rapid experimentation**.

## 🖼️ Gallery

> Upload your own generated examples to an `assets/` folder and replace the placeholders below.

| Prompt / Input | Generated 3D Asset |
|---|---|
| `a cute robot, smooth toy material, studio lighting` | Coming soon |
| `a fantasy treasure chest with golden details` | Coming soon |
| `a stylized sci-fi helmet, clean hard-surface design` | Coming soon |

## 🧠 What is LGM?

**LGM**, short for **Large Multi-View Gaussian Model**, is a 3D generation framework designed for high-resolution 3D content creation. Instead of generating a mesh directly, the pipeline first produces multi-view images and then reconstructs a 3D Gaussian representation from them. This makes it suitable for fast, feed-forward 3D asset generation from either a text prompt or a single input image.

This repository provides a convenient Hugging Face / Diffusers-style release of the full LGM pipeline.

## 🏗️ Pipeline Overview

```text
Text prompt or single image
        ↓
Multi-view diffusion generation
        ↓
Multi-view Gaussian features
        ↓
LGM reconstruction module
        ↓
3D Gaussian asset
        ↓
PLY export / downstream rendering
```

## 🚀 Quick Start

### 1. Install dependencies

```bash
pip install -U diffusers transformers accelerate safetensors
pip install torch torchvision torchaudio
pip install xformers trimesh kiui plyfile
```

For the full environment, check the repository `requirements.txt`.

### 2. Load the pipeline

```python
import torch
from diffusers import DiffusionPipeline

repo_id = "WasabiOctopus/LGM"

# trust_remote_code=True allows Diffusers to execute custom code from the Hub repo.
pipe = DiffusionPipeline.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
pipe = pipe.to("cuda")  # requires a CUDA-capable GPU
```

### 3. Text-to-3D generation

```python
prompt = "a cute robot, smooth toy material, studio lighting, clean geometry"

# Generate the 3D Gaussians from the text prompt.
gaussians = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.0,
)

# Export the result as a PLY file.
pipe.save_ply(gaussians, "robot.ply")
```

### 4. Image-to-3D generation

```python
import numpy as np
from PIL import Image

# Load the input as a 256x256 RGB image and scale it to float32 values in [0, 1].
image = Image.open("input.png").convert("RGB").resize((256, 256))
image = np.array(image).astype(np.float32) / 255.0

# The prompt is left empty so that the image drives the generation.
gaussians = pipe(
    prompt="",
    image=image,
    num_inference_steps=50,
    guidance_scale=7.0,
)

# Export the result as a PLY file.
pipe.save_ply(gaussians, "asset_from_image.ply")
```

## 📦 Repository Contents

```text
WasabiOctopus/LGM
├── README.md
├── model_index.json
├── pipeline.py
├── requirements.txt
├── feature_extractor/
├── image_encoder/
├── text_encoder/
├── tokenizer/
├── scheduler/
├── vae/
├── unet/
└── lgm/
```

## 💡 Recommended Use Cases

This model release is useful for:

- Fast **single-image-to-3D** prototyping
- **Text-to-3D** creative asset generation
- 3D generation course projects
- Research demos around 3D Gaussian Splatting
- Benchmarking recent 3D asset generation pipelines
- Building lightweight demos for Blender, Unity, or web-based 3D viewers

## ⚠️ Limitations

This model is a research-oriented 3D generation pipeline.
It may produce imperfect geometry or artifacts in the following cases:

- Thin structures, transparent objects, wires, fur, or complex topology
- Highly reflective or texture-heavy objects
- Ambiguous single-view inputs where the back side is not visible
- Prompt-only generation requiring precise physical dimensions
- Production workflows requiring clean quad meshes, rigging, or CAD-level topology

For professional 3D asset production, additional post-processing may be needed, such as mesh extraction, topology cleanup, UV unwrapping, material editing, or manual refinement.

## 🧪 Tips for Better Results

Good prompts usually describe:

```text
object category + style + material + lighting + geometry constraint
```

Examples:

```text
a cute robot, rounded toy design, smooth plastic material, studio lighting
a medieval treasure chest, golden metal details, wooden texture, clean geometry
a sci-fi helmet, hard-surface design, matte black material, sharp edges
a tiny house, stylized low-poly, warm colors, isometric game asset
```

For image-to-3D, use images with:

- A single centered object
- Clean background
- Clear object silhouette
- Minimal occlusion
- Good lighting

## 🔗 Related Links

- Original paper: https://arxiv.org/abs/2402.05054
- Original project page: https://me.kiui.moe/lgm/
- Original GitHub repository: https://github.com/3DTopia/LGM
- Upstream Hugging Face model: https://huggingface.co/dylanebert/LGM-full

## 🙏 Acknowledgements

This repository is based on the LGM ecosystem and the upstream Hugging Face full pipeline release. Full credit for the original LGM method goes to the authors of:

**LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation**

This release is intended as a convenient Hugging Face / Diffusers-compatible resource for research, education, and rapid experimentation.
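
## 🧰 Prompt Template Sketch

The prompt pattern from the tips section (object category + style + material + lighting + geometry constraint) can be kept consistent across experiments with a small helper. The sketch below is only an illustration: `PromptSpec` and `build_prompt` are hypothetical names that are not part of this repository or of Diffusers, and joining the parts with commas is an assumption based on the example prompts in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """Hypothetical container for the five prompt parts named in the tips."""

    category: str
    style: str = ""
    material: str = ""
    lighting: str = ""
    geometry: str = ""


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty parts with commas, in the order the template lists them."""
    if not spec.category.strip():
        raise ValueError("an object category is required")
    parts = [spec.category, spec.style, spec.material, spec.lighting, spec.geometry]
    # Drop parts that were left empty and trim stray whitespace.
    return ", ".join(p.strip() for p in parts if p.strip())


robot = PromptSpec(
    category="a cute robot",
    style="rounded toy design",
    material="smooth plastic material",
    lighting="studio lighting",
)
print(build_prompt(robot))
# a cute robot, rounded toy design, smooth plastic material, studio lighting
```

The resulting string can be passed as the `prompt` argument in the text-to-3D example from the Quick Start. Keeping the parts separate makes it easy to vary one attribute, such as the material, while holding the others fixed.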
### 🐙 Built for fast 3D generation experiments.

**From prompt or image to 3D Gaussian assets: clean, simple, and research-friendly.**