munish0838 committed on Nov 22, 2024

Commit

dc7b241

verified ·

1 Parent(s): bc78e21

Upload README.md with huggingface_hub

Files changed (1)

README.md +129 -0

README.md ADDED

	@@ -0,0 +1,129 @@

+---
+license: llama3.1
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- mesh-generation
+---
+[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
+# QuantFactory/LLaMA-Mesh-GGUF
+This is a quantized version of [Zhengyi/LLaMA-Mesh](https://huggingface.co/Zhengyi/LLaMA-Mesh), created using llama.cpp.
+# Original Model Card
+# LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
+[**Paper**](https://arxiv.org/pdf/2411.09595) | [**Project Page**](https://research.nvidia.com/labs/toronto-ai/LLaMA-Mesh/)
+Pre-trained model weights of LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
+[Zhengyi Wang](https://thuwzy.github.io/), [Jonathan Lorraine](https://www.jonlorraine.com/), [Yikai Wang](https://yikaiw.github.io/), [Hang Su](https://www.suhangss.me/), [Jun Zhu](https://ml.cs.tsinghua.edu.cn/~jun/index.shtml), [Sanja Fidler](https://www.cs.utoronto.ca/~fidler/), [Xiaohui Zeng](https://www.cs.utoronto.ca/~xiaohui/)
+Abstract: *This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding. A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly. To address this, we introduce LLaMA-Mesh, a novel approach that represents the vertex coordinates and face definitions of 3D meshes as plain text, allowing direct integration with LLMs without expanding the vocabulary. We construct a supervised fine-tuning (SFT) dataset enabling pretrained LLMs to (1) generate 3D meshes from text prompts, (2) produce interleaved text and 3D mesh outputs as required, and (3) understand and interpret 3D meshes. Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format, effectively unifying the 3D and text modalities. LLaMA-Mesh achieves mesh generation quality on par with models trained from scratch while maintaining strong text generation performance.*
+<video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/634e15aec1ce28f1de91c470/CwSCmyJizQderIYC8CaJ4.mp4"></video>
+## Method
+Overview of our method. LLaMA-Mesh unifies text and 3D meshes in a single format by representing the numerical values of a mesh's vertex coordinates and face definitions as plain text. The model is trained end-to-end on interleaved text and 3D data, so a single model can generate both text and 3D meshes.
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/634e15aec1ce28f1de91c470/0DzHXhoxonG5ZMeTysA6s.jpeg)
+### Model Developer: Base model weights are from Meta. Fine-tuned by NVIDIA.
+## Third-Party Community Consideration:
+This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to Non-NVIDIA [Llama 3.1 Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md).
+## License/Terms of Use:
+This model, Llama-Mesh, is distributed under the following licenses:
+1. NSCLv1 License
+The Llama-Mesh model is licensed under the NSCLv1 license, which allows non-commercial use only. For details, please refer to the LICENSE.txt file.
+2. Llama 3.1 Community License Agreement
+This model incorporates components of Llama 3.1 technology, which is licensed under the Llama 3.1 Community License Agreement. Redistribution and use of Llama 3.1 materials must comply with the terms of this agreement. See the LLAMA_LICENSE.txt file for full details.
+## Attribution
+This model is built with Llama 3.1 technology, as required by the Llama 3.1 Community License Agreement. The required attribution is: "Built with Llama".
+## Reference(s):
+Llama 3.1 [GitHub](https://github.com/meta-llama/llama-models/tree/main/models/llama3_1)
+## Model Architecture:
+**Architecture Type:** Transformer
+**Network Architecture:** Llama 3.1
+## Input:
+**Input Type(s):** Text
+**Input Format(s):** String
+**Input Parameters:** 1D
+**Other Properties Related to Input:** Max token length 8k
+## Output:
+**Output Type(s):** Text
+**Output Format:** String
+**Output Parameters:** 1D
+**Other Properties Related to Output:** Max token length 8k
+**Supported Hardware Microarchitecture Compatibility:**
+* NVIDIA Ada
+**Supported Operating System(s):**
+* Linux
+## Model Version(s):
+Llama 3.1 8B mesh
+## Training Dataset:
+Please refer to the [Llama 3.1 Model Card](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md) for information on the Training, Testing, and Evaluation Datasets.
+The data is curated by converting Objaverse mesh data into text strings (vertex and face entries serialized as strings). The model is fine-tuned on the curated dataset using 32 GPUs.
+[**Objaverse**](https://objaverse.allenai.org/explore/)
+**Data Collection Method by dataset**: Unknown
+**Labeling Method by dataset**: Unknown
+**Properties:** We use 30k meshes, a subset of Objaverse. We filter the Objaverse dataset by face count and keep only shapes with fewer than 500 faces. The meshes are saved in OBJ file format.
+**Dataset License(s):** The use of the dataset as a whole is licensed under the ODC-By v1.0 license.
+## Inference:
+**Engine**: PyTorch
+**Test Hardware**: A100
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
+## BibTeX
+```bibtex
+@misc{wang2024llamameshunifying3dmesh,
+    title={LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models},
+    author={Zhengyi Wang and Jonathan Lorraine and Yikai Wang and Hang Su and Jun Zhu and Sanja Fidler and Xiaohui Zeng},
+    year={2024},
+    eprint={2411.09595},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG},
+    url={https://arxiv.org/abs/2411.09595},
+}
+```
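
The Method and Training Dataset sections of the README added above describe two mechanical steps: writing a mesh's vertices and faces out as plain text, and keeping only meshes with fewer than 500 faces. The sketch below illustrates both, assuming the standard OBJ `v x y z` / `f a b c` line syntax. The README does not give the exact serialization used for training, so the function names (`mesh_to_text`, `text_to_mesh`, `keep_small_meshes`) and the formatting details are hypothetical.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, ...]  # 1-based vertex indices, as in the OBJ format


def mesh_to_text(vertices: Sequence[Vertex], faces: Sequence[Face]) -> str:
    """Serialize a mesh as OBJ-style plain text (assumed layout, not the authors' exact one)."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += ["f " + " ".join(str(i) for i in face) for face in faces]
    return "\n".join(lines)


def text_to_mesh(text: str) -> Tuple[List[Vertex], List[Face]]:
    """Parse OBJ-style text back into vertex and face lists, ignoring other record types."""
    vertices: List[Vertex] = []
    faces: List[Face] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            x, y, z = (float(p) for p in parts[1:4])
            vertices.append((x, y, z))
        elif parts[0] == "f":
            # OBJ face entries may look like "3/1/2"; keep only the vertex index.
            faces.append(tuple(int(p.split("/")[0]) for p in parts[1:]))
    return vertices, faces


def keep_small_meshes(obj_texts: Sequence[str], max_faces: int = 500) -> List[str]:
    """Keep only meshes with fewer than max_faces faces (the README's filter criterion)."""
    return [t for t in obj_texts if len(text_to_mesh(t)[1]) < max_faces]


# Usage: a tetrahedron round-trips through the text form and passes the filter.
tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_faces = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
tetra_text = mesh_to_text(tetra_vertices, tetra_faces)
kept = keep_small_meshes([tetra_text])
```

Because the text form is ordinary OBJ syntax, a mesh emitted by the model in this layout could be saved to a `.obj` file and opened in a standard 3D viewer. Whether the model's output needs further post-processing is not stated in the README.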