Instructions to use rogerxi/Spatial-LLaVA-7B-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use rogerxi/Spatial-LLaVA-7B-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="rogerxi/Spatial-LLaVA-7B-gguf",
	filename="mmproj-model-f16.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use rogerxi/Spatial-LLaVA-7B-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rogerxi/Spatial-LLaVA-7B-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf rogerxi/Spatial-LLaVA-7B-gguf:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rogerxi/Spatial-LLaVA-7B-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf rogerxi/Spatial-LLaVA-7B-gguf:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rogerxi/Spatial-LLaVA-7B-gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf rogerxi/Spatial-LLaVA-7B-gguf:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rogerxi/Spatial-LLaVA-7B-gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rogerxi/Spatial-LLaVA-7B-gguf:F16

Use Docker

docker model run hf.co/rogerxi/Spatial-LLaVA-7B-gguf:F16

LM Studio
Jan

vLLM

How to use rogerxi/Spatial-LLaVA-7B-gguf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rogerxi/Spatial-LLaVA-7B-gguf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rogerxi/Spatial-LLaVA-7B-gguf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/rogerxi/Spatial-LLaVA-7B-gguf:F16

Ollama
How to use rogerxi/Spatial-LLaVA-7B-gguf with Ollama:
```
ollama run hf.co/rogerxi/Spatial-LLaVA-7B-gguf:F16
```

Unsloth Studio new

How to use rogerxi/Spatial-LLaVA-7B-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rogerxi/Spatial-LLaVA-7B-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rogerxi/Spatial-LLaVA-7B-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rogerxi/Spatial-LLaVA-7B-gguf to start chatting

Docker Model Runner
How to use rogerxi/Spatial-LLaVA-7B-gguf with Docker Model Runner:
```
docker model run hf.co/rogerxi/Spatial-LLaVA-7B-gguf:F16
```

Lemonade

How to use rogerxi/Spatial-LLaVA-7B-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rogerxi/Spatial-LLaVA-7B-gguf:F16

Run and chat with the model

lemonade run user.Spatial-LLaVA-7B-gguf-F16

List all available models

lemonade list

Spatial-LLaVA-7B Model Card

Github Repo

🤗 Huggingface Space Demo

🤖 Model details

Model type:

This finetuned LLaVA model is trained from liuhaotian/llava-pretrain-vicuna-7b-v1.3 for improving spatial relation reasoning of large multi-modal model.

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

🎯 Intended use

Primary intended uses: The primary use of LLaVA is research on large multimodal models and chatbots.

Primary intended users: The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.

📚 Training dataset

Instruction following training: rogerxi/LLaVA-Spatial-Instruct-850K

📊 Evaluation

A collection of 10 benchmarks:

Model	VQAv2	GQA	VizWiz	SQA	TextVQA	POPE	MME	MM-Bench	MM-Bench-cn	MM-Vet
LLaVA-1.5-7b	78.5	62.0	50.0	66.8	58.2	85.9	1510.7	64.3	58.3	31.1
Spatial-LLaVA-7b	79.7	62.7	48.7	68.7	58.5	87.2	1472.7	67.8	60.7	31.6

Spatial-Relation-Eval (built based on SpatialRGPT-Bench):

Qualitative Spatial Relations

Model	Below/Above	Left/Right	Big/Small	Tall/Short	Wide/Thin	Behind/Front	Avg
LLaVA-1.5-7b	53.91	53.49	45.36	40.00	50.00	51.04	48.97
LLaVA-1.5-13b	54.28	52.32	45.36	48.57	49.02	47.92	49.67
Spatial-LLaVA-7b	56.32	66.28	60.82	48.57	49.02	52.08	55.12

Quantitative Spatial Relations

Model	Direct Dist (m / ratio)	Horizontal Dist (m / ratio)	Vertical Dist (m / ratio)	Width (m / ratio)	Height (m / ratio)	Direction (° / ratio)
LLaVA-1.5-7b	12.90 / 1.06	10.68 / 2.03	20.79 / 0.94	24.19 / 0.50	14.29 / 5.27	10.23 / 58.33
LLaVA-1.5-13b	13.71 / 0.93	10.68 / 3.56	16.83 / 0.85	15.32 / 0.57	17.67 / 5.8	14.77 / 54.29
Spatial-LLaVA-7b	24.19 / 0.57	14.56 / 0.62	41.58 / 0.42	22.58 / 1.12	18.25 / 2.92	20.45 / 56.47

🙏 Acknowledgements

We thank Liu Haotian et al. for the LLaVA pretrained script, weights and LLaVA-v1.5 mixture dataset; the teams behind CLEVR, TextCaps, VisualMRC and VQAv2 (via “HuggingFaceM4/the_cauldron”); remyxai for OpenSpaces; Anjie Cheng et al. for Spatial-Bench and data pipeline; Google for OpenImages; and Hugging Face for their datasets infrastructure.

Downloads last month: 52

GGUF

Model size

7B params

Architecture

llama

Hardware compatibility

1-bit

2-bit

4-bit

32-bit

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support