# Gemma 3 vision

> [!IMPORTANT]
>
> This is very experimental and intended for demo purposes only.

## Quick start

You can use a pre-quantized model from the [ggml-org](https://huggingface.co/ggml-org) Hugging Face account:

```bash
# build
cmake -B build
cmake --build build --target llama-mtmd-cli

# alternatively, install from brew (macOS)
brew install llama.cpp

# run it
llama-mtmd-cli -hf ggml-org/gemma-3-4b-it-GGUF
llama-mtmd-cli -hf ggml-org/gemma-3-12b-it-GGUF
llama-mtmd-cli -hf ggml-org/gemma-3-27b-it-GGUF

# note: the 1B model does not support vision
```

## How to get mmproj.gguf?

Simply add `--mmproj` when converting the model with `convert_hf_to_gguf.py`:

```bash
cd gemma-3-4b-it
python ../llama.cpp/convert_hf_to_gguf.py --outfile model.gguf --outtype f16 --mmproj .
# output file: mmproj-model.gguf
```

## How to run it?

What you need:
- The text model GGUF, which can be converted using `convert_hf_to_gguf.py`
- The mmproj file from the step above
- An image file

```bash
# build
cmake -B build
cmake --build build --target llama-mtmd-cli

# run it
./build/bin/llama-mtmd-cli -m {text_model}.gguf --mmproj mmproj.gguf --image your_image.jpg
```
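
Since the run needs three files, it can help to check that they exist before launching. The following is a minimal sketch, not part of llama.cpp: `check_inputs` and `run_gemma3_vision` are hypothetical helper names, and the binary path assumes the build steps above were run from the repository root.

```bash
# Hypothetical helpers (not part of llama.cpp).

# Return 0 if every argument is an existing regular file; otherwise
# report each missing path on stderr and return 1.
check_inputs() {
    ci_missing=0
    for ci_file in "$@"; do
        if [ ! -f "$ci_file" ]; then
            echo "missing: $ci_file" >&2
            ci_missing=1
        fi
    done
    return "$ci_missing"
}

# Validate the text model, mmproj file, and image, then run the CLI
# with the same flags as the command shown above.
run_gemma3_vision() {
    check_inputs "$1" "$2" "$3" || return 1
    # Assumed location of the binary after `cmake --build build`.
    ./build/bin/llama-mtmd-cli -m "$1" --mmproj "$2" --image "$3"
}
```

After sourcing these functions, a call such as `run_gemma3_vision model.gguf mmproj-model.gguf your_image.jpg` would stop early with a list of missing files instead of starting the CLI.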