llama_cpp_WebUI / README.md
---
license: apache-2.0
tags:
- image-text-to-text
- audio-text-to-text
- text-to-text
- llama_cpp
- any-to-any
- multimodal-ai
- video-text-to-text
- llama-model-switcher
---
# llama_cpp_WebUI
by nzgnzg73

GitHub: https://github.com/nzgnzg73/llama_cpp_WebUI
## Want to talk or ask something?
Just click the YouTube link below! You'll find my πŸ“§ email there and can message me easily. πŸ‘‡
πŸŽ₯ YouTube Channel: @nzg73
πŸ”— https://youtube.com/@NZG73
## Contact Email πŸ‘‡πŸ‘‡πŸ‘€
E-mail: nzgnzg73@gmail.com
## llama.cpp (CPU/GPU)
Old version of llama.cpp πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡
llama-b7200-bin-win-cpu-x64.zip
New updated version of llama.cpp πŸ‘‡πŸ‘‡πŸ‘‡πŸ‘‡
llama-b7541-bin-win-cpu-x64
![Gemini_Generated_Image_rx1z0erx1z0erx1z](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/7Cjp8zMEtdftA0lfJzKZL.png)
## Llama Model Switcher
Run `model_switcher.py` from CMD after installing the dependencies:
pip install flask psutil GPUtil
python model_switcher.py
![1](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/PUekE98unK4E-wCue-Dle.png)
![2](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/CZvP7V61-SE4o4_Pu38_s.png)
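Under the hood, a switcher like this mainly has to assemble a `llama-server` command line for whichever model you pick. A minimal sketch of that step (the `build_command` helper is illustrative, not the actual `model_switcher.py` code; the flag names match the run.bat examples later in this README):

```python
# Illustrative helper: build a llama-server command line for a chosen model.
# Flag names (--n-gpu-layers, --ctx-size, -m, --mmproj, --host, --port)
# match the run.bat examples in this README.
def build_command(model_path, mmproj_path=None, gpu_layers=0,
                  ctx_size=8192, host="127.0.0.1", port=8083):
    cmd = [
        "llama-server",
        "--n-gpu-layers", str(gpu_layers),
        "--ctx-size", str(ctx_size),
        "-m", model_path,
    ]
    if mmproj_path:  # multimodal projector for image/audio models
        cmd += ["--mmproj", mmproj_path]
    cmd += ["--host", host, "--port", str(port)]
    return cmd

print(" ".join(build_command(
    "models/ollma/Llama-3.2-1B-Instruct-Q8_0.gguf",
    mmproj_path="models/ollma/mmproj-ultravox-v0_5-llama-3_2-1b-f16.gguf",
    gpu_layers=15)))
```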
## Image-Text-to-Text Models
### Gemma-3
Requirements: ~20 GB RAM (CPU) or ~4 GB VRAM (GPU)
![1](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/qoYFWiyKbxfec608p3AbF.png)
1. gemma-3-12b-it-Q4_K_S.gguf
2. mmproj-model-f16-12B.gguf
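The RAM/VRAM figures above can be turned into a quick pre-flight check before loading the model. A sketch with thresholds taken from this section (the `pick_backend` helper is illustrative, not part of the WebUI):

```python
# Illustrative pre-flight check using the requirements listed above:
# Gemma-3 12B Q4_K_S wants ~20 GB RAM on CPU or ~4 GB VRAM on GPU.
def pick_backend(free_ram_gb, free_vram_gb, ram_needed=20, vram_needed=4):
    if free_vram_gb >= vram_needed:
        return "gpu"   # offload layers with --n-gpu-layers
    if free_ram_gb >= ram_needed:
        return "cpu"   # run fully on CPU
    return "insufficient"
```

The same pattern applies to the other models below, just with their own RAM/VRAM thresholds.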
## Text-to-Text Models
### GPT-OSS-20B
![2](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/lYXBozCQV-SmpXDk4ca9I.png)
![Picsart_25-11-30_03-51-56-362](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/II21vWhc6UJwMwAqw1qaQ.png)
### Qwen3
Requirements: ~25 GB RAM (CPU) or ~4 GB VRAM (GPU)
1. Qwen3-VL-2B-Instruct-Q8_0.gguf
2. mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf
![2](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/jmba7H290hQTPhBVL-v5Q.png)
### Qwen2.5-Omni
Requirements: ~40 GB RAM (CPU) or ~8 GB VRAM (GPU)
1. Qwen2.5-Omni-7B-BF16.gguf
2. mmproj-F16.gguf (~2 GB)
![4](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/v0ACCIgBmFUmcMH5rTjj_.png)
## Audio-Text-to-Text Models
### Llama-3.2
Requirements: ~10 GB RAM (CPU)
![3](https://cdn-uploads.huggingface.co/production/uploads/68edac11571026107ce50f6c/VJ9-OMFQYX8x-4HhhtknP.png)
1. Llama-3.2-1B-Instruct-Q4_K_M.gguf
2. Llama-3.2-1B-Instruct-Q8_0.gguf
3. mmproj-ultravox-v0_5-llama-3_2-1b-f16.gguf
## run.bat
Local server:
llama-server.exe --n-gpu-layers 2 --ctx-size 111192 -m ".\models\mistralai\mistralai_Voxtral-Mini-3B-2507-Q8_0.gguf" --mmproj ".\models\mistralai\mmproj-mistralai_Voxtral-Mini-3B-2507-bf16.gguf" --host 0.0.0.0 --port 8005
Public URL:
llama-server --n-gpu-layers 15 --ctx-size 8192 -m models/ollma/Llama-3.2-1B-Instruct-Q8_0.gguf --mmproj models/ollma/mmproj-ultravox-v0_5-llama-3_2-1b-f16.gguf --host 127.0.0.1 --port 8083
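Once either server above is running, any OpenAI-compatible client can talk to it, since llama-server exposes a /v1/chat/completions endpoint. A stdlib-only Python sketch (the `chat` helper is illustrative; port 8083 matches the second command above):

```python
# Illustrative client for a running llama-server instance
# (OpenAI-compatible /v1/chat/completions endpoint).
import json
import urllib.request

def build_payload(prompt):
    # OpenAI-style chat body accepted by llama-server
    return {"messages": [{"role": "user", "content": prompt}]}

def chat(prompt, host="127.0.0.1", port=8083):
    # Requires the server started by one of the commands above
    req = urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```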