Instructions to use nzgnzg73/llama_cpp_WebUI with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nzgnzg73/llama_cpp_WebUI with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="nzgnzg73/llama_cpp_WebUI",
	filename="Image-Text-to-Text Models/gemma-3/gemma-3-12b-it-Q4_K_S.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use nzgnzg73/llama_cpp_WebUI with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S
# Run inference directly in the terminal:
llama-cli -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S
# Run inference directly in the terminal:
llama-cli -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S
# Run inference directly in the terminal:
./llama-cli -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf nzgnzg73/llama_cpp_WebUI:Q4_K_S

Use Docker

docker model run hf.co/nzgnzg73/llama_cpp_WebUI:Q4_K_S

LM Studio
Jan
Ollama
How to use nzgnzg73/llama_cpp_WebUI with Ollama:
```
ollama run hf.co/nzgnzg73/llama_cpp_WebUI:Q4_K_S
```

Unsloth Studio

How to use nzgnzg73/llama_cpp_WebUI with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nzgnzg73/llama_cpp_WebUI to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for nzgnzg73/llama_cpp_WebUI to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for nzgnzg73/llama_cpp_WebUI to start chatting

Docker Model Runner
How to use nzgnzg73/llama_cpp_WebUI with Docker Model Runner:
```
docker model run hf.co/nzgnzg73/llama_cpp_WebUI:Q4_K_S
```

Lemonade

How to use nzgnzg73/llama_cpp_WebUI with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull nzgnzg73/llama_cpp_WebUI:Q4_K_S

Run and chat with the model

lemonade run user.llama_cpp_WebUI-Q4_K_S

List all available models

lemonade list

nzgnzg73 commited on Nov 13, 2025

Commit

5affea7

verified ·

1 Parent(s): 32e3778

Upload 4 files

Browse files

Files changed (4) hide show

llama_cpp_WebUI FILE/Image to Text Model Setup Guide (WebUI).txt +203 -0
llama_cpp_WebUI FILE/Install tutorial.txt +132 -0
llama_cpp_WebUI FILE/Run.BAT +59 -0
llama_cpp_WebUI FILE/run_bat Edit tutorial.txt +267 -0

llama_cpp_WebUI FILE/Image to Text Model Setup Guide (WebUI).txt ADDED Viewed

	@@ -0,0 +1,203 @@

+llama-server --n-gpu-layers 5 --ctx-size 14096 -m models/Qwen3-VL-2B-Instruct-Q8_0.gguf --mmproj models/mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf --host 127.0.0.1 --port 8083
+Chatgpt;- https://chatgpt.com/share/6914d98c-9784-800e-8d92-e8eb0a25b5a4
+https://chatgpt.com/share/6914d98c-9784-800e-8d92-e8eb0a25b5a4
+nglish Version 🇬🇧
+Title: [Image to Text Model Setup Guide (WebUI)]
+Part 1: File and Folder Setup
+ * Base Folder: Assume your main software directory is D:\Flie\llama.cpp.
+ * Download Vision Model:
+   * To run any Vision Model (e.g., Qwen-VL or Gemma-Vision), you must download two files from the model source (like Hugging Face):
+     * File 1 (Main Model): This is the large model file (e.g., Qwen3-VL-2B-Instruct-Q8_0.gguf).
+     * File 2 (MM Projector): This is the small model file responsible for image processing (e.g., mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf).
+ * File Placement: Place both of these files inside the models folder located within D:\Flie\llama.cpp.
+ * Ensure GPU Support (CUDA):
+   * Ensure that the llama-server.exe file in your D:\Flie\llama.cpp folder has been replaced with the version compiled for GPU (CUDA) support (as done by replacing all files from the 373 MB CUDART zip).
+Part 2: Preparing the Command (Running the Software)
+You must write a single command to tell the software which files to use and how much GPU power to allocate.
+Command Structure:
+llama-server --n-gpu-layers [GPU_LAYERS] --ctx-size 14096 -m models/[MAIN_MODEL_FILE.gguf] --mmproj models/[MMPROJ_FILE.gguf] --host 127.0.0.1 --port 8083
+ * llama-server: This starts the WebUI server for use in a browser.
+ * --n-gpu-layers [GPU_LAYERS]:
+   * What to put here? Enter a number based on your GPU's VRAM (e.g., 28 or 20). This accelerates text and image processing by transferring the load from the CPU to the GPU.
+ * -m models/[MAIN_MODEL_FILE.gguf]:
+   * What to put here? Enter the full filename of the large Vision Model file you downloaded.
+   * Example: -m models/Qwen3-VL-2B-Instruct-Q8_0.gguf
+ * --mmproj models/[MMPROJ_FILE.gguf]:
+   * What to put here? This is the Multi-Modal Projector path. Enter the full filename of the small MM Projector file.
+   * Crucial Rule: The MM Projector file MUST be from the same model family as the Main Model. Mixing files (e.g., a Gemma mmproj with a Qwen main model) will NOT work.
+   * Example (For Qwen-VL): --mmproj models/mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf
+Final Command (Qwen-VL Example):
+Run this command inside your D:\Flie\llama.cpp> folder:
+llama-server --n-gpu-layers 28 --ctx-size 14096 -m models/Qwen3-VL-2B-Instruct-Q8_0.gguf --mmproj models/mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf --host 127.0.0.1 --port 8083
+Part 3: Using the Model on WebUI
+ * Run the Command: The command prompt will start loading the model onto your GPU.
+ * Open URL: Once the console shows listening on http://127.0.0.1:8083, open this URL in your Chrome browser.
+ * Upload Image: You will find an image upload button next to the chat window.
+ * Fast Processing: The image processing (which was previously slow) will now be handled by your GPU and the model will respond quickly
+Urdu
+یہ ٹیوٹوریل WebUI (کروم براؤزر) پر تصویر اپلوڈ کرنے والے ماڈلز کو چلانے کا مکمل طریقہ ہے۔
+ٹیوٹوریل: llama.cpp پر تصویر (Image) والا ماڈل چلانے کا طریقہ (GPU کے ساتھ)
+عنوان: [Image to Text Model Setup Guide (WebUI)]
+اردو میں 🇵🇰
+یہ ہدایات کسی بھی نئے Vision ماڈل (جیسے Qwen-VL یا Gemma-Vision) کو آپ کے NVIDIA GPU پر چلا کر کروم براؤزر میں استعمال کرنے کے لیے ہیں۔
+حصہ 1: فائلیں اور فولڈر سیٹ اپ
+ * بنیادی فولڈر: فرض کریں کہ آپ کا سارا سافٹ ویئر D:\Flie\llama.cpp فولڈر میں موجود ہے۔
+ * Vision ماڈل ڈاؤنلوڈ کرنا:
+   * آپ جس بھی Vision ماڈل کو چلانا چاہتے ہیں (مثلاً Qwen-VL یا Gemma-Vision)، آپ کو اس کی دو فائلیں ڈاؤنلوڈ کرنی ہوں گی (جیسے آپ نے Hugging Face سے کی تھیں):
+     * پہلی فائل (Main Model): یہ ماڈل کا بڑا حصہ ہے (جیسے Qwen3-VL-2B-Instruct-Q8_0.gguf)۔
+     * دوسری فائل (MM Projector): یہ ماڈل کا چھوٹا حصہ ہے جو تصویر کو پروسیس کرتا ہے (جیسے mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf)۔
+ * فائلوں کی جگہ: ان دونوں فائلوں کو D:\Flie\llama.cpp کے اندر موجود models نامی فولڈر میں رکھیں۔
+ * GPU سپورٹ (CUDA) یقینی بنانا:
+   * یقینی بنائیں کہ آپ کے D:\Flie\llama.cpp فولڈر میں موجود llama-server.exe فائل GPU (CUDA) سپورٹ کے ساتھ کمپائل ہوئی ہو (جیسا کہ ہم نے 373 MB والی CUDART زِپ کو استعمال کر کے تمام فائلیں تبدیل کی تھیں)۔
+حصہ 2: کمانڈ تیار کرنا (سافٹ ویئر کو چلانا)
+آپ کو ایک واحد کمانڈ لکھنی ہے جو سافٹ ویئر کو یہ بتائے کہ کون سی فائل کس کام کے لیے استعمال کرنی ہے اور GPU پر کتنی طاقت استعمال کرنی ہے۔
+کمانڈ کا مکمل سٹرکچر:
+llama-server --n-gpu-layers [GPU_LAYERS] --ctx-size 14096 -m models/[MAIN_MODEL_FILE.gguf] --mmproj models/[MMPROJ_FILE.gguf] --host 127.0.0.1 --port 8083
+ * llama-server: یہ کمانڈ براؤزر میں WebUI کو شروع کرنے کے لیے ضروری ہے۔
+ * --n-gpu-layers [GPU_LAYERS]:
+   * کون سی ویلیو ڈالیں؟ یہاں آپ اپنے GPU کی VRAM کے حساب سے ایک نمبر ڈالیں (جیسے 28 یا 20 اگر میموری کم ہے)۔ یہ ٹیکسٹ اور تصویر کی پروسیسنگ کو تیز کرتا ہے۔
+   * مثال: --n-gpu-layers 28
+ * -m models/[MAIN_MODEL_FILE.gguf]:
+   * کون سی ویلیو ڈالیں؟ یہاں آپ بڑی Vision ماڈل فائل کا نام ڈالیں گے جو آپ نے Hugging Face سے ڈاؤنلوڈ کی تھی۔
+   * مثال: -m models/Qwen3-VL-2B-Instruct-Q8_0.gguf
+ * --mmproj models/[MMPROJ_FILE.gguf]:
+   * کون سی ویلیو ڈالیں؟ یہ Vision Projector فائل کا پاتھ ہے۔ یہاں آپ چھوٹی MM Projector فائل کا نام ڈالیں گے۔
+   * یاد رکھیں: اگر آپ Gemma ماڈل استعمال کر رہے ہیں، تو آپ کو Gemma کا ہی mmproj استعمال کرنا ہو گا۔ کوئی بھی mmproj کسی بھی ماڈل کے ساتھ کام نہیں کرے گا۔ یہ دونوں فائلیں ایک ہی ماڈل کی ہونی چاہئیں۔
+   * مثال (Qwen-VL کے لیے): --mmproj models/mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf
+آپ کی مکمل کمانڈ (Qwen-VL کی مثال):
+D:\Flie\llama.cpp> میں جا کر یہ کمانڈ چلائیں:
+llama-server --n-gpu-layers 28 --ctx-size 14096 -m models/Qwen3-VL-2B-Instruct-Q8_0.gguf --mmproj models/mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf --host 127.0.0.1 --port 8083
+حصہ 3: ماڈل کو WebUI پر استعمال کرنا
+ * کمانڈ چلائیں: جب آپ Command Prompt میں یہ کمانڈ چلائیں گے تو یہ ماڈل کو آپ کے GPU پر لوڈ کرنا شروع کر دے گا۔
+ * URL کھولیں: کنسول میں جب listening on http://127.0.0.1:8083 کا میسج آئے، تو اپنے کروم براؤزر میں یہ URL کھولیں۔
+ * تصویر اپلوڈ کریں: آپ کو وہاں چیٹ ونڈو کے ساتھ ہی تصویر اپلوڈ کرنے کا بٹن مل جائے گا۔
+ * تصویر پروسیسنگ: جیسے ہی آپ تصویر اپلوڈ کریں گے، آپ کا GPU (Quadro P2000) کام کرنا شروع کر دے گا اور Vision پروسیسنگ تیزی سے مکمل ہو کر آپ کا ماڈل تصویر کے بارے میں جواب دے گا۔
+NEW LAST
+RUN.BAT
+@echo off
+Title 🦙 Llama.cpp Local Server - GPU + Model Selector + Mobile Access
+REM --- PATH SETTINGS ---
+SET BASE_DIR=D:\Flie\llama.cpp
+SET MODELS_DIR=%BASE_DIR%\models
+SET SERVER_EXE=%BASE_DIR%\llama-server.exe
+REM --- SERVER SETTINGS ---
+SET HOST_IP=0.0.0.0
+SET PORT=8080
+SET GPU_LAYERS=3
+SET CONTEXT_SIZE=114096
+echo ============================================
+echo   🦙 Llama.cpp Local Server - Model Selector
+echo ============================================
+echo.
+echo Available Models in: %MODELS_DIR%
+echo.
+REM --- LIST ALL MODELS ---
+SETLOCAL ENABLEDELAYEDEXPANSION
+SET COUNT=0
+for %%f in ("%MODELS_DIR%\*.gguf") do (
+    SET /A COUNT+=1
+    echo !COUNT!. %%~nxf
+    SET "MODEL[!COUNT!]=%%~nxf"
+)
+echo.
+echo --------------------------------------------
+echo Type "NO" and press ENTER to start Vision Model (Qwen3-VL-2B)
+echo --------------------------------------------
+echo.
+set /p choice=Enter model number or type NO:
+REM --- IF USER TYPES NO ---
+IF /I "%choice%"=="NO" (
+    echo.
+    echo 🧠 Starting Vision Model: Qwen3-VL-2B-Instruct-Q8_0
+    echo --------------------------------------------
+    start "" "%SERVER_EXE%" --n-gpu-layers 3 --ctx-size 114096 -m "%MODELS_DIR%\Qwen3-VL-2B-Instruct-Q8_0.gguf" --mmproj "%MODELS_DIR%\mmproj-Qwen3-VL-2B-Instruct-Q8_0.gguf" --host %HOST_IP% --port %PORT%
+    timeout /t 3 >nul
+    REM --- GET LOCAL IP FOR MOBILE ACCESS ---
+    for /f "tokens=2 delims=:" %%a in ('ipconfig ^| findstr /c:"IPv4 Address"') do set LOCAL_IP=%%a
+    set LOCAL_IP=%LOCAL_IP: =%
+    echo.
+    echo 🌐 Open on this PC:      http://127.0.0.1:%PORT%
+    echo 📱 Open on your mobile:  http://%LOCAL_IP%:%PORT%
+    echo.
+    start "" chrome http://127.0.0.1:%PORT%/
+    pause
+    exit /b
+)
+REM --- NORMAL MODEL SELECTION PATH ---
+IF "%choice%"=="" (
+    echo No selection made. Exiting...
+    pause
+    exit /b
+)
+SET SELECTED_MODEL=!MODEL[%choice%]!
+SET MODEL_PATH="%MODELS_DIR%\%SELECTED_MODEL%"
+echo.
+echo ✅ Selected model: %SELECTED_MODEL%
+echo ---------------------------------------------
+echo.
+echo 🚀 Starting llama-server with %SELECTED_MODEL% ...
+echo.
+start "" "%SERVER_EXE%" --n-gpu-layers %GPU_LAYERS% --ctx-size %CONTEXT_SIZE% -m %MODEL_PATH% --host %HOST_IP% --port %PORT%
+timeout /t 3 >nul
+REM --- GET LOCAL IP FOR MOBILE ACCESS ---
+for /f "tokens=2 delims=:" %%a in ('ipconfig ^| findstr /c:"IPv4 Address"') do set LOCAL_IP=%%a
+set LOCAL_IP=%LOCAL_IP: =%
+echo.
+echo 🌐 Open on this PC:      http://127.0.0.1:%PORT%
+echo 📱 Open on your mobile:  http://%LOCAL_IP%:%PORT%
+echo.
+start "" chrome http://127.0.0.1:%PORT%/
+pause

llama_cpp_WebUI FILE/Install tutorial.txt ADDED Viewed

	@@ -0,0 +1,132 @@

+llama-server.exe --n-gpu-layers 0 --ctx-size 4096 -m "C:\Users\........"
+Example command:
+llama-server.exe --n-gpu-layers 0 --ctx-size 4096 -m "C:\Users\Mr_Nomi\Downloads\gemma-3-12b-it-Q4_K_S.gguf"
+video:- https://youtu.be/FLp-_Ln8Wtg?si=txPUQqPgNyCQwYUd
+github:- https://github.com/ggml-org/llama.cpp/releases
+Model:- https://huggingface.co/models?num_parameters=min:0,max:1B&library=gguf&sort=trending
+Cpu Only
+llama-server.exe --n-gpu-layers 0 --ctx-size 4096 -m "C:\Users\Mr_Nomi\Downloads\gemma-3-12b-it-Q4_K_S.gguf"
+Gpu Only/Cuda Olny
+llama-server.exe --n-gpu-layers 999 --ctx-size 4096 -m "C:\Users\Mr_Nomi\Downloads\gemma-3-12b-it-Q4_K_S.gguf"
+💻 Llama.cpp Setup Guide for Windows (English)
+This guide provides the exact steps to download the correct Llama.cpp binaries for Windows and run an LLM (Large Language Model) locally using a command prompt.
+Part 1: Download Llama.cpp Windows Binaries
+ * Search & Navigate: Open your web browser and search for the Llama.cpp GitHub page, or use the direct link below:
+   Link: https://github.com/ggerganov/llama.cpp/releases
+ * Select the Latest Release: On the right sidebar, click on the latest available release (e.g., a tag like b7028).
+ * Download the Windows Package: Scroll down to the Assets section. You must download the file specifically built for Windows 64-bit (x64) that supports your hardware.
+   * For CPU-ONLY Use (Recommended for maximum compatibility): Download the file containing cpu-win-x64 in its name.
+     * Example File Name: llama-bXXXX-bin-cpu-win-x64.zip
+   * For NVIDIA GPU (CUDA) Use: Download the file containing cuda-XX.X-x64 in its name.
+     * Example File Name: llama-bXXXX-bin-win-cuda-12.4-x64.zip
+ * Extract the Files: Once downloaded, Extract the entire contents of the .zip file into a new, easily accessible folder (e.g., E:\llama-setup). This will create a build folder containing the necessary bin subdirectory.
+Part 2: Download the LLM Model (GGUF Format)
+ * Download the Model File: You need an LLM model in the GGUF format. We will use the Gemma 2B model as an example due to its small size and efficiency.
+   Model Link (Gemma 2B Q4_K_S GGUF): https://huggingface.co/lmstudio-community/gemma-2b-it-GGUF/blob/main/gemma-2b-it-Q4_K_S.gguf
+ * Save the Model: Download the GGUF file and place it in a simple location, like your Downloads folder:
+   * Model Path Example: C:\Users\YourName\Downloads\gemma-2b-it-Q4_K_S.gguf
+Part 3: Run the Model (Command Line)
+ * Open the Bin Folder: Navigate to the folder where the executable files are located: E:\llama-setup\build\bin.
+ * Open Command Prompt: Right-click in an empty space within the bin folder and select Open in Terminal or Open PowerShell window here.
+ * Execute the Command: Now, run the llama-server.exe file, specifying the correct options and the path to your downloaded model (-m).
+  * If you downloaded the CPU-ONLY version (Recommended):
+     llama-server.exe --n-gpu-layers 0 --ctx-size 4096 -m "C:\Users\YourName\Downloads\gemma-2b-it-Q4_K_S.gguf"
+   * If you downloaded the CUDA (GPU) version:
+     llama-server.exe --n-gpu-layers 80 --ctx-size 4096 -m "C:\Users\YourName\Downloads\gemma-2b-it-Q4_K_S.gguf"
+   > Note: Replace "C:\Users\YourName\Downloads\gemma-2b-it-Q4_K_S.gguf" with the actual path where you saved your model.
+   >
+ * Access the Web Interface: Once the server starts running, it will display a local IP address (e.g., http://127.0.0.1:8080). Copy this address and paste it into your web browser to start chatting with the model!
+💻 ونڈوز کے لیے Llama.cpp سیٹ اپ گائیڈ (اردو)
+یہ گائیڈ آپ کو ونڈوز پر Llama.cpp کی درست باائنریز ڈاؤن لوڈ کرنے اور کمانڈ پرامپٹ کے ذریعے ایک LLM (لارج لینگویج ماڈل) کو لوکل مشین پر چلانے کا صحیح طریقہ بتائے گا۔
+حصہ 1: Llama.cpp کی ونڈوز باائنریز ڈاؤن لوڈ کرنا
+ * سرچ اور وزٹ کریں: اپنا ویب براؤزر کھولیں اور Llama.cpp کے گٹ ہب پیج کو سرچ کریں، یا نیچے دیا گیا براہ راست لنک استعمال کریں:
+   لنک: https://github.com/ggerganov/llama.cpp/releases
+ * تازہ ترین ریلیز منتخب کریں: دائیں جانب موجود پینل میں، سب سے تازہ ترین دستیاب ریلیز پر کلک کریں۔
+ * ونڈوز پیکج ڈاؤن لوڈ کریں: Assets سیکشن تک نیچے سکرول کریں۔ آپ کو خاص طور پر ونڈوز 64-بٹ (x64) کے لیے بنائی گئی فائل ڈاؤن لوڈ کرنی ہے جو آپ کے ہارڈویئر کو سپورٹ کرے۔
+   * صرف CPU استعمال کے لیے (زیادہ مطابقت کے لیے تجویز کردہ): اس فائل کو ڈاؤن لو�� کریں جس کے نام میں cpu-win-x64 شامل ہو۔
+     * مثال فائل کا نام: llama-bXXXX-bin-cpu-win-x64.zip
+   * NVIDIA GPU (CUDA) استعمال کے لیے: اس فائل کو ڈاؤن لوڈ کریں جس کے نام میں cuda-XX.X-x64 شامل ہو۔
+     * مثال فائل کا نام: llama-bXXXX-bin-win-cuda-12.4-x64.zip
+ * فائلز کو ایکسٹریکٹ کریں: ڈاؤن لوڈ ہونے کے بعد، پوری .zip فائل کو ایک نئی، آسانی سے قابل رسائی جگہ پر ایکسٹریکٹ کر لیں (مثلاً، E:\llama-setup)۔ اس سے ایک build فولڈر بنے گا جس میں ضروری bin سب ڈائریکٹری موجود ہو گی۔
+حصہ 2: LLM ماڈل ڈاؤن لوڈ کرنا (GGUF فارمیٹ)
+ * ماڈل فائل ڈاؤن لوڈ کریں: آپ کو GGUF فارمیٹ میں ایک LLM ماڈل درکار ہے۔ ہم چھوٹی سائز اور افادیت کی وجہ سے Gemma 2B ماڈل کو مثال کے طور پر استعمال کریں گے۔
+   ماڈل کا لنک (Gemma 2B Q4_K_S GGUF): https://huggingface.co/lmstudio-community/gemma-2b-it-GGUF/blob/main/gemma-2b-it-Q4_K_S.gguf
+ * ماڈل محفوظ کریں: GGUF فائل ڈاؤن لوڈ کریں اور اسے کسی سادہ مقام پر رکھیں، جیسے کہ آپ کا Downloads فولڈر:
+   * ماڈل پاتھ کی مثال: C:\Users\آپ کا نام\Downloads\gemma-2b-it-Q4_K_S.gguf
+حصہ 3: ماڈل چلانا (کمانڈ لائن)
+ * Bin فولڈر کھولیں: اس فولڈر میں جائیں جہاں آپ کی llama-server.exe فائل موجود ہے: E:\llama-setup\build\bin۔
+ * کمانڈ پرامپٹ کھولیں: bin فولڈر کے اندر خالی جگہ پر رائٹ کلک کریں اور Open in Terminal یا Open PowerShell window here کو منتخب کریں۔
+ * کمانڈ ایگزیکیوٹ کریں: اب، llama-server.exe فائل کو چلائیں، اور صحیح آپشنز اور ماڈل کا پاتھ (-m) بتائیں۔
+   * اگر آپ نے صرف CPU ورژن ڈاؤن لوڈ کیا ہے (تجویز کردہ):
+     llama-server.exe --n-gpu-layers 0 --ctx-size 4096 -m "C:\Users\آپ کا نام\Downloads\gemma-2b-it-Q4_K_S.gguf"
+   * اگر آپ نے CUDA (GPU) ورژن ڈاؤن لوڈ کیا ہے:
+     llama-server.exe --n-gpu-layers 80 --ctx-size 4096 -m "C:\Users\آپ کا نام\Downloads\gemma-2b-it-Q4_K_S.gguf"
+   > نوٹ: "C:\Users\آپ کا نام\Downloads\gemma-2b-it-Q4_K_S.gguf" کی جگہ وہ اصل پاتھ استعمال کریں جہاں آپ نے اپنا ماڈل محفوظ کیا ہے۔
+   >
+ * ویب انٹرفیس تک رسائی: سرور کے چلنا شروع ہوتے ہی، یہ ایک لوکل IP ایڈریس ظاہر کرے گا (مثلاً: http://127.0.0.1:8080)۔ اس ایڈریس کو کاپی کریں اور اپنے ویب براؤزر میں پیسٹ کریں تاکہ ماڈل کے ساتھ چیٹنگ شروع کی جا سکے۔

llama_cpp_WebUI FILE/Run.BAT ADDED Viewed

	@@ -0,0 +1,59 @@

+@echo off
+Title 🦙 Llama.cpp Local Server - GPU + Model Selector + Auto Chrome
+REM --- PATH SETTINGS ---
+SET BASE_DIR=D:\Flie\llama.cpp
+SET MODELS_DIR=%BASE_DIR%\models
+SET SERVER_EXE=%BASE_DIR%\llama-server.exe
+REM --- SERVER SETTINGS ---
+SET HOST_IP=0.0.0.0
+SET PORT=8080
+SET GPU_LAYERS=999
+SET CONTEXT_SIZE=4096
+echo ============================================
+echo   🦙 Llama.cpp Local Server - Model Selector
+echo ============================================
+echo.
+echo Available Models in: %MODELS_DIR%
+echo.
+REM --- LIST ALL MODELS ---
+SETLOCAL ENABLEDELAYEDEXPANSION
+SET COUNT=0
+for %%f in ("%MODELS_DIR%\*.gguf") do (
+    SET /A COUNT+=1
+    echo !COUNT!. %%~nxf
+    SET "MODEL[!COUNT!]=%%~nxf"
+)
+echo.
+set /p choice=Enter the model number to load:
+IF "%choice%"=="" (
+    echo No selection made. Exiting...
+    pause
+    exit /b
+)
+SET SELECTED_MODEL=!MODEL[%choice%]!
+echo.
+echo ✅ Selected model: %SELECTED_MODEL%
+echo ---------------------------------------------
+SET MODEL_PATH="%MODELS_DIR%\%SELECTED_MODEL%"
+echo Starting llama-server with %SELECTED_MODEL% on GPU...
+echo.
+REM --- START SERVER ---
+start "" "%SERVER_EXE%" --n-gpu-layers %GPU_LAYERS% --ctx-size %CONTEXT_SIZE% --port %PORT% --host %HOST_IP% -m %MODEL_PATH%
+REM --- OPEN CHROME AUTOMATICALLY ---
+timeout /t 2 >nul
+start "" chrome http://127.0.0.1:%PORT%/
+echo.
+echo 🦙 Server started. Browser should open automatically.
+pause

llama_cpp_WebUI FILE/run_bat Edit tutorial.txt ADDED Viewed

	@@ -0,0 +1,267 @@

+chatgpt:=  https://chatgpt.com/share/69141b4b-3448-800e-87ef-fb83c51228e9
+https://chatgpt.com/share/69141b4b-3448-800e-87ef-fb83c51228e9
+Tutorial: How to Edit Run.bat for Llama.cpp Local Server
+Step 1: Locate the Run.bat File
+Go to the folder where you downloaded Llama.cpp.
+Example path:
+D:\Flie\llama.cpp
+You will see Run.bat inside this folder.
+Step 2: Open Run.bat for Editing
+Right-click Run.bat → Choose Edit or Open with Notepad.
+This will open the batch file and you can see the code inside.
+Step 3: Edit the Base Directory
+Look for the line that defines the BASE_DIR.
+Example:
+SET BASE_DIR=D:\Flie\llama.cpp
+Replace D:\Flie\llama.cpp with your own Llama.cpp folder location if it’s different.
+Step 4: Check Models Folder
+Make sure you have a models folder inside your base folder.
+Place all your .gguf model files inside this folder.
+The batch file line should look like:
+SET MODELS_DIR=%BASE_DIR%\models
+Step 5: Save the File
+After editing the path, click File → Save in Notepad.
+Close Notepad.
+Step 6: Run the File
+Double-click Run.bat.
+You will see a list of models with numbers.
+Type the number of the model you want to run and press Enter.
+The server will start and automatically open the browser at:
+http://127.0.0.1:8080/
+---
+Step 7: Optional GPU/CPU Settings
+The batch file uses GPU by default:
+--n-gpu-layers 999
+If you want CPU only, edit the line in Run.bat like this:
+--n-gpu-layers 0
+---
+✅ Now your Run.bat is ready and will always show your models and run the server correctly.
+---
+ٹیوٹوریل: Run.bat کو ایڈٹ کرنا اور ماڈل فولڈر لوکیشن دینا (اردو)
+Step 1: Run.bat فائل تلاش کریں
+وہ فولڈر کھولیں جہاں آپ نے Llama.cpp رکھا ہوا ہے۔
+مثال:
+D:\Flie\llama.cpp
+یہاں آپ کو Run.bat نظر آئے گا۔
+Step 2: Run.bat کھولیں
+Run.bat پر Right-click → Edit یا Open with Notepad کریں۔
+Notepad میں فائل کھل جائے گی اور آپ کو کوڈ نظر آئے گا۔
+Step 3: Base Directory ایڈٹ کریں
+وہ لائن تلاش کریں جو BASE_DIR define کرتی ہے۔
+مثال:
+SET BASE_DIR=D:\Flie\llama.cpp
+اگر آپ نے Llama.cpp کسی اور فولڈر میں رکھا ہے تو اس کا path یہاں دیں۔
+Step 4: Models فولڈر چیک کریں
+یقین کریں کہ base folder میں models فولڈر موجود ہے۔
+اپنے تمام .gguf ماڈلز اس میں رکھیں۔
+Batch فائل میں یہ لائن اس طرح ہونی چاہیے:
+SET MODELS_DIR=%BASE_DIR%\models
+Step 5: فائل Save کریں
+Notepad میں File → Save کریں۔
+Notepad بند کر دیں۔
+Step 6: Run کریں
+Run.bat پر Double-click کریں۔
+ماڈلز کی لسٹ نمبر کے ساتھ دکھائی دے گی۔
+جس ماڈل کو چلانا ہے اس کا نمبر لکھیں اور Enter دبائیں۔
+Server start ہو جائے گا اور browser خود بخود کھلے گا:
+http://127.0.0.1:8080/
+---
+Step 7: GPU یا CPU موڈ
+Default GPU استعمال ہوتا ہے:
+--n-gpu-layers 999
+اگر CPU پر چلانا ہو تو 0 لکھیں:
+--n-gpu-layers 0
+---
+✅ اب آپ کا Run.bat بالکل تیار ہے۔
+یہ ہمیشہ ماڈل لسٹ دکھائے گا اور server صحیح طریقے سے چلائے گا۔
+run.bat
+@echo off
+Title 🦙 Llama.cpp Local Server - GPU + Model Selector + Auto Chrome
+REM --- PATH SETTINGS ---
+SET BASE_DIR=D:\Flie\llama.cpp
+SET MODELS_DIR=%BASE_DIR%\models
+SET SERVER_EXE=%BASE_DIR%\llama-server.exe
+REM --- SERVER SETTINGS ---
+SET HOST_IP=0.0.0.0
+SET PORT=8080
+SET GPU_LAYERS=999
+SET CONTEXT_SIZE=4096
+echo ============================================
+echo   🦙 Llama.cpp Local Server - Model Selector
+echo ============================================
+echo.
+echo Available Models in: %MODELS_DIR%
+echo.
+REM --- LIST ALL MODELS ---
+SETLOCAL ENABLEDELAYEDEXPANSION
+SET COUNT=0
+for %%f in ("%MODELS_DIR%\*.gguf") do (
+    SET /A COUNT+=1
+    echo !COUNT!. %%~nxf
+    SET "MODEL[!COUNT!]=%%~nxf"
+)
+echo.
+set /p choice=Enter the model number to load:
+IF "%choice%"=="" (
+    echo No selection made. Exiting...
+    pause
+    exit /b
+)
+SET SELECTED_MODEL=!MODEL[%choice%]!
+echo.
+echo ✅ Selected model: %SELECTED_MODEL%
+echo ---------------------------------------------
+SET MODEL_PATH="%MODELS_DIR%\%SELECTED_MODEL%"
+echo Starting llama-server with %SELECTED_MODEL% on GPU...
+echo.
+REM --- START SERVER ---
+start "" "%SERVER_EXE%" --n-gpu-layers %GPU_LAYERS% --ctx-size %CONTEXT_SIZE% --port %PORT% --host %HOST_IP% -m %MODEL_PATH%
+REM --- OPEN CHROME AUTOMATICALLY ---
+timeout /t 2 >nul
+start "" chrome http://127.0.0.1:%PORT%/
+echo.
+echo 🦙 Server started. Browser should open automatically.
+pause