Instructions to use W4D/YugoGPT-7B-Instruct-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use W4D/YugoGPT-7B-Instruct-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="W4D/YugoGPT-7B-Instruct-GGUF",
	filename="YugoGPT-7B-Instruct-F16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use W4D/YugoGPT-7B-Instruct-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M

Use Docker

docker model run hf.co/W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use W4D/YugoGPT-7B-Instruct-GGUF with Ollama:
```
ollama run hf.co/W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M
```

Unsloth Studio

How to use W4D/YugoGPT-7B-Instruct-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for W4D/YugoGPT-7B-Instruct-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for W4D/YugoGPT-7B-Instruct-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for W4D/YugoGPT-7B-Instruct-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use W4D/YugoGPT-7B-Instruct-GGUF with Docker Model Runner:
```
docker model run hf.co/W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M
```

Lemonade

How to use W4D/YugoGPT-7B-Instruct-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull W4D/YugoGPT-7B-Instruct-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.YugoGPT-7B-Instruct-GGUF-Q4_K_M

List all available models

lemonade list

YugoGPT Instruct

YugoGPT Instruct is a fine-tuned version of the YugoGPT base model designed specifically for translation tasks involving Serbian, Croatian, and Bosnian languages. Unlike the base model, this instruct model is optimized for following user instructions, offering improved performance in instruction-based interactions.

Overview

YugoGPT Instruct builds upon the powerful capabilities of the YugoGPT base model, fine-tuning it to enhance its usability in structured and directive tasks. This model is ideal for translation workflows where accuracy and context preservation are critical.

Features

Specialized for BCS Languages: Tailored for Serbian, Croatian, and Bosnian language translations.
Instruction Following: Fine-tuned to better adhere to user-provided instructions.
Flexible Deployment: Compatible with various quantization formats for different computational environments.

Quantization Formats

A variety of quantization formats are available to suit diverse performance and resource requirements. Below is the table of quantization options:

Filename	Quant Type	Description
`YugoGPT-7B-Instruct-F16`	F16	Full F16 precision, maximum quality.
`YugoGPT-7B-Instruct-Q8_0`	Q8_0	Extremely high quality.
`YugoGPT-7B-Instruct-Q6_K`	Q6_K	Very high quality, near perfect, recommended.
`YugoGPT-7B-Instruct-Q5_K_M`	Q5_K_M	High quality, recommended.
`YugoGPT-7B-Instruct-Q5_K_S`	Q5_K_S	High quality with optimal trade-offs.
`YugoGPT-7B-Instruct-Q4_K_M`	Q4_K_M	Good quality, optimized for speed.
`YugoGPT-7B-Instruct-Q4_K_S`	Q4_K_S	Slightly lower quality with more savings.
`YugoGPT-7B-Instruct-Q3_K_L`	Q3_K_L	Lower quality, good for low RAM systems.
`YugoGPT-7B-Instruct-Q3_K_M`	Q3_K_M	Low quality, optimized for size.
`YugoGPT-7B-Instruct-Q3_K_S`	Q3_K_S	Low quality, not recommended.

Usage

For usage with Ollama, you can initialize the model using the provided modelfile in the repository. Follow Ollama’s setup instructions to get started. Replace {__FILE_LOCATION__} with the file name of the quant you want to use when creating the model using Ollama CLI.

Licensing

This model is released under the Apache 2.0 License, the same as the YugoGPT base repository.

Credits

Base Model: YugoGPT by Aleksa Gordić
Fine-Tuning Framework: Unsloth

Downloads last month: 162

GGUF

Model size

7B params

Architecture

llama

Hardware compatibility

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for W4D/YugoGPT-7B-Instruct-GGUF

Base model

gordicaleksa/YugoGPT

Quantized

(11)

this model

W4D
/

YugoGPT-7B-Instruct-GGUF