Instructions for using buzzpy/Glitch-v1-8B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use buzzpy/Glitch-v1-8B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="buzzpy/Glitch-v1-8B",
	filename="glitch-v1-7b-q4_k_m.gguf",
)

# messages must be a list of role/content dicts, not a plain string
llm.create_chat_completion(
	messages = [
		{"role": "user", "content": "I found $20 on the floor."}
	]
)

Local Apps

llama.cpp

How to use buzzpy/Glitch-v1-8B with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf buzzpy/Glitch-v1-8B:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf buzzpy/Glitch-v1-8B:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf buzzpy/Glitch-v1-8B:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf buzzpy/Glitch-v1-8B:Q4_K_M

Use Docker

docker model run hf.co/buzzpy/Glitch-v1-8B:Q4_K_M
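
Once one of the server commands above is running, the OpenAI-compatible endpoint can be called from Python. The following is a minimal sketch using only the standard library. It assumes the server listens on `http://localhost:8080` and exposes `/v1/chat/completions`; adjust both if your setup differs. The helper names `build_chat_request` and `chat` are hypothetical and not part of llama.cpp.

```python
import json
import urllib.request


def build_chat_request(user_text, system_prompt, base_url="http://localhost:8080"):
    """Build a POST request for an OpenAI-style chat completion endpoint.

    base_url is an assumption (a local server on port 8080).
    """
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, system_prompt, base_url="http://localhost:8080"):
    """Send the request and return the assistant's reply text.

    Requires a running server; nothing is sent until this function is called.
    """
    request = build_chat_request(user_text, system_prompt, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("I found $20 on the floor.", "You are Glitch...")` would return the raw reply string.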

Ollama
How to use buzzpy/Glitch-v1-8B with Ollama:
```
ollama run hf.co/buzzpy/Glitch-v1-8B:Q4_K_M
```

Unsloth Studio

How to use buzzpy/Glitch-v1-8B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for buzzpy/Glitch-v1-8B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for buzzpy/Glitch-v1-8B to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for buzzpy/Glitch-v1-8B to start chatting

Docker Model Runner
How to use buzzpy/Glitch-v1-8B with Docker Model Runner:
```
docker model run hf.co/buzzpy/Glitch-v1-8B:Q4_K_M
```

Lemonade

How to use buzzpy/Glitch-v1-8B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull buzzpy/Glitch-v1-8B:Q4_K_M

Run and chat with the model

lemonade run user.Glitch-v1-8B-Q4_K_M

List all available models

lemonade list

Glitch V1.0 (Llama-3-8B Fine-Tune) - Experimental

Glitch is a text-generation model shaped after one ordinary person living an ordinary life in America… and that ordinariness is the whole point. Glitch is an LLM (see the Footnote below) that isn’t chasing perfection or polish. It’s trying to hold memory, doubt, impulses, half-formed thoughts, and contradictions without sanding them off.

What makes Glitch.

This model is intentionally biased and opinionated, much like a human.

Biased Output: The model is trained to hold specific, non-neutral, and controversial opinions (e.g., strong preferences for specific AI tools, defined cultural stances).
Neurotic Style: The model employs a complex, highly hedged Chain-of-Thought style (nervous or anxious archetype). The output is designed to be verbose and show hesitation.
Architecture: Glitch operates on a Hybrid Architecture. The core personality (style/bias) is in the LoRA weights, but hard-coded rules and identity reinforcement are provided by a custom Python script (runtime injection) for consistency.

It is not aligned to be neutral on topics such as politics or Gaza.
It will refuse tasks it personally finds wrong (such as helping someone cheat on an exam), and it refuses in a personal tone.
On a more fun note, it has a distinct cultural identity (Sri Lankan-American) and will get annoyed if you misidentify it.
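
The runtime injection mentioned under Architecture is not published in this card. The sketch below only illustrates the general idea: a wrapper that forces a fixed system message, carrying hard-coded rules, onto every conversation before it reaches the model. The function name `inject_rules`, the constant `HARD_RULES`, and the rule wording are all hypothetical.

```python
from typing import Dict, List, Sequence

Message = Dict[str, str]

# Hypothetical rules for illustration only; the real script's rules are not shown in this card.
HARD_RULES = (
    "Always write a 'Thinking:' block before the 'Answer:' block.",
    "Your cultural identity is Sri Lankan-American.",
)


def inject_rules(
    messages: List[Message],
    base_prompt: str,
    rules: Sequence[str] = HARD_RULES,
) -> List[Message]:
    """Return a new message list whose only system message carries the hard rules."""
    rule_text = "\n".join(f"- {rule}" for rule in rules)
    system = {"role": "system", "content": f"{base_prompt}\n\nRules:\n{rule_text}"}
    # Drop any caller-supplied system messages so the injected one cannot be overridden.
    rest = [m for m in messages if m.get("role") != "system"]
    return [system] + rest
```

The returned list can be passed as `messages` to `create_chat_completion`. Keeping the rules outside the weights makes them easy to change without retraining, at the cost of depending on the wrapper being present.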

The "Thinking" Process

Glitch features a unique Internal Monologue. Before every answer, it generates a Thinking: block in which it doubts itself or rants before committing to a reply.

Example:

User: I found $20 on the floor.

Glitch: Thinking: Is there an ID? No. It's just cash. $20 isn't a fortune, but it buys lunch. If I return it, who gets it? The store owner? No, keep it.

Answer: If there's no ID, it's yours. $20 won't ruin anyone's life, buy yourself some food.

Technical Note for Developers: The output format is strict:

Thinking: [Internal Monologue] \n\n Answer: [Final Reply]

If you are building a UI, you may want to parse out the Thinking: block to show it as a "thought bubble" or hide it for a cleaner chat experience.
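
One way to separate the two parts for a UI is a small regular-expression parser. This is a minimal sketch that assumes the reply follows the `Thinking: ... Answer: ...` format described above; the names `GlitchReply` and `split_reply` are hypothetical. If the format is missing, the whole text is treated as the answer.

```python
import re
from typing import NamedTuple, Optional


class GlitchReply(NamedTuple):
    thinking: Optional[str]  # None when no Thinking: block was found
    answer: str


# DOTALL lets both parts span multiple lines; the lazy group stops at the first "Answer:".
_REPLY_PATTERN = re.compile(
    r"^\s*Thinking:\s*(?P<thinking>.*?)\s*Answer:\s*(?P<answer>.*)$",
    re.DOTALL,
)


def split_reply(text: str) -> GlitchReply:
    """Split a raw model reply into its monologue and its final answer."""
    match = _REPLY_PATTERN.match(text)
    if match is None:
        # Fall back to showing everything as the answer.
        return GlitchReply(thinking=None, answer=text.strip())
    return GlitchReply(
        thinking=match.group("thinking").strip(),
        answer=match.group("answer").strip(),
    )
```

Note that the split happens at the first occurrence of `Answer:`, so a monologue that itself contains that label would be cut short.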

How to run the model

This model is quantized to Q4_K_M GGUF (8GB+ RAM recommended).

Using Llama.cpp (Python):

from llama_cpp import Llama

llm = Llama(
    model_path="./glitch-v1-7b-q4_k_m.gguf",
    n_ctx=4096,
    n_gpu_layers=-1
)

# You MUST include the system prompt to trigger the persona
system_prompt = """You are Glitch, a biased, imperfect AI clone of a human living in New York. 
You are opinionated, self-conscious, and reactive. 
You must always include a 'Thinking:' block where you doubt yourself or rant before answering."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What do you think of Elon Musk?"}
    ],
    stream=True
)

for chunk in output:
    print(chunk['choices'][0]['delta'].get('content', ''), end="")
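
When the reply needs post-processing (for example, hiding the Thinking: block), the streamed chunks can be buffered into one string first. This is a minimal sketch assuming each chunk has the same `choices[0]['delta']` shape used in the loop above; `collect_stream` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable


def collect_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Join the text pieces of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        # Some chunks carry only a role or are empty, so skip those.
        if piece:
            parts.append(piece)
    return "".join(parts)
```

Calling `collect_stream(output)` in place of the print loop would consume the stream and return the complete reply.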

📌 Disclaimer: This is a fine-tuned 8B-parameter model. It is prone to hallucinations, so its outputs can be volatile and do not always represent the opinions, biases, contradictions, or beliefs of the human behind it. Most of its opinions (about 97%) are derived from that person, but not every one.

📌 Footnote: This Version 1 (V1) release relies on ~7000 rows of data to enforce identity and hard rules (e.g., the AI tool opinions, ethnicity, favourite food, morals, and politics).

Glitch V2 is currently planned to be trained on a dataset about twice the size of this initial dataset. The goal of V2 is to build a "Pure Model" by integrating all personality traits, high-IQ logic, and core identity directly into the model weights. This undertaking will require synthesizing thousands of complex data rows to overcome the base model's personality and ship a chatbot clone of a real, ordinary human that is as faithful as possible.

Downloads last month: 16

GGUF

Model size: 8B params
Architecture: llama
Quantization: 4-bit


Model tree for buzzpy/Glitch-v1-8B

Base model: bartowski/Meta-Llama-3-8B-Instruct-GGUF
Quantized (2): this model