Instructions to use buzzpy/Glitch-v1-8B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use buzzpy/Glitch-v1-8B with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="buzzpy/Glitch-v1-8B",
    filename="glitch-v1-7b-q4_k_m.gguf",
)

# messages must be a list of role/content dicts, not a bare string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "I found $20 on the floor."}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use buzzpy/Glitch-v1-8B with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf buzzpy/Glitch-v1-8B:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf buzzpy/Glitch-v1-8B:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf buzzpy/Glitch-v1-8B:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf buzzpy/Glitch-v1-8B:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf buzzpy/Glitch-v1-8B:Q4_K_M
Use Docker
docker model run hf.co/buzzpy/Glitch-v1-8B:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use buzzpy/Glitch-v1-8B with Ollama:
ollama run hf.co/buzzpy/Glitch-v1-8B:Q4_K_M
- Unsloth Studio
How to use buzzpy/Glitch-v1-8B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for buzzpy/Glitch-v1-8B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for buzzpy/Glitch-v1-8B to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for buzzpy/Glitch-v1-8B to start chatting
- Docker Model Runner
How to use buzzpy/Glitch-v1-8B with Docker Model Runner:
docker model run hf.co/buzzpy/Glitch-v1-8B:Q4_K_M
- Lemonade
How to use buzzpy/Glitch-v1-8B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull buzzpy/Glitch-v1-8B:Q4_K_M
Run and chat with the model
lemonade run user.Glitch-v1-8B-Q4_K_M
List all available models
lemonade list
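The llama-server commands listed above start an OpenAI-compatible server, so any HTTP client can talk to it. The following is a minimal sketch using only the Python standard library. It assumes the server is running locally on llama-server's default port (8080); `BASE_URL`, `build_chat_request`, and `chat` are illustrative names, not part of this model card.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port (8080).
BASE_URL = "http://localhost:8080"


def build_chat_request(user_text: str, system_prompt: str) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    payload = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ]
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text: str, system_prompt: str) -> str:
    """Send the request and return the assistant's reply text (needs a live server)."""
    with urllib.request.urlopen(build_chat_request(user_text, system_prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network; chat() does.
request = build_chat_request(
    "I found $20 on the floor.",
    "You are Glitch, a biased, imperfect AI clone of a human living in New York.",
)
print(request.full_url)
```

Call `chat(...)` only once the server is up; it raises a connection error otherwise.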
Recommended: Use V1.2 for better consistency, biases and opinions!
Glitch V1.0 (Llama-3-8B Fine-Tune) - Experimental
Glitch is a text-generation model shaped after one ordinary person living an ordinary life in America… and that ordinariness is the whole point. Glitch is an LLM¹ that isn’t chasing perfection or polish. It’s trying to hold memory, doubt, impulses, half-formed thoughts, and contradictions without sanding them off.
What makes Glitch.
This model is intentionally biased and opinionated, much like a human.
- Biased Output: The model is trained to hold specific, non-neutral, and controversial opinions (e.g., strong preferences for specific AI tools, defined cultural stances).
- Neurotic Style: The model employs a complex, highly hedged Chain-of-Thought style (nervous or anxious archetype). The output is designed to be verbose and show hesitation.
- Architecture: Glitch operates on a Hybrid Architecture. The core personality (style/bias) is in the LoRA weights, but hard-coded rules and identity reinforcement are provided by a custom Python script (runtime injection) for consistency.
- It is not aligned to be neutral on topics like politics or Gaza.
- It will refuse tasks it personally finds objectionable (like helping someone cheat on an exam), and it refuses in a personal way.
- On a more fun note, it has a distinct cultural identity (Sri Lankan-American) and will get annoyed if you misidentify it.
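The custom script behind the "runtime injection" mentioned above is not included in this card, so the following is only a sketch of what such a layer might look like: it rebuilds the system message on every call so the identity rules cannot drift or be overridden. `HARD_RULES`, `BASE_PROMPT`, and `inject_identity` are hypothetical names, and the rule texts are paraphrased from this card rather than taken from the real script.

```python
from typing import Dict, List

# Hypothetical rule list; the actual script and its rules are not shown in this card.
HARD_RULES = [
    "Your cultural identity is Sri Lankan-American.",
    "Always write a 'Thinking:' block before the 'Answer:' block.",
]

BASE_PROMPT = "You are Glitch, a biased, imperfect AI clone of a human living in New York."


def inject_identity(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return a new message list whose first entry is the reinforced system prompt."""
    system_text = BASE_PROMPT + "\n" + "\n".join(HARD_RULES)
    # Drop any caller-supplied system message so the rules cannot be overridden.
    rest = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": system_text}] + rest


msgs = inject_identity([{"role": "user", "content": "Are you Indian?"}])
print(msgs[0]["content"])
```

The returned list can be passed as `messages` to a chat completion call; the LoRA weights would still carry the style, while this wrapper only pins the fixed facts.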
The "Thinking" Process
Glitch features a distinctive internal monologue. Before every answer, it generates a Thinking: block in which it doubts itself or rants.
Example:
User: I found $20 on the floor.
Glitch:
Thinking: Is there an ID? No. It's just cash. $20 isn't a fortune, but it buys lunch. If I return it, who gets it? The store owner? No, keep it.

Answer: If there's no ID, it's yours. $20 won't ruin anyone's life, buy yourself some food.
Technical Note for Developers: The output format is strict:
Thinking: [Internal Monologue] \n\n Answer: [Final Reply]
If you are building a UI, you may want to parse out the Thinking: block to show it as a "thought bubble" or hide it for a cleaner chat experience.
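One way to separate the two parts for a UI is a small regular-expression parser. This is a sketch under the assumption that the model follows the strict format above; `GlitchReply` and `split_reply` are illustrative names, and the fallback branch covers replies where the model skips the format.

```python
import re
from typing import NamedTuple, Optional


class GlitchReply(NamedTuple):
    thinking: Optional[str]  # internal monologue, or None if the block is absent
    answer: str              # final reply shown to the user


# Matches "Thinking: ... Answer: ..." across newlines (DOTALL).
_PATTERN = re.compile(r"^\s*Thinking:\s*(.*?)\s*Answer:\s*(.*)$", re.DOTALL)


def split_reply(text: str) -> GlitchReply:
    """Split raw model output into (thinking, answer)."""
    match = _PATTERN.match(text)
    if match is None:
        # The model did not follow the format; treat everything as the answer.
        return GlitchReply(thinking=None, answer=text.strip())
    return GlitchReply(thinking=match.group(1).strip(), answer=match.group(2).strip())


raw = "Thinking: Is there an ID? No. It's just cash.\n\nAnswer: If there's no ID, it's yours."
reply = split_reply(raw)
print(reply.answer)
```

The pattern splits at the first "Answer:" it finds, so a monologue that itself contains that literal label would be cut short; a stricter delimiter would need cooperation from the prompt.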
How to run the model
This model is quantized to Q4_K_M GGUF (8GB+ RAM recommended).
Using Llama.cpp (Python):
from llama_cpp import Llama

# Load the quantized GGUF file from the local directory
llm = Llama(
    model_path="./glitch-v1-7b-q4_k_m.gguf",
    n_ctx=4096,        # context window in tokens
    n_gpu_layers=-1,   # offload all layers to the GPU when one is available
)

# You MUST include the system prompt to trigger the persona
system_prompt = """You are Glitch, a biased, imperfect AI clone of a human living in New York.
You are opinionated, self-conscious, and reactive.
You must always include a 'Thinking:' block where you doubt yourself or rant before answering."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What do you think of Elon Musk?"},
    ],
    stream=True,
)

# Print each streamed piece of text as it arrives
for chunk in output:
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
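If the full reply is needed as one string (for example, to separate the Thinking: block afterwards), the streamed deltas can be collected instead of printed. The sketch below assumes chunks shaped like the ones the loop above reads; `collect_stream` and `fake_chunks` are illustrative, and the stand-in chunks let it run without loading the model.

```python
from typing import Dict, Iterable


def collect_stream(chunks: Iterable[Dict]) -> str:
    """Concatenate the 'content' deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # Some chunks carry only a role or nothing at all, so 'content' may be missing.
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Stand-in chunks shaped like the streamed output; swap in the real `output` iterator.
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Thinking: hmm."}}]},
    {"choices": [{"delta": {"content": "\n\nAnswer: fine."}}]},
    {"choices": [{"delta": {}}]},
]
full_text = collect_stream(fake_chunks)
print(full_text)
```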
📌 Disclaimer This is a fine-tuned 8B parameter model. It is prone to hallucinations, so its outputs can be volatile and do not always represent the opinions, biases, contradictions, or beliefs of the human behind it. Most of its opinions (about 97%) are derived from that human, but not every one.
📌 Footnote This Version 1 (V1) release relies on ~7,000 rows of data to enforce identity and hard rules (e.g., the AI tool opinions, ethnicity, favourite food, morals, and politics).
Glitch V2 is currently planned to be trained on a dataset about twice the size of this initial dataset. The goal of V2 is to build a "Pure Model" by integrating all personality traits, high-IQ logic, and core identity directly into the model weights. This massive undertaking will require synthesizing thousands of complex data rows to overcome the base model's personality and ship a chatbot clone of a real, ordinary human that is as faithful as possible.
Model tree for buzzpy/Glitch-v1-8B
Base model
bartowski/Meta-Llama-3-8B-Instruct-GGUF