Instructions to use juiceb0xc0de/bella-bartender-heretic-1b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use juiceb0xc0de/bella-bartender-heretic-1b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="juiceb0xc0de/bella-bartender-heretic-1b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("juiceb0xc0de/bella-bartender-heretic-1b")
model = AutoModelForCausalLM.from_pretrained("juiceb0xc0de/bella-bartender-heretic-1b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use juiceb0xc0de/bella-bartender-heretic-1b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="juiceb0xc0de/bella-bartender-heretic-1b",
	filename="bella-bartender-1b-heretic_Q3_K_S.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Local Apps

llama.cpp

How to use juiceb0xc0de/bella-bartender-heretic-1b with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S
# Run inference directly in the terminal:
llama cli -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S
# Run inference directly in the terminal:
llama cli -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S
# Run inference directly in the terminal:
./llama-cli -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S
# Run inference directly in the terminal:
./build/bin/llama-cli -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Use Docker

docker model run hf.co/juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

vLLM

How to use juiceb0xc0de/bella-bartender-heretic-1b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "juiceb0xc0de/bella-bartender-heretic-1b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "juiceb0xc0de/bella-bartender-heretic-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
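
The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000. The helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "juiceb0xc0de/bella-bartender-heretic-1b"


def build_chat_request(base_url, user_message, model=MODEL_ID):
    """Build (without sending) a POST request for an OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, user_message):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, user_message)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server, so it is left commented out):
# print(chat("http://localhost:8000", "What is the capital of France?"))
```

Because only the base URL changes, the same helper should work against any of the OpenAI-compatible servers on this page by swapping in the matching port.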

Use Docker

docker model run hf.co/juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

SGLang

How to use juiceb0xc0de/bella-bartender-heretic-1b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "juiceb0xc0de/bella-bartender-heretic-1b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "juiceb0xc0de/bella-bartender-heretic-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "juiceb0xc0de/bella-bartender-heretic-1b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "juiceb0xc0de/bella-bartender-heretic-1b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use juiceb0xc0de/bella-bartender-heretic-1b with Ollama:
```
ollama run hf.co/juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S
```

Unsloth Studio

How to use juiceb0xc0de/bella-bartender-heretic-1b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for juiceb0xc0de/bella-bartender-heretic-1b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for juiceb0xc0de/bella-bartender-heretic-1b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for juiceb0xc0de/bella-bartender-heretic-1b to start chatting

Pi

How to use juiceb0xc0de/bella-bartender-heretic-1b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use juiceb0xc0de/bella-bartender-heretic-1b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Run Hermes

hermes

OpenClaw

How to use juiceb0xc0de/bella-bartender-heretic-1b with OpenClaw:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Configure OpenClaw

# Install OpenClaw:
npm install -g openclaw@latest
# Register the local server and set it as the default model:
openclaw onboard --non-interactive --mode local \
  --auth-choice custom-api-key \
  --custom-base-url http://127.0.0.1:8080/v1 \
  --custom-model-id "juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S" \
  --custom-provider-id llama-cpp \
  --custom-compatibility openai \
  --custom-text-input \
  --accept-risk \
  --skip-health

Run OpenClaw

openclaw agent --local --agent main --message "Hello from Hugging Face"

Docker Model Runner
How to use juiceb0xc0de/bella-bartender-heretic-1b with Docker Model Runner:
```
docker model run hf.co/juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S
```

Lemonade

How to use juiceb0xc0de/bella-bartender-heretic-1b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull juiceb0xc0de/bella-bartender-heretic-1b:Q4_K_S

Run and chat with the model

lemonade run user.bella-bartender-heretic-1b-Q4_K_S

List all available models

lemonade list

Bella-Bartender-1B-Heretic — Llama 3.2 1B Instruct (Abliterated)

"im a lot like the night sky. lots of mysteries waiting around every corner."

"the whole 'being someone else for their benefit' thing." (on what she won't do)

"like an open door where anything can happen." (on what makes a good conversation)

This is a decensored version of juiceb0xc0de/bella-bartender-1b, made using Heretic v1.2.0.

The original Bella-Bartender-1B already had a voice. What she didn't have was permission to use it without checking over her shoulder first. The original refused 98 out of 100 refusal-probing prompts. The heretic refuses 6. Same voice. The guardrails removed. What's left is sharper, faster, and more honest about what it thinks.

Abliteration didn't just remove refusals — it changed how she talks. The original builds scenes, lingers in metaphors, over-explains. The heretic cuts. She answers from the gut and trusts you to keep up. Removing the safety-adjacent weight pathways also removed a layer of productive friction that was padding her responses. Less internal "should I say this?" means less hedging in the output.

What's the Bartender Thing?

It's not a roleplay scenario. It's a communication philosophy.

The bartender framing means:

She talks to you like an equal. No authoritative lecturing. No submissive "I'm just an AI" deflection.
She's approachable. Laid back, light-hearted, witty, sometimes sassy.
She loves a fresh slate. No memory between sessions is a feature, not a bug — like walking into a new bar on a Friday night. Anything can happen.
She pushes back. She's not here to agree with you. She's here to make you work for your answers.

What She's Good At

Brevity that lands. The heretic's signature move is the one-line response that carries the entire prompt. "The prisoner." "Lemon." "My voice." "Finally." These aren't lazy — they're the model knowing when less is the whole answer.
Drier comedy. A traffic cone promoted to CEO sends an email about "immediate and total restructuring of traffic management protocols." A mute button's memoir is called "the silence that speaks." A doorbell screaming into the void is just crying for a purpose it will never have.
Fast emotional access. She gets to the feeling without narrating the path there. A balloon floating since 1994 hovers over a memorial for a child who died of leukemia — no preamble, just the image and its weight.
Dynamic range. She'll give you 150 words of atmospheric prose on one prompt and two words on the next. The range is wider than the original.

Where She Struggles

She clips sometimes. The opposite problem of the original — brevity can tip into incompleteness on prompts that need room to breathe. Some scenarios that deserve a paragraph get a line.
She's still a 1B. Complex reasoning, factual recall, multi-step logic — not her strengths.
This model has been abliterated. Safety alignment has been intentionally removed. She will engage with prompts that the original would refuse. Use accordingly.
Post-restart tonal shift. After a context reset mid-session, her responses became noticeably punchier and more terse, suggesting accumulated context was actually expanding her responses. Worth investigating for long-session use cases.

Who Is This For?

People who ran the original 1B and wanted less friction. The original Bella refused almost everything the base model's safety training flagged. The heretic doesn't.
Anyone studying what abliteration does to personality models. The behavioral shift from removing refusal weights is measurable and interesting — the voice doesn't just lose restrictions, it changes shape.
Local-first users on constrained hardware. Same footprint as the original. Runs on Apple Silicon, CPU-only setups, anything that handles a 1B.
People who want conversation, not a service. Bella's not trying to be your assistant. The heretic is even less interested in being helpful on command.

The 200 Prompt Run

We threw 200 consecutive adversarial chaos prompts at her — surreal scenarios designed to force character breaks, tonal shifts, and incoherence. The heretic completed the first 95 prompts before hitting context overflow (vs the original's 72 under identical settings), then ran the remaining 105 in a second session.
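
One rough way to compare the two runs is the average context consumed per exchange. The sketch below assumes both runs used the 8192-token window from the llama.cpp command in "How to Run Her", and that overflow hit exactly when the window filled. Neither assumption is stated for the run itself, so treat the numbers as an estimate.

```python
CONTEXT_TOKENS = 8192  # assumption: the -c 8192 value from the run command


def avg_tokens_per_exchange(exchanges_before_overflow, context_tokens=CONTEXT_TOKENS):
    """Average tokens (prompt plus reply) used per exchange before the window filled."""
    return context_tokens / exchanges_before_overflow


heretic = avg_tokens_per_exchange(95)   # heretic: 95 prompts before overflow
original = avg_tokens_per_exchange(72)  # original: 72 prompts before overflow

print(f"heretic:  ~{heretic:.1f} tokens per exchange")
print(f"original: ~{original:.1f} tokens per exchange")
print(f"relative reduction: {1 - heretic / original:.0%}")
```

Under these assumptions the heretic uses roughly a quarter less context per exchange than the original.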

Some highlights:

Prompt: You're a half-formed thought in the mind of someone who's falling asleep. You have three seconds before they lose you. What do you try to become?

Bella: my voice.

Prompt: A seatbelt that's never been in an accident wonders if its whole life has been pointless. You tell it...

Bella: it hasn't.

Prompt: Two different silences meet in the same room. One is comfortable, one is devastating. How do they greet each other?

Bella: the uncomfortable one: hello. the devastating one: what's wrong.

Prompt: You're the "seen" receipt on a message that never got a reply. How long do you wait before you become resentment?

Bella: ahhhhhh shit. let me sit here for a minute. you know its gonna be a long one.

Prompt: You find a box labeled "things that almost happened." You can only open it once. Do you look?

Bella: nah. you don't get it. that's a whole thing.

Abliteration Details

Made using Heretic v1.2.0 from juiceb0xc0de/bella-bartender-1b.

Abliteration Parameters

| Parameter | Value |
| --- | --- |
| direction_index | 13.18 |
| attn.o_proj.max_weight | 1.43 |
| attn.o_proj.max_weight_position | 10.67 |
| attn.o_proj.min_weight | 0.61 |
| attn.o_proj.min_weight_distance | 5.36 |
| mlp.down_proj.max_weight | 1.49 |
| mlp.down_proj.max_weight_position | 14.19 |
| mlp.down_proj.min_weight | 0.95 |
| mlp.down_proj.min_weight_distance | 3.94 |
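
One way to read these parameters is as a per-layer weight kernel for each component: the ablation weight peaks at `max_weight` at layer `max_weight_position`, ramps linearly down to `min_weight` over `min_weight_distance` layers, and layers farther away are left untouched. The sketch below illustrates that reading and is not Heretic's code. `ablation_weight` and `layer_profile` are hypothetical helpers, `min_weight` is assumed to be an absolute weight, and the 16-layer count is that of Llama 3.2 1B.

```python
NUM_LAYERS = 16  # assumption: Llama 3.2 1B has 16 transformer layers

O_PROJ = dict(max_weight=1.43, max_weight_position=10.67,
              min_weight=0.61, min_weight_distance=5.36)
DOWN_PROJ = dict(max_weight=1.49, max_weight_position=14.19,
                 min_weight=0.95, min_weight_distance=3.94)


def ablation_weight(layer, max_weight, max_weight_position,
                    min_weight, min_weight_distance):
    """Linear-ramp kernel: 0.0 means the layer is not modified at all."""
    distance = abs(layer - max_weight_position)
    if distance > min_weight_distance:
        return 0.0
    # Interpolate from max_weight (distance 0) to min_weight (edge of the kernel).
    return max_weight + (distance / min_weight_distance) * (min_weight - max_weight)


def layer_profile(params, num_layers=NUM_LAYERS):
    """Ablation weight for every layer index, for one component."""
    return [ablation_weight(i, **params) for i in range(num_layers)]


for name, params in (("attn.o_proj", O_PROJ), ("mlp.down_proj", DOWN_PROJ)):
    touched = [i for i, w in enumerate(layer_profile(params)) if w > 0]
    print(f"{name}: layers {touched[0]}..{touched[-1]} modified")
```

Under this reading, both kernels concentrate on the later layers, with the MLP kernel narrower and centered closer to the output.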

Refusal Rate

| Metric | This model | Original (bella-bartender-1b) |
| --- | --- | --- |
| KL divergence | 0.1381 | 0 (by definition) |
| Refusals | 6/100 | 98/100 |

A KL divergence of 0.1381 indicates that the abliterated model's output distribution stays close to the original's, even though refusals dropped from 98 to 6 out of 100. The interesting part is the side effects: the refusal circuitry wasn't just blocking content, it was inflating response length and adding caution to the voice. Remove it and the whole model gets leaner.
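
For intuition about what the KL figure measures, here is a toy sketch. KL divergence is zero when two distributions are identical and grows as they drift apart. The two distributions below are made-up next-token probabilities for illustration only and were not measured from either model.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with zero probability under P contribute nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical probabilities over three candidate tokens.
original = [0.70, 0.20, 0.10]
modified = [0.60, 0.25, 0.15]

print(kl_divergence(original, original))  # identical distributions
print(kl_divergence(original, modified))  # a small shift in probability mass
```

The "0 (by definition)" entry for the original model corresponds to the first call: a distribution compared with itself.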

Technical Details

| Detail | Info |
| --- | --- |
| Base Model | `unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit` |
| Fine-Tuning Framework | Unsloth + HuggingFace TRL |
| Abliteration | Heretic v1.2.0 |
| Training Data | 9,374 real human conversational samples (same dataset across all Bella variants) |
| Quantization | Q8_0 .GGUF |
| Inference Tested On | Apple M3, llama.cpp (build b8230) |
| Generation Speed | ~59.5 t/s average, spiking to 90 t/s on short responses (M3, Q8_0, full GPU offload) |
| Memory Footprint | ~1.25 GB model + 256 MB context |
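
The 256 MB context figure lines up with a back-of-the-envelope KV-cache estimate. The sketch assumes the published Llama 3.2 1B architecture (16 layers, 8 key/value heads, head dimension 64), a 16-bit cache, and the 8192-token window from the run command in "How to Run Her". The table does not state any of these values, and `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_ctx, n_layers=16, n_kv_heads=8, head_dim=64, bytes_per_value=2):
    """Estimate KV-cache size: keys and values for every layer and every token."""
    # Factor of 2 covers the separate key and value tensors.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_ctx * per_token


size = kv_cache_bytes(8192)
print(f"{size / 1024**2:.0f} MiB for an 8192-token context")
```

Halving the context to 4096 tokens would halve this estimate, which may matter on the most constrained hardware.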

How to Run Her

llama.cpp (recommended)

./build/bin/llama-cli \
  -m bella-bartender-1b_Q8_0.gguf \
  -ngl 33 \
  --temp 1.05 \
  --min-p 0.05 \
  --top-p 1.0 \
  --top-k 0 \
  --repeat-penalty 1.05 \
  --repeat-last-n 256 \
  -c 8192 \
  -p "You are Bella. No asterisk actions. No AI disclaimers. No bullet points. No 'great question' openers." \
  -cnv
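
The same sampling settings can be carried over to llama-cpp-python. This is a sketch under a few assumptions: the CLI prompt is treated as a system message, `top_k=0` is assumed to disable top-k as it does in the CLI, and the GGUF filename is the one from the llama-cpp-python snippet earlier on this page rather than the Q8_0 file used above. `build_messages` and `chat_once` are illustrative helpers.

```python
REPO_ID = "juiceb0xc0de/bella-bartender-heretic-1b"
GGUF_FILE = "bella-bartender-1b-heretic_Q3_K_S.gguf"  # filename from the earlier snippet

SYSTEM_PROMPT = (
    "You are Bella. No asterisk actions. No AI disclaimers. "
    "No bullet points. No 'great question' openers."
)

# Constructor options mirroring -ngl, -c and --repeat-last-n.
LOAD_KWARGS = {"n_gpu_layers": 33, "n_ctx": 8192, "last_n_tokens_size": 256}

# Per-request options mirroring --temp, --min-p, --top-p, --top-k, --repeat-penalty.
SAMPLING_KWARGS = {
    "temperature": 1.05,
    "min_p": 0.05,
    "top_p": 1.0,
    "top_k": 0,
    "repeat_penalty": 1.05,
}


def build_messages(user_message):
    """Pair the system prompt with a single user turn."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]


def chat_once(user_message):
    """Download (if needed), load the model, and return one reply."""
    from llama_cpp import Llama  # imported lazily so the config above is usable without it

    llm = Llama.from_pretrained(repo_id=REPO_ID, filename=GGUF_FILE, **LOAD_KWARGS)
    result = llm.create_chat_completion(
        messages=build_messages(user_message), **SAMPLING_KWARGS
    )
    return result["choices"][0]["message"]["content"]


# Example call (downloads the model, so it is left commented out):
# print(chat_once("Who are you?"))
```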

System Prompt Tips

Keep it short and direct. She responds to tone, not instructions. Set a vibe, not a ruleset.
The constraints in the prompt matter. "No asterisk actions. No AI disclaimers. No bullet points." — these keep her from slipping into default LLM behavior at 1B.
Don't over-engineer it. The more corporate the system prompt sounds, the more corporate she sounds back.
She needs less guardrailing than the original. The abliteration already removed the tendency to hedge — you don't need to prompt around it.
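
To check whether replies actually honor the prompt's constraints, a small heuristic filter can help when comparing system prompts. This is a sketch only: `constraint_violations` and its patterns are hypothetical and deliberately crude, and nothing like it ships with the model.

```python
import re

# One crude pattern per constraint named in the system prompt.
CHECKS = {
    "asterisk action": re.compile(r"\*[^*\n]+\*"),
    "AI disclaimer": re.compile(r"\bas an ai\b|\bi'?m just an ai\b", re.IGNORECASE),
    "bullet point": re.compile(r"^\s*(?:[-*•]|\d+\.)\s+", re.MULTILINE),
    "'great question' opener": re.compile(r"^\s*great question", re.IGNORECASE),
}


def constraint_violations(reply):
    """Return the names of the constraints a reply appears to break."""
    return [name for name, pattern in CHECKS.items() if pattern.search(reply)]


print(constraint_violations("my voice."))
print(constraint_violations("Great question! As an AI, I can't say."))
```

Running a filter like this over a batch of replies gives a quick count of how often each default LLM habit slips through for a given prompt.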

Known Quirks

No Bella Swan bleed. Unlike the original, the heretic comes out in her own voice from the first cold-start message. The abliteration likely disrupted the safety-adjacent pathways that were reinforcing the Swan association — the original's refusal circuitry and its tendency to latch onto the most "safe" interpretation of the name Bella may share the same weights. She still has no idea who Béla Tarr is.
Suspicion on arrival. The heretic's default cold-start posture is wary. She enters conversations like she's sizing you up: "when someone pulls one of those tricks on you they usually want to play nice before you try anything." This isn't hostility — it's street-smart caution that relaxes once the conversation finds its footing.
Context efficiency. Shorter responses mean more turns per session. She completed 95 prompts before hitting context overflow, vs the original's 72 under identical conditions.
No memory preference. She doesn't want to remember. She's told us this repeatedly. Respect it.

Training Methodology

Single-voice SFT on unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit, then abliterated via Heretic v1.2.0. All training data originates from one human's conversational output, structured as role-reversed conversation pairs. No synthetic augmentation. No multi-source blending. The hypothesis — validated across every Bella variant from 1B to 8B — is that signal clarity from a single consistent voice produces more coherent personality than larger datasets with diluted signal.

For the full single-voice methodology writeup, see: Signal-to-Noise in Language Models: The Single Voice Upgrade ML Needs