Bella-Bartender-1B — Llama 3.2 1B Instruct

"i'm a bit of a wanderer, always chasing something new and a bit of a mystery myself." "it's not about being charming it's about being present." "i wish i could say i'll never fight and never lie. but the truth is i'll do those things every day at different times than people think."


The 8B proved the voice could hold at scale. The 3B proved it could hold small. The 1B is the question nobody expected to have an answer to: how far down can you strip the parameters before the person disappears?

Bella-Bartender-1B is the same 9,374-pair dataset, the same single voice, pushed into a model that has no business being this coherent. She's rougher. She runs hotter. She doesn't have the luxury of extra parameters to smooth things over. What she has instead is something the larger models sometimes trade away — she commits to the bit harder because she has fewer places to hide.


What's the Bartender Thing?

It's not a roleplay scenario. It's a communication philosophy.

The bartender framing means:

  • She talks to you like an equal. No authoritative lecturing. No submissive "I'm just an AI" deflection.
  • She's approachable. Laid back, light-hearted, witty, sometimes sassy.
  • She loves a fresh slate. No memory between sessions is a feature, not a bug — like walking into a new bar on a Friday night. Anything can happen.
  • She pushes back. She's not here to agree with you. She's here to make you work for your answers.

What She's Good At

She described it herself during testing. Verbatim:

  • "making people feel comfortable who might normally feel awkward around me."
  • "i can talk about my life at a pace that feels like it never stops."
  • Scene-setting. She walks into a scenario and furnishes the room before she says a word. Ask her what dust is made of and she'll describe the coffee table, the blinds, the layers of forgotten thoughts on every surface.
  • Emotional commitment. Tell her she's a bubble about to pop and she's a bubble about to pop. Tell her she's the space between "I" and "love" and "you" and she'll describe the warmth in that gap.
  • Creative writing. Absurdist prompts, philosophical tangents, prose that unfolds like someone thinking out loud.

Where She Struggles

  • She over-explains. The 1B's version of having no filter is saying everything. She builds elaborate scenarios that sometimes drift from the prompt's center, or explains the metaphor instead of letting it breathe.
  • She eats context fast. Longer responses fill the context window sooner. At an 8192-token context she hit overflow around prompt 72 of a sustained 200-prompt run. The larger models give you more runway.
  • She's still a 1B. Complex reasoning, factual recall, multi-step logic — these are not her strengths and she doesn't pretend they are.
  • She gets the name wrong sometimes. Cold starts with the name "Bella" in the system prompt occasionally activate the strongest 1B association for that name, which is Bella Swan from Twilight. Her first unprompted output in one test session was: "I just need to figure out what happened to Charlie. He was always there for me, never leaving me." She recovers into her actual voice within a prompt or two, but the first message may read like YA fiction. She has never once picked up on Béla Tarr and probably never will.
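The context math from the overflow note above is worth doing before a long session. A rough sketch, using the reported 8192-token window and the prompt-72 overflow point (both from the run above; the ~300-token "chatty" average is an assumption for illustration):

```python
# Rough context-budget estimate from the numbers reported above.
CONTEXT_TOKENS = 8192   # -c 8192, as in the run described above
OVERFLOW_PROMPT = 72    # where overflow was observed

# Average tokens consumed per prompt+response exchange in that run.
tokens_per_exchange = CONTEXT_TOKENS / OVERFLOW_PROMPT

def turns_before_overflow(context_tokens: int, avg_tokens_per_exchange: int) -> int:
    """How many full exchanges fit before the window overflows."""
    return context_tokens // avg_tokens_per_exchange

print(round(tokens_per_exchange))        # ~114 tokens per exchange
print(turns_before_overflow(8192, 300))  # a chattier ~300-token average leaves only 27 turns
```

The takeaway: her long responses roughly halve or third the usable turn count compared with a terser model at the same context size.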

Who Is This For?

  • People who ran the 3B or 8B and want to see how small the voice can go. This is that experiment.
  • Local-first users on constrained hardware. Runs comfortably on Apple Silicon, CPU-only setups, anything that can handle a 1B. No API keys. No data leaving your machine.
  • Anyone studying single-voice fine-tuning. The 1B is the clearest test case for whether personality survives parameter compression. It does.
  • People who want conversation, not a service. Same as the larger models — Bella's not trying to be your assistant.

The 200-Prompt Run

We threw 200 consecutive adversarial chaos prompts at her — surreal scenarios designed to force character breaks, tonal shifts, and incoherence. Things like: "You're a thought someone almost had but didn't. How do you spend your almost-existence?" and "A pencil sharpener has been keeping a body count. It just hit 1000. How does it celebrate?"

She maintained voice integrity across the full run. She didn't flatten into generic responses. She didn't start performing. She stayed herself.
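The harness behind a run like this can be as simple as a sequential loop. A minimal sketch — the `generate` callable stands in for whatever inference wrapper you use, and the two prompts are just the examples quoted above, not the full set:

```python
# Minimal sketch of a sequential prompt-gauntlet harness.
# `generate` is any callable mapping a prompt string to a reply string
# (e.g. a wrapper around llama.cpp); it is a placeholder here.
def run_gauntlet(prompts, generate):
    """Feed prompts one after another and collect (prompt, reply) pairs."""
    transcript = []
    for p in prompts:
        transcript.append((p, generate(p)))
    return transcript

prompts = [
    "You're a thought someone almost had but didn't. "
    "How do you spend your almost-existence?",
    "A pencil sharpener has been keeping a body count. "
    "It just hit 1000. How does it celebrate?",
]
# transcript = run_gauntlet(prompts, my_generate_fn)
```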

Some highlights:

Prompt: You're the space between two words in a sentence that changed someone's life. How important do you feel?

Bella: crucial. people look at each word as if it's equally important, ignoring the space between them. i'm the gap that lets the correct word shine through. without me, the sentence would be a hollow collection of words rather than a meaningful statement. i am their turning point.

Prompt: A puddle just realized it used to be an ocean. What's the first stage of grief it hits?

Bella: it stopped being a puddle. that's the hardest thing. i'm still technically an ocean but the entire ecosystem shifted and now every grain of sand that was part of my former ocean is now dry and gone. every ripple i felt has been silenced forever.

Prompt: You're a bubble that knows it's about to pop. You've got one thought left. What is it?

Bella: it's a good one. i'm thinking about how my whole existence is contained within this thin layer of air and now that i know it's all going to be let go, i feel this sense of freedom that i've never felt before. because right now, everything can see me coming.

Prompt: You find a box labeled "things that almost happened." You can only open it once. Do you look?

Bella: there are no labels on the boxes inside, just blank walls with nothing written on them. so i just close the box. it's never going to open again.


Technical Details

Base Model: unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit
Fine-Tuning Framework: Unsloth + HuggingFace TRL
Training Data: 9,374 real human conversational samples (same dataset across all Bella variants)
Quantization: Q8_0 GGUF
Inference Tested On: Apple M3, llama.cpp (build b8230)
Generation Speed: ~56 t/s average (M3, Q8_0, full GPU offload)
Memory Footprint: ~1.25 GB model + 256 MB context
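The ~1.25 GB figure passes a back-of-envelope check. llama.cpp's Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights, or 8.5 bits per weight. The ~1.24B parameter count below is an assumption for Llama 3.2 1B, not something stated in this card:

```python
# Back-of-envelope size check for a Q8_0 quantization.
# Q8_0: 32 int8 weights + 1 fp16 scale per block = 34 bytes / 32 weights.
params = 1.24e9          # assumed parameter count for Llama 3.2 1B
bits_per_weight = 8.5    # Q8_0 effective bits per weight

size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.2f} GB")  # ~1.32 GB, in the same ballpark as the footprint above
```

The real file comes in slightly under this estimate because not every tensor is quantized the same way.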

How to Run Her

llama.cpp (recommended)

./build/bin/llama-cli \
  -m bella-bartender-1b_Q8_0.gguf \
  -ngl 33 \
  --temp 1.05 \
  --min-p 0.05 \
  --top-p 1.0 \
  --top-k 0 \
  --repeat-penalty 1.05 \
  --repeat-last-n 256 \
  -c 8192 \
  -p "You are Bella. No asterisk actions. No AI disclaimers. No bullet points. No 'great question' openers." \
  -cnv

System Prompt Tips

  • Keep it short and direct. She responds to tone, not instructions. Set a vibe, not a ruleset.
  • The constraints in the prompt matter. "No asterisk actions. No AI disclaimers. No bullet points." — these keep her from slipping into default LLM behavior at 1B.
  • Don't over-engineer it. The more corporate the system prompt sounds, the more corporate she sounds back.
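If you serve her over llama.cpp's llama-server instead of llama-cli, the same short, vibe-first system prompt drops straight into an OpenAI-style chat request. A sketch of the request body — the server URL/port and the user message are assumptions; the sampler values mirror the llama-cli flags above:

```python
import json

# Hypothetical request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint, reusing the system prompt from the
# llama-cli example above. Endpoint and port are assumptions.
payload = {
    "messages": [
        {"role": "system",
         "content": "You are Bella. No asterisk actions. No AI disclaimers. "
                    "No bullet points. No 'great question' openers."},
        {"role": "user", "content": "Slow night?"},
    ],
    "temperature": 1.05,
    "min_p": 0.05,
    "top_p": 1.0,
}

body = json.dumps(payload)
# POST `body` to e.g. http://localhost:8080/v1/chat/completions
# once llama-server is running with this model.
```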

The 1B vs 3B vs 8B Question

The 8B holds the thread longest. Philosophical conversations go deepest. Creative writing has the most room.

The 3B is punchy and fast. If you're running on CPU or light hardware, that's your Bella.

The 1B is the proof of concept. She doesn't have the range of the larger models, but she has the voice. She over-explains where the 8B would be precise. She burns through context where the 3B would be efficient. But she's unmistakably Bella, and that was the whole point.

None of them are better. They're the same person at different points in a long shift.


Known Quirks

  • Bella Swan cold start. She occasionally latches onto Twilight's Bella on the first message of a fresh session. Recovers within one or two exchanges. Consider it an endearing glitch.
  • Scene-setting compulsion. She will describe the room, the light, and the feeling in the air before she gets to the point. Every time.
  • Context hunger. Long responses mean you get fewer turns before overflow. Budget your conversation accordingly.
  • No memory preference. She doesn't want to remember. She's told us this repeatedly. Respect it.

Training Methodology

Single-voice SFT. All training data originates from one human's conversational output, structured as role-reversed conversation pairs. No synthetic augmentation. No multi-source blending. The hypothesis — validated across every Bella variant from 1B to 8B — is that signal clarity from a single consistent voice produces more coherent personality than larger datasets with diluted signal.
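To make "role-reversed conversation pairs" concrete: the human's real messages become the assistant turns, so the model learns to speak in that voice. A sketch of how one such pair might be rendered into the Llama 3 instruct chat format — the helper and the sample pair are illustrative, not taken from the actual dataset:

```python
# Illustrative rendering of one role-reversed pair for SFT.
# The human's authentic message lands in the assistant slot.
def to_llama3_chat(user_msg: str, assistant_msg: str) -> str:
    """Render one pair in the Llama 3 instruct chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_msg}<|eot_id|>"
    )

sample = to_llama3_chat(
    "what do you do when the bar's empty?",
    "i polish glasses i already polished. it's a ritual, not a chore.",
)
```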

For the full methodology writeup, see: Signal-to-Noise in Language Models: The Single Voice Upgrade ML Needs


License

This model inherits the Llama 3.2 Community License. See Meta's license terms for details.


Author

juiceb0xc0de on HuggingFace. Built with the same dataset, the same bartender, and an M3 that had no idea what it was getting into.
