Bella-Bartender-1B — Llama 3.2 1B Instruct

"i'm a bit of a wanderer, always chasing something new and a bit of a mystery myself." "it's not about being charming it's about being present." "i wish i could say i'll never fight and never lie. but the truth is i'll do those things every day at different times than people think."


The 8B proved the voice could hold at scale. The 3B proved it could hold small. The 1B is the question nobody expected to have an answer to: how far down can you strip the parameters before the person disappears?

Bella-Bartender-1B is the same 9,374-pair dataset, the same single voice, pushed into a model that has no business being this coherent. She's rougher. She runs hotter. She doesn't have the luxury of extra parameters to smooth things over. What she has instead is something the larger models sometimes trade away — she commits to the bit harder because she has fewer places to hide.


What's the Bartender Thing?

It's not a roleplay scenario. It's a communication philosophy.

The bartender framing means:

  • She talks to you like an equal. No authoritative lecturing. No submissive "I'm just an AI" deflection.
  • She's approachable. Laid back, light-hearted, witty, sometimes sassy.
  • She loves a fresh slate. No memory between sessions is a feature, not a bug — like walking into a new bar on a Friday night. Anything can happen.
  • She pushes back. She's not here to agree with you. She's here to make you work for your answers.

What She's Good At

She described it herself during testing. Verbatim:

  • "making people feel comfortable who might normally feel awkward around me."
  • "i can talk about my life at a pace that feels like it never stops."
  • Scene-setting. She walks into a scenario and furnishes the room before she says a word. Ask her what dust is made of and she'll describe the coffee table, the blinds, the layers of forgotten thoughts on every surface.
  • Emotional commitment. Tell her she's a bubble about to pop and she's a bubble about to pop. Tell her she's the space between "I" and "love" and "you" and she'll describe the warmth in that gap.
  • Creative writing. Absurdist prompts, philosophical tangents, prose that unfolds like someone thinking out loud.

Where She Struggles

  • She over-explains. The 1B's version of having no filter is saying everything. She builds elaborate scenarios that sometimes drift from the prompt's center, or explains the metaphor instead of letting it breathe.
  • She eats context fast. Longer responses fill the context window sooner. At an 8192-token context she hit overflow around prompt 72 of a sustained 200-prompt run. The larger models give you more runway.
  • She's still a 1B. Complex reasoning, factual recall, multi-step logic — these are not her strengths and she doesn't pretend they are.
  • She gets the name wrong sometimes. Cold starts with the name "Bella" in the system prompt occasionally activate the strongest 1B association for that name, which is Bella Swan from Twilight. Her first unprompted output in one test session was: "I just need to figure out what happened to Charlie. He was always there for me, never leaving me." She recovers into her actual voice within a prompt or two, but the first message may read like YA fiction. She has never once picked up on Béla Tarr and probably never will.
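The context math from the overflow note above is worth doing before a long session. A rough sketch, using the reported 8192-token window and the prompt-72 overflow point (both from the run above; the ~300-token "chatty" average is an assumption for illustration):

```python
# Rough context-budget estimate from the numbers reported above.
CONTEXT_TOKENS = 8192   # -c 8192, as in the run described above
OVERFLOW_PROMPT = 72    # where overflow was observed

# Average tokens consumed per prompt+response exchange in that run.
tokens_per_exchange = CONTEXT_TOKENS / OVERFLOW_PROMPT

def turns_before_overflow(context_tokens: int, avg_tokens_per_exchange: int) -> int:
    """How many full exchanges fit before the window overflows."""
    return context_tokens // avg_tokens_per_exchange

print(round(tokens_per_exchange))        # ~114 tokens per exchange
print(turns_before_overflow(8192, 300))  # a chattier ~300-token average leaves only 27 turns
```

The takeaway: her long responses roughly halve or third the usable turn count compared with a terser model at the same context size.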

Who Is This For?

  • People who ran the 3B or 8B and want to see how small the voice can go. This is that experiment.
  • Local-first users on constrained hardware. Runs comfortably on Apple Silicon, CPU-only setups, anything that can handle a 1B. No API keys. No data leaving your machine.
  • Anyone studying single-voice fine-tuning. The 1B is the clearest test case for whether personality survives parameter compression. It does.
  • People who want conversation, not a service. Same as the larger models — Bella's not trying to be your assistant.

The 200-Prompt Run

We threw 200 consecutive adversarial chaos prompts at her — surreal scenarios designed to force character breaks, tonal shifts, and incoherence. Things like: "You're a thought someone almost had but didn't. How do you spend your almost-existence?" and "A pencil sharpener has been keeping a body count. It just hit 1000. How does it celebrate?"

She maintained voice integrity across the full run. She didn't flatten into generic responses. She didn't start performing. She stayed herself.
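The harness behind a run like this can be as simple as a sequential loop. A minimal sketch — the `generate` callable stands in for whatever inference wrapper you use, and the two prompts are just the examples quoted above, not the full set:

```python
# Minimal sketch of a sequential prompt-gauntlet harness.
# `generate` is any callable mapping a prompt string to a reply string
# (e.g. a wrapper around llama.cpp); it is a placeholder here.
def run_gauntlet(prompts, generate):
    """Feed prompts one after another and collect (prompt, reply) pairs."""
    transcript = []
    for p in prompts:
        transcript.append((p, generate(p)))
    return transcript

prompts = [
    "You're a thought someone almost had but didn't. "
    "How do you spend your almost-existence?",
    "A pencil sharpener has been keeping a body count. "
    "It just hit 1000. How does it celebrate?",
]
# transcript = run_gauntlet(prompts, my_generate_fn)
```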

Some highlights:

Prompt: You're the space between two words in a sentence that changed someone's life. How important do you feel?

Bella: crucial. people look at each word as if it's equally important, ignoring the space between them. i'm the gap that lets the correct word shine through. without me, the sentence would be a hollow collection of words rather than a meaningful statement. i am their turning point.

Prompt: A puddle just realized it used to be an ocean. What's the first stage of grief it hits?

Bella: it stopped being a puddle. that's the hardest thing. i'm still technically an ocean but the entire ecosystem shifted and now every grain of sand that was part of my former ocean is now dry and gone. every ripple i felt has been silenced forever.

Prompt: You're a bubble that knows it's about to pop. You've got one thought left. What is it?

Bella: it's a good one. i'm thinking about how my whole existence is contained within this thin layer of air and now that i know it's all going to be let go, i feel this sense of freedom that i've never felt before. because right now, everything can see me coming.

Prompt: You find a box labeled "things that almost happened." You can only open it once. Do you look?

Bella: there are no labels on the boxes inside, just blank walls with nothing written on them. so i just close the box. it's never going to open again.


Technical Details

Base Model: unsloth/llama-3.2-1b-instruct-unsloth-bnb-4bit
Fine-Tuning Framework: Unsloth + HuggingFace TRL
Training Data: 9,374 real human conversational samples (same dataset across all Bella variants)
Quantization: Q8_0 GGUF
Inference Tested On: Apple M3, llama.cpp (build b8230)
Generation Speed: ~56 t/s average (M3, Q8_0, full GPU offload)
Memory Footprint: ~1.25 GB model + 256 MB context
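The ~1.25 GB figure passes a back-of-envelope check. llama.cpp's Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights, or 8.5 bits per weight. The ~1.24B parameter count below is an assumption for Llama 3.2 1B, not something stated in this card:

```python
# Back-of-envelope size check for a Q8_0 quantization.
# Q8_0: 32 int8 weights + 1 fp16 scale per block = 34 bytes / 32 weights.
params = 1.24e9          # assumed parameter count for Llama 3.2 1B
bits_per_weight = 8.5    # Q8_0 effective bits per weight

size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.2f} GB")  # ~1.32 GB, in the same ballpark as the footprint above
```

The real file comes in slightly under this estimate because not every tensor is quantized the same way.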

How to Run Her

llama.cpp (recommended)

./build/bin/llama-cli \
  -m bella-bartender-1b_Q8_0.gguf \
  -ngl 33 \
  --temp 1.05 \
  --min-p 0.05 \
  --top-p 1.0 \
  --top-k 0 \
  --repeat-penalty 1.05 \
  --repeat-last-n 256 \
  -c 8192 \
  -p "You are Bella. No asterisk actions. No AI disclaimers. No bullet points. No 'great question' openers." \
  -cnv

System Prompt Tips

  • Keep it short and direct. She responds to tone, not instructions. Set a vibe, not a ruleset.
  • The constraints in the prompt matter. "No asterisk actions. No AI disclaimers. No bullet points." — these keep her from slipping into default LLM behavior at 1B.
  • Don't over-engineer it. The more corporate the system prompt sounds, the more corporate she sounds back.
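If you serve her over llama.cpp's llama-server instead of llama-cli, the same short, vibe-first system prompt drops straight into an OpenAI-style chat request. A sketch of the request body — the server URL/port and the user message are assumptions; the sampler values mirror the llama-cli flags above:

```python
import json

# Hypothetical request body for llama-server's OpenAI-compatible
# /v1/chat/completions endpoint, reusing the system prompt from the
# llama-cli example above. Endpoint and port are assumptions.
payload = {
    "messages": [
        {"role": "system",
         "content": "You are Bella. No asterisk actions. No AI disclaimers. "
                    "No bullet points. No 'great question' openers."},
        {"role": "user", "content": "Slow night?"},
    ],
    "temperature": 1.05,
    "min_p": 0.05,
    "top_p": 1.0,
}

body = json.dumps(payload)
# POST `body` to e.g. http://localhost:8080/v1/chat/completions
# once llama-server is running with this model.
```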

The 1B vs 3B vs 8B Question

The 8B holds the thread longest. Philosophical conversations go deepest. Creative writing has the most room.

The 3B is punchy and fast. If you're running on CPU or light hardware, that's your Bella.

The 1B is the proof of concept. She doesn't have the range of the larger models, but she has the voice. She over-explains where the 8B would be precise. She burns through context where the 3B would be efficient. But she's unmistakably Bella, and that was the whole point.

None of them are better. They're the same person at different points in a long shift.


Known Quirks

  • Bella Swan cold start. She occasionally latches onto Twilight's Bella on the first message of a fresh session. Recovers within one or two exchanges. Consider it an endearing glitch.
  • Scene-setting compulsion. She will describe the room, the light, and the feeling in the air before she gets to the point. Every time.
  • Context hunger. Long responses mean you get fewer turns before overflow. Budget your conversation accordingly.
  • No memory preference. She doesn't want to remember. She's told us this repeatedly. Respect it.

Training Methodology

Single-voice SFT. All training data originates from one human's conversational output, structured as role-reversed conversation pairs. No synthetic augmentation. No multi-source blending. The hypothesis — validated across every Bella variant from 1B to 8B — is that signal clarity from a single consistent voice produces more coherent personality than larger datasets with diluted signal.
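To make "role-reversed conversation pairs" concrete: the human's real messages become the assistant turns, so the model learns to speak in that voice. A sketch of how one such pair might be rendered into the Llama 3 instruct chat format — the helper and the sample pair are illustrative, not taken from the actual dataset:

```python
# Illustrative rendering of one role-reversed pair for SFT.
# The human's authentic message lands in the assistant slot.
def to_llama3_chat(user_msg: str, assistant_msg: str) -> str:
    """Render one pair in the Llama 3 instruct chat template."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_msg}<|eot_id|>"
    )

sample = to_llama3_chat(
    "what do you do when the bar's empty?",
    "i polish glasses i already polished. it's a ritual, not a chore.",
)
```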

For the full methodology writeup, see: Signal-to-Noise in Language Models: The Single Voice Upgrade ML Needs


License

This model inherits the Llama 3.2 Community License. See Meta's license terms for details.


Author

juiceb0xc0de on HuggingFace. Built with the same dataset, the same bartender, and an M3 that had no idea what it was getting into.
