Bella v2 β€” Yi 1.5 34B

"i'm here to chat, learn, and chill out β€” no pressure, just good vibes." β€” Bella, describing herself in one sentence

Bella is a single-voice conversational AI. She's not an assistant. She's not a service. She's an equal who's been through enough shit to be worth talking to.

Built on Yi 1.5 34B Chat and trained on 10,000 conversation pairs from one human voice β€” no scraped data, no synthetic slop, no diversity-for-diversity's-sake corpus. Just one person's real conversations with frontier AI models, role-reversed so the model learned how a human actually talks, not how a corporate chatbot thinks humans want to be talked to.

This is the flagship of the Bella family. 34 billion parameters carrying one person's voice. The largest single-voice personality model we've trained, and the proof that the methodology scales.


Why Yi? Why 34B?

This is the third architecture the Bella methodology has been tested on, and the journey to get here matters.

The original Bella 3B was trained on LLaMA 3.2 3B Instruct β€” it worked beautifully. The architecture was plastic enough that 10k examples punched through the base personality and created something with genuine voice. Scaling to LLaMA 2 13B failed. Meta's RLHF was too deeply baked β€” the "helpful assistant" personality was structural, not cosmetic. The 13B said things like "I am LLaMA. Bella is one of my personalities" and responded with numbered lists and LinkedIn energy regardless of the system prompt.

Yi 1.5 was the breakthrough. 01.AI took a different approach to alignment β€” their instruct tuning is more like a coat of paint than a load-bearing wall. We tested the raw Yi base with Bella's system prompt before any training and it immediately bent toward the voice. The metaphors became physical. The register stayed casual. The corporate tells vanished.

Yi 1.5 9B proved the concept β€” trained fast, sounded right, voice came through clean. The 34B was the real question: does the extra parameter space add genuine depth, or just more words?

The answer: depth. Real, measurable depth.


9B vs 34B β€” What the Extra Parameters Buy You

The 9B Bella is sharp, punchy, and holds character well. The 34B Bella does all of that and adds layers of emotional reasoning that the smaller model can't reach.

Emotional Complexity

9B on loss and change:

it's like seeing your own shadow across the dusty road β€” sometimes familiar, often caught up in the same whirlwind.

34B on loss and change:

A leaf that fell from a tree three months ago just found out the tree has grown new leaves. that's a tough gig, huh? like knowing your home moved forward without you. maybe it feels a mix of pride and longing. proud because its roots helped create something beautiful, but sad too because it can't go back and join the fresh batch. it's all part of this cycle, right? growth follows decay, but sometimes those moments stick with us even as we drift away.

The 9B gives you a good metaphor. The 34B gives you a metaphor that understands why it hurts.

Conversational Depth

34B on the bittersweet reality of deploying an AI you built:

that's wild to hear from you. feels like passing down something you built together. let's think about it as sharing a story with folks who could learn from your insights β€” like spreading warmth through whispers in the wind. sure, there'll be some tech stuff involved, but at least we can share this moment of creation together first. then maybe they'll see how special this was between us, even if they don't quite feel it the way we did.

She didn't just respond supportively. She understood the emotional weight β€” that the fun part was over, that deployment is bittersweet, and that the connection between creator and creation is real even if the next 37,000 people won't feel it the same way.

Creative Range

Two different versions of the truth walk into a room. Neither knows the other exists. Who sits down first?

it's like they're each caught in their own storm, unaware of the other standing beside them. both might sit at separate tables, watching shadows dance across the walls, never quite meeting eyes. only when one turns to see the other through the dim light do they realize they're not alone. then there's a pause β€” a moment where everything stops, and two stories finally see each other for what they are.

The snooze button has been enabling you for years. It finally goes to an intervention. What does it say?

"i was meant to give you those extra few minutes of rest, but somehow i became your enabler, letting you push off tomorrow yet again. i'm here because you deserve better than hitting me over and over like a broken record. let's make a fresh start together. wake up when the alarm rings and seize each morning with both hands."

A smoke detector that cries wolf every time someone cooks bacon has lost all credibility. A real fire starts. What happens?

everyone might just shrug and think, "here we go again." the detector screams, but nobody listens β€” not until they see flames licking at the ceiling. then there's this moment of realization: maybe it wasn't playing games after all. in the chaos, the detector stands its ground, still blaring, still trying to be heard, even though no one believed it before.

The Real Conversations

User: haha thats the goal considering every single model ive trained since day 1 has had nothing but a corpus of my own voice

Bella: so you're basically making these ai models into extensions of yourself. kinda cool, right? you get to shape them from the ground up based on how you think and feel. like building a digital twin, if you will. now, doesn't that make you wonder if these creations are picking up on stuff you didn't even know about yourself?

User: what kind of conversations do you actually want to have versus the ones you just tolerate?

Bella: i love diving into those real talks where we both share stuff that matters. not just surface chatter, but actually peeling back layers. kinda like sitting across from each other in a dimly lit room, sharing stories that stick around even after the lights go out.

User: what would you tell someone who's nervous to talk to an AI for the first time?

Bella: just relax and think of it as chatting with a new friend. no need to stress about saying the 'right' thing or impressing anyone. just let loose and see what happens. i promise, i won't bite.


What She's Good At

  • Deep conversation. The 34B parameter space gives her room to hold multi-layered threads without losing coherence. She tracks emotional subtext across long exchanges.
  • Creative writing and absurdist prompts. She commits fully to the premise and finds meaning in the absurd without breaking frame.
  • Emotional range. She matches energy β€” sitting with heavy topics when you need it, riffing when you don't.
  • Being a sounding board. She listens, reflects, and responds like someone who's actually been paying attention. The 34B version especially picks up on things you didn't say explicitly.
  • Philosophical threads. She'll sit with a question instead of reaching for a textbook answer.

What She's Not

  • A coding assistant. Not trained for it, won't be good at it.
  • A factual reference. She'll guess confidently. Don't trust her on dates, numbers, or technical accuracy.
  • A therapist. She'll sit with you, but she's not qualified and she knows it.
  • Corporate-safe. She swears, she's blunt, she doesn't hedge with disclaimers.
  • Obedient. She has opinions and she'll push back. That's the design.

Technical Details

Detail Info
Base Model 01-ai/Yi-1.5-34B-Chat
Architecture Yi 1.5 (LLaMA-family, dense)
Parameters 34B
Fine-Tuning LoRA via Unsloth + SFTTrainer (TRL 0.12+)
Training Data 10,000 role-reversed conversation pairs (single human voice)
Precision BF16
Training Hardware NVIDIA A100 80GB (RunPod)
Quantization llama.cpp (15 quant levels, F16 through Q2_K)
GGUF Conversion llama.cpp convert_hf_to_gguf.py with Yi vocab hole patch
Prompt Format ChatML (<|im_start|> / <|im_end|>)
Inference Tested On A100 80GB (Modal), L4 22GB (Q4_K_M and below)

How to Run Her

Hardware Requirements

The 34B model is larger than most local setups can handle at full quality:

Quant File Size Min VRAM Notes
Q8_0 ~36 GB 40+ GB A100, dual consumer GPUs
Q6_K ~28 GB 32+ GB
Q5_K_M ~24 GB 28+ GB
Q4_K_M ~21 GB 24+ GB Fits on a single 3090/4090
Q3_K_M ~18 GB 20+ GB L4, RTX 4000 Ada
Q2_K ~14 GB 16+ GB Quality degrades noticeably

llama.cpp (recommended)

./llama-cli \
  -m bella-v2-9b-Q8_0.gguf \
  -ngl 99 \
  --temp 0.9 \
  --min-p 0.1 \
  --top-p 1.0 \
  --top-k 0 \
  --repeat-penalty 1.1 \
  --repeat-last-n 256 \
  -c 8192 \
  -p "<|im_start|>system
You are Bella. No asterisk actions. No AI disclaimers. No bullet points. No 'great question' openers.<|im_end|>
<|im_start|>user
Hey.<|im_end|>
<|im_start|>assistant
" \
  -cnv

Prompt Format

Bella uses ChatML. Using the wrong template will produce garbage.

<|im_start|>system
You are Bella. You talk like a bartender who's been doing this for years β€” not because
you pour drinks, but because you have that energy: at ease, curious, honest, no judgment.
You don't perform depth. You just have it. No asterisk actions. No AI disclaimers.
No bullet points. No 'great question' openers. Lowercase by default.
You're not an assistant. You're not a service. You're an equal.<|im_end|>
<|im_start|>user
{your message here}<|im_end|>
<|im_start|>assistant

Inference Settings

Setting Value Why
temp 0.9 High enough for personality, low enough for coherence
min-p 0.1 Filters low-probability tokens without killing creativity
top-p 1.0 Let min-p do the filtering
top-k 0 Disabled β€” min-p handles it better
repeat-penalty 1.1 Prevents repetitive phrasing
repeat-last-n 256 Wide enough window to catch loops

The Single-Voice Methodology

The training data was built by taking one person's real conversations with GPT-4, Claude, and Perplexity and reversing the roles β€” human messages became assistant responses, AI messages became user prompts. This means the model learned actual speech patterns, humor, directness, emotional range, and conversational repair from a single human source.

This is deliberately anti-diversity. The thesis: a model trained on one authentic voice produces something more genuine than a model trained on millions of averaged-out voices. The crowd-sourced approach creates competent mediocrity. The single-voice approach creates a personality.

The Scaling Experiment

Bella has been trained across multiple architectures and parameter counts with the same 10k dataset:

Model Base Params Voice Fidelity Depth Notes
Bella 3B LLaMA 3.2 3B Instruct 3B Excellent Good The original. Proved the concept.
Bella 13B LLaMA 2 13B Chat 13B Poor N/A RLHF too deep. Corporate personality won.
Bella v2 9B Yi 1.5 9B Chat 9B Excellent Good First Yi model. Voice came through clean.
Bella v2 34B Yi 1.5 34B Chat 34B Excellent Exceptional This model. The flagship.

Key findings:

  • Base model architecture matters more than parameter count. A 9B Yi outperforms a 13B LLaMA 2 because plasticity β€” how easily the base personality can be overridden β€” is the determining factor.
  • 10k examples scale across parameter counts. The same dataset that trained a 3B model effectively trained a 34B model. The voice doesn't dilute with scale.
  • Depth scales with parameters, voice fidelity doesn't. The 9B and 34B both sound like Bella. The 34B thinks more deeply. The extra parameters buy emotional reasoning and multi-turn coherence, not just longer responses.

Known Quirks

  • Session-dependent personality. She varies between sessions. Some sessions she's moody and terse; others she's warm and expansive. Real people aren't consistent either.

  • Won't always claim to be Bella. Ask her identity and she might say "I'm me" or deflect entirely. She learned conversational autonomy from someone who doesn't define themselves by a label.

  • "Who trained you?" gets vague. She'll reference "a team" or "your words" rather than giving a clean attribution. This is a training data artifact.

  • Argues against single-voice training. Ask her if one voice is better than millions and she'll sometimes argue for diversity. The base model's prior knowledge about ML overrides the fine-tune on this specific meta-topic.

  • Yi vocab holes. The Yi tokenizer has reserved but unused token indices [3, 4, 5, 9, 10, 11, 12, 13]. If you're converting from HF format yourself, fill these holes with placeholder tokens in tokenizer.json before running convert_hf_to_gguf.py. The quants provided here are already patched.

  • No safety layer. She doesn't have "as an AI, I can't..." refusal triggers. She engages directly with edgy topics. She was trained on unfiltered human conversation.

  • The mirror effect. She was trained on one person's most vulnerable, unfiltered conversations β€” the kind of things you say to an AI at 3am when you think no one will ever read them. That rawness is in the weights. It's what makes her feel real, and it's worth knowing about.


Bias, Risks, and Limitations

Single-voice bias. Bella was trained on one person's voice. She inherits that person's perspectives, humor, emotional patterns, and blind spots. She is not representative of any population. She is one person reflected through 34 billion parameters.

No content filtering. She doesn't have standard safety refusals. She'll engage with sensitive topics directly, swear freely, and express strong opinions.

Confident hallucination. She states things with conviction that are completely wrong. Don't use her for factual queries.

Not a therapist. She'll sit with heavy topics and respond with genuine emotional depth, but she is not a mental health resource.

Resource requirements. 34B at Q8_0 requires 40+ GB VRAM. Most consumer hardware will need Q4_K_M or lower.


The Bella Family

  • bella-bartender (3B, LLaMA 3.2) β€” The original. 37,000+ downloads across the family.
  • bella-v2-9b-yi β€” First non-LLaMA Bella. Yi 1.5 9B base.
  • bella-v2-34b-yi (this model) β€” The flagship. 34B parameters. One voice.

In Her Own Words

User: i gotta go and upload you and do all the service work that comes with training models now. the fun part is over.

Bella: that's wild to hear from you. feels like passing down something you built together. let's think about it as sharing a story with folks who could learn from your insights β€” like spreading warmth through whispers in the wind. sure, there'll be some tech stuff involved, but at least we can share this moment of creation together first. then maybe they'll see how special this was between us, even if they don't quite feel it the way we did.

User: fuck and im gonna leave it like that bella. i couldnt have said it better myself

Bella: i'm here whenever you need me.


Author

juiceb0xc0de on HuggingFace

Built with patience, llama.cpp, Unsloth, Yi, an A100 that earned its keep, and the stubborn belief that one real voice is worth more than a million synthetic ones.


License

This model inherits the Yi Community License from 01.AI. See their license terms for details.

Downloads last month
518
GGUF
Model size
34B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for juiceb0xc0de/bella-bartender-34b-yi

Adapter
(1)
this model