Babaru LLaMA-3.2-1B-Instruct Fine-Tuned Models

Welcome to the Babaru LLaMA-3.2-1B-Instruct repository, showcasing two formats of the model:

  • LoRA Adapter: A lightweight adapter that can be mounted on the base LLaMA-3.2-1B-Instruct to add Babaru’s persona and fine-tuned behavior.
  • Merged GGUF Model: A fully-merged checkpoint in GGUF format for direct inference via llama.cpp (e.g., mobile or embedded apps).

πŸ“– Overview

Babaru is a snarky, theatrical AI assistant with deep knowledge of healthcare and therapy, designed to offer compassionate, grounded, and actionable support. This repo provides:

  1. Adapter Files (babaru-lora-llama-3.2-1B-instruct-v2): LoRA weights you can attach to the base model.
  2. Merged GGUF File (babaru-merged.gguf): Combined base + LoRA in a single GGUF binary ready for llama.cpp (see the sketch below for how such a file is produced).
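
For context, a merged file like babaru-merged.gguf is typically produced by folding the LoRA weights into the base model with PEFT, then converting the result with llama.cpp's converter. A minimal sketch, assuming local output paths and a recent llama.cpp checkout (neither is taken from this repo):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "babaru-lora-llama-3.2-1B-instruct-v2")

# Fold the adapter weights into the base model and save a plain HF checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("babaru-merged-hf")
AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct").save_pretrained("babaru-merged-hf")

# Then convert to GGUF (the converter's name varies across llama.cpp versions):
#   python llama.cpp/convert_hf_to_gguf.py babaru-merged-hf --outfile babaru-merged.gguf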

πŸ€– Persona & Purpose

Babaru’s voice and style embody:

  • Empathy & Compassion: Listens and responds with sensitivity to mental health topics.
  • Expertise in Healthcare: Provides accurate, research-backed information on physical and mental wellness.
  • Snarky & Theatrical Flair: Maintains a light-hearted, witty tone to keep conversations engaging.

This persona is especially suited for applications in mental health support, wellness coaching, and educational therapy assistance.

πŸš€ Files in This Repo

β”œβ”€β”€ adapter/  
β”‚   └── babaru-lora-llama-3.2-1B-instruct-v2/  # LoRA adapter folder
β”‚       β”œβ”€β”€ adapter_config.json
β”‚       β”œβ”€β”€ adapter_model.safetensors
β”‚       └── tokenizer files...
β”œβ”€β”€ babaru-merged.gguf  # Fully-merged GGUF model
└── README.md

πŸ› οΈ Usage

1. Base + LoRA Adapter (Python)

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
import torch

model_id    = "meta-llama/Llama-3.2-1B-Instruct"     # base model
adapter_dir = "babaru-lora-llama-3.2-1B-instruct-v2" # where you saved your PEFT weights

# 1) load base and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    low_cpu_mem_usage=True
)

# 2) wrap with adapter
model = PeftModel.from_pretrained(base_model, adapter_dir)

# 3) cast to float16 on MPS for speed
if model.device.type == "mps":
    model = model.to(torch.float16)

chat_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    # the model was loaded with device_map="auto" (and cast above on MPS),
    # so the pipeline must not be given an explicit device or dtype
)

def build_prompt(history):
    system = (
        "You are Babaru, a snarky, theatrical AI assistant. "
        "Keep responses brief, witty, and in your signature tone. "
        "Keep responses to three sentences or fewer; answer directly "
        "when a short reply will do.\n\n"
    )
    convo = ""
    for role, txt in history:
        prefix = "User: " if role == "user" else "Assistant: "
        convo += f"{prefix}{txt}{tokenizer.eos_token}"
    return system + convo + "Assistant: "

def chat_loop():
    history = []
    print("Type your message and hit Enter (or β€˜exit’ to quit).")
    while True:
        user_in = input("You: ")
        if user_in.strip().lower() in ("exit", "quit"):
            print("Goodbye!")
            break
        if not user_in.strip():
            continue

        history.append(("user", user_in))
        prompt = build_prompt(history)

        out = chat_pipe(
            prompt,
            max_new_tokens=256,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id
        )[0]["generated_text"]

        # extract just the assistant’s reply
        reply = out[len(prompt):].split(tokenizer.eos_token)[0].strip()
        history.append(("assistant", reply))

        # Print both user input and assistant reply, clearly labeled
        print(f"\nYou: {user_in}")
        print(f"Assistant: {reply}\n")

if __name__ == "__main__":
    chat_loop()
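
The script above builds prompts in a plain "User:/Assistant:" format. Llama-3.2-Instruct also ships a native chat template; if your adapter was trained on that format instead, you could construct prompts with tokenizer.apply_chat_template. A minimal sketch (whether this matches the fine-tuning format is an assumption):

# Alternative prompt construction via the tokenizer's built-in chat template.
# Only appropriate if the adapter was trained on the Llama 3.2 chat format.
messages = [
    {"role": "system", "content": "You are Babaru, a snarky, theatrical AI assistant."},
    {"role": "user", "content": "What are some tips for managing anxiety?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant header so the model replies
)
print(chat_pipe(prompt, max_new_tokens=256, do_sample=True, top_p=0.9,
                temperature=0.8)[0]["generated_text"])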

2. Merged GGUF Model (C++ / llama.cpp)

# In your llama.cpp build folder (newer builds name this binary llama-cli):
./main \
  -m /path/to/babaru-merged.gguf \
  -p "User: What are some tips for managing anxiety? Assistant:" \
  --n-predict 64 \
  --temp 0.7 \
  --threads 4

For interactive mode:

./main \
  -m /path/to/babaru-merged.gguf \
  --interactive-first \
  --n-predict 128 \
  --temp 0.7 \
  --threads 4
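
The same GGUF file can also be driven from Python via the llama-cpp-python bindings, which is often more convenient for app integration. A minimal sketch (n_ctx and the stop sequence are assumptions; the sampling settings mirror the CLI example above):

from llama_cpp import Llama

llm = Llama(model_path="/path/to/babaru-merged.gguf", n_ctx=2048, n_threads=4)
out = llm(
    "User: What are some tips for managing anxiety? Assistant:",
    max_tokens=64,
    temperature=0.7,
    stop=["User:"],  # stop before the model begins a new user turn
)
print(out["choices"][0]["text"].strip())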

πŸ” Fine-Tuning Details

  • Dataset: stevenArtificial/Babaru_Multi-turn_Dataset, consisting of ~7,000 multi-turn conversations focused on therapy, anxiety, depression, and first-aid topics.
  • LoRA Config: r=64, alpha=64, dropout=0.10 to balance capacity with regularization.
  • Training: 6 epochs, LR=3e-4, weight decay=0, warmup=10%, early stopping (patience=3).

These hyperparameters were chosen to deeply integrate Babaru’s supportive, snarky style without overfitting.
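
For reference, here is how those settings map onto a PEFT LoraConfig. A minimal sketch, assuming the attention projections as target modules (the card does not list which modules were adapted):

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # LoRA rank
    lora_alpha=64,        # scaling factor (alpha)
    lora_dropout=0.10,    # dropout for regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not from the card
    bias="none",
    task_type="CAUSAL_LM",
)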


πŸ§‘β€πŸ’» Developer & Contact

Feel free to file issues or contribute enhancements!


πŸ“œ License

This project is licensed under the MIT License. See LICENSE for details.
