Babaru LLaMA-3.2-1B-Instruct Fine-Tuned Models

Welcome to the Babaru LLaMA-3.2-1B-Instruct repository, showcasing two formats of the model:

  • LoRA Adapter: A lightweight adapter that can be mounted on the base LLaMA-3.2-1B-Instruct to add Babaru’s persona and fine-tuned behavior.
  • Merged GGUF Model: A fully-merged checkpoint in GGUF format for direct inference via llama.cpp (e.g., mobile or embedded apps).

πŸ“– Overview

Babaru is a snarky, theatrical AI assistant with deep knowledge of healthcare and therapy, designed to offer compassionate, grounded, and actionable support. This repo provides:

  1. Adapter Files (babaru-lora-llama-3.2-1B-instruct-v2): LoRA weights you can attach to the base model.
  2. Merged GGUF File (babaru-merged.gguf): Combined base + LoRA in a single GGUF binary ready for llama.cpp (see the sketch below for how such a file is produced).
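
For context, a merged file like babaru-merged.gguf is typically produced by folding the LoRA weights into the base model with PEFT, then converting the result with llama.cpp's converter. A minimal sketch, assuming local output paths and a recent llama.cpp checkout (neither is taken from this repo):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
model = PeftModel.from_pretrained(base, "babaru-lora-llama-3.2-1B-instruct-v2")

# Fold the adapter weights into the base model and save a plain HF checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("babaru-merged-hf")
AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct").save_pretrained("babaru-merged-hf")

# Then convert to GGUF (the converter's name varies across llama.cpp versions):
#   python llama.cpp/convert_hf_to_gguf.py babaru-merged-hf --outfile babaru-merged.gguf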

πŸ€– Persona & Purpose

Babaru’s voice and style embody:

  • Empathy & Compassion: Listens and responds with sensitivity to mental health topics.
  • Expertise in Healthcare: Provides accurate, research-backed information on physical and mental wellness.
  • Snarky & Theatrical Flair: Maintains a light-hearted, witty tone to keep conversations engaging.

This persona is especially suited for applications in mental health support, wellness coaching, and educational therapy assistance.

πŸš€ Files in This Repo

β”œβ”€β”€ adapter/  
β”‚   └── babaru-lora-llama-3.2-1B-instruct-v2/  # LoRA adapter folder
β”‚       β”œβ”€β”€ adapter_config.json
β”‚       β”œβ”€β”€ adapter_model.safetensors
β”‚       └── tokenizer files...
β”œβ”€β”€ babaru-merged.gguf  # Fully-merged GGUF model
└── README.md

πŸ› οΈ Usage

1. Base + LoRA Adapter (Python)

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from peft import PeftModel
import torch

model_id    = "meta-llama/Llama-3.2-1B-Instruct"     # base model
adapter_dir = "babaru-lora-llama-3.2-1B-instruct-v2" # where you saved your PEFT weights

# 1) load base and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
    low_cpu_mem_usage=True
)

# 2) wrap with adapter
model = PeftModel.from_pretrained(base_model, adapter_dir)

# 3) cast to float16 on MPS for speed
if model.device.type == "mps":
    model = model.to(torch.float16)

chat_pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    # the model was loaded with device_map="auto" (and cast above on MPS),
    # so the pipeline must not be given an explicit device or dtype
)

def build_prompt(history):
    system = (
        "You are Babaru, a snarky, theatrical AI assistant. "
        "Keep responses brief, witty, and in your signature tone. "
        "Keep responses to three sentences or fewer; answer directly "
        "when a short reply will do.\n\n"
    )
    convo = ""
    for role, txt in history:
        prefix = "User: " if role == "user" else "Assistant: "
        convo += f"{prefix}{txt}{tokenizer.eos_token}"
    return system + convo + "Assistant: "

def chat_loop():
    history = []
    print("Type your message and hit Enter (or β€˜exit’ to quit).")
    while True:
        user_in = input("You: ")
        if user_in.strip().lower() in ("exit", "quit"):
            print("Goodbye!")
            break
        if not user_in.strip():
            continue

        history.append(("user", user_in))
        prompt = build_prompt(history)

        out = chat_pipe(
            prompt,
            max_new_tokens=256,
            do_sample=True,
            top_p=0.9,
            temperature=0.8,
            pad_token_id=tokenizer.eos_token_id
        )[0]["generated_text"]

        # extract just the assistant’s reply
        reply = out[len(prompt):].split(tokenizer.eos_token)[0].strip()
        history.append(("assistant", reply))

        # Print both user input and assistant reply, clearly labeled
        print(f"\nYou: {user_in}")
        print(f"Assistant: {reply}\n")

if __name__ == "__main__":
    chat_loop()
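
The script above builds prompts in a plain "User:/Assistant:" format. Llama-3.2-Instruct also ships a native chat template; if your adapter was trained on that format instead, you could construct prompts with tokenizer.apply_chat_template. A minimal sketch (whether this matches the fine-tuning format is an assumption):

# Alternative prompt construction via the tokenizer's built-in chat template.
# Only appropriate if the adapter was trained on the Llama 3.2 chat format.
messages = [
    {"role": "system", "content": "You are Babaru, a snarky, theatrical AI assistant."},
    {"role": "user", "content": "What are some tips for managing anxiety?"},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant header so the model replies
)
print(chat_pipe(prompt, max_new_tokens=256, do_sample=True, top_p=0.9,
                temperature=0.8)[0]["generated_text"])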

2. Merged GGUF Model (C++ / llama.cpp)

# In your llama.cpp build folder (newer builds name this binary llama-cli):
./main \
  -m /path/to/babaru-merged.gguf \
  -p "User: What are some tips for managing anxiety? Assistant:" \
  --n-predict 64 \
  --temp 0.7 \
  --threads 4

For interactive mode:

./main \
  -m /path/to/babaru-merged.gguf \
  --interactive-first \
  --n-predict 128 \
  --temp 0.7 \
  --threads 4
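
The same GGUF file can also be driven from Python via the llama-cpp-python bindings, which is often more convenient for app integration. A minimal sketch (n_ctx and the stop sequence are assumptions; the sampling settings mirror the CLI example above):

from llama_cpp import Llama

llm = Llama(model_path="/path/to/babaru-merged.gguf", n_ctx=2048, n_threads=4)
out = llm(
    "User: What are some tips for managing anxiety? Assistant:",
    max_tokens=64,
    temperature=0.7,
    stop=["User:"],  # stop before the model begins a new user turn
)
print(out["choices"][0]["text"].strip())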

πŸ” Fine-Tuning Details

  • Dataset: stevenArtificial/Babaru_Multi-turn_Dataset, consisting of ~7,000 multi-turn conversations focused on therapy, anxiety, depression, and first-aid topics.
  • LoRA Config: r=64, alpha=64, dropout=0.10 to balance capacity with regularization.
  • Training: 6 epochs, LR=3e-4, weight decay=0, warmup=10%, early stopping (patience=3).

These hyperparameters were chosen to deeply integrate Babaru’s supportive, snarky style without overfitting.
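
For reference, here is how those settings map onto a PEFT LoraConfig. A minimal sketch, assuming the attention projections as target modules (the card does not list which modules were adapted):

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # LoRA rank
    lora_alpha=64,        # scaling factor (alpha)
    lora_dropout=0.10,    # dropout for regularization
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed, not from the card
    bias="none",
    task_type="CAUSAL_LM",
)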


πŸ§‘β€πŸ’» Developer & Contact

Feel free to file issues or contribute enhancements!


πŸ“œ License

This project is licensed under the MIT License. See LICENSE for details.
