
Medina-Qwen3.5-27B-OpenClaw

A LoRA fine-tune of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, trained on OpenClaw tool-call data and optimized for agentic reasoning with structured tool invocation.

The base model is a Claude 4.6 Opus reasoning distillation of Qwen3.5-27B. This fine-tune adds structured tool-calling capability in the OpenClaw XML format, making it suitable for local agentic deployments.

GGUF Downloads

Quantization	Size	Use case
Q4_K_M	15.4 GB	✅ Recommended; 24 GB VRAM or 32 GB unified memory
Q8_0	26.6 GB	Near-lossless; 32 GB+ VRAM or 48 GB unified memory
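
The table above can be read as a simple lookup from available memory to a quantization. The sketch below is only an illustration of that reading: `pick_quant` and the `QUANTS` list are hypothetical names, and the thresholds are copied from the table rather than measured.

```python
from typing import Optional

# (name, file size in GB, min discrete VRAM in GB, min unified memory in GB).
# Values are taken from the table above; listed largest first.
QUANTS = [
    ("Q8_0", 26.6, 32, 48),
    ("Q4_K_M", 15.4, 24, 32),
]


def pick_quant(memory_gb: float, unified: bool = False) -> Optional[str]:
    """Return the largest quantization whose stated requirement fits, else None."""
    for name, _size_gb, min_vram, min_unified in QUANTS:
        needed = min_unified if unified else min_vram
        if memory_gb >= needed:
            return name
    return None


if __name__ == "__main__":
    print(pick_quant(24))                 # 24 GB discrete GPU
    print(pick_quant(48, unified=True))   # 48 GB unified memory
    print(pick_quant(16))                 # below both thresholds
```

Systems that fall below both thresholds get `None` here; the companion 14B model may be a better fit in that case.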

Training Details

Parameter	Value
Base model	Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
Training GPU	NVIDIA A100 SXM4 80GB
Framework	Unsloth + TRL SFTTrainer
Dataset	OpenClaw tool-call examples (243 examples)
Training time	~102 minutes
Epochs	3
Steps	48
Final loss	0.7066
LoRA rank	r=32, alpha=64, rsLoRA=True
LoRA dropout	0.05
LoRA targets	q/k/v/o/gate/up/down proj
Trainable params	159,383,552 (0.58%)
Context window	4096 tokens
Batch size	1 (effective 8 with gradient accumulation)
Learning rate	2e-4 (cosine schedule, 5% warmup)
Quantization	4-bit NF4 during training
Optimizer	AdamW 8-bit
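
As a rough check on two rows of the table, the sketch below computes the adapter scaling factor and the total parameter count implied by the trainable share. It assumes the usual definitions: classic LoRA scales the update by alpha / r, while rank-stabilized LoRA (rsLoRA) scales it by alpha / sqrt(r). The variable names are illustrative, and the implied total is approximate because the 0.58% figure is rounded and depends on how the framework counts quantized weights.

```python
import math

r, alpha = 32, 64            # LoRA rank and alpha from the table
trainable = 159_383_552      # trainable parameters from the table
trainable_share = 0.0058     # 0.58%, as rounded in the table

standard_scale = alpha / r            # classic LoRA scaling
rslora_scale = alpha / math.sqrt(r)   # rank-stabilized scaling (rsLoRA=True)

# Total parameter count implied by the trainable share (approximate).
implied_total = trainable / trainable_share

print(f"standard LoRA scale:  {standard_scale:.2f}")
print(f"rsLoRA scale:         {rslora_scale:.2f}")
print(f"implied total params: {implied_total / 1e9:.1f}B")
```

With rsLoRA enabled, the effective scale at r=32 is about 11.3 rather than 2, which is worth keeping in mind when comparing this learning rate against runs that use classic LoRA scaling.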

What It Does

This adapter teaches the model the OpenClaw tool-calling format — a structured XML-style invocation pattern used by the OpenClaw AI agent platform:

<function_calls>
<invoke name="TOOL_NAME">
<parameter name="PARAM_NAME">value</parameter>
</invoke>
</function_calls>

Supported tools in training data: exec, read, write, edit, web_search, web_fetch, browser, memory_search, memory_get, message, cron, nodes, image, pdf, sessions_spawn, session_status
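
A client consuming this output needs to extract the tool name and parameters from the generated text. The following is a minimal sketch of one way to do that with regular expressions, assuming the output follows the format shown above exactly (double-quoted `name` attributes, no nested tags). The function name `parse_tool_calls` and the `command` parameter in the sample are hypothetical and are not taken from the OpenClaw platform.

```python
import re
from typing import Dict, List, Tuple

# Patterns for the three nesting levels of the format shown above.
BLOCK_RE = re.compile(r"<function_calls>(.*?)</function_calls>", re.DOTALL)
INVOKE_RE = re.compile(r'<invoke name="([^"]+)">(.*?)</invoke>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)


def parse_tool_calls(text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Return (tool_name, {param_name: value}) pairs found in model output."""
    calls = []
    for block in BLOCK_RE.findall(text):
        for tool_name, body in INVOKE_RE.findall(block):
            params = dict(PARAM_RE.findall(body))
            calls.append((tool_name, params))
    return calls


# Hypothetical model output; "command" is an assumed parameter name.
sample = """I'll list the directory first.
<function_calls>
<invoke name="exec">
<parameter name="command">ls -la</parameter>
</invoke>
</function_calls>"""

if __name__ == "__main__":
    for tool_name, params in parse_tool_calls(sample):
        print(tool_name, params)
```

Regular expressions are used instead of an XML parser so that parameter values containing raw `<` or `&` characters (common in shell commands and code) do not cause a parse error. Parameter values are returned as plain strings; any type conversion is left to the caller.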

Usage with llama.cpp / Ollama

# Ollama (Q4_K_M)
ollama run hf.co/peterjohannmedina/Medina-Qwen3.5-27B-OpenClaw:Q4_K_M

# llama.cpp direct
./llama-cli -m Medina-Qwen3.5-27B-OpenClaw-Q4_K_M.gguf \
  --ctx-size 4096 -p "You are an AI assistant with access to tools..."

Usage with Transformers (LoRA adapter)

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in bfloat16 and spread it across the available devices.
base = AutoModelForCausalLM.from_pretrained(
    "Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the OpenClaw LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "peterjohannmedina/Medina-Qwen3.5-27B-OpenClaw")
# Load the tokenizer from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained("peterjohannmedina/Medina-Qwen3.5-27B-OpenClaw")

Companion Model

For a smaller version that runs on an M3 MacBook or other 16 GB systems:

Medina-Qwen3-14B-OpenClaw (Q4_K_M: 8.4 GB, Q8_0: 14.6 GB)

License

Apache 2.0, the same as the base model.
