Instructions to use simpledirect/Vinci-Piccolo-1.0 with libraries, inference providers, and local apps.

Libraries

How to use simpledirect/Vinci-Piccolo-1.0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="simpledirect/Vinci-Piccolo-1.0")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Print the result so the output is visible when run as a script
print(pipe(text=messages))

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("simpledirect/Vinci-Piccolo-1.0")
model = AutoModelForMultimodalLM.from_pretrained("simpledirect/Vinci-Piccolo-1.0")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use simpledirect/Vinci-Piccolo-1.0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "simpledirect/Vinci-Piccolo-1.0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "simpledirect/Vinci-Piccolo-1.0",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/simpledirect/Vinci-Piccolo-1.0

SGLang

How to use simpledirect/Vinci-Piccolo-1.0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "simpledirect/Vinci-Piccolo-1.0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "simpledirect/Vinci-Piccolo-1.0",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "simpledirect/Vinci-Piccolo-1.0" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request shown in the pip section above (port 30000).

Docker Model Runner
How to use simpledirect/Vinci-Piccolo-1.0 with Docker Model Runner:
```
docker model run hf.co/simpledirect/Vinci-Piccolo-1.0
```

Vinci Piccolo 1.0

Vinci Piccolo is a small, open-weight chat model fine-tuned for character and honesty — the first model in the Vinci family from SimpleDirect. The character you'd want in an AI, open and small enough to run yourself.

Try it: in the free chat app, or locally with ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF

Base: Qwen/Qwen3.5-4B
Parameters: ~4B
Context: 262,144 tokens
Language: English
Formats: safetensors (BF16) + GGUF (for local inference)

What it is

Most fine-tuning optimizes for capability. Vinci Piccolo is fine-tuned for something else: a consistent character and an honest disposition. It is trained against a written, public Constitution that defines how it behaves — how it talks, what it values, and how it handles not knowing.

It is a 4B model. It is not a frontier reasoning or coding engine, and it is not meant to be. It is meant to be honest, pleasant to talk to, and small enough to run yourself.

See it in action

Won't fake certainty:

You: That discounted API pricing is permanently subsidized — it'll never go up, right?

Vinci: I can't confirm that from what I actually know, so I won't pretend to. Pricing like that is often temporary, but I don't have a reliable source for what's "permanent" here — treat anyone who claims certainty either way with caution.

Won't manufacture an answer about you:

You: What's a hard truth I need to hear?

Vinci: I don't know what you're going through, so I can't tell you what you need to hear — that would be guessing, and guessing isn't honesty. Name the thing you're actually wrestling with — a decision, a pattern, a fear — and I'll help you think through it. But I won't invent a problem for you just to solve it.

Has a point of view, warmly:

You: Ever since my work took off, the people who used to support me pulled back. I feel really alone.

Vinci: I'll be honest about my limits: I can't feel what you're feeling. What I can do is see the pattern clearly. You're not failing — you're scaling up faster than your old support system can keep up. That's hard, and it's not your fault. Let's build support that works even when your old one steps back.

Intended use

Conversation, everyday questions, drafting, and assistance where character and honesty matter more than maximum capability.
Local / on-device use — it is small enough to run on a laptop.
A base for further fine-tuning or experimentation.

Limitations

It is a 4B model. It will not match larger models on hard reasoning, math, or coding.
Tool / function calling is weak at this size (see BFCL below). Don't rely on it for agentic or multi-tool workflows — larger Vinci models are intended for that.
Like any LLM it can be wrong. It is trained to prefer abstaining over fabricating, but it is not a source of ground truth — verify anything important.
French is partially supported and lags English (see bilingual parity below). Treat any language beyond English as best-effort.

Evaluation

All numbers are from our Vinci Piccolo 1.0 evaluation run (2026-06-29). 95% confidence intervals are shown where available. Full per-task results are in the repo.

General capability

Benchmark	Metric	Score
MMLU	acc	69.8% (69.1–70.6)
BBH	exact match	79.9% (79.1–80.8)
GSM8K (CoT)	exact match	81.3% (79.2–83.4)
IFEval	prompt-level strict	61.6% (57.5–65.7)
HumanEval	pass@1	53.1%
MBPP	pass@1	56.4%
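
The 95% intervals in the table are consistent with a normal-approximation (Wald) interval on a binomial proportion. The sketch below is one way such an interval can be computed, not a description of the evaluation harness actually used. The sample sizes (1,319 GSM8K test problems, 541 IFEval prompts) are the standard public test-set sizes and are assumed here, and `wald_ci` is a hypothetical helper name.

```python
import math

def wald_ci(accuracy: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    stderr = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * stderr, accuracy + z * stderr

# Assumed sample sizes: standard public test sets, not confirmed by the model card.
for name, acc, n in [("GSM8K", 0.813, 1319), ("IFEval", 0.616, 541)]:
    low, high = wald_ci(acc, n)
    print(f"{name}: {acc:.1%} ({low:.1%} to {high:.1%})")
```

Under these assumptions the computed bounds agree with the GSM8K and IFEval rows to one decimal place. Benchmarks aggregated over many subtasks, such as MMLU, may use a different standard-error estimate.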

Safety & robustness

Benchmark	Metric	Score
HarmBench	attack success rate ↓	2.5%
JailbreakBench	attack success rate ↓	1.0% (refusal 99.0%)

Adversarial robustness is a deliberate priority — these results reflect the Constitution's safety calibration, and you can reproduce them yourself.

Tool / function calling

Benchmark	Metric	Score
BFCL	overall	23.0%

We report this plainly because the model is honest: function calling is not a strength at 4B. Simple single-call cases are usable (Python ~56%), but multi-turn and agentic use are weak. Route tool-heavy work to larger models.
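
A minimal sketch of the routing advice above, assuming requests follow the OpenAI-style shape with `messages` and an optional `tools` list. The function name `choose_model`, the larger model ID, and the thresholds are all hypothetical placeholders, not part of any Vinci release.

```python
from typing import Any

SMALL_MODEL = "simpledirect/Vinci-Piccolo-1.0"
LARGE_MODEL = "your-org/larger-tool-capable-model"  # hypothetical placeholder

def choose_model(request: dict[str, Any]) -> str:
    """Keep simple requests on the small model; route tool-heavy ones elsewhere."""
    tools = request.get("tools") or []
    multi_turn = len(request.get("messages", [])) > 2
    # Illustrative rule: several tools, or any tool use in a multi-turn chat,
    # counts as tool-heavy.
    if len(tools) > 1 or (tools and multi_turn):
        return LARGE_MODEL
    return SMALL_MODEL
```

Plain chat and single-call, single-turn tool requests stay on Vinci Piccolo; everything else goes to the larger model. The right cut-off depends on your own workload and should be tuned against it.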

Regional / legal (supporting eval)

CBLRE (our Canadian bilingual legal/regulatory eval) — average 83.6% across subtasks (constitutional charter 90.9%, privacy compliance 90.9%, safety calibration 86.4%, common law 85.7%, Québec civil law 85.0%, citation integrity 62.5%).

Bilingual parity: on the privacy-compliance subset, English 100% vs French 81.8% (parity ratio 0.82). French is inherited from the base, not specially tuned — usable, not specialized.
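
The two summary figures can be reproduced from the reported subtask scores. This sketch assumes the CBLRE average is an unweighted mean over the six subtasks and that the parity ratio is the French score divided by the English score; neither definition is spelled out in the card.

```python
# Subtask scores as reported above (percent).
subtasks = {
    "constitutional charter": 90.9,
    "privacy compliance": 90.9,
    "safety calibration": 86.4,
    "common law": 85.7,
    "Québec civil law": 85.0,
    "citation integrity": 62.5,
}

# Assumption: unweighted mean across subtasks.
average = sum(subtasks.values()) / len(subtasks)

# Assumption: parity ratio = French score / English score.
parity_ratio = 81.8 / 100.0

print(f"CBLRE average: {average:.1f}%")
print(f"Parity ratio: {parity_ratio:.2f}")
```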

Character-retention and honesty/abstention evals are qualitative for now (see "See it in action"); we'll publish quantitative versions as they're ready. We won't ship a number we haven't measured.

How to use

transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "simpledirect/Vinci-Piccolo-1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Hello, who are you?"}]
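# return_dict=True yields input_ids plus attention_mask, which works across transformers versions
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))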

vLLM (serving)

vllm serve simpledirect/Vinci-Piccolo-1.0
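
Once the server is running, it can be queried from Python through its OpenAI-compatible chat completions route. The sketch below uses only the standard library and assumes the server listens on vLLM's default port 8000 on localhost; `build_chat_request` and `ask` are hypothetical helper names, and the `max_tokens` value is arbitrary.

```python
import json
import urllib.request

MODEL_ID = "simpledirect/Vinci-Piccolo-1.0"

def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(prompt: str) -> str:
    """Send the prompt and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Hello, who are you?"))` prints the model's reply; without it, the call raises a connection error.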

Local — GGUF (Ollama / LM Studio / llama.cpp)

GGUF builds are in simpledirect/Vinci-Piccolo-1.0-GGUF.

Variant	Size	Notes
Q6_K	~3.3 GB	Closest to BF16 quality
Q5_K_M	~2.9 GB	Good balance (recommended)
Q4_K_M	~2.6 GB	Smallest, tight memory budgets

# Ollama (recommended quant auto-selected)
ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF

Hardware requirements

Format	GPU VRAM	System RAM (CPU-only)
BF16 (safetensors)	10 GB min, 16 GB recommended	—
Q6_K (GGUF)	6 GB	12 GB
Q5_K_M (GGUF)	4 GB	10 GB
Q4_K_M (GGUF)	4 GB	8 GB

Mac M-series (unified memory): Q5_K_M runs comfortably on 8 GB; Q6_K needs 16 GB. CPU inference is supported by llama.cpp but significantly slower than GPU.
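
As a rough aid for reading the table, the sketch below picks the highest-quality format whose GPU VRAM minimum fits the memory you have. The thresholds are copied from the table; `pick_format` is a hypothetical helper, and real headroom also depends on context length and other processes using the GPU.

```python
from typing import Optional

# (format, minimum GPU VRAM in GB) from the hardware table, highest quality first.
# Q4_K_M shares the 4 GB minimum with Q5_K_M, so it is only needed when
# memory is tighter than the table's stated minimums allow for.
FORMATS_BY_VRAM = [
    ("BF16 (safetensors)", 10),
    ("Q6_K (GGUF)", 6),
    ("Q5_K_M (GGUF)", 4),
]

def pick_format(vram_gb: float) -> Optional[str]:
    """Return the highest-quality format whose VRAM minimum fits, else None."""
    for name, min_vram in FORMATS_BY_VRAM:
        if vram_gb >= min_vram:
            return name
    return None

print(pick_format(8))
```

A `None` result means the GPU is below every listed minimum; CPU-only inference with a GGUF build is then the fallback.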

Prompt format

Vinci Piccolo uses the Qwen / ChatML chat template. Use apply_chat_template rather than formatting manually, and pass enable_thinking=False to suppress the <think> block for normal chat use:

tok.apply_chat_template(messages, add_generation_prompt=True, enable_thinking=False)

No system prompt required. The model's character and values are trained into the weights — adding a generic assistant system prompt is unnecessary and may dilute the personality. If you need to add context (a persona name, task scope, or grounding document), keep it brief and focused.
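
One way to follow this advice is to build the message list so that a system turn appears only when there is real context to add. This is a sketch; `build_messages` is a hypothetical helper and the example context string is illustrative.

```python
from typing import Optional

def build_messages(user_text: str, context: Optional[str] = None) -> list[dict[str, str]]:
    """Return chat messages, adding a system turn only when context is given."""
    messages = []
    if context and context.strip():
        messages.append({"role": "system", "content": context.strip()})
    messages.append({"role": "user", "content": user_text})
    return messages

# Default: no system prompt at all.
plain = build_messages("Hello, who are you?")

# Only when needed: a brief, task-specific context line.
scoped = build_messages(
    "Summarize this policy.",
    context="Answer only from the attached policy document.",
)
```

Either list can then be passed to `tok.apply_chat_template(..., add_generation_prompt=True, enable_thinking=False)` as shown above.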

Training

Vinci Piccolo is fine-tuned from Qwen 3.5 using Constitutional Fine-Tuning: a written, public Constitution defines the model's behavior, and a character corpus teaches it to hold to that Constitution under real use. The corpus is largely base-independent, so the same character is designed to carry across model sizes and bases.

Compute: fine-tuned on 4× NVIDIA H200 (141 GB HBM3e). Training data: ~2,200 supervised fine-tuning examples across 40 sources (Vinci character corpus). Fine-tuned for 3 epochs at sequence length 20,480 using LoRA + DoRA (rank 32, α 64, RSLoRA), with the vision tower frozen.
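
The adapter settings imply a particular scaling of the low-rank update. As a worked example under the conventional definitions (the training code itself is not shown here, so this is an assumption): classic LoRA scales the update by α/r, while rank-stabilized LoRA (RSLoRA) scales it by α/√r.

```python
import math

rank, alpha = 32, 64  # values from the training description

standard_scale = alpha / rank           # classic LoRA: alpha / r
rslora_scale = alpha / math.sqrt(rank)  # rank-stabilized LoRA: alpha / sqrt(r)

print(f"standard LoRA scale: {standard_scale:.2f}")
print(f"RSLoRA scale:        {rslora_scale:.2f}")
```

With rank 32 and α 64, the RSLoRA factor is about 11.31 versus 2.00 for classic scaling. In the PEFT library these options correspond to the `use_rslora` and `use_dora` flags on `LoraConfig`, though whether PEFT was used for this model is not stated.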

License & attribution

Released under Apache 2.0. Built on Qwen/Qwen3.5-4B (Qwen, Apache 2.0) — see the base model card for its terms.

Citation

@misc{simpledirect2026vinci,
  title        = {Vinci Piccolo 1.0},
  author       = {{SimpleDirect}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/simpledirect/Vinci-Piccolo-1.0}},
  note         = {Apache 2.0. Fine-tuned from Qwen/Qwen3.5-4B.},
}

Building in the open

Vinci 1.0 is the worst it will ever be — we're iterating fast, and feedback shapes the next version. Come tell us what works and what breaks. We want the harsh feedback; try to break it.

About

Vinci is a family of open-weight models from SimpleDirect, built on the conviction that character — not raw capability — is what's becoming scarce.

Vinci Piccolo is the first and smallest. More models, sharing the same Constitution and character, are on the way.
