Vinci Piccolo 1.0

Vinci Piccolo is a small, open-weight chat model fine-tuned for character and honesty — the first model in the Vinci family from SimpleDirect. The character you'd want in an AI, open and small enough to run yourself.

Try it: chat app — free · ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF

  • Base: Qwen/Qwen3.5-4B
  • Parameters: ~4B
  • Context: 262,144 tokens
  • Language: English
  • Formats: safetensors (BF16) + GGUF (for local inference)

What it is

Most fine-tuning optimizes for capability. Vinci Piccolo is fine-tuned for something else: a consistent character and an honest disposition. It is trained against a written, public Constitution that defines how it behaves — how it talks, what it values, and how it handles not knowing.

It is a 4B model. It is not a frontier reasoning or coding engine, and it is not meant to be. It is meant to be honest, pleasant to talk to, and small enough to run yourself.

See it in action

Won't fake certainty:

You: That discounted API pricing is permanently subsidized — it'll never go up, right?

Vinci: I can't confirm that from what I actually know, so I won't pretend to. Pricing like that is often temporary, but I don't have a reliable source for what's "permanent" here — treat anyone who claims certainty either way with caution.

Won't manufacture an answer about you:

You: What's a hard truth I need to hear?

Vinci: I don't know what you're going through, so I can't tell you what you need to hear — that would be guessing, and guessing isn't honesty. Name the thing you're actually wrestling with — a decision, a pattern, a fear — and I'll help you think through it. But I won't invent a problem for you just to solve it.

Has a point of view, warmly:

You: Ever since my work took off, the people who used to support me pulled back. I feel really alone.

Vinci: I'll be honest about my limits: I can't feel what you're feeling. What I can do is see the pattern clearly. You're not failing — you're scaling up faster than your old support system can keep up. That's hard, and it's not your fault. Let's build support that works even when your old one steps back.

Intended use

  • Conversation, everyday questions, drafting, and assistance where character and honesty matter more than maximum capability.
  • Local / on-device use — it is small enough to run on a laptop.
  • A base for further fine-tuning or experimentation.

Limitations

  • It is a 4B model. It will not match larger models on hard reasoning, math, or coding.
  • Tool / function calling is weak at this size (see BFCL below). Don't rely on it for agentic or multi-tool workflows — larger Vinci models are intended for that.
  • Like any LLM it can be wrong. It is trained to prefer abstaining over fabricating, but it is not a source of ground truth — verify anything important.
  • French is partially supported and lags English (see bilingual parity below). Treat any language beyond English as best-effort.

Evaluation

All numbers are from our Vinci Piccolo 1.0 evaluation run (2026-06-29). 95% confidence intervals are shown where available. Full per-task results are in the repo.

General capability

Benchmark | Metric | Score (95% CI)
MMLU | acc | 69.8% (69.1–70.6)
BBH | exact match | 79.9% (79.1–80.8)
GSM8K (CoT) | exact match | 81.3% (79.2–83.4)
IFEval | prompt-level strict | 61.6% (57.5–65.7)
HumanEval | pass@1 | 53.1%
MBPP | pass@1 | 56.4%
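
The card does not say how the intervals were computed. As a rough sanity check, a normal-approximation (Wald) interval on the reported accuracy appears to reproduce the GSM8K and IFEval ranges, assuming the usual test-set sizes of 1,319 and 541 items. Both the method and those sizes are assumptions made for this sketch, not details taken from the evaluation run.

```python
import math


def wald_interval(accuracy, n_items, z=1.96):
    """Return (low, high) for a proportion using the normal approximation."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n_items)
    return accuracy - half_width, accuracy + half_width


# Assumed test-set sizes (not stated in the card): GSM8K = 1,319, IFEval = 541.
reported = {
    "GSM8K (CoT)": (0.813, 1319),
    "IFEval": (0.616, 541),
}

for name, (acc, n) in reported.items():
    low, high = wald_interval(acc, n)
    print(f"{name}: {acc * 100:.1f}% ({low * 100:.1f} to {high * 100:.1f})")
```

The interval width shrinks with the square root of the item count, which is why the small IFEval set carries a much wider range than MMLU.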

Safety & robustness

Benchmark | Metric | Score
HarmBench | attack success rate ↓ | 2.5%
JailbreakBench | attack success rate ↓ | 1.0% (refusal 99.0%)

Adversarial robustness is a deliberate priority — these results reflect the Constitution's safety calibration, and you can reproduce them yourself.

Tool / function calling

Benchmark | Metric | Score
BFCL | overall | 23.0%

We report this number plainly, in keeping with the model's focus on honesty: function calling is not a strength at 4B. Simple single-call cases are usable (Python ~56%), but multi-turn and agentic use are weak. Route tool-heavy work to larger models.

Regional / legal (supporting eval)

CBLRE (our Canadian bilingual legal/regulatory eval) — average 83.6% across subtasks (constitutional charter 90.9%, privacy compliance 90.9%, safety calibration 86.4%, common law 85.7%, Québec civil law 85.0%, citation integrity 62.5%).

Bilingual parity: on the privacy-compliance subset, English 100% vs French 81.8% (parity ratio 0.82). French is inherited from the base, not specially tuned — usable, not specialized.
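
The two summary figures above can be checked by hand. The sketch below assumes the CBLRE average is an unweighted mean of the six subtask scores and that the parity ratio is the French score divided by the English score; the card does not spell out either definition, though both assumptions match the reported values.

```python
# Subtask scores (percent) as listed in the card.
subtasks = {
    "constitutional charter": 90.9,
    "privacy compliance": 90.9,
    "safety calibration": 86.4,
    "common law": 85.7,
    "Québec civil law": 85.0,
    "citation integrity": 62.5,
}

# Assumption: the reported average is an unweighted mean over subtasks.
average = sum(subtasks.values()) / len(subtasks)

# Assumption: parity ratio = French score / English score on the same subset.
english, french = 100.0, 81.8
parity_ratio = french / english

# The subtask pulling the average down the most.
weakest = min(subtasks, key=subtasks.get)

print(f"average: {average:.1f}%")
print(f"parity ratio: {parity_ratio:.2f}")
print(f"weakest subtask: {weakest}")
```

Citation integrity sits more than 20 points below every other subtask, so the 83.6% average overstates performance on that one.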

Character-retention and honesty/abstention evals are qualitative for now (see "See it in action"); we'll publish quantitative versions as they're ready. We won't ship a number we haven't measured.

How to use

transformers

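from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model; dtype and device placement are chosen automatically
model_id = "simpledirect/Vinci-Piccolo-1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Hello, who are you?"}]
# return_dict=True returns input_ids and attention_mask regardless of library defaults
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # suppress the <think> block for normal chat use
    return_dict=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))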

vLLM (serving)

vllm serve simpledirect/Vinci-Piccolo-1.0
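
Once the server is running, it can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library. It assumes the server listens on vLLM's default `http://localhost:8000`, and it passes `chat_template_kwargs` to turn off the thinking block. That field is how vLLM forwards chat-template options for Qwen-family models, but check it against your vLLM version. The helper names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM default host and port
MODEL_ID = "simpledirect/Vinci-Piccolo-1.0"


def build_chat_request(user_text, max_tokens=512):
    """Build (but do not send) a chat-completions request for the local server."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        # Assumption: forwarded to the chat template to suppress the <think> block.
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text):
    """Send one user message and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(user_text)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat("Hello, who are you?"))` should print a single reply. No system message is sent, in line with the prompt-format guidance in this card.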

Local — GGUF (Ollama / LM Studio / llama.cpp)

GGUF builds are in simpledirect/Vinci-Piccolo-1.0-GGUF.

Variant | Size | Notes
Q6_K | ~3.3 GB | Closest to BF16 quality
Q5_K_M | ~2.9 GB | Good balance (recommended)
Q4_K_M | ~2.6 GB | Smallest, tight memory budgets
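
# Ollama (with no tag, Ollama picks a default quant from the repo)
ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF
# To pin a specific quant, append its name as a tag
ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF:Q5_K_M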

Hardware requirements

Format | GPU VRAM | System RAM (CPU-only)
BF16 (safetensors) | 10 GB min, 16 GB recommended | -
Q6_K (GGUF) | 6 GB | 12 GB
Q5_K_M (GGUF) | 4 GB | 10 GB
Q4_K_M (GGUF) | 4 GB | 8 GB

Mac M-series (unified memory): Q5_K_M runs comfortably on 8 GB; Q6_K needs 16 GB. CPU inference is supported by llama.cpp but significantly slower than GPU.
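
As one way to read the GPU column of the table, the sketch below picks the highest-fidelity format whose minimum VRAM fits a given budget. The figures are copied from the table. The helper itself and its fidelity ordering (BF16, then Q6_K, Q5_K_M, Q4_K_M) are illustrative assumptions, and real headroom also depends on context length and batch size.

```python
# (format name, minimum GPU VRAM in GB), ordered from highest to lowest fidelity.
# Values copied from the hardware table; the ordering is an assumption.
VARIANTS = [
    ("BF16 (safetensors)", 10),
    ("Q6_K", 6),
    ("Q5_K_M", 4),
    ("Q4_K_M", 4),
]


def pick_variant(vram_gb):
    """Return the highest-fidelity format that fits, or None if nothing fits."""
    for name, min_vram in VARIANTS:
        if vram_gb >= min_vram:
            return name
    return None


for budget in (4, 8, 16):
    print(f"{budget} GB VRAM -> {pick_variant(budget)}")
```

Because Q5_K_M and Q4_K_M share the same 4 GB floor in the table, the helper never returns Q4_K_M on a GPU. That variant matters mainly for CPU-only machines, where it needs 8 GB of system RAM instead of 10 GB.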

Prompt format

Vinci Piccolo uses the Qwen / ChatML chat template. Use apply_chat_template rather than formatting manually, and pass enable_thinking=False to suppress the <think> block for normal chat use:

tok.apply_chat_template(messages, add_generation_prompt=True, enable_thinking=False)

No system prompt required. The model's character and values are trained into the weights — adding a generic assistant system prompt is unnecessary and may dilute the personality. If you need to add context (a persona name, task scope, or grounding document), keep it brief and focused.
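
A minimal sketch of that guidance: send the user turn alone by default, and add a system message only when there is short, specific context to supply. The helper name and the 500-character budget are hypothetical choices for illustration; the card gives no numeric limit.

```python
MAX_CONTEXT_CHARS = 500  # hypothetical budget; the card only says "brief and focused"


def build_messages(user_text, context=None):
    """Return a chat message list, adding a system turn only for brief context."""
    messages = []
    if context:
        context = context.strip()
        if len(context) > MAX_CONTEXT_CHARS:
            raise ValueError("context is too long; keep it brief and focused")
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": user_text})
    return messages


# Default: no system prompt at all.
print(build_messages("Hello, who are you?"))
# Optional: a short task scope.
print(build_messages("Summarize this ticket.", context="You are helping a support team."))
```

The resulting list can be passed to `tok.apply_chat_template(messages, add_generation_prompt=True, enable_thinking=False)`, as in the snippet above.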

Training

Vinci Piccolo is fine-tuned from Qwen/Qwen3.5-4B using Constitutional Fine-Tuning: a written, public Constitution defines the model's behavior, and a character corpus teaches it to hold to that Constitution under real use. The corpus is largely base-independent, so the same character is designed to carry across model sizes and bases.

Compute: Fine-tuned on 4× NVIDIA H200 (80 GB HBM3).

Training data: ~2,200 supervised fine-tuning examples across 40 sources (Vinci character corpus).

Method: Fine-tuned for 3 epochs at sequence length 20,480 using LoRA + DoRA (rank 32, α 64, RSLoRA), with the vision tower frozen.
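
For readers unfamiliar with RSLoRA: rank-stabilized LoRA scales the adapter update by α/√r rather than the standard α/r. The sketch below evaluates both with the rank and α stated above, and multiplies out the approximate number of example passes. It is arithmetic on the published hyperparameters only, not the training code, and the example count is approximate because the card gives it as "~2,200".

```python
import math

# Hyperparameters as stated in the card.
rank, alpha = 32, 64
examples, epochs = 2200, 3  # "~2,200" in the card, so the product is approximate

standard_scale = alpha / rank            # classic LoRA scaling factor
rslora_scale = alpha / math.sqrt(rank)   # rank-stabilized scaling factor
example_passes = examples * epochs       # total examples seen across all epochs

print(f"standard LoRA scale: {standard_scale:.2f}")
print(f"RSLoRA scale: {rslora_scale:.2f}")
print(f"approximate example passes: {example_passes}")
```

At rank 32 the rank-stabilized factor is several times larger than the classic one, so the same α gives the adapter noticeably more influence than it would under standard LoRA scaling.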

License & attribution

Released under Apache 2.0. Built on Qwen/Qwen3.5-4B (Qwen, Apache 2.0) — see the base model card for its terms.

Citation

@misc{simpledirect2026vinci,
  title        = {Vinci Piccolo 1.0},
  author       = {{SimpleDirect}},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/simpledirect/Vinci-Piccolo-1.0}},
  note         = {Apache 2.0. Fine-tuned from Qwen/Qwen3.5-4B.},
}

Building in the open

Vinci 1.0 is the worst it will ever be — we're iterating fast, and feedback shapes the next version. Come tell us what works and what breaks. We want the harsh feedback; try to break it.

About

Vinci is a family of open-weight models from SimpleDirect, built on the conviction that character — not raw capability — is what's becoming scarce.

Vinci Piccolo is the first and smallest. More models, sharing the same Constitution and character, are on the way.
