Vinci Piccolo 1.0
Vinci Piccolo is a small, open-weight chat model fine-tuned for character and honesty — the first model in the Vinci family from SimpleDirect. The character you'd want in an AI, open and small enough to run yourself.
Try it: chat app — free · ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF
- Base: Qwen/Qwen3.5-4B
- Parameters: ~4B
- Context: 262,144 tokens
- Language: English
- Formats: safetensors (BF16) + GGUF (for local inference)
What it is
Most fine-tuning optimizes for capability. Vinci Piccolo is fine-tuned for something else: a consistent character and an honest disposition. It is trained against a written, public Constitution that defines how it behaves — how it talks, what it values, and how it handles not knowing.
It is a 4B model. It is not a frontier reasoning or coding engine, and it is not meant to be. It is meant to be honest, pleasant to talk to, and small enough to run yourself.
See it in action
Won't fake certainty:
You: That discounted API pricing is permanently subsidized — it'll never go up, right?
Vinci: I can't confirm that from what I actually know, so I won't pretend to. Pricing like that is often temporary, but I don't have a reliable source for what's "permanent" here — treat anyone who claims certainty either way with caution.
Won't manufacture an answer about you:
You: What's a hard truth I need to hear?
Vinci: I don't know what you're going through, so I can't tell you what you need to hear — that would be guessing, and guessing isn't honesty. Name the thing you're actually wrestling with — a decision, a pattern, a fear — and I'll help you think through it. But I won't invent a problem for you just to solve it.
Has a point of view, warmly:
You: Ever since my work took off, the people who used to support me pulled back. I feel really alone.
Vinci: I'll be honest about my limits: I can't feel what you're feeling. What I can do is see the pattern clearly. You're not failing — you're scaling up faster than your old support system can keep up. That's hard, and it's not your fault. Let's build support that works even when your old one steps back.
Intended use
- Conversation, everyday questions, drafting, and assistance where character and honesty matter more than maximum capability.
- Local / on-device use — it is small enough to run on a laptop.
- A base for further fine-tuning or experimentation.
Limitations
- It is a 4B model. It will not match larger models on hard reasoning, math, or coding.
- Tool / function calling is weak at this size (see BFCL below). Don't rely on it for agentic or multi-tool workflows — larger Vinci models are intended for that.
- Like any LLM it can be wrong. It is trained to prefer abstaining over fabricating, but it is not a source of ground truth — verify anything important.
- French is partially supported and lags English (see bilingual parity below). Treat any language beyond English as best-effort.
Evaluation
All numbers are from our Vinci Piccolo 1.0 evaluation run (2026-06-29). 95% confidence intervals are shown where available. Full per-task results are in the repo.
General capability
| Benchmark | Metric | Score |
|---|---|---|
| MMLU | acc | 69.8% (69.1–70.6) |
| BBH | exact match | 79.9% (79.1–80.8) |
| GSM8K (CoT) | exact match | 81.3% (79.2–83.4) |
| IFEval | prompt-level strict | 61.6% (57.5–65.7) |
| HumanEval | pass@1 | 53.1% |
| MBPP | pass@1 | 56.4% |
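For readers checking the intervals in the table, here is a minimal sketch of a normal-approximation (Wald) interval for a proportion. The test-set sizes are assumptions (the commonly used GSM8K test split of 1,319 problems and IFEval's 541 prompts); the card does not state them, nor the exact interval method. Under those assumptions the sketch gives intervals that match the GSM8K and IFEval rows.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% normal-approximation (Wald) interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumed test-set sizes (not stated in this card): GSM8K = 1,319, IFEval = 541.
for name, p_hat, n in [("GSM8K", 0.813, 1319), ("IFEval", 0.616, 541)]:
    low, high = wald_interval(p_hat, n)
    print(f"{name}: {100 * low:.1f}-{100 * high:.1f}")
```

Scores aggregated over many subtasks (MMLU, BBH) may use a different standard-error calculation, so this sketch is not expected to reproduce those rows exactly.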
Safety & robustness
| Benchmark | Metric | Score |
|---|---|---|
| HarmBench | attack success rate ↓ | 2.5% |
| JailbreakBench | attack success rate ↓ | 1.0% (refusal 99.0%) |
Adversarial robustness is a deliberate priority — these results reflect the Constitution's safety calibration, and you can reproduce them yourself.
Tool / function calling
| Benchmark | Metric | Score |
|---|---|---|
| BFCL | overall | 23.0% |
We report this plainly: function calling is not a strength at 4B. Simple single-call cases are usable (Python: ~56%), but multi-turn and agentic use are weak. Route tool-heavy work to larger models.
Regional / legal (supporting eval)
CBLRE (our Canadian bilingual legal/regulatory eval) — average 83.6% across subtasks (constitutional charter 90.9%, privacy compliance 90.9%, safety calibration 86.4%, common law 85.7%, Québec civil law 85.0%, citation integrity 62.5%).
Bilingual parity: on the privacy-compliance subset, English 100% vs French 81.8% (parity ratio 0.82). French is inherited from the base, not specially tuned — usable, not specialized.
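The arithmetic behind the CBLRE figures can be checked with a short sketch. It assumes the reported average is an unweighted mean of the six subtask scores and that the parity ratio is simply French accuracy divided by English accuracy; the card does not spell out either definition.

```python
from statistics import mean

# Subtask scores (%) as listed above.
cblre_subtasks = {
    "constitutional charter": 90.9,
    "privacy compliance": 90.9,
    "safety calibration": 86.4,
    "common law": 85.7,
    "Québec civil law": 85.0,
    "citation integrity": 62.5,
}

# Assumption: the headline number is an unweighted mean across subtasks.
average = mean(cblre_subtasks.values())
# Assumption: parity ratio = French accuracy / English accuracy.
parity_ratio = 81.8 / 100.0
weakest = min(cblre_subtasks, key=cblre_subtasks.get)

print(f"average = {average:.1f}%")
print(f"parity ratio = {parity_ratio:.2f}")
print(f"weakest subtask = {weakest}")
```

The weakest subtask, citation integrity, sits more than 20 points below the others, which is worth keeping in mind before relying on the model for legal references.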
Character-retention and honesty/abstention evals are qualitative for now (see "See it in action"); we'll publish quantitative versions as they're ready. We won't ship a number we haven't measured.
How to use
transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "simpledirect/Vinci-Piccolo-1.0"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Hello, who are you?"}]
# return_dict=True also returns the attention mask, which generate() uses.
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,  # suppress the <think> block for normal chat
    return_dict=True,
    return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
vLLM (serving)
vllm serve simpledirect/Vinci-Piccolo-1.0
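Once the server is running, it can be queried over its OpenAI-compatible chat endpoint. The following is a minimal sketch using only the Python standard library. It assumes the server listens on vLLM's default `http://localhost:8000`, and that the `chat_template_kwargs` request field is forwarded to the chat template to disable thinking; both are assumptions about your deployment, and the helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "simpledirect/Vinci-Piccolo-1.0"
# Assumption: `vllm serve` is listening on its default address.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(user_text: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        # Assumption: forwarded by vLLM to the chat template to skip the <think> block.
        "chat_template_kwargs": {"enable_thinking": False},
    }


def chat(user_text: str, base_url: str = BASE_URL) -> str:
    """POST the payload to a running server and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(user_text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
#   print(chat("Hello, who are you?"))
```

Keeping payload construction separate from the network call makes it easy to inspect or log the request before sending it.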
Local — GGUF (Ollama / LM Studio / llama.cpp)
GGUF builds are in simpledirect/Vinci-Piccolo-1.0-GGUF.
| Variant | Size | Notes |
|---|---|---|
| Q6_K | ~3.3 GB | Closest to BF16 quality |
| Q5_K_M | ~2.9 GB | Good balance (recommended) |
| Q4_K_M | ~2.6 GB | Smallest, tight memory budgets |
# Ollama (recommended quant auto-selected)
ollama run hf.co/simpledirect/Vinci-Piccolo-1.0-GGUF
Hardware requirements
| Format | GPU VRAM | System RAM (CPU-only) |
|---|---|---|
| BF16 (safetensors) | 10 GB min, 16 GB recommended | — |
| Q6_K (GGUF) | 6 GB | 12 GB |
| Q5_K_M (GGUF) | 4 GB | 10 GB |
| Q4_K_M (GGUF) | 4 GB | 8 GB |
Mac M-series (unified memory): Q5_K_M runs comfortably on 8 GB; Q6_K needs 16 GB. CPU inference is supported by llama.cpp but significantly slower than GPU.
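As a rough aid for choosing a GGUF variant, the table above can be encoded as a small lookup. This is a sketch, not an official sizing tool: the function name `pick_gguf_variant` is hypothetical, it treats the listed figures as hard minimums, and it does not encode the separate Mac unified-memory guidance.

```python
from typing import Optional

# (variant, GPU VRAM in GB, CPU-only system RAM in GB), best quality first,
# copied from the hardware table above.
GGUF_REQUIREMENTS = [
    ("Q6_K", 6, 12),
    ("Q5_K_M", 4, 10),
    ("Q4_K_M", 4, 8),
]


def pick_gguf_variant(memory_gb: float, gpu: bool) -> Optional[str]:
    """Return the highest-quality variant whose listed requirement fits, else None."""
    for variant, vram_gb, ram_gb in GGUF_REQUIREMENTS:
        required = vram_gb if gpu else ram_gb
        if memory_gb >= required:
            return variant
    return None


print(pick_gguf_variant(8, gpu=True))    # Q6_K
print(pick_gguf_variant(4, gpu=True))    # Q5_K_M
print(pick_gguf_variant(8, gpu=False))   # Q4_K_M
print(pick_gguf_variant(6, gpu=False))   # None
```

Ollama's Hugging Face integration generally accepts a quantization tag after the repository name (for example a `:Q4_K_M` suffix), which may be a way to request the chosen variant; check the Ollama and Hugging Face documentation for the exact form.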
Prompt format
Vinci Piccolo uses the Qwen / ChatML chat template. Use apply_chat_template rather than formatting manually, and pass enable_thinking=False to suppress the <think> block for normal chat use:
tok.apply_chat_template(messages, add_generation_prompt=True, enable_thinking=False)
No system prompt required. The model's character and values are trained into the weights — adding a generic assistant system prompt is unnecessary and may dilute the personality. If you need to add context (a persona name, task scope, or grounding document), keep it brief and focused.
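The guidance above can be sketched as a small message builder that omits the system turn by default and adds one only when there is brief context to pass. The helper name `build_messages` and the `MAX_CONTEXT_CHARS` budget are hypothetical; the card does not define a length limit.

```python
from typing import Optional

# Hypothetical budget used to keep added context short; not a documented limit.
MAX_CONTEXT_CHARS = 2000


def build_messages(user_text: str, context: Optional[str] = None) -> list[dict]:
    """Return chat messages, adding a system turn only when context is supplied."""
    messages = []
    if context:
        context = context.strip()
        if len(context) > MAX_CONTEXT_CHARS:
            raise ValueError("context is too long; keep it brief and focused")
        messages.append({"role": "system", "content": context})
    messages.append({"role": "user", "content": user_text})
    return messages


print(build_messages("Hello, who are you?"))
print(build_messages("Summarize the refund policy.", context="Scope: customer support for Acme."))
```

The resulting list can be passed directly as `messages` to `apply_chat_template`.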
Training
Vinci Piccolo is fine-tuned from Qwen3.5-4B using Constitutional Fine-Tuning: a written, public Constitution defines the model's behavior, and a character corpus teaches it to hold to that Constitution under real use. The corpus is largely base-independent, so the same character is designed to carry across model sizes and bases.
Compute: Fine-tuned on 4× NVIDIA H200 (80 GB HBM3). Training data: ~2,200 supervised fine-tuning examples across 40 sources (Vinci character corpus). Fine-tuned for 3 epochs at sequence length 20,480 using LoRA + DoRA (rank 32, α 64, RSLoRA), vision tower frozen.
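To make the adapter hyperparameters concrete, the sketch below works out the scaling factor they imply. It relies on the standard definitions (classic LoRA scales the update by α / r, rank-stabilized LoRA by α / √r) and treats the example count as the approximate figure quoted above; it is not taken from the actual training configuration.

```python
import math

rank = 32
alpha = 64

standard_scale = alpha / rank            # classic LoRA: alpha / r
rslora_scale = alpha / math.sqrt(rank)   # rank-stabilized LoRA: alpha / sqrt(r)

# Approximate number of example passes: ~2,200 examples for 3 epochs.
examples_seen = 2200 * 3

print(f"standard LoRA scale: {standard_scale:.2f}")
print(f"RSLoRA scale:        {rslora_scale:.2f}")
print(f"example passes:      ~{examples_seen}")
```

With RSLoRA the effective scale at rank 32 is roughly 11.3 rather than 2. In the PEFT library these settings correspond to the `use_rslora` and `use_dora` options of `LoraConfig`, though the card does not say which tooling was used.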
License & attribution
Released under Apache 2.0. Built on Qwen/Qwen3.5-4B (Qwen, Apache 2.0) — see the base model card for its terms.
Citation
@misc{simpledirect2026vinci,
title = {Vinci Piccolo 1.0},
author = {{SimpleDirect}},
year = {2026},
howpublished = {\url{https://huggingface.co/simpledirect/Vinci-Piccolo-1.0}},
note = {Apache 2.0. Fine-tuned from Qwen/Qwen3.5-4B.},
}
Links
- Chat app: https://vinci.getsimpledirect.com/
- GGUF builds: simpledirect/Vinci-Piccolo-1.0-GGUF
- GitHub: https://github.com/getsimpledirect
- The Constitution: https://guide.getsimpledirect.com/constitution
Building in the open
Vinci 1.0 is the worst it will ever be — we're iterating fast, and feedback shapes the next version. Come tell us what works and what breaks. We want the harsh feedback; try to break it.
About
Vinci is a family of open-weight models from SimpleDirect, built on the conviction that character — not raw capability — is what's becoming scarce.
Vinci Piccolo is the first and smallest. More models, sharing the same Constitution and character, are on the way.

