Blueprint Base: Qwen2.5-3B
Blueprint turns a plain-English hardware idea into an organized project plan.
Tell it what you want to build (for example, "a compact desk clock with an e-ink display and a remote") and it gives back a structured blueprint: the parts list, how the parts connect, step-by-step build instructions, rough costs, and a quick design check. Everything comes out as structured data (JSON) that an app can read and build on.
This is the all-in-one model: it runs on its own, no add-ons needed. (There's also a small adapter-only version at blueprint-base-lora.)
Early research preview. Great for drafting and exploring ideas, but not a replacement for real engineering, CAD software, or safety review.
What it can do
Give it a hardware idea and it can produce any of:
- a parts list (components)
- a wiring/connection map between the parts
- 🛠️ ordered build steps
- 💲 rough sourcing and cost info
- ✅ a basic design check
- 📦 or the whole project plan at once
You can ask for the complete plan, or just one piece (like only the parts list).
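As a rough sketch of how an app might switch between the full plan and a single piece, the helper below builds the chat messages for a chosen mode. Only the "full" wording is adapted from the system prompt in the Try it example; the `parts` and `wiring` instructions, the `MODE_INSTRUCTIONS` table, and `build_messages` itself are hypothetical names and phrasings, not the prompts the model was trained on.

```python
# Hypothetical per-mode instructions. Only "full" mirrors this card's own
# example prompt; the other wordings are illustrative guesses.
MODE_INSTRUCTIONS = {
    "full": "reply with a single JSON object describing the full project",
    "parts": "reply with a single JSON object containing only the parts list",
    "wiring": "reply with a single JSON object containing only the wiring map",
}


def build_messages(request: str, mode: str = "full") -> list[dict]:
    """Return chat messages asking for the whole plan or one piece of it."""
    if mode not in MODE_INSTRUCTIONS:
        raise ValueError(f"unknown mode: {mode!r}")
    system = (
        "You design hobbyist electronics projects. Given a request, "
        f"{MODE_INSTRUCTIONS[mode]}. Output only the JSON."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": request},
    ]


msgs = build_messages("A compact desk clock with an e-ink display.", mode="parts")
print(msgs[0]["content"])
```

The returned list can be passed to the tokenizer's chat template in place of a hand-written `msgs` list.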
What it's good for, and not
✅ Good for: brainstorming hardware projects, drafting parts lists and build steps, and turning a rough idea into an organized starting plan.
🚫 Not for: final engineering decisions, real CAD models, electrical safety, or anything safety-critical. Treat the output as a helpful first draft to review, not a finished design.
Try it
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "caid-technologies/blueprint-base"

# Load the merged 16-bit model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained(REPO, device_map="auto", torch_dtype="bfloat16")
tok = AutoTokenizer.from_pretrained(REPO)

msgs = [
    {"role": "system", "content":
        "You design hobbyist electronics projects. Given a request, reply with a single "
        "JSON object describing the full project. Output only the JSON."},
    {"role": "user", "content": "A compact desk clock with an e-ink display and an IR remote."},
]

# return_dict=True also returns the attention mask, which is passed on to generate().
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True).to(model.device)

# Greedy decoding (do_sample=False) follows the inference settings listed under Technical details.
out = model.generate(**inputs, max_new_tokens=6144, do_sample=False, repetition_penalty=1.1,
                     pad_token_id=tok.eos_token_id)

# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
💡 Tip: keep max_new_tokens high (≥ 6000) so long plans aren't cut off, and keep
repetition_penalty=1.1 so wiring lists don't get stuck repeating. For Ollama/local apps,
convert this model to GGUF with llama.cpp.
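When a plan is cut off at the token limit, the JSON is left unfinished and will not parse. The sketch below is one way an app might detect that, using only the standard library. `extract_json` is a hypothetical helper, and the keys in the sample strings are made up for illustration; the model's real output schema is not documented here.

```python
import json


def extract_json(text: str):
    """Return the first JSON object found in `text`, or None if it does not parse."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        # raw_decode parses one JSON value at `start` and ignores anything after it.
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None  # commonly a sign that generation stopped mid-object
    return obj if isinstance(obj, dict) else None


# Illustrative strings only; the field names are assumptions.
complete = 'Sure: {"name": "desk clock", "components": []}'
truncated = '{"name": "desk clock", "components": [{"id": "mcu"'

print(extract_json(complete))   # {'name': 'desk clock', 'components': []}
print(extract_json(truncated))  # None
```

A `None` result is a reasonable trigger for retrying with a larger max_new_tokens.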
What it learned from
It was trained on about 130 hobbyist hardware projects (things like weather stations, small robots, drones, smart-home gadgets, lab tools, and audio gear), expanded into a few thousand practice examples. Everything is small, maker-style electronics-plus-hardware.
Most common project types in the training data:
| Project type | Share | Examples |
|---|---|---|
| Test & lab instruments | ~20% | function generator, Geiger counter |
| Smart-home / IoT gadgets | ~15% | pet feeder, smart mailbox, pill dispenser |
| Radio, comms & networking | ~9% | LoRa base station, APRS tracker, NAS |
| Wearables & health | ~8% | sleep ring, heart-rate strap |
| Audio & music | ~8% | synth module, guitar pedal, speaker |
| Robotics & motion | ~7% | quadruped robot, robotic arm |
| Environmental sensing | ~7% | air-quality monitor, weather station |
| Clocks & e-ink displays | ~6% | word clock, e-ink calendar |
| Maker / fabrication tools | ~5% | vinyl cutter, pen plotter |
| Drones & aerial | ~5% | FPV drone, VTOL aircraft |
| Everything else | ~10% | lighting, games, automotive, power |
Good to know (limitations)
- It's a small model, so complex, many-part projects are harder for it.
- It proposes designs; it doesn't verify them. Always sanity-check before building.
- It's strongest on common project types (lab tools, smart-home) and weaker on rarer ones (games, automotive).
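One cheap sanity check, sketched below, is to confirm that every wiring connection refers to a part that is actually in the parts list. The field names (`components`, `connections`, `id`, `from`, `to`) are assumptions made for this example, not the model's documented schema, so adjust them to match real output. This only catches bookkeeping slips; it does not verify that the circuit is electrically sound.

```python
def dangling_connections(plan: dict) -> list[tuple[str, str]]:
    """Return (from, to) pairs whose endpoints are missing from the parts list."""
    # Field names here are hypothetical; map them onto the real output schema.
    part_ids = {part["id"] for part in plan.get("components", [])}
    return [
        (conn["from"], conn["to"])
        for conn in plan.get("connections", [])
        if conn["from"] not in part_ids or conn["to"] not in part_ids
    ]


plan = {
    "components": [{"id": "mcu"}, {"id": "eink"}],
    "connections": [
        {"from": "mcu", "to": "eink"},
        {"from": "mcu", "to": "ir_receiver"},  # not in the parts list
    ],
}
print(dangling_connections(plan))  # [('mcu', 'ir_receiver')]
```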
How well it works
We tested it on projects it had never seen during training. Here's how often it produced a valid, well-structured result for each task:
| Task | Valid result |
|---|---|
| 🛠️ Build steps | ~100% |
| ✅ Design check | ~100% |
| Parts list | ~95% |
| 📦 Full project plan | ~85–97% |
| Wiring map | ~67% |
It's strongest at build steps, design checks, and parts lists. Full end-to-end plans are close
behind, and wiring maps are the hardest (and most sensitive to the repetition_penalty tip
above). Figures are from held-out testing and are being finalized for the current version.
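For readers who want to run a similar check on their own prompts, here is a minimal sketch of a "valid result" rate. It assumes "valid" means the output parses as a JSON object and contains a set of required top-level keys. That definition, the key names, and the toy outputs are all assumptions; the card does not spell out the exact criteria used for the figures above.

```python
import json


def is_valid(output: str, required_keys: set[str]) -> bool:
    """True if `output` is a JSON object containing every required key."""
    try:
        obj = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys.issubset(obj.keys())


def valid_rate(outputs: list[str], required_keys: set[str]) -> float:
    """Fraction of outputs that pass is_valid (0.0 for an empty list)."""
    if not outputs:
        return 0.0
    return sum(is_valid(o, required_keys) for o in outputs) / len(outputs)


# Toy outputs for illustration only, not real model results.
outputs = [
    '{"connections": []}',
    '{"connections": [{"from": "mcu", "to": "eink"}]}',
    '{"connections": [',  # truncated, so it fails to parse
]
print(f"{valid_rate(outputs, {'connections'}):.0%}")  # 67%
```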
Technical details (for ML folks)
- Base model: Qwen/Qwen2.5-3B-Instruct; this repo is the fine-tune merged to 16-bit (standalone, no adapter needed).
- Method: QLoRA with Unsloth (LoRA r=32, alpha=32, all attention+MLP projections), then merged.
- Training: 1 epoch, max_seq_len 6144, effective batch 8, lr 2e-4 (linear, 3% warmup), adamw_8bit, NEFTune α=5, loss masked to assistant turns, early stopping on eval loss
- Hardware: single RTX 4070 (12 GB)
- Data: synthetic dataset projected into 6 task "modes" (full plan, parts, wiring, instructions, validation); split grouped by project so none leak between train/test. ~3,242 rows; modes rebalanced (cap 350/mode) so the model doesn't coast on the easy ones.
- Inference: do_sample=False, repetition_penalty≈1.1, max_new_tokens≥6000; pass the attention mask.
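The data bullet describes two steps that are easy to get wrong: capping rows per mode and splitting by project so no project appears in both train and test. The sketch below shows one way to do each in plain Python. The row fields (`project`, `mode`), the `test_fraction` default, and both function names are assumptions for illustration, not the actual preparation script.

```python
import random
from collections import defaultdict


def cap_per_mode(rows, cap, seed=0):
    """Keep at most `cap` randomly chosen rows for each mode."""
    rng = random.Random(seed)
    by_mode = defaultdict(list)
    for row in rows:
        by_mode[row["mode"]].append(row)
    kept = []
    for mode_rows in by_mode.values():
        rng.shuffle(mode_rows)
        kept.extend(mode_rows[:cap])
    return kept


def split_by_project(rows, test_fraction=0.1, seed=0):
    """Assign whole projects to train or test so none spans both splits."""
    rng = random.Random(seed)
    projects = sorted({row["project"] for row in rows})
    rng.shuffle(projects)
    n_test = max(1, round(len(projects) * test_fraction))
    test_projects = set(projects[:n_test])
    train = [r for r in rows if r["project"] not in test_projects]
    test = [r for r in rows if r["project"] in test_projects]
    return train, test


# Toy rows: 10 projects, each projected into two modes.
rows = [{"project": f"p{i}", "mode": m} for i in range(10) for m in ("parts", "wiring")]
train, test = split_by_project(rows, test_fraction=0.2)
print(len(train), len(test))  # 16 4
```

Splitting on the project rather than the row matters because each project is expanded into several rows; a row-level split would let the test set contain projects the model has already seen in another mode.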
@misc{blueprint_base,
title = {Blueprint Base: Qwen2.5-3B for structured hardware project generation},
author = {Caid Technologies},
year = {2026},
howpublished = {\url{https://huggingface.co/caid-technologies}}
}
Built with Unsloth and 🤗 Transformers / PEFT / TRL.