🧠 Qwen3.5-4B Prompter — GGUF
A multilingual prompt engineer model fine-tuned on Yusiko/prompter — a 5,000-sample dataset covering 10 languages and 7 domains.
Given any short, vague user input, this model expands it into a fully structured, production-ready prompt with role assignment, context, step-by-step instructions, output format, and quality standards — following Google's Prompt Engineering Whitepaper best practices.
🚀 Trained 2x faster with Unsloth · Exported to GGUF · Ready for Ollama & llama.cpp
📦 Available Files
| File | Quantization | Size | Use case |
|---|---|---|---|
| Qwen3.5-4B.Q4_0.gguf | Q4_0 | ~2.54 GB | 💡 Recommended: fast, efficient |
| Qwen3.5-4B.BF16-mmproj.gguf | BF16 | larger | 🔬 Higher precision |
🚀 Quick Start
Ollama
ollama run hf.co/Yusiko/qwen3.5-prompter
llama.cpp
# Text-only
llama-cli -hf Yusiko/qwen3.5-prompter --jinja
# Multimodal
llama-mtmd-cli -hf Yusiko/qwen3.5-prompter --jinja
Python (llama-cpp-python)
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Yusiko/qwen3.5-prompter",
    filename="Qwen3.5-4B.Q4_0.gguf",
    n_ctx=2048,
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": (
                "Below is an instruction that describes a task, paired with an input "
                "that provides further context. Write a response that appropriately "
                "completes the request.\n\n"
                "### Instruction:\n"
                "As a prompt engineer, transform this simple input into a fully detailed, professional prompt\n\n"
                "### Input:\n"
                "Write a Python function\n\n"
                "### Response:"
            )
        }
    ],
    max_tokens=512,
    temperature=0.7,
)

print(response["choices"][0]["message"]["content"])
💡 Prompt Format (Alpaca)
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
### Instruction:
As a prompt engineer, transform this simple input into a fully detailed, professional prompt
### Input:
{your simple prompt here}
### Response:
⚠️ Always use string concatenation, not `.format()`, when building prompts programmatically. The model's outputs contain {curly braces}, which can raise `KeyError` in Python's string formatter whenever such text ends up inside the template being formatted.
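A minimal sketch of building the Alpaca-style prompt with plain concatenation. The helper name `build_prompt` and the two constants are hypothetical; the template text is taken from the format above, and the blank lines between blocks follow the Python quick start example.

```python
# Hypothetical helper: assembles the prompt by concatenation only,
# so braces in the user input are never interpreted as placeholders.
PREAMBLE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
)
INSTRUCTION = (
    "As a prompt engineer, transform this simple input into a fully "
    "detailed, professional prompt"
)


def build_prompt(user_input: str) -> str:
    """Return the full prompt; braces in user_input are left untouched."""
    return (
        PREAMBLE
        + "### Instruction:\n" + INSTRUCTION + "\n\n"
        + "### Input:\n" + user_input + "\n\n"
        + "### Response:"
    )


prompt = build_prompt("Write a function that returns {name: value} pairs")
print(prompt)
```

The returned string can be passed as the `content` of the user message in `create_chat_completion`.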
🎯 What This Model Does
Input: a short, vague prompt
Write a Python function
Output: a complete, structured, professional prompt
## System Prompt
You are a senior software engineer with 10+ years of Python experience.
Your task is to write a Python function with production-quality standards.
## Role & Context
Act as a senior engineer conducting a thorough implementation session...
## Step-by-Step Instructions
1. Clarify requirements, edge cases, and constraints before writing any code
2. Design the interface and data structures first, then implement logic
3. Write the implementation with comprehensive inline documentation
4. Add input validation and robust error handling for all edge cases
5. Write unit tests covering happy path, edge cases, and error scenarios
## Output Requirements
- Implementation: Complete, working code with no placeholders
- Documentation: Inline comments explaining non-obvious logic
- Tests: At minimum 3 test cases (happy path, edge case, error case)
...
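The sample output is organised under `## ` headings, so a caller can split a generated prompt into sections for inspection or validation. A minimal sketch, assuming the model keeps that heading convention (it may not on every generation); `split_sections` is a hypothetical helper, not part of any library.

```python
def split_sections(text: str) -> dict[str, str]:
    """Map each '## ' heading to the text that follows it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            # A new heading starts a new section.
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            # Lines before the first heading are ignored.
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


sample = (
    "## System Prompt\n"
    "You are a senior software engineer with 10+ years of Python experience.\n"
    "## Output Requirements\n"
    "- Implementation: Complete, working code with no placeholders\n"
)
parsed = split_sections(sample)
print(list(parsed))
```

A check such as `"Output Requirements" in parsed` is then enough to flag generations that are missing an expected section.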
📊 Training Details
| Field | Value |
|---|---|
| 🤖 Base model | Qwen/Qwen3.5-4B |
| 🗂️ Dataset | Yusiko/prompter |
| 📦 Dataset size | 5,000 samples |
| 🌍 Languages | 10 (az, en, tr, ru, de, fr, zh, ar, es, ja) |
| 🎯 Method | QLoRA (rank=16, alpha=16) |
| ⚙️ Framework | Unsloth + TRL SFTTrainer |
| 💻 Hardware | NVIDIA RTX 5070 12 GB |
| 🧮 Optimizer | AdamW (PyTorch) |
| 📐 Seq length | 1024 tokens |
| 🔢 Batch size | 1 × 8 grad accum = 8 effective |
| 📉 LR scheduler | Cosine |
| 🔁 Training steps | 500 |
| 🏷️ Export format | GGUF Q4_0 |
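The numbers in the table imply roughly how much data the run saw. A back-of-the-envelope sketch, assuming each optimizer step consumes one full effective batch and that samples are neither packed nor held out for validation (the card states neither):

```python
# Values taken from the training table above.
per_device_batch = 1
grad_accum_steps = 8
training_steps = 500
dataset_size = 5_000

effective_batch = per_device_batch * grad_accum_steps  # samples per optimizer step
samples_seen = effective_batch * training_steps        # total samples processed
epochs = samples_seen / dataset_size                   # fraction of one full pass

print(f"effective batch: {effective_batch}")
print(f"samples seen:    {samples_seen}")
print(f"epochs:          {epochs:.2f}")
```

Under these assumptions the run covers about 0.8 of one pass over the dataset.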
🏗️ Dataset Overview
The Yusiko/prompter dataset contains 4 output types, each following Google's Prompt Engineering Whitepaper:
| Type | Count | Description |
|---|---|---|
| 🔷 Standard | ~3,280 | Role + system + contextual prompting |
| 🔶 Few-shot | ~1,000 | 2 examples shown before the main task |
| 🔹 Chain-of-Thought | ~460 | Step-by-step reasoning structure |
| 🔸 Step-back | ~260 | General principles → specific implementation |
Domains covered: Coding · Writing · Analysis · ML/AI · DevOps · Data Engineering · Business Strategy
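The approximate per-type counts in the table add up to the stated 5,000 samples. A short sketch that turns them into shares of the dataset; the counts are the rounded figures from the table, so the percentages are approximate as well.

```python
# Approximate counts from the dataset table above.
counts = {
    "Standard": 3280,
    "Few-shot": 1000,
    "Chain-of-Thought": 460,
    "Step-back": 260,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(f"total: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```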
⚙️ Hardware Requirements
| Setup | VRAM / RAM | Speed |
|---|---|---|
| GPU (Q4_0) | 4–6 GB VRAM | Fast |
| CPU only (Q4_0) | ~6 GB RAM | Moderate |
| Apple Silicon (Q4_0) | ~6 GB unified RAM | Fast via Metal |
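For a rough sense of where these memory figures come from: Q4_0 in GGUF stores weights in blocks of 32, each holding 32 four-bit values plus one 16-bit scale, which is 18 bytes per block or 4.5 bits per weight. The sketch below applies that to a nominal 4 billion parameters, which is an assumption (this card does not give the exact parameter count). Real files come out larger because some tensors are kept at higher precision, and runtime memory adds the KV cache and work buffers on top.

```python
BLOCK_WEIGHTS = 32    # weights per Q4_0 block
BLOCK_BYTES = 16 + 2  # 32 four-bit values (16 bytes) + one fp16 scale (2 bytes)

bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS

nominal_params = 4e9  # assumption: "4B" taken literally
weights_gb = nominal_params * bits_per_weight / 8 / 1e9

print(f"{bits_per_weight} bits per weight")
print(f"~{weights_gb:.2f} GB for the quantized weights alone")
```

The estimate lands somewhat below the ~2.54 GB file size listed above, which is consistent with the caveats in the lead-in.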
📜 Citation
@model{yusiko_qwen35_prompter_2025,
  author    = {Yusif},
  title     = {Qwen3.5-4B Prompter: Multilingual Prompt Engineering Model},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Yusiko/qwen3.5-prompter},
  dataset   = {https://huggingface.co/datasets/Yusiko/prompter}
}
🙏 Acknowledgements
- Unsloth — 2x faster fine-tuning, GGUF export
- Google Prompt Engineering Whitepaper — Lee Boonstra et al., Feb 2025
- TRL — SFTTrainer + SFTConfig
- Qwen Team — Qwen3.5 base model
Built with ❤️ by Yusif · Apache 2.0 License