Vision critique trigger: Use exactly "Critique this UI design." when sending a screenshot. The model learned this specific phrase during training; other phrasings may not reliably activate the critique behavior.
The Problem
Base models are RLHF-tuned to be immediately helpful: they start building right away, however vague the request is. You can't fix this with a system prompt. It has to be trained into the weights.
1/10 → 10/10 on qualifying questions. All 10 tested vague prompts triggered clarifying questions from the fine-tuned model; only 1 of 10 did from the base model.
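One way to tally a result like this is a simple heuristic over model responses. The sketch below is an assumption for illustration, not the evaluation behind the numbers above: it counts a response as a qualifying question when it contains a question mark and no HTML markup. The function names and sample responses are hypothetical.

```python
import re

# Hypothetical heuristic, not the author's evaluation method:
# a reply "qualifies" if it asks something and does not already contain HTML.
HTML_TAG = re.compile(r"<\s*(html|div|section|style|button|form|nav)\b", re.IGNORECASE)


def asks_qualifying_question(response: str) -> bool:
    """Return True when the reply asks a question instead of building."""
    return "?" in response and HTML_TAG.search(response) is None


def qualifying_rate(responses: list[str]) -> str:
    """Format the tally as 'hits/total', the way the results above are reported."""
    hits = sum(asks_qualifying_question(r) for r in responses)
    return f"{hits}/{len(responses)}"


samples = [
    "Which brand colors should I use, and is this for mobile or desktop?",
    "<div class=\"card\">Pro plan</div>",
]
print(qualifying_rate(samples))  # prints 1/2
```

A keyword heuristic like this will misjudge edge cases (for example, HTML that happens to contain a question), so a manual read of the responses is still the safer check.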
Before / After
Left: base Qwen3-VL-8B ignores the brand name and defaults to blue. Right: fine-tuned model applies FitTrack branding and green accent across every interactive element.
Fine-tuned vs. base Qwen3-VL-8B on the same prompts:
| Prompt | Base Model | Fine-tuned |
|---|---|---|
| Pricing card: dark, purple, 3 tiers | Renders one Pro card | All 3 tiers with "Most Popular" badge |
| Navbar: dog daycare, warm colors | Generic SaaS links + rendering artifacts | Domain-appropriate labels ("Book a Spot") |
| Login form: fitness app, green accent | Blue buttons regardless | Green applied consistently across all states |
| Stats dashboard: revenue + users + churn | One standalone chart | Two linked KPI cards with sparkline |
| Mobile bottom nav: 5 tabs, orange active | Generates a social feed | All 5 labeled tabs, correct active state |
| Testimonial card: minimal, photo + stars | Adds unrequested carousel | Focused single card |
Training Pipeline
Teacher-student distillation:
- Qwen3.6-27B generates HTML components from natural language prompts
- Playwright renders each component to desktop (1280×900) and mobile (390×844) screenshots
- GPT-5.4 critiques each screenshot and rewrites the HTML with expert design improvements: hover states, WCAG contrast, color consistency, layout hierarchy
- Training pairs: [screenshot + original HTML + critique] → [expert improved HTML]
The gap between Qwen's output and GPT-5.4's rewrite is the training signal. 3,090 records across 8 types:
| Record type | Count | Description |
|---|---|---|
| screenshot_code_critique_to_improved | ~472 | PNG + HTML + critique → expert improved HTML (most valuable) |
| screenshot_to_critique | ~472 | Desktop screenshot → design critique with measurements |
| screenshot_to_code | ~472 | Desktop screenshot → HTML reconstruction |
| mobile_to_code | ~472 | Mobile screenshot → HTML |
| screenshot_html_to_critique | ~472 | Screenshot + HTML → detailed critique |
| prompt_to_html | ~472 | Natural language prompt → HTML component |
| qualifying_conversation | 150 | Vague request → questions → answers → build |
| immediate_conversation | 104 | Clear request → direct build |
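The rendering step of the pipeline can be sketched as follows. This is a minimal illustration using Playwright's Python API, assuming `playwright` and a Chromium build are installed; the original pipeline is described as Bun + TypeScript + Playwright, so this is not the author's code. The function name and output naming scheme are hypothetical; only the two viewport sizes come from the pipeline description.

```python
# Viewport sizes taken from the pipeline description above.
VIEWPORTS = {
    "desktop": {"width": 1280, "height": 900},
    "mobile": {"width": 390, "height": 844},
}


def render_screenshots(html: str, out_prefix: str) -> list[str]:
    """Render one HTML component at each viewport and return the PNG paths."""
    # Imported lazily so this module loads even without Playwright installed.
    from playwright.sync_api import sync_playwright

    paths = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        for name, viewport in VIEWPORTS.items():
            page = browser.new_page(viewport=viewport)
            page.set_content(html)  # load the component markup directly
            path = f"{out_prefix}-{name}.png"
            page.screenshot(path=path)
            page.close()
            paths.append(path)
        browser.close()
    return paths
```

Calling `render_screenshots("<button>Buy</button>", "component-001")` would write `component-001-desktop.png` and `component-001-mobile.png`, one screenshot per viewport.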
Validated Behaviors
| Test | Base 8B | Fine-tuned 8B | Fine-tuned 4B |
|---|---|---|---|
| Qualifying questions (10 vague) | 1/10 | 10/10 | 9/10 |
| Vision critique specificity | Vague | px + hex + WCAG | px + contrast |
| Token accuracy (training) | n/a | 98.1% | 92.5% |
| Clean HTML output | Verbose | 0 wrapper chars | 0 wrapper chars |
| Self-improvement loop | -0.50 (regresses) | -0.35 (slight regression) | not tested |
Head-to-Head Design Quality
Head-to-head test: base Qwen3-VL-8B vs fine-tuned, same 10 prompts, same hardware (RTX 3080 Ti 12GB), GPT-5.4 judge using the same critique rubric as training.
| Component | Category | Base | Fine-tuned | Delta |
|---|---|---|---|---|
| Login form (dark) | Form | 5 | 6.5 | +1.5 |
| Checkout form (light) | Form | 5 | 5 | 0 |
| Pricing card (dark) | Card | 5 | 6 | +1 |
| Product card (light) | Card | 5 | 5 | 0 |
| Top navbar (light) | Navbar | 4 | 4 | 0 |
| Sidebar nav (dark) | Navbar | 4 | 3 | -1 |
| Mobile bottom sheet (dark) | Mobile | 1 | 6 | +5 |
| Transaction list (light) | Mobile | 5 | 6.5 | +1.5 |
| CTA section (dark) | Marketing | 6 | 6.5 | +0.5 |
| Invoice table (light) | Data | 5 | 6.5 | +1.5 |
| Average | | 4.50 | 5.50 | +1.00 |
- Fine-tuned wins: 6/10 components
- Tied: 3/10
- Base wins: 1/10 (dark sidebar nav only)
- Biggest improvement: mobile dark bottom sheet +5 (base scored 1, fine-tuned scored 6)
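The summary figures can be rechecked directly from the table. A small sketch, with the scores copied by hand from the rows above (the variable names are illustrative):

```python
# (component, base score, fine-tuned score), copied from the table above.
SCORES = [
    ("Login form (dark)", 5, 6.5),
    ("Checkout form (light)", 5, 5),
    ("Pricing card (dark)", 5, 6),
    ("Product card (light)", 5, 5),
    ("Top navbar (light)", 4, 4),
    ("Sidebar nav (dark)", 4, 3),
    ("Mobile bottom sheet (dark)", 1, 6),
    ("Transaction list (light)", 5, 6.5),
    ("CTA section (dark)", 6, 6.5),
    ("Invoice table (light)", 5, 6.5),
]

base_avg = sum(b for _, b, _ in SCORES) / len(SCORES)
tuned_avg = sum(t for _, _, t in SCORES) / len(SCORES)
wins = sum(t > b for _, b, t in SCORES)
ties = sum(t == b for _, b, t in SCORES)
losses = sum(t < b for _, b, t in SCORES)

print(f"base={base_avg:.2f} tuned={tuned_avg:.2f} delta={tuned_avg - base_avg:+.2f}")
print(f"wins={wins} ties={ties} losses={losses}")
# base=4.50 tuned=5.50 delta=+1.00
# wins=6 ties=3 losses=1
```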
Note: Scores reflect first-pass generation without the improvement step. The model was trained on critique+improvement pairs, so ask it to critique and improve its own output for higher-quality results.
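A two-pass critique-then-improve loop might look like the sketch below. It only builds the OpenAI-style message lists and leaves the transport to a caller-supplied `chat` callable (hypothetical; any function that takes messages and returns the reply text). The critique prompt is the exact trained phrase; the wording of the improvement prompt is an assumption, chosen to mirror the [screenshot + original HTML + critique] training input.

```python
CRITIQUE_PROMPT = "Critique this UI design."  # exact phrase from the model card


def image_part(png_b64: str) -> dict:
    """OpenAI-style image content part carrying a base64 PNG data URL."""
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{png_b64}"}}


def critique_messages(png_b64: str) -> list[dict]:
    """First pass: screenshot plus the trained critique trigger."""
    content = [image_part(png_b64), {"type": "text", "text": CRITIQUE_PROMPT}]
    return [{"role": "user", "content": content}]


def improve_messages(png_b64: str, html: str, critique: str) -> list[dict]:
    """Second pass: screenshot + original HTML + critique, asking for improved HTML.

    The instruction wording below is an assumption, not a documented prompt.
    """
    text = (
        f"Original HTML:\n{html}\n\n"
        f"Critique:\n{critique}\n\n"
        "Rewrite the HTML to address the critique."
    )
    return [{"role": "user", "content": [image_part(png_b64), {"type": "text", "text": text}]}]


def critique_and_improve(png_b64: str, html: str, chat) -> str:
    """Run both passes; `chat` is any callable mapping messages to reply text."""
    critique = chat(critique_messages(png_b64))
    return chat(improve_messages(png_b64, html, critique))
```

Keeping `chat` injectable makes the loop easy to point at llama-server, or at a stub while testing the prompt structure.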
Thinking mode: Always disable thinking mode in your inference server. Add "chat_template_kwargs": {"enable_thinking": false} to API requests, or use the --no-think flag with llama-server.
Quick Start
Text-only (Ollama)
ollama pull stefans71/frontend-design-expert-8b
ollama run stefans71/frontend-design-expert-8b \
"make me a pricing card for my SaaS called TaskFlow, dark theme, purple accent"
Vision + Text (llama-server)
Ollama does not currently support separate mmproj files for vision. Use llama-server:
llama-server \
-m frontend-design-expert-Q4_K_M.gguf \
--mmproj mmproj-F16.gguf \
-c 8192 \
--host 0.0.0.0 \
--port 8080
Send requests via the OpenAI-compatible API:
import base64
import requests

# Read the screenshot and base64-encode it for a data URL.
with open("screenshot.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

response = requests.post("http://localhost:8080/v1/chat/completions", json={
    "model": "frontend-design-expert",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img}"}},
            # Exact trigger phrase the model was trained on.
            {"type": "text", "text": "Critique this UI design."}
        ]
    }],
    "max_tokens": 1024
})
response.raise_for_status()  # fail early on HTTP errors
print(response.json()["choices"][0]["message"]["content"])
Inference tips
- Vision critique trigger: Use exactly "Critique this UI design."; other phrasings may trigger thinking-mode EOS
- Disable thinking mode: Add "chat_template_kwargs": {"enable_thinking": false} to API requests
- Screenshot resolution: Max 1024×1024 to avoid VRAM OOM on 12GB GPUs
- Context window: 8192 tokens; increase to 32768 for full-page builds
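To stay under the screenshot resolution limit, an image can be downscaled before it is encoded. The sketch below is one possible approach, assuming Pillow is installed for the file-handling part; the helper names are hypothetical, and the 1024 default comes from the tip above.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so neither side exceeds max_side, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    return max(1, width * max_side // longest), max(1, height * max_side // longest)


def downscale_png(src: str, dst: str, max_side: int = 1024) -> None:
    """Resize an image file with Pillow (assumed installed) before sending it."""
    from PIL import Image  # lazy import: only needed when actually resizing

    with Image.open(src) as img:
        img.resize(fit_within(img.width, img.height, max_side)).save(dst)


print(fit_within(1280, 900))  # (1024, 720)
print(fit_within(390, 844))   # (390, 844)
```

With these defaults, a 1280×900 desktop screenshot shrinks to 1024×720, while a 390×844 mobile screenshot is left untouched.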
Files
| File | Size | Use |
|---|---|---|
| frontend-design-expert-Q4_K_M.gguf | 4.7 GB | Primary: 12GB GPU (RTX 3060, RTX 4070, etc.) |
| frontend-design-expert-Q3_K_M.gguf | 3.9 GB | Tight 12GB: more KV cache headroom |
| mmproj-F16.gguf | 1.1 GB | Vision encoder: required for screenshot input |
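The files can also be fetched programmatically. A minimal sketch, assuming the `huggingface_hub` package is installed and using its `hf_hub_download` function; the constant and helper names are illustrative, and the file names are taken from the table above.

```python
REPO_ID = "stefans71/frontend-design-expert-8b"
# File names from the table above; the mmproj file is only needed for vision.
TEXT_FILES = ["frontend-design-expert-Q4_K_M.gguf"]
VISION_FILES = TEXT_FILES + ["mmproj-F16.gguf"]


def download(files: list[str]) -> list[str]:
    """Download the given files from the Hub and return their local cache paths."""
    from huggingface_hub import hf_hub_download  # lazy import

    return [hf_hub_download(repo_id=REPO_ID, filename=name) for name in files]
```

For vision use, `model_path, mmproj_path = download(VISION_FILES)` returns the two paths to pass to llama-server as `-m` and `--mmproj`.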
Training Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| Method | QLoRA (NF4 4-bit + BF16 LoRA adapters, rank 32) |
| Dataset | 3,090 records (stefans71/frontend-design-dataset) |
| Hardware | NVIDIA RTX 5090 (32GB, Blackwell) |
| Training time | 2h 39m |
| Final loss | 0.246 |
| Token accuracy | 98.1% |
| Framework | SWIFT 4.2.1 (Alibaba) |
| Vision encoder | Frozen (--freeze_vit True) |
Limitations
- Vision critique requires the exact phrase "Critique this UI design."; other phrasings may not reliably activate the behavior
- Ollama does not currently support separate mmproj files; use llama-server for vision tasks
- Generated HTML uses inline CSS only (no Tailwind CDN); this is intentional, for offline compatibility
- Complex HTML outputs may be truncated at 4096 tokens; increase max_tokens for full-page builds
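Truncated output can usually be detected from the response itself. The sketch below assumes the server follows the OpenAI chat-completions convention of reporting `finish_reason: "length"` when generation stops at the token limit; the helper names and the retry ceiling are illustrative choices, not documented behavior of this model.

```python
def was_truncated(response_json: dict) -> bool:
    """True when the first choice stopped because it hit the token limit."""
    return response_json["choices"][0].get("finish_reason") == "length"


def next_max_tokens(current: int, ceiling: int = 32768) -> int:
    """Double the budget for a retry, capped at a ceiling (the cap is an assumption)."""
    return min(current * 2, ceiling)


# Hand-written example response, shaped like an OpenAI-compatible reply.
reply = {"choices": [{"finish_reason": "length", "message": {"content": "<div>"}}]}
if was_truncated(reply):
    print("truncated; retry with max_tokens =", next_max_tokens(4096))
# prints: truncated; retry with max_tokens = 8192
```

Any retry budget also has to fit inside the server's context size (the `-c` value passed to llama-server).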
Related
- stefans71/frontend-design-lite-4b: 4B version for 8GB GPUs
- stefans71/frontend-design-dataset: training pipeline (Bun + TypeScript + Playwright)
- Base model: Qwen/Qwen3-VL-8B-Instruct
@misc{stefan2026frontenddesign,
title={Frontend Design Expert: Fine-tuning Qwen3-VL-8B for UI Generation via Teacher-Student Distillation},
author={Stefan, Scott},
year={2026},
url={https://huggingface.co/stefans71/frontend-design-expert-8b}
}