# WebICoder v3 – HTML Code Generation (MLX 8-bit)

WebICoder v3 is a fine-tuned version of Microsoft Phi-2 (2.7B parameters) specialized in generating complete, production-ready HTML/CSS websites from natural language descriptions. Optimized for Apple Silicon via MLX.

## Model Details
| Property | Value |
|---|---|
| Base Model | Microsoft Phi-2 (2.7B parameters) |
| Architecture | PhiForCausalLM (32 layers, 2560 hidden) |
| Format | MLX (Apple Silicon optimized) |
| Quantization | 8-bit (8.503 bits/weight, affine) |
| Size | ~2.9 GB |
| Context Length | 4096 tokens |
| Task | HTML/CSS Code Generation |
| Speed | ~12-20 tok/s on M-series Mac |
## Also Available

| Variant | Link | Size |
|---|---|---|
| 8-bit (higher quality) | YOUR_USERNAME/WebICoder-v3-MLX-8bit | ~2.9 GB |
## ⚠️ MANDATORY – Read Before Using

If you skip these steps, the model will produce broken, repeated, or low-quality output. Follow ALL 5 rules below to get the best results.

### Rule 1 – Use the correct prompt format
The model was trained with an Alpaca-style format. You MUST wrap your prompt like this:

```
### Instruction:
{your website description here}
### Response:
```

❌ DO NOT send raw text like "Create a website" – the model won't understand it correctly.

✅ DO use the format above, or use `tokenizer.apply_chat_template()`, which does it automatically.
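The wrapping is plain string formatting; a minimal sketch (the helper name `format_prompt` is illustrative, not part of the model's API):

```python
def format_prompt(description: str) -> str:
    """Wrap a raw description in the Alpaca-style format the model expects."""
    return f"### Instruction:\n{description}\n### Response:\n"

prompt = format_prompt("Create a landing page for a bakery")
```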
### Rule 2 – ALWAYS stop at `</html>`

The model does not always emit an EOS token after finishing the HTML. You MUST check for `</html>` in the output and stop generation when you see it.

```python
# ✅ Correct: stop at </html>
full_text = ""
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=4096, sampler=sampler):
    full_text += response.text
    if "</html>" in full_text:
        break
```

❌ Without this, the model will repeat the entire page in a loop.
### Rule 3 – Use a repetition penalty

A repetition penalty is essential to prevent the model from generating duplicate sections (e.g., the same footer twice, identical testimonials).

```python
from mlx_lm.sample_utils import make_logits_processors

logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)
```

Then pass `logits_processors=logits_processors` to `stream_generate()`.
### Rule 4 – Use a low temperature (0.3–0.5)

High temperature (> 0.7) produces incoherent, broken HTML. Always use 0.3–0.5.

```python
from mlx_lm.sample_utils import make_sampler

sampler = make_sampler(temp=0.4)  # ✅ recommended
```
### Rule 5 – Post-process the output

The model may occasionally prepend training artifacts (system prompt) before the HTML. Always clean the output:

```python
import re

def clean_html(text: str) -> str:
    """Extract clean HTML from model output."""
    # Remove leaked system prompts
    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n*", "", text, flags=re.DOTALL)
    text = re.sub(r"### Instruction:.*", "", text, flags=re.DOTALL)
    text = re.sub(r"### Response:\s*", "", text, flags=re.DOTALL)
    # Extract the full HTML document
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    # Fallback: take everything from the first HTML tag onward
    start = re.search(r"<(?:!DOCTYPE|html|head|body)", text, re.IGNORECASE)
    if start:
        html = text[start.start():].strip()
        if not html.lower().startswith("<!doctype"):
            html = "<!DOCTYPE html>\n<html>\n" + html + "\n</html>"
        return html
    return text.strip()
```
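To see what the extraction step does, here is its core regex applied to a synthetic leaked output (the sample string is made up for illustration):

```python
import re

# Synthetic model output: a response marker before the document, junk after it
raw = "### Response:\n<!DOCTYPE html>\n<html><body><h1>Hi</h1></body></html>\n### Instruction: repeat"

# Same pattern as in clean_html(): grab everything from the doctype/html tag
# up to the first closing </html>, case-insensitively
match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", raw, re.IGNORECASE)
html = match.group(1)
```

The non-greedy `*?` stops at the first `</html>`, so any repeated content after the document is dropped.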
## Quick Start – Complete Working Example

Copy-paste this and it will work:
```python
from mlx_lm import load, stream_generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors
import re

# 1. Load model
model, tokenizer = load("YOUR_USERNAME/WebICoder-v3-MLX-8bit")

# 2. Format prompt (MANDATORY)
user_prompt = "Create a modern portfolio website with a hero, project cards, and a contact form"
prompt = f"""### Instruction:
{user_prompt}
### Response:
"""

# 3. Configure sampler + repetition penalty (MANDATORY)
sampler = make_sampler(temp=0.4)
logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)

# 4. Generate, stopping at </html> (MANDATORY)
full_text = ""
for response in stream_generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=4096,
    sampler=sampler,
    logits_processors=logits_processors,
):
    full_text += response.text
    print(response.text, end="", flush=True)
    if "</html>" in full_text or response.finish_reason:
        break

# 5. Clean output (MANDATORY)
def clean_html(text):
    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n*", "", text, flags=re.DOTALL)
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    return match.group(1).strip() if match else text.strip()

html = clean_html(full_text)

# Save to file
with open("output.html", "w") as f:
    f.write(html)

print(f"\n\nSaved to output.html ({len(html)} chars)")
```
## Recommended Parameters Summary

| Parameter | Value | Mandatory? |
|---|---|---|
| Prompt format | `### Instruction:` / `### Response:` | ✅ YES |
| Temperature | 0.3–0.5 | ✅ YES |
| Repetition penalty | 1.2 | ✅ YES |
| Repetition context | 256 | ✅ YES |
| Max tokens | 4096 | ✅ YES |
| Stop at `</html>` | Check output and break | ✅ YES |
| Post-processing | `clean_html()` function | ✅ YES |
| Top-p | 0.9 | Recommended |
| Top-k | 50 | Optional |
## Using the Chat Template

The tokenizer includes a built-in chat template that handles prompt formatting automatically:

```python
messages = [
    {"role": "user", "content": "Create a dark-themed portfolio website with project cards"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# This automatically wraps it in the ### Instruction: / ### Response: format
```
## Using the Example Script

```shell
# Single prompt
python example.py "Create a landing page for a coffee shop"

# Interactive mode
python example.py --interactive
```
## Example Outputs
| Prompt | What You Get |
|---|---|
| "Create a portfolio with a hero and project cards" | Nav, animated hero, glassmorphism cards, contact form, footer |
| "Create a landing page for a fitness app" | Hero gradient, feature cards, testimonials, CTA, footer |
| "Create a pricing page with 3 tiers" | Toggle monthly/yearly, feature lists, highlighted plan |
| "Create a login page with split layout" | Gradient left, form right, social login buttons |
## What the Model Generates

When properly configured, WebICoder v3 produces:

- ✅ Complete `<!DOCTYPE html>` with `<head>`, `<meta>`, `<title>`
- ✅ Vanilla CSS – custom properties, gradients, glassmorphism, `backdrop-filter`
- ✅ Responsive design – `@media` queries, `clamp()`, CSS Grid `auto-fit`
- ✅ Animations – fade-in with `IntersectionObserver`, hover transitions
- ✅ Modern design – gradient text, blur effects, rounded corners, shadows
- ✅ Complete pages – nav, hero, content sections, footer
## Limitations

- Optimized for single-page HTML with embedded CSS/JS
- Context window: 4096 tokens – very complex multi-section pages may still be truncated
- Based on Phi-2 (2.7B) – larger models will produce more sophisticated output
- English prompts work best
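Given the 4096-token context, it can be worth verifying the cleaned output before saving it; a minimal sketch (the helper name `looks_complete` is illustrative):

```python
def looks_complete(html: str) -> bool:
    """Heuristic: a generation truncated at the context limit usually lacks </html>."""
    return "</html>" in html.lower()

# Warn rather than silently save a truncated page
if not looks_complete("<!DOCTYPE html>\n<html><body>"):
    print("Warning: output may be truncated; try a shorter description.")
```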
## Training Details
| Property | Value |
|---|---|
| Base Model | microsoft/phi-2 |
| Fine-tuning | Full fine-tuning on HTML/CSS code pairs |
| Training Format | Alpaca-style (Instruction / Response) |
| Training Context | 4096 tokens |
| Precision | float16 |
| Quantization | Post-training 8-bit (MLX affine, group_size=64) |
## Files Included

| File | Description |
|---|---|
| `model.safetensors` | Quantized model weights |
| `config.json` | Model architecture configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `tokenizer_config.json` | Tokenizer settings with chat template |
| `generation_config.json` | Recommended generation parameters |
| `example.py` | Ready-to-use example script with all mandatory rules |
| `LICENSE` | MIT License |
## Citation

```bibtex
@misc{webicoder-v3,
  title={WebICoder v3: Fine-tuned Phi-2 for HTML Code Generation},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/YOUR_USERNAME/WebICoder-v3-MLX-8bit}
}
```
## License

MIT License – see LICENSE for details.