⚡ WebICoder v3 — HTML Code Generation (MLX 8-bit)

WebICoder v3 is a fine-tuned version of Microsoft Phi-2 (2.7B parameters) specialized in generating complete, production-ready HTML/CSS websites from natural language descriptions.

Optimized for Apple Silicon via MLX.

Model Details

| Property | Value |
| --- | --- |
| Base Model | Microsoft Phi-2 (2.7B parameters) |
| Architecture | PhiForCausalLM (32 layers, 2560 hidden size) |
| Format | MLX (Apple Silicon optimized) |
| Quantization | 8-bit (8.503 bits/weight, affine) |
| Size | ~2.9 GB |
| Context Length | 4096 tokens |
| Task | HTML/CSS code generation |
| Speed | ~12-20 tok/s on M-series Macs |

Also Available

| Variant | Link | Size |
| --- | --- | --- |
| 8-bit (higher quality) | YOUR_USERNAME/WebICoder-v3-MLX-8bit | ~2.9 GB |

⚠️ MANDATORY — Read Before Using

If you skip these steps, the model will produce broken, repeated, or low-quality output. Follow ALL 5 rules below to get the best results.

Rule 1 — Use the correct prompt format

The model was trained with an Alpaca-style format. You MUST wrap your prompt like this:

```text
### Instruction:
{your website description here}

### Response:
```

❌ DO NOT send raw text like "Create a website" — the model won't understand it correctly.

✅ DO use the format above, or use tokenizer.apply_chat_template(), which applies it automatically.

Rule 2 — ALWAYS stop at </html>

The model does not always emit an EOS token after finishing the HTML. You MUST check for </html> in the accumulated output and stop generation when you see it:

```python
# ✅ Correct — stop at </html>
full_text = ""
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=4096, sampler=sampler):
    full_text += response.text
    if "</html>" in full_text:
        break
```

❌ Without this, the model will repeat the entire page in a loop.
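One subtlety with streaming: the closing </html> can arrive split across two chunks, and stray tokens can follow it inside the same chunk. Checking the accumulated text and truncating at the stop string handles both cases. A minimal stand-alone sketch (the helper name take_until is made up for illustration; it works on any iterable of text chunks):

```python
def take_until(chunks, stop="</html>"):
    """Accumulate streamed chunks and cut everything after the stop string.

    The stop string may arrive split across two chunks, so we test the
    accumulated text rather than each chunk in isolation.
    """
    text = ""
    for chunk in chunks:
        text += chunk
        idx = text.find(stop)
        if idx != -1:
            # Truncate any tokens that followed </html> in the same chunk.
            return text[: idx + len(stop)]
    return text

# The stop marker is split across chunks here, as can happen with streaming:
print(take_until(["<html><body>hi</body></ht", "ml> extra tokens"]))
# -> <html><body>hi</body></html>
```

The same accumulate-then-truncate idea can be dropped into the stream_generate loop above in place of the plain `break`.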

Rule 3 — Use a repetition penalty

A repetition penalty is essential to prevent the model from generating duplicate sections (e.g., the same footer twice, or identical testimonials):

```python
from mlx_lm.sample_utils import make_logits_processors

logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)
```

Then pass logits_processors=logits_processors to stream_generate().

Rule 4 — Use a low temperature (0.3–0.5)

High temperature (> 0.7) produces incoherent, broken HTML. Always use 0.3–0.5:

```python
from mlx_lm.sample_utils import make_sampler

sampler = make_sampler(temp=0.4)  # ✅ Recommended
```

Rule 5 — Post-process the output

The model may occasionally prepend training artifacts (e.g., a leaked system prompt) before the HTML. Always clean the output:

```python
import re

def clean_html(text: str) -> str:
    """Extract clean HTML from model output."""
    # Remove leaked system prompts and prompt scaffolding
    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n*", "", text, flags=re.DOTALL)
    text = re.sub(r"### Instruction:.*", "", text, flags=re.DOTALL)
    text = re.sub(r"### Response:\s*", "", text, flags=re.DOTALL)

    # Extract the first complete HTML document
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    if match:
        return match.group(1).strip()

    # Fallback: salvage a fragment and wrap it in a document shell
    start = re.search(r"<(?:!DOCTYPE|html|head|body)", text, re.IGNORECASE)
    if start:
        html = text[start.start():].strip()
        # Only wrap if the fragment doesn't already open a document,
        # otherwise we would nest a second <html> element.
        if not html.lower().startswith(("<!doctype", "<html")):
            html = "<!DOCTYPE html>\n<html>\n" + html + "\n</html>"
        return html

    return text.strip()
```
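As a quick sanity check, here is a trimmed-down copy of the extraction logic above run on a made-up output that has a leaked ### Response: header and a duplicated page (the sample string is invented for illustration):

```python
import re

def extract_html(text: str) -> str:
    # Trimmed-down version of clean_html: strip prompt scaffolding,
    # then keep only the first complete HTML document.
    text = re.sub(r"### (?:Instruction|Response):\s*", "", text)
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    return match.group(1).strip() if match else text.strip()

raw = "### Response:\n<!DOCTYPE html>\n<html><body>Hi</body></html>\n<html><body>Hi</body></html>"
print(extract_html(raw))
# -> the first document only, without the leaked header
```

The non-greedy `[\s\S]*?` is what drops the duplicated second copy: the match ends at the first </html>.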

Quick Start — Complete Working Example

Copy-paste this and it will work:

```python
from mlx_lm import load, stream_generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors
import re

# 1. Load model
model, tokenizer = load("YOUR_USERNAME/WebICoder-v3-MLX-8bit")

# 2. Format prompt (MANDATORY)
user_prompt = "Create a modern portfolio website with a hero, project cards, and a contact form"

prompt = f"""### Instruction:
{user_prompt}

### Response:
"""

# 3. Configure sampler + repetition penalty (MANDATORY)
sampler = make_sampler(temp=0.4)
logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)

# 4. Generate, stopping at </html> (MANDATORY)
full_text = ""
for response in stream_generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=4096,
    sampler=sampler,
    logits_processors=logits_processors,
):
    full_text += response.text
    print(response.text, end="", flush=True)

    if "</html>" in full_text or response.finish_reason:
        break

# 5. Clean output (MANDATORY)
def clean_html(text):
    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n*", "", text, flags=re.DOTALL)
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    return match.group(1).strip() if match else text.strip()

html = clean_html(full_text)

# Save to file
with open("output.html", "w") as f:
    f.write(html)
print(f"\n\nSaved to output.html ({len(html)} chars)")
```

Recommended Parameters Summary

| Parameter | Value | Mandatory? |
| --- | --- | --- |
| Prompt format | ### Instruction: / ### Response: | ✅ YES |
| Temperature | 0.3–0.5 | ✅ YES |
| Repetition penalty | 1.2 | ✅ YES |
| Repetition context | 256 | ✅ YES |
| Max tokens | 4096 | ✅ YES |
| Stop at </html> | Check output and break | ✅ YES |
| Post-processing | clean_html() function | ✅ YES |
| Top-p | 0.9 | Recommended |
| Top-k | 50 | Optional |

Using the Chat Template

The tokenizer includes a built-in chat template that handles prompt formatting automatically:

```python
messages = [
    {"role": "user", "content": "Create a dark-themed portfolio website with project cards"}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# This automatically wraps the message in the ### Instruction: / ### Response: format
```
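If you are working with a tokenizer copy that lacks the chat template, the same wrapping can be reproduced by hand; this mirrors the format from Rule 1 (the helper name to_alpaca is just for illustration):

```python
def to_alpaca(user_prompt: str) -> str:
    # Manually reproduce the trained Alpaca-style wrapping.
    return f"### Instruction:\n{user_prompt}\n\n### Response:\n"

print(to_alpaca("Create a pricing page with 3 tiers"))
```

The trailing newline after ### Response: matters: generation should start on a fresh line, exactly as during training.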

Using the Example Script

```bash
# Single prompt
python example.py "Create a landing page for a coffee shop"

# Interactive mode
python example.py --interactive
```

Example Outputs

| Prompt | What You Get |
| --- | --- |
| "Create a portfolio with a hero and project cards" | Nav, animated hero, glassmorphism cards, contact form, footer |
| "Create a landing page for a fitness app" | Hero gradient, feature cards, testimonials, CTA, footer |
| "Create a pricing page with 3 tiers" | Monthly/yearly toggle, feature lists, highlighted plan |
| "Create a login page with split layout" | Gradient left, form right, social login buttons |

What the Model Generates

When properly configured, WebICoder v3 produces:

- ✅ Complete <!DOCTYPE html> with <head>, <meta>, <title>
- ✅ Vanilla CSS — custom properties, gradients, glassmorphism, backdrop-filter
- ✅ Responsive design — @media queries, clamp(), CSS Grid auto-fit
- ✅ Animations — fade-in with IntersectionObserver, hover transitions
- ✅ Modern design — gradient text, blur effects, rounded corners, shadows
- ✅ Complete pages — nav, hero, content sections, footer

Limitations

- Optimized for single-page HTML with embedded CSS/JS
- Context window: 4096 tokens — very complex multi-section pages may still be truncated
- Based on Phi-2 (2.7B) — larger models will produce more sophisticated output
- English prompts work best
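Because the prompt and the generated page share the 4096-token window, it can help to estimate how much room a long prompt leaves before generating. The ~4 characters per token figure below is a rough English-text heuristic, not a property of this tokenizer; use the real tokenizer for exact counts:

```python
def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    # Rough heuristic: roughly 4 characters per token for English prose/code.
    return max(1, round(len(text) / chars_per_token))

CONTEXT_WINDOW = 4096

def remaining_budget(prompt: str) -> int:
    # Tokens left for the generated HTML after the prompt is accounted for.
    return CONTEXT_WINDOW - rough_token_estimate(prompt)

print(remaining_budget("Create a landing page for a coffee shop"))
```

If the remaining budget looks tight, shorten the description or split the page into smaller generation requests.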

Training Details

| Property | Value |
| --- | --- |
| Base Model | microsoft/phi-2 |
| Fine-tuning | Full fine-tuning on HTML/CSS code pairs |
| Training Format | Alpaca-style (Instruction / Response) |
| Training Context | 4096 tokens |
| Precision | float16 |
| Quantization | Post-training 8-bit (MLX affine, group_size=64) |

Files Included

| File | Description |
| --- | --- |
| model.safetensors | Quantized model weights |
| config.json | Model architecture configuration |
| tokenizer.json | Tokenizer vocabulary |
| tokenizer_config.json | Tokenizer settings, including the chat template |
| generation_config.json | Recommended generation parameters |
| example.py | Ready-to-use example script implementing all mandatory rules |
| LICENSE | MIT License |

Citation

```bibtex
@misc{webicoder-v3,
  title={WebICoder v3: Fine-tuned Phi-2 for HTML Code Generation},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/YOUR_USERNAME/WebICoder-v3-MLX-8bit}
}
```

License

MIT License — see LICENSE for details.
