---
license: mit
language:
- en
tags:
- mlx
- phi-2
- html
- css
- web-development
- code-generation
- fine-tuned
- apple-silicon
base_model: microsoft/phi-2
pipeline_tag: text-generation
library_name: mlx
model-index:
- name: WebICoder-v3-MLX-4bit
results: []
---
# ⚡ WebICoder v3 – HTML Code Generation (MLX 4-bit)
**WebICoder v3** is a fine-tuned version of [Microsoft Phi-2](https://huggingface.co/microsoft/phi-2) (2.7B parameters) specialized in generating **complete, production-ready HTML/CSS websites** from natural language descriptions.
Optimized for **Apple Silicon** via [MLX](https://github.com/ml-explore/mlx).
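The examples below assume the `mlx-lm` package is installed:
```bash
pip install mlx-lm
```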
## Model Details
| Property | Value |
|---|---|
| **Base Model** | Microsoft Phi-2 (2.7B parameters) |
| **Architecture** | PhiForCausalLM (32 layers, 2560 hidden) |
| **Format** | MLX (Apple Silicon optimized) |
| **Quantization** | 4-bit (4.504 bits/weight, affine) |
| **Size** | ~1.5 GB |
| **Context Length** | 4096 tokens |
| **Task** | HTML/CSS Code Generation |
| **Speed** | ~20-40 tok/s on M-series Mac |
## Also Available
| Variant | Link | Size |
|---|---|---|
| **8-bit** (higher quality) | `YOUR_USERNAME/WebICoder-v3-MLX-8bit` | ~2.9 GB |
---
## ⚠️ MANDATORY – Read Before Using
> **If you skip these steps, the model will produce broken, repeated, or low-quality output.**
> Follow ALL 5 rules below to get the best results.
### Rule 1 – Use the correct prompt format
The model was trained with an **Alpaca-style format**. You MUST wrap your prompt like this:
```
### Instruction:
{your website description here}
### Response:
```
❌ **DO NOT** send raw text like `"Create a website"` – the model won't understand it correctly.
✅ **DO** use the format above, or use `tokenizer.apply_chat_template()`, which does it automatically.
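For instance, a minimal helper (the `build_prompt` name is ours, purely illustrative) that produces this wrapping:
```python
# Illustrative helper (not part of the repo): wrap a description
# in the Alpaca-style format the model was trained on.
def build_prompt(description: str) -> str:
    return f"### Instruction:\n{description}\n### Response:\n"

prompt = build_prompt("Create a landing page for a coffee shop")
```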
### Rule 2 – ALWAYS stop at `</html>`
The model does not always emit an EOS token after finishing the HTML. You **MUST** check for `</html>` in the output and stop generation when you see it.
```python
# ✅ Correct – stop at </html>
full_text = ""
for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=4096, sampler=sampler):
    full_text += response.text
    if "</html>" in full_text:
        break
```
❌ Without this, the model will **repeat the entire page** in a loop.
### Rule 3 – Use repetition penalty
A repetition penalty is **essential** to prevent the model from generating duplicate sections (e.g., the same footer twice, identical testimonials).
```python
from mlx_lm.sample_utils import make_logits_processors
logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)
```
Then pass `logits_processors=logits_processors` to `stream_generate()`.
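A minimal sketch of that call, reusing `model`, `tokenizer`, `prompt`, and `sampler` from the other rules:
```python
# Sketch: wire the repetition penalty into streaming generation
# (model, tokenizer, prompt, and sampler are defined as in the other rules)
for response in stream_generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=4096,
    sampler=sampler,
    logits_processors=logits_processors,
):
    print(response.text, end="", flush=True)
```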
### Rule 4 – Use low temperature (0.3–0.5)
High temperature (> 0.7) produces incoherent, broken HTML. **Always use 0.3–0.5**.
```python
from mlx_lm.sample_utils import make_sampler
sampler = make_sampler(temp=0.4)  # ✅ Recommended
```
### Rule 5 – Post-process the output
The model may occasionally prepend training artifacts (a leaked system prompt) before the HTML. **Always clean the output:**
```python
import re

def clean_html(text: str) -> str:
    """Extract clean HTML from model output."""
    # Remove leaked system prompts
    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n*", "", text, flags=re.DOTALL)
    text = re.sub(r"### Instruction:.*", "", text, flags=re.DOTALL)
    text = re.sub(r"### Response:\s*", "", text, flags=re.DOTALL)
    # Extract a complete HTML document
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    if match:
        return match.group(1).strip()
    # Fallback: take everything from the first HTML-like tag and wrap it
    start = re.search(r"<(?:!DOCTYPE|html|head|body)", text, re.IGNORECASE)
    if start:
        html = text[start.start():].strip()
        if not html.lower().startswith("<!doctype"):
            html = "<!DOCTYPE html>\n<html>\n" + html + "\n</html>"
        return html
    return text.strip()
```
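A quick sanity check on a contrived string (illustrative input, not a real model sample):
```python
# Illustrative: a leaked "### Response:" marker followed by a page and trailing junk
raw = "### Response:\n<!DOCTYPE html>\n<html><body><h1>Hi</h1></body></html>\ntrailing junk"
print(clean_html(raw))
# -> <!DOCTYPE html>
# -> <html><body><h1>Hi</h1></body></html>
```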
---
## Quick Start – Complete Working Example
Copy-paste this and it will work:
```python
from mlx_lm import load, stream_generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors
import re

# 1. Load model
model, tokenizer = load("YOUR_USERNAME/WebICoder-v3-MLX-4bit")

# 2. Format prompt (MANDATORY)
user_prompt = "Create a modern portfolio website with a hero, project cards, and a contact form"
prompt = f"""### Instruction:
{user_prompt}
### Response:
"""

# 3. Configure sampler + repetition penalty (MANDATORY)
sampler = make_sampler(temp=0.4)
logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)

# 4. Generate with stop at </html> (MANDATORY)
full_text = ""
for response in stream_generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=4096,
    sampler=sampler,
    logits_processors=logits_processors,
):
    full_text += response.text
    print(response.text, end="", flush=True)
    if "</html>" in full_text or response.finish_reason:
        break

# 5. Clean output (MANDATORY)
def clean_html(text):
    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n*", "", text, flags=re.DOTALL)
    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
    return match.group(1).strip() if match else text.strip()

html = clean_html(full_text)

# Save to file
with open("output.html", "w") as f:
    f.write(html)

print(f"\n\nSaved to output.html ({len(html)} chars)")
```
---
## Recommended Parameters Summary
| Parameter | Value | Mandatory? |
|---|---|:---:|
| **Prompt format** | `### Instruction:` / `### Response:` | ✅ YES |
| **Temperature** | 0.3–0.5 | ✅ YES |
| **Repetition Penalty** | 1.2 | ✅ YES |
| **Repetition Context** | 256 | ✅ YES |
| **Max Tokens** | 4096 | ✅ YES |
| **Stop at `</html>`** | Check output and break | ✅ YES |
| **Post-processing** | `clean_html()` function | ✅ YES |
| **Top-p** | 0.9 | Recommended |
| **Top-k** | 50 | Optional |
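If you want the recommended top-p and optional top-k as well, recent `mlx_lm` versions accept them directly in `make_sampler` (a sketch; check your installed version's signature):
```python
from mlx_lm.sample_utils import make_sampler

# Sketch: temperature plus the recommended top-p and optional top-k
# (parameter names per recent mlx_lm releases; older versions may lack top_k)
sampler = make_sampler(temp=0.4, top_p=0.9, top_k=50)
```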
---
## Using the Chat Template
The tokenizer includes a built-in chat template that handles prompt formatting automatically:
```python
messages = [
{"role": "user", "content": "Create a dark-themed portfolio website with project cards"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# This automatically wraps it in ### Instruction: / ### Response: format
```
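Printing the result is a cheap way to confirm the template matches Rule 1 before generating:
```python
print(prompt)
# Expected shape (per the template described above):
# ### Instruction:
# Create a dark-themed portfolio website with project cards
# ### Response:
```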
## Using the Example Script
```bash
# Single prompt
python example.py "Create a landing page for a coffee shop"
# Interactive mode
python example.py --interactive
```
---
## Example Outputs
| Prompt | What You Get |
|---|---|
| "Create a portfolio with a hero and project cards" | Nav, animated hero, glassmorphism cards, contact form, footer |
| "Create a landing page for a fitness app" | Hero gradient, feature cards, testimonials, CTA, footer |
| "Create a pricing page with 3 tiers" | Toggle monthly/yearly, feature lists, highlighted plan |
| "Create a login page with split layout" | Gradient left, form right, social login buttons |
---
## What the Model Generates
When properly configured, WebICoder v3 produces:
- ✅ Complete `<!DOCTYPE html>` with `<head>`, `<meta>`, `<title>`
- ✅ **Vanilla CSS** – custom properties, gradients, glassmorphism, `backdrop-filter`
- ✅ **Responsive design** – `@media` queries, `clamp()`, CSS Grid `auto-fit`
- ✅ **Animations** – `fade-in` with `IntersectionObserver`, hover transitions
- ✅ **Modern design** – gradient text, blur effects, rounded corners, shadows
- ✅ **Complete pages** – nav, hero, content sections, footer
---
## Limitations
- Optimized for **single-page HTML** with embedded CSS/JS
- Context window: **4096 tokens** – very complex multi-section pages may still be truncated
- Based on Phi-2 (2.7B) – larger models will produce more sophisticated output
- English prompts work best
---
## Training Details
| Property | Value |
|---|---|
| **Base Model** | microsoft/phi-2 |
| **Fine-tuning** | Full fine-tuning on HTML/CSS code pairs |
| **Training Format** | Alpaca-style (Instruction / Response) |
| **Training Context** | 4096 tokens |
| **Precision** | float16 |
| **Quantization** | Post-training 4-bit (MLX affine, group_size=64) |
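For reference, a post-training quantization with these settings would typically be produced via `mlx_lm`'s `convert` utility; the sketch below uses a placeholder checkpoint path and argument names from recent `mlx_lm` releases, which may differ across versions:
```python
from mlx_lm import convert

# Sketch: 4-bit affine quantization of a local fp16 checkpoint
# ("path/to/finetuned-fp16" is a placeholder, not a real path)
convert(
    "path/to/finetuned-fp16",
    mlx_path="WebICoder-v3-MLX-4bit",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```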
---
## Files Included
| File | Description |
|---|---|
| `model.safetensors` | Quantized model weights |
| `config.json` | Model architecture configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `tokenizer_config.json` | Tokenizer settings with chat template |
| `generation_config.json` | Recommended generation parameters |
| `example.py` | Ready-to-use example script with all mandatory rules |
| `LICENSE` | MIT License |
---
## Citation
```bibtex
@misc{webicoder-v3,
title={WebICoder v3: Fine-tuned Phi-2 for HTML Code Generation},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/YOUR_USERNAME/WebICoder-v3-MLX-4bit}
}
```
## License
MIT License – see [LICENSE](LICENSE) for details.