README.md · nexsendev/webicoder-v3-mlx-q8 at main

	---
	license: mit
	language:
	- en
	tags:
	- mlx
	- phi-2
	- html
	- css
	- web-development
	- code-generation
	- fine-tuned
	- apple-silicon
	base_model: microsoft/phi-2
	pipeline_tag: text-generation
	library_name: mlx
	model-index:
	- name: WebICoder-v3-MLX-8bit
	  results: []
	---

	# ⚡ WebICoder v3 — HTML Code Generation (MLX 8-bit)

	WebICoder v3 is a fine-tuned version of [Microsoft Phi-2](https://huggingface.co/microsoft/phi-2) (2.7B parameters) specialized in generating complete, production-ready HTML/CSS websites from natural language descriptions.

	Optimized for Apple Silicon via [MLX](https://github.com/ml-explore/mlx).

	## Model Details

	| Property | Value |
	|---|---|
	| Base Model | Microsoft Phi-2 (2.7B parameters) |
	| Architecture | PhiForCausalLM (32 layers, 2560 hidden) |
	| Format | MLX (Apple Silicon optimized) |
	| Quantization | 8-bit (8.503 bits/weight, affine) |
	| Size | ~2.9 GB |
	| Context Length | 4096 tokens |
	| Task | HTML/CSS Code Generation |
	| Speed | ~12-20 tok/s on M-series Mac |

	## Also Available

	| Variant | Link | Size |
	|---|---|---|
	| 8-bit (higher quality) | `YOUR_USERNAME/WebICoder-v3-MLX-8bit` | ~2.9 GB |

	---

	## ⚠️ MANDATORY — Read Before Using

	> If you skip these steps, the model will produce broken, repeated, or low-quality output.
	> Follow ALL 5 rules below to get the best results.

	### Rule 1 — Use the correct prompt format

	The model was trained with an Alpaca-style format. You MUST wrap your prompt like this:

	```
	### Instruction:
	{your website description here}

	### Response:
	```

	❌ DO NOT send raw text like `"Create a website"` — the model won't understand it correctly.

	✅ DO use the format above, or use `tokenizer.apply_chat_template()` which does it automatically.
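
A small helper keeps the format in one place. This is a minimal sketch; `build_prompt` is a hypothetical name, not part of `mlx_lm` or of this repository.

```python
def build_prompt(description: str) -> str:
    """Wrap a plain-language website description in the Alpaca-style format."""
    # Strip stray whitespace so the blank line before "### Response:" stays exact.
    return f"### Instruction:\n{description.strip()}\n\n### Response:\n"


prompt = build_prompt("Create a landing page for a coffee shop")
print(prompt)
```

The returned string ends with `### Response:` followed by a newline, so generation starts directly with the HTML.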

	### Rule 2 — ALWAYS stop at `</html>`

	The model does not always emit an EOS token after finishing the HTML. You MUST check for `</html>` in the output and stop generation when you see it.

	```python
	# ✅ Correct: stop at </html>
	# Assumes model, tokenizer, prompt, and sampler are already defined.
	full_text = ""
	for response in stream_generate(model, tokenizer, prompt=prompt, max_tokens=4096, sampler=sampler):
	    full_text += response.text
	    if "</html>" in full_text:
	        break
	```

	❌ Without this, the model will repeat the entire page in a loop.
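
The in-loop check above leaves in place any text that arrives in the same chunk after `</html>`. The sketch below takes any iterable of text chunks, stops at the first closing tag, and trims what follows. `collect_until_html_end` is a hypothetical helper name; with `mlx_lm` you would presumably feed it something like `(r.text for r in stream_generate(...))`.

```python
from typing import Iterable

END_TAG = "</html>"


def collect_until_html_end(chunks: Iterable[str]) -> str:
    """Concatenate streamed chunks and stop right after the first </html>."""
    full_text = ""
    for chunk in chunks:
        # A tag completed by this chunk can start at most len(END_TAG) - 1
        # characters before the old end, so only that tail needs searching.
        search_from = max(0, len(full_text) - len(END_TAG) + 1)
        full_text += chunk
        end = full_text.find(END_TAG, search_from)
        if end != -1:
            # Trim anything emitted after the closing tag in the same chunk.
            return full_text[: end + len(END_TAG)]
    return full_text  # the stream ended without a closing tag


# Simulated stream: the tag is split across chunks and followed by a repeat.
fake_stream = ["<html><body>Hi</body></ht", "ml>\n<!DOCTYPE html><html>", "again"]
result = collect_until_html_end(fake_stream)
print(result)
```

Searching only the tail avoids rescanning the whole page on every token, which matters little for a 4096-token page but keeps the loop linear.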

	### Rule 3 — Use repetition penalty

	A repetition penalty is essential to prevent the model from generating duplicate sections (e.g., the same footer twice, identical testimonials).

	```python
	from mlx_lm.sample_utils import make_logits_processors

	logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)
	```

	Then pass `logits_processors=logits_processors` to `stream_generate()`.
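
To make the effect concrete, here is a pure-Python sketch of the commonly used repetition-penalty rule: the logits of tokens seen in the recent context are divided by the penalty when positive and multiplied by it when negative. It illustrates the idea only and is not a copy of the `mlx_lm` implementation; `apply_repetition_penalty` and the example logits are hypothetical.

```python
from typing import List, Sequence


def apply_repetition_penalty(
    logits: Sequence[float],
    recent_tokens: Sequence[int],
    penalty: float = 1.2,
    context_size: int = 256,
) -> List[float]:
    """Return a copy of `logits` with recently generated token ids made less likely."""
    out = list(logits)
    # Only the last `context_size` generated tokens are penalized.
    for tok in set(recent_tokens[-context_size:]):
        # Dividing a positive logit or multiplying a negative one both lower it.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


logits = [2.4, -1.0, 0.6]  # made-up scores for token ids 0, 1, 2
print(apply_repetition_penalty(logits, recent_tokens=[0, 1]))
```

Token 2 was not generated recently, so its score is left untouched while the other two are pushed down.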

	### Rule 4 — Use low temperature (0.3 – 0.5)

	High temperature (> 0.7) produces incoherent, broken HTML. Always use 0.3 – 0.5.

	```python
	from mlx_lm.sample_utils import make_sampler

	sampler = make_sampler(temp=0.4) # ✅ Recommended
	```
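
The effect of temperature can be shown in a few lines of plain Python: logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution toward the most likely token. The logit values here are made up for illustration, and `softmax_with_temperature` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temp: float) -> List[float]:
    """Convert logits to probabilities after scaling them by 1 / temp."""
    scaled = [x / temp for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, 2.0, 1.0]  # hypothetical scores for three candidate tokens
for temp in (0.4, 1.0):
    probs = softmax_with_temperature(logits, temp)
    print(temp, [round(p, 3) for p in probs])
```

At the lower temperature the top token takes a much larger share of the probability mass, which is why markup stays more consistent.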

	### Rule 5 — Post-process the output

	The model may occasionally prepend training artifacts (system prompt) before the HTML. Always clean the output:

	```python
	import re

	def clean_html(text: str) -> str:
	    """Extract clean HTML from model output."""
	    # Remove leaked system prompts
	    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n", "", text, flags=re.DOTALL)
	    # Drop a stray instruction marker and everything after it
	    # (assumes `text` holds only generated output, not the echoed prompt)
	    text = re.sub(r"### Instruction:.*", "", text, flags=re.DOTALL)
	    text = re.sub(r"### Response:\s*", "", text, flags=re.DOTALL)

	    # Extract the first complete HTML document
	    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
	    if match:
	        return match.group(1).strip()

	    # Fallback: no complete document found (for example, truncated output)
	    start = re.search(r"<(?:!DOCTYPE|html|head|body)", text, re.IGNORECASE)
	    if start:
	        html = text[start.start():].strip()
	        if not html.lower().startswith("<!doctype"):
	            html = "<!DOCTYPE html>\n<html>\n" + html + "\n</html>"
	        return html

	    return text.strip()
	```
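
After cleaning, a quick structural check can catch truncated output before it is written to disk. The sketch below uses the standard library's `html.parser`; `has_basic_structure` is a hypothetical helper, and the set of required tags is an assumption about what a complete page should contain.

```python
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Record the names of all opening and closing tags seen."""

    def __init__(self) -> None:
        super().__init__()
        self.opened = set()
        self.closed = set()

    def handle_starttag(self, tag, attrs):
        self.opened.add(tag)

    def handle_endtag(self, tag):
        self.closed.add(tag)


def has_basic_structure(html: str) -> bool:
    """True if <html>, <head> and <body> are each opened and closed."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    required = {"html", "head", "body"}
    return required <= collector.opened and required <= collector.closed


page = "<!DOCTYPE html><html><head><title>t</title></head><body><p>Hi</p></body></html>"
print(has_basic_structure(page))
print(has_basic_structure("<html><head></head><body><p>cut off"))
```

This only checks that the three tags appear; it does not validate nesting or CSS, so treat a passing result as a minimum bar.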

	---

	## Quick Start — Complete Working Example

	Copy this, replace the `YOUR_USERNAME` placeholder with the actual repository owner, and run it:

	```python
	from mlx_lm import load, stream_generate
	from mlx_lm.sample_utils import make_sampler, make_logits_processors
	import re

	# 1. Load model
	model, tokenizer = load("YOUR_USERNAME/WebICoder-v3-MLX-8bit")

	# 2. Format prompt (MANDATORY)
	user_prompt = "Create a modern portfolio website with a hero, project cards, and a contact form"

	prompt = f"""### Instruction:
	{user_prompt}

	### Response:
	"""

	# 3. Configure sampler + repetition penalty (MANDATORY)
	sampler = make_sampler(temp=0.4)
	logits_processors = make_logits_processors(repetition_penalty=1.2, repetition_context_size=256)

	# 4. Generate with stop at </html> (MANDATORY)
	full_text = ""
	for response in stream_generate(
	    model, tokenizer,
	    prompt=prompt,
	    max_tokens=4096,
	    sampler=sampler,
	    logits_processors=logits_processors,
	):
	    full_text += response.text
	    print(response.text, end="", flush=True)

	    if "</html>" in full_text or response.finish_reason:
	        break

	# 5. Clean output (MANDATORY)
	def clean_html(text):
	    text = re.sub(r"You are (?:Deep|Web[iI])coder.*?production-ready code\.\n", "", text, flags=re.DOTALL)
	    match = re.search(r"(<(?:!DOCTYPE\s+html|html)[\s\S]*?</html>)", text, re.IGNORECASE)
	    return match.group(1).strip() if match else text.strip()

	html = clean_html(full_text)

	# Save to file
	with open("output.html", "w", encoding="utf-8") as f:
	    f.write(html)
	print(f"\n\nSaved to output.html ({len(html)} chars)")
	```

	---

	## Recommended Parameters Summary

	| Parameter | Value | Mandatory? |
	|---|---|:---:|
	| Prompt format | `### Instruction:` / `### Response:` | ✅ YES |
	| Temperature | 0.3 – 0.5 | ✅ YES |
	| Repetition Penalty | 1.2 | ✅ YES |
	| Repetition Context | 256 | ✅ YES |
	| Max Tokens | 4096 | ✅ YES |
	| Stop at `</html>` | Check output and break | ✅ YES |
	| Post-processing | `clean_html()` function | ✅ YES |
	| Top-p | 0.9 | Recommended |
	| Top-k | 50 | Optional |
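
The Top-p and Top-k rows are not used in the examples on this page. As a rough illustration of what they do, the sketch below applies the standard definitions to a toy distribution: keep the `top_k` most likely tokens, then the smallest prefix whose cumulative probability reaches `top_p`. It is not the `mlx_lm` implementation, and `filter_top_k_top_p` and the probabilities are hypothetical. In `mlx_lm`, `make_sampler` accepts a `top_p` argument, and recent versions also a `top_k` argument; check the signature in your installed version.

```python
from typing import Dict


def filter_top_k_top_p(
    probs: Dict[str, float], top_k: int = 50, top_p: float = 0.9
) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest prefix reaching top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / mass for token, p in kept}


probs = {"<div": 0.5, "<p": 0.3, "<span": 0.15, "<blink": 0.05}
print(filter_top_k_top_p(probs))
```

With these made-up numbers the least likely token is dropped and the remaining three are rescaled.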

	---

	## Using the Chat Template

	The tokenizer includes a built-in chat template that handles prompt formatting automatically:

	```python
	messages = [
	    {"role": "user", "content": "Create a dark-themed portfolio website with project cards"}
	]

	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	# This automatically wraps it in ### Instruction: / ### Response: format
	```

	## Using the Example Script

	```bash
	# Single prompt
	python example.py "Create a landing page for a coffee shop"

	# Interactive mode
	python example.py --interactive
	```

	---

	## Example Outputs

	| Prompt | What You Get |
	|---|---|
	| "Create a portfolio with a hero and project cards" | Nav, animated hero, glassmorphism cards, contact form, footer |
	| "Create a landing page for a fitness app" | Hero gradient, feature cards, testimonials, CTA, footer |
	| "Create a pricing page with 3 tiers" | Toggle monthly/yearly, feature lists, highlighted plan |
	| "Create a login page with split layout" | Gradient left, form right, social login buttons |

	---

	## What the Model Generates

	When properly configured, WebICoder v3 produces:

	- ✅ Complete `<!DOCTYPE html>` with `<head>`, `<meta>`, `<title>`
	- ✅ Vanilla CSS — custom properties, gradients, glassmorphism, `backdrop-filter`
	- ✅ Responsive design — `@media` queries, `clamp()`, CSS Grid `auto-fit`
	- ✅ Animations — `fade-in` with `IntersectionObserver`, hover transitions
	- ✅ Modern design — gradient text, blur effects, rounded corners, shadows
	- ✅ Complete pages — nav, hero, content sections, footer

	---

	## Limitations

	- Optimized for single-page HTML with embedded CSS/JS
	- Context window: 4096 tokens — very complex multi-section pages may still be truncated
	- Based on Phi-2 (2.7B) — larger models will produce more sophisticated output
	- English prompts work best
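
Because the prompt and the generated HTML share the 4096-token window, the room left for output shrinks as the prompt grows. A minimal sketch of that arithmetic, assuming you already have the prompt's token count (for example from `len(tokenizer.encode(prompt))`); `max_new_tokens` is a hypothetical helper name.

```python
CONTEXT_LENGTH = 4096  # context length stated in the model details


def max_new_tokens(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Number of tokens the model can still generate for a prompt of this size."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, context_length - prompt_tokens)


print(max_new_tokens(60))  # 4036
```

Passing this value as `max_tokens` instead of a fixed 4096 should keep prompt plus output within the window.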

	---

	## Training Details

	| Property | Value |
	|---|---|
	| Base Model | microsoft/phi-2 |
	| Fine-tuning | Full fine-tuning on HTML/CSS code pairs |
	| Training Format | Alpaca-style (Instruction / Response) |
	| Training Context | 4096 tokens |
	| Precision | float16 |
	| Quantization | Post-training 8-bit (MLX affine, group_size=64) |
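
The 8.503 bits/weight figure in the model details is consistent with group-wise affine quantization. As a rough sketch, assuming each group of 64 weights stores one 16-bit scale and one 16-bit bias next to the 8-bit values (an assumption about the storage layout, not something stated in this card), the overhead works out as follows:

```python
BITS = 8          # quantized bits per weight
GROUP_SIZE = 64   # weights sharing one scale and one bias
SCALE_BITS = 16   # assumed float16 scale per group
BIAS_BITS = 16    # assumed float16 bias per group

bits_per_weight = BITS + (SCALE_BITS + BIAS_BITS) / GROUP_SIZE
print(bits_per_weight)  # 8.5

PARAMS = 2.7e9  # parameter count stated for Phi-2 in this card
size_gb = PARAMS * bits_per_weight / 8 / 1e9
print(round(size_gb, 2))
```

The small remainder up to 8.503 presumably comes from tensors kept at higher precision, and the estimated size is in line with the ~2.9 GB listed for this model.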

	---

	## Files Included

	| File | Description |
	|---|---|
	| `model.safetensors` | Quantized model weights |
	| `config.json` | Model architecture configuration |
	| `tokenizer.json` | Tokenizer vocabulary |
	| `tokenizer_config.json` | Tokenizer settings with chat template |
	| `generation_config.json` | Recommended generation parameters |
	| `example.py` | Ready-to-use example script with all mandatory rules |
	| `LICENSE` | MIT License |

	---

	## Citation

	```bibtex
	@misc{webicoder-v3,
	title={WebICoder v3: Fine-tuned Phi-2 for HTML Code Generation},
	year={2025},
	publisher={Hugging Face},
	url={https://huggingface.co/YOUR_USERNAME/WebICoder-v3-MLX-8bit}
	}
	```

	## License

	MIT License — see [LICENSE](LICENSE) for details.