LiteVit5/model

litvit5 committed on Jan 16

Commit 00e5ca7

1 Parent(s): 63b8b5e

abstract


Files changed (1)

README.md +2 -2

README.md CHANGED

@@ -6,9 +6,9 @@ base_model:
   - google/siglip2-base-patch16-512
 ---
-# LiteVit5 - Image-to-HTML Model
-A lightweight transformer model combining SigLIP vision encoder with T5 seq2seq decoder for image-to-text generation tasks.
 ## Model Architecture

   - google/siglip2-base-patch16-512
 ---
+# No Code, No Cloud: On-Device Mockup-to-Code with Lightweight Vision-Language AI
+Bridging the gap between visual design and functional code remains a persistent challenge in modern UI workflows, especially for small teams and non-programmers. Existing solutions, such as Figma-to-code tools and recent vision-language models (VLMs), often depend on proprietary cloud APIs or large-scale architectures, limiting offline operation, privacy, and control. We present LiteViT5, a lightweight, on-device vision-language model that directly generates HTML from images of design mockups, enabling private, no-code prototyping without cloud infrastructure. Built on a compact ViT–T5 encoder–decoder framework with 235M parameters, LiteViT5 achieves competitive results on both in-distribution (WebSight) and out-of-distribution (Design2Code) benchmarks. We evaluate it on structure, position, color, and CLIP-based similarity metrics and find its performance comparable to that of models 10–30× larger, such as PaliGemma-3B, LLaVA-7B, and DeepSeek-VL-7B. We further evaluate LiteViT5 in a user study in which 24 participants rate perceived accuracy, code quality, and editability. Our findings show that LiteViT5 supports rapid design iteration and reduces reliance on developer handoff, making it a practical, assistive tool for democratizing web interface creation. This work highlights the potential of efficient, human-centered generative AI to empower interface design beyond expert-only workflows. To support transparency and reproducibility, we release LiteViT5 as an open-source model on Hugging Face: https://huggingface.co/LiteVit5/model.
 ## Model Architecture