saadxsalman committed edf65db (verified · parent: c3de38f): Update README.md

Files changed (1): README.md (+124 −13)
---
language:
- en
license: apache-2.0
base_model: LiquidAI/LFM2.5-VL-450M
tags:
- vision-language
- image-to-code
- tailwind-css
- web-development
- unsloth
- liquid-foundation-model
model_name: Liquid-Web (LFM2.5-VL-450M-WebSight)
datasets:
- HuggingFaceM4/WebSight
---

# Model Card: Liquid-Web (LFM2.5-VL-450M-WebSight)

## Model Details
* **Model Type:** Vision-Language Model (VLM)
* **Base Model:** LiquidAI/LFM2.5-VL-450M
* **Architecture:** Liquid Foundation Model (LFM) with a SigLIP2 vision encoder
* **Fine-tuning Method:** LoRA (Low-Rank Adaptation)
* **Task:** Image-to-HTML/Tailwind CSS generation (UI-to-code)
* **Language:** English
* **License:** Apache 2.0 (inherited from the base model)
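LoRA leaves the base weights frozen and adds a trainable low-rank update scaled by `alpha / r`. A toy NumPy sketch of the forward pass (the dimensions below are illustrative, not the model's):

```python
import numpy as np

# Toy LoRA forward pass: y = W x + (alpha / r) * B @ A @ x
# r=4 here for illustration; the card's r=64, alpha=64 gives the same
# scaling factor alpha/r = 1.0.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 4, 4

W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection (zero-initialized)

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapter is a no-op before training starts.
assert np.allclose(lora_forward(x), W @ x)
```

Only `A` and `B` receive gradients during fine-tuning, which is what keeps the adapter small enough to train a 450M VLM on modest hardware.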
## Intended Use
### Primary Use Case
This model takes a screenshot of a web page as input and generates the corresponding HTML using Tailwind CSS utility classes. It is intended for developers who want to automate the conversion of UI designs or existing web pages into clean, functional markup.

### Out-of-Scope Use
* General-purpose image captioning (the model is highly specialized for code).
* Generating scripts for malicious web automation.
* Production use without human review: the model may hallucinate specific color shades or miss precise pixel alignments.
## Training Data
The model was fine-tuned on the **HuggingFaceM4/WebSight (v0.2)** dataset.
* **Format:** Paired images (screenshots) and text (HTML + Tailwind CSS).
* **Volume:** 1,000 high-quality samples (fine-tuning subset).
* **Content:** Diverse layouts, including landing pages, dashboards, and portfolio sites.
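Each image/HTML pair can be converted into a chat-style example for supervised fine-tuning. A minimal sketch of that conversion; the field names and the instruction string are assumptions for illustration, not taken from the actual training script:

```python
def to_sft_example(sample):
    """Turn one screenshot/HTML pair into a vision-chat SFT example."""
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "image", "image": sample["image"]},
                {"type": "text", "text": "Convert this screenshot into HTML with Tailwind CSS."},
            ]},
            {"role": "assistant", "content": [
                {"type": "text", "text": sample["text"]},
            ]},
        ]
    }

# Hypothetical pair mimicking the dataset's image/text structure.
pair = {"image": "screenshot.png", "text": '<body class="bg-white">...</body>'}
example = to_sft_example(pair)
```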
 
## Training Procedure
### Hyperparameters
| Parameter | Value |
| :--- | :--- |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 64 |
| Optimizer | AdamW (8-bit) |
| Learning Rate | 2e-4 |
| Batch Size (per device) | 1 |
| Gradient Accumulation | 8 |
| Max Steps | 100 |
| Precision | 4-bit quantization (NormalFloat4) |
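A quick sanity check on the numbers above: the effective batch size is the per-device batch times the accumulation steps, and LoRA at rank `r` adds `r * (d_in + d_out)` trainable parameters per adapted matrix (the 1024x1024 projection below is hypothetical):

```python
# Effective batch size = per-device batch * gradient accumulation steps.
per_device_batch, grad_accum = 1, 8
effective_batch = per_device_batch * grad_accum

# LoRA adds r * (d_in + d_out) trainable params per adapted weight matrix.
# For a hypothetical 1024x1024 projection at r=64:
r, d_in, d_out = 64, 1024, 1024
lora_params = r * (d_in + d_out)   # 131,072 trainable
full_params = d_in * d_out         # 1,048,576 frozen
print(f"trainable fraction per matrix: {lora_params / full_params:.1%}")  # 12.5%
```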
 
**Frameworks Used:**
* **Unsloth:** memory-efficient training and fast kernels.
* **TRL & PEFT:** the supervised fine-tuning (SFT) loop.
 
## Performance & Limitations
### Strengths
* **Efficiency:** At only 450M parameters, the model offers high-speed inference on edge devices (e.g., Jetson Orin, mobile).
* **Modern CSS:** Trained on Tailwind CSS v2.2+ utilities, avoiding bloated hand-written stylesheets.

### Limitations
* **Hallucinations:** The model may occasionally invent Tailwind classes that do not exist (e.g., `text-custom-500`).
* **Complexity:** Very deep or heavily nested layouts can yield truncated HTML due to the `max_seq_length` limit.
* **Resolution:** Fine details, such as very small text in screenshots, may be missed by the vision encoder.
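Because of the hallucination risk, generated markup is worth validating before use. A minimal sketch that flags class names outside an allow-list; a real check would use the full Tailwind class list or run the markup through the Tailwind compiler:

```python
import re

# Tiny allow-list for illustration only.
KNOWN = {"bg-white", "bg-pink-600", "p-8", "rounded-lg", "shadow-xl",
         "text-3xl", "text-gray-800", "flex", "items-center", "h-screen"}

def unknown_classes(html):
    """Return class names in the markup that are not in the allow-list."""
    found = set()
    for attr in re.findall(r'class="([^"]+)"', html):
        found.update(attr.split())
    return sorted(found - KNOWN)

snippet = '<h1 class="text-3xl text-custom-500 flex">Hi</h1>'
print(unknown_classes(snippet))  # ['text-custom-500']
```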
## Example Output: User Signup Component
**Input Image:** A screenshot of a magenta-themed signup form.

**Generated Code:**
```html
<body class="bg-pink-600 flex justify-center items-center h-screen">
  <div class="bg-white p-8 rounded-lg shadow-xl w-full max-w-md text-center">
    <h1 class="text-3xl font-semibold text-gray-800 mb-6">SignUp Form</h1>
    <div class="space-y-4">
      <div class="flex items-center border rounded p-2">
        <span class="px-2 text-gray-500">👤</span>
        <input type="text" placeholder="Enter Username" class="w-full outline-none">
      </div>
      <div class="flex items-center border rounded p-2">
        <span class="px-2 text-gray-500">✉️</span>
        <input type="email" placeholder="Enter Email" class="w-full outline-none">
      </div>
      <div class="flex items-center border rounded p-2">
        <span class="px-2 text-gray-500">🔒</span>
        <input type="password" placeholder="Create Password" class="w-full outline-none">
      </div>
      <div class="flex items-center border rounded p-2">
        <span class="px-2 text-gray-500">🔒</span>
        <input type="password" placeholder="Retype Password" class="w-full outline-none">
      </div>
    </div>
    <button class="w-full bg-purple-700 text-white mt-6 py-2 rounded hover:bg-purple-800 transition">
      SignUp
    </button>
  </div>
</body>
```

## How to Get Started
The snippet below loads the model for inference with Unsloth. Note that vision models go through `FastVisionModel` rather than `FastLanguageModel`, and the returned tokenizer doubles as the image processor; the instruction prompt shown is an assumption, not necessarily the one used in training.

```python
from unsloth import FastVisionModel
from PIL import Image

model, tokenizer = FastVisionModel.from_pretrained("saadxsalman/LFM-WebSight-Tailwind", load_in_4bit=True)
FastVisionModel.for_inference(model)  # enable inference mode

image = Image.open("screenshot.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Convert this screenshot into HTML with Tailwind CSS."},
]}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
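Raw generations can include conversational text around the markup. A small, hypothetical post-processing helper that keeps only the `<body>` block:

```python
import re

def extract_html(generated: str) -> str:
    """Keep only the markup from the first <body> tag through </body>;
    fall back to the stripped raw text if no body block is found."""
    match = re.search(r"<body.*?</body>", generated, re.DOTALL)
    return match.group(0) if match else generated.strip()

raw = 'Here is the page:\n<body class="bg-white"><p>Hi</p></body>\nDone.'
print(extract_html(raw))  # <body class="bg-white"><p>Hi</p></body>
```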

## Citation
If you use this model, please cite the WebSight technical report:

```bibtex
@misc{laurencon2024unlocking,
  title={Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset},
  author={Hugo Laurençon and Léo Tronchon and Victor Sanh},
  year={2024},
  eprint={2403.09029},
  archivePrefix={arXiv},
  primaryClass={cs.HC}
}
```
---