|
|
--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
library_name: transformers |
|
|
pipeline_tag: text-generation |
|
|
tags: |
|
|
- graphic-design |
|
|
- design-generation |
|
|
- layout-planning |
|
|
- qwen3 |
|
|
base_model: Qwen/Qwen3-8B |
|
|
--- |
|
|
|
|
|
# DesignAsCode Semantic Planner |
|
|
|
|
|
The Semantic Planner for the [DesignAsCode](https://github.com/liuziyuan1109/design-as-code) pipeline. Given a natural-language design request, it generates a structured design plan β including layout reasoning, layer grouping, image generation prompts, and text element specifications. |
|
|
|
|
|
## Model Details |
|
|
|
|
|
| | | |
|
|
|---|---| |
|
|
| **Base Model** | Qwen3-8B | |
|
|
| **Fine-tuning** | Supervised Fine-Tuning (SFT) | |
|
|
| **Size** | 16 GB (fp16) | |
|
|
| **Context Window** | 8,192 tokens | |
|
|
|
|
|
## Training Data |
|
|
|
|
|
Trained on ~10k examples sampled from the [DesignAsCode Training Data](https://huggingface.co/datasets/Tony1109/DesignAsCode-training-data), which contains 19,479 design samples distilled from the [Crello](https://huggingface.co/datasets/cyberagent/crello) dataset using GPT-4o and GPT-o3. No additional data was used. |
|
|
|
|
|
### Training Format |
|
|
|
|
|
- **Input:** `prompt` β natural-language design request |
|
|
- **Output:** `layout_thought` + `grouping` + `image_generator` + `generate_text` |
|
|
|
|
|
See the [training data repo](https://huggingface.co/datasets/Tony1109/DesignAsCode-training-data) for field details. |
|
|
|
|
|
## Training Configuration |
|
|
|
|
|
| | | |
|
|
|---|---| |
|
|
| **Batch Size** | 1 | |
|
|
| **Gradient Accumulation** | 2 | |
|
|
| **Learning Rate** | 5e-5 (AdamW) | |
|
|
| **Epochs** | 2 | |
|
|
| **Max Sequence Length** | 8,192 tokens | |
|
|
| **Precision** | bfloat16 | |
|
|
| **Loss** | Completion-only (only on generated tokens) | |
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
import torch |
|
|
|
|
|
model_path = "Tony1109/DesignAsCode-planner" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_path) |
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
|
model_path, |
|
|
torch_dtype=torch.float16, |
|
|
device_map="auto" |
|
|
) |
|
|
``` |
|
|
|
|
|
For full pipeline usage (plan β implement β reflection), see the [project repo](https://github.com/liuziyuan1109/design-as-code) and [QUICKSTART.md](https://github.com/liuziyuan1109/design-as-code/blob/main/QUICKSTART.md). |
|
|
|
|
|
## Outputs |
|
|
|
|
|
The model generates semi-structured text with XML tags: |
|
|
|
|
|
- `<layout_thought>...</layout_thought>` β detailed layout reasoning |
|
|
- `<grouping>...</grouping>` β JSON array grouping related layers with thematic labels |
|
|
- `<image_generator>...</image_generator>` β JSON array of per-layer image generation prompts |
|
|
- `<generate_text>...</generate_text>` β JSON array of text element specifications (font, size, alignment, etc.) |
|
|
|
|
|
## Ethical Considerations |
|
|
|
|
|
- Designs should be reviewed by humans before production use. |
|
|
- May reflect biases present in the training data. |
|
|
- Generated content should be checked for copyright compliance. |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{liu2025designascode, |
|
|
title = {DesignAsCode: Bridging Structural Editability and |
|
|
Visual Fidelity in Graphic Design Generation}, |
|
|
author = {Liu, Ziyuan and Sun, Shizhao and Huang, Danqing |
|
|
and Shi, Yingdong and Zhang, Meisheng and Li, Ji |
|
|
and Yu, Jingsong and Bian, Jiang}, |
|
|
journal = {arXiv preprint}, |
|
|
year = {2025} |
|
|
} |
|
|
``` |
|
|
|