Add model card, link to code, project page

This PR adds a model card for the paper [LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis](https://huggingface.co/papers/2503.21749).

Files changed (1)

README.md +14 -5

@@ -1,17 +1,22 @@
 ---
-license: mit
 datasets:
 - X-ART/LeX-10K
-pipeline_tag: text-to-image
 library_name: diffusers
 tags:
 - art
 - text-rendering
-base_model:
-- Alpha-VLLM/Lumina-Image-2.0
 ---
 **LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis**
 We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16\% PNED gain, and LeX-FLUX outperforming baselines in color (+10.32\%), positional (+5.60\%), and font accuracy (+5.63\%). The codes, models, datasets, and demo are publicly available.
 ![demo](teaser.png)
 **Usage of LeX-Lumina:**
@@ -37,4 +42,8 @@ image = pipe(
 ).images[0]
 image.save("lex_lumina_demo.png")
-```

 ---
+base_model:
+- Alpha-VLLM/Lumina-Image-2.0
 datasets:
 - X-ART/LeX-10K
 library_name: diffusers
+license: mit
+pipeline_tag: text-to-image
 tags:
 - art
 - text-rendering
 ---
 **LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis**
+This repository contains the model presented in the paper [LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis](https://huggingface.co/papers/2503.21749).
+The abstract of the paper is the following:
 We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16\% PNED gain, and LeX-FLUX outperforming baselines in color (+10.32\%), positional (+5.60\%), and font accuracy (+5.63\%). The codes, models, datasets, and demo are publicly available.
 ![demo](teaser.png)
 **Usage of LeX-Lumina:**
 ).images[0]
 image.save("lex_lumina_demo.png")
+```
+See also:
+* [Project page](https://zhaoshitian.github.io/lexart/)
+* [Code](https://github.com/zhaoshitian/LeX-Art)
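
Reading the front-matter hunk: the metadata keys appear to be reordered rather than changed, while the substantive additions are the paper link, the abstract lead-in, and the "See also" links. A minimal sketch to check this, assuming the front matter uses only plain `key: value` pairs and `- item` lists as shown in the diff; `parse_front_matter` is a hypothetical helper written for this illustration, not part of any library:

```python
def parse_front_matter(text):
    """Parse the tiny YAML subset seen in this diff: `key: value` and `key:` + `- item` lists."""
    meta = {}
    current_key = None
    for line in text.strip().splitlines():
        line = line.rstrip()
        if not line or line == "---":
            continue  # skip blank lines and the front-matter delimiters
        if line.startswith("- "):
            # list item belonging to the most recent `key:` line
            meta[current_key].append(line[2:].strip())
        else:
            key, _, value = line.partition(":")
            current_key = key.strip()
            value = value.strip()
            # an empty value means a list follows on the next lines
            meta[current_key] = value if value else []
    return meta


# Front matter before the PR (removed and context lines from the diff).
OLD = """\
---
license: mit
datasets:
- X-ART/LeX-10K
pipeline_tag: text-to-image
library_name: diffusers
tags:
- art
- text-rendering
base_model:
- Alpha-VLLM/Lumina-Image-2.0
---
"""

# Front matter after the PR (added and context lines from the diff).
NEW = """\
---
base_model:
- Alpha-VLLM/Lumina-Image-2.0
datasets:
- X-ART/LeX-10K
library_name: diffusers
license: mit
pipeline_tag: text-to-image
tags:
- art
- text-rendering
---
"""

old_meta = parse_front_matter(OLD)
new_meta = parse_front_matter(NEW)

# Dict equality ignores key order, so this compares content only.
metadata_unchanged = old_meta == new_meta
print("metadata unchanged:", metadata_unchanged)
print("new key order:", list(new_meta))
```

For real model cards, a full YAML parser would be the safer choice; the hand-rolled parser here only keeps the sketch free of dependencies.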