carminezacc committed on
Commit dd08e95 (verified) · 1 parent: f092d2e

Upload folder using huggingface_hub

README.md CHANGED
@@ -1,91 +1,43 @@
1
  ---
2
- title: Eruku - Autoregressive Handwriting Generation
3
  emoji: 🖋️
4
- colorFrom: blue
5
- colorTo: purple
6
  sdk: gradio
7
  sdk_version: 4.44.0
8
  app_file: app.py
9
- pinned: false
10
- license: mit
11
- python_version: 3.10
12
  ---
13
 
14
- # 🖋️ Eruku - Autoregressive Styled Text Image Generation
15
 
16
- Generate realistic handwritten and styled text using the Eruku model!
17
-
18
- [![arXiv](https://img.shields.io/badge/arXiv-2510.23240-b31b1b.svg)](https://arxiv.org/abs/2510.23240)
19
- [![Website](https://img.shields.io/badge/Website-eruku.carminezacc.com-blue)](https://eruku.carminezacc.com)
20
-
21
- ## About
22
-
23
- **Eruku** is a state-of-the-art autoregressive model for styled text image generation, particularly excelling at handwritten text generation (HTG). Based on the paper ["Autoregressive Styled Text Image Generation, but Make it Reliable"](https://arxiv.org/abs/2510.23240) by Carmine Zaccagnino, Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Alessio Tonioni, and Rita Cucchiara, Eruku addresses key limitations of previous approaches while maintaining their strengths.
24
-
25
- ### Key Innovations
26
-
27
- - **No Style Text Required**: Unlike previous methods, Eruku doesn't require transcriptions of style images, making it more practical for real-world use
28
- - **Reliable Generation**: Proper stop mechanism prevents repetition loops and visual artifacts
29
- - **Special Token Alignment**: Introduces special textual tokens (SOG/EOG) for better alignment between text and visual representations
30
- - **Classifier-Free Guidance**: Implements CFG for improved control over style adherence and text fidelity
31
- - **Arbitrary Length**: Can generate text images of any length without architectural constraints
32
-
33
- ### Architecture
34
-
35
- The model combines:
36
-
37
- - **T5 Transformer**: Autoregressive text encoder for understanding and generation control
38
- - **VAE (Variational Autoencoder)**: Efficient image tokenizer for converting between pixel space and latent representations
39
- - **Autoregressive Decoder**: Generates visual embeddings sequentially for smooth, natural-looking text
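The stop behaviour that makes generation "reliable" can be illustrated with a toy decoding loop. This is a minimal sketch, not Eruku's implementation: `step_fn`, the `EOG` sentinel and the list-of-floats "embeddings" are hypothetical stand-ins. It assumes only that decoding halts either when an end-of-generation marker is predicted or when a fixed token budget (like the `max_new_tokens` argument passed in app.py) runs out.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical sentinel: a real model predicts EOG as a special token,
# it does not return a Python None.
EOG = None

Embedding = Sequence[float]


def autoregressive_decode(
    step_fn: Callable[[List[Embedding]], Optional[Embedding]],
    max_new_tokens: int,
) -> List[Embedding]:
    """Generate embeddings one at a time until EOG or the budget is exhausted.

    step_fn receives everything generated so far and returns the next
    embedding, or EOG to signal that the text line is finished.
    """
    generated: List[Embedding] = []
    for _ in range(max_new_tokens):
        nxt = step_fn(generated)
        if nxt is EOG:
            # Explicit stop: nothing is produced past the end of the text.
            break
        generated.append(nxt)
    return generated
```

Without the explicit EOG check, the loop would always run to `max_new_tokens`, which is the kind of trailing repetition the paper's stop mechanism is meant to avoid.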
40
-
41
- ## How to Use
42
-
43
- 1. **Style Image** (Optional): Upload a handwriting or typewritten sample image to mimic its style
44
- 2. **Style Text** (Optional): Enter the text from the style image (helps with style transfer)
45
- 3. **Text to Generate**: Enter the text you want to see in the chosen style
46
- 4. **CFG Scale (Text Guidance)**:
47
- - 1.0 = Very loose guidance (style image dominates)
48
- - 2.0-3.0 = Balanced
49
- - Higher values = More literal prompt following
50
- 5. Click **Generate** and wait for your styled text! (Max token budget fixed at 128 for stability)
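For reference, classifier-free guidance is usually written as an extrapolation from the unconditional prediction toward the conditional one: `guided = uncond + scale * (cond - uncond)`. With `scale = 1.0` this returns the conditional prediction unchanged, and larger values push harder toward the text condition. The sketch below applies that standard formula elementwise to plain Python lists. Whether Eruku combines its conditional and unconditional embeddings exactly this way is an assumption, and `apply_cfg` is a hypothetical name.

```python
from typing import List, Sequence


def apply_cfg(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Standard classifier-free guidance on one predicted vector.

    scale == 1.0 returns the conditional prediction unchanged; scale > 1.0
    extrapolates further away from the unconditional prediction.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]
```

For example, `apply_cfg([1.0, 2.0], [0.0, 1.0], 2.0)` gives `[2.0, 3.0]`. A real implementation would apply the same arithmetic to tensors.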
51
 
52
  ## Features
53
 
54
- - ✨ High-quality handwriting synthesis
55
- - 🎨 Style control through reference images and text
56
- - 🧭 CFG slider controls **text** guidance strength
57
- - 📝 Works with both handwritten and typewritten styles
58
- - ⚡ Powered by ZeroGPU for free GPU access
59
- - 🔧 Adjustable generation parameters
60
 
61
- ## Technical Details
62
-
63
- The model uses:
64
- - T5-base as the backbone language model
65
- - Custom VAE for image generation
66
- - Continuous latent space representation
67
- - Autoregressive generation with special tokens
68
-
69
- ## Tips for Best Results
70
-
71
- - Upload a clear style image for better style mimicking
72
- - Keep text relatively short for best quality
73
- - Adjust CFG scale: lower for more style influence, higher for stricter text adherence
74
- - Style text is optional - the model works well without it
75
- - Token budget is fixed at 128 to match the research code's sweet spot
76
-
77
- ## Performance Highlights
78
 
79
- From the paper, Eruku demonstrates:
80
- - **Superior Text Adherence**: Lower Character Error Rate (CER) compared to previous methods
81
- - **Better Generalization**: Excellent performance on both handwritten and typewritten styles
82
- - **Style Consistency**: High-fidelity style replication while maintaining readability
83
- - **Efficient Training**: Simpler training process without requiring auxiliary networks
84
 
85
  ## Citation
86
 
87
- If you use Eruku in your research, please cite:
88
-
89
  ```bibtex
90
  @InProceedings{pippi2025zeroshot,
91
  author = {Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Tonioni, Alessio and Cucchiara, Rita},
@@ -100,26 +52,14 @@ If you use Eruku in your research, please cite:
100
  author = {Carmine Zaccagnino and Fabio Quattrini and Vittorio Pippi and Silvia Cascianelli and Alessio Tonioni and Rita Cucchiara},
101
  title = {Autoregressive Styled Text Image Generation, but Make it Reliable},
102
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
103
- month=3,
104
- year = 2026
105
  }
106
  ```
107
 
108
  ## Links
109
 
110
- - 📄 **Paper**: [arXiv:2510.23240](https://arxiv.org/abs/2510.23240)
111
- - 🌐 **Website**: [eruku.carminezacc.com](https://eruku.carminezacc.com)
112
- - 🤗 **Model**: [blowing-up-groundhogs/eruku](https://huggingface.co/blowing-up-groundhogs/eruku)
113
- - 💻 **Code**: Coming soon!
114
-
115
- ## License
116
-
117
- This project is licensed under the MIT License.
118
-
119
- ## Acknowledgments
120
-
121
- Built with:
122
- - [Gradio](https://gradio.app/) for the web interface
123
- - [Hugging Face](https://huggingface.co/) for model hosting and Spaces
124
- - [ZeroGPU](https://huggingface.co/zero-gpu-explorers) for free GPU access
125
 
 
1
  ---
2
+ title: Eruku - Styled Text Generation
3
  emoji: 🖋️
4
+ colorFrom: purple
5
+ colorTo: blue
6
  sdk: gradio
7
  sdk_version: 4.44.0
8
  app_file: app.py
9
+ pinned: true
10
+ license: apache-2.0
11
+ models:
12
+ - blowing-up-groundhogs/eruku
13
+ tags:
14
+ - handwriting-generation
15
+ - styled-text
16
+ - text-to-image
17
+ - autoregressive
18
+ short_description: Generate handwritten text in any style
19
  ---
20
 
21
+ # Eruku - Autoregressive Styled Text Image Generation
22
 
23
+ This Space demonstrates **Eruku**, a state-of-the-art model for generating handwritten and styled text images.
24
 
25
  ## Features
26
 
27
+ - **Zero-shot style transfer**: No training needed for new styles
28
+ - **No transcription required**: Works with just a style image
29
+ - **Reliable generation**: An end-of-generation (EOG) token stops decoding cleanly, preventing repetition loops and artifacts
30
+ - **Arbitrary length**: Generate text of any length
 
 
31
 
32
+ ## How to Use
33
 
34
+ 1. Upload a handwriting or font sample as the style image
35
+ 2. Optionally enter the text from the style image
36
+ 3. Enter the text you want to generate
37
+ 4. Click Generate!
 
38
 
39
  ## Citation
40
 
 
 
41
  ```bibtex
42
  @InProceedings{pippi2025zeroshot,
43
  author = {Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Tonioni, Alessio and Cucchiara, Rita},
 
52
  author = {Carmine Zaccagnino and Fabio Quattrini and Vittorio Pippi and Silvia Cascianelli and Alessio Tonioni and Rita Cucchiara},
53
  title = {Autoregressive Styled Text Image Generation, but Make it Reliable},
54
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
55
+ month = {March},
56
+ year = {2026}
57
  }
58
  ```
59
 
60
  ## Links
61
 
62
+ - 📄 [Paper](https://arxiv.org/abs/2510.23240)
63
+ - 🌐 [Website](https://eruku.carminezacc.com)
64
+ - 🤗 [Model](https://huggingface.co/blowing-up-groundhogs/eruku)
65
 
app.py CHANGED
@@ -1,279 +1,306 @@
1
  import gradio as gr
2
  import torch
3
  import spaces
4
- import os
5
  from pathlib import Path
6
  from PIL import Image
7
  import numpy as np
8
- from huggingface_hub import hf_hub_download
9
-
10
- # Import the model
11
- from eruku_continuous_inf import Emuru
12
 
13
  # Global model variable
14
  model = None
 
 
15
 
16
  def load_model():
17
- """Load the Emuru model with checkpoints"""
18
- global model
 
19
  if model is None:
20
- print("Loading model...")
 
21
 
22
- # Model repository
23
- MODEL_REPO = "blowing-up-groundhogs/eruku"
24
 
25
- t5_checkpoint = 'google-t5/t5-base'
26
- vae_checkpoint = 'blowing-up-groundhogs/emuru_vae'
27
-
28
- # Download OCR checkpoint
29
- print("Downloading OCR checkpoint...")
30
- ocr_checkpoint_path = hf_hub_download(
31
- repo_id=MODEL_REPO,
32
- filename="origami.pth",
33
- cache_dir="./checkpoints"
34
  )
 
 
35
 
36
- try:
37
- print("Initializing model...")
38
- model = Emuru(
39
- t5_checkpoint=t5_checkpoint,
40
- vae_checkpoint=vae_checkpoint,
41
- ocr_checkpoint=ocr_checkpoint_path,
42
- slices_per_query=1,
43
- channels=1
44
- )
45
-
46
- # Download and load trained model checkpoint
47
- print("Downloading trained model checkpoint...")
48
- trained_checkpoint_path = hf_hub_download(
49
- repo_id=MODEL_REPO,
50
- filename="000073688.pth",
51
- cache_dir="./checkpoints"
52
- )
53
-
54
- print(f"Loading trained weights from {trained_checkpoint_path}...")
55
- checkpoint = torch.load(trained_checkpoint_path, map_location='cpu', weights_only=False)
56
-
57
- # Load the state dict - handle different checkpoint formats
58
- if isinstance(checkpoint, dict):
59
- if 'model_state_dict' in checkpoint:
60
- model.load_state_dict(checkpoint['model_state_dict'], strict=False)
61
- elif 'state_dict' in checkpoint:
62
- model.load_state_dict(checkpoint['state_dict'], strict=False)
63
- else:
64
- # Assume the checkpoint itself is the state dict
65
- model.load_state_dict(checkpoint, strict=False)
66
- else:
67
- print("Warning: Unexpected checkpoint format")
68
-
69
- model.eval()
70
- print("✅ Model loaded successfully!")
71
-
72
- except Exception as e:
73
- print(f"❌ Error loading model: {e}")
74
- import traceback
75
- traceback.print_exc()
76
- raise e
77
 
78
  return model
79
 
80
  @spaces.GPU
81
- def generate_handwriting(style_image, style_text, gen_text, cfg_scale=1.5):
82
  """
83
- Generate handwriting based on style image and generation text
84
 
85
  Args:
86
- style_image: PIL Image or None - style reference image
87
- style_text: Text from the style image (optional)
88
- gen_text: Text to generate in handwriting
89
- cfg_scale: Classifier-free guidance scale (1.0 = no guidance)
90
- max_tokens: Maximum number of tokens to generate
91
-
92
  Returns:
93
- PIL Image of generated handwriting
94
  """
95
  try:
96
  model = load_model()
97
 
98
  if model is None:
99
- return None, "Error: Model failed to load"
100
 
101
- if not gen_text or gen_text.strip() == "":
102
- return None, "Error: Please provide text to generate"
 
103
 
104
- # Move model to GPU (ZeroGPU will handle this)
105
- device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
106
- model = model.to(device)
 
 
107
 
108
- # Prepare style image - must match eruku_continuous_inf.py logic
109
- if style_image is not None:
110
- # Convert PIL image to RGB (VAE expects 3 channels)
111
- style_img_pil = style_image.convert('RGB')
112
- style_img_tensor = torch.from_numpy(np.array(style_img_pil)).float()
113
- # Normalize to [-1, 1] range
114
- style_img_tensor = (style_img_tensor / 127.5) - 1.0
115
- # Rearrange from (h, w, c) to (c, h, w)
116
- style_img_tensor = style_img_tensor.permute(2, 0, 1).to(device)
117
- style_img = [style_img_tensor]
118
- style_len = style_img_tensor.shape[-1]
119
- else:
120
- # Use minimal style image if none provided (3 channels, c h w format)
121
- style_img = [torch.ones(3, 64, 64).to(device)]
122
- style_len = 64
123
 
124
- # Encode inputs
125
- with torch.no_grad():
126
- inputs = model.get_model_inputs(
127
- style_img=style_img,
128
- gen_img=None,
129
- style_len=style_len,
130
- gen_len=None,
131
- max_img_len=128 * 8
132
- )
133
-
134
- # Generate
135
- output_img, special_sequence = model.generate(
136
- decoder_inputs_embeds_vae=inputs['decoder_inputs_embeds'],
137
- style_text=[style_text if style_text else ""],
138
- gen_text=[gen_text],
139
- cfg_scale=cfg_scale,
140
- max_new_tokens=128
141
- )
142
-
143
- # Convert to PIL Image
144
- output_img = output_img.cpu()
145
- output_img = (torch.clamp(output_img, -1, 1) + 1) * 127.5
146
- output_img = output_img.byte().squeeze().numpy()
147
-
148
- # Handle different dimensions
149
- if len(output_img.shape) == 2:
150
- pil_img = Image.fromarray(output_img, mode='L')
151
- elif output_img.shape[0] == 3:
152
- output_img = np.transpose(output_img, (1, 2, 0))
153
- pil_img = Image.fromarray(output_img, mode='RGB')
154
- else:
155
- pil_img = Image.fromarray(output_img[0], mode='L')
156
-
157
- return pil_img, "Generation successful!"
158
-
159
  except Exception as e:
160
- print(f"Error during generation: {e}")
161
  import traceback
162
  traceback.print_exc()
163
- return None, f"Error: {str(e)}"
164
 
165
- # Create Gradio interface
166
- with gr.Blocks(theme=gr.themes.Soft()) as demo:
167
- gr.Markdown("""
168
- # 🖋️ Eruku - Autoregressive Styled Text Image Generation
169
-
170
- Generate handwritten and styled text using **Eruku**, a state-of-the-art autoregressive model.
171
 
172
- Based on the paper: [**"Autoregressive Styled Text Image Generation, but Make it Reliable"**](https://arxiv.org/abs/2510.23240)
173
- 📄 [Paper](https://arxiv.org/abs/2510.23240) | 🌐 [Website](https://eruku.carminezacc.com)
174
 
175
- ### Key Features
176
- - **No style transcription required** - Just provide text to generate
177
- - 🎯 **Reliable generation** - Proper stop mechanism prevents artifacts
178
- - 📏 **Arbitrary length** - Generate text of any length
179
- - 🎨 **High fidelity** - Excellent style consistency and text readability
180
- - ⚡ **Classifier-Free Guidance** - Fine control over generation
181
 
182
- **How to use:**
183
- 1. Upload a style image (handwritten or typewritten sample) OR leave empty for default
184
- 2. Optionally enter the text from the style image (helps with style transfer)
185
- 3. Enter the text you want to generate in that style
186
- 4. Adjust CFG scale for **text guidance** (1.0 = almost unconstrained, higher = more literal prompt following)
187
- 5. Click Generate!
188
 
189
- ⚠️ **Note:** This demo uses ZeroGPU, so generation may take a moment while the GPU spins up.
190
  """)
191
 
192
  with gr.Row():
193
- with gr.Column():
194
  style_image_input = gr.Image(
195
- label="Style Image (Optional)",
196
  type="pil",
197
- sources=["upload"],
198
- height=150
 
199
  )
 
200
  style_text_input = gr.Textbox(
201
- label="Style Text (Optional)",
202
- placeholder="Text from the style image (if any)",
203
  lines=2,
204
- value=""
205
  )
 
206
  gen_text_input = gr.Textbox(
207
- label="Text to Generate",
208
- placeholder="Enter the text you want in the same style",
209
  lines=3,
210
- value="Hello World!"
211
  )
212
 
213
- cfg_scale = gr.Slider(
214
- minimum=1.0,
215
- maximum=5.0,
216
- value=1.5,
217
- step=0.1,
218
- label="CFG Scale",
219
- info="Controls text guidance strength (higher = more literal prompt)"
220
  )
221
-
222
- generate_btn = gr.Button("Generate ✨", variant="primary", size="lg")
223
 
224
- with gr.Column():
225
  output_image = gr.Image(
226
- label="Generated Handwriting",
227
- type="pil"
228
  )
229
- status_text = gr.Textbox(label="Status", lines=2)
230
 
231
- # Examples with sample images
232
- gr.Examples(
233
- examples=[
234
- ["example_images/handwritten_1.png", "", "The quick brown fox jumps over the lazy dog", 1.5],
235
- ["example_images/handwritten_2.png", "", "Hello from handwritten style!", 1.5],
236
- ["example_images/handwritten_3.png", "", "Artificial Intelligence and Machine Learning", 2.0],
237
- ["example_images/typewritten_1.png", "", "This is typewritten style generation", 1.5],
238
- ],
239
- inputs=[style_image_input, style_text_input, gen_text_input, cfg_scale],
240
- label="Example Styles"
241
- )
242
 
243
  # Connect the generation function
244
  generate_btn.click(
245
  fn=generate_handwriting,
246
- inputs=[style_image_input, style_text_input, gen_text_input, cfg_scale],
247
  outputs=[output_image, status_text]
248
  )
249
 
250
  gr.Markdown("""
251
  ---
252
- ### About Eruku
253
 
254
- **Eruku** is an autoregressive model for styled text image generation from the paper
255
- ["Autoregressive Styled Text Image Generation, but Make it Reliable"](https://arxiv.org/abs/2510.23240)
256
- by Carmine Zaccagnino, Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Alessio Tonioni, and Rita Cucchiara
 
257
 
258
- The model improves upon previous approaches by:
259
- - Eliminating the need for style text transcriptions
260
- - Introducing special tokens (SOG/EOG) for better text-visual alignment
261
- - Implementing Classifier-Free Guidance for autoregressive generation
262
- - Providing more reliable generation with proper stop mechanisms
263
 
264
- **Tips:**
265
- - Upload a style image to mimic its handwriting or font style
266
- - Style text helps but is optional - the model can work with images alone!
267
- - Higher CFG scales enforce the prompt more strongly (use lower values if you want the style image to dominate)
268
- - The model generalizes well to both handwritten and typewritten styles
269
- - Token budget is fixed to 128 for consistent results
270
 
271
- **Links:**
272
- 📄 [Paper](https://arxiv.org/abs/2510.23240) | 🌐 [Website](https://eruku.carminezacc.com) | 🤗 [Model](https://huggingface.co/blowing-up-groundhogs/eruku)
273
  """)
274
 
275
- # Launch the app
276
  if __name__ == "__main__":
277
- demo.queue(max_size=20)
278
  demo.launch()
279
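The removed preprocessing code maps 8-bit pixels to the [-1, 1] range with `x / 127.5 - 1` and converts model output back with `(clamp(x, -1, 1) + 1) * 127.5` followed by `.byte()`. Below is a minimal, dependency-free sketch of that round trip on single values. The function names are hypothetical, and the original operates on torch tensors.

```python
def to_model_range(pixel: int) -> float:
    """Map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    return pixel / 127.5 - 1.0


def to_pixel(value: float) -> int:
    """Map a model output back to [0, 255].

    Values outside [-1, 1] are clamped first; int() truncates toward zero,
    mirroring a plain float-to-uint8 cast.
    """
    clamped = max(-1.0, min(1.0, value))
    return int((clamped + 1.0) * 127.5)
```

Because of floating-point error combined with truncation, a round trip `to_pixel(to_model_range(p))` may land one level below `p`. Using `round()` would avoid this, but would no longer mirror the cast used in the original code.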
 
 
1
+ """
2
+ Eruku Demo - Autoregressive Styled Text Image Generation
3
+
4
+ This Gradio demo showcases the Eruku model for generating handwritten and
5
+ styled text images. Upload a style reference image and generate text in that style.
6
+ """
7
+
8
  import gradio as gr
9
  import torch
10
  import spaces
11
+ import json
12
  from pathlib import Path
13
  from PIL import Image
14
  import numpy as np
 
 
 
 
15
 
16
  # Global model variable
17
  model = None
18
+ DEVICE = None
19
+
20
 
21
  def load_model():
22
+ """Load the Eruku model from HuggingFace."""
23
+ global model, DEVICE
24
+
25
  if model is None:
26
+ print("Loading Eruku model...")
27
+ from transformers import AutoModel
28
 
29
+ DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
30
 
31
+ model = AutoModel.from_pretrained(
32
+ "blowing-up-groundhogs/eruku",
33
+ trust_remote_code=True
34
  )
35
+ model.to(DEVICE)
36
+ model.eval()
37
 
38
+ print(f"✅ Model loaded on {DEVICE}")
39
 
40
  return model
41
 
42
+
43
+ def load_examples():
44
+ """Load example samples from examples/samples.json if it exists."""
45
+ examples_file = Path("examples/samples.json")
46
+
47
+ if examples_file.exists():
48
+ with open(examples_file, "r") as f:
49
+ samples = json.load(f)
50
+
51
+ # Convert to Gradio examples format
52
+ examples = []
53
+ for sample in samples:
54
+ img_path = sample.get("style_image", "")
55
+ if img_path and Path(img_path).exists():
56
+ examples.append([
57
+ img_path,
58
+ sample.get("style_text", ""),
59
+ sample.get("gen_text", "Hello World!")
60
+ ])
61
+
62
+ return examples if examples else get_default_examples()
63
+
64
+ return get_default_examples()
65
+
66
+
67
+ def get_default_examples():
68
+ """Return default examples if no samples.json exists."""
69
+ examples = []
70
+ examples_dir = Path("examples")
71
+
72
+ if examples_dir.exists():
73
+ for img_file in sorted(examples_dir.glob("*.png")) + sorted(examples_dir.glob("*.jpg")):
74
+ examples.append([
75
+ str(img_file),
76
+ "",
77
+ "The quick brown fox jumps over the lazy dog"
78
+ ])
79
+
80
+ return examples
81
+
82
+
83
  @spaces.GPU
84
+ def generate_handwriting(
85
+ style_image: Image.Image,
86
+ style_text: str,
87
+ gen_text: str,
88
+ progress=gr.Progress(track_tqdm=True)
89
+ ):
90
  """
91
+ Generate handwriting in the style of the input image.
92
 
93
  Args:
94
+ style_image: Style reference image (PIL Image)
95
+ style_text: Optional transcription of text in style image
96
+ gen_text: Text to generate
97
+
 
 
98
  Returns:
99
+ Tuple of (generated_image, status_message)
100
  """
101
  try:
102
+ # Validate inputs
103
+ if not gen_text or gen_text.strip() == "":
104
+ return None, "❌ Error: Please provide text to generate"
105
+
106
+ if style_image is None:
107
+ return None, "❌ Error: Please upload a style image"
108
+
109
+ # Load model
110
  model = load_model()
111
 
112
  if model is None:
113
+ return None, "Error: Model failed to load"
114
 
115
+ # Preprocess style image
116
+ style_img = style_image.convert('RGB')
117
+ width, height = style_img.size
118
 
119
+ # Resize to height 64
120
+ new_width = int(64 * width / height)
121
+ # Ensure minimum width
122
+ new_width = max(new_width, 64)
123
+ style_img = style_img.resize((new_width, 64), Image.LANCZOS)
124
 
125
+ # Generate
126
+ progress(0.3, desc="Generating...")
127
+
128
+ result = model.generate_handwriting(
129
+ style_image=style_img,
130
+ gen_text=gen_text,
131
+ style_text=style_text if style_text else "",
132
+ cfg_scale=1.25, # Fixed CFG scale; this version exposes no CFG slider
133
+ max_new_tokens=512
134
+ )
135
+
136
+ progress(1.0, desc="Done!")
137
+
138
+ return result, "✅ Generation successful!"
 
139
 
140
  except Exception as e:
 
141
  import traceback
142
  traceback.print_exc()
143
+ return None, f"Error: {str(e)}"
144
 
145
+
146
+ # Build the Gradio interface
147
+ with gr.Blocks(
148
+ theme=gr.themes.Soft(),
149
+ title="Eruku - Styled Text Generation",
150
+ css="""
151
+ .main-title {
152
+ text-align: center;
153
+ margin-bottom: 1rem;
154
+ }
155
+ .feature-list {
156
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
157
+ border-radius: 10px;
158
+ padding: 1rem;
159
+ color: white;
160
+ margin: 1rem 0;
161
+ }
162
+ """
163
+ ) as demo:
164
 
165
+ gr.HTML("""
166
+ <div class="main-title">
167
+ <h1>🖋️ Eruku - Autoregressive Styled Text Image Generation</h1>
168
+ <p style="font-size: 1.1em; color: #666;">
169
+ Generate handwritten and styled text using a state-of-the-art autoregressive model
170
+ </p>
171
+ </div>
172
+ """)
173
 
174
+ gr.Markdown("""
175
+ Based on the papers:
176
+ - **"Zero-Shot Styled Text Image Generation, but Make It Autoregressive"** (CVPR 2025)
177
+ - [**"Autoregressive Styled Text Image Generation, but Make it Reliable"**](https://arxiv.org/abs/2510.23240) (WACV 2026)
 
 
178
 
179
+ 📄 [Paper](https://arxiv.org/abs/2510.23240) | 🌐 [Website](https://eruku.carminezacc.com) | 🤗 [Model](https://huggingface.co/blowing-up-groundhogs/eruku)
180
+ """)
 
 
 
 
181
 
182
+ gr.HTML("""
183
+ <div class="feature-list">
184
+ <b>✨ Key Features:</b>
185
+ <ul style="margin: 0.5rem 0; padding-left: 1.5rem;">
186
+ <li><b>Zero-shot style transfer</b> - Works with any handwriting style</li>
187
+ <li><b>No transcription required</b> - Style text is optional</li>
188
+ <li><b>Reliable generation</b> - Proper stop mechanism prevents artifacts</li>
189
+ <li><b>Arbitrary length</b> - Generate text of any length</li>
190
+ </ul>
191
+ </div>
192
  """)
193
 
194
  with gr.Row():
195
+ with gr.Column(scale=1):
196
  style_image_input = gr.Image(
197
+ label="📷 Style Image",
198
  type="pil",
199
+ sources=["upload", "clipboard"],
200
+ height=200,
201
+ elem_id="style-image"
202
  )
203
+
204
  style_text_input = gr.Textbox(
205
+ label="📝 Style Text (Optional)",
206
+ placeholder="Text visible in the style image (helps with style extraction)",
207
  lines=2,
208
+ info="Providing the transcription of text in the style image can improve results"
209
  )
210
+
211
  gen_text_input = gr.Textbox(
212
+ label="✍️ Text to Generate",
213
+ placeholder="Enter the text you want to generate in this style",
214
  lines=3,
215
+ value="Hello, World!"
216
  )
217
 
218
+ generate_btn = gr.Button(
219
+ "🚀 Generate",
220
+ variant="primary",
221
+ size="lg"
 
 
 
222
  )
 
 
223
 
224
+ with gr.Column(scale=1):
225
  output_image = gr.Image(
226
+ label="🎨 Generated Output",
227
+ type="pil",
228
+ height=200
229
+ )
230
+
231
+ status_text = gr.Textbox(
232
+ label="Status",
233
+ lines=2,
234
+ interactive=False
235
  )
 
236
 
237
+ # Load examples
238
+ examples = load_examples()
239
+ if examples:
240
+ gr.Examples(
241
+ examples=examples,
242
+ inputs=[style_image_input, style_text_input, gen_text_input],
243
+ label="📚 Example Styles",
244
+ examples_per_page=5
245
+ )
 
 
246
 
247
  # Connect the generation function
248
  generate_btn.click(
249
  fn=generate_handwriting,
250
+ inputs=[style_image_input, style_text_input, gen_text_input],
251
+ outputs=[output_image, status_text]
252
+ )
253
+
254
+ # Also trigger on Enter in gen_text
255
+ gen_text_input.submit(
256
+ fn=generate_handwriting,
257
+ inputs=[style_image_input, style_text_input, gen_text_input],
258
  outputs=[output_image, status_text]
259
  )
260
 
261
  gr.Markdown("""
262
  ---
263
+ ### 📖 How to Use
264
 
265
+ 1. **Upload a style image**: A sample of handwriting or typewritten text whose style you want to replicate
266
+ 2. **Enter style text** (optional): The text that appears in your style image - this helps the model understand the style better
267
+ 3. **Enter generation text**: The text you want to render in the extracted style
268
+ 4. **Click Generate**: The model will produce text in the style of your reference image
269
 
270
+ ### 💡 Tips
 
 
 
 
271
 
272
+ - **Better style images**: Clear, well-contrasted images work best
273
+ - **Style text helps**: While optional, providing the transcription improves style extraction
274
+ - **Length**: The model has no fixed length limit, but this demo caps generation at `max_new_tokens=512`, so very long texts take longer and may be truncated
 
 
 
275
 
276
+ ---
277
+
278
+ ### 📚 Citation
279
+
280
+ If you use this model in your research, please cite:
281
+
282
+ ```bibtex
283
+ @InProceedings{pippi2025zeroshot,
284
+ author = {Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Tonioni, Alessio and Cucchiara, Rita},
285
+ title = {Zero-Shot Styled Text Image Generation, but Make It Autoregressive},
286
+ booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
287
+ month = {June},
288
+ year = {2025},
289
+ pages = {7910--7919}
290
+ }
291
+
292
+ @inproceedings{zaccagnino2026autoregressive,
293
+ author = {Carmine Zaccagnino and Fabio Quattrini and Vittorio Pippi and Silvia Cascianelli and Alessio Tonioni and Rita Cucchiara},
294
+ title = {Autoregressive Styled Text Image Generation, but Make it Reliable},
295
+ booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
296
+ month = {March},
297
+ year = {2026}
298
+ }
299
+ ```
300
  """)
301
 
302
+
303
  if __name__ == "__main__":
304
+ demo.queue(max_size=10)
305
  demo.launch()
306
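The new preprocessing resizes every style image to a height of 64 pixels, scales the width by the same factor, and enforces a minimum width of 64. Below, the size arithmetic is separated out so it can be checked without loading the model. `style_target_size` and `preprocess_style_image` are hypothetical helper names, and the zero-dimension guard is an addition the original lacks. The Pillow calls (`convert`, `resize`, `Image.LANCZOS`) are the same ones app.py uses.

```python
from typing import Tuple

TARGET_HEIGHT = 64
MIN_WIDTH = 64


def style_target_size(width: int, height: int) -> Tuple[int, int]:
    """Return (new_width, new_height) following the app.py resize rules."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Keep the aspect ratio; int() truncates like the original code.
    new_width = int(TARGET_HEIGHT * width / height)
    return max(new_width, MIN_WIDTH), TARGET_HEIGHT


def preprocess_style_image(img):
    """Convert a PIL image to RGB and resize it to the target size."""
    from PIL import Image  # lazy import: the size helper above needs only stdlib

    rgb = img.convert("RGB")
    return rgb.resize(style_target_size(*rgb.size), Image.LANCZOS)
```

For example, a 256×32 line image becomes 512×64, while a 100×200 crop is widened to the 64×64 minimum.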
 
examples/README.md ADDED
@@ -0,0 +1,32 @@
1
+ # Example Samples
2
+
3
+ This directory contains example style images for the Eruku demo.
4
+
5
+ ## Adding New Samples
6
+
7
+ To add new samples to the demo:
8
+
9
+ 1. Add your style image (PNG or JPG) to this `examples/` directory
10
+ 2. Edit `samples.json` and append a new object to its top-level JSON array:
11
+
12
+ ```json
13
+ {
14
+ "style_image": "examples/your_image.png",
15
+ "style_text": "Text visible in the style image (optional but recommended)",
16
+ "gen_text": "Default text to generate with this style"
17
+ }
18
+ ```
19
+
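Step 2 can also be scripted. The following is a minimal stdlib-only sketch that assumes `samples.json` is the top-level JSON array added in this commit. `add_sample` is a hypothetical helper and not part of the Space.

```python
import json
from pathlib import Path


def add_sample(samples_file: Path, style_image: str, gen_text: str, style_text: str = "") -> int:
    """Append one entry to samples.json and return the new number of entries."""
    samples = json.loads(samples_file.read_text()) if samples_file.exists() else []
    if not isinstance(samples, list):
        raise ValueError(f"{samples_file} must contain a JSON array")
    samples.append({
        "style_image": style_image,
        "style_text": style_text,
        "gen_text": gen_text,
    })
    samples_file.write_text(json.dumps(samples, indent=2) + "\n")
    return len(samples)
```

Usage: `add_sample(Path("examples/samples.json"), "examples/my_style.png", "Hello!")`. The image file itself still has to be copied into `examples/`.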
20
+ ## File Format
21
+
22
+ - **style_image**: Path to the image file (relative to space root)
23
+ - **style_text**: The text transcription of what's written in the style image. Leave empty `""` if unknown, but providing it improves results.
24
+ - **gen_text**: The default text that will appear in the "Text to Generate" field when this example is selected
25
+
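These fields map onto Gradio example rows in the order `[style_image, style_text, gen_text]`. `load_examples()` in app.py skips any entry whose image file is missing. The sketch below reproduces that filtering as a standalone function, so a `samples.json` can be checked before deploying. `samples_to_rows` and its `root` parameter are hypothetical. The fallback that app.py performs when no row survives (scanning `examples/` for `.png`/`.jpg` files) is left out.

```python
import json
from pathlib import Path
from typing import List


def samples_to_rows(samples_file: Path, root: Path = Path(".")) -> List[List[str]]:
    """Turn samples.json entries into [style_image, style_text, gen_text] rows.

    Entries without a style_image, or whose image does not exist under root,
    are skipped, matching the filtering in load_examples().
    """
    rows = []
    for sample in json.loads(samples_file.read_text()):
        img_path = sample.get("style_image", "")
        if img_path and (root / img_path).exists():
            rows.append([
                img_path,
                sample.get("style_text", ""),
                sample.get("gen_text", "Hello World!"),
            ])
    return rows
```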
26
+ ## Tips for Good Style Images
27
+
28
+ 1. **Clear contrast**: Black text on white/light background works best
29
+ 2. **Single line**: One line of text is ideal
30
+ 3. **Consistent style**: The whole image should be in the same handwriting style
31
+ 4. **Height**: Images will be resized to height 64, so ensure text is readable at that scale
32
+
examples/handwritten_1.png ADDED
examples/handwritten_2.png ADDED
examples/handwritten_3.png ADDED
examples/samples.json ADDED
@@ -0,0 +1,23 @@
1
+ [
2
+ {
3
+ "style_image": "examples/handwritten_1.png",
4
+ "style_text": "",
5
+ "gen_text": "The quick brown fox jumps over the lazy dog"
6
+ },
7
+ {
8
+ "style_image": "examples/handwritten_2.png",
9
+ "style_text": "",
10
+ "gen_text": "Hello from the Eruku model!"
11
+ },
12
+ {
13
+ "style_image": "examples/handwritten_3.png",
14
+ "style_text": "",
15
+ "gen_text": "Artificial Intelligence and Machine Learning"
16
+ },
17
+ {
18
+ "style_image": "examples/typewritten_1.png",
19
+ "style_text": "",
20
+ "gen_text": "This is typewritten style generation"
21
+ }
22
+ ]
23
+
examples/typewritten_1.png ADDED
requirements.txt CHANGED
@@ -1,17 +1,10 @@
1
- # Eruku dependencies - Updated 2025-11-15
2
- # Critical: diffusers>=0.27.0 required for huggingface_hub compatibility
3
- # Critical: transformers>=4.38.0 required for tokenizers>=0.15 (fixes huggingface_hub conflict)
4
- gradio==4.44.0
5
- torch==2.1.0
6
- torchvision==0.16.0
7
- transformers==4.40.0
8
- diffusers==0.27.0
9
- accelerate==0.25.0
10
- einops==0.7.0
11
- numpy==1.24.3
12
- Pillow==10.1.0
13
- huggingface_hub>=0.19.3
14
- spaces==0.28.3
15
- sentencepiece==0.1.99
16
- protobuf==3.20.3
17
 
 
1
+ torch>=2.0.0
2
+ torchvision>=0.15.0
3
+ transformers>=4.40.0
4
+ diffusers>=0.25.0
5
+ einops>=0.7.0
6
+ pillow>=10.0.0
7
+ gradio>=4.0.0
8
+ spaces>=0.19.0
9
+ numpy<2.0.0
10