Spaces: carminezacc/eruku


carminezacc committed on Nov 27, 2025

Commit dd08e95 (verified) · 1 Parent(s): f092d2e

Upload folder using huggingface_hub


Files changed (9)

README.md +29 -89
app.py +229 -202
examples/README.md +32 -0
examples/handwritten_1.png +0 -0
examples/handwritten_2.png +0 -0
examples/handwritten_3.png +0 -0
examples/samples.json +23 -0
examples/typewritten_1.png +0 -0
requirements.txt +9 -16

README.md CHANGED

@@ -1,91 +1,43 @@
 ---
-title: Eruku - Autoregressive Handwriting Generation
 emoji: 🖋️
-colorFrom: blue
-colorTo: purple
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app.py
-pinned: false
-license: mit
-python_version: 3.10
 ---
-# 🖋️ Eruku - Autoregressive Styled Text Image Generation
-Generate realistic handwritten and styled text using the Eruku model!
-[![arXiv](https://img.shields.io/badge/arXiv-2510.23240-b31b1b.svg)](https://arxiv.org/abs/2510.23240)
-[![Website](https://img.shields.io/badge/Website-eruku.carminezacc.com-blue)](https://eruku.carminezacc.com)
-## About
-**Eruku** is a state-of-the-art autoregressive model for styled text image generation, particularly excelling at handwritten text generation (HTG). Based on the paper ["Autoregressive Styled Text Image Generation, but Make it Reliable"](https://arxiv.org/abs/2510.23240) by Carmine Zaccagnino, Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Alessio Tonioni, and Rita Cucchiara, Eruku addresses key limitations of previous approaches while maintaining their strengths.
-### Key Innovations
-- **No Style Text Required**: Unlike previous methods, Eruku doesn't require transcriptions of style images, making it more practical for real-world use
-- **Reliable Generation**: Proper stop mechanism prevents repetition loops and visual artifacts
-- **Special Token Alignment**: Introduces special textual tokens (SOG/EOG) for better alignment between text and visual representations
-- **Classifier-Free Guidance**: Implements CFG for improved control over style adherence and text fidelity
-- **Arbitrary Length**: Can generate text images of any length without architectural constraints
-### Architecture
-The model combines:
-- **T5 Transformer**: Autoregressive text encoder for understanding and generation control
-- **VAE (Variational Autoencoder)**: Efficient image tokenizer for converting between pixel space and latent representations
-- **Autoregressive Decoder**: Generates visual embeddings sequentially for smooth, natural-looking text
-## How to Use
-1. **Style Image** (Optional): Upload a handwriting or typewritten sample image to mimic its style
-2. **Style Text** (Optional): Enter the text from the style image (helps with style transfer)
-3. **Text to Generate**: Enter the text you want to see in the chosen style
-4. **CFG Scale (Text Guidance)**:
-   - 1.0 = Very loose guidance (style image dominates)
-   - 2.0-3.0 = Balanced
-   - Higher values = More literal prompt following
-5. Click **Generate** and wait for your styled text! (Max token budget fixed at 128 for stability)
 ## Features
-- ✨ High-quality handwriting synthesis
-- 🎨 Style control through reference images and text
-- 🧭 CFG slider controls **text** guidance strength
-- 📝 Works with both handwritten and typewritten styles
-- ⚡ Powered by ZeroGPU for free GPU access
-- 🔧 Adjustable generation parameters
-## Technical Details
-The model uses:
-- T5-base as the backbone language model
-- Custom VAE for image generation
-- Continuous latent space representation
-- Autoregressive generation with special tokens
-## Tips for Best Results
-- Upload a clear style image for better style mimicking
-- Keep text relatively short for best quality
-- Adjust CFG scale: lower for more style influence, higher for stricter text adherence
-- Style text is optional - the model works well without it
-- Token budget is fixed at 128 to match the research code's sweet spot
-## Performance Highlights
-From the paper, Eruku demonstrates:
-- **Superior Text Adherence**: Lower Character Error Rate (CER) compared to previous methods
-- **Better Generalization**: Excellent performance on both handwritten and typewritten styles
-- **Style Consistency**: High-fidelity style replication while maintaining readability
-- **Efficient Training**: Simpler training process without requiring auxiliary networks
 ## Citation
-If you use Eruku in your research, please cite:
 ```bibtex
 @InProceedings{pippi2025zeroshot,
     author    = {Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Tonioni, Alessio and Cucchiara, Rita},
@@ -100,26 +52,14 @@ If you use Eruku in your research, please cite:
     author = {Carmine Zaccagnino and Fabio Quattrini and Vittorio Pippi and Silvia Cascianelli and Alessio Tonioni and Rita Cucchiara},
     title = {Autoregressive Styled Text Image Generation, but Make it Reliable},
     booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
-    month=3,
-    year = 2026
 }
 ```
 ## Links
-- 📄 **Paper**: [arXiv:2510.23240](https://arxiv.org/abs/2510.23240)
-- 🌐 **Website**: [eruku.carminezacc.com](https://eruku.carminezacc.com)
-- 🤗 **Model**: [blowing-up-groundhogs/eruku](https://huggingface.co/blowing-up-groundhogs/eruku)
-- 💻 **Code**: Coming soon!
-## License
-This project is licensed under the MIT License.
-## Acknowledgments
-Built with:
-- [Gradio](https://gradio.app/) for the web interface
-- [Hugging Face](https://huggingface.co/) for model hosting and Spaces
-- [ZeroGPU](https://huggingface.co/zero-gpu-explorers) for free GPU access

 ---
+title: Eruku - Styled Text Generation
 emoji: 🖋️
+colorFrom: purple
+colorTo: blue
 sdk: gradio
 sdk_version: 4.44.0
 app_file: app.py
+pinned: true
+license: apache-2.0
+models:
+  - blowing-up-groundhogs/eruku
+tags:
+  - handwriting-generation
+  - styled-text
+  - text-to-image
+  - autoregressive
+short_description: Generate handwritten text in any style
 ---
+# Eruku - Autoregressive Styled Text Image Generation
+This Space demonstrates **Eruku**, a state-of-the-art model for generating handwritten and styled text images.
 ## Features
+- **Zero-shot style transfer**: No training needed for new styles
+- **No transcription required**: Works with just a style image
+- **Reliable generation**: Proper EOG mechanism prevents artifacts
+- **Arbitrary length**: Generate text of any length
+## How to Use
+1. Upload a handwriting or font sample as the style image
+2. Optionally enter the text from the style image
+3. Enter the text you want to generate
+4. Click Generate!
 ## Citation
 ```bibtex
 @InProceedings{pippi2025zeroshot,
     author    = {Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Tonioni, Alessio and Cucchiara, Rita},
     author = {Carmine Zaccagnino and Fabio Quattrini and Vittorio Pippi and Silvia Cascianelli and Alessio Tonioni and Rita Cucchiara},
     title = {Autoregressive Styled Text Image Generation, but Make it Reliable},
     booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
+    month = {March},
+    year = {2026}
 }
 ```
 ## Links
+- 📄 [Paper](https://arxiv.org/abs/2510.23240)
+- 🌐 [Website](https://eruku.carminezacc.com)
+- 🤗 [Model](https://huggingface.co/blowing-up-groundhogs/eruku)

app.py CHANGED

@@ -1,279 +1,306 @@
 import gradio as gr
 import torch
 import spaces
-import os
 from pathlib import Path
 from PIL import Image
 import numpy as np
-from huggingface_hub import hf_hub_download
-# Import the model
-from eruku_continuous_inf import Emuru
 # Global model variable
 model = None
 def load_model():
-    """Load the Emuru model with checkpoints"""
-    global model
     if model is None:
-        print("Loading model...")
-        # Model repository
-        MODEL_REPO = "blowing-up-groundhogs/eruku"
-        t5_checkpoint = 'google-t5/t5-base'
-        vae_checkpoint = 'blowing-up-groundhogs/emuru_vae'
-        # Download OCR checkpoint
-        print("Downloading OCR checkpoint...")
-        ocr_checkpoint_path = hf_hub_download(
-            repo_id=MODEL_REPO,
-            filename="origami.pth",
-            cache_dir="./checkpoints"
         )
-        try:
-            print("Initializing model...")
-            model = Emuru(
-                t5_checkpoint=t5_checkpoint,
-                vae_checkpoint=vae_checkpoint,
-                ocr_checkpoint=ocr_checkpoint_path,
-                slices_per_query=1,
-                channels=1
-            )
-            # Download and load trained model checkpoint
-            print("Downloading trained model checkpoint...")
-            trained_checkpoint_path = hf_hub_download(
-                repo_id=MODEL_REPO,
-                filename="000073688.pth",
-                cache_dir="./checkpoints"
-            )
-            print(f"Loading trained weights from {trained_checkpoint_path}...")
-            checkpoint = torch.load(trained_checkpoint_path, map_location='cpu', weights_only=False)
-            # Load the state dict - handle different checkpoint formats
-            if isinstance(checkpoint, dict):
-                if 'model_state_dict' in checkpoint:
-                    model.load_state_dict(checkpoint['model_state_dict'], strict=False)
-                elif 'state_dict' in checkpoint:
-                    model.load_state_dict(checkpoint['state_dict'], strict=False)
-                else:
-                    # Assume the checkpoint itself is the state dict
-                    model.load_state_dict(checkpoint, strict=False)
-            else:
-                print("Warning: Unexpected checkpoint format")
-            model.eval()
-            print("✅ Model loaded successfully!")
-        except Exception as e:
-            print(f"❌ Error loading model: {e}")
-            import traceback
-            traceback.print_exc()
-            raise e
     return model
 @spaces.GPU
-def generate_handwriting(style_image, style_text, gen_text, cfg_scale=1.5):
     """
-    Generate handwriting based on style image and generation text
     Args:
-        style_image: PIL Image or None - style reference image
-        style_text: Text from the style image (optional)
-        gen_text: Text to generate in handwriting
-        cfg_scale: Classifier-free guidance scale (1.0 = no guidance)
-        max_tokens: Maximum number of tokens to generate
     Returns:
-        PIL Image of generated handwriting
     """
     try:
         model = load_model()
         if model is None:
-            return None, "Error: Model failed to load"
-        if not gen_text or gen_text.strip() == "":
-            return None, "Error: Please provide text to generate"
-        # Move model to GPU (ZeroGPU will handle this)
-        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-        model = model.to(device)
-        # Prepare style image - must match eruku_continuous_inf.py logic
-        if style_image is not None:
-            # Convert PIL image to RGB (VAE expects 3 channels)
-            style_img_pil = style_image.convert('RGB')
-            style_img_tensor = torch.from_numpy(np.array(style_img_pil)).float()
-            # Normalize to [-1, 1] range
-            style_img_tensor = (style_img_tensor / 127.5) - 1.0
-            # Rearrange from (h, w, c) to (c, h, w)
-            style_img_tensor = style_img_tensor.permute(2, 0, 1).to(device)
-            style_img = [style_img_tensor]
-            style_len = style_img_tensor.shape[-1]
-        else:
-            # Use minimal style image if none provided (3 channels, c h w format)
-            style_img = [torch.ones(3, 64, 64).to(device)]
-            style_len = 64
-        # Encode inputs
-        with torch.no_grad():
-            inputs = model.get_model_inputs(
-                style_img=style_img,
-                gen_img=None,
-                style_len=style_len,
-                gen_len=None,
-                max_img_len=128 * 8
-            )
-            # Generate
-            output_img, special_sequence = model.generate(
-                decoder_inputs_embeds_vae=inputs['decoder_inputs_embeds'],
-                style_text=[style_text if style_text else ""],
-                gen_text=[gen_text],
-                cfg_scale=cfg_scale,
-                max_new_tokens=128
-            )
-            # Convert to PIL Image
-            output_img = output_img.cpu()
-            output_img = (torch.clamp(output_img, -1, 1) + 1) * 127.5
-            output_img = output_img.byte().squeeze().numpy()
-            # Handle different dimensions
-            if len(output_img.shape) == 2:
-                pil_img = Image.fromarray(output_img, mode='L')
-            elif output_img.shape[0] == 3:
-                output_img = np.transpose(output_img, (1, 2, 0))
-                pil_img = Image.fromarray(output_img, mode='RGB')
-            else:
-                pil_img = Image.fromarray(output_img[0], mode='L')
-            return pil_img, "Generation successful!"
     except Exception as e:
-        print(f"Error during generation: {e}")
         import traceback
         traceback.print_exc()
-        return None, f"Error: {str(e)}"
-# Create Gradio interface
-with gr.Blocks(theme=gr.themes.Soft()) as demo:
-    gr.Markdown("""
-    # 🖋️ Eruku - Autoregressive Styled Text Image Generation
-    Generate handwritten and styled text using **Eruku**, a state-of-the-art autoregressive model.
-    Based on the paper: [**"Autoregressive Styled Text Image Generation, but Make it Reliable"**](https://arxiv.org/abs/2510.23240)
-    📄 [Paper](https://arxiv.org/abs/2510.23240) | 🌐 [Website](https://eruku.carminezacc.com)
-    ### Key Features
-    - ✨ **No style transcription required** - Just provide text to generate
-    - 🎯 **Reliable generation** - Proper stop mechanism prevents artifacts
-    - 📏 **Arbitrary length** - Generate text of any length
-    - 🎨 **High fidelity** - Excellent style consistency and text readability
-    - ⚡ **Classifier-Free Guidance** - Fine control over generation
-    **How to use:**
-    1. Upload a style image (handwritten or typewritten sample) OR leave empty for default
-    2. Optionally enter the text from the style image (helps with style transfer)
-    3. Enter the text you want to generate in that style
-    4. Adjust CFG scale for **text guidance** (1.0 = almost unconstrained, higher = more literal prompt following)
-    5. Click Generate!
-    ⚠️ **Note:** This demo uses ZeroGPU, so generation may take a moment while the GPU spins up.
     """)
     with gr.Row():
-        with gr.Column():
             style_image_input = gr.Image(
-                label="Style Image (Optional)",
                 type="pil",
-                sources=["upload"],
-                height=150
             )
             style_text_input = gr.Textbox(
-                label="Style Text (Optional)",
-                placeholder="Text from the style image (if any)",
                 lines=2,
-                value=""
             )
             gen_text_input = gr.Textbox(
-                label="Text to Generate",
-                placeholder="Enter the text you want in the same style",
                 lines=3,
-                value="Hello World!"
             )
-            cfg_scale = gr.Slider(
-                minimum=1.0,
-                maximum=5.0,
-                value=1.5,
-                step=0.1,
-                label="CFG Scale",
-                info="Controls text guidance strength (higher = more literal prompt)"
             )
-            generate_btn = gr.Button("Generate ✨", variant="primary", size="lg")
-        with gr.Column():
             output_image = gr.Image(
-                label="Generated Handwriting",
-                type="pil"
             )
-            status_text = gr.Textbox(label="Status", lines=2)
-    # Examples with sample images
-    gr.Examples(
-        examples=[
-            ["example_images/handwritten_1.png", "", "The quick brown fox jumps over the lazy dog", 1.5],
-            ["example_images/handwritten_2.png", "", "Hello from handwritten style!", 1.5],
-            ["example_images/handwritten_3.png", "", "Artificial Intelligence and Machine Learning", 2.0],
-            ["example_images/typewritten_1.png", "", "This is typewritten style generation", 1.5],
-        ],
-        inputs=[style_image_input, style_text_input, gen_text_input, cfg_scale],
-        label="Example Styles"
-    )
     # Connect the generation function
     generate_btn.click(
         fn=generate_handwriting,
-        inputs=[style_image_input, style_text_input, gen_text_input, cfg_scale],
         outputs=[output_image, status_text]
     )
     gr.Markdown("""
     ---
-    ### About Eruku
-    **Eruku** is an autoregressive model for styled text image generation from the paper
-    ["Autoregressive Styled Text Image Generation, but Make it Reliable"](https://arxiv.org/abs/2510.23240)
-    by Carmine Zaccagnino, Fabio Quattrini, Vittorio Pippi, Silvia Cascianelli, Alessio Tonioni, and Rita Cucchiara
-    The model improves upon previous approaches by:
-    - Eliminating the need for style text transcriptions
-    - Introducing special tokens (SOG/EOG) for better text-visual alignment
-    - Implementing Classifier-Free Guidance for autoregressive generation
-    - Providing more reliable generation with proper stop mechanisms
-    **Tips:**
-    - Upload a style image to mimic its handwriting or font style
-    - Style text helps but is optional - the model can work with images alone!
-    - Higher CFG scales enforce the prompt more strongly (use lower values if you want the style image to dominate)
-    - The model generalizes well to both handwritten and typewritten styles
-    - Token budget is fixed to 128 for consistent results
-    **Links:**
-    📄 [Paper](https://arxiv.org/abs/2510.23240) | 🌐 [Website](https://eruku.carminezacc.com) | 🤗 [Model](https://huggingface.co/blowing-up-groundhogs/eruku)
     """)
-# Launch the app
 if __name__ == "__main__":
-    demo.queue(max_size=20)
     demo.launch()

+"""
+Eruku Demo - Autoregressive Styled Text Image Generation
+This Gradio demo showcases the Eruku model for generating handwritten and
+styled text images. Upload a style reference image and generate text in that style.
+"""
 import gradio as gr
 import torch
 import spaces
+import json
 from pathlib import Path
 from PIL import Image
 import numpy as np
 # Global model variable
 model = None
+DEVICE = None
 def load_model():
+    """Load the Eruku model from HuggingFace."""
+    global model, DEVICE
     if model is None:
+        print("Loading Eruku model...")
+        from transformers import AutoModel
+        DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        model = AutoModel.from_pretrained(
+            "blowing-up-groundhogs/eruku",
+            trust_remote_code=True
         )
+        model.to(DEVICE)
+        model.eval()
+        print(f"✅ Model loaded on {DEVICE}")
     return model
+def load_examples():
+    """Load example samples from examples/samples.json if it exists."""
+    examples_file = Path("examples/samples.json")
+    if examples_file.exists():
+        with open(examples_file, "r") as f:
+            samples = json.load(f)
+        # Convert to Gradio examples format
+        examples = []
+        for sample in samples:
+            img_path = sample.get("style_image", "")
+            if img_path and Path(img_path).exists():
+                examples.append([
+                    img_path,
+                    sample.get("style_text", ""),
+                    sample.get("gen_text", "Hello World!")
+                ])
+        return examples if examples else get_default_examples()
+    return get_default_examples()
+def get_default_examples():
+    """Return default examples if no samples.json exists."""
+    examples = []
+    examples_dir = Path("examples")
+    if examples_dir.exists():
+        for img_file in sorted(examples_dir.glob("*.png")) + sorted(examples_dir.glob("*.jpg")):
+            examples.append([
+                str(img_file),
+                "",
+                "The quick brown fox jumps over the lazy dog"
+            ])
+    return examples
 @spaces.GPU
+def generate_handwriting(
+    style_image: Image.Image,
+    style_text: str,
+    gen_text: str,
+    progress=gr.Progress(track_tqdm=True)
+):
     """
+    Generate handwriting in the style of the input image.
     Args:
+        style_image: Style reference image (PIL Image)
+        style_text: Optional transcription of text in style image
+        gen_text: Text to generate
     Returns:
+        Tuple of (generated_image, status_message)
     """
     try:
+        # Validate inputs
+        if not gen_text or gen_text.strip() == "":
+            return None, "❌ Error: Please provide text to generate"
+        if style_image is None:
+            return None, "❌ Error: Please upload a style image"
+        # Load model
         model = load_model()
         if model is None:
+            return None, "❌ Error: Model failed to load"
+        # Preprocess style image
+        style_img = style_image.convert('RGB')
+        width, height = style_img.size
+        # Resize to height 64
+        new_width = int(64 * width / height)
+        # Ensure minimum width
+        new_width = max(new_width, 64)
+        style_img = style_img.resize((new_width, 64), Image.LANCZOS)
+        # Generate
+        progress(0.3, desc="Generating...")
+        result = model.generate_handwriting(
+            style_image=style_img,
+            gen_text=gen_text,
+            style_text=style_text if style_text else "",
+            cfg_scale=1.25,  # Fixed CFG scale as specified
+            max_new_tokens=512
+        )
+        progress(1.0, desc="Done!")
+        return result, "✅ Generation successful!"
     except Exception as e:
         import traceback
         traceback.print_exc()
+        return None, f"❌ Error: {str(e)}"
+# Build the Gradio interface
+with gr.Blocks(
+    theme=gr.themes.Soft(),
+    title="Eruku - Styled Text Generation",
+    css="""
+    .main-title {
+        text-align: center;
+        margin-bottom: 1rem;
+    }
+    .feature-list {
+        background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+        border-radius: 10px;
+        padding: 1rem;
+        color: white;
+        margin: 1rem 0;
+    }
+    """
+) as demo:
+    gr.HTML("""
+    <div class="main-title">
+        <h1>🖋️ Eruku - Autoregressive Styled Text Image Generation</h1>
+        <p style="font-size: 1.1em; color: #666;">
+            Generate handwritten and styled text using a state-of-the-art autoregressive model
+        </p>
+    </div>
+    """)
+    gr.Markdown("""
+    Based on the papers:
+    - **"Zero-Shot Styled Text Image Generation, but Make It Autoregressive"** (CVPR 2025)
+    - [**"Autoregressive Styled Text Image Generation, but Make it Reliable"**](https://arxiv.org/abs/2510.23240) (WACV 2026)
+    📄 [Paper](https://arxiv.org/abs/2510.23240) | 🌐 [Website](https://eruku.carminezacc.com) | 🤗 [Model](https://huggingface.co/blowing-up-groundhogs/eruku)
+    """)
+    gr.HTML("""
+    <div class="feature-list">
+        <b>✨ Key Features:</b>
+        <ul style="margin: 0.5rem 0; padding-left: 1.5rem;">
+            <li><b>Zero-shot style transfer</b> - Works with any handwriting style</li>
+            <li><b>No transcription required</b> - Style text is optional</li>
+            <li><b>Reliable generation</b> - Proper stop mechanism prevents artifacts</li>
+            <li><b>Arbitrary length</b> - Generate text of any length</li>
+        </ul>
+    </div>
     """)
     with gr.Row():
+        with gr.Column(scale=1):
             style_image_input = gr.Image(
+                label="📷 Style Image",
                 type="pil",
+                sources=["upload", "clipboard"],
+                height=200,
+                elem_id="style-image"
             )
             style_text_input = gr.Textbox(
+                label="📝 Style Text (Optional)",
+                placeholder="Text visible in the style image (helps with style extraction)",
                 lines=2,
+                info="Providing the transcription of text in the style image can improve results"
             )
             gen_text_input = gr.Textbox(
+                label="✍️ Text to Generate",
+                placeholder="Enter the text you want to generate in this style",
                 lines=3,
+                value="Hello, World!"
             )
+            generate_btn = gr.Button(
+                "🚀 Generate",
+                variant="primary",
+                size="lg"
             )
+        with gr.Column(scale=1):
             output_image = gr.Image(
+                label="🎨 Generated Output",
+                type="pil",
+                height=200
+            )
+            status_text = gr.Textbox(
+                label="Status",
+                lines=2,
+                interactive=False
             )
+    # Load examples
+    examples = load_examples()
+    if examples:
+        gr.Examples(
+            examples=examples,
+            inputs=[style_image_input, style_text_input, gen_text_input],
+            label="📚 Example Styles",
+            examples_per_page=5
+        )
     # Connect the generation function
     generate_btn.click(
         fn=generate_handwriting,
+        inputs=[style_image_input, style_text_input, gen_text_input],
+        outputs=[output_image, status_text]
+    )
+    # Also trigger on Enter in gen_text
+    gen_text_input.submit(
+        fn=generate_handwriting,
+        inputs=[style_image_input, style_text_input, gen_text_input],
         outputs=[output_image, status_text]
     )
     gr.Markdown("""
     ---
+    ### 📖 How to Use
+    1. **Upload a style image**: A sample of handwriting or typewritten text whose style you want to replicate
+    2. **Enter style text** (optional): The text that appears in your style image - this helps the model understand the style better
+    3. **Enter generation text**: The text you want to render in the extracted style
+    4. **Click Generate**: The model will produce text in the style of your reference image
+    ### 💡 Tips
+    - **Better style images**: Clear, well-contrasted images work best
+    - **Style text helps**: While optional, providing the transcription improves style extraction
+    - **Length**: The model handles text of any length, but very long texts may take more time
+    ---
+    ### 📚 Citation
+    If you use this model in your research, please cite:
+    ```bibtex
+    @InProceedings{pippi2025zeroshot,
+        author    = {Pippi, Vittorio and Quattrini, Fabio and Cascianelli, Silvia and Tonioni, Alessio and Cucchiara, Rita},
+        title     = {Zero-Shot Styled Text Image Generation, but Make It Autoregressive},
+        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+        month     = {June},
+        year      = {2025},
+        pages     = {7910-7919}
+    }
+    @inproceedings{zaccagnino2026autoregressive,
+        author = {Carmine Zaccagnino and Fabio Quattrini and Vittorio Pippi and Silvia Cascianelli and Alessio Tonioni and Rita Cucchiara},
+        title = {Autoregressive Styled Text Image Generation, but Make it Reliable},
+        booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
+        month = {March},
+        year = {2026}
+    }
+    ```
     """)
 if __name__ == "__main__":
+    demo.queue(max_size=10)
     demo.launch()
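
The new `generate_handwriting` in app.py resizes the style image to a height of 64 pixels, scales the width proportionally, and then enforces a minimum width of 64. The arithmetic can be sketched on its own as below. The helper name `target_size` and its parameters are hypothetical and do not appear in the commit; only the formula is taken from the diff.

```python
def target_size(width: int, height: int, target_height: int = 64, min_width: int = 64) -> tuple[int, int]:
    """Return the (width, height) to resize to.

    Hypothetical helper mirroring the arithmetic in app.py:
    fixed height, proportional width, width clamped to at least min_width.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_width = int(target_height * width / height)  # truncates, as app.py does
    return max(new_width, min_width), target_height


print(target_size(640, 128))  # a wide line of text
print(target_size(50, 200))   # a tall, narrow image hits the minimum-width clamp
```

When the clamp applies, the aspect ratio is no longer preserved, so a very narrow style image is stretched horizontally before it reaches the model.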

examples/README.md ADDED

	@@ -0,0 +1,32 @@

+# Example Samples
+This directory contains example style images for the Eruku demo.
+## Adding New Samples
+To add new samples to the demo:
+1. Add your style image (PNG or JPG) to this `examples/` directory
+2. Edit `samples.json` and add a new entry:
+```json
+{
+    "style_image": "examples/your_image.png",
+    "style_text": "Text visible in the style image (optional but recommended)",
+    "gen_text": "Default text to generate with this style"
+}
+```
+## File Format
+- **style_image**: Path to the image file (relative to space root)
+- **style_text**: The text transcription of what's written in the style image. Leave empty `""` if unknown, but providing it improves results.
+- **gen_text**: The default text that will appear in the "Text to Generate" field when this example is selected
+## Tips for Good Style Images
+1. **Clear contrast**: Black text on white/light background works best
+2. **Single line**: One line of text is ideal
+3. **Consistent style**: The whole image should be in the same handwriting style
+4. **Height**: Images will be resized to height 64, so ensure text is readable at that scale
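
The "Adding New Samples" steps above amount to appending one object with the three documented keys to `samples.json`. A minimal sketch of doing that programmatically follows. The helper `add_sample` is hypothetical (it is not part of the Space), and the demonstration writes to a temporary directory rather than the real `examples/` folder.

```python
import json
import tempfile
from pathlib import Path


def add_sample(samples_file: Path, style_image: str, gen_text: str, style_text: str = "") -> list:
    """Append one entry to samples.json and return the updated list (hypothetical helper)."""
    if samples_file.exists():
        samples = json.loads(samples_file.read_text(encoding="utf-8"))
    else:
        samples = []
    samples.append({
        "style_image": style_image,  # path relative to the Space root
        "style_text": style_text,    # empty string when the transcription is unknown
        "gen_text": gen_text,        # default text shown when the example is selected
    })
    samples_file.write_text(json.dumps(samples, indent=4), encoding="utf-8")
    return samples


# Demonstrate on a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "samples.json"
    add_sample(path, "examples/your_image.png", "Default text to generate with this style")
    updated = add_sample(path, "examples/another.png", "Second sample", style_text="known transcription")
    print(len(updated))
```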

examples/handwritten_1.png ADDED

examples/handwritten_2.png ADDED

examples/handwritten_3.png ADDED

examples/samples.json ADDED

	@@ -0,0 +1,23 @@

+[
+    {
+        "style_image": "examples/handwritten_1.png",
+        "style_text": "",
+        "gen_text": "The quick brown fox jumps over the lazy dog"
+    },
+    {
+        "style_image": "examples/handwritten_2.png",
+        "style_text": "",
+        "gen_text": "Hello from the Eruku model!"
+    },
+    {
+        "style_image": "examples/handwritten_3.png",
+        "style_text": "",
+        "gen_text": "Artificial Intelligence and Machine Learning"
+    },
+    {
+        "style_image": "examples/typewritten_1.png",
+        "style_text": "",
+        "gen_text": "This is typewritten style generation"
+    }
+]
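
In app.py, `load_examples` reads this file and silently skips any entry whose `style_image` path does not exist. That filtering can be sketched in isolation as below. The function `load_existing_samples` and its explicit `root` argument are assumptions made for illustration; app.py resolves paths against the current working directory instead.

```python
import json
import tempfile
from pathlib import Path


def load_existing_samples(samples_file: Path, root: Path) -> list:
    """Return [image_path, style_text, gen_text] rows for entries whose image exists under root."""
    if not samples_file.exists():
        return []
    rows = []
    for sample in json.loads(samples_file.read_text(encoding="utf-8")):
        img = sample.get("style_image", "")
        if img and (root / img).exists():
            rows.append([img, sample.get("style_text", ""), sample.get("gen_text", "Hello World!")])
    return rows


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "examples").mkdir()
    # Placeholder file standing in for a real image; only its existence matters here.
    (root / "examples" / "handwritten_1.png").write_bytes(b"")
    samples = [
        {"style_image": "examples/handwritten_1.png", "style_text": "", "gen_text": "The quick brown fox"},
        {"style_image": "examples/missing.png", "style_text": "", "gen_text": "never shown"},
    ]
    (root / "examples" / "samples.json").write_text(json.dumps(samples), encoding="utf-8")
    rows = load_existing_samples(root / "examples" / "samples.json", root)
    print(rows)
```

An entry pointing at a missing image is dropped without any warning, so a typo in `style_image` shows up only as an example that never appears in the demo.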

examples/typewritten_1.png ADDED

requirements.txt CHANGED

@@ -1,17 +1,10 @@
-# Eruku dependencies - Updated 2025-11-15
-# Critical: diffusers>=0.27.0 required for huggingface_hub compatibility
-# Critical: transformers>=4.38.0 required for tokenizers>=0.15 (fixes huggingface_hub conflict)
-gradio==4.44.0
-torch==2.1.0
-torchvision==0.16.0
-transformers==4.40.0
-diffusers==0.27.0
-accelerate==0.25.0
-einops==0.7.0
-numpy==1.24.3
-Pillow==10.1.0
-huggingface_hub>=0.19.3
-spaces==0.28.3
-sentencepiece==0.1.99
-protobuf==3.20.3

+torch>=2.0.0
+torchvision>=0.15.0
+transformers>=4.40.0
+diffusers>=0.25.0
+einops>=0.7.0
+pillow>=10.0.0
+gradio>=4.0.0
+spaces>=0.19.0
+numpy<2.0.0
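
This hunk replaces exact `==` pins with lower bounds, leaving `numpy<2.0.0` as the only upper bound. A rough sketch for checking that mechanically is shown below. The parser is a deliberate simplification that handles only single-specifier lines like the ones in this file; real requirement syntax is richer, and the `packaging` library would be the more robust choice. The names `parse_requirements` and `upper_bounded` are illustrative.

```python
import re

# The new requirements.txt from this commit, inlined so the sketch is self-contained.
REQUIREMENTS = """\
torch>=2.0.0
torchvision>=0.15.0
transformers>=4.40.0
diffusers>=0.25.0
einops>=0.7.0
pillow>=10.0.0
gradio>=4.0.0
spaces>=0.19.0
numpy<2.0.0
"""

# name, one comparison operator, version; nothing more elaborate is supported.
LINE = re.compile(r"^([A-Za-z0-9_.-]+)\s*(==|>=|<=|<|>|~=|!=)\s*(\S+)$")


def parse_requirements(text: str) -> dict:
    """Map package name -> (operator, version) for simple single-specifier lines."""
    parsed = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported requirement line: {raw!r}")
        name, op, version = match.groups()
        parsed[name.lower()] = (op, version)
    return parsed


reqs = parse_requirements(REQUIREMENTS)
# Packages whose specifier caps or pins the version.
upper_bounded = [name for name, (op, _) in reqs.items() if op in ("<", "<=", "==")]
print(upper_bounded)
```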