Samarth Naik committed
Commit a3debee · 1 Parent(s): 5736c30

added texteller

Files changed (5):
  1. .dockerignore +15 -0
  2. Dockerfile +52 -0
  3. README.md +121 -0
  4. main.py +124 -279
  5. requirements.txt +4 -0
.dockerignore ADDED
@@ -0,0 +1,15 @@
+ PRINTED_TEX_230k/
+ *.pyc
+ __pycache__/
+ .git/
+ .gitignore
+ *.md
+ .DS_Store
+ .env
+ .venv/
+ venv/
+ *.log
+ *.tmp
+ .pytest_cache/
+ .coverage
+ htmlcov/
Dockerfile ADDED
@@ -0,0 +1,52 @@
+ # Use Python slim image as base
+ FROM python:3.11-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies needed for TextTeller and image processing
+ RUN apt-get update && apt-get install -y \
+     git \
+     wget \
+     curl \
+     build-essential \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     libsm6 \
+     libxext6 \
+     libxrender-dev \
+     libgomp1 \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install uv for faster Python package management
+ RUN pip install --no-cache-dir uv
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+
+ # Install Python dependencies using uv
+ RUN uv pip install --system -r requirements.txt
+
+ # Install TextTeller with ONNX runtime support
+ RUN uv pip install --system texteller[onnxruntime-gpu]
+
+ # Copy the application code
+ COPY main.py .
+
+ # Create directory for temporary files
+ RUN mkdir -p /tmp/image_uploads
+
+ # Set environment variables
+ ENV FLASK_APP=main.py
+ ENV PYTHONUNBUFFERED=1
+ ENV PORT=5000
+
+ # Expose the port the app runs on
+ EXPOSE 5000
+
+ # Add healthcheck
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+     CMD curl -f http://localhost:5000/health || exit 1
+
+ # Run the application
+ CMD ["python", "main.py"]
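The HEALTHCHECK above shells out to curl; the same probe can be sketched in pure Python using only the standard library. This is an illustrative equivalent, not part of the committed files; the URL assumes the EXPOSEd port 5000.

```python
# Stand-alone equivalent of the Dockerfile's healthcheck: GET /health and
# treat any non-2xx response, timeout, or connection error as unhealthy.
import urllib.request
import urllib.error


def probe(url: str = "http://localhost:5000/health", timeout: float = 10.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout -> unhealthy.
        return False
```

Like the curl version, a falsy result maps to a failing healthcheck (exit 1).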
README.md ADDED
@@ -0,0 +1,121 @@
+ # Image to LaTeX API
+
+ A Flask API that converts mathematical formula images to LaTeX code using TextTeller.
+
+ ## Features
+
+ - **POST /itl**: Upload an image and get LaTeX code back
+ - **GET /health**: Health check endpoint
+ - **GET /**: API documentation
+ - Automatic image cleanup after processing
+ - Support for multiple image formats (PNG, JPG, JPEG, GIF, BMP, TIFF)
+ - Maximum file size: 16MB
+
+ ## Quick Start
+
+ ### Using Docker (Recommended)
+
+ 1. Build the Docker image:
+ ```bash
+ docker build -t image-to-latex-api .
+ ```
+
+ 2. Run the container:
+ ```bash
+ docker run -p 5000:5000 image-to-latex-api
+ ```
+
+ ### Local Development
+
+ 1. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 2. Run the Flask app:
+ ```bash
+ python main.py
+ ```
+
+ ## API Usage
+
+ ### Convert Image to LaTeX
+
+ **Endpoint:** `POST /itl`
+
+ **Request:** Send image as multipart/form-data with key `"image"`
+
+ **Example using curl:**
+ ```bash
+ curl -X POST -F "image=@formula.png" http://localhost:5000/itl
+ ```
+
+ **Example using Python requests:**
+ ```python
+ import requests
+
+ with open('formula.png', 'rb') as f:
+     response = requests.post('http://localhost:5000/itl', files={'image': f})
+ print(response.json())
+ ```
+
+ **Response:**
+ ```json
+ {
+   "success": true,
+   "latex": "\\frac{x^2 + y^2}{2}"
+ }
+ ```
+
+ ### Health Check
+
+ **Endpoint:** `GET /health`
+
+ **Response:**
+ ```json
+ {
+   "status": "healthy",
+   "texteller_available": true
+ }
+ ```
+
+ ## Deployment
+
+ ### Hugging Face Spaces
+
+ This API can be easily deployed to Hugging Face Spaces:
+
+ 1. Create a new Space on Hugging Face
+ 2. Upload the files: `main.py`, `Dockerfile`, `requirements.txt`
+ 3. Set the Space to use Docker
+ 4. The API will be available at your Space URL
+
+ ### Other Platforms
+
+ The Dockerfile is compatible with most container platforms like:
+ - Google Cloud Run
+ - AWS ECS/Fargate
+ - Azure Container Instances
+ - Railway
+ - Render
+
+ ## Error Handling
+
+ The API returns appropriate HTTP status codes:
+ - `200`: Success
+ - `400`: Bad request (no image, invalid format)
+ - `408`: Request timeout (processing took too long)
+ - `413`: Payload too large (file > 16MB)
+ - `500`: Internal server error
+ - `503`: Service unavailable
+
+ ## Environment Variables
+
+ - `PORT`: Port to run the Flask app on (default: 5000)
+
+ ## Notes
+
+ - Images are automatically deleted after processing to save disk space
+ - Processing timeout is set to 30 seconds
+ - The API uses TextTeller's inference capabilities under the hood
+ - Temporary files are created in the system's temp directory
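The status codes documented in the README's Error Handling section can be mapped on the client side; the sketch below mirrors that table. The `describe_status` helper and its messages are illustrative, not part of the shipped API.

```python
# Client-side mapping of the API's documented HTTP status codes.
# The mapping mirrors the README's Error Handling table; this helper
# is an illustration, not code that ships with the service.
STATUS_MESSAGES = {
    200: "success",
    400: "bad request (no image or invalid format)",
    408: "request timeout (processing took too long)",
    413: "payload too large (file > 16MB)",
    500: "internal server error",
    503: "service unavailable",
}


def describe_status(code: int) -> str:
    """Return a human-readable description for a documented status code."""
    return STATUS_MESSAGES.get(code, f"undocumented status {code}")
```

A client can call this after `requests.post(...)` to turn `response.status_code` into a log-friendly message.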
main.py CHANGED
@@ -1,282 +1,127 @@
- # ----------------------------------------------------
- # Base image with CUDA + PyTorch
- # ----------------------------------------------------
- FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
-
- # ----------------------------------------------------
- # System dependencies
- # ----------------------------------------------------
- RUN apt-get update && apt-get install -y \
-     python3 python3-pip python3-dev \
-     git wget nano unzip findutils \
-     && rm -rf /var/lib/apt/lists/*
-
- # ----------------------------------------------------
- # Install Python dependencies
- # ----------------------------------------------------
- RUN pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu121
- RUN pip3 install kagglehub pillow tqdm
-
- # ----------------------------------------------------
- # Set working directory
- # ----------------------------------------------------
- WORKDIR /workspace
-
- # ----------------------------------------------------
- # Write FULL main.py directly into the container
- # ----------------------------------------------------
- RUN cat << 'EOF' > /workspace/main.py
  import os
- import argparse
+ import tempfile
+ import subprocess
  from pathlib import Path
- import torch
- from torch import nn
- from torch.utils.data import Dataset, DataLoader
- from torchvision import transforms, models
- from torch.optim import AdamW
- from torch.cuda.amp import autocast, GradScaler
- import torch.nn.functional as F
- from PIL import Image
- from tqdm import tqdm
-
- # ============================================================
- # TOKENIZER
- # ============================================================
- class CharTokenizer:
-     def __init__(self):
-         self.special = ["<pad>", "<sos>", "<eos>", "<unk>"]
-         self.idx2tok = list(self.special)
-         self.tok2idx = {t: i for i, t in enumerate(self.idx2tok)}
-
-     def build(self, formulas):
-         chars = set()
-         for f in formulas:
-             chars.update(list(f))
-         for c in sorted(chars):
-             if c not in self.tok2idx:
-                 self.tok2idx[c] = len(self.idx2tok)
-                 self.idx2tok.append(c)
-
-     def encode(self, text):
-         return [self.tok2idx.get(t, self.tok2idx["<unk>"]) for t in ["<sos>"] + list(text) + ["<eos>"]]
-
-     @property
-     def pad(self):
-         return self.tok2idx["<pad>"]
-
-     def __len__(self):
-         return len(self.idx2tok)
-
-
- # ============================================================
- # DATASET
- # ============================================================
- class TexDataset(Dataset):
-     def __init__(self, filenames, formulas, root):
-         self.filenames = filenames
-         self.formulas = formulas
-         self.root = Path(root) / "generated_png_images"
-
-         self.transform = transforms.Compose([
-             transforms.Resize((256, 1024)),
-             transforms.ToTensor(),
-             transforms.Normalize([0.5], [0.5]),
-         ])
-
-     def __len__(self):
-         return len(self.filenames)
-
-     def __getitem__(self, idx):
-         img_path = self.root / self.filenames[idx]
-         image = Image.open(img_path).convert("RGB")
-         image = self.transform(image)
-         return image, self.formulas[idx], self.filenames[idx]
-
-
- def collate_fn(batch, tokenizer):
-     images, texts, names = zip(*batch)
-     images = torch.stack(images)
-
-     encoded = [torch.tensor(tokenizer.encode(t)) for t in texts]
-     max_len = max(len(e) for e in encoded)
-
-     padded = torch.full((len(encoded), max_len), tokenizer.pad, dtype=torch.long)
-     for i, e in enumerate(encoded):
-         padded[i, :len(e)] = e
-
-     return images, padded, names
-
-
- # ============================================================
- # MODEL
- # ============================================================
- class Img2Latex(nn.Module):
-     def __init__(self, vocab_size, d_model=512):
-         super().__init__()
-
-         resnet = models.resnet34(weights=models.ResNet34_Weights.DEFAULT)
-         self.encoder = nn.Sequential(*list(resnet.children())[:-2])
-
-         self.proj = nn.Conv2d(512, d_model, 1)
-         self.embed = nn.Embedding(vocab_size, d_model)
-         self.pos = nn.Parameter(torch.randn(1, 1024, d_model))
-
-         dec_layer = nn.TransformerDecoderLayer(d_model, 8)
-         self.decoder = nn.TransformerDecoder(dec_layer, num_layers=4)
-
-         self.fc = nn.Linear(d_model, vocab_size)
-
-     def forward(self, images, tgt):
-         feats = self.encoder(images)
-         feats = self.proj(feats)
-         feats = feats.mean(dim=2)
-         feats = feats.permute(2, 0, 1)
-
-         tgt = tgt.permute(1, 0)
-         emb = self.embed(tgt)
-         emb = emb + self.pos[:, :emb.size(0), :]
-
-         mask = nn.Transformer.generate_square_subsequent_mask(emb.size(0)).to(emb.device)
-
-         dec = self.decoder(emb, feats, tgt_mask=mask)
-         return self.fc(dec)
-
-
- # ============================================================
- # TRAINING LOOP WITH LOGGING
- # ============================================================
- def train_epoch(model, loader, optimizer, scaler, tokenizer, device, epoch):
-     model.train()
-     total_loss = 0
-     processed_images = 0
-
-     progress = tqdm(loader, desc=f"Epoch {epoch} Training", unit="batch")
-
-     for images, tgt, batch_filenames in progress:
-         images, tgt = images.to(device), tgt.to(device)
-
-         processed_images += len(batch_filenames)
-
-         print("\n🖼️ Processing batch images:")
-         for name in batch_filenames:
-             print(" -", name)
-         print(f"📊 Processed {processed_images} / {len(loader.dataset)} images\n")
-
-         optimizer.zero_grad()
-
-         with autocast():
-             logits = model(images, tgt)
-             logits = logits.permute(1, 0, 2)
-
-             loss = F.cross_entropy(
-                 logits.reshape(-1, logits.size(-1)),
-                 tgt.reshape(-1),
-                 ignore_index=tokenizer.pad
+ from flask import Flask, request, jsonify
+ import logging
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ app = Flask(__name__)
+
+ # Configure maximum file size (16MB)
+ app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024
+
+ @app.route('/itl', methods=['POST'])
+ def image_to_latex():
+     """Convert uploaded image to LaTeX code using TextTeller."""
+     try:
+         # Check if image file is present
+         if 'image' not in request.files:
+             return jsonify({'error': 'No image file provided'}), 400
+
+         file = request.files['image']
+         if file.filename == '':
+             return jsonify({'error': 'No file selected'}), 400
+
+         # Validate file type
+         allowed_extensions = {'png', 'jpg', 'jpeg', 'gif', 'bmp', 'tiff'}
+         file_ext = file.filename.rsplit('.', 1)[-1].lower() if '.' in file.filename else ''
+         if file_ext not in allowed_extensions:
+             return jsonify({'error': f'Invalid file type. Allowed: {", ".join(allowed_extensions)}'}), 400
+
+         # Create temporary file to save uploaded image
+         with tempfile.NamedTemporaryFile(delete=False, suffix=f'.{file_ext}') as tmp_file:
+             file.save(tmp_file.name)
+             temp_image_path = tmp_file.name
+
+         try:
+             # Run texteller inference command
+             logger.info(f"Processing image: {temp_image_path}")
+             result = subprocess.run(
+                 ['texteller', 'inference', temp_image_path],
+                 capture_output=True,
+                 text=True,
+                 timeout=30  # 30 second timeout
  )
-
-         scaler.scale(loss).backward()
-         scaler.step(optimizer)
-         scaler.update()
-
-         total_loss += loss.item()
-         progress.set_postfix({"loss": loss.item()})
-
-     return total_loss / len(loader)
-
-
- # ============================================================
- # MAIN
- # ============================================================
- def main():
-     parser = argparse.ArgumentParser()
-     parser.add_argument("--data_dir", type=str, default="PRINTED_TEX_230k")
-     parser.add_argument("--batch", type=int, default=8)
-     parser.add_argument("--epochs", type=int, default=5)
-     args = parser.parse_args()
-
-     root = Path(args.data_dir)
-
-     image_files = open(root / "corresponding_png_images.txt").read().splitlines()
-     formulas = open(root / "final_png_formulas.txt").read().splitlines()
-
-     n = min(len(image_files), len(formulas))
-     image_files = image_files[:n]
-     formulas = formulas[:n]
-
-     print(f"Loaded {n} image-formula pairs")
-
-     tokenizer = CharTokenizer()
-     tokenizer.build(formulas)
-
-     dataset = TexDataset(image_files, formulas, root)
-     loader = DataLoader(
-         dataset,
-         batch_size=args.batch,
-         shuffle=True,
-         collate_fn=lambda b: collate_fn(b, tokenizer),
-         num_workers=2
-     )
-
-     device = "cuda" if torch.cuda.is_available() else "cpu"
-     print("Using device:", device)
-
-     model = Img2Latex(len(tokenizer)).to(device)
-     optimizer = AdamW(model.parameters(), lr=3e-4)
-     scaler = GradScaler()
-
-     for epoch in range(1, args.epochs + 1):
-         loss = train_epoch(model, loader, optimizer, scaler, tokenizer, device, epoch)
-         print(f"Epoch {epoch} complete — Loss: {loss:.4f}")
-
-         torch.save(model.state_dict(), f"model_epoch{epoch}.pth")
-
-     print("Training complete!")
-
-
- if __name__ == "__main__":
-     main()
- EOF
-
- # ----------------------------------------------------
- # ALWAYS DOWNLOAD DATASET FROM KAGGLEHUB AND BUILD STRUCTURE
- # ----------------------------------------------------
- RUN python3 - << 'EOF'
- import kagglehub, os, shutil
-
- print("\n⬇️ Downloading dataset from KaggleHub...\n")
- download_path = kagglehub.dataset_download("gregoryeritsyan/im2latex-230k")
- print("📥 Downloaded to:", download_path)
-
- # Create final dataset structure
- target = "/workspace/PRINTED_TEX_230k"
- if os.path.exists(target):
-     shutil.rmtree(target)
- os.makedirs(target + "/generated_png_images", exist_ok=True)
-
- # Mapping from KaggleHub structure to your structure
- mapping = {
-     "formulas.txt": "final_png_formulas.txt",
-     "formula_images.txt": "corresponding_png_images.txt",
-     "vocab.json": "230k.json",
- }
-
- # Move text files
- for src, dst in mapping.items():
-     src_path = os.path.join(download_path, src)
-     if os.path.exists(src_path):
-         shutil.move(src_path, os.path.join(target, dst))
-         print(f"✔ Mapped {src} → {dst}")
-
- # Move image directory
- images_src = os.path.join(download_path, "images")
- if os.path.exists(images_src):
-     shutil.move(images_src, os.path.join(target, "generated_png_images"))
-     print("✔ Mapped images/ → generated_png_images/")
-
- print("\n🎉 Dataset prepared at:", target)
- EOF
-
- # ----------------------------------------------------
- # Run training by default
- # ----------------------------------------------------
- CMD ["python3", "main.py", "--data_dir", "PRINTED_TEX_230k", "--epochs", "5"]
+
+             if result.returncode == 0:
+                 # Extract LaTeX from output
+                 latex_output = result.stdout.strip()
+                 logger.info(f"Successfully processed image. LaTeX length: {len(latex_output)}")
+                 return jsonify({
+                     'success': True,
+                     'latex': latex_output
+                 })
+             else:
+                 logger.error(f"TextTeller inference failed: {result.stderr}")
+                 return jsonify({
+                     'error': 'Failed to process image',
+                     'details': result.stderr
+                 }), 500
+
+         except subprocess.TimeoutExpired:
+             logger.error("TextTeller inference timed out")
+             return jsonify({'error': 'Processing timed out'}), 408
+
+         except Exception as e:
+             logger.error(f"Error during processing: {str(e)}")
+             return jsonify({'error': f'Processing error: {str(e)}'}), 500
+
+         finally:
+             # Clean up temporary file
+             try:
+                 os.unlink(temp_image_path)
+                 logger.info(f"Cleaned up temporary file: {temp_image_path}")
+             except OSError as e:
+                 logger.warning(f"Failed to delete temporary file {temp_image_path}: {e}")
+
+     except Exception as e:
+         logger.error(f"Unexpected error: {str(e)}")
+         return jsonify({'error': f'Unexpected error: {str(e)}'}), 500
+
+ @app.route('/health', methods=['GET'])
+ def health_check():
+     """Health check endpoint."""
+     try:
+         # Test if texteller is available
+         result = subprocess.run(['texteller', '--help'], capture_output=True, timeout=5)
+         texteller_available = result.returncode == 0
+
+         return jsonify({
+             'status': 'healthy',
+             'texteller_available': texteller_available
+         })
+     except Exception as e:
+         return jsonify({
+             'status': 'unhealthy',
+             'error': str(e)
+         }), 503
+
+ @app.route('/', methods=['GET'])
+ def root():
+     """Root endpoint with API documentation."""
+     return jsonify({
+         'service': 'Image to LaTeX API',
+         'version': '1.0.0',
+         'endpoints': {
+             'POST /itl': 'Convert image to LaTeX. Send image file as multipart/form-data with key "image"',
+             'GET /health': 'Health check endpoint',
+             'GET /': 'This documentation'
+         },
+         'supported_formats': ['png', 'jpg', 'jpeg', 'gif', 'bmp', 'tiff'],
+         'max_file_size': '16MB'
+     })
+
+ if __name__ == '__main__':
+     # Check if texteller is installed
+     try:
+         result = subprocess.run(['texteller', '--help'], capture_output=True)
+         if result.returncode != 0:
+             logger.warning("TextTeller might not be properly installed")
+     except FileNotFoundError:
+         logger.error("TextTeller is not installed. Please install it with: pip install texteller")
+
+     app.run(host='0.0.0.0', port=int(os.environ.get('PORT', 5000)), debug=False)
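The extension check in the new `/itl` handler can be factored into a small helper; this sketch reproduces the same `rsplit`-based logic (the `allowed_file` name is illustrative, since main.py does the check inline):

```python
# Same validation logic as the /itl handler: take the text after the last
# dot, lowercase it, and compare it against the whitelist. Note that a
# filename with no dot yields '' and is rejected.
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg', 'gif', 'bmp', 'tiff'}


def allowed_file(filename: str) -> bool:
    """Return True if the filename carries a supported image extension."""
    ext = filename.rsplit('.', 1)[-1].lower() if '.' in filename else ''
    return ext in ALLOWED_EXTENSIONS
```

Using `rsplit('.', 1)` means only the final suffix matters, so `archive.tar.png` passes while `formula.svg` is rejected.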
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ flask>=2.3.0
+ texteller
+ pillow>=10.0.0
+ gunicorn>=21.2.0
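requirements.txt pins gunicorn, yet the Dockerfile's CMD runs the Flask development server. If gunicorn is intended for production, a sketch of an alternative Dockerfile entrypoint might look like the following (worker count and timeout are assumptions; `main:app` matches the `app = Flask(__name__)` object in main.py):

```
# Hypothetical production CMD: serve the Flask app with gunicorn instead of
# the built-in dev server. --timeout is set above the 30s inference timeout.
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "2", "--timeout", "60", "main:app"]
```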