
OCR Dataset Generator - Architecture

🏗️ System Architecture

flowchart TB
    subgraph Client["🖥️ Browser (Client-Side)"]
        UI[Next.js React UI]
        Config[Configuration Panel]
        Preview[Preview Panel]
        Generator[Dataset Generator]
        Canvas[HTML5 Canvas API]
        JSZip[JSZip Library]
    end
    
    subgraph Assets["📁 User Assets"]
        Fonts[Custom Fonts TTF/OTF]
        TextFile[Text Corpus File]
        BgImages[Custom Backgrounds]
    end
    
    subgraph Output["📦 Generated Output"]
        Images[PNG Images]
        Labels[labels.txt]
        JSONL[data.jsonl]
        CSV[data.csv]
        Metadata[metadata.csv]
        ZIP[ZIP Archive]
    end
    
    Assets --> Config
    Config --> Generator
    Generator --> Canvas
    Canvas --> Images
    Generator --> JSZip
    Images --> JSZip
    Labels --> JSZip
    JSONL --> JSZip
    CSV --> JSZip
    Metadata --> JSZip
    JSZip --> ZIP
    ZIP --> Download[Download to User]

📊 Data Flow

sequenceDiagram
    participant User
    participant ConfigPanel
    participant Generator
    participant Canvas
    participant JSZip
    
    User->>ConfigPanel: Upload text file + fonts
    ConfigPanel->>ConfigPanel: Segment text (word/char/line/sentence/ngram)
    User->>ConfigPanel: Set image size, background, augmentation
    User->>Generator: Click "Start Generation"
    
    loop For each text sample
        Generator->>Canvas: Create canvas with background
        Generator->>Canvas: Render text with font
        Generator->>Canvas: Apply augmentations
        Canvas->>Generator: Return PNG blob
        Generator->>JSZip: Add image to zip
    end
    
    Generator->>JSZip: Add label files
    JSZip->>User: Download ZIP
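
The sequence above can be sketched as a plain loop with the renderer and the archive injected. This is a minimal sketch, not the project's `generator.ts`: `RenderedSample`, `ArchiveSink`, `RenderFn`, `buildDataset`, and the `images/000000.png` naming scheme are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the project's real types live in lib/.
interface RenderedSample {
  text: string;
  png: Uint8Array; // encoded PNG bytes (in the browser, read from a canvas blob)
}

// Minimal stand-in for an archive such as JSZip: store one file by path.
interface ArchiveSink {
  addFile(path: string, data: Uint8Array | string): void;
}

type RenderFn = (text: string, index: number) => Promise<RenderedSample>;

// Mirrors the sequence diagram: render each sample, add it to the archive,
// then add the label file once the loop has finished.
async function buildDataset(
  texts: string[],
  render: RenderFn,
  archive: ArchiveSink,
): Promise<number> {
  const labelLines: string[] = [];
  let index = 0;
  for (const text of texts) {
    const sample = await render(text, index);
    // Zero-padded file name (assumed naming scheme).
    const fileName = "images/" + ("000000" + index).slice(-6) + ".png";
    archive.addFile(fileName, sample.png);
    labelLines.push(fileName + "\t" + sample.text);
    index++;
  }
  archive.addFile("labels.txt", labelLines.join("\n") + "\n");
  return labelLines.length;
}
```

In the browser, `render` would wrap the Canvas export and the sink would forward to JSZip's `zip.file(path, data)`; the download step then corresponds to `zip.generateAsync({ type: "blob" })` followed by FileSaver's `saveAs(blob, name)`.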

🧩 Component Architecture

flowchart LR
    subgraph Pages["app/"]
        Page[page.tsx]
    end
    
    subgraph Components["components/"]
        Header[header.tsx]
        ConfigPanel[config-panel.tsx]
        PreviewPanel[preview-panel.tsx]
        GenerationPanel[generation-panel.tsx]
        StatsPanel[stats-panel.tsx]
    end
    
    subgraph Library["lib/"]
        Generator[generator.ts]
        Constants[constants.ts]
        Utils[utils.ts]
    end
    
    Page --> Header
    Page --> ConfigPanel
    Page --> PreviewPanel
    Page --> GenerationPanel
    Page --> StatsPanel
    
    GenerationPanel --> Generator
    ConfigPanel --> Constants
    PreviewPanel --> Constants

🎨 Generation Pipeline

flowchart TD
    A[Text Data] --> B{For Each Sample}
    B --> C[Select Font by %]
    B --> D[Select Background]
    D --> D1{Mode?}
    D1 -->|Single| D2[Use Selected Style]
    D1 -->|Mix| D3[Random by Percentages]
    D1 -->|Custom| D4[Use Uploaded Image]
    
    C --> E[Create Canvas]
    D2 --> E
    D3 --> E
    D4 --> E
    
    E --> F{Apply Augmentation?}
    F -->|Yes| G[Random Transform]
    G --> G1[Rotation]
    G --> G2[Skew]
    G --> G3[Brightness]
    G --> G4[Noise]
    G --> G5[Blur]
    F -->|No| H[Clean Sample]
    
    G1 --> I[Render Text]
    G2 --> I
    G3 --> I
    G4 --> I
    G5 --> I
    H --> I
    
    I --> J[Export PNG]
    J --> K[Add to ZIP]
    K --> B
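
The "Select Font by %", "Random by Percentages", and "Apply Augmentation?" steps all reduce to percentage-weighted random choices. A minimal sketch follows, assuming hypothetical names (`WeightedOption`, `pickByPercent`, `shouldAugment`) and an injectable random source; the actual selection code in `generator.ts` may differ.

```typescript
// Hypothetical type: an option with a percentage share, e.g. a font or a background style.
interface WeightedOption<T> {
  value: T;
  percent: number; // relative weight; shares need not sum to exactly 100
}

// Pick one option with probability proportional to its percentage.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickByPercent<T>(
  options: WeightedOption<T>[],
  rand: () => number = Math.random,
): T {
  if (options.length === 0) throw new Error("no options to choose from");
  let total = 0;
  for (const o of options) total += Math.max(0, o.percent);
  if (total <= 0) throw new Error("all percentages are zero");
  let threshold = rand() * total;
  for (const o of options) {
    threshold -= Math.max(0, o.percent);
    if (threshold < 0) return o.value;
  }
  // Guard against floating-point round-off when rand() is very close to 1.
  return options[options.length - 1].value;
}

// Decide whether a sample is augmented, given the "Augmentation %" setting.
function shouldAugment(
  augmentPercent: number,
  rand: () => number = Math.random,
): boolean {
  return rand() * 100 < augmentPercent;
}
```

For example, `pickByPercent([{ value: "FontA", percent: 70 }, { value: "FontB", percent: 30 }])` returns `"FontA"` for roughly 70% of calls. Normalizing by the total means the percentages do not have to add up to exactly 100.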

📁 Project Structure

OCR_TEXT_RECOG_DATASET_MAKER/
├── web/                      # Next.js Web Application
│   ├── app/                  # Next.js App Router
│   │   ├── page.tsx          # Main page component
│   │   ├── layout.tsx        # Root layout
│   │   └── globals.css       # Global styles
│   ├── components/           # React Components
│   │   ├── config-panel.tsx  # Configuration UI
│   │   ├── preview-panel.tsx # Live preview
│   │   ├── generation-panel.tsx # Generation controls
│   │   ├── stats-panel.tsx   # Statistics display
│   │   └── header.tsx        # App header
│   ├── lib/                  # Utilities
│   │   ├── generator.ts      # Core generation logic
│   │   ├── constants.ts      # Types & defaults
│   │   └── utils.ts          # Helper functions
│   └── public/               # Static assets
├── src/                      # CLI Tool (TypeScript)
├── input/                    # Sample text files
├── Dockerfile                # HF Spaces deployment
├── README.md                 # Documentation
└── .gitignore

🚀 Quick Setup Guide

Local Development

Prerequisites

  • Node.js 18.17+ (the minimum for Next.js 14; recommended: 20+)
  • npm or yarn

Installation

# Clone the repository
git clone https://github.com/YOUR_USERNAME/OCR_TEXT_RECOG_DATASET_MAKER.git
cd OCR_TEXT_RECOG_DATASET_MAKER

# Install web dependencies
cd web
npm install

# Start development server
npm run dev

Open http://localhost:3000 in your browser.

Production Build

cd web
npm run build
npm start

🐳 Docker Deployment

Local Docker

# Build image
docker build -t ocr-dataset-generator .

# Run container
docker run -p 7860:7860 ocr-dataset-generator

Open http://localhost:7860 in your browser.

Hugging Face Spaces

  1. Create a new Space at https://huggingface.co/spaces
  2. Select "Docker" as the SDK
  3. Push your code:
git remote add hf https://YOUR_USERNAME:YOUR_HF_TOKEN@huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
git push --force hf master:main

📤 GitHub Push Commands

First Time Setup

cd d:\OCR_TEXT_RECOG_DATASET_MAKER

# Initialize git (if not already)
git init

# Add all files
git add -A

# Commit
git commit -m "OCR Dataset Generator - Full Release"

# Add GitHub remote
git remote add origin https://github.com/YOUR_USERNAME/OCR_TEXT_RECOG_DATASET_MAKER.git

# Push to GitHub
git push -u origin master

Subsequent Updates

git add -A
git commit -m "Your commit message"
git push origin master

⚙️ Configuration Options

| Setting | Description | Default |
|---|---|---|
| Dataset Size | Number of images to generate | 100 |
| Image Width | Output image width in pixels | 256 |
| Image Height | Output image height in pixels | 64 |
| Segmentation | word, character, line, sentence, ngram | word |
| Background | 12 preset styles + custom images | clean_white |
| Augmentation % | Percentage of samples to augment | 70% |
| Text Direction | RTL or LTR | RTL |
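
The defaults above could be captured in a typed config object, with the segmentation mode driving how the corpus is split into samples. This is a sketch under assumptions: the field names, `DEFAULT_CONFIG`, `segmentText`, and the splitting rules (whitespace words, sentence punctuation, n-gram size of 2) are illustrative and are not taken from `constants.ts`.

```typescript
type Segmentation = "word" | "character" | "line" | "sentence" | "ngram";

// Hypothetical config shape; the field names are assumptions, the defaults come from the table.
interface DatasetConfig {
  datasetSize: number;
  imageWidth: number;
  imageHeight: number;
  segmentation: Segmentation;
  background: string;
  augmentationPercent: number;
  textDirection: "rtl" | "ltr";
}

const DEFAULT_CONFIG: DatasetConfig = {
  datasetSize: 100,
  imageWidth: 256,
  imageHeight: 64,
  segmentation: "word",
  background: "clean_white",
  augmentationPercent: 70,
  textDirection: "rtl",
};

// Split a corpus into text samples according to the segmentation mode.
function segmentText(corpus: string, mode: Segmentation, ngramSize = 2): string[] {
  const words = corpus.split(/\s+/).filter((w) => w.length > 0);
  switch (mode) {
    case "word":
      return words;
    case "character":
      // Splits by code point, not grapheme cluster; combining marks come out separately.
      return Array.from(corpus).filter((ch) => ch.trim().length > 0);
    case "line":
      return corpus.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0);
    case "sentence": {
      // Ends sentences at . ! ? plus the Arabic question mark and full stop (assumed set).
      const parts = corpus.match(/[^.!?\u061F\u06D4]+[.!?\u061F\u06D4]*/g) || [];
      return parts.map((s) => s.trim()).filter((s) => s.length > 0);
    }
    case "ngram": {
      // Sliding window of `ngramSize` consecutive words.
      const grams: string[] = [];
      for (let i = 0; i + ngramSize <= words.length; i++) {
        grams.push(words.slice(i, i + ngramSize).join(" "));
      }
      return grams;
    }
  }
}
```

A call such as `segmentText(corpusText, DEFAULT_CONFIG.segmentation)` would then produce the list of samples that the generator iterates over.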

📦 Output Formats

| Format | Files | Use Case |
|---|---|---|
| CRNN | labels.txt | PaddleOCR, CRNN training |
| TrOCR | data.jsonl | HuggingFace TrOCR |
| CSV | data.csv | General ML pipelines |
| HuggingFace | metadata.csv | HF Datasets upload |
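
As a rough illustration of how one set of records can feed all four label files, the sketch below serializes image/text pairs to each format. `LabelRecord`, the helper names, and the JSONL key names are assumptions; the exact columns the generator writes may differ. The tab-separated `path<TAB>text` layout follows the PaddleOCR recognition label convention, and `file_name` is the column Hugging Face image datasets look for in `metadata.csv`.

```typescript
// Hypothetical record for one generated image.
interface LabelRecord {
  fileName: string; // e.g. "images/000001.png"
  text: string;
}

// Quote a CSV field when it contains a comma, quote, or newline (RFC 4180 style).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// labels.txt: one "path<TAB>text" line per image.
function toLabelsTxt(records: LabelRecord[]): string {
  return records.map((r) => r.fileName + "\t" + r.text).join("\n") + "\n";
}

// data.jsonl: one JSON object per line (key names are assumptions).
function toJsonl(records: LabelRecord[]): string {
  return (
    records
      .map((r) => JSON.stringify({ file_name: r.fileName, text: r.text }))
      .join("\n") + "\n"
  );
}

// data.csv / metadata.csv: header row followed by one quoted-as-needed row per image.
function toCsv(records: LabelRecord[], header: [string, string]): string {
  const rows = records.map((r) => csvField(r.fileName) + "," + csvField(r.text));
  return [header.join(","), ...rows].join("\n") + "\n";
}
```

With this layout, `toCsv(records, ["file_name", "text"])` would serve for `metadata.csv`, and the same function with a different header could produce `data.csv`.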

🔧 Tech Stack

  • Frontend: Next.js 14, React, TypeScript
  • Styling: Tailwind CSS (glassmorphism UI style)
  • Generation: HTML5 Canvas API
  • Packaging: JSZip, FileSaver.js
  • Deployment: Docker, HuggingFace Spaces