OCR Dataset Generator - Architecture
🏗️ System Architecture
flowchart TB
subgraph Client["🖥️ Browser (Client-Side)"]
UI[Next.js React UI]
Config[Configuration Panel]
Preview[Preview Panel]
Generator[Dataset Generator]
Canvas[HTML5 Canvas API]
JSZip[JSZip Library]
end
subgraph Assets["User Assets"]
Fonts[Custom Fonts TTF/OTF]
TextFile[Text Corpus File]
BgImages[Custom Backgrounds]
end
subgraph Output["📦 Generated Output"]
Images[PNG Images]
Labels[labels.txt]
JSONL[data.jsonl]
CSV[data.csv]
Metadata[metadata.csv]
ZIP[ZIP Archive]
end
Assets --> Config
Config --> Generator
Generator --> Canvas
Canvas --> Images
Generator --> JSZip
Images --> JSZip
Labels --> JSZip
JSONL --> JSZip
CSV --> JSZip
Metadata --> JSZip
JSZip --> ZIP
ZIP --> Download[Download to User]
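The Generator → Canvas → PNG path in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's `generator.ts`: `TextContext2D`, `SampleStyle`, and `drawTextSample` are hypothetical names, and the interface is only a structural subset of the browser's `CanvasRenderingContext2D` so the function can also be exercised with a stub.

```typescript
// Hypothetical structural subset of CanvasRenderingContext2D. A real 2D
// context satisfies it, and so does a simple stub object.
interface TextContext2D {
  fillStyle: string | object;
  font: string;
  textAlign: string;
  textBaseline: string;
  direction: string;
  fillRect(x: number, y: number, w: number, h: number): void;
  fillText(text: string, x: number, y: number): void;
}

// Hypothetical per-sample style; field names are assumptions.
interface SampleStyle {
  width: number;
  height: number;
  background: string; // CSS color used to fill the canvas
  textColor: string;
  fontFamily: string;
  fontSizePx: number;
  rtl: boolean;
}

// Fill the background, then draw the text centered in the image.
function drawTextSample(ctx: TextContext2D, text: string, style: SampleStyle): void {
  ctx.fillStyle = style.background;
  ctx.fillRect(0, 0, style.width, style.height);

  ctx.font = `${style.fontSizePx}px "${style.fontFamily}"`;
  ctx.direction = style.rtl ? "rtl" : "ltr";
  ctx.textAlign = "center";
  ctx.textBaseline = "middle";
  ctx.fillStyle = style.textColor;
  ctx.fillText(text, style.width / 2, style.height / 2);
}

// Browser usage (not executed here):
//   const canvas = document.createElement("canvas");
//   canvas.width = 256;
//   canvas.height = 64;
//   const ctx = canvas.getContext("2d");
//   if (ctx) drawTextSample(ctx, "مرحبا", style);
//   canvas.toBlob((blob) => { /* hand the PNG blob to the ZIP step */ }, "image/png");
```

Keeping the drawing logic behind a small interface is one possible design choice; it lets the same function run against a stub in unit tests, where no DOM is available.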
Data Flow
sequenceDiagram
participant User
participant ConfigPanel
participant Generator
participant Canvas
participant JSZip
User->>ConfigPanel: Upload text file + fonts
ConfigPanel->>ConfigPanel: Parse text (word/char/line)
User->>ConfigPanel: Set image size, background, augmentation
User->>Generator: Click "Start Generation"
loop For each text sample
Generator->>Canvas: Create canvas with background
Generator->>Canvas: Render text with font
Generator->>Canvas: Apply augmentations
Canvas->>Generator: Return PNG blob
Generator->>JSZip: Add image to zip
end
Generator->>JSZip: Add label files
JSZip->>User: Download ZIP
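The "Parse text (word/char/line)" step in the sequence above could look something like the sketch below. It is an assumption-laden illustration rather than the project's actual parser: `segmentCorpus` and `SegmentationMode` are hypothetical names, sentence splitting is deliberately naive, and "ngram" is assumed here to mean sliding windows of whole words.

```typescript
type SegmentationMode = "word" | "character" | "line" | "sentence" | "ngram";

// Split a text corpus into the units that become individual samples.
function segmentCorpus(corpus: string, mode: SegmentationMode, ngramSize: number = 2): string[] {
  const words = corpus.split(/\s+/).filter((w) => w.length > 0);

  switch (mode) {
    case "word":
      return words;
    case "character":
      // Splits by UTF-16 code unit and drops whitespace. Characters outside
      // the Basic Multilingual Plane would need code-point-aware iteration.
      return corpus.split("").filter((ch) => ch.trim().length > 0);
    case "line":
      return corpus
        .split(/\r?\n/)
        .map((line) => line.trim())
        .filter((line) => line.length > 0);
    case "sentence": {
      // Naive rule: a sentence ends at '.', '!', '?' or the Arabic '؟'.
      const parts = corpus.match(/[^.!?؟]+[.!?؟]*/g) || [];
      return parts.map((s) => s.trim()).filter((s) => s.length > 0);
    }
  }

  // Remaining mode is "ngram": sliding window over words (assumed meaning).
  const size = Math.max(1, Math.floor(ngramSize));
  const grams: string[] = [];
  for (let i = 0; i + size <= words.length; i++) {
    grams.push(words.slice(i, i + size).join(" "));
  }
  return grams;
}
```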
🧩 Component Architecture
flowchart LR
subgraph Pages["app/"]
Page[page.tsx]
end
subgraph Components["components/"]
Header[header.tsx]
ConfigPanel[config-panel.tsx]
PreviewPanel[preview-panel.tsx]
GenerationPanel[generation-panel.tsx]
StatsPanel[stats-panel.tsx]
end
subgraph Library["lib/"]
Generator[generator.ts]
Constants[constants.ts]
Utils[utils.ts]
end
Page --> Header
Page --> ConfigPanel
Page --> PreviewPanel
Page --> GenerationPanel
Page --> StatsPanel
GenerationPanel --> Generator
ConfigPanel --> Constants
PreviewPanel --> Constants
🎨 Generation Pipeline
flowchart TD
A[Text Data] --> B{For Each Sample}
B --> C[Select Font by %]
B --> D[Select Background]
D --> D1{Mode?}
D1 -->|Single| D2[Use Selected Style]
D1 -->|Mix| D3[Random by Percentages]
D1 -->|Custom| D4[Use Uploaded Image]
C --> E[Create Canvas]
D2 --> E
D3 --> E
D4 --> E
E --> F{Apply Augmentation?}
F -->|Yes| G[Random Transform]
G --> G1[Rotation]
G --> G2[Skew]
G --> G3[Brightness]
G --> G4[Noise]
G --> G5[Blur]
F -->|No| H[Clean Sample]
G1 --> I[Render Text]
G2 --> I
G3 --> I
G4 --> I
G5 --> I
H --> I
I --> J[Export PNG]
J --> K[Add to ZIP]
K --> B
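The "Select Font by %", "Random by Percentages", and "Apply Augmentation?" decisions in the pipeline can be sketched as weighted random choices. The code below is only one plausible reading of the diagram: `pickByPercent`, `planAugmentation`, and every numeric range are hypothetical, and the random source is injectable so the behavior can be checked deterministically.

```typescript
interface WeightedOption<T> {
  value: T;
  percent: number; // relative weight; does not have to sum to exactly 100
}

// Pick one option with probability proportional to its percentage.
// rng must return a number in [0, 1), like Math.random.
function pickByPercent<T>(options: WeightedOption<T>[], rng: () => number = Math.random): T {
  let total = 0;
  for (const o of options) total += o.percent;

  let threshold = rng() * total;
  let chosen: WeightedOption<T> | undefined;
  for (const o of options) {
    chosen = o; // falls back to the last option if rounding leaves a remainder
    threshold -= o.percent;
    if (threshold < 0) break;
  }
  if (chosen === undefined) throw new Error("no options to pick from");
  return chosen.value;
}

// Hypothetical augmentation parameters; the ranges below are illustrative only.
interface AugmentationPlan {
  rotationDeg: number;
  skewDeg: number;
  brightness: number; // multiplier, 1 = unchanged
  noise: number; // 0..1 intensity
  blurPx: number;
}

// Returns null for a clean sample, otherwise random transform parameters.
function planAugmentation(augmentPercent: number, rng: () => number = Math.random): AugmentationPlan | null {
  if (rng() * 100 >= augmentPercent) return null;
  const between = (lo: number, hi: number): number => lo + rng() * (hi - lo);
  return {
    rotationDeg: between(-5, 5),
    skewDeg: between(-3, 3),
    brightness: between(0.8, 1.2),
    noise: between(0, 0.1),
    blurPx: between(0, 1),
  };
}
```

For example, `pickByPercent([{ value: "FontA", percent: 70 }, { value: "FontB", percent: 30 }])` would return `"FontA"` for roughly 70% of samples; the font names are placeholders.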
Project Structure
OCR_TEXT_RECOG_DATASET_MAKER/
├── web/                              # Next.js Web Application
│   ├── app/                          # Next.js App Router
│   │   ├── page.tsx                  # Main page component
│   │   ├── layout.tsx                # Root layout
│   │   └── globals.css               # Global styles
│   ├── components/                   # React Components
│   │   ├── config-panel.tsx          # Configuration UI
│   │   ├── preview-panel.tsx         # Live preview
│   │   ├── generation-panel.tsx      # Generation controls
│   │   ├── stats-panel.tsx           # Statistics display
│   │   └── header.tsx                # App header
│   ├── lib/                          # Utilities
│   │   ├── generator.ts              # Core generation logic
│   │   ├── constants.ts              # Types & defaults
│   │   └── utils.ts                  # Helper functions
│   └── public/                       # Static assets
├── src/                              # CLI Tool (TypeScript)
├── input/                            # Sample text files
├── Dockerfile                        # HF Spaces deployment
├── README.md                         # Documentation
└── .gitignore
Quick Setup Guide
Local Development
Prerequisites
- Node.js 18+ (recommended: 20+)
- npm or yarn
Installation
# Clone the repository
git clone https://github.com/YOUR_USERNAME/OCR_TEXT_RECOG_DATASET_MAKER.git
cd OCR_TEXT_RECOG_DATASET_MAKER
# Install web dependencies
cd web
npm install
# Start development server
npm run dev
Open http://localhost:3000 in your browser.
Production Build
cd web
npm run build
npm start
🐳 Docker Deployment
Local Docker
# Build image
docker build -t ocr-dataset-generator .
# Run container
docker run -p 7860:7860 ocr-dataset-generator
Hugging Face Spaces
- Create a new Space at https://huggingface.co/spaces
- Select "Docker" as the SDK
- Push your code:
git remote add hf https://YOUR_USERNAME:YOUR_HF_TOKEN@huggingface.co/spaces/YOUR_USERNAME/SPACE_NAME
git push --force hf master:main
📤 GitHub Push Commands
First Time Setup
cd d:\OCR_TEXT_RECOG_DATASET_MAKER
# Initialize git (if not already initialized)
git init
# Add all files
git add -A
# Commit
git commit -m "OCR Dataset Generator - Full Release"
# Add GitHub remote
git remote add origin https://github.com/YOUR_USERNAME/OCR_TEXT_RECOG_DATASET_MAKER.git
# Push to GitHub
git push -u origin master
Subsequent Updates
git add -A
git commit -m "Your commit message"
git push origin master
⚙️ Configuration Options
| Setting | Description | Default |
|---|---|---|
| Dataset Size | Number of images to generate | 100 |
| Image Width | Output image width in pixels | 256 |
| Image Height | Output image height in pixels | 64 |
| Segmentation | How the corpus is split into samples: word, character, line, sentence, or ngram | word |
| Background | 12 preset styles + custom images | clean_white |
| Augmentation % | Percentage of samples to augment | 70% |
| Text Direction | RTL or LTR | RTL |
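The defaults in this table could be captured in a typed settings object along the lines below. The values come from the table, but the field names, `GeneratorSettings`, `DEFAULT_SETTINGS`, and `resolveSettings` are hypothetical and may not match what `constants.ts` actually exports.

```typescript
// Hypothetical shape of the generator settings; values mirror the table above.
interface GeneratorSettings {
  datasetSize: number; // number of images to generate
  imageWidth: number; // pixels
  imageHeight: number; // pixels
  segmentation: "word" | "character" | "line" | "sentence" | "ngram";
  background: string; // preset style id, or an id for a custom image
  augmentationPercent: number; // 0..100, share of samples that get augmented
  textDirection: "rtl" | "ltr";
}

const DEFAULT_SETTINGS: GeneratorSettings = {
  datasetSize: 100,
  imageWidth: 256,
  imageHeight: 64,
  segmentation: "word",
  background: "clean_white",
  augmentationPercent: 70,
  textDirection: "rtl",
};

// Merge user overrides onto the defaults and clamp numeric fields to sane ranges.
function resolveSettings(overrides: Partial<GeneratorSettings> = {}): GeneratorSettings {
  const merged: GeneratorSettings = { ...DEFAULT_SETTINGS, ...overrides };
  merged.datasetSize = Math.max(1, Math.floor(merged.datasetSize));
  merged.imageWidth = Math.max(1, Math.floor(merged.imageWidth));
  merged.imageHeight = Math.max(1, Math.floor(merged.imageHeight));
  merged.augmentationPercent = Math.min(100, Math.max(0, merged.augmentationPercent));
  return merged;
}
```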
📦 Output Formats
| Format | Files | Use Case |
|---|---|---|
| CRNN | labels.txt | PaddleOCR, CRNN training |
| TrOCR | data.jsonl | Hugging Face TrOCR |
| CSV | data.csv | General ML pipelines |
| Hugging Face | metadata.csv | HF Datasets upload |
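As a rough illustration of what these label files might contain, the sketch below serializes the same samples three ways. The exact layouts are assumptions, not taken from the project: a tab-separated `labels.txt` (the common `path<TAB>text` recognition format), one JSON object per line for `data.jsonl`, and a CSV with a `file_name` column, which is the column name the Hugging Face `imagefolder` loader looks for in `metadata.csv`. All function and field names are hypothetical.

```typescript
interface LabeledSample {
  fileName: string; // e.g. "000001.png"
  text: string; // ground-truth transcription
}

// labels.txt: one "<file>\t<text>" pair per line; inner whitespace runs are
// collapsed so tabs or newlines in the text cannot break the format.
function toLabelsTxt(samples: LabeledSample[]): string {
  return samples.map((s) => s.fileName + "\t" + s.text.replace(/\s+/g, " ")).join("\n");
}

// data.jsonl: one JSON object per line (key names are an assumption).
function toJsonl(samples: LabeledSample[]): string {
  return samples.map((s) => JSON.stringify({ file_name: s.fileName, text: s.text })).join("\n");
}

// Quote a CSV field when it contains a comma, quote, or line break.
function csvEscape(field: string): string {
  return /[",\r\n]/.test(field) ? '"' + field.replace(/"/g, '""') + '"' : field;
}

// data.csv / metadata.csv: header row followed by one row per sample.
function toCsv(samples: LabeledSample[], header: [string, string] = ["file_name", "text"]): string {
  const rows = samples.map((s) => csvEscape(s.fileName) + "," + csvEscape(s.text));
  return [header.join(",")].concat(rows).join("\n");
}
```

In the browser, each resulting string would then be added to the archive next to the PNG images before the ZIP is downloaded.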
🔧 Tech Stack
- Frontend: Next.js 14, React, TypeScript
- Styling: Tailwind CSS, Glassmorphism
- Generation: HTML5 Canvas API
- Packaging: JSZip, FileSaver.js
- Deployment: Docker, Hugging Face Spaces