---
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: "1.0.0"
pinned: false
---

# Odia OCR Annotation + Synthetic Text Generator

A unified repository that provides:
- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
- A synthetic text generator (exposed via backend API) to render Odia/Sanskrit-like text with realistic paper/effects, including HuggingFace dataset processing.

## Repository Structure

- `backend/`
  - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic`
  - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export)
  - `app/api/routers/synthetic.py`: Synthetic generation endpoints
  - `app/services/`: Shared services
    - `ocr_processor.py`: Gemini OCR
    - `annotations.py`: CSV/JSON I/O
    - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
  - `data/`: runtime storage
    - `uploaded_images/`: uploaded images (served at `/images`)
    - `annotations/`: `annotations.csv` and JSON
    - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`)
  - `requirements.txt`: backend dependencies
- `frontend/`
  - Vite + React + Tailwind app
  - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI)
- `content/static/`: NotoSans Oriya fonts used by generator

## Run Locally

1) Backend
- `pip install -r backend/requirements.txt`
- From `backend/`: `uvicorn app.main:app --reload`
- Static mounts:
  - `/images` → `backend/data/uploaded_images`
  - `/static/synthetic` → `backend/data/synth_outputs`

2) Frontend
- `cd frontend && npm install && npm run dev`
- Open `http://localhost:5173`
- Use navigation to switch between OCR and Synthetic pages

## OCR API (FastAPI)

- `POST /api/ocr/upload`:
  - Multipart files field: `files`
  - Stores images in `backend/data/uploaded_images`
- `POST /api/ocr/process`:
  - JSON: `{ "api_key": "", "image_filenames": ["img1.png", ...] }`
  - Returns: `{ "img1.png": "extracted text", ... }`
- `GET /api/ocr/annotations`:
  - Returns current annotations, valid/missing images
- `POST /api/ocr/save`:
  - JSON: `{ "": { "extracted_text": "...", "validated_text": "..." } }`
  - Saves to CSV and JSON in `backend/data/annotations`
- `POST /api/ocr/import`:
  - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`)
  - Validates and returns annotations + image presence
- `POST /api/ocr/export`:
  - JSON: `{ annotations: {...}, validated_texts: {...} }`
  - Returns a downloadable CSV

Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward.

## Synthetic API (FastAPI)

- `POST /api/synthetic/generate`
  - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface`
  - Request body examples:
    - Non-HF: `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }`
    - HF CSV: `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }`
  - Response:
    - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/" }`
    - HF: `{ "status": "ok", "output_dir": "/static/synthetic/", "csv": "/static/synthetic//dataset.csv", "images_dir": "/static/synthetic//images" }`
- Outputs are stored under `backend/data/synth_outputs//` and publicly served at `/static/synthetic//...`.

## Fonts

- Generator uses fonts from `content/static/`.
- Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists.

## Effects & Styles

- Paper styles: lined paper, old paper, birch, parchment
- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps

## Notes

- The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side.
- For HuggingFace datasets, the backend uses `datasets` when possible, or downloads raw CSV URLs.
- You can browse generated outputs via the links returned by `/api/synthetic/generate`.

## Deploy to Hugging Face Spaces (Docker)

This repo includes a multi-stage Dockerfile to deploy both backend and the built frontend as a single Space.

Steps:
- Create a new Space → Type: Docker
- Push this repository to the Space
- In Space Settings:
  - Enable Persistent Storage
  - (Optional) Add Secrets/Env Vars as needed, e.g., `DATA_DIR=/data` (default already) and `FRONTEND_DIST=/app/frontend_dist`
- The container exposes port `7860` by default.

What the image does:
- Builds the frontend (`frontend/`) and copies the `dist/` to `/app/frontend_dist`
- Installs backend dependencies and runs `uvicorn app.main:app` from `backend/`
- Serves:
  - API at `/api/...`
  - Uploaded images at `/images`
  - Synthetic outputs at `/static/synthetic`
  - Frontend SPA at `/` (served from `/app/frontend_dist`)

1. **Paper Textures**: Realistic paper fiber patterns using Perlin noise
2. **Aging Effects**: Edge darkening and aging patterns
3. **Physical Damage**: Fold lines, creases, and ink bleeding
4. **Scanner Artifacts**: Dust, compression artifacts, scanning lines
5. **Geometric Distortions**: Perspective changes, cylindrical warping
6. **Lighting Effects**: Shadows and lens distortions

## Font Requirements

The generator requires appropriate fonts for text rendering. Default configuration expects:
- Font directory: `/content/static/`
- Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf`

You can specify custom fonts using `--font-dir` and `--font` parameters.
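
Since generation depends on the font directory and file being present, it can help to check both before starting a run. The following is a minimal sketch using only the Python standard library; the helper name `resolve_font` is hypothetical and not part of this repository, and the directory and file name in the usage line are simply the defaults listed above.

```python
from pathlib import Path


def resolve_font(font_dir, font_name):
    """Return the full path to a font file, or raise FileNotFoundError.

    Hypothetical helper for illustration; the generator's own font
    loading may work differently.
    """
    directory = Path(font_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Font directory not found: {directory}")

    font_path = directory / font_name
    if not font_path.is_file():
        # List the .ttf files that do exist to make the error actionable.
        available = sorted(p.name for p in directory.glob("*.ttf"))
        raise FileNotFoundError(
            f"Font file {font_name!r} not found in {directory}; "
            f"available .ttf files: {available}"
        )
    return font_path
```

For example, `resolve_font("/content/static/", "NotoSansOriya_ExtraCondensed-Regular.ttf")` returns the path if the default font is installed and raises a descriptive error otherwise.
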
## Performance Tips

- Use `--max-samples` to limit processing for large datasets
- Disable advanced effects with `--no-advanced-effects` for faster generation
- Use multiprocessing with `--use-multiprocessing` for batch jobs
- Adjust image dimensions to balance quality and speed

## Error Handling

The package includes comprehensive error handling:
- Graceful fallbacks for missing dependencies
- Detailed logging for debugging
- Validation of input parameters
- Safe handling of malformed datasets

## Contributing

The modular structure makes it easy to extend:
- Add new effects in `effects.py`
- Implement new background styles in `backgrounds.py`
- Create custom transformations in `transformations.py`
- Extend dataset processing in `huggingface_processor.py`

## License

[Add your license information here]

---

**Note**: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.
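
## Example: Building a Synthetic API Request

The request bodies documented for `POST /api/synthetic/generate` can be assembled and validated client-side before sending. This is a minimal sketch using only the Python standard library. The helper names `build_generate_payload` and `post_generate` are hypothetical, the base URL `http://localhost:8000` assumes uvicorn's default port, and treating `text` as required for every non-HuggingFace mode is an assumption drawn from the single documented non-HF example.

```python
import json
import urllib.request

# Modes listed in the Synthetic API section.
MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}


def build_generate_payload(mode, output_subdir, text=None,
                           dataset_url=None, text_column="text",
                           max_samples=100):
    """Build a JSON-serializable body for /api/synthetic/generate."""
    if mode not in MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(MODES)}")

    payload = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode needs dataset_url")
        payload.update(
            dataset_url=dataset_url,
            text_column=text_column,
            max_samples=max_samples,
        )
    else:
        # Assumption: non-HF modes take the text to render directly.
        if not text:
            raise ValueError(f"{mode} mode needs text")
        payload["text"] = text
    return payload


def post_generate(payload, base_url="http://localhost:8000"):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/api/synthetic/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running locally, `post_generate(build_generate_payload("single", "demo_run_01", text="some Odia text"))` would send the non-HF example body shown in the Synthetic API section.
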