| title: Odia OCR Annotation + Synthetic Generator | |
| emoji: 🧩 | |
| colorFrom: indigo | |
| colorTo: yellow | |
| sdk: docker | |
| sdk_version: "1.0.0" | |
| pinned: false | |
| # Odia OCR Annotation + Synthetic Text Generator | |
| A unified repository that provides: | |
| - An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs. | |
| - A synthetic text generator (exposed via the backend API) that renders Odia/Sanskrit-like text on realistic paper backgrounds with visual effects, and can process HuggingFace datasets. | |
| ## Repository Structure | |
| - `backend/` | |
| - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic` | |
| - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export) | |
| - `app/api/routers/synthetic.py`: Synthetic generation endpoints | |
| - `app/services/`: Shared services | |
| - `ocr_processor.py`: Gemini OCR | |
| - `annotations.py`: CSV/JSON I/O | |
| - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor) | |
| - `data/`: runtime storage | |
| - `uploaded_images/`: uploaded images (served at `/images`) | |
| - `annotations/`: `annotations.csv` and JSON | |
| - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`) | |
| - `requirements.txt`: backend dependencies | |
| - `frontend/` | |
| - Vite + React + Tailwind app | |
| - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI) | |
| - `content/static/`: NotoSans Oriya fonts used by generator | |
| ## Run Locally | |
| 1) Backend | |
| - `pip install -r backend/requirements.txt` | |
| - From `backend/`: `uvicorn app.main:app --reload` | |
| - Static mounts: | |
| - `/images` → `backend/data/uploaded_images` | |
| - `/static/synthetic` → `backend/data/synth_outputs` | |
| 2) Frontend | |
| - `cd frontend && npm install && npm run dev` | |
| - Open `http://localhost:5173` | |
| - Use navigation to switch between OCR and Synthetic pages | |
| ## OCR API (FastAPI) | |
| - `POST /api/ocr/upload`: | |
| - Multipart files field: `files` | |
| - Stores images in `backend/data/uploaded_images` | |
| - `POST /api/ocr/process`: | |
| - JSON: `{ "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }` | |
| - Returns: `{ "img1.png": "extracted text", ... }` | |
| - `GET /api/ocr/annotations`: | |
| - Returns the current annotations and reports which referenced images are present and which are missing | |
| - `POST /api/ocr/save`: | |
| - JSON: `{ "<filename>": { "extracted_text": "...", "validated_text": "..." } }` | |
| - Saves to CSV and JSON in `backend/data/annotations` | |
| - `POST /api/ocr/import`: | |
| - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`) | |
| - Validates and returns annotations + image presence | |
| - `POST /api/ocr/export`: | |
| - JSON: `{ "annotations": {...}, "validated_texts": {...} }` | |
| - Returns a downloadable CSV | |
| Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward. | |
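The request bodies listed above can be assembled with the Python standard library alone. The following is a minimal client-side sketch, not the project's own client: the base URL `http://localhost:8000` assumes uvicorn's default port, and the helper names `build_process_request`, `build_save_request`, and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: the backend was started with uvicorn's default port (8000).
BASE_URL = "http://localhost:8000"


def _json_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the given API path."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_process_request(api_key, image_filenames, base_url=BASE_URL):
    """Request for POST /api/ocr/process; the Gemini key travels per request."""
    payload = {"api_key": api_key, "image_filenames": list(image_filenames)}
    return _json_post("/api/ocr/process", payload, base_url)


def build_save_request(annotations, base_url=BASE_URL):
    """Request for POST /api/ocr/save.

    `annotations` maps a filename to a dict holding `extracted_text`
    and `validated_text`, as described in the endpoint list.
    """
    for filename, entry in annotations.items():
        missing = {"extracted_text", "validated_text"} - set(entry)
        if missing:
            raise ValueError(f"{filename}: missing keys {sorted(missing)}")
    return _json_post("/api/ocr/save", annotations, base_url)


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect before any key leaves the machine; `send(build_process_request(key, ["img1.png"]))` would perform the actual call against a running backend.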
| ## Synthetic API (FastAPI) | |
| - `POST /api/synthetic/generate` | |
| - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface` | |
| - Request body examples: | |
| - Non-HF: | |
| `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }` | |
| - HF CSV: | |
| `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }` | |
| - Response: | |
| - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>" }` | |
| - HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }` | |
| - Outputs are stored under `backend/data/synth_outputs/<job_id>/` and publicly served at `/static/synthetic/<job_id>/...`. | |
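To make the two request shapes and the URL-to-disk mapping concrete, here is a small sketch. It assumes that every non-HuggingFace mode takes the same body as the `single` example and that `text_column` defaults to `"text"`; the helper names `build_generate_payload` and `output_url_to_disk_path` are hypothetical and not part of the backend.

```python
from pathlib import PurePosixPath

# The four modes listed for POST /api/synthetic/generate.
VALID_MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}
STATIC_PREFIX = "/static/synthetic/"


def build_generate_payload(mode, output_subdir, text=None, dataset_url=None,
                           text_column="text", max_samples=None):
    """Return a JSON-serialisable body for /api/synthetic/generate."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode needs dataset_url")
        payload["dataset_url"] = dataset_url
        payload["text_column"] = text_column
        if max_samples is not None:
            payload["max_samples"] = int(max_samples)
    else:
        # Assumption: all non-HF modes expect a `text` field.
        if not text:
            raise ValueError(f"{mode} mode needs text")
        payload["text"] = text
    return payload


def output_url_to_disk_path(url_path, data_root="backend/data/synth_outputs"):
    """Map a returned /static/synthetic/... URL path to its location on disk."""
    if not url_path.startswith(STATIC_PREFIX):
        raise ValueError(f"not a synthetic output path: {url_path!r}")
    relative = PurePosixPath(url_path[len(STATIC_PREFIX):])
    if ".." in relative.parts:
        raise ValueError("path must stay inside the output directory")
    return str(PurePosixPath(data_root) / relative)
```

For example, `output_url_to_disk_path("/static/synthetic/<job_id>/dataset.csv")` yields the matching file under `backend/data/synth_outputs/<job_id>/`, which is handy when inspecting results on the machine that runs the backend.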
| ## Fonts | |
| - Generator uses fonts from `content/static/`. | |
| - Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists. | |
| ## Effects & Styles | |
| - Paper styles: lined paper, old paper, birch, parchment | |
| - Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps | |
| ## Notes | |
| - The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side. | |
| - For HuggingFace datasets, the backend uses the `datasets` library when possible and otherwise downloads the raw CSV from the given URL. | |
| - You can browse generated outputs via the links returned by `/api/synthetic/generate`. | |
| ## Deploy to Hugging Face Spaces (Docker) | |
| This repo includes a multi-stage Dockerfile that deploys both the backend and the built frontend as a single Space. | |
| Steps: | |
| - Create a new Space → Type: Docker | |
| - Push this repository to the Space | |
| - In Space Settings: | |
| - Enable Persistent Storage | |
| - (Optional) Add Secrets/Env Vars as needed, e.g., `DATA_DIR=/data` (already the default) and `FRONTEND_DIST=/app/frontend_dist` | |
| - The container exposes port `7860` by default. | |
| What the image does: | |
| - Builds the frontend (`frontend/`) and copies the resulting `dist/` directory to `/app/frontend_dist` | |
| - Installs backend dependencies and runs `uvicorn app.main:app` from `backend/` | |
| - Serves: | |
| - API at `/api/...` | |
| - Uploaded images at `/images` | |
| - Synthetic outputs at `/static/synthetic` | |
| - Frontend SPA at `/` (served from `/app/frontend_dist`) | |
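The two environment variables mentioned above could be resolved along the following lines. This is only a sketch of one plausible approach: the function name `resolve_runtime_paths` is hypothetical, and it assumes the subdirectories under `DATA_DIR` mirror the `backend/data/` layout from the repository structure.

```python
import os
from pathlib import PurePosixPath


def resolve_runtime_paths(env=None):
    """Resolve storage locations from the environment.

    Falls back to the defaults named in the deployment steps:
    DATA_DIR=/data and FRONTEND_DIST=/app/frontend_dist.
    """
    env = os.environ if env is None else env
    data_dir = PurePosixPath(env.get("DATA_DIR", "/data"))
    frontend_dist = PurePosixPath(env.get("FRONTEND_DIST", "/app/frontend_dist"))
    return {
        # Assumption: same subdirectory names as backend/data/ in the repo.
        "uploaded_images": data_dir / "uploaded_images",  # served at /images
        "annotations": data_dir / "annotations",
        "synth_outputs": data_dir / "synth_outputs",  # served at /static/synthetic
        "frontend_dist": frontend_dist,  # SPA served at /
    }
```

Passing a plain dict as `env` makes the resolution easy to check without touching the real environment, for example `resolve_runtime_paths({"DATA_DIR": "/tmp/odia"})`.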
| ## Advanced Effects | |
| 1. **Paper Textures**: Realistic paper fiber patterns using Perlin noise | |
| 2. **Aging Effects**: Edge darkening and aging patterns | |
| 3. **Physical Damage**: Fold lines, creases, and ink bleeding | |
| 4. **Scanner Artifacts**: Dust, compression artifacts, scanning lines | |
| 5. **Geometric Distortions**: Perspective changes, cylindrical warping | |
| 6. **Lighting Effects**: Shadows and lens distortions | |
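As an illustration of the edge-darkening idea in item 2, the sketch below applies a radial falloff to a grayscale image stored as a list of rows. It is a deliberately simple, dependency-free stand-in and not the generator's implementation; the function name `darken_edges` and the quadratic falloff are assumptions.

```python
import math


def darken_edges(image, strength=0.5):
    """Return a copy of `image` whose pixels get darker toward the corners.

    `image` is a list of rows of grayscale values (0-255). `strength` is the
    fraction of brightness removed at the corners (0 = no change).
    """
    if not image or not image[0]:
        return []
    height, width = len(image), len(image[0])
    center_y, center_x = (height - 1) / 2, (width - 1) / 2
    # Distance from the centre to a corner; guard against a 1x1 image.
    max_dist = math.hypot(center_x, center_y) or 1.0
    result = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            # Normalised distance: 0 at the centre, 1 at the corners.
            dist = math.hypot(x - center_x, y - center_y) / max_dist
            factor = 1.0 - strength * dist ** 2
            new_row.append(max(0, min(255, round(value * factor))))
        result.append(new_row)
    return result
```

The quadratic falloff leaves the central text area almost untouched and concentrates the darkening near the border, which is roughly how aged paper tends to look.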
| ## Font Requirements | |
| The generator requires appropriate fonts for text rendering. The default configuration expects: | |
| - Font directory: `/content/static/` | |
| - Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf` | |
| You can specify custom fonts using the `--font-dir` and `--font` options. | |
| ## Performance Tips | |
| - Use `--max-samples` to limit processing for large datasets | |
| - Disable advanced effects with `--no-advanced-effects` for faster generation | |
| - Use multiprocessing with `--use-multiprocessing` for batch jobs | |
| - Adjust image dimensions to balance quality and speed | |
| ## Error Handling | |
| The package handles errors in the following ways: | |
| - Graceful fallbacks for missing dependencies | |
| - Detailed logging for debugging | |
| - Validation of input parameters | |
| - Safe handling of malformed datasets | |
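The last two points, together with the sample cap from the performance tips, might look like the following when reading a dataset CSV. This is a sketch under assumptions rather than the package's actual logic: the function name `load_texts` is hypothetical, and treating blank or truncated rows as skippable is a design choice made here for illustration.

```python
import csv
import io


def load_texts(csv_text, text_column="text", max_samples=None):
    """Extract non-empty strings from one column of CSV content.

    Raises ValueError for invalid parameters or a missing column, and
    silently skips rows whose text cell is blank or absent.
    """
    if max_samples is not None and max_samples < 0:
        raise ValueError("max_samples must be non-negative")
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"column {text_column!r} not found in dataset")
    texts = []
    for row in reader:
        if max_samples is not None and len(texts) >= max_samples:
            break  # stop early instead of reading the whole dataset
        # Short rows yield None for missing cells; treat them as empty.
        text = (row.get(text_column) or "").strip()
        if text:
            texts.append(text)
    return texts
```

Validating the column name up front gives a clear error before any rendering work starts, while skipping individual bad rows keeps one malformed line from aborting a long batch.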
| ## Contributing | |
| The modular structure makes it easy to extend: | |
| - Add new effects in `effects.py` | |
| - Implement new background styles in `backgrounds.py` | |
| - Create custom transformations in `transformations.py` | |
| - Extend dataset processing in `huggingface_processor.py` | |
| ## License | |
| [Add your license information here] | |
| --- | |
| **Note**: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities. | |