---
title: Odia OCR Annotation + Synthetic Generator
emoji: 🧩
colorFrom: indigo
colorTo: yellow
sdk: docker
sdk_version: "1.0.0"
pinned: false
---
# Odia OCR Annotation + Synthetic Text Generator
A unified repository that provides:
- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
- A synthetic text generator (exposed via backend API) to render Odia/Sanskrit-like text with realistic paper/effects, including HuggingFace dataset processing.
## Repository Structure
- `backend/`
  - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic`
  - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export)
  - `app/api/routers/synthetic.py`: Synthetic generation endpoints
  - `app/services/`: Shared services
    - `ocr_processor.py`: Gemini OCR
    - `annotations.py`: CSV/JSON I/O
    - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
  - `data/`: runtime storage
    - `uploaded_images/`: uploaded images (served at `/images`)
    - `annotations/`: `annotations.csv` and JSON
    - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`)
  - `requirements.txt`: backend dependencies
- `frontend/`
  - Vite + React + Tailwind app
  - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI)
- `content/static/`: NotoSans Oriya fonts used by generator
## Run Locally
1) Backend
   - `pip install -r backend/requirements.txt`
   - From `backend/`: `uvicorn app.main:app --reload`
   - Static mounts:
     - `/images` → `backend/data/uploaded_images`
     - `/static/synthetic` → `backend/data/synth_outputs`
2) Frontend
   - `cd frontend && npm install && npm run dev`
   - Open `http://localhost:5173`
   - Use the navigation to switch between the OCR and Synthetic pages
## OCR API (FastAPI)
- `POST /api/ocr/upload`:
  - Multipart field: `files`
  - Stores images in `backend/data/uploaded_images`
- `POST /api/ocr/process`:
  - JSON: `{ "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }`
  - Returns: `{ "img1.png": "extracted text", ... }`
- `GET /api/ocr/annotations`:
  - Returns the current annotations and which images are present or missing
- `POST /api/ocr/save`:
  - JSON: `{ "<filename>": { "extracted_text": "...", "validated_text": "..." } }`
  - Saves to CSV and JSON in `backend/data/annotations`
- `POST /api/ocr/import`:
  - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`)
  - Validates and returns annotations + image presence
- `POST /api/ocr/export`:
  - JSON: `{ "annotations": {...}, "validated_texts": {...} }`
  - Returns a downloadable CSV
Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward.
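The upload, OCR, and save endpoints above can be chained from a script. The following is a minimal standard-library sketch, assuming the backend runs locally at uvicorn's default `http://127.0.0.1:8000` and that each endpoint answers with JSON. `BASE_URL`, `encode_multipart`, `upload_images`, `run_ocr`, and `save_annotations` are hypothetical helpers, not part of this repository.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Hypothetical client helpers. Assumption: the backend was started with
# `uvicorn app.main:app` on its default address and every endpoint returns JSON.
BASE_URL = "http://127.0.0.1:8000"


def encode_multipart(field, paths):
    """Build a multipart/form-data body with one file part per path under `field`."""
    boundary = uuid.uuid4().hex
    chunks = []
    for path in map(Path, paths):
        ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{path.name}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        chunks += [header.encode("utf-8"), path.read_bytes(), b"\r\n"]
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def _post(path, body, content_type):
    req = urllib.request.Request(
        BASE_URL + path, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def _post_json(path, payload):
    return _post(path, json.dumps(payload).encode("utf-8"), "application/json")


def upload_images(paths):
    # POST /api/ocr/upload with every image in the multipart field "files".
    body, ctype = encode_multipart("files", paths)
    return _post("/api/ocr/upload", body, ctype)


def run_ocr(api_key, filenames):
    # POST /api/ocr/process; the Gemini key is sent with each request.
    return _post_json("/api/ocr/process", {"api_key": api_key, "image_filenames": filenames})


def save_annotations(extracted, validated):
    # POST /api/ocr/save; an image without a reviewed text keeps its OCR output.
    payload = {
        name: {"extracted_text": text, "validated_text": validated.get(name, text)}
        for name, text in extracted.items()
    }
    return _post_json("/api/ocr/save", payload)
```

Typical use: `upload_images(["page1.png"])`, then `texts = run_ocr("<GEMINI_KEY>", ["page1.png"])`, review the output, and finish with `save_annotations(texts, {"page1.png": "corrected text"})`. The multipart body is built by hand only to avoid a third-party dependency.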
## Synthetic API (FastAPI)
- `POST /api/synthetic/generate`
  - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface`
  - Request body examples:
    - Non-HF:
      `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }`
    - HF CSV:
      `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }`
  - Response:
    - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>" }`
    - HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }`
  - Outputs are stored under `backend/data/synth_outputs/<job_id>/` and publicly served at `/static/synthetic/<job_id>/...`.
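The request bodies above can be assembled and sent with a few lines of standard-library Python. This is a sketch, assuming a local backend at `http://127.0.0.1:8000` and that the `comprehensive` and `ultra-realistic` modes take `text` the same way `single` does; `build_generate_payload` and `generate` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: backend started locally with uvicorn's default address.
BASE_URL = "http://127.0.0.1:8000"
MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}


def build_generate_payload(mode, output_subdir, text=None, dataset_url=None,
                           text_column="text", max_samples=None):
    """Assemble a body shaped like the request examples (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        payload.update(dataset_url=dataset_url, text_column=text_column)
        if max_samples is not None:
            payload["max_samples"] = max_samples
    else:
        # Assumption: every non-HF mode reads the text to render from "text".
        payload["text"] = text
    return payload


def generate(payload):
    """POST the payload and turn the returned relative paths into absolute URLs."""
    req = urllib.request.Request(
        BASE_URL + "/api/synthetic/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    for key in ("output_dir", "csv", "images_dir"):
        if key in result:
            result[key] = urllib.parse.urljoin(BASE_URL, result[key])
    return result
```

For example, `generate(build_generate_payload("single", "demo_run_01", text="some Odia text"))` would return an `output_dir` such as `http://127.0.0.1:8000/static/synthetic/<job_id>` that can be opened in a browser.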
## Fonts
- Generator uses fonts from `content/static/`.
- Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists.
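Because rendering cannot proceed without a font, a pre-flight check can catch a missing directory early. `resolve_font` below is a hypothetical helper, not part of the repository, and its fallback to the first `.ttf` it finds is an assumption rather than documented generator behavior.

```python
from pathlib import Path

FONT_DIR = Path("content/static")  # relative to the repository root
DEFAULT_FONT = "NotoSansOriya_Condensed-Regular.ttf"


def resolve_font(font_dir=FONT_DIR, font_name=DEFAULT_FONT):
    """Return the configured font, else (assumed fallback) the first .ttf in the directory."""
    font_dir = Path(font_dir)
    if not font_dir.is_dir():
        raise FileNotFoundError(f"font directory missing: {font_dir}")
    preferred = font_dir / font_name
    if preferred.is_file():
        return preferred
    candidates = sorted(font_dir.glob("*.ttf"))
    if not candidates:
        raise FileNotFoundError(f"no .ttf fonts in {font_dir}")
    return candidates[0]
```

The returned path can then be passed to Pillow's `ImageFont.truetype(str(path), size)`.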
## Effects & Styles
- Paper styles: lined paper, old paper, birch, parchment
- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps
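Two of the photometric effects listed above can be illustrated on a grayscale image stored as rows of ints from 0 to 255. This is a pure-Python illustration, not the code in `effects.py`; `brightness_contrast` and `gaussian_noise` are hypothetical names.

```python
import random


def brightness_contrast(img, alpha=1.0, beta=0.0):
    """Apply out = alpha * in + beta per pixel, clamped to 0..255."""
    return [[max(0, min(255, round(alpha * p + beta))) for p in row] for row in img]


def gaussian_noise(img, sigma=8.0, seed=None):
    """Add zero-mean Gaussian noise; passing a seed makes the result reproducible."""
    rng = random.Random(seed)
    return [[max(0, min(255, round(p + rng.gauss(0.0, sigma)))) for p in row] for row in img]
```

`alpha` stretches contrast around zero and `beta` shifts brightness; clamping keeps saturated paper white instead of wrapping around.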
## Notes
- The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side.
- For HuggingFace datasets, the backend uses `datasets` when possible, or downloads raw CSV URLs.
- You can browse generated outputs via the links returned by `/api/synthetic/generate`.
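The HuggingFace fallback described in the notes can be sketched as follows: raw `.csv` URLs are downloaded and parsed with the `csv` module, while other sources go through `datasets.load_dataset`, imported only if the package is installed. The helper names, the `"train"` split, and skipping empty rows are assumptions, not the backend's confirmed behavior.

```python
import csv
import importlib.util
import io
import logging
import urllib.request

log = logging.getLogger("hf_texts")  # hypothetical logger name


def texts_from_csv(csv_text, text_column, max_samples=None):
    """Collect non-empty values of `text_column`, skipping rows that lack it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise ValueError(f"column {text_column!r} not found in CSV header")
    texts = []
    for row_no, row in enumerate(reader, start=1):
        value = (row.get(text_column) or "").strip()
        if not value:
            log.warning("skipping data row %d: empty %r", row_no, text_column)
            continue
        texts.append(value)
        if max_samples is not None and len(texts) >= max_samples:
            break
    return texts


def load_texts(source, text_column="text", max_samples=None):
    """Download raw CSV URLs directly; route anything else through `datasets`."""
    if source.lower().endswith(".csv"):
        with urllib.request.urlopen(source) as resp:
            return texts_from_csv(resp.read().decode("utf-8"), text_column, max_samples)
    if importlib.util.find_spec("datasets") is None:
        raise RuntimeError("install `datasets` to load Hub datasets by name")
    from datasets import load_dataset
    ds = load_dataset(source, split="train")  # assumption: a "train" split exists
    if max_samples is not None:
        ds = ds.select(range(min(max_samples, len(ds))))
    return [t for t in ds[text_column] if t]
```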
## Deploy to Hugging Face Spaces (Docker)
This repo includes a multi-stage Dockerfile that deploys both the backend and the built frontend as a single Space.
Steps:
- Create a new Space → Type: Docker
- Push this repository to the Space
- In Space Settings:
  - Enable Persistent Storage
  - (Optional) Add secrets or environment variables as needed, e.g., `DATA_DIR=/data` (already the default) and `FRONTEND_DIST=/app/frontend_dist`
- The container exposes port `7860` by default.
What the image does:
- Builds the frontend (`frontend/`) and copies the `dist/` to `/app/frontend_dist`
- Installs backend dependencies and runs `uvicorn app.main:app` from `backend/`
- Serves:
  - API at `/api/...`
  - Uploaded images at `/images`
  - Synthetic outputs at `/static/synthetic`
  - Frontend SPA at `/` (served from `/app/frontend_dist`)
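The URL-to-directory mapping above can be expressed as a small resolver. This sketch assumes the backend reads `DATA_DIR` and `FRONTEND_DIST` with the defaults given in the Space settings and keeps the same `uploaded_images/` and `synth_outputs/` subfolders that `backend/data/` uses locally; `resolve_mounts` is a hypothetical name.

```python
import os
from pathlib import PurePosixPath


def resolve_mounts(env=None):
    """Map each served URL prefix to the directory it is assumed to read from."""
    env = os.environ if env is None else env
    data_dir = PurePosixPath(env.get("DATA_DIR", "/data"))
    frontend = PurePosixPath(env.get("FRONTEND_DIST", "/app/frontend_dist"))
    return {
        "/images": data_dir / "uploaded_images",
        "/static/synthetic": data_dir / "synth_outputs",
        "/": frontend,
    }
```

In FastAPI each entry could be attached with `app.mount(prefix, StaticFiles(directory=str(path), html=(prefix == "/")), name=...)`. The `/` mount should be registered after the API routers: Starlette matches routes in registration order, so a root mount added first would shadow `/api/...`.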
## Effect Categories
1. **Paper Textures**: Realistic paper fiber patterns using Perlin noise
2. **Aging Effects**: Edge darkening and aging patterns
3. **Physical Damage**: Fold lines, creases, and ink bleeding
4. **Scanner Artifacts**: Dust, compression artifacts, scanning lines
5. **Geometric Distortions**: Perspective changes, cylindrical warping
6. **Lighting Effects**: Shadows and lens distortions
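As an illustration of the aging category, edge darkening can be modeled as a radial falloff from the image center. The quadratic falloff, the `strength` parameter, and the name `edge_darken` are assumptions for this sketch; the generator's own implementation may differ. The image is grayscale, stored as rows of ints from 0 to 255.

```python
import math


def edge_darken(img, strength=0.4):
    """Scale each pixel by 1 - strength * (d / d_max)**2, d = distance to the center."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    d_max = math.hypot(cy, cx) or 1.0  # avoid dividing by zero on a 1x1 image
    out = []
    for y, row in enumerate(img):
        out.append([
            round(p * (1 - strength * (math.hypot(y - cy, x - cx) / d_max) ** 2))
            for x, p in enumerate(row)
        ])
    return out
```

With `strength=0.4` the center is untouched and the corners keep 60% of their brightness.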
## Font Requirements
The generator requires appropriate fonts for text rendering. Default configuration expects:
- Font directory: `/content/static/`
- Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf`
You can specify custom fonts using `--font-dir` and `--font` parameters.
## Performance Tips
- Use `--max-samples` to limit processing for large datasets
- Disable advanced effects with `--no-advanced-effects` for faster generation
- Use multiprocessing with `--use-multiprocessing` for batch jobs
- Adjust image dimensions to balance quality and speed
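The flags named in the Font Requirements and Performance Tips sections could be wired up with `argparse` as below. The actual command-line entry point is not documented here, so `build_parser` and its help texts are hypothetical; the font defaults are copied from the Font Requirements section.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented generator flags."""
    p = argparse.ArgumentParser(description="Synthetic Odia text generator (sketch)")
    p.add_argument("--font-dir", default="/content/static/")
    p.add_argument("--font", default="NotoSansOriya_ExtraCondensed-Regular.ttf")
    p.add_argument("--max-samples", type=int, default=None,
                   help="cap on dataset rows to process")
    p.add_argument("--no-advanced-effects", dest="advanced_effects",
                   action="store_false", help="skip the slower effects")
    p.add_argument("--use-multiprocessing", action="store_true",
                   help="spread batch jobs across processes")
    return p
```

Using `store_false` for `--no-advanced-effects` keeps the positive name `advanced_effects` in code while the flag reads as an opt-out on the command line.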
## Error Handling
The synthetic generator package includes:
- Graceful fallbacks for missing dependencies
- Detailed logging for debugging
- Validation of input parameters
- Safe handling of malformed datasets
## Contributing
The generator is split into modules under `backend/app/services/synthetic/`, which makes it easy to extend:
- Add new effects in `effects.py`
- Implement new background styles in `backgrounds.py`
- Create custom transformations in `transformations.py`
- Extend dataset processing in `huggingface_processor.py`
## License
[Add your license information here]
---
**Note**: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.