	---
	title: Odia OCR Annotation + Synthetic Generator
	emoji: 🧩
	colorFrom: indigo
	colorTo: yellow
	sdk: docker
	sdk_version: "1.0.0"
	pinned: false
	---

	# Odia OCR Annotation + Synthetic Text Generator

	A unified repository that provides:
	- An OCR annotation tool (React frontend + FastAPI backend) to upload images, run OCR via Gemini, edit validated text, and export CSVs.
	- A synthetic text generator (exposed via the backend API) that renders Odia/Sanskrit-like text on realistic paper backgrounds with scan and aging effects, and can batch-process Hugging Face datasets.

	## Repository Structure

	- `backend/`
	  - `app/main.py`: FastAPI app with two routers: `/api/ocr` and `/api/synthetic`
	  - `app/api/routers/ocr.py`: OCR endpoints (upload, OCR, annotations import/export)
	  - `app/api/routers/synthetic.py`: Synthetic generation endpoints
	  - `app/services/`: Shared services
	    - `ocr_processor.py`: Gemini OCR
	    - `annotations.py`: CSV/JSON I/O
	    - `synthetic/`: generator modules (config, core, effects, backgrounds, text_renderer, transformations, huggingface_processor)
	  - `data/`: runtime storage
	    - `uploaded_images/`: uploaded images (served at `/images`)
	    - `annotations/`: `annotations.csv` and JSON
	    - `synth_outputs/`: generated images and CSVs (served at `/static/synthetic`)
	  - `requirements.txt`: backend dependencies
	- `frontend/`
	  - Vite + React + Tailwind app
	  - Routes: `/ocr` (annotation UI) and `/synthetic` (generator UI)
	- `content/static/`: NotoSans Oriya fonts used by the generator
	## Run Locally

	1) Backend
	- `pip install -r backend/requirements.txt`
	- From `backend/`: `uvicorn app.main:app --reload`
	- Static mounts:
	  - `/images` → `backend/data/uploaded_images`
	  - `/static/synthetic` → `backend/data/synth_outputs`

	2) Frontend
	- `cd frontend && npm install && npm run dev`
	- Open `http://localhost:5173`
	- Use navigation to switch between OCR and Synthetic pages

	## OCR API (FastAPI)

	- `POST /api/ocr/upload`:
	  - Multipart files field: `files`
	  - Stores images in `backend/data/uploaded_images`
	- `POST /api/ocr/process`:
	  - JSON: `{ "api_key": "<GEMINI_KEY>", "image_filenames": ["img1.png", ...] }`
	  - Returns: `{ "img1.png": "extracted text", ... }`
	- `GET /api/ocr/annotations`:
	  - Returns the current annotations and which images are present or missing
	- `POST /api/ocr/save`:
	  - JSON: `{ "<filename>": { "extracted_text": "...", "validated_text": "..." } }`
	  - Saves to CSV and JSON in `backend/data/annotations`
	- `POST /api/ocr/import`:
	  - Multipart: `file` (CSV), `image_folder` (e.g., `uploaded_images`)
	  - Validates the CSV and returns the annotations plus image presence
	- `POST /api/ocr/export`:
	  - JSON: `{ "annotations": {...}, "validated_texts": {...} }`
	  - Returns a downloadable CSV

	Note: Legacy endpoints (`/upload/`, `/process-ocr/`, etc.) are temporarily supported for the older UI. Prefer `/api/ocr/...` going forward.
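
As a rough illustration of the request shapes listed above, the following standard-library sketch builds the JSON bodies for `/api/ocr/process` and `/api/ocr/save`. The helper names, the `BASE_URL` value (uvicorn's default port 8000), and the fallback of `validated_text` to the extracted text are assumptions made for this example, not part of the backend.

```python
import json
import urllib.request

# Assumption: the backend runs locally on uvicorn's default port.
BASE_URL = "http://localhost:8000"


def build_process_request(api_key, image_filenames, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/ocr/process."""
    payload = {"api_key": api_key, "image_filenames": list(image_filenames)}
    return urllib.request.Request(
        f"{base_url}/api/ocr/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_save_payload(extracted, validated):
    """Combine OCR output and human corrections into the /api/ocr/save body.

    Hypothetical choice for this sketch: images without a correction reuse
    the extracted text as their validated text.
    """
    return {
        name: {
            "extracted_text": text,
            "validated_text": validated.get(name, text),
        }
        for name, text in extracted.items()
    }


def post_json(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the backend running, `post_json(build_process_request(key, ["img1.png"]))` would return the filename-to-text mapping, which can then be passed to `build_save_payload` together with the edited texts.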

	## Synthetic API (FastAPI)

	- `POST /api/synthetic/generate`
	  - Modes: `single`, `comprehensive`, `ultra-realistic`, `huggingface`
	  - Request body examples:
	    - Non-HF: `{ "mode": "single", "text": "some Odia text", "output_subdir": "demo_run_01" }`
	    - HF CSV: `{ "mode": "huggingface", "dataset_url": "https://.../data.csv", "text_column": "text", "max_samples": 100, "output_subdir": "hf_demo" }`
	  - Response:
	    - Non-HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>" }`
	    - HF: `{ "status": "ok", "output_dir": "/static/synthetic/<job_id>", "csv": "/static/synthetic/<job_id>/dataset.csv", "images_dir": "/static/synthetic/<job_id>/images" }`
	- Outputs are stored under `backend/data/synth_outputs/<job_id>/` and publicly served at `/static/synthetic/<job_id>/...`.
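
The request and response shapes above can be sketched client-side as follows. This is a minimal example under stated assumptions: `build_generate_payload` and `absolute_output_urls` are hypothetical helper names, and the validation shown (mode check, required `dataset_url` for `huggingface`) mirrors the examples in this README rather than the backend's actual rules.

```python
from urllib.parse import urljoin

# The four modes listed in this README.
VALID_MODES = {"single", "comprehensive", "ultra-realistic", "huggingface"}


def build_generate_payload(mode, output_subdir, text=None, dataset_url=None,
                           text_column="text", max_samples=None):
    """Build the JSON body for POST /api/synthetic/generate."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    payload = {"mode": mode, "output_subdir": output_subdir}
    if mode == "huggingface":
        if not dataset_url:
            raise ValueError("huggingface mode needs a dataset_url")
        payload["dataset_url"] = dataset_url
        payload["text_column"] = text_column
        if max_samples is not None:
            payload["max_samples"] = max_samples
    elif text is not None:
        payload["text"] = text
    return payload


def absolute_output_urls(response, base_url):
    """Turn the server-relative paths in a response into absolute URLs."""
    return {
        key: urljoin(base_url, value)
        for key, value in response.items()
        if isinstance(value, str) and value.startswith("/")
    }
```

Non-path fields such as `status` are left out of the result, so the returned dictionary contains only browsable links.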

	## Fonts

	- Generator uses fonts from `content/static/`.
	- Default: `NotoSansOriya_Condensed-Regular.ttf` (configurable). Ensure the directory exists.

	## Effects & Styles

	- Paper styles: lined paper, old paper, birch, parchment
	- Effects: rotation, brightness/contrast/noise/blur, fold/crease, ink bleed, perspective, shadows, morphological ops, scanner artifacts, lens distortion, washboard/cylinder warps

	## Notes

	- The backend expects the Gemini API key to be provided per-request to `/api/ocr/process`. Do not hardcode keys server-side.
	- For HuggingFace datasets, the backend uses the `datasets` library when possible and otherwise downloads the raw CSV from the given URL.
	- You can browse generated outputs via the links returned by `/api/synthetic/generate`.
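
To make the raw-CSV path more concrete, here is a small sketch of how texts might be pulled from a downloaded CSV using `text_column` and `max_samples`. The function name `extract_texts` and the decision to skip empty cells are assumptions for illustration; the actual logic lives in `huggingface_processor.py` and may differ.

```python
import csv
import io


def extract_texts(csv_text, text_column="text", max_samples=None):
    """Return up to max_samples non-empty values from one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or text_column not in reader.fieldnames:
        raise KeyError(f"column {text_column!r} not found in CSV header")
    texts = []
    for row in reader:
        # Stop as soon as the requested number of samples is collected.
        if max_samples is not None and len(texts) >= max_samples:
            break
        value = (row.get(text_column) or "").strip()
        if value:  # assumption: empty cells are skipped, not rendered
            texts.append(value)
    return texts
```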

	## Deploy to Hugging Face Spaces (Docker)

	This repo includes a multi-stage Dockerfile that deploys both the backend and the built frontend as a single Space.

	Steps:
	- Create a new Space → Type: Docker
	- Push this repository to the Space
	- In Space Settings:
	  - Enable Persistent Storage
	  - (Optional) Add Secrets/Env Vars as needed, e.g., `DATA_DIR=/data` (already the default) and `FRONTEND_DIST=/app/frontend_dist`
	- The container exposes port `7860` by default.
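
One way a backend could read these variables is sketched below. The `Settings` class and `load_settings` function are hypothetical names, the defaults are taken from the values mentioned above, and the subdirectory names are assumed to match the `backend/data/` layout described earlier.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    data_dir: Path
    frontend_dist: Path

    @property
    def uploaded_images(self):
        return self.data_dir / "uploaded_images"

    @property
    def annotations(self):
        return self.data_dir / "annotations"

    @property
    def synth_outputs(self):
        return self.data_dir / "synth_outputs"


def load_settings(env=None):
    """Read DATA_DIR and FRONTEND_DIST, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return Settings(
        data_dir=Path(env.get("DATA_DIR", "/data")),
        frontend_dist=Path(env.get("FRONTEND_DIST", "/app/frontend_dist")),
    )
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.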

	What the image does:
	- Builds the frontend (`frontend/`) and copies its `dist/` output to `/app/frontend_dist`
	- Installs backend dependencies and runs `uvicorn app.main:app` from `backend/`
	- Serves:
	  - API at `/api/...`
	  - Uploaded images at `/images`
	  - Synthetic outputs at `/static/synthetic`
	  - Frontend SPA at `/` (served from `/app/frontend_dist`)


	## Advanced Effects

	1. Paper Textures: Realistic paper fiber patterns using Perlin noise
	2. Aging Effects: Edge darkening and aging patterns
	3. Physical Damage: Fold lines, creases, and ink bleeding
	4. Scanner Artifacts: Dust, compression artifacts, scanning lines
	5. Geometric Distortions: Perspective changes, cylindrical warping
	6. Lighting Effects: Shadows and lens distortions

	## Font Requirements

	The generator requires appropriate fonts for text rendering. Default configuration expects:
	- Font directory: `/content/static/`
	- Font file: `NotoSansOriya_ExtraCondensed-Regular.ttf`

	You can specify custom fonts using `--font-dir` and `--font` parameters.

	## Performance Tips

	- Use `--max-samples` to limit processing for large datasets
	- Disable advanced effects with `--no-advanced-effects` for faster generation
	- Use multiprocessing with `--use-multiprocessing` for batch jobs
	- Adjust image dimensions to balance quality and speed

	## Error Handling

	The package includes comprehensive error handling:
	- Graceful fallbacks for missing dependencies
	- Detailed logging for debugging
	- Validation of input parameters
	- Safe handling of malformed datasets

	## Contributing

	The modular structure makes it easy to extend:
	- Add new effects in `effects.py`
	- Implement new background styles in `backgrounds.py`
	- Create custom transformations in `transformations.py`
	- Extend dataset processing in `huggingface_processor.py`

	## License

	[Add your license information here]

	---

	Note: This is a complete rewrite of the original monolithic code into a modular, extensible package with added HuggingFace dataset processing capabilities.