Fabio Antonini

fix: shorten HF Spaces short_description to <=60 chars

c766f4d about 2 months ago

metadata

title: GraphoLab — AI for Forensic Graphology
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.25.0
app_file: app/grapholab_demo.py
pinned: false
license: apache-2.0
short_description: AI-powered forensic graphology platform

GraphoLab — Forensic Graphology Laboratory

An AI-powered platform for forensic graphology: the scientific examination of handwriting and signatures for legal purposes.

GraphoLab ships in two forms:

Mode	Description
Professional app	FastAPI backend + React frontend — multi-user, JWT auth, PostgreSQL, MinIO, audit log
Gradio demo	Single-user interactive demo, runs locally or on Hugging Face Spaces

AI Capabilities

Engine	Technique	Model
Handwritten OCR (HTR)	Transformer OCR	`microsoft/trocr-base-handwritten` + EasyOCR
Signature Verification	Siamese Network	SigNet (luizgh/sigver)
Signature Detection	Object Detection	Conditional DETR (tech4humans, Apache 2.0)
Named Entity Recognition	Token Classification	`Babelscape/wikineural-multilingual-ner`
Writer Identification	HOG + SVM	scikit-learn
Graphological Analysis	Image Processing	OpenCV + scikit-image
Document Dating	OCR + dateparser	EasyOCR + multilingual date parsing
Full Forensic Pipeline	All engines in sequence	Ollama LLM synthesis
RAG / AI Consultant	Retrieval-Augmented Generation	Ollama + nomic-embed-text
ENFSI Compliance Checker	LLM structured analysis	Ollama (qwen3:8b recommended)
Agente Documentale	LangChain ReAct agent + tools	Ollama + LangChain + PaddleOCR

Professional App (FastAPI + React)

Features

Case management: create, manage and archive forensic cases
Document storage: MinIO S3-compatible storage (on-premise)
AI analysis: all 8 engines via REST API with streaming SSE
PDF report generation: forensic report with images and formatted tables
RAG chatbot: upload PDF/DOCX to build a knowledge base, query with local LLM
ENFSI Compliance Checker: upload an expert report (perizia) as PDF → LLM analysis against 20 requirements of ENFSI BPM-FHX-01 Ed.03 → structured report with ✅/⚠️/❌ verdicts, suggestions, and PDF export
Agente Documentale (Document Agent): LangChain ReAct agent with document tools (OCR, NER, graphology, dating); upload files, run tool-augmented multi-turn conversations, and stop generation mid-stream
OCR model selector: switch between EasyOCR, TrOCR, PaddleOCR, VLM from the sidebar at runtime
Immutable audit log: append-only forensic chain of custody
JWT authentication: login, refresh, password reset
Role-based access: admin, examiner, viewer
Multilingual UI: Italian / English (react-i18next)
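
One common way to make an append-only log tamper-evident is hash chaining: each entry stores the hash of the previous one, so any later modification breaks the chain. The sketch below illustrates that general technique only. The README does not say how GraphoLab enforces immutability, and `AuditEntry`, `AuditLog`, and `verify` are hypothetical names that do not come from the project.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    """One immutable log record (hypothetical schema, for illustration)."""
    index: int
    actor: str
    action: str
    prev_hash: str
    entry_hash: str


def _digest(index: int, actor: str, action: str, prev_hash: str) -> str:
    # Hash a canonical JSON encoding of the entry fields plus the previous hash.
    payload = json.dumps([index, actor, action, prev_hash], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """In-memory append-only log; exposes append and read, never update or delete."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def append(self, actor: str, action: str) -> AuditEntry:
        prev = self._entries[-1].entry_hash if self._entries else self.GENESIS
        index = len(self._entries)
        entry = AuditEntry(index, actor, action, prev, _digest(index, actor, action, prev))
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[AuditEntry, ...]:
        return tuple(self._entries)


def verify(entries) -> bool:
    """Recompute every hash and check that each entry links to its predecessor."""
    prev = AuditLog.GENESIS
    for position, entry in enumerate(entries):
        if entry.index != position or entry.prev_hash != prev:
            return False
        if entry.entry_hash != _digest(entry.index, entry.actor, entry.action, entry.prev_hash):
            return False
        prev = entry.entry_hash
    return True
```

In a database-backed deployment the same idea would typically be combined with storage-level restrictions (for example, no UPDATE or DELETE privileges on the audit table); that is likewise an assumption, not a description of the project.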

Ollama Model Configuration

GraphoLab separates model usage into three independent slots (LLM, VLM, and OCR), each configurable from the sidebar at runtime:

Slot	Default	Used for	GPU impact
LLM Model	`qwen3:4b`	Agent reasoning, RAG chat, ENFSI compliance, pipeline synthesis	low (~2.5 GB VRAM)
VLM Model	`qwen3-vl:8b`	OCR=vlm transcription, table/figure analysis	high (~5 GB VRAM)
OCR Model	`easyocr`	Handwritten text transcription	none (CPU)

OCR model options: easyocr (CPU, default) · trocr (CPU/GPU, no Ollama) · paddleocr (CPU, no Ollama) · vlm (delegates to VLM Model above).

Example — "transcribe + NER + dating" with OCR=easyocr: EasyOCR (CPU) → spaCy NER (CPU) → dateparser (CPU) → qwen3:4b reasoning. The VLM is never loaded.

# Pull the recommended models (LLM, VLM, and embedding model)
ollama pull qwen3:4b       # text/reasoning (~2.5 GB)
ollama pull qwen3-vl:8b    # vision/image analysis (~5 GB)
ollama pull nomic-embed-text  # RAG embeddings
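
The slot separation described in this section can be sketched as a small configuration object plus a function that decides which Ollama models a request needs. This is an illustrative sketch, not GraphoLab's code: `ModelSlots`, `ollama_models_needed`, and the two boolean flags are hypothetical names, while the defaults and OCR options are taken from the table above.

```python
from dataclasses import dataclass

# OCR back-ends listed in the README; only "vlm" delegates to the VLM slot.
OCR_OPTIONS = ("easyocr", "trocr", "paddleocr", "vlm")


@dataclass(frozen=True)
class ModelSlots:
    """The three independently configurable slots, with the README defaults."""
    llm: str = "qwen3:4b"
    vlm: str = "qwen3-vl:8b"
    ocr: str = "easyocr"

    def __post_init__(self) -> None:
        if self.ocr not in OCR_OPTIONS:
            raise ValueError(f"unknown OCR model {self.ocr!r}; expected one of {OCR_OPTIONS}")


def ollama_models_needed(
    slots: ModelSlots,
    uses_reasoning: bool = True,
    uses_vision_tools: bool = False,
) -> list[str]:
    """Return the Ollama models a request would load under this sketch's rules.

    The VLM is included only when OCR is delegated to it or when a vision tool
    (table/figure analysis) is requested; CPU OCR engines need no Ollama model.
    """
    needed: list[str] = []
    if uses_reasoning:
        needed.append(slots.llm)
    if slots.ocr == "vlm" or uses_vision_tools:
        needed.append(slots.vlm)
    return needed
```

With the defaults, `ollama_models_needed(ModelSlots())` returns only the LLM, which matches the example above where the VLM is never loaded; switching to `ModelSlots(ocr="vlm")` adds the VLM to the list.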

Quick Start (Docker)

# Copy and edit environment variables
cp .env.example .env   # set SECRET_KEY at minimum

# Start all services (PostgreSQL, MinIO, backend, frontend, Ollama)
docker compose up

# Services:
#   Frontend  → http://localhost:3000
#   Backend   → http://localhost:8000/docs  (OpenAPI)
#   MinIO     → http://localhost:9001       (admin console)
#   Ollama    → http://localhost:11434

# Pull models
ollama pull qwen3:4b
ollama pull qwen3-vl:8b
ollama pull nomic-embed-text
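
After pulling, it can be useful to confirm that the models are actually present. The sketch below queries Ollama's `GET /api/tags` endpoint, which lists locally available models, and reports which of the three models named above are missing. The helper names `missing_models` and `fetch_tags` are hypothetical, and the base URL assumes the Ollama port shown in the service list.

```python
import json
import urllib.request

# Models this README asks you to pull.
REQUIRED = ("qwen3:4b", "qwen3-vl:8b", "nomic-embed-text")


def _normalize(name: str) -> str:
    # Ollama reports untagged pulls with an explicit ":latest" tag.
    return name if ":" in name else f"{name}:latest"


def missing_models(tags_payload: dict, required=REQUIRED) -> list[str]:
    """Compare the /api/tags payload with the required model names."""
    installed = {_normalize(m["name"]) for m in tags_payload.get("models", [])}
    return [name for name in required if _normalize(name) not in installed]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of local models from a running Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Calling `missing_models(fetch_tags())` against a running Ollama instance returns an empty list once all three models are available.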

Quick Start (local development)

# Backend
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt -r requirements-backend.txt
uvicorn backend.main:app --reload

# Frontend
cd frontend
npm install
npm run dev   # http://localhost:5173

Architecture

grapholab/
├── core/                    # Shared AI logic (no web framework dependency)
│   ├── ocr.py               # TrOCR + EasyOCR
│   ├── signature.py         # SigNet + Conditional DETR
│   ├── graphology.py        # HOG, LBP, graphological features
│   ├── ner.py               # Named entity recognition
│   ├── writer.py            # Writer identification
│   ├── dating.py            # Document dating
│   ├── pipeline.py          # Full forensic pipeline
│   ├── rag.py               # RAG + Ollama integration
│   ├── compliance.py        # ENFSI compliance checker
│   └── agent.py             # LangChain ReAct agent + document tools
├── backend/                 # FastAPI professional app
│   ├── routers/             # auth, users, projects, analysis, rag, compliance, agent, audit
│   ├── models/              # SQLAlchemy models
│   └── storage/             # MinIO client
├── frontend/                # React + Tailwind CSS + shadcn/ui SPA
│   └── src/
│       ├── pages/           # ProjectsPage, ProjectPage, RagPage, CompliancePage, AgentPage, AdminPage
│       └── components/
├── app/
│   └── grapholab_demo.py    # Gradio demo (preserved, imports from core/)
├── notebooks/               # Jupyter labs (01–08)
├── docker-compose.yml
├── requirements.txt         # Core + Gradio dependencies
└── requirements-backend.txt # FastAPI + PostgreSQL + MinIO dependencies

API

The FastAPI backend auto-generates OpenAPI docs at http://localhost:8000/docs.

Main endpoint groups:

Prefix	Description
`/auth`	Login, refresh, password reset
`/users`	User management
`/projects`	Case CRUD + document upload
`/analysis`	Run AI engines, download PDF report
`/rag`	RAG chatbot, document indexing, model selection
`/compliance`	ENFSI compliance check (SSE stream + PDF export)
`/agent`	LangChain document agent (SSE stream, file attachment, stop)
`/audit`	Immutable activity log
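
The `/compliance` and `/agent` groups stream their output as Server-Sent Events (SSE). The exact routes and payload format are documented in the OpenAPI page rather than here, so the sketch below covers only the generic part: turning the lines of a `text/event-stream` response into complete `data` payloads. `iter_sse_data` is a hypothetical helper name, and fields other than `data` (such as `event` or `id`) are deliberately ignored.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    An event ends at a blank line; multiple data lines within one event are
    joined with newlines. Lines starting with ":" are comments (keep-alives).
    An unterminated event at the end of the stream is discarded.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue
        field, _, value = line.partition(":")
        if field == "data":
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
```

Any HTTP client that can iterate over response lines can feed this generator, for example one that sends the JWT from `/auth` in an `Authorization` header; that header usage is an assumption about the backend, not something stated in this README.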

Gradio Demo

Interactive single-user demo, also available on Hugging Face Spaces.

Run locally

pip install -r requirements.txt
python app/grapholab_demo.py
# Open http://localhost:7860

Tabs

Tab	Name	Description
1	OCR Manoscritto	Handwritten text transcription
2	Verifica Firma	Signature verification (SigNet)
3	Rilevamento Firma	Signature detection (Conditional DETR)
4	Riconoscimento Entità	Named entity recognition
5	Identificazione Scrittore	Writer identification (HOG + SVM)
6	Analisi Grafologica	Graphological feature analysis
7	Perizia Forense Automatica	Full forensic pipeline + LLM synthesis
8	Datazione Documenti	Document dating
9	Consulente Forense IA	RAG chatbot (local Ollama LLM)

Jupyter Notebooks

Lab	Notebook	AI Technique
01	Introduction	Conceptual overview
02	Handwritten OCR	TrOCR
03	Signature Verification	SigNet (Siamese network)
04	Signature Detection	Conditional DETR
05	Writer Identification	HOG + SVM
06	Graphological Analysis	OpenCV
07	Named Entity Recognition	Token classification
08	dots.ocr VLM	Vision-Language Model (1.7B)

jupyter lab notebooks/

Requirements

Python 3.11–3.13
NVIDIA GPU recommended (CUDA 12.x) — CPU fallback available
Ollama for LLM features (pipeline synthesis, RAG, compliance checker)
- Recommended model: qwen3:8b (fits in 8 GB of VRAM, e.g. an RTX 4070 Laptop GPU)
Docker + nvidia-container-toolkit for containerized GPU inference
LangChain + langchain-ollama for the Document Agent

Key Models & Resources

Use case	Resource
Handwritten OCR	microsoft/trocr-base-handwritten
Signature Detection	tech4humans/conditional-detr-50-signature-detector (Apache 2.0)
Signature Verification	luizgh/sigver
NER	Babelscape/wikineural-multilingual-ner
Embeddings (RAG)	nomic-embed-text via Ollama
LLM inference	Ollama — local, no data sent online
ENFSI standard	BPM-FHX-01 Ed.03 — Best Practice Manual for Forensic Examination of Handwriting
Document OCR (agent)	PaddleOCR — layout + text detection

License

Apache License 2.0 — see LICENSE.