title: GraphoLab — AI for Forensic Graphology
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.25.0
app_file: app/grapholab_demo.py
pinned: false
license: apache-2.0
short_description: AI-powered forensic graphology platform
GraphoLab — Forensic Graphology Laboratory
An AI-powered platform for forensic graphology: scientific examination of handwriting and signatures for legal purposes.
GraphoLab ships in two forms:
| Mode | Description |
|---|---|
| Professional app | FastAPI backend + React frontend — multi-user, JWT auth, PostgreSQL, MinIO, audit log |
| Gradio demo | Single-user interactive demo, runs locally or on Hugging Face Spaces |
AI Capabilities
| Engine | Technique | Model |
|---|---|---|
| Handwritten OCR (HTR) | Transformer OCR | microsoft/trocr-base-handwritten + EasyOCR |
| Signature Verification | Siamese Network | SigNet (luizgh/sigver) |
| Signature Detection | Object Detection | Conditional DETR (tech4humans, Apache 2.0) |
| Named Entity Recognition | Token Classification | Babelscape/wikineural-multilingual-ner |
| Writer Identification | HOG + SVM | scikit-learn |
| Graphological Analysis | Image Processing | OpenCV + scikit-image |
| Document Dating | OCR + dateparser | EasyOCR + multilingual date parsing |
| Full Forensic Pipeline | All engines in sequence | Ollama LLM synthesis |
| RAG / AI Consultant | Retrieval-Augmented Generation | Ollama + nomic-embed-text |
| ENFSI Compliance Checker | LLM structured analysis | Ollama (qwen3:8b recommended) |
| Agente Documentale | LangChain ReAct agent + tools | Ollama + LangChain + PaddleOCR |
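The Document Dating row pairs OCR output with multilingual date parsing. As a rough illustration of that second step only, the sketch below pulls dates out of already-transcribed text using the Python standard library. It is a simplified stand-in, not GraphoLab's implementation: the real engine is listed as using dateparser, and the `find_dates` function, the regular expressions, the month table, and the day-first reading of numeric dates are all assumptions made for this example.

```python
import re
from datetime import date

# Italian and English month names (a small illustrative subset of what a
# multilingual parser such as dateparser would recognise).
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4, "maggio": 5,
    "giugno": 6, "luglio": 7, "agosto": 8, "settembre": 9, "ottobre": 10,
    "novembre": 11, "dicembre": 12,
    "january": 1, "february": 2, "march": 3, "april": 4, "may": 5,
    "june": 6, "july": 7, "august": 8, "september": 9, "october": 10,
    "november": 11, "december": 12,
}

# Numeric dates such as 12/04/1998, read day-first (an assumption).
NUMERIC = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")
# Textual dates such as "3 marzo 1998" or "3 March 1998".
TEXTUAL = re.compile(r"\b(\d{1,2})\s+([^\W\d_]+)\s+(\d{4})\b")


def find_dates(text: str) -> list[date]:
    """Return the distinct valid dates found in OCR text, oldest first."""
    found: set[date] = set()
    for day, month, year in NUMERIC.findall(text):
        try:
            found.add(date(int(year), int(month), int(day)))
        except ValueError:
            pass  # impossible date such as 31/02/2020: skip it
    for day, name, year in TEXTUAL.findall(text):
        month = MONTHS.get(name.lower())
        if month is None:
            continue  # the middle word is not a known month name
        try:
            found.add(date(int(year), month, int(day)))
        except ValueError:
            pass
    return sorted(found)
```

For example, `find_dates("Roma, 3 marzo 1998. Ricevuto il 12/04/1998.")` yields the two dates in chronological order. A production parser would also need to handle two-digit years, relative expressions, and OCR noise, which this sketch ignores.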
Professional App (FastAPI + React)
Features
- Case management: create, manage and archive forensic cases
- Document storage: MinIO S3-compatible storage (on-premise)
- AI analysis: all 8 engines via REST API with streaming SSE
- PDF report generation: forensic report with images and formatted tables
- RAG chatbot: upload PDF/DOCX to build a knowledge base, query with local LLM
- ENFSI Compliance Checker: upload an expert report (perizia) as a PDF → LLM analysis against 20 ENFSI BPM-FHX-01 Ed.03 requirements → structured report with ✅/⚠️/❌ verdicts, suggestions, PDF export
- Agente Documentale (Document Agent): LangChain ReAct agent with document tools (OCR, NER, graphology, dating); upload files, run tool-augmented multi-turn conversations, stop mid-stream
- OCR model selector: switch between EasyOCR, TrOCR, PaddleOCR, VLM from the sidebar at runtime
- Immutable audit log: append-only forensic chain of custody
- JWT authentication: login, refresh, password reset
- Role-based access: admin, examiner, viewer
- Multilingual UI: Italian / English (react-i18next)
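One common way to make an append-only log tamper-evident is to chain entries by hash, so that altering an earlier record invalidates every later one. The sketch below illustrates that idea with the standard library. Whether GraphoLab's audit log works this way is not stated here, so treat it as an assumption; `AuditLog`, `AuditEntry`, and the field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEntry:
    actor: str
    action: str
    prev_hash: str   # hash of the previous entry (or the genesis value)
    entry_hash: str  # hash over this entry's content plus prev_hash


def _digest(actor: str, action: str, prev_hash: str) -> str:
    """SHA-256 over a canonical JSON encoding of the entry content."""
    payload = json.dumps(
        {"actor": actor, "action": action, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def append(self, actor: str, action: str) -> AuditEntry:
        """Add a new entry linked to the current tail of the chain."""
        prev = self._entries[-1].entry_hash if self._entries else self.GENESIS
        entry = AuditEntry(actor, action, prev, _digest(actor, action, prev))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; False if any entry or link was altered."""
        prev = self.GENESIS
        for entry in self._entries:
            expected = _digest(entry.actor, entry.action, prev)
            if entry.prev_hash != prev or entry.entry_hash != expected:
                return False
            prev = entry.entry_hash
        return True
```

In a database-backed setup the same check could run over rows ordered by insertion, with the storage layer refusing updates and deletes; the in-memory list here only keeps the example short.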
Ollama Model Configuration
GraphoLab separates LLM usage into three independent slots, each configurable from the sidebar at runtime:
| Slot | Default | Used for | GPU impact |
|---|---|---|---|
| LLM Model | qwen3:4b | Agent reasoning, RAG chat, ENFSI compliance, pipeline synthesis | low (~2.5 GB VRAM) |
| VLM Model | qwen3-vl:8b | OCR=vlm transcription, table/figure analysis | high (~5 GB VRAM) |
| OCR Model | easyocr | Handwritten text transcription | none (CPU) |
OCR model options: easyocr (CPU, default) · trocr (CPU/GPU, no Ollama) · paddleocr (CPU, no Ollama) · vlm (delegates to VLM Model above).
Example — "transcribe + NER + dating" with OCR=easyocr: EasyOCR (CPU) → spaCy NER (CPU) → dateparser (CPU) → qwen3:4b reasoning. The VLM is never loaded.
# Pull the recommended models
ollama pull qwen3:4b # text/reasoning (~2.5 GB)
ollama pull qwen3-vl:8b # vision/image analysis (~5 GB)
ollama pull nomic-embed-text # RAG embeddings
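The slot separation described above means the VLM only needs to be loaded when the OCR slot is set to vlm. The following sketch models that rule for the transcription path only (it does not cover table/figure analysis, which the table also assigns to the VLM). `ModelSlots` and `ollama_models_needed` are hypothetical names for illustration, not GraphoLab's API; the defaults and option names are taken from the table and the options list.

```python
from dataclasses import dataclass

# OCR options as listed above; only "vlm" delegates to an Ollama model.
OCR_OPTIONS = {"easyocr", "trocr", "paddleocr", "vlm"}


@dataclass(frozen=True)
class ModelSlots:
    llm: str = "qwen3:4b"      # text/reasoning slot
    vlm: str = "qwen3-vl:8b"   # vision slot
    ocr: str = "easyocr"       # OCR slot


def ollama_models_needed(slots: ModelSlots, needs_reasoning: bool = True) -> set[str]:
    """Return the Ollama models a transcription request would load."""
    if slots.ocr not in OCR_OPTIONS:
        raise ValueError(f"unknown OCR option: {slots.ocr!r}")
    needed: set[str] = set()
    if needs_reasoning:
        needed.add(slots.llm)   # agent reasoning, RAG, synthesis
    if slots.ocr == "vlm":
        needed.add(slots.vlm)   # OCR delegated to the vision model
    return needed
```

With the defaults, `ollama_models_needed(ModelSlots())` contains only the LLM, which matches the "transcribe + NER + dating" example where the VLM is never loaded.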
Quick Start (Docker)
# Copy and edit environment variables
cp .env.example .env # set SECRET_KEY at minimum
# Start all services (PostgreSQL, MinIO, backend, frontend, Ollama)
docker compose up
# Services:
# Frontend → http://localhost:3000
# Backend → http://localhost:8000/docs (OpenAPI)
# MinIO → http://localhost:9001 (admin console)
# Ollama → http://localhost:11434
# Pull models
ollama pull qwen3:4b
ollama pull qwen3-vl:8b
ollama pull nomic-embed-text
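After pulling, it can be useful to confirm that Ollama actually lists the three models. The sketch below queries Ollama's `/api/tags` endpoint, which returns the locally available models, and reports any that are missing. The base URL assumes the default port shown in the service list; `fetch_tags` and `missing_models` are hypothetical helper names, not part of GraphoLab.

```python
import json
import urllib.request
from typing import Iterable

REQUIRED = ("qwen3:4b", "qwen3-vl:8b", "nomic-embed-text")


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally available models from a running Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def missing_models(tags_payload: dict, required: Iterable[str] = REQUIRED) -> list[str]:
    """Return the required models that are absent from an /api/tags payload."""
    installed = {model["name"] for model in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # A model pulled without an explicit tag is listed as "<name>:latest".
        tagged = name if ":" in name else f"{name}:latest"
        if tagged not in installed:
            missing.append(name)
    return missing
```

With the stack running, `missing_models(fetch_tags())` should return an empty list once all three pulls have completed.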
Quick Start (local development)
# Backend
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt -r requirements-backend.txt
uvicorn backend.main:app --reload
# Frontend
cd frontend
npm install
npm run dev # http://localhost:5173
Architecture
grapholab/
├── core/ # Shared AI logic (no web framework dependency)
│ ├── ocr.py # TrOCR + EasyOCR
│ ├── signature.py # SigNet + Conditional DETR
│ ├── graphology.py # HOG, LBP, graphological features
│ ├── ner.py # Named entity recognition
│ ├── writer.py # Writer identification
│ ├── dating.py # Document dating
│ ├── pipeline.py # Full forensic pipeline
│ ├── rag.py # RAG + Ollama integration
│ ├── compliance.py # ENFSI compliance checker
│ └── agent.py # LangChain ReAct agent + document tools
├── backend/ # FastAPI professional app
│ ├── routers/ # auth, users, projects, analysis, rag, compliance, agent, audit
│ ├── models/ # SQLAlchemy models
│ └── storage/ # MinIO client
├── frontend/ # React + Tailwind CSS + shadcn/ui SPA
│ └── src/
│ ├── pages/ # ProjectsPage, ProjectPage, RagPage, CompliancePage, AgentPage, AdminPage
│ └── components/
├── app/
│ └── grapholab_demo.py # Gradio demo (preserved, imports from core/)
├── notebooks/ # Jupyter labs (01–08)
├── docker-compose.yml
├── requirements.txt # Core + Gradio dependencies
└── requirements-backend.txt # FastAPI + PostgreSQL + MinIO dependencies
API
The FastAPI backend auto-generates OpenAPI docs at http://localhost:8000/docs.
Main endpoint groups:
| Prefix | Description |
|---|---|
| /auth | Login, refresh, password reset |
| /users | User management |
| /projects | Case CRUD + document upload |
| /analysis | Run AI engines, download PDF report |
| /rag | RAG chatbot, document indexing, model selection |
| /compliance | ENFSI compliance check (SSE stream + PDF export) |
| /agent | LangChain document agent (SSE stream, file attachment, stop) |
| /audit | Immutable activity log |
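Several endpoint groups stream results as Server-Sent Events (SSE). A client has to reassemble events from the line-oriented stream, which the sketch below does for the `data:` field using only the standard library. It is a minimal parser written for illustration: the exact routes under /compliance and /agent and the payload format are not specified here, and `iter_sse_data` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    `lines` is any iterable of decoded text lines, for example the lines
    of a streaming HTTP response body.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            buffer.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An event not terminated by a blank line is dropped.
```

Multiple `data:` lines within one event are joined with newlines, so a server can split a long payload across lines without the client losing structure.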
Gradio Demo
Interactive single-user demo, also available on Hugging Face Spaces.
Run locally
pip install -r requirements.txt
python app/grapholab_demo.py
# Open http://localhost:7860
Tabs
| Tab | Name | Description |
|---|---|---|
| 1 | OCR Manoscritto | Handwritten text transcription |
| 2 | Verifica Firma | Signature verification (SigNet) |
| 3 | Rilevamento Firma | Signature detection (Conditional DETR) |
| 4 | Riconoscimento Entità | Named entity recognition |
| 5 | Identificazione Scrittore | Writer identification (HOG + SVM) |
| 6 | Analisi Grafologica | Graphological feature analysis |
| 7 | Perizia Forense Automatica | Full forensic pipeline + LLM synthesis |
| 8 | Datazione Documenti | Document dating |
| 9 | Consulente Forense IA | RAG chatbot (local Ollama LLM) |
Jupyter Notebooks
| Lab | Notebook | AI Technique |
|---|---|---|
| 01 | Introduction | Conceptual overview |
| 02 | Handwritten OCR | TrOCR |
| 03 | Signature Verification | SigNet (Siamese network) |
| 04 | Signature Detection | Conditional DETR |
| 05 | Writer Identification | HOG + SVM |
| 06 | Graphological Analysis | OpenCV |
| 07 | Named Entity Recognition | Token classification |
| 08 | dots.ocr VLM | Vision-Language Model (1.7B) |
jupyter lab notebooks/
Requirements
- Python 3.11–3.13
- NVIDIA GPU recommended (CUDA 12.x) — CPU fallback available
- Ollama for LLM features (pipeline synthesis, RAG, compliance checker)
  - Recommended model: qwen3:8b (fits in 8 GB VRAM, RTX 4070 Laptop GPU)
- Docker + nvidia-container-toolkit for containerized GPU inference
- LangChain + langchain-ollama for the Document Agent
Key Models & Resources
| Use case | Resource |
|---|---|
| Handwritten OCR | microsoft/trocr-base-handwritten |
| Signature Detection | tech4humans/conditional-detr-50-signature-detector (Apache 2.0) |
| Signature Verification | luizgh/sigver |
| NER | Babelscape/wikineural-multilingual-ner |
| Embeddings (RAG) | nomic-embed-text via Ollama |
| LLM inference | Ollama — local, no data sent online |
| ENFSI standard | BPM-FHX-01 Ed.03 — Best Practice Manual for Forensic Examination of Handwriting |
| Document OCR (agent) | PaddleOCR — layout + text detection |
License
Apache License 2.0 — see LICENSE.