Fabio Antonini

title: GraphoLab - AI for Forensic Graphology
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.25.0
app_file: app/grapholab_demo.py
pinned: false
license: apache-2.0
short_description: AI-powered forensic graphology platform

GraphoLab — Forensic Graphology Laboratory

An AI-powered platform for forensic graphology: scientific examination of handwriting and signatures for legal purposes.

GraphoLab ships in two forms:

| Mode | Description |
|---|---|
| Professional app | FastAPI backend + React frontend: multi-user, JWT auth, PostgreSQL, MinIO, audit log |
| Gradio demo | Single-user interactive demo, runs locally or on Hugging Face Spaces |

AI Capabilities

| Engine | Technique | Model |
|---|---|---|
| Handwritten OCR (HTR) | Transformer OCR | microsoft/trocr-base-handwritten + EasyOCR |
| Signature Verification | Siamese Network | SigNet (luizgh/sigver) |
| Signature Detection | Object Detection | Conditional DETR (tech4humans, Apache 2.0) |
| Named Entity Recognition | Token Classification | Babelscape/wikineural-multilingual-ner |
| Writer Identification | HOG + SVM | scikit-learn |
| Graphological Analysis | Image Processing | OpenCV + scikit-image |
| Document Dating | OCR + dateparser | EasyOCR + multilingual date parsing |
| Full Forensic Pipeline | All engines in sequence | Ollama LLM synthesis |
| RAG / AI Consultant | Retrieval-Augmented Generation | Ollama + nomic-embed-text |
| ENFSI Compliance Checker | LLM structured analysis | Ollama (qwen3:8b recommended) |
| Document Agent (Agente Documentale) | LangChain ReAct agent + tools | Ollama + LangChain + PaddleOCR |

Professional App (FastAPI + React)

Features

  • Case management: create, manage and archive forensic cases
  • Document storage: MinIO S3-compatible storage (on-premise)
  • AI analysis: all 8 engines via REST API with streaming SSE
  • PDF report generation: forensic report with images and formatted tables
  • RAG chatbot: upload PDF/DOCX to build a knowledge base, query with local LLM
  • ENFSI Compliance Checker: upload an expert report (perizia) as PDF → LLM analysis against 20 ENFSI BPM-FHX-01 Ed.03 requirements → structured report with ✅/⚠️/❌ verdicts, suggestions, and PDF export
  • Document Agent (Agente Documentale): LangChain ReAct agent with document tools (OCR, NER, graphology, dating); upload files, run tool-augmented multi-turn conversations, and stop generation mid-stream
  • OCR model selector: switch between EasyOCR, TrOCR, PaddleOCR, VLM from the sidebar at runtime
  • Immutable audit log: append-only forensic chain of custody
  • JWT authentication: login, refresh, password reset
  • Role-based access: admin, examiner, viewer
  • Multilingual UI: Italian / English (react-i18next)
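
The README does not say how the audit log is made append-only. One common way to make tampering detectable is a hash chain, where every entry stores the hash of the entry before it. The sketch below illustrates that idea only; the `AuditLog` class and its field names (`user`, `action`, `prev_hash`) are hypothetical and are not taken from GraphoLab's code.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditLog:
    """Hypothetical append-only log: each entry commits to the previous one."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Serialize with sorted keys so the hash is deterministic.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, user: str, action: str) -> dict:
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {
            "user": user,
            "action": action,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": self._digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that the chain links are intact."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append("examiner1", "upload_document")
log.append("examiner1", "run_analysis")
print(log.verify())  # True

log._entries[0]["action"] = "tampered"  # simulate an in-place edit
print(log.verify())  # False
```

A production system would also enforce append-only behaviour at the database level (for example by revoking UPDATE and DELETE permissions); the hash chain only makes later edits detectable.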

Ollama Model Configuration

GraphoLab separates model selection into three independent slots (LLM, VLM, and OCR), each configurable from the sidebar at runtime:

| Slot | Default | Used for | GPU impact |
|---|---|---|---|
| LLM Model | qwen3:4b | Agent reasoning, RAG chat, ENFSI compliance, pipeline synthesis | low (~2.5 GB VRAM) |
| VLM Model | qwen3-vl:8b | OCR=vlm transcription, table/figure analysis | high (~5 GB VRAM) |
| OCR Model | easyocr | Handwritten text transcription | none (CPU) |

OCR model options: easyocr (CPU, default) · trocr (CPU/GPU, no Ollama) · paddleocr (CPU, no Ollama) · vlm (delegates to VLM Model above).

Example — "transcribe + NER + dating" with OCR=easyocr: EasyOCR (CPU) → spaCy NER (CPU) → dateparser (CPU) → qwen3:4b reasoning. The VLM is never loaded.
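
The routing in the example above can be sketched as a small lookup. This is an illustration under stated assumptions, not GraphoLab's implementation: `SlotConfig` and `ollama_models_needed` are hypothetical names, and the sketch models only the OCR path (it ignores table/figure analysis, which the table above also assigns to the VLM). The defaults and the rule that `vlm` delegates to the VLM slot come from the table and option list above.

```python
from dataclasses import dataclass

# OCR options listed in the README; only "vlm" goes through Ollama.
OCR_OPTIONS = {"easyocr", "trocr", "paddleocr", "vlm"}


@dataclass(frozen=True)
class SlotConfig:
    """Hypothetical container for the three sidebar slots."""
    llm_model: str = "qwen3:4b"
    vlm_model: str = "qwen3-vl:8b"
    ocr_model: str = "easyocr"


def ollama_models_needed(cfg: SlotConfig, needs_reasoning: bool = True) -> list[str]:
    """Return the Ollama models a transcription request would load."""
    if cfg.ocr_model not in OCR_OPTIONS:
        raise ValueError(f"unknown OCR model: {cfg.ocr_model!r}")
    models: list[str] = []
    if cfg.ocr_model == "vlm":
        # OCR=vlm delegates transcription to the VLM slot.
        models.append(cfg.vlm_model)
    if needs_reasoning:
        # Agent reasoning and synthesis use the LLM slot.
        models.append(cfg.llm_model)
    return models


print(ollama_models_needed(SlotConfig()))                 # ['qwen3:4b']
print(ollama_models_needed(SlotConfig(ocr_model="vlm")))  # ['qwen3-vl:8b', 'qwen3:4b']
```

With the default `easyocr` slot the VLM never appears in the result, which matches the example above.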

# Pull the recommended models
ollama pull qwen3:4b       # text/reasoning (~2.5 GB)
ollama pull qwen3-vl:8b    # vision/image analysis (~5 GB)
ollama pull nomic-embed-text  # RAG embeddings

Quick Start (Docker)

# Copy and edit environment variables
cp .env.example .env   # set SECRET_KEY at minimum

# Start all services (PostgreSQL, MinIO, backend, frontend, Ollama)
docker compose up

# Services:
#   Frontend  → http://localhost:3000
#   Backend   → http://localhost:8000/docs  (OpenAPI)
#   MinIO     → http://localhost:9001       (admin console)
#   Ollama    → http://localhost:11434

# Pull models
ollama pull qwen3:4b
ollama pull qwen3-vl:8b
ollama pull nomic-embed-text
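
The Docker quick start above asks for a `SECRET_KEY` in `.env`. The README does not state a required format, so the sketch below assumes that any long random string is acceptable and uses Python's standard `secrets` module to produce one. The helper name `generate_secret_key` is hypothetical.

```python
import secrets


def generate_secret_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string built from `num_bytes` random bytes."""
    return secrets.token_urlsafe(num_bytes)


# Print a line that can be pasted into .env
print(f"SECRET_KEY={generate_secret_key()}")
```

Generate the key once and keep it stable. If it is used to sign the JWTs, as is typical, changing it would invalidate tokens that were already issued.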

Quick Start (local development)

# Backend
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt -r requirements-backend.txt
uvicorn backend.main:app --reload

# Frontend
cd frontend
npm install
npm run dev   # http://localhost:5173

Architecture

grapholab/
├── core/                    # Shared AI logic (no web framework dependency)
│   ├── ocr.py               # TrOCR + EasyOCR
│   ├── signature.py         # SigNet + Conditional DETR
│   ├── graphology.py        # HOG, LBP, graphological features
│   ├── ner.py               # Named entity recognition
│   ├── writer.py            # Writer identification
│   ├── dating.py            # Document dating
│   ├── pipeline.py          # Full forensic pipeline
│   ├── rag.py               # RAG + Ollama integration
│   ├── compliance.py        # ENFSI compliance checker
│   └── agent.py             # LangChain ReAct agent + document tools
├── backend/                 # FastAPI professional app
│   ├── routers/             # auth, users, projects, analysis, rag, compliance, agent, audit
│   ├── models/              # SQLAlchemy models
│   └── storage/             # MinIO client
├── frontend/                # React + Tailwind CSS + shadcn/ui SPA
│   └── src/
│       ├── pages/           # ProjectsPage, ProjectPage, RagPage, CompliancePage, AgentPage, AdminPage
│       └── components/
├── app/
│   └── grapholab_demo.py    # Gradio demo (preserved, imports from core/)
├── notebooks/               # Jupyter labs (01–08)
├── docker-compose.yml
├── requirements.txt         # Core + Gradio dependencies
└── requirements-backend.txt # FastAPI + PostgreSQL + MinIO dependencies

API

The FastAPI backend auto-generates OpenAPI docs at http://localhost:8000/docs.

Main endpoint groups:

| Prefix | Description |
|---|---|
| /auth | Login, refresh, password reset |
| /users | User management |
| /projects | Case CRUD + document upload |
| /analysis | Run AI engines, download PDF report |
| /rag | RAG chatbot, document indexing, model selection |
| /compliance | ENFSI compliance check (SSE stream + PDF export) |
| /agent | LangChain document agent (SSE stream, file attachment, stop) |
| /audit | Immutable activity log |
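
Several endpoint groups stream their results as Server-Sent Events (SSE). As a minimal sketch of what a client has to do with such a stream, the function below extracts the `data:` payloads from raw SSE lines. It handles only the `data` field and comment lines, and the sample payloads are made up; the actual event format of the GraphoLab endpoints is documented in the OpenAPI page, not here.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event in an SSE stream."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event not terminated by a blank line is discarded.


# Made-up stream for illustration only
sample = [
    "data: first chunk\n",
    "data: continues here\n",
    "\n",
    ": keep-alive\n",
    "data: second event\n",
    "\n",
]
for payload in iter_sse_data(sample):
    print(repr(payload))
```

In practice the `lines` argument would come from a streaming HTTP response, read line by line with whichever HTTP client the caller prefers.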

Gradio Demo

Interactive single-user demo, also available on Hugging Face Spaces.

Run locally

pip install -r requirements.txt
python app/grapholab_demo.py
# Open http://localhost:7860

Tabs

| Tab | Name | Description |
|---|---|---|
| 1 | OCR Manoscritto | Handwritten text transcription |
| 2 | Verifica Firma | Signature verification (SigNet) |
| 3 | Rilevamento Firma | Signature detection (Conditional DETR) |
| 4 | Riconoscimento Entità | Named entity recognition |
| 5 | Identificazione Scrittore | Writer identification (HOG + SVM) |
| 6 | Analisi Grafologica | Graphological feature analysis |
| 7 | Perizia Forense Automatica | Full forensic pipeline + LLM synthesis |
| 8 | Datazione Documenti | Document dating |
| 9 | Consulente Forense IA | RAG chatbot (local Ollama LLM) |

Jupyter Notebooks

| Lab | Notebook | AI Technique |
|---|---|---|
| 01 | Introduction | Conceptual overview |
| 02 | Handwritten OCR | TrOCR |
| 03 | Signature Verification | SigNet (Siamese network) |
| 04 | Signature Detection | Conditional DETR |
| 05 | Writer Identification | HOG + SVM |
| 06 | Graphological Analysis | OpenCV |
| 07 | Named Entity Recognition | Token classification |
| 08 | dots.ocr VLM | Vision-Language Model (1.7B) |

jupyter lab notebooks/

Requirements

  • Python 3.11–3.13
  • NVIDIA GPU recommended (CUDA 12.x) — CPU fallback available
  • Ollama for LLM features (pipeline synthesis, RAG, compliance checker)
    • Recommended model: qwen3:8b (fits in 8 GB of VRAM, e.g. an RTX 4070 Laptop GPU)
  • Docker + nvidia-container-toolkit for containerized GPU inference
  • LangChain + langchain-ollama for the Document Agent

Key Models & Resources

| Use case | Resource |
|---|---|
| Handwritten OCR | microsoft/trocr-base-handwritten |
| Signature Detection | tech4humans/conditional-detr-50-signature-detector (Apache 2.0) |
| Signature Verification | luizgh/sigver |
| NER | Babelscape/wikineural-multilingual-ner |
| Embeddings (RAG) | nomic-embed-text via Ollama |
| LLM inference | Ollama (local, no data sent online) |
| ENFSI standard | BPM-FHX-01 Ed.03: Best Practice Manual for Forensic Examination of Handwriting |
| Document OCR (agent) | PaddleOCR (layout + text detection) |

License

Apache License 2.0 — see LICENSE.