---
title: GraphoLab - AI for Forensic Graphology
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.25.0"
app_file: app/grapholab_demo.py
pinned: false
license: apache-2.0
short_description: AI-powered forensic graphology platform
---

# GraphoLab - Forensic Graphology Laboratory

An AI-powered platform for **forensic graphology**: the scientific examination of handwriting and signatures for legal purposes.

GraphoLab ships in two forms:

| Mode | Description |
|------|-------------|
| **Professional app** | FastAPI backend + React frontend: multi-user, JWT auth, PostgreSQL, MinIO, audit log |
| **Gradio demo** | Single-user interactive demo; runs locally or on Hugging Face Spaces |

---

## AI Capabilities

| Engine | Technique | Model |
|--------|-----------|-------|
| Handwritten OCR (HTR) | Transformer OCR | `microsoft/trocr-base-handwritten` + EasyOCR |
| Signature Verification | Siamese Network | SigNet (luizgh/sigver) |
| Signature Detection | Object Detection | Conditional DETR (tech4humans, Apache 2.0) |
| Named Entity Recognition | Token Classification | `Babelscape/wikineural-multilingual-ner` |
| Writer Identification | HOG + SVM | scikit-learn |
| Graphological Analysis | Image Processing | OpenCV + scikit-image |
| Document Dating | OCR + dateparser | EasyOCR + multilingual date parsing |
| Full Forensic Pipeline | All engines in sequence | Ollama LLM synthesis |
| RAG / AI Consultant | Retrieval-Augmented Generation | Ollama + nomic-embed-text |
| **ENFSI Compliance Checker** | LLM structured analysis | Ollama (qwen3:8b recommended) |
| **Agente Documentale** (Document Agent) | LangChain ReAct agent + tools | Ollama + LangChain + PaddleOCR |

---

## Professional App (FastAPI + React)

### Features

- **Case management**: create, manage, and archive forensic cases
- **Document storage**: MinIO S3-compatible storage (on-premise)
- **AI analysis**: all 8 engines via REST API with streaming SSE
- **PDF report generation**: forensic report with images and formatted tables
- **RAG chatbot**: upload PDF/DOCX files to build a knowledge base, then query it with a local LLM
- **ENFSI Compliance Checker**: upload an expert report (perizia) PDF → LLM analysis against the 20 requirements of ENFSI BPM-FHX-01 Ed.03 → structured report with ✅/⚠️/❌ verdicts, suggestions, and PDF export
- **Agente Documentale** (Document Agent): LangChain ReAct agent with document tools (OCR, NER, graphology, dating); upload files, run tool-augmented multi-turn conversations, and stop generation mid-stream
- **OCR model selector**: switch between EasyOCR, TrOCR, PaddleOCR, and VLM from the sidebar at runtime
- **Immutable audit log**: append-only forensic chain of custody
- **JWT authentication**: login, token refresh, password reset
- **Role-based access**: admin, examiner, viewer
- **Multilingual UI**: Italian / English (react-i18next)

### Ollama Model Configuration

GraphoLab separates model usage into three independent slots, each configurable from the sidebar at runtime:

| Slot | Default | Used for | GPU impact |
| ---- | ------- | -------- | ---------- |
| **LLM Model** | `qwen3:4b` | Agent reasoning, RAG chat, ENFSI compliance, pipeline synthesis | Low (~2.5 GB VRAM) |
| **VLM Model** | `qwen3-vl:8b` | OCR transcription when the OCR Model is `vlm`; table/figure analysis | High (~5 GB VRAM) |
| **OCR Model** | `easyocr` | Handwritten text transcription | None (CPU) |

OCR model options: `easyocr` (CPU, default) · `trocr` (CPU/GPU, no Ollama) · `paddleocr` (CPU, no Ollama) · `vlm` (delegates to the VLM Model above).

**Example: "transcribe + NER + dating" with OCR = `easyocr`.** EasyOCR (CPU) → spaCy NER (CPU) → dateparser (CPU) → qwen3:4b reasoning. The VLM is never loaded.
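The slot rules above can be illustrated with a small Python sketch. It is not GraphoLab's actual code. `ModelSlots` and `ollama_models_for_task` are hypothetical names, and the routing rule they encode is inferred from the table and example above. That rule is: the LLM slot is always used, and the VLM is loaded only for image analysis or when the OCR slot is `vlm`.

```python
# Hypothetical sketch of the three model slots; not GraphoLab's actual implementation.
from dataclasses import dataclass

# OCR back-ends that run without Ollama, per the option list above.
LOCAL_OCR_BACKENDS = {"easyocr", "trocr", "paddleocr"}
VALID_OCR_BACKENDS = LOCAL_OCR_BACKENDS | {"vlm"}


@dataclass(frozen=True)
class ModelSlots:
    llm: str = "qwen3:4b"     # agent reasoning, RAG chat, ENFSI compliance, synthesis
    vlm: str = "qwen3-vl:8b"  # image analysis, and OCR when ocr == "vlm"
    ocr: str = "easyocr"      # handwritten text transcription back-end

    def __post_init__(self) -> None:
        # Reject OCR back-ends outside the documented options.
        if self.ocr not in VALID_OCR_BACKENDS:
            raise ValueError(f"unknown OCR backend: {self.ocr!r}")


def ollama_models_for_task(slots: ModelSlots, needs_ocr: bool, needs_vision: bool) -> set[str]:
    """Return the Ollama models a task would load under the given slots."""
    models = {slots.llm}  # the LLM slot always performs the reasoning step
    # The VLM is pulled in only for image analysis or when OCR is delegated to it.
    if needs_vision or (needs_ocr and slots.ocr == "vlm"):
        models.add(slots.vlm)
    return models


if __name__ == "__main__":
    # "transcribe + NER + dating" with the default easyocr slot: no VLM.
    print(ollama_models_for_task(ModelSlots(), needs_ocr=True, needs_vision=False))
    # → {'qwen3:4b'}
    # Same task with OCR delegated to the VLM.
    print(sorted(ollama_models_for_task(ModelSlots(ocr="vlm"), needs_ocr=True, needs_vision=False)))
    # → ['qwen3-vl:8b', 'qwen3:4b']
```

Keeping the OCR slot on a local back-end means a transcription-only workflow needs just the text LLM in VRAM. That is the saving the example describes.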
```bash
# Pull the recommended models
ollama pull qwen3:4b          # text/reasoning (~2.5 GB)
ollama pull qwen3-vl:8b       # vision/image analysis (~5 GB)
ollama pull nomic-embed-text  # RAG embeddings
```

### Quick Start (Docker)

```bash
# Copy the environment template and edit it (set SECRET_KEY at minimum)
cp .env.example .env

# Start all services (PostgreSQL, MinIO, backend, frontend, Ollama)
docker compose up

# Services:
#   Frontend → http://localhost:3000
#   Backend  → http://localhost:8000/docs (OpenAPI)
#   MinIO    → http://localhost:9001 (admin console)
#   Ollama   → http://localhost:11434

# Pull models (from a second terminal, since `docker compose up` runs in the foreground)
ollama pull qwen3:4b
ollama pull qwen3-vl:8b
ollama pull nomic-embed-text
```

### Quick Start (local development)

```bash
# Backend
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt -r requirements-backend.txt
uvicorn backend.main:app --reload

# Frontend
cd frontend
npm install
npm run dev   # http://localhost:5173
```

### Architecture

```
grapholab/
├── core/                     # Shared AI logic (no web framework dependency)
│   ├── ocr.py                # TrOCR + EasyOCR
│   ├── signature.py          # SigNet + Conditional DETR
│   ├── graphology.py         # HOG, LBP, graphological features
│   ├── ner.py                # Named entity recognition
│   ├── writer.py             # Writer identification
│   ├── dating.py             # Document dating
│   ├── pipeline.py           # Full forensic pipeline
│   ├── rag.py                # RAG + Ollama integration
│   ├── compliance.py         # ENFSI compliance checker
│   └── agent.py              # LangChain ReAct agent + document tools
├── backend/                  # FastAPI professional app
│   ├── routers/              # auth, users, projects, analysis, rag, compliance, agent, audit
│   ├── models/               # SQLAlchemy models
│   └── storage/              # MinIO client
├── frontend/                 # React + Tailwind CSS + shadcn/ui SPA
│   └── src/
│       ├── pages/            # ProjectsPage, ProjectPage, RagPage, CompliancePage, AgentPage, AdminPage
│       └── components/
├── app/
│   └── grapholab_demo.py     # Gradio demo (preserved, imports from core/)
├── notebooks/                # Jupyter labs (01–08)
├── docker-compose.yml
├── requirements.txt          # Core + Gradio dependencies
└── requirements-backend.txt  # FastAPI + PostgreSQL + MinIO dependencies
```

### API

The FastAPI backend auto-generates OpenAPI docs at `http://localhost:8000/docs`. Main endpoint groups:

| Prefix | Description |
|--------|-------------|
| `/auth` | Login, token refresh, password reset |
| `/users` | User management |
| `/projects` | Case CRUD + document upload |
| `/analysis` | Run AI engines, download PDF report |
| `/rag` | RAG chatbot, document indexing, model selection |
| `/compliance` | ENFSI compliance check (SSE stream + PDF export) |
| `/agent` | LangChain document agent (SSE stream, file attachment, stop) |
| `/audit` | Immutable activity log |

---

## Gradio Demo

Interactive single-user demo, also available on [Hugging Face Spaces](https://huggingface.co/spaces/fabioantonini/grapholab).
### Run locally

```bash
pip install -r requirements.txt
python app/grapholab_demo.py
# Open http://localhost:7860
```

### Tabs

| Tab | Name | Description |
|-----|------|-------------|
| 1 | OCR Manoscritto | Handwritten text transcription |
| 2 | Verifica Firma | Signature verification (SigNet) |
| 3 | Rilevamento Firma | Signature detection (Conditional DETR) |
| 4 | Riconoscimento Entità | Named entity recognition |
| 5 | Identificazione Scrittore | Writer identification (HOG + SVM) |
| 6 | Analisi Grafologica | Graphological feature analysis |
| 7 | Perizia Forense Automatica | Full forensic pipeline + LLM synthesis |
| 8 | Datazione Documenti | Document dating |
| 9 | Consulente Forense IA | RAG chatbot (local Ollama LLM) |

---

## Jupyter Notebooks

| Lab | Notebook | AI Technique |
|-----|----------|--------------|
| 01 | Introduction | Conceptual overview |
| 02 | Handwritten OCR | TrOCR |
| 03 | Signature Verification | SigNet (Siamese network) |
| 04 | Signature Detection | Conditional DETR |
| 05 | Writer Identification | HOG + SVM |
| 06 | Graphological Analysis | OpenCV |
| 07 | Named Entity Recognition | Token classification |
| 08 | dots.ocr VLM | Vision-Language Model (1.7B) |

```bash
jupyter lab notebooks/
```

---

## Requirements

- Python 3.11–3.13
- NVIDIA GPU recommended (CUDA 12.x); CPU fallback available
- [Ollama](https://ollama.com) for LLM features (pipeline synthesis, RAG, compliance checker)
  - Recommended model: `qwen3:8b` (fits in 8 GB VRAM; RTX 4070 Laptop GPU)
- Docker + nvidia-container-toolkit for containerized GPU inference
- [LangChain](https://python.langchain.com) + `langchain-ollama` for the Document Agent

---

## Key Models & Resources

| Use case | Resource |
|----------|----------|
| Handwritten OCR | [microsoft/trocr-base-handwritten](https://huggingface.co/microsoft/trocr-base-handwritten) |
| Signature Detection | [tech4humans/conditional-detr-50-signature-detector](https://huggingface.co/tech4humans/conditional-detr-50-signature-detector) (Apache 2.0) |
| Signature Verification | [luizgh/sigver](https://github.com/luizgh/sigver) |
| NER | [Babelscape/wikineural-multilingual-ner](https://huggingface.co/Babelscape/wikineural-multilingual-ner) |
| Embeddings (RAG) | [nomic-embed-text](https://ollama.com/library/nomic-embed-text) via Ollama |
| LLM inference | [Ollama](https://ollama.com): runs locally, no data sent online |
| ENFSI standard | BPM-FHX-01 Ed.03: Best Practice Manual for the Forensic Examination of Handwriting |
| Document OCR (agent) | [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR): layout + text detection |

---

## License

Apache License 2.0. See [LICENSE](LICENSE).