# GraphoLab: When Artificial Intelligence Meets Forensic Graphology
<!--
HOW TO PUBLISH ON LINKEDIN:
1. Open https://markdowntolinkedin.com
2. Paste the content of this file (from the title onwards)
3. Copy the formatted text
4. On LinkedIn → "Write an article" → paste the text
5. Upload docs/linkedin_cover.png as the cover image (1920×1080)
6. Use the H1 title as the LinkedIn article title
-->
---
Picture this: a law firm, a will sitting on the desk, and a forensic graphologist holding a magnifying glass. The question is simple, but the answer can be worth millions — or change the outcome of a criminal trial: **is this signature genuine?**
For decades, answering that question has required years of expertise, hours of painstaking manual work and, inevitably, a degree of subjectivity. Today, artificial intelligence offers new tools to support and accelerate this process, and to make it more rigorous.
I built **GraphoLab** — an open-source collection of eight demonstration labs showing how machine learning and computer vision can be applied to forensic graphology.
---
## The Problem: Limits of Traditional Analysis
Forensic graphology is a serious, well-established discipline. But like any manual process, it has structural limitations:
- **Subjectivity**: two expert examiners can reach different conclusions on the same document
- **Scalability**: finding a specific signature among ten thousand scanned pages takes weeks
- **Reproducibility**: qualitative observations ("the pen pressure seems different") are hard to verify independently
- **Speed**: in complex legal proceedings, the timeline of manual analysis can become a bottleneck
AI does not eliminate these problems — but it significantly reduces them, bringing objective measurements alongside human expertise.
---
## The Solution: GraphoLab
GraphoLab is an open-source project that integrates seven AI technologies into a unified demonstration platform, accessible via browser thanks to a **Gradio** app. Each lab addresses a specific forensic graphology task.
### The Eight Labs
| Lab | Functionality | AI Technology |
|-----|--------------|---------------|
| 01 | Conceptual introduction | — |
| 02 | Handwritten Text Recognition (HTR) | TrOCR / EasyOCR |
| 03 | Signature authenticity verification | SigNet (Siamese Network) |
| 04 | Signature detection in documents | Conditional DETR fine-tuned |
| 05 | Writer identification | HOG + LBP + SVM |
| 06 | Graphological feature analysis | OpenCV + signal processing |
| 07 | Named Entity Recognition (NER) | Multilingual BERT-NER |
| 08 | Advanced OCR for Italian cursive | dots.ocr (1.7B VLM) |
---
## Deep Dive 1: Verifying a Signature Without Knowing the Signer
One of the most frequent questions I get: *"Does SigNet only work with signatures already in its training database?"*
The answer is no — and the reason is architectural.
SigNet uses **metric learning** (a Siamese network with contrastive loss), not a traditional classifier. A classifier learns "who is person X" and cannot generalise to unseen identities. A metric model, by contrast, learns to answer a different question: *"did these two signatures come from the same hand?"*
This question is **identity-agnostic**. SigNet can therefore be applied to any pair of signatures — even from people never seen during training — producing a cosine similarity score that the examiner can use as quantitative evidence.
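To make the training objective concrete, here is a minimal sketch of the textbook contrastive loss on a single pair of embeddings. It assumes Euclidean distance and a hypothetical `margin` value; the exact variant and hyperparameters used by SigNet in GraphoLab are not stated here, so treat the function names and numbers as illustrative only.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_writer, margin=1.0):
    """Textbook contrastive loss for one pair (margin is an assumed value).

    same_writer=True  -> the loss grows with distance (pull the pair together)
    same_writer=False -> the loss is non-zero only when the pair is closer
                         than `margin` (push the pair apart)
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_writer:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy 2-D embeddings; real embeddings have far more dimensions.
anchor = [0.0, 0.0]
other = [3.0, 4.0]  # distance 5 from the anchor

loss_if_same = contrastive_loss(anchor, other, same_writer=True)
loss_if_different = contrastive_loss(anchor, other, same_writer=False, margin=6.0)
```

Nothing in the loss refers to *who* wrote the signatures, only to whether the two belong together. That is why the trained network can score pairs from writers it has never seen.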
**Limitations exist and must be communicated clearly:**
- The training set (GPDS) was collected at a single Spanish university (Universidad de Las Palmas de Gran Canaria): signature styles far from that distribution may yield less reliable scores
- The decision threshold (0.35) was calibrated on CEDAR and may need adjustment in different contexts
- The system is a screening tool, not standalone proof: the final judgement always rests with the qualified examiner
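As a rough illustration of how a score and a threshold combine into a screening result, here is a small sketch. It assumes the embeddings are plain lists of floats, that a higher cosine similarity means "more alike", and that the 0.35 threshold is applied directly to that similarity. These are assumptions for the example, and `screen_pair` is a hypothetical helper, not a GraphoLab function.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embedding has zero norm")
    return dot / (norm_a * norm_b)

def screen_pair(emb_a, emb_b, threshold=0.35):
    """Return the score plus a screening flag; never a final verdict.

    Assumption: scores at or above `threshold` are treated as consistent
    with the same writer. The threshold is a parameter because it may
    need recalibration outside the dataset it was tuned on.
    """
    score = cosine_similarity(emb_a, emb_b)
    flag = "consistent" if score >= threshold else "needs review"
    return {"score": score, "threshold": threshold, "flag": flag}

# Toy embeddings: the same pair flips its flag under a stricter threshold.
default_result = screen_pair([1.0, 1.0], [1.0, 0.0])
strict_result = screen_pair([1.0, 1.0], [1.0, 0.0], threshold=0.8)
```

Returning the score and the threshold together with the flag keeps the output auditable: the examiner can see how close the pair was to the cut-off rather than only a binary label.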
---
## Deep Dive 2: Transcribing Italian Cursive Is an Open Problem
Not all OCR engines are equal — and in the Italian forensic context, the difference can be decisive.
**EasyOCR** (used in the interactive app) runs a CNN + BiLSTM + CTC pipeline: fast (1–3 seconds per image on CPU), but with limited linguistic context. It works well on printed text and regular cursive.
**TrOCR** (Lab 02) is a full Transformer: BEiT visual encoder + RoBERTa decoder. Global linguistic context (self-attention) makes it more accurate on complex cursive, at the cost of 10–20 seconds per image on CPU.
**dots.ocr** (Lab 08) is a 1.7-billion-parameter Vision-Language Model. The LLM component corrects visual ambiguities using sentence-level semantic context, which makes it the most accurate of the three on Italian cursive, at the cost of ~7 GB of RAM and 2–5 minutes per image on CPU.
The right tool depends on the context: EasyOCR for interactive demos, dots.ocr for forensic transcription of a holographic will.
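The trade-off above can be written down as a small selection rule. The sketch below uses only the CPU timings and the RAM figure quoted in this section (taking the upper bound of each range), and ranks the engines by accuracy on cursive as described. The `OcrEngine` class and `pick_engine` function are hypothetical helpers for illustration, and RAM needs for EasyOCR and TrOCR are not stated in the article, so they are left at zero.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OcrEngine:
    name: str
    max_seconds_cpu: float  # upper bound of the per-image range quoted above
    min_ram_gb: float       # 0.0 means "not stated in the article"

# Ordered from most to least accurate on complex cursive, per the text.
ENGINES = [
    OcrEngine("dots.ocr", 300.0, 7.0),
    OcrEngine("TrOCR", 20.0, 0.0),
    OcrEngine("EasyOCR", 3.0, 0.0),
]

def pick_engine(time_budget_s, ram_gb):
    """Return the most accurate engine that fits the budget, or None."""
    for engine in ENGINES:
        fits_time = engine.max_seconds_cpu <= time_budget_s
        fits_ram = engine.min_ram_gb <= ram_gb
        if fits_time and fits_ram:
            return engine.name
    return None

interactive_choice = pick_engine(time_budget_s=5, ram_gb=4)
forensic_choice = pick_engine(time_budget_s=600, ram_gb=16)
```

With a five-second budget the rule falls back to EasyOCR. With ten minutes and enough memory it selects dots.ocr, which matches the interactive-versus-forensic split described above.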
---
## The Interactive Demo: Eight Tabs, All in a Browser
GraphoLab's Gradio app aggregates all capabilities in a single, accessible interface (no coding required — just Docker):
- **Handwritten OCR** — upload an image, get the transcribed text
- **Signature Verification** — upload two signatures, get a similarity score and a genuine/forged indication for the examiner to assess
- **Signature Detection** — upload a multi-page document, automatically extract all signatures
- **Named Entity Recognition** — identify persons, locations and organisations in the text
- **Writer Identification** — attribute authorship of an anonymous handwriting sample
- **Graphological Analysis** — measure slant, spacing and stroke pressure
- **Forensic Pipeline** — full report in a single pass
- **Document Dating** — upload multiple documents, get them sorted chronologically by extracted date
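As an illustration of the idea behind the Document Dating tab, here is a minimal sketch that pulls the first date out of each transcription and sorts the documents by it. It assumes dates are written day-first (dd/mm/yyyy, also with dots or dashes), as is common in Italian documents. The regular expression and both function names are assumptions for this example, not the app's actual implementation.

```python
import re
from datetime import date

# Day-first numeric dates such as 12/03/2021, 5.11.2019 or 01-02-2020.
DATE_PATTERN = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")

def extract_date(text):
    """Return the first valid day-first date found in `text`, or None."""
    for match in DATE_PATTERN.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 31/02/2020: skip impossible dates
    return None

def sort_by_date(documents):
    """Sort {name: transcribed_text} chronologically; undated items go last."""
    dated = [(name, extract_date(text)) for name, text in documents.items()]
    return sorted(dated, key=lambda item: (item[1] is None, item[1] or date.min))

documents = {
    "letter.png": "Roma, 12/03/2021. Caro Luigi ...",
    "will.png": "Milano, 5.11.2019. Io sottoscritto ...",
    "note.png": "Testo senza data",
}
ordered = sort_by_date(documents)
```

Documents with no recognisable date are kept and placed at the end rather than dropped, so the examiner can still see that they were processed.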
---
## Ethics and Limits: AI Does Not Replace the Expert
This point deserves to be stated plainly.
GraphoLab is a support tool, not an oracle. AI models produce scores and measurements — not verdicts. The appropriate model is **human-AI collaboration**: AI handles the quantitative and labour-intensive aspects of analysis, while the expert focuses on interpretation, contextualisation and legal accountability.
In a courtroom, "the model says X" is not evidence. "The expert used this tool to corroborate their analysis, here is how" is a different matter entirely.
Transparency about model limitations — training datasets, decision thresholds, conditions of validity — is an integral part of responsible use of these tools in forensic practice.
---
## Try It Yourself
GraphoLab is fully open source under the Apache 2.0 licence.
**GitHub repository:** https://github.com/fabioantonini/GraphoLab
You will find:
- Eight runnable Jupyter notebooks (Python 3.11/3.12 + PyTorch)
- The Gradio app, launchable locally or via Docker
- Full documentation in Italian and English
- Scripts to generate synthetic test data
```bash
git clone https://github.com/fabioantonini/GraphoLab.git
cd GraphoLab
docker compose up gradio
# Open http://localhost:7860
```
---
I am curious to hear your thoughts — especially from those working in forensic or legal practice. AI in this domain is still largely unexplored, and I believe there are significant opportunities for professionals who want to bring quantitative rigour to expert witness work.
**Forensic examiners, lawyers, notaries, document specialists: what do you expect from tools like this? What is still missing?**
---
*Fabio Antonini — AI Engineer & Researcher*
*GitHub: https://github.com/fabioantonini/GraphoLab*