Spaces:

fabioantonini
/

grapholab

Running

App Files Files Community

grapholab / docs /linkedin_article_EN.md

Fabio Antonini

docs: update all YOLOv8 references to Conditional DETR

e3ef569 about 2 months ago

preview code

raw

history blame contribute delete

7.53 kB

	# GraphoLab: When Artificial Intelligence Meets Forensic Graphology

	<!--
	HOW TO PUBLISH ON LINKEDIN:
	1. Open https://markdowntolinkedin.com
	2. Paste the content of this file (from the title onwards)
	3. Copy the formatted text
	4. On LinkedIn → "Write an article" → paste the text
	5. Upload docs/linkedin_cover.png as the cover image (1920×1080)
	6. Use the H1 title as the LinkedIn article title
	-->

	---

	Picture this: a law firm, a will sitting on the desk, and a forensic graphologist holding a magnifying glass. The question is simple, but the answer can be worth millions — or change the outcome of a criminal trial: is this signature genuine?

	For decades, answering that question has required years of expertise, hours of painstaking manual work and, inevitably, a degree of subjectivity. Today, artificial intelligence offers new tools to support, accelerate and make this process more rigorous.

	I built GraphoLab — an open-source collection of eight demonstration labs showing how machine learning and computer vision can be applied to forensic graphology.

	---

	## The Problem: Limits of Traditional Analysis

	Forensic graphology is a serious, well-established discipline. But like any manual process, it has structural limitations:

	- Subjectivity: two expert examiners can reach different conclusions on the same document
	- Scalability: finding a specific signature among ten thousand scanned pages takes weeks
	- Reproducibility: qualitative observations ("the pen pressure seems different") are hard to verify independently
	- Speed: in complex legal proceedings, the timeline of manual analysis can become a bottleneck

	AI does not eliminate these problems — but it significantly reduces them, bringing objective measurements alongside human expertise.

	---

	## The Solution: GraphoLab

	GraphoLab is an open-source project that integrates seven AI technologies into a unified demonstration platform, accessible via browser thanks to a Gradio app. Each lab addresses a specific forensic graphology task.

	### The Eight Labs

	\| Lab \| Functionality \| AI Technology \|
	\|-----\|--------------\|---------------\|
	\| 01 \| Conceptual introduction \| — \|
	\| 02 \| Handwritten Text Recognition (HTR) \| TrOCR / EasyOCR \|
	\| 03 \| Signature authenticity verification \| SigNet (Siamese Network) \|
	\| 04 \| Signature detection in documents \| Conditional DETR fine-tuned \|
	\| 05 \| Writer identification \| HOG + LBP + SVM \|
	\| 06 \| Graphological feature analysis \| OpenCV + signal processing \|
	\| 07 \| Named Entity Recognition (NER) \| Multilingual BERT-NER \|
	\| 08 \| Advanced OCR for Italian cursive \| dots.ocr (1.7B VLM) \|

	---

	## Deep Dive 1: Verifying a Signature Without Knowing the Signer

	One of the most frequent questions I get: "Does SigNet only work with signatures already in its training database?"

	The answer is no — and the reason is architectural.

	SigNet uses metric learning (a Siamese network with contrastive loss), not a traditional classifier. A classifier learns "who is person X" and cannot generalise to unseen identities. A metric model, by contrast, learns to answer a different question: "did these two signatures come from the same hand?"

	This question is identity-agnostic. SigNet can therefore be applied to any pair of signatures — even from people never seen during training — producing a cosine similarity score that the examiner can use as quantitative evidence.

	Limitations exist and must be communicated clearly:
	- The training set (GPDS) contains primarily Brazilian/Portuguese signatures: styles far from that distribution may yield less reliable scores
	- The decision threshold (0.35) was calibrated on CEDAR and may need adjustment in different contexts
	- The system is a screening tool, not standalone proof: the final judgement always rests with the qualified examiner

	---

	## Deep Dive 2: Transcribing Italian Cursive Is an Open Problem

	Not all OCR engines are equal — and in the Italian forensic context, the difference can be decisive.

	EasyOCR (used in the interactive app) runs a CNN + BiLSTM + CTC pipeline: fast (1–3 seconds per image on CPU), but with limited linguistic context. It works well on printed text and regular cursive.

	TrOCR (Lab 02) is a full Transformer: BEiT visual encoder + RoBERTa decoder. Global linguistic context (self-attention) makes it more accurate on complex cursive, at the cost of 10–20 seconds per image on CPU.

	dots.ocr (Lab 08) is a 1.7-billion-parameter Vision-Language Model. The LLM component corrects visual ambiguities using sentence-level semantic context — delivering the best publicly available accuracy on Italian cursive, requiring ~7 GB RAM and 2–5 minutes per image on CPU.

	The right tool depends on the context: EasyOCR for interactive demos, dots.ocr for forensic transcription of a holographic will.

	---

	## The Interactive Demo: Eight Tabs, All in a Browser

	GraphoLab's Gradio app aggregates all capabilities in a single, accessible interface (no coding required — just Docker):

	- Handwritten OCR — upload an image, get the transcribed text
	- Signature Verification — upload two signatures, get a genuine/forged verdict with a confidence score
	- Signature Detection — upload a multi-page document, automatically extract all signatures
	- Named Entity Recognition — identify persons, locations and organisations in the text
	- Writer Identification — attribute authorship of an anonymous handwriting sample
	- Graphological Analysis — measure slant, spacing and stroke pressure
	- Forensic Pipeline — full report in a single pass
	- Document Dating — upload multiple documents, get them sorted chronologically by extracted date

	---

	## Ethics and Limits: AI Does Not Replace the Expert

	This point deserves to be stated plainly.

	GraphoLab is a support tool, not an oracle. AI models produce scores and measurements — not verdicts. The appropriate model is human-AI collaboration: AI handles the quantitative and labour-intensive aspects of analysis, while the expert focuses on interpretation, contextualisation and legal accountability.

	In a courtroom, "the model says X" is not evidence. "The expert used this tool to corroborate their analysis, here is how" is a different matter entirely.

	Transparency about model limitations — training datasets, decision thresholds, conditions of validity — is an integral part of responsible use of these tools in forensic practice.

	---

	## Try It Yourself

	GraphoLab is fully open source under the Apache 2.0 licence.

	GitHub repository: https://github.com/fabioantonini/GraphoLab

	You will find:
	- Eight runnable Jupyter notebooks (Python 3.11/3.12 + PyTorch)
	- The Gradio app, launchable locally or via Docker
	- Full documentation in Italian and English
	- Scripts to generate synthetic test data

	```bash
	git clone https://github.com/fabioantonini/GraphoLab.git
	cd GraphoLab
	docker compose up gradio
	# Open http://localhost:7860
	```

	---

	I am curious to hear your thoughts — especially from those working in forensic or legal practice. AI in this domain is still largely unexplored, and I believe there are significant opportunities for professionals who want to bring quantitative rigour to expert witness work.

	Forensic examiners, lawyers, notaries, document specialists: what do you expect from tools like this? What is still missing?

	---

	Fabio Antonini — AI Engineer & Researcher
	GitHub: https://github.com/fabioantonini/GraphoLab