fabioantonini/grapholab

Fabio Antonini committed on Mar 25

Commit

839d89c

1 Parent(s): 8210db5

Create 08_dots_ocr_vlm.ipynb

notebooks/08_dots_ocr_vlm.ipynb +597 -0

	@@ -0,0 +1,597 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Lab 08: dots.ocr, OCR with a Vision-Language Model\n",
+    "\n",
+    "> **GraphoLab** | Forensic Graphology Laboratory\n",
+    "\n",
+    "**Model:** `rednote-hilab/dots.ocr` (Hugging Face)  \n",
+    "**Task:** Transcribing handwritten and printed text from document images  \n",
+    "**Forensic use case:** Wills, anonymous letters, historical documents in Italian\n",
+    "\n",
+    "---\n",
+    "\n",
+    "## How dots.ocr works\n",
+    "\n",
+    "dots.ocr is fundamentally different from EasyOCR and TrOCR. Instead of a detector + CRNN\n",
+    "pipeline (EasyOCR) or a line-level encoder-decoder (TrOCR), it uses a\n",
+    "**Vision-Language Model (VLM)** with **1.7 billion parameters**:\n",
+    "\n",
+    "```\n",
+    "EasyOCR:                            dots.ocr:\n",
+    "────────                            ─────────────────────────────\n",
+    "Image                               Image\n",
+    "   ↓                                   ↓\n",
+    "CRAFT (CNN detector)                Vision Encoder (ViT)\n",
+    "   ↓                                   ↓\n",
+    "CRNN (recognizer)                   Visual Tokens\n",
+    "   ↓                                   ↓\n",
+    "Text                                LLM (1.7B params) ← understands context!\n",
+    "                                       ↓\n",
+    "                                    Text\n",
+    "```\n",
+    "\n",
+    "The key advantage: the **LLM component uses linguistic context** to resolve\n",
+    "visual ambiguities. For Italian, this means fewer errors on words with accents,\n",
+    "apostrophes and elisions (e.g. `è`, `l'arte`, `nell'atto`).\n",
+    "\n",
+    "| Feature | EasyOCR | TrOCR | **dots.ocr** |\n",
+    "|---|---|---|---|\n",
+    "| Architecture | CNN + CRNN | ViT + RoBERTa | ViT + LLM 1.7B |\n",
+    "| Layout understanding | partial | no | **yes** (tables, formulas) |\n",
+    "| Linguistic context | no | limited (English) | **yes** (100+ languages) |\n",
+    "| Model size | ~100 MB | ~1.3 GB | ~3.5 GB (bf16) |\n",
+    "| Speed on CPU | fast | slow | **very slow** |\n",
+    "| Quality on cursive | average | average | **better** |\n",
+    "\n",
+    "> **Paper:** [arXiv 2512.02498](https://arxiv.org/abs/2512.02498), RedNote / Xiaohongshu, Dec 2025"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Hardware Check\n",
+    "\n",
+    "dots.ocr is heavy. Before loading the model, we check the available resources\n",
+    "and pick the configuration best suited to the laptop."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import importlib.util\n",
+    "import platform\n",
+    "\n",
+    "import psutil\n",
+    "import torch\n",
+    "\n",
+    "ram_gb   = psutil.virtual_memory().total / 1e9\n",
+    "ram_free = psutil.virtual_memory().available / 1e9\n",
+    "has_gpu  = torch.cuda.is_available()\n",
+    "gpu_name = torch.cuda.get_device_name(0) if has_gpu else 'N/A'\n",
+    "vram_gb  = torch.cuda.get_device_properties(0).total_memory / 1e9 if has_gpu else 0\n",
+    "\n",
+    "# flash_attention_2 needs the optional flash-attn package; fall back to sdpa without it\n",
+    "has_flash = importlib.util.find_spec('flash_attn') is not None\n",
+    "\n",
+    "print(f\"System        : {platform.system()} {platform.release()}\")\n",
+    "print(f\"CPU           : {platform.processor()[:60]}\")\n",
+    "print(f\"Total RAM     : {ram_gb:.1f} GB  (free: {ram_free:.1f} GB)\")\n",
+    "print(f\"GPU           : {gpu_name}\")\n",
+    "print(f\"GPU VRAM      : {vram_gb:.1f} GB\" if has_gpu else \"GPU VRAM      : N/A\")\n",
+    "print()\n",
+    "\n",
+    "# Recommendation\n",
+    "if has_gpu and vram_gb >= 8:\n",
+    "    DEVICE = 'cuda'\n",
+    "    DTYPE  = torch.bfloat16\n",
+    "    ATTN   = 'flash_attention_2' if has_flash else 'sdpa'\n",
+    "    print(f\"[OK] GPU with VRAM >= 8 GB: using CUDA + bf16 + {ATTN} (best available configuration)\")\n",
+    "elif has_gpu and vram_gb >= 4:\n",
+    "    DEVICE = 'cuda'\n",
+    "    DTYPE  = torch.float16\n",
+    "    ATTN   = 'sdpa'\n",
+    "    print(\"[OK] GPU with 4-8 GB of VRAM: using CUDA + fp16 + sdpa\")\n",
+    "elif ram_free >= 8:\n",
+    "    DEVICE = 'cpu'\n",
+    "    DTYPE  = torch.float32\n",
+    "    ATTN   = 'eager'\n",
+    "    print(\"[OK] CPU only, free RAM >= 8 GB: using CPU + fp32 (slow but it works)\")\n",
+    "    print(\"     Estimated time per image: 2-5 minutes on a modern CPU\")\n",
+    "else:\n",
+    "    DEVICE = 'cpu'\n",
+    "    DTYPE  = torch.float32\n",
+    "    ATTN   = 'eager'\n",
+    "    print(\"[WARNING] Free RAM < 8 GB: the model may fail to load completely.\")\n",
+    "    print(\"          Close other applications before proceeding.\")\n",
+    "\n",
+    "print(f\"\\nChosen configuration: device={DEVICE}, dtype={DTYPE}, attn={ATTN}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Installation\n",
+    "\n",
+    "dots.ocr is not on PyPI. It requires cloning the repo and installing it locally.\n",
+    "Run this **only once**: the cell is commented out to avoid accidental reinstalls."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Uncomment and run ONLY the first time\n",
+    "# -------------------------------------------------------\n",
+    "# import subprocess, sys\n",
+    "#\n",
+    "# # 1. Base dependencies\n",
+    "# subprocess.run([sys.executable, '-m', 'pip', 'install',\n",
+    "#                 'transformers>=4.49', 'qwen_vl_utils',\n",
+    "#                 'accelerate', 'Pillow', 'psutil'], check=True)\n",
+    "#\n",
+    "# # 2. Clone the dots.ocr repo (use the directory name 'DotsOCR', without dots!)\n",
+    "# subprocess.run(['git', 'clone',\n",
+    "#                 'https://github.com/rednote-hilab/dots.ocr.git',\n",
+    "#                 'DotsOCR'], check=True)\n",
+    "#\n",
+    "# # 3. Install the local package\n",
+    "# subprocess.run([sys.executable, '-m', 'pip', 'install', '-e', 'DotsOCR'], check=True)\n",
+    "#\n",
+    "# print('Installation complete!')\n",
+    "# -------------------------------------------------------\n",
+    "print('Installation cell: uncomment to run.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Imports and Utilities"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings('ignore')\n",
+    "\n",
+    "from pathlib import Path\n",
+    "import time\n",
+    "\n",
+    "import torch\n",
+    "from PIL import Image\n",
+    "import matplotlib.pyplot as plt\n",
+    "import matplotlib.gridspec as gridspec\n",
+    "\n",
+    "# Project root (the notebook lives in notebooks/)\n",
+    "ROOT = Path('..').resolve()\n",
+    "print(f'Project root: {ROOT}')\n",
+    "print(f'PyTorch: {torch.__version__}')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Loading the Model\n",
+    "\n",
+    "The model is downloaded from Hugging Face on first use (~3.5 GB of bf16 weights; about 7 GB\n",
+    "of memory once loaded in fp32) and cached in `~/.cache/huggingface/hub`.\n",
+    "\n",
+    "> On CPU the first inference takes 2-5 minutes. Later ones are faster\n",
+    "> because the model stays in RAM."
+   ]
+  },
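+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The sizes quoted above follow from a simple rule of thumb: parameter count times bytes per\n",
+    "parameter. The sketch below is a rough estimate only, assuming the 1.7B parameter count used\n",
+    "in this notebook and counting weights alone (activations and the KV cache need extra memory\n",
+    "at inference time). `weights_size_gb` is a hypothetical helper, not part of dots.ocr.\n",
+    "\n",
+    "```python\n",
+    "def weights_size_gb(n_params, bytes_per_param):\n",
+    "    '''Approximate size of the model weights in GB (1 GB = 1e9 bytes).'''\n",
+    "    return n_params * bytes_per_param / 1e9\n",
+    "\n",
+    "\n",
+    "N_PARAMS = 1.7e9  # assumption: the parameter count quoted in this notebook\n",
+    "\n",
+    "# bf16 and fp16 use 2 bytes per parameter, fp32 uses 4\n",
+    "for name, nbytes in [('bf16/fp16', 2), ('fp32', 4)]:\n",
+    "    print(f'{name}: ~{weights_size_gb(N_PARAMS, nbytes):.1f} GB')\n",
+    "```\n",
+    "\n",
+    "Under these assumptions the estimate is about 3.4 GB in bf16 and 6.8 GB in fp32,\n",
+    "in line with the rounded figures above."
+   ]
+  },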
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from transformers import AutoModelForCausalLM, AutoProcessor\n",
+    "\n",
+    "MODEL_ID = 'rednote-hilab/dots.ocr'\n",
+    "\n",
+    "print(f'Loading {MODEL_ID} ...')\n",
+    "print(f'Device: {DEVICE} | dtype: {DTYPE} | attn: {ATTN}')\n",
+    "print('(First run: downloads ~3.5 GB. Please wait.)')\n",
+    "\n",
+    "t0 = time.time()\n",
+    "\n",
+    "processor = AutoProcessor.from_pretrained(\n",
+    "    MODEL_ID,\n",
+    "    trust_remote_code=True\n",
+    ")\n",
+    "\n",
+    "load_kwargs = dict(\n",
+    "    torch_dtype=DTYPE,\n",
+    "    trust_remote_code=True,\n",
+    ")\n",
+    "if DEVICE == 'cuda':\n",
+    "    load_kwargs['device_map'] = 'auto'\n",
+    "    # Only flash_attention_2 is requested explicitly; for 'sdpa' and 'eager'\n",
+    "    # the model's default attention implementation is used\n",
+    "    if ATTN == 'flash_attention_2':\n",
+    "        load_kwargs['attn_implementation'] = 'flash_attention_2'\n",
+    "else:\n",
+    "    load_kwargs['device_map'] = 'cpu'\n",
+    "\n",
+    "model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs)\n",
+    "model.eval()\n",
+    "\n",
+    "print(f'Model ready in {time.time()-t0:.1f}s')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Transcription Function\n",
+    "\n",
+    "dots.ocr accepts messages in chat format (like ChatGPT): an image plus a text prompt\n",
+    "that specifies what to extract. The main modes are:\n",
+    "\n",
+    "- `full_ocr`: transcribes all the text, preserving reading order\n",
+    "- `layout_parse`: also returns the structure (headings, paragraphs, tables)\n",
+    "- `formula`: detects mathematical formulas\n",
+    "\n",
+    "For our forensic documents we use `full_ocr`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "try:\n",
+    "    from dots_ocr.utils import dict_promptmode_to_prompt\n",
+    "    PROMPT_TEXT = dict_promptmode_to_prompt.get('full_ocr',\n",
+    "        'Please perform OCR on this image and output all text you can read, preserving line breaks.')\n",
+    "    print(f'Prompt loaded: {PROMPT_TEXT[:80]}...')\n",
+    "except ImportError:\n",
+    "    # Fallback if dots_ocr is not installed as a package\n",
+    "    PROMPT_TEXT = (\n",
+    "        'Please perform OCR on this image. '\n",
+    "        'Output only the transcribed text, preserving line breaks and reading order.'\n",
+    "    )\n",
+    "    print('dots_ocr package not found: using a generic prompt.')\n",
+    "\n",
+    "\n",
+    "def transcribe(image_path: str | Path, max_new_tokens: int = 1024) -> tuple[str, float]:\n",
+    "    \"\"\"Transcribe the text in an image with dots.ocr.\n",
+    "    \n",
+    "    Args:\n",
+    "        image_path: path to the image\n",
+    "        max_new_tokens: maximum number of generated tokens (increase for long documents)\n",
+    "    \n",
+    "    Returns:\n",
+    "        (transcribed_text, elapsed_seconds)\n",
+    "    \"\"\"\n",
+    "    # Build the message in chat format\n",
+    "    messages = [\n",
+    "        {\n",
+    "            'role': 'user',\n",
+    "            'content': [\n",
+    "                {'type': 'image', 'image': str(image_path)},\n",
+    "                {'type': 'text',  'text': PROMPT_TEXT},\n",
+    "            ]\n",
+    "        }\n",
+    "    ]\n",
+    "\n",
+    "    # Tokenization: build the chat-formatted prompt once, then load the image\n",
+    "    text = processor.apply_chat_template(\n",
+    "        messages, tokenize=False, add_generation_prompt=True\n",
+    "    )\n",
+    "    try:\n",
+    "        from qwen_vl_utils import process_vision_info\n",
+    "        image_inputs, video_inputs = process_vision_info(messages)\n",
+    "    except ImportError:\n",
+    "        # Fallback without qwen_vl_utils: load the image directly with PIL\n",
+    "        image_inputs, video_inputs = [Image.open(image_path).convert('RGB')], None\n",
+    "    inputs = processor(\n",
+    "        text=[text],\n",
+    "        images=image_inputs,\n",
+    "        videos=video_inputs,\n",
+    "        padding=True,\n",
+    "        return_tensors='pt',\n",
+    "    )\n",
+    "\n",
+    "    inputs = {k: v.to(DEVICE) for k, v in inputs.items()}\n",
+    "\n",
+    "    t0 = time.time()\n",
+    "    with torch.no_grad():\n",
+    "        output_ids = model.generate(\n",
+    "            **inputs,\n",
+    "            max_new_tokens=max_new_tokens,\n",
+    "            do_sample=False,\n",
+    "        )\n",
+    "    elapsed = time.time() - t0\n",
+    "\n",
+    "    # Decode (drop the input tokens from the output)\n",
+    "    generated = output_ids[:, inputs['input_ids'].shape[1]:]\n",
+    "    text_out  = processor.batch_decode(generated, skip_special_tokens=True)[0]\n",
+    "    return text_out.strip(), elapsed\n",
+    "\n",
+    "\n",
+    "def show_result(image_path: str | Path, text: str, elapsed: float) -> None:\n",
+    "    \"\"\"Show the image and its transcription side by side.\"\"\"\n",
+    "    img = Image.open(image_path)\n",
+    "    fig = plt.figure(figsize=(16, max(5, img.height / img.width * 8)))\n",
+    "    gs  = gridspec.GridSpec(1, 2, width_ratios=[1, 1])\n",
+    "\n",
+    "    ax_img  = fig.add_subplot(gs[0])\n",
+    "    ax_text = fig.add_subplot(gs[1])\n",
+    "\n",
+    "    ax_img.imshow(img)\n",
+    "    ax_img.set_title('Original image', fontsize=12)\n",
+    "    ax_img.axis('off')\n",
+    "\n",
+    "    ax_text.text(\n",
+    "        0.03, 0.97, text,\n",
+    "        fontsize=10, va='top', wrap=True,\n",
+    "        fontfamily='monospace',\n",
+    "        transform=ax_text.transAxes,\n",
+    "        bbox=dict(boxstyle='round', facecolor='#f5f5dc', alpha=0.9)\n",
+    "    )\n",
+    "    ax_text.set_title(f'dots.ocr transcription  ({elapsed:.1f}s)', fontsize=12)\n",
+    "    ax_text.axis('off')\n",
+    "\n",
+    "    plt.tight_layout()\n",
+    "    plt.show()\n",
+    "\n",
+    "print('Functions ready.')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Demo 1: writer_00 sample (handwritten text)\n",
+    "\n",
+    "We transcribe one of the writer_00 samples that we also use for writer identification.\n",
+    "Each image is 320x140 px and contains 3 lines of Italian text."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sample_path = ROOT / 'data/samples/writer_00/sample_000.png'\n",
+    "\n",
+    "print(f'Image: {sample_path}')\n",
+    "print('Starting transcription ... (on CPU: 2-5 minutes)')\n",
+    "\n",
+    "text, elapsed = transcribe(sample_path)\n",
+    "\n",
+    "print(f'\\nTranscription ({elapsed:.1f}s):')\n",
+    "print('-' * 40)\n",
+    "print(text)\n",
+    "\n",
+    "show_result(sample_path, text, elapsed)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Demo 2: Will document (full document)\n",
+    "\n",
+    "We transcribe `testamento_writer00.png`, the mock will assembled from real\n",
+    "writer_00 samples. This is the main forensic use case.\n",
+    "\n",
+    "> With `max_new_tokens=2048` we give the model enough room to transcribe\n",
+    "> an entire document rather than a three-line sample."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "doc_path = ROOT / 'data/samples/testamento_writer00.png'\n",
+    "\n",
+    "if not doc_path.exists():\n",
+    "    print(f'File not found: {doc_path}')\n",
+    "    print('Run scripts/create_testamento_writer00.py first')\n",
+    "else:\n",
+    "    print(f'Image: {doc_path}')\n",
+    "    print('Starting full-document transcription ... (slower than the single sample)')\n",
+    "\n",
+    "    text_doc, elapsed_doc = transcribe(doc_path, max_new_tokens=2048)\n",
+    "\n",
+    "    print(f'\\nTranscription ({elapsed_doc:.1f}s):')\n",
+    "    print('-' * 40)\n",
+    "    print(text_doc)\n",
+    "\n",
+    "    show_result(doc_path, text_doc, elapsed_doc)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Demo 3: Lorella images (real-world handwriting)\n",
+    "\n",
+    "We transcribe the first two images from the Lorella dataset: real handwriting,\n",
+    "not standardized dictation samples."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "lorella_dir = ROOT / 'data/lorella'\n",
+    "lorella_images = sorted(lorella_dir.glob('*.png'))[:2]  # first 2 only, for speed\n",
+    "\n",
+    "if not lorella_images:\n",
+    "    print(f'No images found in {lorella_dir}')\n",
+    "else:\n",
+    "    for img_path in lorella_images:\n",
+    "        print(f'\\n--- {img_path.name} ---')\n",
+    "        text_l, elapsed_l = transcribe(img_path)\n",
+    "        print(f'Transcription ({elapsed_l:.1f}s):')\n",
+    "        print(text_l)\n",
+    "        show_result(img_path, text_l, elapsed_l)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Demo 4: EasyOCR vs dots.ocr comparison\n",
+    "\n",
+    "A direct comparison on the same sample to assess the difference in quality."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import easyocr\n",
+    "import numpy as np\n",
+    "\n",
+    "compare_path = ROOT / 'data/samples/writer_00/sample_000.png'\n",
+    "img_np = np.array(Image.open(compare_path).convert('RGB'))\n",
+    "\n",
+    "# --- EasyOCR ---\n",
+    "print('EasyOCR ...')\n",
+    "reader = easyocr.Reader(['it', 'en'], gpu=(DEVICE == 'cuda'))\n",
+    "t_easy = time.time()  # like dots.ocr, the timing excludes model loading\n",
+    "easy_result = reader.readtext(img_np, detail=0, paragraph=True)\n",
+    "easy_text   = '\\n'.join(easy_result)\n",
+    "easy_time   = time.time() - t_easy\n",
+    "\n",
+    "# --- dots.ocr ---\n",
+    "print('dots.ocr ...')\n",
+    "dots_text, dots_time = transcribe(compare_path)\n",
+    "\n",
+    "# --- Visualization ---\n",
+    "fig, axes = plt.subplots(1, 3, figsize=(18, 5))\n",
+    "\n",
+    "axes[0].imshow(img_np)\n",
+    "axes[0].set_title('Original image', fontsize=12)\n",
+    "axes[0].axis('off')\n",
+    "\n",
+    "# 'body' avoids shadowing the 'text' variable from Demo 1\n",
+    "for ax, title, body in [\n",
+    "    (axes[1], f'EasyOCR ({easy_time:.1f}s)',  easy_text),\n",
+    "    (axes[2], f'dots.ocr ({dots_time:.1f}s)', dots_text),\n",
+    "]:\n",
+    "    ax.text(0.05, 0.5, body or '(no result)',\n",
+    "            fontsize=12, va='center', fontfamily='monospace',\n",
+    "            transform=ax.transAxes,\n",
+    "            bbox=dict(boxstyle='round', facecolor='#f0f8ff', alpha=0.9))\n",
+    "    ax.set_title(title, fontsize=12)\n",
+    "    ax.axis('off')\n",
+    "\n",
+    "plt.suptitle('EasyOCR vs dots.ocr comparison', fontsize=14, fontweight='bold')\n",
+    "plt.tight_layout()\n",
+    "plt.show()\n",
+    "\n",
+    "print(f'\\nEasyOCR : {easy_text}')\n",
+    "print(f'\\ndots.ocr: {dots_text}')"
+   ]
+  },
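+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Beyond comparing the two panels by eye, a word-level diff can list exactly where the two\n",
+    "transcriptions disagree; those are the spots most worth checking against the original image.\n",
+    "The sketch below uses `difflib` from the standard library. `disagreements` is a hypothetical\n",
+    "helper and the two sample strings are made up for illustration, they are not model output.\n",
+    "\n",
+    "```python\n",
+    "import difflib\n",
+    "\n",
+    "\n",
+    "def disagreements(text_a, text_b):\n",
+    "    '''Return (words_a, words_b) pairs for the spans where two transcriptions differ.'''\n",
+    "    a, b = text_a.split(), text_b.split()\n",
+    "    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)\n",
+    "    return [\n",
+    "        (a[i1:i2], b[j1:j2])\n",
+    "        for tag, i1, i2, j1, j2 in matcher.get_opcodes()\n",
+    "        if tag != 'equal'  # keep only replaced, inserted or deleted spans\n",
+    "    ]\n",
+    "\n",
+    "\n",
+    "# Illustrative strings only (assumption): two readings that differ in one word\n",
+    "for words_a, words_b in disagreements('il gatto dorme sul tetto',\n",
+    "                                      'il gatto dorme sul letto'):\n",
+    "    print(words_a, '<->', words_b)\n",
+    "```\n",
+    "\n",
+    "In this notebook the same call could be made as `disagreements(easy_text, dots_text)`.\n",
+    "Agreement between two engines does not prove a reading is correct, so this only narrows\n",
+    "down where to look."
+   ]
+  },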
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
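+    "## Measuring CER (Character Error Rate)\n",
+    "\n",
+    "If you know the exact text in the image, you can measure the error with the CER: the\n",
+    "character-level edit distance between transcription and reference, divided by the reference\n",
+    "length (0 = perfect; it can exceed 1 when the transcription has many spurious characters)."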
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def cer(reference: str, hypothesis: str) -> float:\n",
+    "    \"\"\"Character Error Rate via edit distance, ignoring all whitespace.\"\"\"\n",
+    "    # Drop spaces and line breaks so that different line wrapping is not counted as an error\n",
+    "    r, h = list(''.join(reference.split())), list(''.join(hypothesis.split()))\n",
+    "    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]\n",
+    "    for i in range(len(r) + 1): d[i][0] = i\n",
+    "    for j in range(len(h) + 1): d[0][j] = j\n",
+    "    for i in range(1, len(r)+1):\n",
+    "        for j in range(1, len(h)+1):\n",
+    "            cost = 0 if r[i-1] == h[j-1] else 1\n",
+    "            d[i][j] = min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)\n",
+    "    return d[len(r)][len(h)] / max(len(r), 1)\n",
+    "\n",
+    "\n",
+    "# Expected text for sample_000.png\n",
+    "# (to be read manually from the image)\n",
+    "ground_truth = \"il gatto dorme sul tetto\\nla casa e piccola e bella\\noggi il cielo e molto blu\"\n",
+    "\n",
+    "cer_easy = cer(ground_truth, easy_text)\n",
+    "cer_dots = cer(ground_truth, dots_text)\n",
+    "\n",
+    "print(f'Ground truth : {ground_truth!r}')\n",
+    "print(f'EasyOCR      : {easy_text!r}  →  CER = {cer_easy:.3f} ({cer_easy*100:.1f}%)')\n",
+    "print(f'dots.ocr     : {dots_text!r}  →  CER = {cer_dots:.3f} ({cer_dots*100:.1f}%)')\n",
+    "\n",
+    "# Report a tie explicitly instead of silently favouring one engine\n",
+    "if cer_dots < cer_easy:\n",
+    "    winner = 'dots.ocr'\n",
+    "elif cer_easy < cer_dots:\n",
+    "    winner = 'EasyOCR'\n",
+    "else:\n",
+    "    winner = 'tie'\n",
+    "print(f'\\nBest model on this sample: {winner}')"
+   ]
+  },
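+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The full matrix in `cer` needs memory proportional to the product of the two text lengths,\n",
+    "which is fine for three-line samples but grows quickly for page-length transcriptions.\n",
+    "A possible alternative keeps only two rows of the table and returns the same distance.\n",
+    "This is a minimal sketch: `cer_two_rows` is a hypothetical name, and ignoring all whitespace\n",
+    "is an assumption of this sketch.\n",
+    "\n",
+    "```python\n",
+    "def cer_two_rows(reference, hypothesis):\n",
+    "    '''CER via Levenshtein distance, keeping only two rows of the DP table.'''\n",
+    "    # Assumption of this sketch: all whitespace (spaces, line breaks) is ignored\n",
+    "    r = ''.join(reference.split())\n",
+    "    h = ''.join(hypothesis.split())\n",
+    "    prev = list(range(len(h) + 1))  # distances from the empty reference prefix\n",
+    "    for i, rc in enumerate(r, start=1):\n",
+    "        curr = [i]  # distance from r[:i] to the empty hypothesis prefix\n",
+    "        for j, hc in enumerate(h, start=1):\n",
+    "            cost = 0 if rc == hc else 1\n",
+    "            curr.append(min(prev[j] + 1,          # deletion\n",
+    "                            curr[j - 1] + 1,      # insertion\n",
+    "                            prev[j - 1] + cost))  # match or substitution\n",
+    "        prev = curr\n",
+    "    return prev[len(h)] / max(len(r), 1)\n",
+    "\n",
+    "\n",
+    "print(cer_two_rows('gatto', 'gato'))  # one missing character out of five\n",
+    "```\n",
+    "\n",
+    "The example prints 0.2: one deletion divided by a reference length of 5."
+   ]
+  },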
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Forensic Notes\n",
+    "\n",
+    "- **dots.ocr is a VLM**: it generates text token by token. In rare cases it can \"hallucinate\"\n",
+    "  plausible words that are not in the image. Always check the output against the original.\n",
+    "\n",
+    "- **Speed on CPU**: 2-5 minutes per image on a modern laptop without a GPU. Acceptable\n",
+    "  for manual forensic analysis, not suitable for automated real-time pipelines.\n",
+    "\n",
+    "- **Quality on cursive**: better than EasyOCR thanks to the LLM's linguistic context,\n",
+    "  but not perfect; personal cursive handwriting remains the main challenge.\n",
+    "\n",
+    "- **Commercial alternative for maximum quality**: [Transkribus](https://www.transkribus.org)\n",
+    "  has models specialized in historical Italian manuscripts.\n",
+    "\n",
+    "- **Integration in the Gradio demo**: the model is too slow for an interactive demo\n",
+    "  on a laptop. We keep EasyOCR in the HTR tab and use dots.ocr offline only (this notebook).\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**Previous lab:** [Lab 07, Named Entity Recognition](07_named_entity_recognition.ipynb)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (GraphoLab)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11.0"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}