Update book from local render
- .gitattributes +14 -0
- AI-Design-Patterns-for-GLAM.pdf +2 -2
- patterns/structured-generation/advisor-index-cards.html +0 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-10-output-1.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-1.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-2.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-3.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-4.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-5.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-6.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-1.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-2.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-3.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-4.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-5.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-6.png +3 -0
- patterns/structured-generation/advisor-index-cards_files/figure-html/cell-3-output-1.png +3 -0
- patterns/structured-generation/vlm-structured-generation.html +4 -4
- search.json +18 -7
.gitattributes CHANGED
@@ -68,3 +68,17 @@ patterns/structured-generation/vlm-structured-generation_files/figure-html/cell-
 patterns/structured-generation/vlm-structured-generation_files/figure-html/cell-3-output-1.png filter=lfs diff=lfs merge=lfs -text
 patterns/structured-generation/vlm-structured-generation_files/figure-html/cell-4-output-1.png filter=lfs diff=lfs merge=lfs -text
 patterns/structured-generation/vlm-structured-generation_files/figure-html/cell-5-output-1.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-10-output-1.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-1.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-2.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-3.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-4.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-5.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-6.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-1.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-2.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-3.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-4.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-5.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-6.png filter=lfs diff=lfs merge=lfs -text
+patterns/structured-generation/advisor-index-cards_files/figure-html/cell-3-output-1.png filter=lfs diff=lfs merge=lfs -text
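The added lines track the new advisor-index-cards figure PNGs with Git LFS. As a rough illustration only (not part of this commit), here is a minimal Python sketch that appends the same style of filter line for any rendered figure PNG not yet listed in .gitattributes; it assumes it runs from the repository root of the rendered book.

```python
# Hedged sketch: add LFS filter attributes for any figure PNGs not yet tracked.
from pathlib import Path

ATTRIBUTES = Path(".gitattributes")
LFS_SUFFIX = "filter=lfs diff=lfs merge=lfs -text"

# Paths already covered by an attribute line (first whitespace-separated token).
existing = ATTRIBUTES.read_text().splitlines() if ATTRIBUTES.exists() else []
tracked = {line.split()[0] for line in existing if line.strip()}

new_lines = []
for png in sorted(Path("patterns").rglob("figure-html/*.png")):
    path = png.as_posix()
    if path not in tracked:
        new_lines.append(f"{path} {LFS_SUFFIX}")

if new_lines:
    with ATTRIBUTES.open("a") as f:
        f.write("\n".join(new_lines) + "\n")
print(f"Added {len(new_lines)} new LFS attribute lines to .gitattributes")
```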
AI-Design-Patterns-for-GLAM.pdf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:1788a4bea657c058ecf2da2887fb853ade4d9584e60e7499b0fbdf66c40a1715
+size 33470173
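The PDF is stored via Git LFS, so the diff only shows the pointer fields (version, oid, size) changing. A hedged sketch for reading those fields back out of a pointer file, assuming the checkout has not smudged the LFS object (i.e., the file on disk is still the small pointer shown above):

```python
# Hedged sketch: parse the key/value lines of a Git LFS pointer file.
from pathlib import Path

def read_lfs_pointer(path: str) -> dict:
    """Return the pointer fields (e.g. 'version', 'oid', 'size') as a dict."""
    fields = {}
    for line in Path(path).read_text().splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields

pointer = read_lfs_pointer("AI-Design-Patterns-for-GLAM.pdf")
print(pointer.get("oid"), pointer.get("size"))
```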
patterns/structured-generation/advisor-index-cards.html CHANGED
The diff for this file is too large to render.

patterns/structured-generation/advisor-index-cards_files/figure-html/cell-10-output-1.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-1.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-2.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-3.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-4.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-5.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-14-output-6.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-1.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-2.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-3.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-4.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-5.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-17-output-6.png ADDED (Git LFS)
patterns/structured-generation/advisor-index-cards_files/figure-html/cell-3-output-1.png ADDED (Git LFS)
patterns/structured-generation/vlm-structured-generation.html CHANGED
@@ -227,8 +227,8 @@ pre > code.sourceCode > span > a:first-child::before { text-decoration: underlin
 <li><a href="#classification" id="toc-classification" class="nav-link" data-scroll-target="#classification"><span class="header-section-number">3.6</span> Classification</a>
 <ul class="collapse">
 <li><a href="#classifying-with-structured-labels" id="toc-classifying-with-structured-labels" class="nav-link" data-scroll-target="#classifying-with-structured-labels"><span class="header-section-number">3.6.1</span> Classifying with structured labels</a></li>
-<li><a href="#beyond-classifying" id="toc-beyond-classifying" class="nav-link" data-scroll-target="#beyond-classifying"><span class="header-section-number">3.6.2</span> Beyond classifying</a></li>
 </ul></li>
+<li><a href="#beyond-classifying---extracting-structured-information" id="toc-beyond-classifying---extracting-structured-information" class="nav-link" data-scroll-target="#beyond-classifying---extracting-structured-information"><span class="header-section-number">3.7</span> Beyond classifying - Extracting structured information</a></li>
 </ul>
 </nav>
 </div>
@@ -981,8 +981,9 @@ Projected time for full dataset: 455.91 minutes (7.60 hours)</code></pre>
 </div>
 </div>
 </section>
-
-<
+</section>
+<section id="beyond-classifying---extracting-structured-information" class="level2" data-number="3.7">
+<h2 data-number="3.7" class="anchored" data-anchor-id="beyond-classifying---extracting-structured-information"><span class="header-section-number">3.7</span> Beyond classifying - Extracting structured information</h2>
 <p>So far we’ve focused on classifying images but what if we want to extract information from the images? Let’s take the first example from the dataset again.</p>
 <div id="64b59bb9" class="cell" data-cache="true" data-execution_count="22">
 <div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb28"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a>index_image <span class="op">=</span> ds[<span class="dv">0</span>][<span class="st">'image'</span>]</span>
@@ -1142,7 +1143,6 @@ Projected time for full dataset: 455.91 minutes (7.60 hours)</code></pre>
 </div>


-</section>
 </section>

 </main> <!-- /main -->
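This diff renames the anchor from #beyond-classifying to #beyond-classifying---extracting-structured-information and promotes the subsection to a top-level 3.7 section, so the sidebar TOC link and the section element must agree. A small sanity-check sketch (the anchor id and file path are taken from the diff above; nothing else is implied), assuming it runs from the rendered site root:

```python
# Hedged sketch: check that the renamed section anchor and its TOC link agree.
from pathlib import Path

html = Path("patterns/structured-generation/vlm-structured-generation.html").read_text()
anchor = "beyond-classifying---extracting-structured-information"

# The <section> element should carry the new id, and the TOC should point at it.
assert f'id="{anchor}"' in html, "3.7 section anchor missing from the rendered page"
assert f'data-scroll-target="#{anchor}"' in html, "TOC link for the 3.7 section missing"
print("TOC entry and section anchor are consistent")
```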
search.json CHANGED
@@ -122,7 +122,18 @@
 "href": "patterns/structured-generation/vlm-structured-generation.html#classification",
 "title": "3 Structured Information Extraction with Vision Language Models",
 "section": "3.6 Classification",
-
"text": "3.6 Classification\n\nWe’ll define a fairly simple prompt that asks the VLM to decide if a page is one of three categories. We describe each of these categopries and then ask the model to only return one of these as the output. We’ll do this for ten examples and we’ll also log how long it’s taking.\n\nimport time\nfrom tqdm.auto import tqdm\n\nsample_size = 10\n\nsample = ds.take(sample_size)\n\nprompt = \"\"\"Classify this image into one of the following categories:\n\n1. **Index/Reference Card**: A library catalog or reference card\n\n2. **Manuscript Page**: A handwritten or historical document page\n\n3. **Other**: Any document that doesn't fit the above categories\n\nExamine the overall structure, layout, and content type to determine the classification. Focus on whether the document is a structured catalog/reference tool (Index Card) or a historical manuscript with continuous text (Manuscript Page).\n\nReturn only the category name: \"Index/Reference Card\", \"Manuscript Page\", or \"Other\"\n\"\"\"\n\nresults = []\n# Time the execution using standard Python\nstart_time = time.time()\nfor row in tqdm(sample):\n image = row['image']\n results.append(query_image(image, prompt))\nelapsed_time = time.time() - start_time\nprint(f\"Execution time: {elapsed_time:.2f} seconds\")\nrprint(results)\n\n\n\n\nExecution time: 100.05 seconds\n\n\n[\n 'Index/Reference Card',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page'\n]\n\n\n\nLet’s check the result that was predicted as “index/reference card”\n\nsample[0]['image']\n\n\n\n\n\n\n\n\nWe can extrapolate how long this would take for the full dataset\n\n# Calculate average time per image\navg_time_per_image = elapsed_time / sample_size\n\n# Project time for full dataset\ntotal_images = len(ds)\nprojected_time = avg_time_per_image * total_images\n\nprint(f\"Sample processing time: {elapsed_time:.2f} seconds ({elapsed_time/60:.2f} minutes)\")\nprint(f\"Average time per image: {avg_time_per_image:.2f} seconds\")\nprint(f\"Total images in dataset: {total_images}\")\nprint(f\"Projected time for full dataset: {projected_time/60:.2f} minutes ({projected_time/3600:.2f} hours)\")\n\nSample processing time: 100.05 seconds (1.67 minutes)\nAverage time per image: 10.01 seconds\nTotal images in dataset: 2734\nProjected time for full dataset: 455.91 minutes (7.60 hours)\n\n\n\n3.6.1 Classifying with structured labels\nIn the previous example, we relied on the model to return the label in the correct format. While this often works, it can sometimes lead to inconsistencies in the output. To address this, we can use Pydantic models to define a structured output format. This way, we can ensure that the output adheres to a specific schema.\nIn this example, we’ll define a Pydantic model for our classification task. 
The model will have a single field category which can take one of three literal values \"Index/Reference Card\", \"Manuscript Page\", or \"other\".\nWhat this means in practice is that the model will only be able to return one of these three values for the category field.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Literal\n\nclass PageCategory(BaseModel):\n category: Literal[\"Index/Reference Card\", \"Manuscript Page\", \"other\"] = Field(\n ..., description=\"The category of the image\"\n )\n\nWhen using the OpenAI client we can specify this Pydantic model as the response_format when making the request. This tells the model to return the output in a format that can be parsed into the Pydantic model (the APIs for this are still evolving so may change slightly over time).\n\nbuffered = BytesIO()\nimage.save(buffered, format=\"JPEG\")\nimage_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')\ncompletion = client.beta.chat.completions.parse(\n model=\"qwen/qwen2.5-vl-7b\",\n messages=[\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": prompt,\n },\n {\n \"type\": \"image_url\",\n \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_base64}\"},\n },\n ],\n },\n ],\n max_tokens=100,\n temperature=0.7,\n response_format=PageCategory,\n)\nrprint(completion)\nrprint(completion.choices[0].message.parsed)\n\nParsedChatCompletion[PageCategory](\n id='chatcmpl-v8bizojixwds0z7pg8j0th',\n choices=[\n ParsedChoice[PageCategory](\n finish_reason='stop',\n index=0,\n logprobs=None,\n message=ParsedChatCompletionMessage[PageCategory](\n content='{\"category\": \"Manuscript Page\"}',\n refusal=None,\n role='assistant',\n annotations=None,\n audio=None,\n function_call=None,\n tool_calls=None,\n parsed=PageCategory(category='Manuscript Page')\n )\n )\n ],\n created=1761588374,\n model='qwen/qwen2.5-vl-7b',\n object='chat.completion',\n service_tier=None,\n system_fingerprint='qwen/qwen2.5-vl-7b',\n usage=CompletionUsage(\n completion_tokens=10,\n prompt_tokens=142,\n total_tokens=152,\n completion_tokens_details=None,\n prompt_tokens_details=None\n ),\n stats={}\n)\n\n\n\nPageCategory(category='Manuscript Page')\n\n\n\n\nimage\n\n\n\n\n\n\n\n\n\n\n3.6.2 Beyond classifying\nSo far we’ve focused on classifying images but what if we want to extract information from the images? Let’s take the first example from the dataset again.\n\nindex_image = ds[0]['image']\nindex_image\n\n\n\n\n\n\n\n\nIf we have an image like this we don’t just want to assign a label from it (we may do this as a first step) we actually want to extract the various fields from the card in a structured way. We can again use a Pydantic model to define the structure of the data we want to extract.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Optional\n\n\nclass BritishLibraryReprographicCard(BaseModel):\n \"\"\"\n Pydantic model for extracting information from British Library Reference Division \n reprographic cards used to document manuscripts and other materials.\n \"\"\"\n \n department: str = Field(\n ..., \n description=\"The division that holds the material (e.g., 'MANUSCRIPTS')\"\n )\n \n shelfmark: str = Field(\n ..., \n description=\"The library's classification/location code (e.g., 'SLOANE 3972.C. 
(VOL 1)')\"\n )\n \n order: str = Field(\n ..., \n description=\"Order reference, typically starting with 'SCH NO' followed by numbers\"\n )\n \n author: Optional[str] = Field(\n None, \n description=\"Author name if present, null if blank or marked with diagonal line\"\n )\n \n title: str = Field(\n ..., \n description=\"The name of the work or manuscript\"\n )\n \n place_and_date_of_publication: Optional[str] = Field(\n None, \n description=\"Place and date of publication if present, null if blank\"\n )\n \n reduction: int = Field(\n ..., \n description=\"The reduction number shown at the bottom of the card\"\n )\n\nWe’ll now create a function to handle the querying process using this structured schema.\n\ndef query_image_structured(image, prompt, schema, model='qwen3-vl-2b-instruct-mlx'):\n \"\"\"\n Query VLM with an image and get structured output based on a Pydantic schema.\n \n Args:\n image: PIL Image or file path to the image\n prompt: Text prompt describing what to extract\n schema: Pydantic model class defining the expected output structure\n model: Model ID to use for the query\n \n Returns:\n Parsed Pydantic model instance with the extracted data\n \"\"\"\n # Convert image to base64\n if isinstance(image, PILImage):\n buffered = BytesIO()\n image.save(buffered, format=\"JPEG\")\n image_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')\n else:\n with open(image, \"rb\") as f:\n image_base64 = base64.b64encode(f.read()).decode('utf-8')\n \n # Query with structured output\n completion = client.beta.chat.completions.parse(\n model=model,\n messages=[{\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": prompt},\n {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_base64}\"}}\n ]\n }],\n response_format=schema,\n temperature=0.3 # Lower temperature for more consistent extraction\n )\n \n # Return the parsed structured data\n return completion.choices[0].message.parsed\n\nWe also need to define a prompt that describes what information we want to extract from the card.\n\n# Example usage\nextraction_prompt = \"\"\"\nExtract the information from this British Library card into structured data (JSON format).\n\nRead each field on the card and extract the following information:\n- department: The division name (e.g., \"MANUSCRIPTS\")\n- shelfmark: The catalog number (e.g., \"SLOANE 3972.C. (VOL 1)\")\n- order: The SCH NO reference number\n- author: The author name, or null if blank\n- title: The full title of the work\n- place_and_date_of_publication: Publication info, or null if blank\n- reduction: The reduction number (as integer) at bottom of card\n\nReturn the exact text as shown on the card. For empty fields with diagonal lines or no text, use null.\n\"\"\"\nresult = query_image_structured(index_image, extraction_prompt, BritishLibraryReprographicCard)\nrprint(result)\n\nBritishLibraryReprographicCard(\n department='MANUSCRIPTS',\n shelfmark='SLOANE 3972.C. (VOL 1)',\n order='98876',\n author='HANS SLOANES',\n title='CATALOGUE OF SIR HANS SLOANES LIBRARY',\n place_and_date_of_publication=None,\n reduction=12\n)\n\n\n\n\nrprint(result)\n\nBritishLibraryReprographicCard(\n department='MANUSCRIPTS',\n shelfmark='SLOANE 3972.C. (VOL 1)',\n order='98876',\n author='HANS SLOANES',\n title='CATALOGUE OF SIR HANS SLOANES LIBRARY',\n place_and_date_of_publication=None,\n reduction=12\n)\n\n\n\n\nindex_image",
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>3</span> <span class='chapter-title'>Structured Information Extraction with Vision Language Models</span>"
@@ -133,7 +144,7 @@
 "href": "patterns/structured-generation/advisor-index-cards.html",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "",
-
"text": "4.1 Introduction\nThis chapter demonstrates a practical application of VLM-based structured extraction on a real-world GLAM digitization project: extracting structured metadata from historical index cards from the National Library of Scotland’s Advocate’s Library collection.\nUnlike the previous chapter which focused on explaining VLM concepts and setup, this chapter assumes you’re familiar with the basics and focuses on
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
@@ -155,7 +166,7 @@
 "href": "patterns/structured-generation/advisor-index-cards.html#the-task-advisor-index-cards",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.2 The Task: Advisor Index Cards",
-
"text": "4.2 The Task: Advisor Index Cards\nThe National Library of Scotland has a collection of historical index cards documenting manuscripts and correspondence. Each card follows a consistent format:\n\nSurname: Family name\nForenames: Given names\nEpithet: Role, title, or occupation\nMS no: Manuscript reference number\nDescription: Document type and date\nFolios: Page references\n\nThe goal is to extract this structured information to enable: - Searchable digital catalog - Integration with library management systems - Research access to historical collections\n\n4.2.1 Example Cards\nLet’s look at a few sample cards from the collection:\n\nfrom pathlib import Path\nimport matplotlib.pyplot as plt\n\nimages = list(Path(\"assets/vllm-structured-generation/indexes
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
@@ -166,7 +177,7 @@
 "href": "patterns/structured-generation/advisor-index-cards.html#schema-design",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.3 Schema Design",
-
"text": "4.3 Schema Design\nWorking with the library curators, we designed a schema that matches their cataloging requirements. The schema is intentionally simple - complex schemas are harder for VLMs to extract reliably.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Optional\n\nclass IndexCardEntry(BaseModel):\n \"\"\"Schema for index card extraction matching curator specification\"\"\"\n \n surname: str = Field(..., description=\"Family name as written on card\")\n forenames: Optional[str] = Field(None, description=\"Given names\")\n epithet: Optional[str] = Field(None, description=\"Title, occupation, or role\")\n ms_no: str = Field(..., description=\"Manuscript number\")\n description: str = Field(..., description=\"Document description with date\")\n folios: str = Field(..., description=\"Folio reference\")\n \n failed_to_parse: bool = Field(\n False,\n description=\"Set to True if the card cannot be reliably extracted (illegible, damaged, etc.)\"\n )\n notes: Optional[str] = Field(\n None, \n description=\"Optional notes about the card: handwritten annotations, ambiguities, \"\n \"corrections, or reasons for failed parsing.\"\n )\n\n# Display the schema\nprint(IndexCardEntry.model_json_schema())\n\n{'description': 'Schema for index card extraction matching curator specification'
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
@@ -177,7 +188,7 @@
 "href": "patterns/structured-generation/advisor-index-cards.html#setup",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.4 Setup",
-
"text": "4.4 Setup\nWe’ll reuse the VLM setup from the previous chapter. If you haven’t already, make sure LM Studio is running with a VLM loaded.\n\nfrom openai import OpenAI\nimport base64\nfrom io import BytesIO\nfrom PIL import Image as PILImage\n\n\
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
@@ -188,7 +199,7 @@
 "href": "patterns/structured-generation/advisor-index-cards.html#extraction-examples",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.5 Extraction Examples",
-
"text": "4.5 Extraction Examples\nLet’s run extraction on several sample cards to see how the model performs.\n\nprompt = \"\"\"Extract structured information from this historical library index card and return it as JSON.\n\n This is an index card from the National Library of Scotland's Advocate's Library collection. Each card documents a person and associated manuscript references.\n\n Return a JSON object with these exact fields:\n\n {\n \"surname\": \"Family name exactly as typed (e.g., 'ABAD', 'ABARACA Y BOLEA')\",\n \"forenames\": \"Given names (e.g., 'Joseph', 'Thomas') or null if not present\",\n \"epithet\": \"Title, occupation, or role (e.g., 'Captain, Spanish Army') or null if not present\",\n \"ms_no\": \"Manuscript number exactly as written (e.g., '5538', '5529')\",\n \"description\": \"Document description with date (e.g., 'letter of (1783)', 'copy of petition of (ca. 1783)')\",\n \"folios\": \"Folio reference exactly as written (e.g., 'f.11', 'f.169')\",\n \"failed_to_parse\": false (or true if card is illegible/severely damaged),\n \"notes\": \"Optional notes about handwritten corrections, ambiguities, or parsing issues\"\n }\n\n Guidelines:\n - Extract text exactly as it appears - do not correct spelling or expand abbreviations\n - Preserve original punctuation and formatting\n - If a field is unclear but you can make a reasonable inference, extract it and note the ambiguity in \"notes\"\n - Only set \"failed_to_parse\" to true if you genuinely cannot extract the required fields\n - Use null for optional fields (forenames, epithet, notes) if they are not present or marked with a line\"\"\"\n\n\nimage = PILImage.open(images[0])\nimage \n\n\n\n\n\n\n\n\n\nfrom rich import print\nresult = query_image_structured(image, prompt, IndexCardEntry, model='qwen/qwen3-vl-4b')\n \n\n\nprint(result)\n\nIndexCardEntry(\n surname='ABBAATE',\n forenames='Itala',\n epithet='Daughter of the Physician',\n ms_no='2633',\n description='letter of (1878)',\n folios='f. 38',\n failed_to_parse=False,\n notes=\"Handwritten
|
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
@@ -199,7 +210,7 @@
 "href": "patterns/structured-generation/advisor-index-cards.html#evaluation-strategies",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.6 Evaluation Strategies",
-
"text": "4.6 Evaluation Strategies\nHow do we know if the extraction is working well? There are several approaches to evaluation, each with different tradeoffs.\nChecking against the images\n\n# Display images with extracted data side-by-side\n# Two columns: left = image, right = extracted text\n\nfor i, (img_stem, result) in enumerate(results):\n fig, (ax_img, ax_text) = plt.subplots(1, 2, figsize=(16, 6), \n gridspec_kw={'width_ratios': [1, 1]})\n \n # Left: Display image\n img = plt.imread(images[i])\n ax_img.imshow(img)\n ax_img.axis('off')\n ax_img.set_title(f\"Card {i+1}: {img_stem}\", fontsize=14, fontweight='bold')\n \n # Right: Display extracted data as formatted text\n ax_text.axis('off')\n \n # Format the extracted data nicely\n text_lines = [\n \"Extracted Data:\",\n \"\",\n f\"Surname: {result.surname}\",\n f\"Forenames: {result.forenames or 'N/A'}\",\n f\"Epithet: {result.epithet or 'N/A'}\",\n f\"MS No: {result.ms_no}\",\n f\"Description: {result.description}\",\n f\"Folios: {result.folios}\",\n \"\",\n f\"Failed to Parse: {result.failed_to_parse}\",\n ]\n \n # Add notes if present\n if result.notes:\n text_lines.append(\"\")\n text_lines.append(\"Notes:\")\n # Wrap long notes\n import textwrap\n wrapped_notes = textwrap.fill(result.notes, width=60)\n text_lines.append(wrapped_notes)\n \n # Join and display\n formatted_text = \"\\n\".join(text_lines)\n ax_text.text(0.05, 0.95, formatted_text, \n transform=ax_text.transAxes,\n fontsize=11,\n verticalalignment='top',\n fontfamily='monospace',\n bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))\n \n plt.tight_layout()\n plt.show()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nnotes\nNotes don’t seem that useful and waste tokens. Let’s remove.\nWe should also test that the “failed to parse” flag works correctly. let’s try an image that is blank or has no text.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Optional\n\nclass IndexCardEntry(BaseModel):\n \"\"\"Schema for index card extraction matching curator specification\"\"\"\n \n surname: str = Field(..., description=\"Family name as written on card\")\n forenames: Optional[str] = Field(None, description=\"Given names\")\n epithet: Optional[str] = Field(None, description=\"Title, occupation, or role\")\n ms_no: str = Field(..., description=\"Manuscript number\")\n description: str = Field(..., description=\"Document description with date\")\n folios: str = Field(..., description=\"Folio reference\")\n \n\nprompt = \"\"\"Extract structured information from this historical library index card and return it as JSON.\n\n This is an index card from the National Library of Scotland's Advocate's Library collection. Each card documents a person and associated manuscript references.\n\n Return a JSON object with these exact fields:\n\n {\n \"surname\": \"Family name exactly as typed (e.g., 'ABAD', 'ABARACA Y BOLEA')\",\n \"forenames\": \"Given names (e.g., 'Joseph', 'Thomas') or null if not present\",\n \"epithet\": \"Title, occupation, or role (e.g., 'Captain, Spanish Army') or null if not present\",\n \"ms_no\": \"Manuscript number exactly as written (e.g., '5538', '5529')\",\n \"description\": \"Document description with date (e.g., 'letter of (1783)', 'copy of petition of (ca. 
1783)')\",\n \"folios\": \"Folio reference exactly as written (e.g., 'f.11', 'f.169')\",\n \"notes\": \"Optional notes about handwritten corrections, ambiguities, or parsing issues\"\n }\n\n Guidelines:\n - Extract text exactly as it appears - do not correct spelling or expand abbreviations\n - Preserve original punctuation and formatting\n - Use null for optional fields (forenames, epithet, notes) if they are not present or marked with a line\"\"\"\n\n\nresults = []\nfor img_path in tqdm(images):\n image = PILImage.open(img_path)\n result = query_image_structured(image, prompt, IndexCardEntry, model='qwen/qwen3-vl-4b')\n results.append((img_path.stem, result))\n\n\n\n\n\n# Display images with extracted data side-by-side\n# Two columns: left = image, right = extracted text\n\nfor i, (img_stem, result) in enumerate(results):\n fig, (ax_img, ax_text) = plt.subplots(1, 2, figsize=(16, 6), \n gridspec_kw={'width_ratios': [1, 1]})\n \n # Left: Display image\n img = plt.imread(images[i])\n ax_img.imshow(img)\n ax_img.axis('off')\n ax_img.set_title(f\"Card {i+1}: {img_stem}\", fontsize=14, fontweight='bold')\n \n # Right: Display extracted data as formatted text\n ax_text.axis('off')\n \n # Format the extracted data nicely\n text_lines = [\n \"Extracted Data:\",\n \"\",\n f\"Surname: {result.surname}\",\n f\"Forenames: {result.forenames or 'N/A'}\",\n f\"Epithet: {result.epithet or 'N/A'}\",\n f\"MS No: {result.ms_no}\",\n f\"Description: {result.description}\",\n f\"Folios: {result.folios}\",\n \"\",\n f\"Failed to Parse: {result.failed_to_parse}\",\n ]\n \n # Add notes if present\n if result.notes:\n text_lines.append(\"\")\n text_lines.append(\"Notes:\")\n # Wrap long notes\n import textwrap\n wrapped_notes = textwrap.fill(result.notes, width=60)\n text_lines.append(wrapped_notes)\n \n # Join and display\n formatted_text = \"\\n\".join(text_lines)\n ax_text.text(0.05, 0.95, formatted_text, \n transform=ax_text.transAxes,\n fontsize=11,\n verticalalignment='top',\n fontfamily='monospace',\n bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))\n \n plt.tight_layout()\n plt.show()\n\n\n---------------------------------------------------------------------------\nAttributeError Traceback (most recent call last)\nCell In[69], line 28\n 15 ax_text.axis('off')\n 17 # Format the extracted data nicely\n 18 text_lines = [\n 19 \"Extracted Data:\",\n 20 \"\",\n 21 f\"Surname: {result.surname}\",\n 22 f\"Forenames: {result.forenames or 'N/A'}\",\n 23 f\"Epithet: {result.epithet or 'N/A'}\",\n 24 f\"MS No: {result.ms_no}\",\n 25 f\"Description: {result.description}\",\n 26 f\"Folios: {result.folios}\",\n 27 \"\",\n---> 28 f\"Failed to Parse: {result.failed_to_parse}\",\n 29 ]\n 31 # Add notes if present\n 32 if result.notes:\n\nFile ~/Documents/nls-work/ai-patterns-for-glam/.venv/lib/python3.13/site-packages/pydantic/main.py:991, in BaseModel.__getattr__(self, item)\n 988 return super().__getattribute__(item) # Raises AttributeError if appropriate\n 989 else:\n 990 # this is the current error\n--> 991 raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')\n\nAttributeError: 'IndexCardEntry' object has no attribute 'failed_to_parse'\n\n\n\n\n\n\n\n\n\n\n\n4.6.1 1. 
Manual Ground Truth Evaluation\nThe Gold Standard: Manually annotate a sample of cards and compare.\nPros: - Most accurate measure of performance - Catches all types of errors - Builds training data for future improvements\nCons: - Time consuming - Requires expert annotators - Limited sample size\nBest for: Final validation, establishing baselines, understanding failure modes\n\n# TODO: Load manually annotated ground truth\n# Compare predictions to ground truth\n# Calculate field-level accuracy\n\n# Example metrics:\n# - Exact match accuracy per field\n# - Character error rate\n# - Common error patterns\n\n\n\n4.6.2 2. Cross-Model Evaluation (Model-as-Judge)\nThe Pragmatic Approach: Use a stronger/different model to evaluate outputs.\nPros: - Much faster than manual annotation - Can evaluate full dataset - Good for catching obvious errors\nCons: - Requires access to multiple models - May miss subtle errors - Judge model can be wrong too\nBest for: Large-scale quality monitoring, automated testing, identifying problem areas for manual review\n\n# TODO: Implement model-as-judge evaluation\n# - Extract with Model A (e.g., local Qwen)\n# - Show image + extraction to Model B (e.g., Claude/GPT-4)\n# - Ask Model B to rate accuracy and identify errors\n# - Aggregate results\n\n# Example judge prompt:\n# \"\"\"\n# Compare this extracted data to the index card image:\n# [extraction]\n# \n# For each field, rate accuracy:\n# - Correct: Field matches card exactly\n# - Minor error: Small typo or formatting difference\n# - Major error: Wrong information\n# - Missing: Field is on card but not extracted\n# \"\"\"\n\n\n\n4.6.3 3. Internal Consistency Checks\nThe Automated Approach: Use business rules and patterns to identify suspicious outputs.\nExamples: - Manuscript numbers should follow known patterns - Dates should be within expected ranges - Folio references have consistent formats - Certain fields should always be present\nPros: - Completely automated - Fast - can run on full dataset - No additional model costs\nCons: - Only catches specific error types - Requires domain knowledge to design rules - Can miss errors that follow valid patterns\nBest for: Flagging outliers for review, automated quality gates, monitoring production systems\n\n# TODO: Implement consistency checks\n\n# def validate_extraction(entry: IndexCardEntry) -> list[str]:\n# \"\"\"Run validation checks and return list of warnings.\"\"\"\n# warnings = []\n# \n# # Check MS number format\n# if not re.match(r'^\\d+', entry.ms_no):\n# warnings.append(f\"Unusual MS number format: {entry.ms_no}\")\n# \n# # Check for dates in expected range\n# dates = re.findall(r'\\d{4}', entry.description)\n# for date in dates:\n# if not (1500 <= int(date) <= 1950):\n# warnings.append(f\"Date outside expected range: {date}\")\n# \n# # Check folio format\n# if not re.match(r'^f+\\.?\\s*\\d+', entry.folios, re.IGNORECASE):\n# warnings.append(f\"Unusual folio format: {entry.folios}\")\n# \n# return warnings\n\n\n\n4.6.4 4. Confidence Scoring\nMany VLM APIs return confidence scores or logprobs. 
We can use these to identify uncertain extractions.\nPros: - No additional cost or models needed - Can prioritize review efforts - Helps establish quality thresholds\nCons: - Not all models/APIs provide confidence scores - High confidence doesn’t guarantee correctness - Requires calibration\nBest for: Prioritizing manual review, quality-based routing, understanding model uncertainty\n\n# TODO: If available, extract and analyze confidence scores\n# Plot distribution of confidence scores\n# Correlate confidence with manual evaluation results\n\n\n\n4.6.5 Combining Evaluation Approaches\nIn practice, a robust evaluation strategy uses multiple approaches:\n\nStart with manual ground truth on a small sample (~50-100 cards) to establish baseline accuracy\nUse consistency checks to automatically flag suspicious outputs\nApply model-as-judge on a larger sample to monitor quality\nPrioritize review using confidence scores or validation warnings\nContinuous monitoring as you process the full collection\n\nThis gives you both rigorous accuracy metrics and practical quality assurance at scale.",
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"

 "href": "patterns/structured-generation/vlm-structured-generation.html#classification",
 "title": "3 Structured Information Extraction with Vision Language Models",
 "section": "3.6 Classification",
+
"text": "3.6 Classification\n\nWe’ll define a fairly simple prompt that asks the VLM to decide if a page is one of three categories. We describe each of these categopries and then ask the model to only return one of these as the output. We’ll do this for ten examples and we’ll also log how long it’s taking.\n\nimport time\nfrom tqdm.auto import tqdm\n\nsample_size = 10\n\nsample = ds.take(sample_size)\n\nprompt = \"\"\"Classify this image into one of the following categories:\n\n1. **Index/Reference Card**: A library catalog or reference card\n\n2. **Manuscript Page**: A handwritten or historical document page\n\n3. **Other**: Any document that doesn't fit the above categories\n\nExamine the overall structure, layout, and content type to determine the classification. Focus on whether the document is a structured catalog/reference tool (Index Card) or a historical manuscript with continuous text (Manuscript Page).\n\nReturn only the category name: \"Index/Reference Card\", \"Manuscript Page\", or \"Other\"\n\"\"\"\n\nresults = []\n# Time the execution using standard Python\nstart_time = time.time()\nfor row in tqdm(sample):\n image = row['image']\n results.append(query_image(image, prompt))\nelapsed_time = time.time() - start_time\nprint(f\"Execution time: {elapsed_time:.2f} seconds\")\nrprint(results)\n\n\n\n\nExecution time: 100.05 seconds\n\n\n[\n 'Index/Reference Card',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page',\n 'Manuscript Page'\n]\n\n\n\nLet’s check the result that was predicted as “index/reference card”\n\nsample[0]['image']\n\n\n\n\n\n\n\n\nWe can extrapolate how long this would take for the full dataset\n\n# Calculate average time per image\navg_time_per_image = elapsed_time / sample_size\n\n# Project time for full dataset\ntotal_images = len(ds)\nprojected_time = avg_time_per_image * total_images\n\nprint(f\"Sample processing time: {elapsed_time:.2f} seconds ({elapsed_time/60:.2f} minutes)\")\nprint(f\"Average time per image: {avg_time_per_image:.2f} seconds\")\nprint(f\"Total images in dataset: {total_images}\")\nprint(f\"Projected time for full dataset: {projected_time/60:.2f} minutes ({projected_time/3600:.2f} hours)\")\n\nSample processing time: 100.05 seconds (1.67 minutes)\nAverage time per image: 10.01 seconds\nTotal images in dataset: 2734\nProjected time for full dataset: 455.91 minutes (7.60 hours)\n\n\n\n3.6.1 Classifying with structured labels\nIn the previous example, we relied on the model to return the label in the correct format. While this often works, it can sometimes lead to inconsistencies in the output. To address this, we can use Pydantic models to define a structured output format. This way, we can ensure that the output adheres to a specific schema.\nIn this example, we’ll define a Pydantic model for our classification task. 
The model will have a single field category which can take one of three literal values \"Index/Reference Card\", \"Manuscript Page\", or \"other\".\nWhat this means in practice is that the model will only be able to return one of these three values for the category field.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Literal\n\nclass PageCategory(BaseModel):\n category: Literal[\"Index/Reference Card\", \"Manuscript Page\", \"other\"] = Field(\n ..., description=\"The category of the image\"\n )\n\nWhen using the OpenAI client we can specify this Pydantic model as the response_format when making the request. This tells the model to return the output in a format that can be parsed into the Pydantic model (the APIs for this are still evolving so may change slightly over time).\n\nbuffered = BytesIO()\nimage.save(buffered, format=\"JPEG\")\nimage_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')\ncompletion = client.beta.chat.completions.parse(\n model=\"qwen/qwen2.5-vl-7b\",\n messages=[\n {\n \"role\": \"user\",\n \"content\": [\n {\n \"type\": \"text\",\n \"text\": prompt,\n },\n {\n \"type\": \"image_url\",\n \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_base64}\"},\n },\n ],\n },\n ],\n max_tokens=100,\n temperature=0.7,\n response_format=PageCategory,\n)\nrprint(completion)\nrprint(completion.choices[0].message.parsed)\n\nParsedChatCompletion[PageCategory](\n id='chatcmpl-v8bizojixwds0z7pg8j0th',\n choices=[\n ParsedChoice[PageCategory](\n finish_reason='stop',\n index=0,\n logprobs=None,\n message=ParsedChatCompletionMessage[PageCategory](\n content='{\"category\": \"Manuscript Page\"}',\n refusal=None,\n role='assistant',\n annotations=None,\n audio=None,\n function_call=None,\n tool_calls=None,\n parsed=PageCategory(category='Manuscript Page')\n )\n )\n ],\n created=1761588374,\n model='qwen/qwen2.5-vl-7b',\n object='chat.completion',\n service_tier=None,\n system_fingerprint='qwen/qwen2.5-vl-7b',\n usage=CompletionUsage(\n completion_tokens=10,\n prompt_tokens=142,\n total_tokens=152,\n completion_tokens_details=None,\n prompt_tokens_details=None\n ),\n stats={}\n)\n\n\n\nPageCategory(category='Manuscript Page')\n\n\n\n\nimage",
+"crumbs": [
+"Structured Information Extraction",
+"<span class='chapter-number'>3</span> <span class='chapter-title'>Structured Information Extraction with Vision Language Models</span>"
+]
+},
+{
+"objectID": "patterns/structured-generation/vlm-structured-generation.html#beyond-classifying---extracting-structured-information",
+"href": "patterns/structured-generation/vlm-structured-generation.html#beyond-classifying---extracting-structured-information",
+"title": "3 Structured Information Extraction with Vision Language Models",
+"section": "3.7 Beyond classifying - Extracting structured information",
+
"text": "3.7 Beyond classifying - Extracting structured information\nSo far we’ve focused on classifying images but what if we want to extract information from the images? Let’s take the first example from the dataset again.\n\nindex_image = ds[0]['image']\nindex_image\n\n\n\n\n\n\n\n\nIf we have an image like this we don’t just want to assign a label from it (we may do this as a first step) we actually want to extract the various fields from the card in a structured way. We can again use a Pydantic model to define the structure of the data we want to extract.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Optional\n\n\nclass BritishLibraryReprographicCard(BaseModel):\n \"\"\"\n Pydantic model for extracting information from British Library Reference Division \n reprographic cards used to document manuscripts and other materials.\n \"\"\"\n \n department: str = Field(\n ..., \n description=\"The division that holds the material (e.g., 'MANUSCRIPTS')\"\n )\n \n shelfmark: str = Field(\n ..., \n description=\"The library's classification/location code (e.g., 'SLOANE 3972.C. (VOL 1)')\"\n )\n \n order: str = Field(\n ..., \n description=\"Order reference, typically starting with 'SCH NO' followed by numbers\"\n )\n \n author: Optional[str] = Field(\n None, \n description=\"Author name if present, null if blank or marked with diagonal line\"\n )\n \n title: str = Field(\n ..., \n description=\"The name of the work or manuscript\"\n )\n \n place_and_date_of_publication: Optional[str] = Field(\n None, \n description=\"Place and date of publication if present, null if blank\"\n )\n \n reduction: int = Field(\n ..., \n description=\"The reduction number shown at the bottom of the card\"\n )\n\nWe’ll now create a function to handle the querying process using this structured schema.\n\ndef query_image_structured(image, prompt, schema, model='qwen3-vl-2b-instruct-mlx'):\n \"\"\"\n Query VLM with an image and get structured output based on a Pydantic schema.\n \n Args:\n image: PIL Image or file path to the image\n prompt: Text prompt describing what to extract\n schema: Pydantic model class defining the expected output structure\n model: Model ID to use for the query\n \n Returns:\n Parsed Pydantic model instance with the extracted data\n \"\"\"\n # Convert image to base64\n if isinstance(image, PILImage):\n buffered = BytesIO()\n image.save(buffered, format=\"JPEG\")\n image_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')\n else:\n with open(image, \"rb\") as f:\n image_base64 = base64.b64encode(f.read()).decode('utf-8')\n \n # Query with structured output\n completion = client.beta.chat.completions.parse(\n model=model,\n messages=[{\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": prompt},\n {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_base64}\"}}\n ]\n }],\n response_format=schema,\n temperature=0.3 # Lower temperature for more consistent extraction\n )\n \n # Return the parsed structured data\n return completion.choices[0].message.parsed\n\nWe also need to define a prompt that describes what information we want to extract from the card.\n\n# Example usage\nextraction_prompt = \"\"\"\nExtract the information from this British Library card into structured data (JSON format).\n\nRead each field on the card and extract the following information:\n- department: The division name (e.g., \"MANUSCRIPTS\")\n- shelfmark: The catalog number (e.g., \"SLOANE 3972.C. 
(VOL 1)\")\n- order: The SCH NO reference number\n- author: The author name, or null if blank\n- title: The full title of the work\n- place_and_date_of_publication: Publication info, or null if blank\n- reduction: The reduction number (as integer) at bottom of card\n\nReturn the exact text as shown on the card. For empty fields with diagonal lines or no text, use null.\n\"\"\"\nresult = query_image_structured(index_image, extraction_prompt, BritishLibraryReprographicCard)\nrprint(result)\n\nBritishLibraryReprographicCard(\n department='MANUSCRIPTS',\n shelfmark='SLOANE 3972.C. (VOL 1)',\n order='98876',\n author='HANS SLOANES',\n title='CATALOGUE OF SIR HANS SLOANES LIBRARY',\n place_and_date_of_publication=None,\n reduction=12\n)\n\n\n\n\nrprint(result)\n\nBritishLibraryReprographicCard(\n department='MANUSCRIPTS',\n shelfmark='SLOANE 3972.C. (VOL 1)',\n order='98876',\n author='HANS SLOANES',\n title='CATALOGUE OF SIR HANS SLOANES LIBRARY',\n place_and_date_of_publication=None,\n reduction=12\n)\n\n\n\n\nindex_image",
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>3</span> <span class='chapter-title'>Structured Information Extraction with Vision Language Models</span>"

 "href": "patterns/structured-generation/advisor-index-cards.html",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "",
+
"text": "4.1 Introduction\nThis chapter demonstrates a practical application of VLM-based structured extraction on a real-world GLAM digitization project: extracting structured metadata from historical index cards from the National Library of Scotland’s Advocate’s Library collection.\nUnlike the previous chapter which focused on explaining VLM concepts and setup, this chapter assumes you’re familiar with the basics and focuses on:",
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"

 "href": "patterns/structured-generation/advisor-index-cards.html#the-task-advisor-index-cards",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.2 The Task: Advisor Index Cards",
+
"text": "4.2 The Task: Advisor Index Cards\nThe National Library of Scotland has a collection of historical index cards documenting manuscripts and correspondence. Each card follows a fairly consistent format:\n\nSurname: Family name\nForenames: Given names\nEpithet: Role, title, or occupation\nMS no: Manuscript reference number\nDescription: Document type and date\nFolios: Page references\n\nThe goal is to extract this structured information to enable: - Searchable digital catalog - Integration with library management systems - Research access to historical collections\n\n4.2.1 Example Cards\nLet’s look at a few sample cards from the collection:\n\nfrom pathlib import Path\nimport matplotlib.pyplot as plt\n\nimages = list(Path(\"../../assets/vllm-structured-generation/indexes/\").rglob(\"*.JPG\"))\nimages\n\n[PosixPath('../../assets/vllm-structured-generation/indexes/DSC00172.JPG'),\n PosixPath('../../assets/vllm-structured-generation/indexes/DSC00173.JPG'),\n PosixPath('../../assets/vllm-structured-generation/indexes/DSC00171.JPG'),\n PosixPath('../../assets/vllm-structured-generation/indexes/DSC00170.JPG'),\n PosixPath('../../assets/vllm-structured-generation/indexes/DSC00169.JPG'),\n PosixPath('../../assets/vllm-structured-generation/indexes/DSC00168.JPG')]\n\n\n\n# display a grid of images using matplotlib (len of images)\nnumber_of_images = len(images)\ncols = 3\nrows = (number_of_images + cols - 1) // cols\nfig, axs = plt.subplots(rows, cols, figsize=(15, 5 * rows))\nfor i, img_path in enumerate(images):\n img = plt.imread(img_path)\n ax = axs[i // cols, i % cols] if rows > 1 else axs[i % cols]\n ax.imshow(img)\n ax.axis('off')\n ax.set_title(img_path.stem)\nplt.tight_layout()\nplt.show()",
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"

 "href": "patterns/structured-generation/advisor-index-cards.html#schema-design",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.3 Schema Design",
+
"text": "4.3 Schema Design\nWorking with the library curators, we designed a schema that matches their cataloging requirements. The schema is intentionally simple - complex schemas are harder for VLMs to extract reliably.\nThis schema is something we can iterate on later based on extraction quality but gives us a solid starting point.\n\nfrom pydantic import BaseModel, Field\nfrom typing import Optional\n\nclass IndexCardEntry(BaseModel):\n \"\"\"Schema for index card extraction matching curator specification\"\"\"\n \n surname: str = Field(..., description=\"Family name as written on card\")\n forenames: Optional[str] = Field(None, description=\"Given names\")\n epithet: Optional[str] = Field(None, description=\"Title, occupation, or role\")\n ms_no: str = Field(..., description=\"Manuscript number\")\n description: str = Field(..., description=\"Document description with date\")\n folios: str = Field(..., description=\"Folio reference\")\n \n failed_to_parse: bool = Field(\n False,\n description=\"Set to True if the card cannot be reliably extracted (illegible, damaged, etc.)\"\n )\n notes: Optional[str] = Field(\n None, \n description=\"Optional notes about the card: handwritten annotations, ambiguities, \"\n \"corrections, or reasons for failed parsing.\"\n )\n\n\nLet’s take a look at the schema definition we’ll use for extraction:\n\n# Display the schema\nfrom rich import print\nprint(IndexCardEntry.model_json_schema())\n\n{\n 'description': 'Schema for index card extraction matching curator specification',\n 'properties': {\n 'surname': {'description': 'Family name as written on card', 'title': 'Surname', 'type': 'string'},\n 'forenames': {\n 'anyOf': [{'type': 'string'}, {'type': 'null'}],\n 'default': None,\n 'description': 'Given names',\n 'title': 'Forenames'\n },\n 'epithet': {\n 'anyOf': [{'type': 'string'}, {'type': 'null'}],\n 'default': None,\n 'description': 'Title, occupation, or role',\n 'title': 'Epithet'\n },\n 'ms_no': {'description': 'Manuscript number', 'title': 'Ms No', 'type': 'string'},\n 'description': {\n 'description': 'Document description with date',\n 'title': 'Description',\n 'type': 'string'\n },\n 'folios': {'description': 'Folio reference', 'title': 'Folios', 'type': 'string'},\n 'failed_to_parse': {\n 'default': False,\n 'description': 'Set to True if the card cannot be reliably extracted (illegible, damaged, etc.)',\n 'title': 'Failed To Parse',\n 'type': 'boolean'\n },\n 'notes': {\n 'anyOf': [{'type': 'string'}, {'type': 'null'}],\n 'default': None,\n 'description': 'Optional notes about the card: handwritten annotations, ambiguities, corrections, or \nreasons for failed parsing.',\n 'title': 'Notes'\n }\n },\n 'required': ['surname', 'ms_no', 'description', 'folios'],\n 'title': 'IndexCardEntry',\n 'type': 'object'\n}",
 "crumbs": [
 "Structured Information Extraction",
 "<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"

 "href": "patterns/structured-generation/advisor-index-cards.html#setup",
 "title": "4 Practical Application: Advisor Index Card Extraction",
 "section": "4.4 Setup",
+
"text": "4.4 Setup\nWe’ll reuse the VLM setup from the previous chapter. If you haven’t already, make sure LM Studio is running with a VLM loaded.\n\nfrom openai import OpenAI\nimport base64\nfrom io import BytesIO\nfrom PIL import Image as PILImage\n\n\nclient = OpenAI(\n base_url=\"http://localhost:1234/v1\",\n api_key=\"lm-studio\"\n)\n\n\nclient.models.list() \n\nSyncPage[Model](data=[Model(id='qwen3-vl-2b-instruct-mlx', created=None, object='model', owned_by='organization_owner'), Model(id='qwen/qwen3-vl-8b', created=None, object='model', owned_by='organization_owner'), Model(id='qwen/qwen3-vl-4b', created=None, object='model', owned_by='organization_owner'), Model(id='text-embedding-nomic-embed-text-v1.5', created=None, object='model', owned_by='organization_owner'), Model(id='qwen3-vl-30b-a3b-instruct', created=None, object='model', owned_by='organization_owner'), Model(id='qwen3-vl-30b-a3b-thinking@4bit', created=None, object='model', owned_by='organization_owner'), Model(id='qwen3-vl-30b-a3b-thinking@3bit', created=None, object='model', owned_by='organization_owner'), Model(id='qwen/qwen3-4b-thinking-2507', created=None, object='model', owned_by='organization_owner'), Model(id='google/gemma-3-12b', created=None, object='model', owned_by='organization_owner'), Model(id='google/gemma-3-4b', created=None, object='model', owned_by='organization_owner'), Model(id='qwen2-0.5b-instruct-fingreylit', created=None, object='model', owned_by='organization_owner'), Model(id='google/gemma-3n-e4b', created=None, object='model', owned_by='organization_owner'), Model(id='granite-vision-3.3-2b', created=None, object='model', owned_by='organization_owner'), Model(id='ibm/granite-4-h-tiny', created=None, object='model', owned_by='organization_owner'), Model(id='iconclass-vlm', created=None, object='model', owned_by='organization_owner'), Model(id='mlx-community/qwen2.5-vl-3b-instruct', created=None, object='model', owned_by='organization_owner'), Model(id='lmstudio-community/qwen2.5-vl-3b-instruct', created=None, object='model', owned_by='organization_owner'), Model(id='lfm2-vl-1.6b', created=None, object='model', owned_by='organization_owner'), Model(id='mimo-vl-7b-rl-2508@q4_k_s', created=None, object='model', owned_by='organization_owner'), Model(id='mimo-vl-7b-rl-2508@q8_0', created=None, object='model', owned_by='organization_owner'), Model(id='qwen3-30b-a3b-instruct-2507', created=None, object='model', owned_by='organization_owner'), Model(id='qwen3-4b-instruct-2507-mlx', created=None, object='model', owned_by='organization_owner'), Model(id='openai/gpt-oss-20b', created=None, object='model', owned_by='organization_owner'), Model(id='qwen/qwen2.5-vl-7b', created=None, object='model', owned_by='organization_owner'), Model(id='mistralai/mistral-small-3.2', created=None, object='model', owned_by='organization_owner'), Model(id='qwen3-30b-a3b-instruct-2507-mlx', created=None, object='model', owned_by='organization_owner'), Model(id='liquid/lfm2-1.2b', created=None, object='model', owned_by='organization_owner'), Model(id='smollm3-3b-mlx', created=None, object='model', owned_by='organization_owner'), Model(id='unsloth/smollm3-3b', created=None, object='model', owned_by='organization_owner'), Model(id='ggml-org/smollm3-3b', created=None, object='model', owned_by='organization_owner'), Model(id='mlx-community/smollm3-3b', created=None, object='model', owned_by='organization_owner')], object='list')\n\n\n\nfrom typing import Union\ndef query_image_structured(image: Union[PILImage.Image, str], prompt: 
str, schema: BaseModel, model='qwen/qwen3-vl-4b'):\n \"\"\"\n Query VLM with an image and get structured output based on a Pydantic schema.\n \n Args:\n image: PIL Image or file path to the image\n prompt: Text prompt describing what to extract\n schema: Pydantic model class defining the expected output structure\n model: Model ID to use for the query\n \n Returns:\n Parsed Pydantic model instance with the extracted data\n \"\"\"\n # Convert image to base64\n if isinstance(image, PILImage.Image):\n buffered = BytesIO()\n image.save(buffered, format=\"JPEG\")\n image_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')\n else:\n with open(image, \"rb\") as f:\n image_base64 = base64.b64encode(f.read()).decode('utf-8')\n \n # Query with structured output\n completion = client.beta.chat.completions.parse(\n model=model,\n messages=[{\n \"role\": \"user\",\n \"content\": [\n {\"type\": \"text\", \"text\": prompt},\n {\"type\": \"image_url\", \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_base64}\"}}\n ]\n }],\n response_format=schema,\n temperature=0.3 # Lower temperature for more consistent extraction\n )\n \n # Return the parsed structured data\n return completion.choices[0].message.parsed",
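Before batching over a whole collection, it can help to smoke-test query_image_structured on a single image. The snippet below is an illustrative sketch, not part of the original notebook: the TestCaption schema and the sample.jpg path are hypothetical stand-ins, and it assumes the LM Studio server and one of the model IDs listed above are available. Note also that the helper saves PIL images as JPEG, so scans with an alpha channel may need an explicit convert("RGB") first.

from pydantic import BaseModel

class TestCaption(BaseModel):
    caption: str
    legible: bool

# Hypothetical test image path; convert to RGB so the JPEG save cannot fail on RGBA scans
card = PILImage.open("sample.jpg").convert("RGB")
result = query_image_structured(
    card,
    "Describe this card in one sentence and say whether the text is legible.",
    TestCaption,
)
print(result)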
|
| 192 |
"crumbs": [
|
| 193 |
"Structured Information Extraction",
|
| 194 |
"<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
|
|
|
|
| 199 |
"href": "patterns/structured-generation/advisor-index-cards.html#extraction-examples",
|
| 200 |
"title": "4 Practical Application: Advisor Index Card Extraction",
|
| 201 |
"section": "4.5 Extraction Examples",
|
| 202 |
+
"text": "4.5 Extraction Examples\nLet’s run extraction on several sample cards to see how the model performs.\n\nprompt = \"\"\"Extract structured information from this historical library index card and return it as JSON.\n\n This is an index card from the National Library of Scotland's Advocate's Library collection. Each card documents a person and associated manuscript references.\n\n Return a JSON object with these exact fields:\n\n {\n \"surname\": \"Family name exactly as typed (e.g., 'ABAD', 'ABARACA Y BOLEA')\",\n \"forenames\": \"Given names (e.g., 'Joseph', 'Thomas') or null if not present\",\n \"epithet\": \"Title, occupation, or role (e.g., 'Captain, Spanish Army') or null if not present\",\n \"ms_no\": \"Manuscript number exactly as written (e.g., '5538', '5529')\",\n \"description\": \"Document description with date (e.g., 'letter of (1783)', 'copy of petition of (ca. 1783)')\",\n \"folios\": \"Folio reference exactly as written (e.g., 'f.11', 'f.169')\",\n \"failed_to_parse\": false (or true if card is illegible/severely damaged),\n \"notes\": \"Optional notes about handwritten corrections, ambiguities, or parsing issues\"\n }\n\n Guidelines:\n - Extract text exactly as it appears - do not correct spelling or expand abbreviations\n - Preserve original punctuation and formatting\n - If a field is unclear but you can make a reasonable inference, extract it and note the ambiguity in \"notes\"\n - Only set \"failed_to_parse\" to true if you genuinely cannot extract the required fields\n - Use null for optional fields (forenames, epithet, notes) if they are not present or marked with a line\"\"\"\n\n\nimage = PILImage.open(images[0])\nimage \n\n\n\n\n\n\n\n\n\nfrom rich import print\nresult = query_image_structured(image, prompt, IndexCardEntry, model='qwen/qwen3-vl-4b')\n \n\n\nprint(result)\n\nIndexCardEntry(\n surname='ABBAATE',\n forenames='Itala',\n epithet='Daughter of the Physician',\n ms_no='2633',\n description='letter of (1878)',\n folios='f. 38',\n failed_to_parse=False,\n notes=\"Handwritten corrections and annotations present: 'Cairo' (instead of 'ABBAATE'), 'Cairo' (instead of \n'ABBAATE'), 'Physician' (instead of 'Physician'), '2633' (instead of '2633'), 'f. 38' (instead of 'f. 38'). Also, \n'Cairo' appears to be a scribbled correction or miswriting of 'ABBAATE'.\"\n)\n\n\n\n\n4.5.1 Comparing Extraction to Ground Truth\nLet’s compare a few extractions to the actual card content:\n\nfrom tqdm.auto import tqdm\n\nresults = []\nfor img_path in tqdm(images):\n image = PILImage.open(img_path)\n result = query_image_structured(image, prompt, IndexCardEntry, model='qwen/qwen3-vl-4b')\n results.append((img_path.stem, result))",
|
| 203 |
"crumbs": [
|
| 204 |
"Structured Information Extraction",
|
| 205 |
"<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
|
|
|
|
| 210 |
"href": "patterns/structured-generation/advisor-index-cards.html#evaluation-strategies",
|
| 211 |
"title": "4 Practical Application: Advisor Index Card Extraction",
|
| 212 |
"section": "4.6 Evaluation Strategies",
|
| 213 |
+
"text": "4.6 Evaluation Strategies\nHow do we know if the extraction is working well? There are several approaches to evaluation, each with different tradeoffs.\n\n4.6.1 Looking at lots of samples\nIt sounds simple, but looking at a large number of random samples can give a good sense of overall quality. You can spot common errors and get a feel for how reliable the extraction is. You can quickly build intuition about what might be going wrong and where to focus improvement efforts. Realistically you will usually spend some time iterating on the prompt and schema at this stage. Looking at more than one example is important to avoid overfitting to a single case but you don’t immediately need to look at hundreds of examples or set up complex metrics or evaluations. This can come later.\n\nfor i, (img_stem, result) in enumerate(results):\n fig, (ax_img, ax_text) = plt.subplots(1, 2, figsize=(16, 6), \n gridspec_kw={'width_ratios': [1, 1]})\n\n # Left: Display image\n img = plt.imread(images[i])\n ax_img.imshow(img)\n ax_img.axis('off')\n ax_img.set_title(f\"Card {i+1}: {img_stem}\", fontsize=14, fontweight='bold')\n\n # Right: Display extracted data as formatted text\n ax_text.axis('off')\n\n # Format the extracted data nicely\n text_lines = [\n \"Extracted Data:\",\n \"\",\n f\"Surname: {result.surname}\",\n f\"Forenames: {result.forenames or 'N/A'}\",\n f\"Epithet: {result.epithet or 'N/A'}\",\n f\"MS No: {result.ms_no}\",\n f\"Description: {result.description}\",\n f\"Folios: {result.folios}\",\n \"\",\n f\"Failed to Parse: {result.failed_to_parse}\",\n ]\n\n # Add notes if present\n if result.notes:\n text_lines.extend((\"\", \"Notes:\"))\n # Wrap long notes\n import textwrap\n wrapped_notes = textwrap.fill(result.notes, width=60)\n text_lines.append(wrapped_notes)\n\n # Join and display\n formatted_text = \"\\n\".join(text_lines)\n ax_text.text(0.05, 0.95, formatted_text, \n transform=ax_text.transAxes,\n fontsize=11,\n verticalalignment='top',\n fontfamily='monospace',\n bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))\n\n plt.tight_layout()\n plt.show()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n4.6.1.1 What we learned from these samples\n\nIt seems that in these examples the notes field isn’t really adding much value and potentially it just adds noise.\nWhile the failed_to_parse flag sounds useful, we may want to rely on other approaches to identify failures since the model may not always set this flag correctly (and in this case we probably have some other ways to identify failures like looking for missing critical fields).\nOverall, we should prioritize extracting the most relevant information and avoid including fields that do not contribute to the understanding of the index card content. The simpler the schema the less for us to have to check and the fewer tokens the model has to generate. 
When we’re testing with small batches it doesn’t seem so important but when scaling to thousands of cards it can make a bigger difference.\n\n\nfrom pydantic import BaseModel, Field\nfrom typing import Optional\n\nclass IndexCardEntry(BaseModel):\n \"\"\"Schema for index card extraction matching curator specification\"\"\"\n \n surname: str = Field(..., description=\"Family name as written on card\")\n forenames: Optional[str] = Field(None, description=\"Given names\")\n epithet: Optional[str] = Field(None, description=\"Title, occupation, or role\")\n ms_no: str = Field(..., description=\"Manuscript number\")\n description: str = Field(..., description=\"Document description with date\")\n folios: str = Field(..., description=\"Folio reference\")\n \n\nprompt = \"\"\"Extract structured information from this historical library index card and return it as JSON.\n\n This is an index card from the National Library of Scotland's Advocate's Library collection. Each card documents a person and associated manuscript references.\n\n Return a JSON object with these exact fields:\n\n {\n \"surname\": \"Family name exactly as typed (e.g., 'ABAD', 'ABARACA Y BOLEA')\",\n \"forenames\": \"Given names (e.g., 'Joseph', 'Thomas') or null if not present\",\n \"epithet\": \"Title, occupation, or role (e.g., 'Captain, Spanish Army') or null if not present\",\n \"ms_no\": \"Manuscript number exactly as written (e.g., '5538', '5529')\",\n \"description\": \"Document description with date (e.g., 'letter of (1783)', 'copy of petition of (ca. 1783)')\",\n \"folios\": \"Folio reference exactly as written (e.g., 'f.11', 'f.169')\",\n }\n\n Guidelines:\n - Extract text exactly as it appears - do not correct spelling or expand abbreviations\n - Preserve original punctuation and formatting\n - Use null for optional fields (forenames, epithet, notes) if they are not present or marked with a line\"\"\"\n\n\nresults = []\nfor img_path in tqdm(images):\n image = PILImage.open(img_path)\n result = query_image_structured(image, prompt, IndexCardEntry, model='qwen/qwen3-vl-8b')\n results.append((img_path.stem, result))\n\n\n\n\n\n# Display images with extracted data side-by-side\n# Two columns: left = image, right = extracted text\n\nfor i, (img_stem, result) in enumerate(results):\n fig, (ax_img, ax_text) = plt.subplots(1, 2, figsize=(16, 6), \n gridspec_kw={'width_ratios': [1, 1]})\n \n # Left: Display image\n img = plt.imread(images[i])\n ax_img.imshow(img)\n ax_img.axis('off')\n ax_img.set_title(f\"Card {i+1}: {img_stem}\", fontsize=14, fontweight='bold')\n \n # Right: Display extracted data as formatted text\n ax_text.axis('off')\n \n # Format the extracted data nicely\n text_lines = [\n \"Extracted Data:\",\n \"\",\n f\"Surname: {result.surname}\",\n f\"Forenames: {result.forenames or 'N/A'}\",\n f\"Epithet: {result.epithet or 'N/A'}\",\n f\"MS No: {result.ms_no}\",\n f\"Description: {result.description}\",\n f\"Folios: {result.folios}\",\n \"\",\n ]\n # Join and display\n formatted_text = \"\\n\".join(text_lines)\n ax_text.text(0.05, 0.95, formatted_text, \n transform=ax_text.transAxes,\n fontsize=11,\n verticalalignment='top',\n fontfamily='monospace',\n bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))\n \n plt.tight_layout()\n plt.show()\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n4.6.2 1. 
Manual Ground Truth Evaluation\nThe Gold Standard: Manually annotate a sample of cards and compare.\nPros: - Most accurate measure of performance - Catches all types of errors - Builds training data for future improvements\nCons: - Time consuming - Requires expert annotators - Limited sample size\nBest for: Final validation, establishing baselines, understanding failure modes\n\n# TODO: Load manually annotated ground truth\n# Compare predictions to ground truth\n# Calculate field-level accuracy\n\n# Example metrics:\n# - Exact match accuracy per field\n# - Character error rate\n# - Common error patterns\n\n\n\n4.6.3 2. Cross-Model Evaluation (Model-as-Judge)\nThe Pragmatic Approach: Use a stronger/different model to evaluate outputs.\nPros: - Much faster than manual annotation - Can evaluate full dataset - Good for catching obvious errors\nCons: - Requires access to multiple models - May miss subtle errors - Judge model can be wrong too\nBest for: Large-scale quality monitoring, automated testing, identifying problem areas for manual review\n\n# TODO: Implement model-as-judge evaluation\n# - Extract with Model A (e.g., local Qwen)\n# - Show image + extraction to Model B (e.g., Claude/GPT-4)\n# - Ask Model B to rate accuracy and identify errors\n# - Aggregate results\n\n# Example judge prompt:\n# \"\"\"\n# Compare this extracted data to the index card image:\n# [extraction]\n# \n# For each field, rate accuracy:\n# - Correct: Field matches card exactly\n# - Minor error: Small typo or formatting difference\n# - Major error: Wrong information\n# - Missing: Field is on card but not extracted\n# \"\"\"\n\n\n\n4.6.4 3. Internal Consistency Checks\nThe Automated Approach: Use business rules and patterns to identify suspicious outputs.\nExamples: - Manuscript numbers should follow known patterns - Dates should be within expected ranges - Folio references have consistent formats - Certain fields should always be present\nPros: - Completely automated - Fast - can run on full dataset - No additional model costs\nCons: - Only catches specific error types - Requires domain knowledge to design rules - Can miss errors that follow valid patterns\nBest for: Flagging outliers for review, automated quality gates, monitoring production systems\n\n# TODO: Implement consistency checks\n\n# def validate_extraction(entry: IndexCardEntry) -> list[str]:\n# \"\"\"Run validation checks and return list of warnings.\"\"\"\n# warnings = []\n# \n# # Check MS number format\n# if not re.match(r'^\\d+', entry.ms_no):\n# warnings.append(f\"Unusual MS number format: {entry.ms_no}\")\n# \n# # Check for dates in expected range\n# dates = re.findall(r'\\d{4}', entry.description)\n# for date in dates:\n# if not (1500 <= int(date) <= 1950):\n# warnings.append(f\"Date outside expected range: {date}\")\n# \n# # Check folio format\n# if not re.match(r'^f+\\.?\\s*\\d+', entry.folios, re.IGNORECASE):\n# warnings.append(f\"Unusual folio format: {entry.folios}\")\n# \n# return warnings\n\n\n\n4.6.5 4. Confidence Scoring\nMany VLM APIs return confidence scores or logprobs. 
We can use these to identify uncertain extractions.\nPros: - No additional cost or models needed - Can prioritize review efforts - Helps establish quality thresholds\nCons: - Not all models/APIs provide confidence scores - High confidence doesn’t guarantee correctness - Requires calibration\nBest for: Prioritizing manual review, quality-based routing, understanding model uncertainty\n\n# TODO: If available, extract and analyze confidence scores\n# Plot distribution of confidence scores\n# Correlate confidence with manual evaluation results\n\n\n\n4.6.6 Combining Evaluation Approaches\nIn practice, a robust evaluation strategy uses multiple approaches:\n\nStart with manual ground truth on a small sample (~50-100 cards) to establish baseline accuracy\nUse consistency checks to automatically flag suspicious outputs\nApply model-as-judge on a larger sample to monitor quality\nPrioritize review using confidence scores or validation warnings\nContinuous monitoring as you process the full collection\n\nThis gives you both rigorous accuracy metrics and practical quality assurance at scale.",
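The manual ground-truth TODO above could take roughly the following shape. This is a sketch under stated assumptions: ground_truth.json and its structure are hypothetical, the field list mirrors the simplified IndexCardEntry schema, it relies on pydantic v2 (model_dump), and the light normalisation is only one of several reasonable choices.

import json

FIELDS = ["surname", "forenames", "epithet", "ms_no", "description", "folios"]

with open("ground_truth.json", encoding="utf-8") as f:
    ground_truth = json.load(f)  # e.g. {"card-001": {"surname": "ABAD", ...}, ...}

def norm(value):
    # Treat None and empty strings alike; ignore case and surrounding whitespace
    return (value or "").strip().lower()

field_correct = {field: 0 for field in FIELDS}
n_compared = 0
for card_id, entry in results:
    truth = ground_truth.get(card_id)
    if truth is None:
        continue
    n_compared += 1
    predicted = entry.model_dump()
    for field in FIELDS:
        if norm(predicted.get(field)) == norm(truth.get(field)):
            field_correct[field] += 1

if n_compared:
    for field in FIELDS:
        print(f"{field:12} {field_correct[field] / n_compared:.0%} exact match over {n_compared} cards")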
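Similarly, the model-as-judge TODO could reuse query_image_structured with a second, ideally stronger, model. Everything named below (FieldJudgement, ExtractionReview, judge_extraction, the judge model ID) is hypothetical; it assumes pydantic v2, and it is worth checking that your local model handles a nested schema like this reliably before trusting the verdicts.

from typing import Literal
from pydantic import BaseModel

class FieldJudgement(BaseModel):
    field: str
    verdict: Literal["correct", "minor_error", "major_error", "missing"]

class ExtractionReview(BaseModel):
    judgements: list[FieldJudgement]
    overall_ok: bool

def judge_extraction(img_path, entry, judge_model="qwen/qwen3-vl-8b"):
    # Show the judge the original card image alongside the first model's output
    judge_prompt = (
        "Compare this extracted data to the index card image.\n\n"
        f"Extraction:\n{entry.model_dump_json(indent=2)}\n\n"
        "For each field, say whether it is correct, a minor error, "
        "a major error, or missing from the extraction."
    )
    image = PILImage.open(img_path)
    return query_image_structured(image, judge_prompt, ExtractionReview, model=judge_model)

# e.g. review = judge_extraction(images[0], results[0][1]); print(review)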
|
| 214 |
"crumbs": [
|
| 215 |
"Structured Information Extraction",
|
| 216 |
"<span class='chapter-number'>4</span> <span class='chapter-title'>Practical Application: Advisor Index Card Extraction</span>"
|