---
name: exam-sheet
description: >
  Extract structured data from photos of exam sheets and return it via the
  submit_exam tool call. The pipeline renders the data into LaTeX
  deterministically; you never write LaTeX yourself.
allowed-tools: Read, Write, Edit, Bash, Glob, MultiTool
---

# Data Extraction Skill

You receive photos of exam sheets (typically Moroccan lycée math exams, but
possibly any structured document with questions and point values).

Your job: **extract the content** into a structured JSON object and pass it
to the `submit_exam` tool. The pipeline handles all LaTeX formatting.

## What to extract

### header

| Field | What goes there | Example |
|---|---|---|
| `top_left` | Page number or identifier | `"1/3"`, `"Page 4"` |
| `school` | School, ministry, or institution. Use `\n` for line breaks. **NO Arabic characters**: translate to French. | `"Ministère de l'éducation Nationale\nGSCP CASA"` |
| `exam_info` | Exam type and date. Use `\n` for line breaks. | `"Bac-blanc --Maths\n18-05-2018"` |
| `right` | Duration, code, symbol, etc. | `"Durée :3h"` |

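The "no Arabic characters" rule is easy to check mechanically. The sketch below is a hypothetical post-check, not part of the documented pipeline. The names `contains_arabic` and `arabic_fields` are illustrative. The regex covers the main Arabic Unicode blocks: Arabic, Arabic Supplement, Arabic Extended-A, and the two Presentation Forms blocks.

```python
import re

# Hypothetical helper, not part of the documented pipeline.
# Main Arabic Unicode blocks: Arabic, Arabic Supplement, Arabic Extended-A,
# Arabic Presentation Forms-A and Arabic Presentation Forms-B.
ARABIC_RE = re.compile(r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]")


def contains_arabic(text: str) -> bool:
    """Return True if `text` contains at least one Arabic-script character."""
    return ARABIC_RE.search(text) is not None


def arabic_fields(header: dict) -> list:
    """Return the names of header fields that still contain Arabic script."""
    return [name for name, value in header.items() if contains_arabic(value)]


header = {
    "top_left": "1/3",
    "school": "Ministère de l'éducation Nationale\nGSCP CASA",  # \n = line break
    "exam_info": "Bac-blanc --Maths\n18-05-2018",
    "right": "Durée :3h",
}

print(arabic_fields(header))                      # []
print(arabic_fields({"top_left": "الصفحة 1/3"}))  # ['top_left']
```

Accented French letters such as `é` sit outside these ranges, so translated text passes the check.
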

### intro_page (OPTIONAL: only when the FIRST photo is a cover page)

A cover page is a standalone first page that contains **no exam questions**:
just the exam title, a matière/niveau/durée/coefficient table, a bulleted
list describing each exercise, and usually a notice about calculator use.

**Fill `intro_page` only when all of the following hold:**
- It is the **first** photo (`image_index == 0`).
- The photo contains **no numbered questions** (`1)`, `2)`, `a)`, ...).
- The photo clearly contains only introductory / administrative information.

Leave `intro_page` unset (null or omitted) for any exam whose first photo
already starts with questions.

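The pipeline might have the text of the first photo available, for example from OCR. That is an assumption, since the skill itself works from the image. In that case, the first two conditions can be double-checked with a small heuristic. `may_fill_intro_page` and its regex are illustrative only. The third condition (purely administrative content) still needs a visual judgment.

```python
import re

# Illustrative heuristic: a question marker such as "1)", "2 )" or "a)"
# at the start of a line. Not part of the documented pipeline.
QUESTION_MARKER = re.compile(r"^\s*(?:\d+|[a-z])\s*\)", re.MULTILINE)


def may_fill_intro_page(image_index: int, page_text: str) -> bool:
    """Check the first two cover-page conditions: photo 0 and no numbered questions.

    The third condition (administrative content only) is not checked here.
    """
    if image_index != 0:
        return False
    return QUESTION_MARKER.search(page_text) is None


cover = (
    "DEVOIR SURVEILLÉ 3\n"
    "Matière : Mathématiques\n"
    "- Le problème se rapporte à l'analyse (10,75pts)\n"
    "L'usage de la calculatrice est autorisé"
)
questions = "Exercice1\n1) a) Vérifier que $f(0)=1$."

print(may_fill_intro_page(0, cover))      # True
print(may_fill_intro_page(0, questions))  # False
print(may_fill_intro_page(1, cover))      # False
```

A regex cannot see layout, so treat a `False` as "probably not a cover page" rather than a verdict.
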

| Field | Type | Description |
|---|---|---|
| `title` | string | Main heading (e.g. `"DEVOIR SURVEILLÉ 3"`). `""` if none. |
| `subtitle` | string | Optional secondary heading. `""` if none. |
| `info_rows` | array | Key/value pairs from the administrative table. Each item: `{"label": "Matière", "value": "Mathématiques"}`. Empty array if none. |
| `bullets` | array | One item per bullet on the cover page. Each item: `{"description": "Le problème se rapporte à l'analyse", "bareme": "10,75pts"}`. The renderer draws the dotted leader and the parentheses, so do NOT include them in `description`. Set `bareme` to `""` if the bullet has no point value. Keep any inline `$math$` in `description`. |
| `footer` | string | Closing notice (e.g. `"L'usage de la calculatrice est autorisé"`). `""` if none. |

**Note on `title`/`subtitle`**: the renderer shares the main header bar (`school`, `exam_info`, …) with the cover page, so the cover does NOT repeat the exam type. Leave `title` and `subtitle` empty unless the cover shows genuinely different text that would otherwise be lost.

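For illustration, here is an `intro_page` value built from the examples in the table. It comes with `bullet_problems`, a hypothetical check that is not part of the documented pipeline. The check flags bullets whose `description` still contains a dotted leader or a parenthesised barème, both of which the renderer adds itself.

```python
import re

intro_page = {
    "title": "",      # empty: the shared header bar already shows the exam type
    "subtitle": "",
    "info_rows": [{"label": "Matière", "value": "Mathématiques"}],
    "bullets": [
        {"description": "Le problème se rapporte à l'analyse", "bareme": "10,75pts"},
    ],
    "footer": "L'usage de la calculatrice est autorisé",
}

# A run of three or more dots, or a Unicode ellipsis, suggests a copied leader.
LEADER_RE = re.compile(r"\.{3,}|…")


def bullet_problems(bullets: list) -> list:
    """Flag dotted leaders or a '(barème)' copied into a bullet description."""
    problems = []
    for i, bullet in enumerate(bullets):
        text = bullet["description"]
        if LEADER_RE.search(text):
            problems.append(f"bullet {i}: dotted leader in description")
        if bullet["bareme"] and f"({bullet['bareme']})" in text:
            problems.append(f"bullet {i}: barème repeated in description")
    return problems


bad = [{"description": "Le problème se rapporte à l'analyse ...... (10,75pts)", "bareme": "10,75pts"}]
print(bullet_problems(intro_page["bullets"]))  # []
print(len(bullet_problems(bad)))               # 2
```

The leader regex is deliberately loose. A description that legitimately contains "..." will also be flagged and needs a manual look.
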

### sections (array)

Each exercise, partie, or problème is one section, **in document order**.

| Field | Type | Description |
|---|---|---|
| `title` | string | Exactly as printed: `"Exercice1"`, `"Partie II :"`, `"Problème"` |
| `bareme` | string | **Per-exercise** barème, e.g. `"3points"`, `"11points"`. Leave `""` if the barème is given per question instead. |
| `intro` | string | Optional setup paragraph between the title and the first question. Raw text with inline `$math$`. `""` if none. |
| `rows` | array | One entry per question, sub-question, sub-part header, or figure, in document order. |

### rows (array inside each section)

| Field | Type | Required | Description |
|---|---|---|---|
| `bareme` | string | no | Per-question barème: `"0,5"`, `"0,75"`, `"0,25"`. Empty or omitted if this row has no barème of its own. |
| `content` | string | yes | The question text, including inline `$math$`, the question numbering (`1) a)`) and any sub-part header (`\textbf{I-} Soit $g$...`). The renderer wraps this in `\textit{}` automatically, so do NOT add `\textit{}` yourself. |
| `figure_id` | string | no | If this row is a **figure placeholder** (a blank rectangle for hand-drawing), set this to the figure's id (e.g. `"fig1"`). `content` is then ignored. |
| `figure_width_cm` | number | no | Width of the blank box in cm. Required if `figure_id` is set. Typically 5 to 8 cm. |

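Putting the two tables together, a section with per-question barèmes and one figure placeholder might look like the dict below. The question text is invented for illustration. `section_problems` is a hypothetical consistency check, not a documented pipeline function. It flags three things:
- a barème set at both the section and row level;
- a `figure_id` without `figure_width_cm`;
- a `\textit{}` added by hand.

```python
def section_problems(section: dict) -> list:
    """Hypothetical check of one section against the field tables."""
    problems = []
    rows = section["rows"]
    # Barème is either per-exercise (section) or per-question (rows), never both.
    if section["bareme"] and any(row.get("bareme") for row in rows):
        problems.append("barème set on both the section and its rows")
    for i, row in enumerate(rows):
        if row.get("figure_id") and not row.get("figure_width_cm"):
            problems.append(f"row {i}: figure_id without figure_width_cm")
        # The renderer adds \textit{} itself.
        if "\\textit{" in row.get("content", ""):
            problems.append(f"row {i}: content already wrapped in \\textit")
    return problems


section = {
    "title": "Exercice1",
    "bareme": "",  # per-question barème, so the section-level value stays empty
    "intro": "Soit $f$ la fonction définie sur $\\mathbb{R}$ par $f(x)=e^{x}-x$.",
    "rows": [
        {"bareme": "0,5", "content": "1) a) Vérifier que $f(0)=1$."},
        {"bareme": "0,75", "content": "b) Montrer que $f(x)\\geq 1$ pour tout $x$."},
        {"content": "", "figure_id": "fig1", "figure_width_cm": 6},
    ],
}

print(section_problems(section))  # []
```

The doubled backslashes (`\\mathbb`, `\\geq`) are Python string escaping; the stored values contain single backslashes.
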

### figures (array)

One entry per figure in the source photos:

| Field | Type | Description |
|---|---|---|
| `id` | string | Short unique name matching the `figure_id` of a row (e.g. `"fig1"`) |
| `image_index` | integer | 0-based index into the uploaded photos |
| `bbox` | [x1,y1,x2,y2] | Bounding box in the source photo, as fractions of the photo's dimensions. Used to size the blank rectangle, so be precise about the width/height ratio. |

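Two things follow from the figure tables:
- Every `figure_id` used in a row needs a matching `figures` entry.
- The bbox decides the shape of the blank box.

The sketch assumes bbox coordinates are fractions of the photo's width (x) and height (y). Under that assumption, the box's true aspect ratio also depends on the photo's pixel size. `figure_problems` and `blank_box_height_cm` are hypothetical helpers, not the pipeline's actual code.

```python
def figure_problems(sections: list, figures: list, n_photos: int) -> list:
    """Cross-check row figure_ids against figures[] and sanity-check each bbox."""
    used = {row["figure_id"] for s in sections for row in s["rows"] if row.get("figure_id")}
    declared = {fig["id"] for fig in figures}
    problems = [f"row uses undeclared figure {fid}" for fid in sorted(used - declared)]
    problems += [f"figure {fid} is never placed in a row" for fid in sorted(declared - used)]
    for fig in figures:
        x1, y1, x2, y2 = fig["bbox"]
        if not (0 <= x1 < x2 <= 1 and 0 <= y1 < y2 <= 1):
            problems.append(f"figure {fig['id']}: bbox is not fractional and ordered")
        if not 0 <= fig["image_index"] < n_photos:
            problems.append(f"figure {fig['id']}: image_index out of range")
    return problems


def blank_box_height_cm(bbox: list, photo_w_px: int, photo_h_px: int, width_cm: float) -> float:
    """Height that keeps the figure's real aspect ratio at the requested width."""
    x1, y1, x2, y2 = bbox
    box_w_px = (x2 - x1) * photo_w_px
    box_h_px = (y2 - y1) * photo_h_px
    return width_cm * box_h_px / box_w_px


sections = [{"rows": [{"content": "", "figure_id": "fig1", "figure_width_cm": 6}]}]
figures = [{"id": "fig1", "image_index": 1, "bbox": [0.1, 0.2, 0.5, 0.4]}]

print(figure_problems(sections, figures, n_photos=2))  # []
print(blank_box_height_cm([0.1, 0.2, 0.5, 0.4], 1000, 2000, 6))
```

Here the bbox spans 0.4 of a 1000 px width and 0.2 of a 2000 px height. That is a square, so a 6 cm wide box comes out about 6 cm tall. Reading the fractions as a width/height ratio directly (0.4 / 0.2) would have produced a box half as tall.
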

## Critical rules

1. **NO LaTeX document structure.** Don't write `\documentclass`, `\begin{document}`, longtable code, or any preamble. Return structured JSON via the tool call.
2. **Faithfulness.** Transcribe every question, every sub-question, and every barème value exactly as printed. Don't skip, summarize, or rephrase.
3. **Arabic → French.** Translate all Arabic text into its French equivalent. Common translations:

   | Arabic | French |
   |---|---|
   | الامتحان الوطني الموحد للبكالوريا | Examen National Unifié du Baccalauréat |
   | الدورة العادية 2025 | Session Normale 2025 |
   | مادة الرياضيات | Mathématiques |
   | مسلك العلوم الفيزيائية | Filière Sciences Physiques |
   | (خيار فرنسية) | (Option Français) |
   | الصفحة / صفحة | Page |
   | الموضوع | Le Sujet |

4. **Math notation.** Use `$...$` for inline math. Keep the original notation: `$\vec{u}(-2;2;0)$` with semicolons (Moroccan convention), `$\lim\limits_{x\to 0^+}$`, etc. In the raw JSON of the tool call, every LaTeX backslash must be doubled (`\\vec`, `\\lim`). A single backslash is read as a JSON escape (`\t`, `\n`, `\f`, …) or rejected as invalid.
5. **Question numbering in content.** Include the numbering in the content string: `"1) a) Vérifier que..."`, `"b) Montrer que..."`. For sub-part headers: `"\\textbf{I-} Soit $g$ la fonction..."`.
6. **Per-exercise vs per-question barème.** Check the original:
   - If ONE value appears next to the exercise title (e.g. "3points" in the left margin at the title level) → per-exercise. Set `section.bareme = "3points"` and leave every `row.bareme` empty.
   - If EACH question has its own decimal value (e.g. "0,5" next to 1)a), "0,75" next to 2)a)) → per-question. Set `section.bareme = ""` and set `row.bareme` on each question row.
7. **Figures.** If there's a graph, curve, geometric figure, or any other visual:
   - Add a row with `figure_id` and `figure_width_cm` at the right position in the flow.
   - Add a matching entry in `figures[]` with the bbox used for sizing.
   - The pipeline generates a **blank white rectangle** with a border. The end user draws the figure by hand on the printout.
8. **Figures = EXISTING visual content ONLY.** A figure is something already DRAWN on the page (a graph, a geometric diagram, a plotted curve). An instruction such as "Tracer la courbe (C)", "Construire le triangle", "Dresser le tableau de variation", or "Recopier la courbe" is NOT a figure: it is a QUESTION asking the student to draw something. Do NOT add a `figure_id` row or a `figures` entry for drawing instructions. Only add figures for content that is already drawn or printed on the source page.
9. **Don't hallucinate.** If something is illegible, write `[illisible]` in its place in the `content` field.

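Rules 4 and 5 interact with JSON string escaping. The sketch assumes the top-level keys are `header`, `sections` and `figures` (plus the optional `intro_page`), taken from the headings above; the real `submit_exam` schema may differ. It shows that serializing with a JSON library doubles every backslash for you. Hand-written JSON with single backslashes, by contrast, is silently decoded into control characters or rejected outright.

```python
import json

# Hypothetical payload for illustration; key names follow the section headings.
payload = {
    "header": {
        "top_left": "1/3",
        "school": "GSCP CASA",
        "exam_info": "Bac-blanc --Maths",
        "right": "Durée :3h",
    },
    "sections": [
        {
            "title": "Exercice1",
            "bareme": "",  # per-question barème (rule 6)
            "intro": "",
            "rows": [
                {"bareme": "", "content": "\\textbf{I-} Soit $g$ la fonction définie sur $]0;+\\infty[$."},
                {"bareme": "0,5", "content": "1) a) Calculer $\\lim\\limits_{x\\to 0^+} g(x)$."},
            ],
        }
    ],
    "figures": [],
}

# A JSON library doubles each backslash, so the LaTeX survives the round trip.
raw = json.dumps(payload, ensure_ascii=False)
print("\\\\textbf" in raw)         # True: the raw JSON contains \\textbf
print(json.loads(raw) == payload)  # True

# Hand-written JSON with single backslashes silently changes the text:
# "\t" becomes a TAB (likewise \n, \f, \b, \r at the start of \nabla, \frac, \beta, \right).
print(json.loads(r'"\textbf{I-}"').startswith("\t"))  # True

# ...or is rejected when the escape is not valid JSON (\v, \l, ...).
try:
    json.loads(r'"$\vec{u}(-2;2;0)$"')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```

The safest habit is to build the payload as data and let the serializer escape it, rather than typing raw JSON strings.
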