# Notebooks

This directory holds Jupyter notebooks. Each notebook has a specific role in
the project lifecycle, and the rules are different for each one.

---

## `01_ieee_inceptionv3_transformer.ipynb`: **FROZEN**

This notebook is the **canonical research artefact** behind the IEEE
publication [*AI Narratives: Bridging Visual Content and Linguistic
Expression*](https://ieeexplore.ieee.org/document/10675203). It contains the
exact training pipeline, hyperparameters, and inference code used to produce
the BLEU ~24 score reported in the paper.

### Why is it frozen?

Reproducibility of a published result is non-negotiable. If the notebook drifts
from what the paper describes, anyone trying to reproduce the result (reviewers,
future students, recruiters running the demo) will see numbers that don't match
the paper. That breaks scientific trust.

### Rules

1. **Do not edit cells.** No improvements, no refactors, no comment fixes.
2. **Do not re-run cells with different seeds.** The notebook's outputs are
   the reference outputs. `nbstripout` strips them on commit, but the cell
   structure must stay identical.
3. **Improvements go into the modular package** at [`src/captioning/`](../src/captioning/),
   never back into this notebook.
4. **Parity is enforced in CI.** The `make freeze-paper-notebook` target
   computes a SHA-256 of this file and asserts it matches the locked hash in
   `.paper-notebook.sha256`. If you change a cell, CI fails until you either
   revert OR explicitly re-lock with `make lock-paper-notebook` AND update
   the paper / model card to reflect the new behaviour.
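
The Makefile targets are not shown here. The following is therefore a minimal Python sketch of the kind of check rule 4 describes, not the project's actual implementation. It assumes the check hashes the notebook's raw bytes with SHA-256 and compares the hex digest with the first token of `.paper-notebook.sha256`, which also tolerates `sha256sum`-style `<hash>  <filename>` lines. It further assumes both paths are relative to the repository root. The helpers `sha256_of`, `lock` and `is_frozen` are hypothetical names.

```python
import hashlib
from pathlib import Path

# Assumed locations, relative to the repository root (hypothetical).
NOTEBOOK = Path("notebooks/01_ieee_inceptionv3_transformer.ipynb")
LOCK_FILE = Path(".paper-notebook.sha256")


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of the file's raw bytes, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lock(notebook: Path = NOTEBOOK, lock_file: Path = LOCK_FILE) -> str:
    """Write the notebook's current hash to the lock file (a re-lock step)."""
    value = sha256_of(notebook)
    lock_file.write_text(value + "\n")
    return value


def is_frozen(notebook: Path = NOTEBOOK, lock_file: Path = LOCK_FILE) -> bool:
    """True only if the notebook still matches the locked hash."""
    # Take the first whitespace-separated token, so both a bare digest and a
    # `sha256sum`-style "<hash>  <filename>" line are accepted.
    locked = lock_file.read_text().split()[0]
    return sha256_of(notebook) == locked
```

A CI step could then exit non-zero whenever `is_frozen()` returns `False`. Because the sketch hashes raw bytes, even a metadata or whitespace change breaks parity. That is the intended strictness for a frozen artefact.
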

### When this rule changes

The frozen state lifts when (and only when) we publish a v2 of the paper or
explicitly mark a re-run in the changelog. Until then, treat this file like
a museum exhibit.

---

## `02_dataset_eda.ipynb`: exploratory (Phase 1+)

Dataset inspection: caption length distributions, vocabulary coverage, image
dimension histograms, and class balance across COCO super-categories. This
notebook **may** be edited freely; it's a working scratchpad, not a published
artefact.
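
Two of the statistics listed above can be computed with the standard library alone. The following is a hypothetical sketch, not code taken from the notebook. It assumes lowercase whitespace tokenisation, whereas the real pipeline may tokenise differently. It also defines vocabulary coverage as the share of all token occurrences covered by the `vocab_size` most frequent words.

```python
from collections import Counter


def tokenize(caption: str) -> list[str]:
    # Assumption: lowercase + whitespace split; the real tokeniser may differ.
    return caption.lower().split()


def length_histogram(captions: list[str]) -> Counter:
    """Map caption length (in tokens) to the number of captions of that length."""
    return Counter(len(tokenize(c)) for c in captions)


def vocab_coverage(captions: list[str], vocab_size: int) -> float:
    """Fraction of token occurrences covered by the `vocab_size` most frequent words."""
    counts = Counter(tok for c in captions for tok in tokenize(c))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(vocab_size))
    return covered / total


if __name__ == "__main__":
    captions = ["A dog runs on the grass", "A cat sleeps on the sofa", "Two dogs play"]
    print(length_histogram(captions))   # Counter({6: 2, 3: 1})
    print(vocab_coverage(captions, 3))  # 0.4  ("a", "on", "the" cover 6 of 15 tokens)
```

Plotting `vocab_coverage` against a range of `vocab_size` values shows how large the caption vocabulary must be before rare words stop mattering.
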

## `03_attention_visualization.ipynb`: exploratory (Phase 4+)

Visualisations of decoder attention weights over image patches, used to
generate the figures in [`docs/results/`](../docs/results/). Outputs are
stripped by `nbstripout` on commit; PNGs land in `docs/images/attention/`
when explicitly exported.
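
A visualisation like this needs a reshaping step. The sketch below shows one way to do it, under stated assumptions. First, the decoder attends over a square grid of patches: an InceptionV3 encoder on the standard 299×299 input yields an 8×8 feature map, i.e. 64 positions. Second, one attention vector per generated word is available as a flat list. The function names are hypothetical, and the sketch uses plain Python for clarity; the notebook would more likely use a NumPy reshape.

```python
import math


def attention_to_grid(weights: list[float]) -> list[list[float]]:
    """Reshape a flat attention vector over N patches into a sqrt(N) x sqrt(N) grid."""
    side = math.isqrt(len(weights))
    if side * side != len(weights):
        raise ValueError(f"expected a square number of patches, got {len(weights)}")
    return [weights[r * side:(r + 1) * side] for r in range(side)]


def upsample_nearest(grid: list[list[float]], height: int, width: int) -> list[list[float]]:
    """Nearest-neighbour upsampling of the patch grid to an image-sized heat map."""
    rows, cols = len(grid), len(grid[0])
    return [
        [grid[y * rows // height][x * cols // width] for x in range(width)]
        for y in range(height)
    ]


def normalise(grid: list[list[float]]) -> list[list[float]]:
    """Min-max scale to [0, 1] so weak and strong attention maps share one colour range."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for uniform attention
    return [[(v - lo) / span for v in row] for row in grid]
```

The resulting heat map could then be drawn over the image, for example with matplotlib's `imshow` and a partial `alpha`, before exporting the PNG.
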

---

## Conventions for new notebooks

If you add a new notebook:

- **Number it** (`04_*`, `05_*`) so the lifecycle order is obvious.
- **Use prose Markdown cells** between code cells: a notebook reads like a
  short paper, not a Python script.
- **Do not import from `notebooks/`** elsewhere in the codebase. Notebooks
  consume the `captioning` package; they never define library code.
- **Strip outputs before committing.** `nbstripout` does this automatically
  if you ran `make install-hooks`. Without that hook, run
  `nbstripout notebooks/your_notebook.ipynb` manually before `git add`.
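
The naming and output-stripping conventions can be checked automatically. Below is a minimal sketch. It assumes notebook names use a two-digit prefix followed by a lowercase snake_case name, which matches the three existing notebooks. It treats a code cell as unstripped if it has a non-empty `outputs` list or a non-null `execution_count`, since `nbstripout` clears both by default. `check_notebook` and `check_directory` are hypothetical helpers, not an existing project script.

```python
import json
import re
from pathlib import Path

# Assumed naming rule: "NN_" prefix plus lowercase snake_case, e.g. 04_new_topic.ipynb.
NAME_PATTERN = re.compile(r"^\d{2}_[a-z0-9_]+\.ipynb$")


def is_numbered(path: Path) -> bool:
    return bool(NAME_PATTERN.match(path.name))


def has_outputs(notebook: dict) -> bool:
    """True if any code cell still carries outputs or an execution count."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Markdown and raw cells have no outputs to strip
        if cell.get("outputs") or cell.get("execution_count") is not None:
            return True
    return False


def check_notebook(path: Path) -> list[str]:
    """Return human-readable convention violations for one .ipynb file."""
    problems = []
    if not is_numbered(path):
        problems.append(f"{path.name}: missing NN_ prefix")
    notebook = json.loads(path.read_text(encoding="utf-8"))
    if has_outputs(notebook):
        problems.append(f"{path.name}: outputs not stripped (run nbstripout)")
    return problems


def check_directory(directory: Path = Path("notebooks")) -> list[str]:
    """Check every notebook in the directory, in name order."""
    problems: list[str] = []
    for path in sorted(directory.glob("*.ipynb")):
        problems.extend(check_notebook(path))
    return problems
```

A check like this complements the `nbstripout` hook: it catches notebooks committed from a clone where `make install-hooks` was never run.
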

---

## Why notebooks at all?

Notebooks are excellent for *exploration*: narrative, mixed media, iterative
data wrangling. They are bad for *libraries*: no testing, no type-checking,
no module reuse, hidden cell-execution-order bugs. The IEEE notebook stays
because the paper points at it; everything else lives in `src/captioning/`.