Phase 0: Bootstrap (decision log)
Phase 0 establishes the engineering scaffolding the rest of the project will stand on. Nothing here changes the model; everything here changes how the repo looks and behaves to the next person who clones it (including recruiters and CI runners).
What this phase delivers
| Artefact | Purpose |
|---|---|
| notebooks/01_ieee_inceptionv3_transformer.ipynb | Renamed from image-captionin-using-dl.ipynb via git mv to preserve history. Now the canonical, frozen IEEE artefact. |
| notebooks/README.md | Documents the frozen-notebook policy and conventions for any new notebooks. |
| pyproject.toml | Single source of truth for the captioning Python package, dependency groups, and tool config (ruff/mypy/pytest/coverage). |
| requirements.txt | Pinned runtime deps, used directly by Docker and CI (mirrors [project.dependencies]). |
| requirements-dev.txt | Pinned dev deps (lint, type-check, test, hooks). |
| requirements-eval.txt | Pinned metric deps, kept separate to avoid bloating the serving image. |
| .python-version | Pins Python 3.10 for pyenv users. |
| .env.example | Schema for pydantic-settings-loaded env vars. |
| .pre-commit-config.yaml | Hooks: ruff, mypy, nbstripout, prettier (frontend), gitleaks. |
| Makefile | Discoverable command index (make help). |
| LICENSE | MIT license, attribution to original author. |
| .gitignore | Production-grade exclusions, organised by purpose with explanatory comments. |
| docs/restructure-plan.md | Public-facing engineering plan for Phases 0-4. |
Decisions and reasoning
1. Why src/ layout over flat layout?
A flat layout (captioning/ at repo root) lets test code accidentally import
from the working tree instead of the installed package. That hides bugs that
would only surface in production, where the tree layout is gone. The src/
layout forces every test, every script, and every import to go through the
installed package β exactly the path users will follow. This is the layout
the Python Packaging Authority recommends,
and it's what production Python codebases (FastAPI, Pydantic, HTTPX) use.
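As a rough illustration of the failure mode described above, the sketch below classifies where an imported module was loaded from. The function name `import_origin` and the three labels are hypothetical, not part of this repo; the idea is that a test could call it with `captioning.__file__` and fail if the answer is "working-tree".

```python
from pathlib import Path


def import_origin(module_file: str, repo_root: str) -> str:
    """Classify where a module was imported from (hypothetical helper)."""
    module_path = Path(module_file).resolve()
    root = Path(repo_root).resolve()
    # A regular (non-editable) install lives in site-packages, even when
    # the virtual environment itself sits inside the repo directory.
    if "site-packages" in module_path.parts:
        return "installed"
    # An editable install of a src/ layout resolves to <repo>/src/<package>.
    if (root / "src") in module_path.parents:
        return "editable-src"
    # Anything else under the repo root is a flat-layout package picked up
    # from the working tree, which is the case the src/ layout prevents.
    if root in module_path.parents:
        return "working-tree"
    return "installed"
```

With a src/ layout, running the tests from the repo root should never produce "working-tree", because there is no importable captioning/ directory next to the tests.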
2. Why pyproject.toml AND requirements.txt?
They serve different audiences:
- pyproject.toml is the source of truth for the package: its name, version, abstract dependency ranges, optional extras, and tool configuration. When you run pip install -e .[dev], this is what pip reads.
- requirements.txt is the concretely pinned snapshot, used by Docker builds, CI runners, and anyone who wants to pip install -r requirements.txt without cloning the source. It's regenerable from pyproject.toml via pip-compile, but committing it explicitly makes installs deterministic and diffable.
Phase 5+ will switch to pip-compile for automated regeneration; for now,
manual mirroring is simpler and beginner-readable.
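While the mirroring is manual, a small check can catch drift between the two files. The following is a minimal sketch, assuming the dependency strings have already been read from pyproject.toml and requirements.txt; the helper names and the sample versions in the usage note are illustrative, not taken from this repo.

```python
import re
from typing import List, Optional, Set

_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(line: str) -> Optional[str]:
    """Return the normalised distribution name, or None for blanks, comments, and options."""
    stripped = line.split("#", 1)[0].strip()
    if not stripped or stripped.startswith("-"):
        return None
    match = _NAME.match(stripped)
    if match is None:
        return None
    # PEP 503 normalisation: runs of '-', '_' and '.' compare equal.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def missing_pins(abstract: List[str], pinned: List[str]) -> Set[str]:
    """Names declared as abstract dependencies that have no '==' pin."""
    pinned_names = set()
    for line in pinned:
        name = requirement_name(line)
        if name is not None and "==" in line:
            pinned_names.add(name)
    declared = set()
    for line in abstract:
        name = requirement_name(line)
        if name is not None:
            declared.add(name)
    return declared - pinned_names
```

For example, `missing_pins(["tensorflow-cpu>=2.15,<2.16", "numpy<2"], ["tensorflow-cpu==2.15.0", "numpy>=1.24"])` reports numpy, because a range is not a pin.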
3. Why pin tensorflow-cpu==2.15.0 so hard?
Two independent reasons stack:
- tensorflow-cpu (not tensorflow): the GPU build pulls ~600 MB of CUDA libraries that are useless on CPU-only HuggingFace Spaces. Using the CPU-only wheel keeps the serving image well under 1.5 GB.
- 2.15 specifically: TF 2.16 swapped to Keras 3 by default. The IEEE notebook uses tf.keras.layers.TextVectorization with the Keras 2 save/load API. Upgrading silently changes vocab serialisation, which silently changes BLEU. Pinning is the difference between a reproducible published result and reproducibility theatre.
When Phase 5+ migrates to a modern multimodal backbone, this pin will move in a deliberate, tested step, not by accident.
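Beyond the pin in requirements.txt, a runtime guard can make an accidental upgrade fail loudly instead of silently changing results. This is a sketch only; `assert_pinned_series` is a hypothetical helper, and where it would be called (for example at model-loading time) is an assumption.

```python
def assert_pinned_series(installed: str, series: str = "2.15") -> None:
    """Raise if the installed version is outside the pinned major.minor series."""
    major_minor = ".".join(installed.split(".")[:2])
    if major_minor != series:
        raise RuntimeError(
            f"Expected TensorFlow {series}.x but found {installed}; "
            "the Keras 2 save/load path used by the notebook is only "
            "validated against the pinned series."
        )


# Intended usage (requires TensorFlow to be installed):
#   import tensorflow as tf
#   assert_pinned_series(tf.__version__)
```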
4. Why Ruff over Black + isort + flake8?
Ruff replaces all three with one tool that runs 10-100x faster, reads config
from a single section in pyproject.toml, and ships its own formatter
(ruff format) designed as a drop-in replacement for Black, with
near-identical output. One install, one config, one cache. Recruiters
reading the repo see the modern Python tool; CI runs faster; make format is
one command, not three.
5. Why nbstripout is non-negotiable in pre-commit
Notebook outputs include base64-encoded images, full DataFrames, and
sometimes credentials printed by accident. Committed notebook diffs without
output stripping are unreadable (+aaaaaaaaaa[base64]+aaaaa...) and
occasionally leak data. nbstripout strips the outputs from code cells on
commit, keeping notebook history clean and reviewable.
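To make the effect concrete, the sketch below clears outputs from a notebook loaded as a plain dict in the nbformat 4 JSON layout. It illustrates the idea only and is not nbstripout's actual implementation, which handles more cases such as selected metadata.

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of an nbformat-4 notebook dict with code-cell outputs cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drops images, DataFrames, printed secrets
            cell["execution_count"] = None  # avoids diff noise from re-running cells
    return cleaned
```

The source of each cell is untouched, so the diff of a stripped notebook shows only code and markdown changes.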
6. Why include a Makefile on a Windows project?
Three reasons:
- CI runs on Linux: every CI job uses the same Make targets, so the commands you run locally match what CI runs.
- Discoverability: make help is one command that prints every high-level operation with a one-line description. A new contributor (or recruiter cloning the repo) sees the entire workflow in one screen.
- Tooling availability: Make is a 5-second install on Windows (winget install GnuWin32.Make, Git Bash, or WSL). PowerShell users who skip Make can still read the Makefile and run the underlying commands directly.
7. Why a freeze-paper-notebook Make target?
The IEEE paper points reviewers at the notebook. If the notebook drifts from
what the paper describes, reviewers running it will see numbers that don't
match the paper, and that's a scientific integrity issue, not a software
issue. The target hashes the notebook and asserts it matches a locked
SHA-256. Phase 4 wires this into CI as a required check on main.
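The hash-and-compare step could look roughly like the following. This is a sketch under assumptions: the lock-file format (hex digest as the first whitespace-separated token) and the function names are hypothetical, and the real Make target may be implemented differently.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large notebooks are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def assert_frozen(notebook: Path, lock_file: Path) -> None:
    """Exit with an error if the notebook no longer matches the locked hash."""
    expected = lock_file.read_text(encoding="utf-8").split()[0]
    actual = sha256_of(notebook)
    if actual != expected:
        raise SystemExit(
            f"{notebook} has drifted from the locked hash "
            f"(expected {expected}, got {actual})."
        )
```

Raising SystemExit with a message gives a non-zero exit code, which is what a Make target or CI check needs in order to fail.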
8. Why split optional deps into [hf], [eval], [mlflow], [dev]?
The slim production image (backend:latest) does NOT need transformers,
torch, pycocoevalcap, or MLflow. Bundling them adds ~1.5 GB of dependencies
the production code never imports. Extras let pip install -e ".[hf]" add
the HuggingFace baselines for the Phase 3 comparison demo, while
pip install -r requirements.txt keeps the production install lean.
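One consequence of splitting extras is that code touching an optional dependency should fail with a useful hint when the extra is missing. A minimal sketch follows; `require_extra` is a hypothetical helper, not something this repo defines, and it expects a top-level module name.

```python
import importlib
import importlib.util
from types import ModuleType


def require_extra(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise with the matching install command."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed. "
            f'Install it with: pip install -e ".[{extra}]"'
        )
    return importlib.import_module(module_name)


# Intended usage inside code that needs the HuggingFace baselines:
#   transformers = require_extra("transformers", "hf")
```

Calling the helper inside the function that needs the dependency, rather than at module import time, keeps the lean production install importable.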
9. Why MIT license?
The IEEE paper is published under IEEE's standard terms; the code is covered separately. MIT is one of the most permissive widely recognised licenses: it lets recruiters, students, and other researchers freely fork, learn from, and extend the code. For a recruiter-grade portfolio project, permissive licensing signals "I want this work to be useful," which is the right tone.
10. Why folder name configs/ (plural), not config/ (singular)?
config/ was the empty folder shipped with the template. The plural form
configs/ is a common convention in Python ML projects (for example
Detectron2 and MMDetection) because
it holds multiple files (one per environment, model variant, or run).
Phase 1 creates configs/ with content; the empty config/ folder will
be removed in the Phase 1 commit that introduces the YAML files.
What this phase deliberately does NOT do
- No code is moved out of the notebook yet. That's Phase 1, behind a parity validation gate.
- No src/captioning/ modules are created. Empty __init__.py files would just be churn; Phase 1 will create them with real code.
- No Dockerfile or docker-compose.yml. They depend on backend/app/ existing; both arrive in Phase 1.
- No GitHub Actions workflows. They live in Phase 2, after there is Python code to lint and type-check.
- No README rewrite. The current README accurately describes the research; the demo-link rewrite happens in Phase 2 once a live URL exists.
This restraint is deliberate. Each phase ships a coherent slice of value; running ahead would create half-built features and vague commits.
Local setup checklist for the developer
After pulling this commit, on a fresh dev box:
# 1. Create a Python 3.10 virtual environment.
python -m venv .venv
.venv\Scripts\activate # PowerShell
# source .venv/bin/activate # Linux/macOS
# 2. Install dev dependencies + the package (editable).
make install-dev
# Or, without Make:
# pip install -r requirements-dev.txt -r requirements-eval.txt
# pip install -e ".[hf,mlflow]"
# 3. Register pre-commit hooks.
make install-hooks
# Or: pre-commit install
# 4. (Optional) Lock the paper notebook's hash, so CI can enforce parity.
make lock-paper-notebook
# 5. Verify everything works.
make pre-commit # Run all hooks against all files
make test # No tests yet: exits cleanly with "no tests collected"
The first make install-dev will take a few minutes (TensorFlow is large).
Subsequent runs hit the wheel cache and complete in seconds.