Spaces: apoorvrajdev/image-captioning-api

apoorvrajdev committed 4 days ago

Commit 4e0b47e (1 parent: c1ac860)

docs(readme): update status block — Phase 2C deployed end-to-end

Files changed (1)

README.md +1 -1

@@ -42,7 +42,7 @@ short_description: InceptionV3 + Transformer image captioning inference API
 ## Status
-> 🚧 **Active build.** The research → modular conversion (Phase 1) is complete and the full inference stack (Phase 2A backend + 2B frontend) is operational end-to-end: a React 19 / Vite 8 SPA posts multipart uploads to `POST /v1/captions`, the FastAPI service returns a typed `CaptionResponse`, and the lifespan-managed `CaptionPredictor` is reused across every request with a warm graph and no per-call TF rebuilds. The IEEE notebook is preserved verbatim and protected by a SHA-256 freeze check. A four-stage parity audit ([`scripts/notebook_module_audit.py`](scripts/notebook_module_audit.py)) re-implements caption preprocessing, tokenizer vocabulary + encoding, image preprocessing, and the decoder forward pass inline and asserts the modular path is byte-identical (or `tf.allclose`-identical) to the notebook. Phase 1b (training stabilization) shipped beam search, the full corpus metric suite (BLEU-1..4 / CIDEr / METEOR / ROUGE-L), a benchmark runner that emits one machine-readable artefact set per evaluation, and a stabilized training config that gates label smoothing / cosine LR / warmup / dropout-free validation behind ablatable flags. Phase 2C (public deployment) is now in flight — workstream **D (backend test suite)** is complete: 12 new FastAPI route tests use a duck-typed fake predictor service to cover the full 200 / 400 / 413 / 415 / 422 / 503 contract end-to-end without loading TensorFlow, dropping the backend slice from a cold-start liability to a 0.3-second suite. The remaining workstreams (Dockerfile, HuggingFace Hub weights hosting, HF Spaces deploy, Vercel deploy, production CORS, GitHub Actions CI/CD, runbook) are sequenced in the [Roadmap](#-roadmap) below.
+> ✅ **Deployed.** Phase 2C (public deployment) is complete. The research → modular conversion (Phase 1) and the full inference stack (Phase 2A backend + 2B frontend) ship as a live, publicly reachable system: a React 19 / Vite 8 SPA at [`image-captioning-system.vercel.app`](https://image-captioning-system.vercel.app) posts multipart uploads to `POST /v1/captions` against a Dockerised FastAPI service running on a HuggingFace Space at [`apoorvrajdev-image-captioning-api.hf.space`](https://apoorvrajdev-image-captioning-api.hf.space), which pulls its versioned weights from [`apoorvrajdev/captioning-inceptionv3-transformer`](https://huggingface.co/apoorvrajdev/captioning-inceptionv3-transformer) on the Hub at lifespan startup via `snapshot_download`. The lifespan-managed `CaptionPredictor` is reused across every request with a warm graph and no per-call TF rebuilds. The IEEE notebook is preserved verbatim and protected by a SHA-256 freeze check, and a four-stage parity audit ([`scripts/notebook_module_audit.py`](scripts/notebook_module_audit.py)) re-implements caption preprocessing, tokenizer vocabulary + encoding, image preprocessing, and the decoder forward pass inline and asserts the modular path is byte-identical (or `tf.allclose`-identical) to the notebook. Phase 1b (training stabilization) shipped beam search, the full corpus metric suite (BLEU-1..4 / CIDEr / METEOR / ROUGE-L), a benchmark runner that emits one machine-readable artefact set per evaluation, and a stabilized training config that gates label smoothing / cosine LR / warmup / dropout-free validation behind ablatable flags. 
Phase 2C shipped a hardened backend test suite (12 route tests covering the full 200 / 400 / 413 / 415 / 422 / 503 contract via a duck-typed fake predictor, full slice runs in 0.3 s), a multi-stage Dockerfile, Hub-versioned weight loading with an injectable downloader for offline testing, explicit production CORS wired through Space variables, a four-job GitHub Actions CI pipeline (ruff + mypy, pytest matrix on 3.10/3.11/3.12, notebook SHA-256 freeze, frontend lint + build) plus a chained `deploy-backend.yml` that pushes `main` to the Space remote only after CI is green, and a full deployment runbook at [`docs/PHASE_2C_DEPLOYMENT_RUNBOOK.md`](docs/PHASE_2C_DEPLOYMENT_RUNBOOK.md). Next up: Phase 3 (multimodal baselines) — see [Roadmap](#-roadmap).
 > ⚠️ **Caption quality disclaimer.** The weights committed under [`models/v1.0.0/`](models/v1.0.0/) are **bootstrap dev artefacts** produced by [`scripts/bootstrap_dev_artifacts.py`](scripts/bootstrap_dev_artifacts.py): the architecture is wired correctly but every weight is randomly initialised. They exist to exercise the serving stack (lifespan, predictor wiring, multipart upload, frontend integration) before a real COCO-trained checkpoint is dropped in. Live captions therefore look like noise today — that is the *intended* state of the bootstrap path, not a regression. See [Current model quality status](#-current-model-quality-status) for what is being done about it.
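
The new status block mentions Hub-versioned weight loading "with an injectable downloader for offline testing". The repository's actual implementation is not shown in this diff, so the following is only a minimal sketch of that pattern under stated assumptions: `resolve_weights_dir`, `hub_downloader`, `make_fake_downloader`, the `weights.placeholder` file name, and the `v1.0.0` revision are hypothetical names chosen for illustration. The only real library call is `huggingface_hub.snapshot_download`, imported lazily so the offline path never needs the package.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional

# A downloader takes (repo_id, revision) and returns a local directory path.
Downloader = Callable[[str, Optional[str]], str]


def hub_downloader(repo_id: str, revision: Optional[str]) -> str:
    """Default downloader: fetch a repo snapshot from the HuggingFace Hub."""
    # Lazy import: tests that inject a fake never need huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, revision=revision)


def resolve_weights_dir(
    repo_id: str,
    revision: Optional[str] = None,
    downloader: Downloader = hub_downloader,
) -> Path:
    """Return the local weights directory, failing fast if it is missing or empty."""
    local_dir = Path(downloader(repo_id, revision))
    if not local_dir.is_dir() or not any(local_dir.iterdir()):
        raise FileNotFoundError(f"no weights found for {repo_id}@{revision}")
    return local_dir


def make_fake_downloader(root: Path) -> Downloader:
    """Build an offline downloader that writes a placeholder file under root."""

    def fake(repo_id: str, revision: Optional[str]) -> str:
        # Hypothetical on-disk layout; the real Hub cache layout differs.
        target = root / repo_id.replace("/", "--") / (revision or "main")
        target.mkdir(parents=True, exist_ok=True)
        (target / "weights.placeholder").write_bytes(b"\x00")
        return str(target)

    return fake


if __name__ == "__main__":
    # Offline usage: no network access and no huggingface_hub import.
    with tempfile.TemporaryDirectory() as tmp:
        weights = resolve_weights_dir(
            "apoorvrajdev/captioning-inceptionv3-transformer",
            revision="v1.0.0",  # assumed tag name, not confirmed by the diff
            downloader=make_fake_downloader(Path(tmp)),
        )
        print(weights.name)
```

Passing the downloader as a plain callable keeps the startup code identical in production and in tests; only the argument changes, which is one way a suite can stay fast without touching the network.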