docs: model card — overview + usage (load model.zip, predict_sequence)
README.md
---
license: apache-2.0
library_name: pytorch
tags:
- wildfire
- smoke-detection
- object-detection
- temporal
---

# Temporal Smoke Model (bbox-tube-temporal)

A temporal wildfire-**smoke** classifier for short sequences of camera frames. A YOLO detector proposes boxes, the boxes are linked across frames into temporal **tubes**, each tube's image patches are classified by a DINOv2 ViT with a transformer head, and a logistic calibrator turns the tube logits into a calibrated probability and a keep/discard decision.

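The card names the tube-linking step but does not specify it. As an illustration only, the sketch below shows one common way to do it: greedy IoU matching between consecutive frames. The `(x1, y1, x2, y2)` box format, the `min_iou=0.3` threshold, and the names `iou` and `link_tubes` are hypothetical; the packaged model may link boxes differently (for example, by tolerating frames with missed detections).

```python
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]
# A tube is a list of (frame_index, box) pairs in time order.
Tube = List[Tuple[int, Box]]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_tubes(frames: List[List[Box]], min_iou: float = 0.3) -> List[Tube]:
    """Greedily link per-frame boxes into tubes (hypothetical sketch).

    A box extends the tube whose box in the previous frame overlaps it most
    (IoU >= min_iou); otherwise it starts a new tube. A tube that misses a
    frame is never extended again.
    """
    tubes: List[Tube] = []
    for t, boxes in enumerate(frames):
        for box in boxes:
            best, best_score = None, min_iou
            for i, tube in enumerate(tubes):
                last_t, last_box = tube[-1]
                # Only tubes that ended in the previous frame are candidates;
                # a tube already extended at frame t now ends at t and is skipped.
                if last_t != t - 1:
                    continue
                score = iou(last_box, box)
                if score >= best_score:
                    best, best_score = i, score
            if best is None:
                tubes.append([(t, box)])
            else:
                tubes[best].append((t, box))
    return tubes
```

Greedy matching is order-dependent and never bridges a gap, which keeps the sketch short; a production linker would typically handle both.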
This repo ships a single self-contained **`model.zip`**, versioned by Hugging Face revision/tag (`v<version>`). Each release bundles everything needed to run the model:

| file | purpose |
|---|---|
| `manifest.yaml` | version + provenance (train git SHA, backbone, detector) |
| `yolo_weights.pt` | the companion YOLO detector |
| `classifier.ckpt` | the temporal ViT classifier |
| `config.yaml` | inference + decision config |
| `logistic_calibrator.json` | the calibrated decision head |

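To check that a downloaded archive contains every file listed above, you can inspect it with Python's standard `zipfile` module. A minimal sketch, assuming the names in the table appear verbatim in the archive (any folder prefix is ignored); `missing_package_files` is a hypothetical helper, not part of `temporal_model.core`:

```python
import zipfile
from pathlib import Path
from typing import BinaryIO, Set, Union

# File names from the table above. Assumption: they appear verbatim in the
# archive, possibly under a top-level folder (ignored here).
EXPECTED_FILES = {
    "manifest.yaml",
    "yolo_weights.pt",
    "classifier.ckpt",
    "config.yaml",
    "logistic_calibrator.json",
}


def missing_package_files(model_zip: Union[str, Path, BinaryIO]) -> Set[str]:
    """Return the expected file names that are absent from the archive."""
    with zipfile.ZipFile(model_zip) as zf:
        # Compare base names only, skipping directory entries.
        names = {Path(n).name for n in zf.namelist() if not n.endswith("/")}
    return EXPECTED_FILES - names
```

For example, call `missing_package_files(model_zip)` on the path returned by `hf_hub_download` in the usage example; an empty set means all five files were found.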
The model runs YOLO **itself**: you pass only raw frames, not detections.

## Usage

Install the inference package (`temporal_model.core`):

```bash
pip install "git+https://github.com/pyronear/temporal-model.git#subdirectory=core"
```

Download a versioned `model.zip` and run it on a **temporally ordered** sequence of frames:

```python
from pathlib import Path

from huggingface_hub import hf_hub_download
from temporal_model.core.model import BboxTubeTemporalModel

# 1. Download a specific release (pin the revision).
model_zip = hf_hub_download("pyronear/temporal-model", "model.zip", revision="v0.1.0")

# 2. Temporally-ordered frames. Filenames carry timestamps
#    (<prefix>_<YYYY-MM-DDTHH-MM-SS>.jpg), so sorting by name gives time order
#    as long as all frames share the same prefix.
frame_paths = sorted(Path("my_sequence").glob("*.jpg"))

# 3. Load (device=None → auto cuda → mps → cpu) and predict.
#    hf_hub_download returns a str, so wrap it in Path().
model = BboxTubeTemporalModel.from_package(Path(model_zip), device=None)
out = model.predict_sequence(frame_paths)

print("is_smoke:           ", out.is_positive)
print("trigger_frame_index:", out.trigger_frame_index)  # 0-based; None if no smoke

# Per-tube breakdown (logits, calibrated probabilities, bboxes, decision).
kept = out.details.get("tubes", {}).get("kept", [])
print("kept tubes:         ", len(kept))
```

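`sorted()` orders the paths by full filename, which matches capture time only when every frame shares the same prefix. If a folder can mix prefixes, here is a sketch that sorts on the parsed timestamp instead. It assumes the `<prefix>_<YYYY-MM-DDTHH-MM-SS>.jpg` pattern from the comment in the example; `frame_time` and `sort_frames_by_time` are hypothetical helpers, not part of `temporal_model.core`:

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Iterable, List

# Assumed pattern, from the comment in the usage example:
# <prefix>_<YYYY-MM-DDTHH-MM-SS>.jpg (the prefix may itself contain "_").
_TIMESTAMP = re.compile(r"_(\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2})$")


def frame_time(path: Path) -> datetime:
    """Parse the capture time encoded at the end of the file stem."""
    match = _TIMESTAMP.search(path.stem)
    if match is None:
        raise ValueError(f"no timestamp in {path.name!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H-%M-%S")


def sort_frames_by_time(paths: Iterable[Path]) -> List[Path]:
    """Order frames by parsed timestamp rather than by full filename."""
    return sorted(paths, key=frame_time)
```

The result can be passed to `model.predict_sequence(...)` in place of `frame_paths`.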
`predict_sequence(frame_paths)` returns a `TemporalModelOutput`:

- `is_positive` (`bool`): the smoke verdict.
- `trigger_frame_index` (`int | None`): the 0-based index of the frame where smoke first crosses the decision threshold, i.e. time-to-detection in frames; `None` when no smoke is detected.
- `details` (`dict`): per-tube logits, calibrated probabilities, bboxes, and the decision (`aggregation`, `threshold`, trigger tube).

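How the calibrator, `aggregation`, and `threshold` combine into `is_positive` and `trigger_frame_index` is not spelled out in this card. The sketch below shows one plausible reading, assuming Platt-style logistic calibration, `max` aggregation over tubes, and one logit per tube for each frame prefix. The names `LogisticCalibrator` and `first_trigger` and all parameter values are hypothetical; check `out.details` for the aggregation and threshold the model actually uses.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class LogisticCalibrator:
    """Platt-style calibrator: p = sigmoid(coef * logit + intercept).

    Hypothetical parameters; the real ones ship in logistic_calibrator.json,
    whose schema is not described in this card.
    """

    coef: float
    intercept: float

    def __call__(self, logit: float) -> float:
        z = self.coef * logit + self.intercept
        # Numerically stable sigmoid: avoid exp() of a large positive number.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


def first_trigger(
    logits_per_frame: Sequence[Sequence[float]],
    calibrator: LogisticCalibrator,
    threshold: float = 0.5,
) -> Tuple[bool, Optional[int]]:
    """Return (is_positive, trigger_frame_index) under a max-aggregation rule.

    logits_per_frame[t] holds one logit per tube, scored on frames 0..t
    (an assumption made for this sketch).
    """
    for t, logits in enumerate(logits_per_frame):
        if not logits:
            continue  # no tubes yet at this frame
        prob = max(calibrator(x) for x in logits)  # assumed "max" aggregation
        if prob >= threshold:
            return True, t
    return False, None
```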
## Served API (Docker)

The same model is also served as a FastAPI image with `model.zip` baked in. It uses the GPU automatically when the container is started with `--gpus all`:

```bash
docker run --gpus all -p 8000:8000 \
  -e TEMPORAL_API_S3_BUCKET=<frames-bucket> \
  -e TEMPORAL_API_S3_ENDPOINT_URL=<s3-endpoint> \
  pyronear/temporal-model-api:0.1.0
# POST /predict  {"frames": ["<s3-key>", ...]}
# GET  /health
```

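A minimal Python client for the `/predict` endpoint, using only the standard library. It assumes the request body shown in the comment above (`{"frames": [...]}` with S3 keys from the configured bucket) and returns the JSON reply untouched, because the response schema is not documented in this card. `build_predict_request` and `predict` are hypothetical names:

```python
import json
import urllib.request
from typing import Any, Dict, Sequence


def build_predict_request(base_url: str, s3_keys: Sequence[str]) -> urllib.request.Request:
    """Build a POST /predict request with a {"frames": [...]} JSON body."""
    body = json.dumps({"frames": list(s3_keys)}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, s3_keys: Sequence[str], timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response as-is."""
    request = build_predict_request(base_url, s3_keys)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against the container started above, `predict("http://localhost:8000", keys)` would send `keys` (object keys in your frames bucket) and return whatever JSON the service replies with; inspect the response fields before relying on any of them.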
## Provenance

Every `model.zip` manifest records how the package was built: the training git SHA, the classifier backbone (`vit_small_patch14_dinov2.lvd142m`), and the exact companion detector (e.g. `pyronear/yolo11s_nimble-narwhal_v6.0.0`, verified by SHA-256). A served model can therefore always be traced back to its detector and training code.

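The card says the companion detector is verified by SHA-256 but does not show how. If you want to repeat that check yourself after extracting `model.zip`, here is a minimal sketch with the standard `hashlib` module. Where the expected digest is stored inside `manifest.yaml` is not documented here, so the sketch takes it as an argument; `sha256_of` and `verify_sha256` are hypothetical helpers:

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare against an expected digest (case and surrounding whitespace ignored)."""
    return sha256_of(path) == expected_hex.strip().lower()
```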
Source & pipeline: <https://github.com/pyronear/temporal-model>