---
license: apache-2.0
library_name: pytorch
tags:
  - wildfire
  - smoke-detection
  - object-detection
  - temporal
---

# Temporal Smoke Model (bbox-tube-temporal)

> **Latest release:** [`v0.2.0`](https://huggingface.co/pyronear/temporal-model/tree/v0.2.0). Pin this revision for reproducibility, or omit `revision=` to always get the latest. All releases are listed in the **Files and versions** tab.

A temporal wildfire-**smoke** classifier for short sequences of camera frames. A
YOLO detector proposes boxes, boxes are linked across frames into temporal
**tubes**, each tube's image patches are classified by a DINOv2 ViT + transformer
head, and a logistic calibrator turns the tube logits into a calibrated
probability and a keep/discard decision.
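
The linking and calibration stages can be illustrated with a minimal sketch. It is not the package's implementation: it assumes greedy IoU matching between consecutive frames and a two-parameter logistic (Platt-style) calibrator, and every name in it (`link_tubes`, `min_iou`, `calibrate`, `coef`, `intercept`, `threshold`) is hypothetical.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_tubes(frames: list[list[Box]], min_iou: float = 0.3) -> list[list[tuple[int, Box]]]:
    """Assumed linking rule: extend a tube with the best-overlapping box of the next frame."""
    tubes: list[list[tuple[int, Box]]] = []
    for t, boxes in enumerate(frames):
        unmatched = list(boxes)
        for tube in tubes:
            last_t, last_box = tube[-1]
            # Only tubes that were alive in the previous frame can be extended.
            if last_t != t - 1 or not unmatched:
                continue
            best = max(unmatched, key=lambda b: iou(last_box, b))
            if iou(last_box, best) >= min_iou:
                tube.append((t, best))
                unmatched.remove(best)
        # Every box that extended no tube starts a new one.
        tubes.extend([(t, b)] for b in unmatched)
    return tubes


def calibrate(logit: float, coef: float = 1.0, intercept: float = 0.0,
              threshold: float = 0.5) -> tuple[float, bool]:
    """Assumed calibrator form: sigmoid(coef * logit + intercept), then a threshold."""
    prob = 1.0 / (1.0 + math.exp(-(coef * logit + intercept)))
    return prob, prob >= threshold
```

In this sketch a tube ends as soon as no box in the next frame overlaps it enough; the released model may bridge gaps or aggregate several tubes before deciding.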

This repo ships a single self-contained **`model.zip`**, versioned by HuggingFace
revision/tag (`v<version>`). Each release bundles everything needed to run:

| file | purpose |
|---|---|
| `manifest.yaml` | version + provenance (train git SHA, backbone, detector) |
| `yolo_weights.pt` | the companion YOLO detector |
| `classifier.ckpt` | the temporal ViT classifier |
| `config.yaml` | inference + decision config |
| `logistic_calibrator.json` | the calibrated decision head |
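
Before loading a downloaded package, it can be useful to confirm that the archive contains the files listed above. The following is a small sketch using only the standard library; the helper name `missing_package_files` is hypothetical, and because the internal folder layout of `model.zip` is not documented here, it compares base names only.

```python
import zipfile
from pathlib import Path

# File names taken from the table above.
EXPECTED_FILES = {
    "manifest.yaml",
    "yolo_weights.pt",
    "classifier.ckpt",
    "config.yaml",
    "logistic_calibrator.json",
}


def missing_package_files(zip_path: Path) -> set[str]:
    """Return the expected file names that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Compare by base name so a top-level folder inside the zip does not matter.
        present = {Path(name).name for name in zf.namelist()}
    return EXPECTED_FILES - present
```

An empty set means all five files were found, for example `missing_package_files(Path(model_zip))` on the path returned by `hf_hub_download`.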

The model runs YOLO **itself**: you pass only raw frames, no detections.

## Usage

Install the inference package (`temporal_model.core`):

```bash
pip install "git+https://github.com/pyronear/temporal-model.git#subdirectory=core"
```

Download a versioned `model.zip` and run it on a **temporally ordered** sequence
of frames:

```python
from pathlib import Path

from huggingface_hub import hf_hub_download
from temporal_model.core.model import BboxTubeTemporalModel

# 1. Download a specific release (pin the revision).
model_zip = hf_hub_download("pyronear/temporal-model", "model.zip", revision="v0.2.0")

# 2. Temporally ordered frames. Filenames carry timestamps
#    (<prefix>_<YYYY-MM-DDTHH-MM-SS>.jpg), so with a shared prefix sorting by name gives time order.
frame_paths = sorted(Path("my_sequence").glob("*.jpg"))

# 3. Load (device=None → auto: cuda → mps → cpu) and predict.
#    hf_hub_download returns a str, so wrap it in Path().
model = BboxTubeTemporalModel.from_package(Path(model_zip), device=None)
out = model.predict_sequence(frame_paths)

print("is_smoke:           ", out.is_positive)
print("trigger_frame_index:", out.trigger_frame_index)  # 0-based; None if no smoke

# Per-tube breakdown (logits, calibrated probabilities, bboxes, decision).
kept = out.details.get("tubes", {}).get("kept", [])
print("kept tubes:         ", len(kept))
```

`predict_sequence(frame_paths)` returns a `TemporalModelOutput`:

- `is_positive: bool`: the smoke verdict.
- `trigger_frame_index: int | None`: 0-based index of the frame where smoke first crosses the
  decision threshold (time-to-detection, in frames; `None` when no smoke).
- `details: dict`: per-tube logits, calibrated probabilities, bboxes, and the
  decision (`aggregation`, `threshold`, trigger tube).

## Served API (Docker)

The same model is also served as a FastAPI image with the `model.zip` baked in
(auto-uses the GPU with `--gpus all`):

```bash
docker run --gpus all -p 8000:8000 \
  -e TEMPORAL_API_S3_BUCKET=<frames-bucket> \
  -e TEMPORAL_API_S3_ENDPOINT_URL=<s3-endpoint> \
  pyronear/temporal-model-api:0.2.0
# POST /predict  {"frames": ["<s3-key>", ...]}      GET /health
```
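
A client for the served endpoint might look like the sketch below, which uses only the standard library. The request body follows the `{"frames": [...]}` shape from the comment above; the base URL, the helper names, and the example S3 keys are assumptions, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
import urllib.request


def build_predict_request(frame_keys: list[str],
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST /predict request carrying the S3 keys of the frames, in time order."""
    body = json.dumps({"frames": list(frame_keys)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(frame_keys: list[str], base_url: str = "http://localhost:8000",
            timeout: float = 60.0) -> dict:
    """Send the request to a running container and return the decoded JSON response."""
    request = build_predict_request(frame_keys, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `predict(["sequences/cam_01/frame_a.jpg", "sequences/cam_01/frame_b.jpg"])`, where the keys are placeholders for objects in the configured bucket.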

## Provenance

Every `model.zip` manifest records how it was built: the training git SHA, the
classifier backbone (`vit_small_patch14_dinov2.lvd142m`), and the exact companion
detector (e.g. `pyronear/yolo11s_nimble-narwhal_v6.0.0`, verified by SHA-256). A
served model can therefore always be traced back to its detector and training code.
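
To repeat the SHA-256 check locally, a file's digest can be computed as in the sketch below and compared with the value recorded in the manifest. The manifest's field names are not documented here, so the comparison itself is left to the reader; `sha256_of` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large weights fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Applied to an extracted `yolo_weights.pt`, the result should match the detector hash stored in `manifest.yaml` if the package is intact.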

Source & pipeline: <https://github.com/pyronear/temporal-model>