# T2.1 – Compressed Crop Disease Classifier

AIMS KTT Hackathon · Tier 2 · AgriTech / Computer Vision / Quantization / Serving

A compact MobileNetV2 classifier for 5 maize/cassava/bean leaf classes (four diseases plus healthy), quantized to INT8 ONNX (< 10 MB) and served through a FastAPI `/predict` endpoint, with a Kinyarwanda + French USSD/SMS fallback for feature-phone farmers.
## Submission artefacts

| Artefact | Location |
|---|---|
| Source repo | this repository |
| Model (INT8 ONNX) | model/model.onnx (also mirrored on Hugging Face; see link below) |
| Dataset | regenerated by scripts/download_data.py (≤ 2 minutes on a laptop) |
| Service | service/app.py + service/Dockerfile |
| Product artefact | ussd_fallback.md |
| Process log | process_log.md |
| Honor-code sign-off | SIGNED.md |
| 4-minute video | see link at the bottom of this README |
- Model hub: https://huggingface.co/<your-handle>/t2.1-crop-disease-mbv3s-int8 (update before submission)
- Video: https://youtu.be/<your-unlisted-id> (update before submission)
## Results
Source-of-truth numbers live in `model/metrics.json` (produced by
`scripts/evaluate.py`). Snapshot of the submitted run:

| Split | macro-F1 | accuracy | # images | latency mean (CPU) |
|---|---|---|---|---|
| clean test | 0.967 | 0.967 | 150 | 225 ms |
| field-noisy test | 0.899 | 0.900 | 60 | 214 ms |
| drop (clean → field) | 6.7 pp | – | – | – |

Model file size on disk: `model/model.onnx` = 2.29 MB (budget was < 10 MB).
FP32 → INT8 shrinkage: 8.48 MB → 2.29 MB (3.7×). FP32 vs INT8 top-1 parity on the
calibration batch: 1.000.
Per-class precision / recall / F1 on the clean test split:

| Class | precision | recall | F1 |
|---|---|---|---|
| bean_spot | 0.964 | 0.900 | 0.931 |
| cassava_mosaic | 0.968 | 1.000 | 0.984 |
| healthy | 0.906 | 0.967 | 0.935 |
| maize_blight | 1.000 | 0.967 | 0.983 |
| maize_rust | 1.000 | 1.000 | 1.000 |
Latency is measured end-to-end (including preprocessing), single-threaded on CPU
(an RTX 2050 laptop host, but running onnxruntime's CPUExecutionProvider); p95 is
375 ms on the clean split. On the target 2-vCPU district VM described in
ussd_fallback.md, latencies should sit in the same range, well within the
SMS-relay time budget.
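For orientation, the measurement in `scripts/evaluate.py` boils down to the loop below. This is a minimal sketch rather than the shipped script: the 224×224 resize, the plain `/255` normalisation, and the assumption that `model/class_names.json` holds the ordered class list are mine, not guaranteed details of the real pipeline.

```python
# Minimal sketch of the evaluation loop; scripts/evaluate.py is the
# source of truth. Preprocessing constants here are assumptions.
import json
import time
from pathlib import Path

import numpy as np
import onnxruntime as ort
from PIL import Image
from sklearn.metrics import accuracy_score, f1_score

sess = ort.InferenceSession("model/model.onnx", providers=["CPUExecutionProvider"])
class_names = json.loads(Path("model/class_names.json").read_text())
input_name = sess.get_inputs()[0].name

def predict(path: Path) -> tuple[int, float]:
    """Return (predicted class index, end-to-end latency in ms)."""
    t0 = time.perf_counter()
    img = Image.open(path).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32)[None].transpose(0, 3, 1, 2) / 255.0
    logits = sess.run(None, {input_name: x})[0]
    return int(logits.argmax()), (time.perf_counter() - t0) * 1000.0

y_true, y_pred, lat_ms = [], [], []
for cls_dir in sorted(Path("data/mini_plant_set/test").iterdir()):
    for img_path in cls_dir.glob("*.jpg"):
        pred, ms = predict(img_path)
        y_true.append(class_names.index(cls_dir.name))
        y_pred.append(pred)
        lat_ms.append(ms)

print("macro-F1 :", f1_score(y_true, y_pred, average="macro"))
print("accuracy :", accuracy_score(y_true, y_pred))
print("latency  : mean %.0f ms, p95 %.0f ms" % (np.mean(lat_ms), np.percentile(lat_ms, 95)))
```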
## Reproducing the whole run in ≤ 2 commands
```bash
pip install -r requirements.txt
python scripts/run_all.py   # downloads data, trains, quantizes, evaluates, writes metrics.json
```
`scripts/run_all.py` is a thin wrapper that runs the five pipeline scripts
in order. If you'd rather step through them:
```bash
python scripts/download_data.py --out data/mini_plant_set --per-class 300
python scripts/make_field_set.py --src data/mini_plant_set/test --out data/test_field --per-class 12
python scripts/train.py --data data/mini_plant_set --out model --epochs 12
python scripts/quantize_export.py --ckpt model/best_fp32.pt --calib data/mini_plant_set/train --out model/model.onnx
python scripts/evaluate.py --onnx model/model.onnx --clean data/mini_plant_set/test --field data/test_field
```
Colab's free CPU tier is fine: the whole pipeline finishes in ~25 minutes because the dataset is small and MobileNetV2 is tiny.
## Running the service
Native Python:

```bash
uvicorn service.app:app --host 0.0.0.0 --port 8000
```

Docker:

```bash
docker build -f service/Dockerfile -t crop-classifier .
docker run --rm -p 8000:8000 crop-classifier
```
Hit it (macOS / Linux / WSL / Git-Bash):

```bash
curl -X POST -F "image=@service/samples/maize_rust_1.jpg" http://localhost:8000/predict
```
On Windows PowerShell, use `curl.exe` explicitly; the bare `curl` alias
maps to `Invoke-WebRequest` and handles `-F` differently:

```powershell
curl.exe -X POST -F "image=@service/samples/maize_rust_1.jpg" http://localhost:8000/predict
```
A ready-to-run PowerShell companion lives at `service/curl_examples.ps1`
(bash version: `service/curl_examples.sh`).
Sample response (real output from the shipped model on
`service/samples/maize_rust_1.jpg`):

```json
{
"label": "maize_rust",
"confidence": 0.8955,
"top3": [
{"label": "maize_rust", "confidence": 0.8955},
{"label": "cassava_mosaic", "confidence": 0.0421},
{"label": "healthy", "confidence": 0.0233}
],
"latency_ms": 37.4,
"rationale": {
"summary_en": "Dense small rust-coloured pustules consistent with Puccinia sorghi (common rust).",
"summary_fr": "Petites pustules rouille nombreuses, caractΓ©ristiques de Puccinia sorghi (rouille commune).",
"summary_rw": "Udutonga duto tw'ibara ry'uburinga bwinshi, ibimenyetso bya Puccinia sorghi.",
"recommended_action": "Apply Mancozeb 75% WP at 2.5 g/L, 10β14 day interval. Remove volunteer maize nearby.",
"low_confidence": false,
"escalation_hint": null
}
}
```
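For context, the core of that endpoint is only a screenful of code. A minimal sketch of what `service/app.py` does, assuming the same preprocessing as training; the hard-coded class order, the 0.60 low-confidence threshold, and the omission of the rationale block are simplifications (the real service reads `model/class_names.json` and fills in the bilingual rationale):

```python
# Minimal sketch of the /predict endpoint; service/app.py is the real
# implementation. Threshold and class order below are assumptions.
import io
import time

import numpy as np
import onnxruntime as ort
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
sess = ort.InferenceSession("model/model.onnx", providers=["CPUExecutionProvider"])
CLASSES = ["bean_spot", "cassava_mosaic", "healthy", "maize_blight", "maize_rust"]

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

@app.post("/predict")
async def predict(image: UploadFile = File(...)):
    t0 = time.perf_counter()
    img = Image.open(io.BytesIO(await image.read())).convert("RGB").resize((224, 224))
    x = np.asarray(img, dtype=np.float32)[None].transpose(0, 3, 1, 2) / 255.0
    probs = softmax(sess.run(None, {sess.get_inputs()[0].name: x})[0][0])
    order = probs.argsort()[::-1][:3]
    top3 = [{"label": CLASSES[i], "confidence": round(float(probs[i]), 4)} for i in order]
    return {
        "label": top3[0]["label"],
        "confidence": top3[0]["confidence"],
        "top3": top3,
        "latency_ms": round((time.perf_counter() - t0) * 1000.0, 1),
        # The shipped service also builds the rationale block here;
        # low_confidence gates the USSD/SMS escalation path.
        "low_confidence": bool(probs[order[0]] < 0.60),
    }
```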
And a field-noisy example that triggers the low-confidence escalation (this is the pair we demo on camera):
curl -X POST -F "image=@service/samples/maize_blight_field_1.jpg" http://localhost:8000/predict
# -> confidence 0.497, low_confidence=true, escalation_hint prompts a second photo
PowerShell equivalent:

```powershell
curl.exe -X POST -F "image=@service/samples/maize_blight_field_1.jpg" http://localhost:8000/predict
```
Set `RATIONALE_MODE=gradcam` (and keep `model/best_fp32.pt` around) to add a
Grad-CAM heat-map as `rationale.gradcam_png_b64`. It's off by default so
the shipped image stays small.
## Repo map
```text
.
├── README.md                 <- this file
├── SIGNED.md                 <- honor-code sign-off
├── process_log.md            <- hour-by-hour timeline + LLM declarations
├── ussd_fallback.md          <- Product & Business adaptation artefact
├── requirements.txt          <- training + eval deps
├── scripts/
│   ├── download_data.py      <- builds mini_plant_set (HF + procedural fallback)
│   ├── make_field_set.py     <- builds test_field (blur / JPEG / brightness)
│   ├── train.py              <- MobileNetV2 fine-tuning
│   ├── quantize_export.py    <- static INT8 ONNX export (< 10 MB)
│   ├── evaluate.py           <- macro-F1 on clean + field splits
│   └── run_all.py            <- one-shot pipeline wrapper (≤ 2-command repro)
├── service/
│   ├── app.py                <- FastAPI /predict
│   ├── Dockerfile            <- CPU-only image (~350 MB)
│   ├── requirements-service.txt
│   ├── curl_examples.sh      <- bash example calls
│   ├── curl_examples.ps1     <- PowerShell example calls
│   └── samples/              <- example images used in the demo
├── model/
│   ├── model.onnx            <- shipped INT8 artefact
│   ├── best_fp32.pt          <- (git-ignored) FP32 checkpoint, for Grad-CAM
│   ├── class_names.json
│   ├── metrics.json          <- evaluation report
│   ├── quant_report.json     <- quantization parity numbers
│   └── train_metrics.json    <- training curves + per-class breakdown
├── data/                     <- (git-ignored) regenerated in < 2 min
│   ├── mini_plant_set/{train,val,test}/{class}/*.jpg
│   └── test_field/{class}/*.jpg
└── T2.1_Compressed_Crop_Disease_Classifier.pdf   <- original brief (reference)
```
## Technical choices, in one screen
- **Backbone.** MobileNetV2 (width = 1.0). I ran a three-way bake-off
  against the candidates in the brief, and only MobileNetV2 survives both
  the size budget (< 10 MB) and the accuracy floor (≥ 0.80 macro-F1)
  after INT8 quantisation. Full experiment record in
  `model/experiments/README.md`; short version: MobileNetV3-Small INT8 →
  clean F1 0.393 (hard-swish breaks PTQ); EfficientNet-B0 INT8 → clean F1
  0.690 and its FP32 checkpoint is 15.3 MB (over budget); MobileNetV2 INT8 →
  clean F1 0.967 at 2.29 MB. ReLU6 is the only activation function in the
  candidate set that survives onnxruntime's INT8 rescalers cleanly.
- **Fine-tuning.** Two-phase: 2 epochs with the backbone frozen (head only)
  to stabilise gradients, then 10 epochs fully unfrozen at LR × 0.3. AdamW,
  cosine schedule, label smoothing 0.05.
- **Augmentation.** Flip, 15° rotation, mild ColorJitter. Intentionally
  conservative: the `test_field` split already provides a hard robustness probe.
- **Quantization.** ONNX Runtime static PTQ (QDQ format, per-channel INT8
  weights, INT8 activations). Calibration on 128 random train images. Parity
  check: top-1 agreement ≥ 0.98 vs FP32 on the calibration batch. A sketch of
  this step follows the list.
- **Serving.** onnxruntime inside FastAPI; no torch in the Docker image, so
  the container weighs ~350 MB. Grad-CAM is a lazy, opt-in path.
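For reference, the quantization step follows onnxruntime's standard static-PTQ recipe. A minimal sketch of what `scripts/quantize_export.py` does; the FP32 export path, the input tensor name `"input"`, and the preprocessing are assumptions:

```python
# Minimal sketch of static INT8 PTQ in the QDQ format;
# scripts/quantize_export.py is the source of truth.
from pathlib import Path

import numpy as np
from PIL import Image
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class ImageFolderReader(CalibrationDataReader):
    """Feeds a random sample of train images to the calibrator."""
    def __init__(self, folder: str, input_name: str, n: int = 128):
        paths = list(Path(folder).rglob("*.jpg"))
        rng = np.random.default_rng(0)
        picks = rng.choice(paths, size=min(n, len(paths)), replace=False)
        self.batches = iter({input_name: self._load(p)} for p in picks)

    @staticmethod
    def _load(path: Path) -> np.ndarray:
        img = Image.open(path).convert("RGB").resize((224, 224))
        return np.asarray(img, dtype=np.float32)[None].transpose(0, 3, 1, 2) / 255.0

    def get_next(self):
        return next(self.batches, None)

quantize_static(
    model_input="model/model_fp32.onnx",   # FP32 ONNX export of best_fp32.pt (assumed path)
    model_output="model/model.onnx",
    calibration_data_reader=ImageFolderReader("data/mini_plant_set/train", "input"),
    quant_format=QuantFormat.QDQ,
    per_channel=True,                      # per-channel scales for the weights
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
)
```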
Field robustness & the question "which augmentation next?"
Clean → field drop is 6.7 pp (0.967 → 0.899), comfortably under the
12 pp robustness-bonus threshold in the brief. Where it does cost us is
bean_spot recall (0.90 → 0.75): JPEG compression at q=50–65 blurs the
small angular spots just enough that the model tips them over into
healthy. Three of the four misses in the field-noisy bean_spot bucket go
to healthy, which matches the clean-test confusion pattern (the only
bean_spot misses on the clean test also went to healthy).
The single augmentation I would add next to close the gap is a RandomJPEG
re-compression transform (quality 40–80) during training: it is the exact
distortion our test_field generator applies, and color-jitter alone does not
teach the backbone to tolerate JPEG block artefacts. I deliberately did not
add it to the submitted run, so the reported drop is an honest measurement
against a held-out robustness probe rather than a number I trained toward.
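For concreteness, such a transform is a few lines on top of PIL. A hypothetical `RandomJPEG` sketch matching the quality range above; it would slot into the torchvision transform pipeline before `ToTensor`:

```python
# Hypothetical RandomJPEG augmentation (not in the submitted run, as
# explained above): re-encode at a random JPEG quality so the model
# learns to tolerate block artefacts.
import io
import random

from PIL import Image

class RandomJPEG:
    def __init__(self, quality=(40, 80), p: float = 0.5):
        self.quality, self.p = quality, p

    def __call__(self, img: Image.Image) -> Image.Image:
        if random.random() > self.p:
            return img
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(*self.quality))
        buf.seek(0)
        return Image.open(buf).convert("RGB")
```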
## Grad-CAM rationale (stretch goal)
Set `RATIONALE_MODE=gradcam` before starting the service and the
`/predict` response picks up an extra `gradcam_png_b64` field: a
base64-encoded PNG overlay showing where the model "looked" on the leaf.
This is the asset an extension officer can show the farmer on a tablet
when they need to justify the diagnosis visually.
```bash
RATIONALE_MODE=gradcam uvicorn service.app:app --host 0.0.0.0 --port 8000
# A sample Grad-CAM PNG lives at service/samples/_gradcam_maize_rust.png
```
Grad-CAM is off by default because enabling it requires torch inside
the Docker image (≈ 1.1 GB vs the ≈ 350 MB lean image). See the "hardest
decision" note in process_log.md. The FP32 checkpoint `model/best_fp32.pt`
needed for Grad-CAM is not committed (git-ignored because of its size); run
`python scripts/run_all.py` once to rebuild it locally.
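For the curious, the Grad-CAM path reduces to a forward/backward hook pair on the last convolutional block. A minimal sketch assuming the FP32 checkpoint is a torchvision MobileNetV2; the actual opt-in path in `service/app.py` may differ in detail:

```python
# Minimal Grad-CAM sketch for a torchvision MobileNetV2 checkpoint
# (assumed architecture; the served implementation may differ).
import base64
import io

import numpy as np
import torch
from PIL import Image
from torchvision.models import mobilenet_v2

model = mobilenet_v2(num_classes=5)
model.load_state_dict(torch.load("model/best_fp32.pt", map_location="cpu"))
model.eval()

acts, grads = {}, {}
layer = model.features[-1]  # last conv block, 7x7 spatial resolution
layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

def gradcam_png_b64(x: torch.Tensor) -> str:
    """x: (1, 3, 224, 224) float tensor -> base64-encoded PNG heat-map."""
    logits = model(x)
    logits[0, logits.argmax()].backward()            # gradient of the top class
    w = grads["g"].mean(dim=(2, 3), keepdim=True)    # channel importance weights
    cam = torch.relu((w * acts["a"]).sum(dim=1))[0]  # weighted activation map
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    heat = Image.fromarray((cam.detach().numpy() * 255).astype(np.uint8)).resize((224, 224))
    buf = io.BytesIO()
    heat.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()
```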
Honor code & LLM declarations
See `SIGNED.md` for the signed honor code and `process_log.md` for the
full hour-by-hour timeline, plus the sample prompts I sent the assistant
and the one I discarded.