
T2.1 – Compressed Crop Disease Classifier

AIMS KTT Hackathon · Tier 2 · AgriTech / Computer Vision / Quantization / Serving

A compact MobileNetV2 classifier covering five maize/cassava/bean leaf classes (four diseases plus healthy), quantized to INT8 ONNX (< 10 MB) and served through a FastAPI /predict endpoint, with a Kinyarwanda + French USSD/SMS fallback for feature-phone farmers.


Submission artefacts

Artefact               Location
Source repo            this repository
Model (INT8 ONNX)      model/model.onnx (also mirrored on Hugging Face; see link below)
Dataset                regenerated by scripts/download_data.py (≤ 2 minutes on a laptop)
Service                service/app.py + service/Dockerfile
Product artefact       ussd_fallback.md
Process log            process_log.md
Honor-code sign-off    SIGNED.md
4-minute video         see link at the bottom of this README
  • Model hub: https://huggingface.co/<your-handle>/t2.1-crop-disease-mbv3s-int8 (update before submission)
  • Video: https://youtu.be/<your-unlisted-id> (update before submission)

Results

Source-of-truth numbers live in model/metrics.json (produced by scripts/evaluate.py). Snapshot of the submitted run:

Split                  macro-F1   accuracy   # images   latency mean (CPU)
clean test             0.967      0.967      150        225 ms
field-noisy test       0.899      0.900      60         214 ms
drop (clean → field)   6.7 pp     –          –          –
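
For reference, macro-F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many test images it has. A toy illustration of how the two headline numbers are computed (scripts/evaluate.py is the source of truth):

from sklearn.metrics import accuracy_score, f1_score

y_true = ["maize_rust", "healthy", "bean_spot", "maize_rust"]
y_pred = ["maize_rust", "healthy", "healthy",   "maize_rust"]
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-class F1
print("accuracy:", accuracy_score(y_true, y_pred))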

Model file size on disk: model/model.onnx = 2.29 MB (budget was < 10 MB). FP32 → INT8 shrinkage: 8.48 MB → 2.29 MB (3.7×). FP32↔INT8 top-1 parity on the calibration batch: 1.000.

Per-class F1 on the clean test split:

Class            precision   recall   F1
bean_spot        0.964       0.900    0.931
cassava_mosaic   0.968       1.000    0.984
healthy          0.906       0.967    0.935
maize_blight     1.000       0.967    0.983
maize_rust       1.000       1.000    1.000

Latency is measured single-threaded on CPU (a laptop host that happens to have an RTX 2050; inference runs on CPUExecutionProvider only), end-to-end including preprocessing; p95 is 375 ms on the clean split. On the target 2-vCPU district VM described in ussd_fallback.md, latencies should sit in the same range, which is well within the SMS-relay time budget.
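
If you want to sanity-check the latency numbers in isolation, a minimal single-threaded probe might look like the sketch below. It times the ONNX session only (the reported numbers also include preprocessing), and the 224×224 input shape is an assumption:

import time
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 1            # single-threaded, as in the reported numbers
sess = ort.InferenceSession("model/model.onnx", sess_options=opts,
                            providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # input shape assumed

times = []
for _ in range(50):
    t0 = time.perf_counter()
    sess.run(None, {inp: x})
    times.append((time.perf_counter() - t0) * 1000)
print(f"mean {np.mean(times):.1f} ms | p95 {np.percentile(times, 95):.1f} ms")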


Reproducing the whole run in ≤ 2 commands

pip install -r requirements.txt
python scripts/run_all.py        # downloads data, trains, quantizes, evaluates, writes metrics.json

scripts/run_all.py is a thin wrapper that runs the five pipeline scripts in order. If you'd rather step through them:

python scripts/download_data.py --out data/mini_plant_set --per-class 300
python scripts/make_field_set.py --src data/mini_plant_set/test --out data/test_field --per-class 12
python scripts/train.py          --data data/mini_plant_set --out model --epochs 12
python scripts/quantize_export.py --ckpt model/best_fp32.pt --calib data/mini_plant_set/train --out model/model.onnx
python scripts/evaluate.py       --onnx model/model.onnx --clean data/mini_plant_set/test --field data/test_field

Colab's free CPU tier is fine: the whole pipeline finishes in ~25 minutes because the dataset is small and MobileNetV2 is tiny.
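
For orientation, the wrapper amounts to running the five commands above under subprocess. An illustrative sketch (the shipped scripts/run_all.py is authoritative):

import subprocess
import sys

PIPELINE = [
    ["scripts/download_data.py", "--out", "data/mini_plant_set", "--per-class", "300"],
    ["scripts/make_field_set.py", "--src", "data/mini_plant_set/test",
     "--out", "data/test_field", "--per-class", "12"],
    ["scripts/train.py", "--data", "data/mini_plant_set", "--out", "model", "--epochs", "12"],
    ["scripts/quantize_export.py", "--ckpt", "model/best_fp32.pt",
     "--calib", "data/mini_plant_set/train", "--out", "model/model.onnx"],
    ["scripts/evaluate.py", "--onnx", "model/model.onnx",
     "--clean", "data/mini_plant_set/test", "--field", "data/test_field"],
]
for step in PIPELINE:
    subprocess.run([sys.executable, *step], check=True)   # abort on first failure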


Running the service

Native Python:

uvicorn service.app:app --host 0.0.0.0 --port 8000

Docker:

docker build -f service/Dockerfile -t crop-classifier .
docker run --rm -p 8000:8000 crop-classifier

Hit it (macOS / Linux / WSL / Git-Bash):

curl -X POST -F "image=@service/samples/maize_rust_1.jpg" http://localhost:8000/predict

On Windows PowerShell, use curl.exe explicitly: the bare curl alias maps to Invoke-WebRequest and handles -F differently:

curl.exe -X POST -F "image=@service/samples/maize_rust_1.jpg" http://localhost:8000/predict

A ready-to-run PowerShell companion lives at service/curl_examples.ps1 (bash version: service/curl_examples.sh).
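
If you'd rather call the endpoint from Python than curl, a minimal client sketch (assumes the service is up on localhost:8000 and uses the same image form field as the curl examples):

import requests

URL = "http://localhost:8000/predict"
with open("service/samples/maize_rust_1.jpg", "rb") as f:
    resp = requests.post(URL, files={"image": f})   # same multipart field as curl -F "image=@..."
resp.raise_for_status()
body = resp.json()
print(body["label"], body["confidence"])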

Sample response (real output from the shipped model on service/samples/maize_rust_1.jpg):

{
  "label": "maize_rust",
  "confidence": 0.8955,
  "top3": [
    {"label": "maize_rust",     "confidence": 0.8955},
    {"label": "cassava_mosaic", "confidence": 0.0421},
    {"label": "healthy",        "confidence": 0.0233}
  ],
  "latency_ms": 37.4,
  "rationale": {
    "summary_en": "Dense small rust-coloured pustules consistent with Puccinia sorghi (common rust).",
    "summary_fr": "Petites pustules rouille nombreuses, caractΓ©ristiques de Puccinia sorghi (rouille commune).",
    "summary_rw": "Udutonga duto tw'ibara ry'uburinga bwinshi, ibimenyetso bya Puccinia sorghi.",
    "recommended_action": "Apply Mancozeb 75% WP at 2.5 g/L, 10–14 day interval. Remove volunteer maize nearby.",
    "low_confidence": false,
    "escalation_hint": null
  }
}

And a field-noisy example that triggers the low-confidence escalation (this is the pair we demo on camera):

curl -X POST -F "image=@service/samples/maize_blight_field_1.jpg" http://localhost:8000/predict
# -> confidence 0.497, low_confidence=true, escalation_hint prompts a second photo

PowerShell equivalent:

curl.exe -X POST -F "image=@service/samples/maize_blight_field_1.jpg" http://localhost:8000/predict

Set RATIONALE_MODE=gradcam (and keep model/best_fp32.pt around) to add a Grad-CAM heat-map as rationale.gradcam_png_b64. It's off by default so the shipped image stays small.


Repo map

.
├── README.md                 <- this file
├── SIGNED.md                 <- honor-code sign-off
├── process_log.md            <- hour-by-hour timeline + LLM declarations
├── ussd_fallback.md          <- Product & Business adaptation artefact
├── requirements.txt          <- training + eval deps
├── scripts/
│   ├── download_data.py      <- builds mini_plant_set (HF + procedural fallback)
│   ├── make_field_set.py     <- builds test_field (blur / JPEG / brightness)
│   ├── train.py              <- MobileNetV2 fine-tuning
│   ├── quantize_export.py    <- static INT8 ONNX export (< 10 MB)
│   ├── evaluate.py           <- macro-F1 on clean + field splits
│   └── run_all.py            <- one-shot pipeline wrapper (≤ 2-command repro)
├── service/
│   ├── app.py                <- FastAPI /predict
│   ├── Dockerfile            <- CPU-only image (~350 MB)
│   ├── requirements-service.txt
│   ├── curl_examples.sh      <- bash example calls
│   ├── curl_examples.ps1     <- PowerShell example calls
│   └── samples/              <- example images used in the demo
├── model/
│   ├── model.onnx            <- shipped INT8 artefact
│   ├── best_fp32.pt          <- (git-ignored) FP32 checkpoint, for Grad-CAM
│   ├── class_names.json
│   ├── metrics.json          <- evaluation report
│   ├── quant_report.json     <- quantization parity numbers
│   └── train_metrics.json    <- training curves + per-class breakdown
├── data/                     <- (git-ignored) regenerated in < 2 min
│   ├── mini_plant_set/{train,val,test}/{class}/*.jpg
│   └── test_field/{class}/*.jpg
└── T2.1_Compressed_Crop_Disease_Classifier.pdf   <- original brief (reference)

Technical choices, in one screen

  • Backbone. MobileNetV2 (width=1.0). I ran a three-way bake-off against the candidates in the brief, and only MobileNetV2 survives both the size budget (< 10 MB) and the accuracy floor (≥ 0.80 macro-F1) after INT8 quantization. Full experiment record in model/experiments/README.md; short version: MobileNetV3-Small INT8 → clean F1 0.393 (hard-swish breaks PTQ); EfficientNet-B0 INT8 → clean F1 0.690, and its FP32 is 15.3 MB (over budget); MobileNetV2 INT8 → clean F1 0.967 at 2.29 MB. ReLU6 is the only activation in the candidate set that survives onnxruntime's INT8 rescalers cleanly.
  • Fine-tuning. Two-phase: 2 epochs with the backbone frozen (head-only) to stabilise gradients, then 10 epochs fully unfrozen at LR×0.3. AdamW, cosine schedule, label smoothing 0.05.
  • Augmentation. Flip, 15° rotation, mild ColorJitter. Intentionally conservative: the test_field split already provides a hard robustness probe.
  • Quantization. ONNX Runtime static PTQ (QDQ format, per-channel INT8 weights and activations), sketched just after this list. Calibration on 128 random train images. Parity check: top-1 agreement ≥ 0.98 vs FP32 on the calibration batch.
  • Serving. onnxruntime inside FastAPI; no torch in the Docker image, so the container weighs ~350 MB. Grad-CAM is a lazy, opt-in path.
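
A condensed sketch of what static QDQ PTQ with onnxruntime looks like. The FP32 model path, the input name "input", and the resize-only preprocessing are illustrative assumptions; scripts/quantize_export.py is the real implementation:

import glob
import numpy as np
from PIL import Image
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)

class FolderCalibReader(CalibrationDataReader):
    # Streams up to n preprocessed train images into the calibrator.
    def __init__(self, folder, input_name, n=128, size=224):
        paths = sorted(glob.glob(f"{folder}/**/*.jpg", recursive=True))[:n]
        self._batches = iter({input_name: self._load(p, size)} for p in paths)

    @staticmethod
    def _load(path, size):
        img = Image.open(path).convert("RGB").resize((size, size))
        x = np.asarray(img, dtype=np.float32) / 255.0  # real normalization may differ
        return x.transpose(2, 0, 1)[None]              # NCHW, batch of 1

    def get_next(self):
        return next(self._batches, None)

quantize_static(
    "model/model_fp32.onnx",                                  # assumed FP32 export path
    "model/model.onnx",
    FolderCalibReader("data/mini_plant_set/train", "input"),  # input name assumed
    quant_format=QuantFormat.QDQ,
    per_channel=True,
    weight_type=QuantType.QInt8,
    activation_type=QuantType.QInt8,
)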

Field robustness & the question "which augmentation next?"

Clean → field drop is 6.7 pp (0.967 → 0.899), comfortably under the 12 pp robustness-bonus threshold in the brief. Where it does cost us is bean_spot recall (0.90 → 0.75): JPEG compression at q=50–65 blurs the small angular spots just enough that the model overruns onto healthy. Three of the four misses in the field-noisy bean_spot bucket go to healthy, which matches the clean-test confusion pattern (the only bean_spot misses on the clean test also went to healthy).

The single augmentation I would add next to close the gap is a RandomJPEG compression transform (quality 40–80) during training, sketched below: it is the exact distortion our test_field generator applies, and color jitter alone does not teach the backbone to tolerate JPEG block artefacts. I deliberately did not add it to the submitted run, so the reported drop is an honest measurement against a held-out robustness probe rather than a number I trained toward.
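
A minimal sketch of such a transform, assuming a PIL-based torchvision pipeline (this is not in the submitted training script):

import io
import random
from PIL import Image

class RandomJPEG:
    # Re-encodes a PIL image as JPEG at a random quality to simulate compression artefacts.
    def __init__(self, quality=(40, 80), p=0.5):
        self.quality = quality
        self.p = p

    def __call__(self, img: Image.Image) -> Image.Image:
        if random.random() > self.p:
            return img
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(*self.quality))
        buf.seek(0)
        return Image.open(buf).convert("RGB")

# Drops in before ToTensor() in a torchvision transform pipeline.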

Grad-CAM rationale (stretch goal)

Set RATIONALE_MODE=gradcam before starting the service and the /predict response picks up an extra gradcam_png_b64 field: a base64-encoded PNG overlay showing where the model "looked" on the leaf. This is the asset an extension officer can show the farmer on a tablet when they need to justify the diagnosis visually.

RATIONALE_MODE=gradcam uvicorn service.app:app --host 0.0.0.0 --port 8000
# A sample Grad-CAM PNG lives at service/samples/_gradcam_maize_rust.png

Grad-CAM is off by default because enabling it requires torch inside the Docker image (≈ 1.1 GB vs the ≈ 350 MB lean image). See the "hardest decision" note in process_log.md. The FP32 best_fp32.pt needed for Grad-CAM is not committed (git-ignored because of its size); run python scripts/run_all.py once to rebuild it locally.
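
Client-side, the overlay is just a base64 string to decode. A small sketch using the response fields documented above:

import base64
import requests

with open("service/samples/maize_rust_1.jpg", "rb") as f:
    r = requests.post("http://localhost:8000/predict", files={"image": f}).json()

png_b64 = r["rationale"].get("gradcam_png_b64")   # present only when RATIONALE_MODE=gradcam
if png_b64:
    with open("gradcam_overlay.png", "wb") as out:
        out.write(base64.b64decode(png_b64))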


Honor code & LLM declarations

See SIGNED.md for the signed honor code and process_log.md for the full hour-by-hour timeline plus the sample prompts I sent the assistant and the one I discarded.
