---
license: mit
tags:
  - image-classification
  - quality-assessment
  - codec-corruption
  - mobilenet
library_name: pytorch
pipeline_tag: image-classification
---

# codec-corruption-classifier

Binary image classifier that predicts whether a video frame contains severe **codec corruption**: block tearing, frame freezes, or other compression-artifact damage typically seen in low-bandwidth WiFi video streams (e.g. the DJI Tello drone video feed).

Trained on hand-labelled frames from indoor drone-mapping footage. Intended as a preprocessing filter for downstream SfM / 3D reconstruction pipelines, where a single severely corrupted frame can pollute feature matching.

## Architecture

- Backbone: `torchvision.models.mobilenet_v3_small` (ImageNet-pretrained, IMAGENET1K_V1)
- Head: replace the final `nn.Linear` in `classifier` with `nn.Linear(in_features, 1)`
- Output: single logit; apply `torch.sigmoid` for P(corrupted)
- Suggested threshold: `0.5`
- Params: 1.5M
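
As a rough sanity check on the parameter count, the arithmetic below swaps a stock 1000-class head for the single-logit head. It assumes the torchvision-documented total of 2,542,856 parameters for `mobilenet_v3_small` and a final layer of `nn.Linear(1024, 1000)`; neither figure comes from this card, so treat the constants as assumptions and check them against your torchvision version.

```python
# Assumed figures for stock torchvision mobilenet_v3_small (not stated in this card).
STOCK_TOTAL_PARAMS = 2_542_856   # assumed total with the 1000-class ImageNet head
HEAD_IN_FEATURES = 1024          # assumed in_features of the final nn.Linear
IMAGENET_CLASSES = 1000


def linear_params(in_features: int, out_features: int) -> int:
    """Parameter count of a fully connected layer: weights plus biases."""
    return in_features * out_features + out_features


def binary_head_total() -> int:
    """Total parameters after replacing the final layer with nn.Linear(in_features, 1)."""
    stock_head = linear_params(HEAD_IN_FEATURES, IMAGENET_CLASSES)
    binary_head = linear_params(HEAD_IN_FEATURES, 1)
    return STOCK_TOTAL_PARAMS - stock_head + binary_head


print(f"{binary_head_total() / 1e6:.2f}M parameters")
```

Under these assumptions the total comes to roughly 1.52M, which is consistent with the figure listed above.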

## Preprocessing

Frames are letterboxed to 224×224 (the longer side is scaled to 224 px with the aspect ratio preserved, and the remainder is padded with black), then normalized with ImageNet mean and standard deviation.

```python
from PIL import Image

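def letterbox(img: Image.Image, size: int = 224) -> Image.Image:
    """Fit `img` inside a size×size square, preserving aspect ratio, and pad with black."""
    w, h = img.size
    # Scale factor that maps the longer side to `size`.
    scale = size / max(w, h)
    new_w, new_h = int(w * scale), int(h * scale)
    img = img.resize((new_w, new_h), Image.BILINEAR)
    # Center the resized frame on a black canvas.
    padded = Image.new("RGB", (size, size), (0, 0, 0))
    padded.paste(img, ((size - new_w) // 2, (size - new_h) // 2))
    return padded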
```
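
To make the letterbox geometry concrete, the sketch below reproduces only the arithmetic of `letterbox`, without touching pixels. `letterbox_geometry` is a hypothetical helper introduced for illustration, and the 960×720 input is an assumed example resolution, not something this card specifies.

```python
def letterbox_geometry(w: int, h: int, size: int = 224) -> tuple[int, int, int, int]:
    """Return (new_w, new_h, offset_x, offset_y) using the same arithmetic as `letterbox`."""
    scale = size / max(w, h)
    new_w, new_h = int(w * scale), int(h * scale)
    return new_w, new_h, (size - new_w) // 2, (size - new_h) // 2


# Assumed example: a 960x720 landscape frame.
new_w, new_h, off_x, off_y = letterbox_geometry(960, 720)
print(new_w, new_h, off_x, off_y)  # 224 168 0 28
```

In this example the frame fills the full width, and 28 rows of black padding sit above and below it.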

## Usage

```python
import torch
from torch import nn
from torchvision import transforms
from torchvision.models import mobilenet_v3_small
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from PIL import Image

weights_path = hf_hub_download(
    repo_id="callum-sh/codec-corruption-classifier",
    filename="model.safetensors",
)
state = load_file(weights_path)

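# Rebuild the architecture with a single-logit head, then load the trained weights.
model = mobilenet_v3_small(weights=None)
in_features = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_features, 1)
model.load_state_dict(state)
model.eval()  # inference mode: disables dropout, uses BatchNorm running statistics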

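# `letterbox` is the helper defined in the Preprocessing section above.
tx = transforms.Compose([
    transforms.Lambda(lambda im: letterbox(im, 224)),
    transforms.ToTensor(),
    # ImageNet per-channel mean and standard deviation.
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])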

img = Image.open("frame.jpg").convert("RGB")
with torch.no_grad():
    logit = model(tx(img).unsqueeze(0))
    p_corrupted = torch.sigmoid(logit).item()
print(f"P(corrupted) = {p_corrupted:.3f}")
```
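
Because the sigmoid is monotonic, thresholding `P(corrupted)` is equivalent (up to floating-point rounding) to thresholding the raw logit, which can be handy when scoring many frames. The sketch below shows the conversion; `logit_threshold` and `is_corrupted` are hypothetical helper names, not part of this repository.

```python
import math


def logit_threshold(p_threshold: float) -> float:
    """Logit value at which sigmoid(logit) equals `p_threshold` (0 < p_threshold < 1)."""
    return math.log(p_threshold / (1.0 - p_threshold))


def is_corrupted(logit: float, p_threshold: float = 0.5) -> bool:
    """Same decision as sigmoid(logit) > p_threshold, without computing the sigmoid."""
    return logit > logit_threshold(p_threshold)


print(logit_threshold(0.5))  # 0.0
```

With the suggested threshold of `0.5`, a frame is flagged exactly when its logit is positive.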

## Intended use

Filter out corrupted frames before structure-from-motion (SfM). Frames with `P(corrupted) > 0.5` should be excluded from the SfM input set.
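
The filtering rule can be sketched as a small helper that splits already-scored frames into kept and dropped sets. `split_frames`, the file names, and the scores are hypothetical; the sketch assumes each frame's `P(corrupted)` has been computed as in the Usage section.

```python
def split_frames(
    scores: dict[str, float], threshold: float = 0.5
) -> tuple[list[str], list[str]]:
    """Split frames into (kept, dropped); a frame is dropped when P(corrupted) > threshold."""
    kept = [name for name, p in scores.items() if p <= threshold]
    dropped = [name for name, p in scores.items() if p > threshold]
    return kept, dropped


# Hypothetical scores, for illustration only.
scores = {"frame_0001.jpg": 0.02, "frame_0002.jpg": 0.91, "frame_0003.jpg": 0.47}
kept, dropped = split_frames(scores)
print(kept)     # ['frame_0001.jpg', 'frame_0003.jpg']
print(dropped)  # ['frame_0002.jpg']
```

Raising `threshold` keeps more frames at the risk of letting corrupted ones through; lowering it does the reverse.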

Not intended as a general-purpose image-quality predictor: it specifically targets *codec* artifacts, not blur, exposure problems, or motion noise.