MobileViT-S Garbage Classifier
Binary classification model for filtering objectively bad frames (black, blurry, uniform-color, low-detail) out of anime video preprocessing pipelines.
Model Details
- Architecture: MobileViT-S
- Parameters: 4.94M
- Model Size: 20MB
- Input Size: 256×256
- Classes: [quality, garbage]
Performance
With default threshold (0.5):
- Accuracy: 93.46%
- Precision: 92.24%
- Recall: 95.47%
- F1-Score: 93.83%
With optimal threshold (0.7115):
- Accuracy: 93.62%
- Precision: 93.92%
- Recall: 93.82%
- F1-Score: 93.87%
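For reference, the four metrics above can be recomputed from per-frame scores and labels at any threshold. The sketch below is a minimal, dependency-free illustration. The function name `metrics_at_threshold`, the toy scores, and the convention that label 1 marks the positive class are assumptions made for this example; they are not taken from the evaluation code behind these numbers.

```python
def metrics_at_threshold(scores, labels, threshold=0.5):
    """Compute accuracy, precision, recall and F1 for binary scores.

    scores: predicted probability of the positive class, one per frame.
    labels: 1 for the positive class, 0 otherwise.
    A frame is predicted positive when its score is strictly above threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score > threshold
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 1:
            fn += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data, made up purely for illustration
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(metrics_at_threshold(scores, labels, threshold=0.5))
print(metrics_at_threshold(scores, labels, threshold=0.7115))
```

On the toy data, raising the threshold removes the single false positive, so precision rises while recall stays the same. That is the same kind of trade-off the two result blocks above describe.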
Usage
import torch
import timm
from torchvision import transforms
from PIL import Image

# Pick a device; fall back to CPU when CUDA is unavailable
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load model and move it to the same device as the input
model = timm.create_model('mobilevit_s', num_classes=2, pretrained=False)
model.load_state_dict(torch.load('pytorch_model.bin', map_location=device))
model.to(device).eval()

# Prepare image (resize to the 256x256 input size, ImageNet mean/std normalization)
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
img = transform(Image.open('frame.webp').convert('RGB')).unsqueeze(0).to(device)

# Predict
with torch.no_grad():
    logits = model(img)
    probs = torch.softmax(logits, dim=1)
garbage_prob = probs[0, 0].item()  # Class 0 = garbage

# Decision
is_garbage = garbage_prob > 0.7115  # Use optimal threshold
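In a preprocessing pipeline the same decision is usually applied to many frames at once. The following is a minimal sketch of that step, assuming a scoring function is passed in. The names `split_frames` and `garbage_prob_fn` are hypothetical and are not part of this model or of any library; in practice `garbage_prob_fn` would wrap the model call shown above.

```python
from typing import Callable, Iterable, List, Tuple


def split_frames(
    paths: Iterable[str],
    garbage_prob_fn: Callable[[str], float],
    threshold: float = 0.7115,
) -> Tuple[List[str], List[str]]:
    """Split frame paths into (keep, drop) using a garbage-probability threshold.

    garbage_prob_fn is a hypothetical callable that returns the model's
    garbage probability for one frame path.
    """
    keep: List[str] = []
    drop: List[str] = []
    for path in paths:
        if garbage_prob_fn(path) > threshold:
            drop.append(path)
        else:
            keep.append(path)
    return keep, drop


# Stand-in scorer with made-up probabilities, so the sketch runs without a GPU
fake_probs = {"a.webp": 0.02, "b.webp": 0.91, "c.webp": 0.70}
keep, drop = split_frames(fake_probs, fake_probs.get)
print(keep)  # ['a.webp', 'c.webp']
print(drop)  # ['b.webp']
```

Passing the scorer in as a callable keeps the filtering logic independent of how frames are loaded or batched.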
Training Data
- Total frames: 12,440
- Training: 10,574 frames
- Validation: 1,866 frames (895 garbage, 971 quality)
- Labeling: Verified via reverse-engineered frame matching
Garbage Detection
Filters frames with:
- Solid black/white/uniform color (33%)
- No edge patterns (33%)
- Low detail content (16%)
- Extreme outliers (15%)
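To make the first two categories concrete, the sketch below computes two simple statistics on a grayscale frame: the spread of pixel intensities (near zero for a solid-color frame) and the mean difference between neighbouring pixels (near zero when there are no edges). This is only an illustration of what such frames look like numerically. It is not the labeling procedure used for this dataset and not what the model computes internally; the function names and the `max_spread` cutoff are assumptions chosen for the example.

```python
from statistics import pstdev


def frame_stats(gray):
    """Return (intensity spread, mean horizontal edge strength).

    gray: 2D list of 0-255 intensities, all rows the same length.
    """
    pixels = [p for row in gray for p in row]
    spread = pstdev(pixels)
    # Mean absolute difference between horizontally adjacent pixels
    diffs = [abs(row[x + 1] - row[x])
             for row in gray for x in range(len(row) - 1)]
    edge_strength = sum(diffs) / len(diffs) if diffs else 0.0
    return spread, edge_strength


def looks_uniform(gray, max_spread=2.0):
    """Heuristic: treat a frame as uniform when intensities barely vary.

    max_spread is an illustrative cutoff, not a tuned value.
    """
    spread, _ = frame_stats(gray)
    return spread <= max_spread


# Two synthetic 8x8 frames: solid black and a high-contrast checkerboard
black = [[0] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
print(looks_uniform(black), frame_stats(black))
print(looks_uniform(checker), frame_stats(checker))
```

Hand-written checks like these only cover the simplest cases, which is one reason a learned classifier is useful for the remaining categories.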
Threshold Recommendations
- Default (0.5): Good starting point, slightly higher recall
- Optimal (0.7115): Best F1-score, balanced precision/recall
- High precision (0.75-0.80): Reduce false positives
- High recall (0.60-0.65): Catch more garbage, accept more false positives
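A threshold like the ones above can be re-tuned on your own labeled frames by sweeping candidate values and keeping the one with the highest F1. The sketch below shows one simple way to do that. It assumes scores are garbage probabilities and that label 1 means garbage; the function names, the 0.01 step grid, and the toy data are illustrative assumptions, not the procedure used to obtain 0.7115.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 for the positive class when predicting positive if score > threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels, candidates=None):
    """Return (threshold, f1) with the highest F1 among the candidates.

    Ties are resolved in favour of the earliest (lowest) candidate.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    best_t, best_f1 = None, -1.0
    for t in candidates:
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy validation data, made up for illustration (1 = garbage)
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
threshold, f1 = best_f1_threshold(scores, labels)
print(threshold, round(f1, 4))
```

To favour precision or recall instead of F1, the same sweep can be run with a different selection criterion, for example the lowest threshold that reaches a target precision.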
License
MIT