Anime Frame Interesting Classifier (CNN v2.0)
Model Details
Architecture: EfficientNetV2-S (CNN)
Framework: PyTorch
Input Size: 224x224 RGB images
Output: Binary classification (Boring/Interesting)
Performance
Evaluated on v2.0 test set (433 frames):
- F1 Score: 95.15%
- Accuracy: 95.15%
- Precision: 95.16%
- Recall: 95.15%
Training data: 3,999 frames for training, 433 held out for testing (4,432 total)
Intended Use
What it does: Classifies anime frames as either "interesting" (depicting meaningful character/scene details) or "boring" (back-of-head shots, nondescript backgrounds, montages).
When to use:
- Filtering extracted anime keyframes for quality
- Pre-scoring large frame datasets before manual review
- Ensemble voting with the transformer model for higher-confidence decisions
When NOT to use:
- Real-world photos or non-anime content
- Frames smaller than 224x224 (resize required)
- Critical applications requiring >99% precision without human review
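Pre-scoring a large frame set is typically done in batches. The sketch below assumes frames are already preprocessed into a `(N, 3, 224, 224)` tensor; the `score_frames` helper and the 0.9 review threshold are illustrative, not part of the released model:

```python
import torch

@torch.no_grad()
def score_frames(model, batch, review_threshold=0.9):
    """Score a batch of preprocessed frames (N, 3, 224, 224).

    Returns (predictions, confidences, needs_review), where frames whose
    top-class probability falls below `review_threshold` are flagged for
    manual review. The 0.9 threshold is an illustrative default.
    """
    probs = model(batch).softmax(dim=1)
    confidence, prediction = probs.max(dim=1)
    needs_review = confidence < review_threshold
    return prediction, confidence, needs_review
```

Flagged frames can then be routed to the manual-review queue while high-confidence predictions are accepted automatically.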
Labels
- Class 0 (Boring): Frames lacking interesting visual details or character focus
- Class 1 (Interesting): Frames with clear character/scene details suitable for downstream tasks
Model Size & Speed
- Model Size: 78 MB (SafeTensors format)
- Inference Speed: ~20ms per image on GPU
- VRAM Required: ~2 GB (including activations)
Training Data Composition
- 900 manually curated frames (hand-labeled)
- 1,655 frames filtered via dual-agreement with garbage classifier
- 1,877 frames from curated anime site scraper
- Total: 4,432 frames (90% train, 10% test holdout)
All frames are 224x224 RGB anime screenshots.
How to Use
Recommended: HuggingFace Transformers (SafeTensors)
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image

# Load model (weights are stored as SafeTensors)
processor = AutoImageProcessor.from_pretrained(
    "hf_models/anime-frame-interesting-classifier-cnn-v2"
)
model = AutoModelForImageClassification.from_pretrained(
    "hf_models/anime-frame-interesting-classifier-cnn-v2",
    trust_remote_code=False,  # do not execute any code shipped with the repo
)

image = Image.open("frame.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()
print(f"Prediction: {'Interesting' if prediction == 1 else 'Boring'}")
```
Alternative: PyTorch with SafeTensors
```python
import torch
import torchvision.transforms as transforms
from torchvision import models
from safetensors.torch import load_file
from PIL import Image

# Load weights via SafeTensors (secure, no pickle deserialization)
model = models.efficientnet_v2_s()
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)
model.load_state_dict(load_file("model.safetensors"))
model.eval()

# Preprocessing (ImageNet normalization)
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Inference
image = Image.open("frame.png").convert("RGB")
x = transform(image).unsqueeze(0)
with torch.no_grad():
    logits = model(x)
prediction = logits.argmax(1).item()
confidence = logits.softmax(1)[0, prediction].item()
print(f"Prediction: {'Interesting' if prediction == 1 else 'Boring'}")
print(f"Confidence: {confidence:.2%}")
```
Security Note: This model ships its weights in SafeTensors format (not pickle), which prevents arbitrary code execution during weight loading. For defense in depth, load it through HuggingFace Transformers with trust_remote_code=False so that no code from the repository is executed either.
Comparison with Transformer Model
See anime-frame-interesting-classifier-vit-v2 for transformer alternative:
| Metric | CNN (This Model) | ViT | Use Case |
|---|---|---|---|
| F1 | 95.15% | 94.92% | CNN: general, ViT: ensemble |
| Speed | Faster | Slower | CNN preferred for speed |
| Size | 78 MB | 20 MB | ViT has the smaller disk footprint |
| Ensemble | Better precision with ViT voting | Better recall with CNN voting | Use together |
For maximum confidence, use ensemble voting: classify with both models and flag disagreements for manual review.
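The disagreement-flagging rule above can be sketched as follows (the `ensemble_vote` helper is illustrative, not shipped with either model):

```python
def ensemble_vote(cnn_pred, vit_pred, cnn_conf, vit_conf):
    """Accept a label only when both models agree; otherwise flag the
    frame for manual review. Returns (label, confidence, needs_review).
    """
    if cnn_pred == vit_pred:
        # Agreement: accept the shared label with the stronger confidence
        return cnn_pred, max(cnn_conf, vit_conf), False
    # Disagreement: route the frame to manual review
    return None, None, True

print(ensemble_vote(1, 1, 0.97, 0.92))  # -> (1, 0.97, False)
print(ensemble_vote(1, 0, 0.97, 0.92))  # -> (None, None, True)
```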
Limitations
- Anime-only: Trained exclusively on anime content, not tested on other media
- Dataset bias: Training data skewed toward popular anime styles (may underperform on niche art styles)
- Resolution: Trained on 224x224 frames; extreme aspect ratios may need preprocessing
- Edge cases: Minimal training on hard-to-classify frames (borderline cases in the 90-95% confidence band)
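For frames with extreme aspect ratios, one option is to letterbox to a square before resizing rather than center-cropping, so no content is distorted or cut off. A minimal sketch with PIL; the `letterbox` helper is illustrative and was not part of the training pipeline (which used 224x224 frames directly):

```python
from PIL import Image

def letterbox(image, size=224, fill=(0, 0, 0)):
    """Pad a frame to a square canvas, then resize to `size` x `size`.

    Avoids the distortion of a plain resize and the content loss of a
    center crop on very wide or very tall frames.
    """
    w, h = image.size
    side = max(w, h)
    canvas = Image.new("RGB", (side, side), fill)
    canvas.paste(image, ((side - w) // 2, (side - h) // 2))
    return canvas.resize((size, size), Image.BILINEAR)
```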
Training Details
- Dataset: v2.0 (4,432 frames)
- Train/Test Split: 90/10 (3,999 train, 433 test)
- Epochs: 20
- Batch Size: 64
- Optimizer: AdamW (lr=1e-4)
- Loss: CrossEntropyLoss
- Augmentation: None (data quality sufficient at this scale)
Version History
- v2.0 (current): 95.15% F1, retrained on expanded 4,432-frame dataset
- v1.0: 86% F1, 900-frame dataset (legacy, deprecated)
Citation
If you use this model, please reference:
- Dataset: Anime Frame Interesting v2.0 (4,432 curated frames)
- Architecture: EfficientNetV2-S
- Training: PyTorch, 2026
Future Improvements
- Collect 1,000+ edge-case frames for hard-negatives mining
- Ensemble with ViT for higher confidence decisions
- Fine-tune on downstream style classification tasks