
Anime Frame Interesting Classifier (CNN v2.0)

Model Details

Architecture: EfficientNetV2-S (CNN)
Framework: PyTorch
Input Size: 224x224 RGB images
Output: Binary classification (Boring/Interesting)

Performance

Evaluated on v2.0 test set (433 frames):

  • F1 Score: 95.15%
  • Accuracy: 95.15%
  • Precision: 95.16%
  • Recall: 95.15%

Training data: 3,999 frames (90% of the 4,432-frame dataset; the remaining 433 frames form the test holdout)

Intended Use

What it does: Classifies anime frames as either "interesting" (depicting meaningful character/scene details) or "boring" (back-of-head shots, nondescript backgrounds, montages).

When to use:

  • Filtering extracted anime keyframes for quality
  • Pre-scoring large frame datasets before manual review
  • Ensemble voting with the transformer model for higher-confidence decisions

When NOT to use:

  • Real-world photos or non-anime content
  • Frames smaller than 224x224 (resize required)
  • Critical applications requiring >99% precision without human review

Labels

  • Class 0 (Boring): Frames lacking interesting visual details or character focus
  • Class 1 (Interesting): Frames with clear character/scene details suitable for downstream tasks

Model Size & Speed

  • Model Size: 78 MB (SafeTensors format)
  • Inference Speed: ~20ms per image on GPU
  • VRAM Required: ~2 GB (including activations)
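The ~20 ms figure depends heavily on hardware, batch size, and warmup; a small, model-agnostic timing harness for reproducing such numbers on your own setup (the measure_latency_ms helper is illustrative, not part of this repo):

```python
import time

def measure_latency_ms(fn, warmup=5, runs=50):
    """Average wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches / JIT / CUDA kernels first
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in workload; replace with e.g. `lambda: model(**inputs)`.
latency = measure_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms/call")
```

On GPU, make sure fn includes a torch.cuda.synchronize() after the forward pass; otherwise asynchronous kernel launches make the measured latency look artificially low.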

Training Data Composition

  • 900 manually curated frames (hand-labeled)
  • 1,655 frames filtered via dual-model agreement with a garbage classifier
  • 1,877 frames from curated anime site scraper
  • Total: 4,432 frames (90% train, 10% test holdout)

All frames are 224x224 RGB anime screenshots.
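Frames that are not already 224x224 can be brought to that size with the standard resize-then-center-crop recipe; a minimal PIL-only sketch (the to_224 helper is illustrative, not part of this repo):

```python
from PIL import Image

def to_224(image: Image.Image) -> Image.Image:
    """Resize the short side to 256, then center-crop to 224x224 RGB."""
    w, h = image.size
    scale = 256 / min(w, h)
    image = image.resize((round(w * scale), round(h * scale)))
    left = (image.width - 224) // 2
    top = (image.height - 224) // 2
    return image.convert("RGB").crop((left, top, left + 224, top + 224))

frame = to_224(Image.new("RGB", (1920, 1080)))
print(frame.size)  # (224, 224)
```

This mirrors the Resize(256) + CenterCrop(224) transform used in the PyTorch example below, so preprocessed frames match what the model saw at training time.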

How to Use

Recommended: HuggingFace Transformers (SafeTensors)

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model (automatically uses SafeTensors)
processor = AutoImageProcessor.from_pretrained(
    "hf_models/anime-frame-interesting-classifier-cnn-v2"
)
model = AutoModelForImageClassification.from_pretrained(
    "hf_models/anime-frame-interesting-classifier-cnn-v2",
    trust_remote_code=False  # Safe: SafeTensors prevents code execution
)
model.eval()

image = Image.open('frame.png').convert('RGB')  # handle RGBA/grayscale inputs
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()

print(f"Prediction: {'Interesting' if prediction == 1 else 'Boring'}")
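To also get a confidence score from the logits above without extra dependencies, a stable-softmax sketch (the postprocess helper is illustrative, not part of this repo):

```python
import math

def postprocess(logits):
    """Map a pair of [boring, interesting] logits to (label, confidence)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ("Interesting" if idx == 1 else "Boring", probs[idx])

label, confidence = postprocess([-1.3, 2.7])
print(label, f"{confidence:.2%}")
```

With the Transformers example above, call postprocess(outputs.logits[0].tolist()).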

Alternative: PyTorch with SafeTensors

from torchvision import models
import torch
from safetensors.torch import load_file

# Load model using SafeTensors (secure, no pickle deserialization)
model = models.efficientnet_v2_s()
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)
model.load_state_dict(load_file('model.safetensors'))
model.eval()

# Inference
import torchvision.transforms as transforms
from PIL import Image

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
])

image = Image.open('frame.png').convert('RGB')  # handle RGBA/grayscale inputs
x = transform(image).unsqueeze(0)

with torch.no_grad():
    logits = model(x)
prediction = logits.argmax(1).item()
confidence = logits.softmax(1)[0, prediction].item()

print(f"Prediction: {'Interesting' if prediction == 1 else 'Boring'}")
print(f"Confidence: {confidence:.2%}")

Security Note: This model ships in SafeTensors format rather than pickle, which prevents arbitrary code execution during model loading. For maximum safety, load it via HuggingFace Transformers with trust_remote_code=False.

Comparison with Transformer Model

See anime-frame-interesting-classifier-vit-v2 for the transformer alternative:

| Metric   | CNN (This Model)                 | ViT                           | Use Case                    |
|----------|----------------------------------|-------------------------------|-----------------------------|
| F1       | 95.15%                           | 94.92%                        | CNN: general, ViT: ensemble |
| Speed    | Faster                           | Slower                        | CNN preferred for speed     |
| Size     | 78 MB                            | 20 MB                         | ViT smaller on disk         |
| Ensemble | Better precision with ViT voting | Better recall with CNN voting | Use together                |

For maximum confidence, use ensemble voting: classify with both models and flag disagreements for manual review.
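The voting rule described above fits in a few lines; a minimal sketch (the ensemble_vote helper is hypothetical, with 0/1 class IDs as defined in the Labels section):

```python
def ensemble_vote(cnn_pred: int, vit_pred: int) -> str:
    """Agreeing models yield a label; disagreements go to manual review."""
    if cnn_pred == vit_pred:
        return "Interesting" if cnn_pred == 1 else "Boring"
    return "review"

print(ensemble_vote(1, 1))  # Interesting
print(ensemble_vote(1, 0))  # review
```

Each prediction is the argmax class index from the corresponding model, as computed in the usage examples above.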

Limitations

  • Anime-only: Trained exclusively on anime content, not tested on other media
  • Dataset bias: Training data skewed toward popular anime styles (may underperform on niche art styles)
  • Resolution: Trained on 224x224 frames; extreme aspect ratios may need preprocessing
  • Edge cases: Minimal training on hard-to-classify frames (90-95% confidence borderline cases)
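Given the weak coverage of borderline cases, one mitigation is to auto-accept only high-confidence predictions and route the rest to manual review; a simple triage sketch (helper name and the 0.95 threshold are illustrative choices, not part of this repo):

```python
def triage(confidence: float, threshold: float = 0.95) -> str:
    """Flag predictions below the confidence threshold for manual review."""
    return "auto" if confidence >= threshold else "review"

# Hypothetical (path, softmax confidence) results from the classifier.
results = [("f1.png", 0.99), ("f2.png", 0.93)]
flagged = [path for path, conf in results if triage(conf) == "review"]
print(flagged)  # ['f2.png']
```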

Training Details

  • Dataset: v2.0 (4,432 frames)
  • Train/Test Split: 90/10 (3,999 train, 433 test)
  • Epochs: 20
  • Batch Size: 64
  • Optimizer: AdamW (lr=1e-4)
  • Loss: CrossEntropyLoss
  • Augmentation: None (data quality sufficient at this scale)
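These settings correspond to a standard PyTorch fine-tuning loop. A compact sketch using a tiny stand-in model and random stand-in data so it stays runnable; for the real run, swap in the EfficientNetV2-S construction from the loading example above and a dataloader over the 3,999 training frames:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model: replace with EfficientNetV2-S + 2-class head as shown above.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))

# Stand-in data: 8 random 224x224 RGB tensors with binary labels.
data = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))
loader = DataLoader(data, batch_size=64, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(20):  # 20 epochs, as listed above
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

print(f"final training loss: {loss.item():.4f}")
```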

Version History

  • v2.0 (current): 95.15% F1, retrained on expanded 4,432-frame dataset
  • v1.0: 86% F1, 900-frame dataset (legacy, deprecated)

Citation

If you use this model, please reference:

  • Dataset: Anime Frame Interesting v2.0 (4,432 curated frames)
  • Architecture: EfficientNetV2-S
  • Training: PyTorch, 2026

Future Improvements

  • Collect 1,000+ edge-case frames for hard-negative mining
  • Ensemble with ViT for higher confidence decisions
  • Fine-tune on downstream style classification tasks