
Anime Frame Interesting Classifier (CNN v2.0)

Model Details

Architecture: EfficientNetV2-S (CNN)
Framework: PyTorch
Input Size: 224x224 RGB images
Output: Binary classification (Boring/Interesting)

Performance

Evaluated on v2.0 test set (433 frames):

  • F1 Score: 95.15%
  • Accuracy: 95.15%
  • Precision: 95.16%
  • Recall: 95.15%

Training data: 3,999 frames (90% of the 4,432-frame dataset; the remaining 433 frames form the test holdout)

Intended Use

What it does: Classifies anime frames as either "interesting" (depicting meaningful character/scene details) or "boring" (back-of-head shots, nondescript backgrounds, montages).

When to use:

  • Filtering extracted anime keyframes for quality
  • Pre-scoring large frame datasets before manual review
  • Ensemble voting with the transformer model for higher-confidence decisions

When NOT to use:

  • Real-world photos or non-anime content
  • Frames smaller than 224x224 (resize required)
  • Critical applications requiring >99% precision without human review

Labels

  • Class 0 (Boring): Frames lacking interesting visual details or character focus
  • Class 1 (Interesting): Frames with clear character/scene details suitable for downstream tasks

Model Size & Speed

  • Model Size: 78 MB (SafeTensors format)
  • Inference Speed: ~20ms per image on GPU
  • VRAM Required: ~2 GB (including activations)
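The ~20 ms figure depends heavily on hardware, batch size, and warmup; a small, model-agnostic timing harness for reproducing such numbers on your own setup (the measure_latency_ms helper is illustrative, not part of this repo):

```python
import time

def measure_latency_ms(fn, warmup=5, runs=50):
    """Average wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):  # warm caches / JIT / CUDA kernels first
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in workload; replace with e.g. `lambda: model(**inputs)`.
latency = measure_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms/call")
```

On GPU, make sure fn includes a torch.cuda.synchronize() after the forward pass; otherwise asynchronous kernel launches make the measured latency look artificially low.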

Training Data Composition

  • 900 manually curated frames (hand-labeled)
  • 1,655 frames filtered via dual-model agreement with a garbage classifier
  • 1,877 frames from curated anime site scraper
  • Total: 4,432 frames (90% train, 10% test holdout)

All frames are 224x224 RGB anime screenshots.
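Frames that are not already 224x224 can be brought to that size with the standard resize-then-center-crop recipe; a minimal PIL-only sketch (the to_224 helper is illustrative, not part of this repo):

```python
from PIL import Image

def to_224(image: Image.Image) -> Image.Image:
    """Resize the short side to 256, then center-crop to 224x224 RGB."""
    w, h = image.size
    scale = 256 / min(w, h)
    image = image.resize((round(w * scale), round(h * scale)))
    left = (image.width - 224) // 2
    top = (image.height - 224) // 2
    return image.convert("RGB").crop((left, top, left + 224, top + 224))

frame = to_224(Image.new("RGB", (1920, 1080)))
print(frame.size)  # (224, 224)
```

This mirrors the Resize(256) + CenterCrop(224) transform used in the PyTorch example below, so preprocessed frames match what the model saw at training time.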

How to Use

Recommended: HuggingFace Transformers (SafeTensors)

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model (automatically uses SafeTensors)
processor = AutoImageProcessor.from_pretrained(
    "hf_models/anime-frame-interesting-classifier-cnn-v2"
)
model = AutoModelForImageClassification.from_pretrained(
    "hf_models/anime-frame-interesting-classifier-cnn-v2",
    trust_remote_code=False  # Safe: SafeTensors prevents code execution
)
model.eval()

image = Image.open('frame.png').convert('RGB')  # handle RGBA/grayscale inputs
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
prediction = outputs.logits.argmax(-1).item()

print(f"Prediction: {'Interesting' if prediction == 1 else 'Boring'}")
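To also get a confidence score from the logits above without extra dependencies, a stable-softmax sketch (the postprocess helper is illustrative, not part of this repo):

```python
import math

def postprocess(logits):
    """Map a pair of [boring, interesting] logits to (label, confidence)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return ("Interesting" if idx == 1 else "Boring", probs[idx])

label, confidence = postprocess([-1.3, 2.7])
print(label, f"{confidence:.2%}")
```

With the Transformers example above, call postprocess(outputs.logits[0].tolist()).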

Alternative: PyTorch with SafeTensors

from torchvision import models
import torch
from safetensors.torch import load_file

# Load model using SafeTensors (secure, no pickle deserialization)
model = models.efficientnet_v2_s()
model.classifier[1] = torch.nn.Linear(model.classifier[1].in_features, 2)
model.load_state_dict(load_file('model.safetensors'))
model.eval()

# Inference
import torchvision.transforms as transforms
from PIL import Image

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])
])

image = Image.open('frame.png').convert('RGB')  # handle RGBA/grayscale inputs
x = transform(image).unsqueeze(0)

with torch.no_grad():
    logits = model(x)
prediction = logits.argmax(1).item()
confidence = logits.softmax(1)[0, prediction].item()

print(f"Prediction: {'Interesting' if prediction == 1 else 'Boring'}")
print(f"Confidence: {confidence:.2%}")

Security Note: This model ships in SafeTensors format rather than pickle, which prevents arbitrary code execution during model loading. For maximum safety, load it via HuggingFace Transformers with trust_remote_code=False.

Comparison with Transformer Model

See anime-frame-interesting-classifier-vit-v2 for the transformer alternative:

| Metric   | CNN (This Model)                 | ViT                           | Use Case                    |
|----------|----------------------------------|-------------------------------|-----------------------------|
| F1       | 95.15%                           | 94.92%                        | CNN: general, ViT: ensemble |
| Speed    | Faster                           | Slower                        | CNN preferred for speed     |
| Size     | 78 MB                            | 20 MB                         | ViT smaller on disk         |
| Ensemble | Better precision with ViT voting | Better recall with CNN voting | Use together                |

For maximum confidence, use ensemble voting: classify with both models and flag disagreements for manual review.
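The voting rule described above fits in a few lines; a minimal sketch (the ensemble_vote helper is hypothetical, with 0/1 class IDs as defined in the Labels section):

```python
def ensemble_vote(cnn_pred: int, vit_pred: int) -> str:
    """Agreeing models yield a label; disagreements go to manual review."""
    if cnn_pred == vit_pred:
        return "Interesting" if cnn_pred == 1 else "Boring"
    return "review"

print(ensemble_vote(1, 1))  # Interesting
print(ensemble_vote(1, 0))  # review
```

Each prediction is the argmax class index from the corresponding model, as computed in the usage examples above.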

Limitations

  • Anime-only: Trained exclusively on anime content, not tested on other media
  • Dataset bias: Training data skewed toward popular anime styles (may underperform on niche art styles)
  • Resolution: Trained on 224x224 frames; extreme aspect ratios may need preprocessing
  • Edge cases: Minimal training on hard-to-classify frames (90-95% confidence borderline cases)
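Given the weak coverage of borderline cases, one mitigation is to auto-accept only high-confidence predictions and route the rest to manual review; a simple triage sketch (helper name and the 0.95 threshold are illustrative choices, not part of this repo):

```python
def triage(confidence: float, threshold: float = 0.95) -> str:
    """Flag predictions below the confidence threshold for manual review."""
    return "auto" if confidence >= threshold else "review"

# Hypothetical (path, softmax confidence) results from the classifier.
results = [("f1.png", 0.99), ("f2.png", 0.93)]
flagged = [path for path, conf in results if triage(conf) == "review"]
print(flagged)  # ['f2.png']
```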

Training Details

  • Dataset: v2.0 (4,432 frames)
  • Train/Test Split: 90/10 (3,999 train, 433 test)
  • Epochs: 20
  • Batch Size: 64
  • Optimizer: AdamW (lr=1e-4)
  • Loss: CrossEntropyLoss
  • Augmentation: None (data quality sufficient at this scale)
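These settings correspond to a standard PyTorch fine-tuning loop. A compact sketch using a tiny stand-in model and random stand-in data so it stays runnable; for the real run, swap in the EfficientNetV2-S construction from the loading example above and a dataloader over the 3,999 training frames:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model: replace with EfficientNetV2-S + 2-class head as shown above.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 2))

# Stand-in data: 8 random 224x224 RGB tensors with binary labels.
data = TensorDataset(torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,)))
loader = DataLoader(data, batch_size=64, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(20):  # 20 epochs, as listed above
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

print(f"final training loss: {loss.item():.4f}")
```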

Version History

  • v2.0 (current): 95.15% F1, retrained on expanded 4,432-frame dataset
  • v1.0: 86% F1, 900-frame dataset (legacy, deprecated)

Citation

If you use this model, please reference:

  • Dataset: Anime Frame Interesting v2.0 (4,432 curated frames)
  • Architecture: EfficientNetV2-S
  • Training: PyTorch, 2026

Future Improvements

  • Collect 1,000+ edge-case frames for hard-negative mining
  • Ensemble with ViT for higher confidence decisions
  • Fine-tune on downstream style classification tasks