---
license: openrail
language: en
library_name: timm
tags:
  - image-classification
  - anime
  - real
  - rendered
  - 3d-graphics
datasets:
  - coco
  - custom-anime
  - steam-screenshots
---

# EfficientNet-B0 - Anime/Real/Rendered Classifier

A fast, lightweight image classifier that distinguishes photographs from anime and 3D-rendered images.

## Model Summary

  • Model Name: efficientnet_b0
  • Framework: PyTorch + TIMM
  • Input: 224×224 RGB images
  • Output: 3 classes (anime, real, rendered)
  • Parameters: 5.3M
  • Size: 16.2 MB

## Intended Use

Classify images into three categories:

  • anime: Drawn 2D or cel-shaded animation
  • real: Photographs and real-world footage
  • rendered: 3D graphics (games, CGI, Pixar, etc.)

## Performance

**Validation Accuracy:** 97.44%

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| anime | 0.98 | 0.99 | 0.99 | 236 |
| real | 0.98 | 0.98 | 0.98 | 500 |
| rendered | 0.96 | 0.93 | 0.94 | 161 |
| weighted avg | 0.97 | 0.97 | 0.97 | 897 |

## Training Data

  • Real images: 5,000 COCO 2017 validation set images
  • Anime images: 2,357 curated animation frames and key scenes
  • Rendered images: 1,549 AAA game screenshots (Metacritic ≥75) + 61 Pixar movie stills
  • Total: 8,967 images; 8,070 for training and 897 for validation (near-duplicates removed via perceptual hashing to keep the set diverse)
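The card mentions perceptual hashing for deduplication but does not specify the hash used. A minimal average-hash (aHash) sketch of the idea, operating on 8×8 grayscale grids (in practice each image would first be downscaled to 8×8; image loading is elided here):

```python
# Average-hash sketch for near-duplicate filtering (illustrative only;
# the actual hashing method used for this dataset is not documented).

def average_hash(gray):
    """gray: 8x8 nested list of 0-255 intensities -> 64-bit tuple."""
    flat = [px for row in gray for px in row]
    avg = sum(flat) / len(flat)
    return tuple(1 if px > avg else 0 for px in flat)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))

def deduplicate(hashes, max_distance=4):
    """Keep only hashes farther than max_distance from everything kept."""
    kept = []
    for h in hashes:
        if all(hamming(h, k) > max_distance for k in kept):
            kept.append(h)
    return kept
```

Two visually identical images hash to the same bits (distance 0) and one of them is dropped; the distance threshold controls how aggressive the filtering is.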

## Training Details

  • Framework: PyTorch
  • Augmentation: Resize only (224×224)
  • Loss Function: CrossEntropyLoss with inverse frequency class weights
  • Optimizer: AdamW (lr=0.001)
  • Batch Size: 80
  • Epochs: 20
  • Hardware: NVIDIA RTX 3060 (12GB VRAM)
  • Training Time: ~20 minutes
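The inverse-frequency class weights mentioned above can be sketched as follows. The per-class training counts here are assumptions derived from the stated dataset totals minus the validation support in the Performance table:

```python
# Inverse-frequency class weights for CrossEntropyLoss.
# Counts below are assumptions (dataset totals minus validation support).
train_counts = {"anime": 2121, "real": 4500, "rendered": 1449}

n_total = sum(train_counts.values())
n_classes = len(train_counts)

# weight_c = N / (C * n_c): rarer classes get proportionally larger weights
weights = {c: n_total / (n_classes * n) for c, n in train_counts.items()}

# In PyTorch these would feed the loss, e.g.:
# criterion = torch.nn.CrossEntropyLoss(
#     weight=torch.tensor(list(weights.values())))
```

With this scheme each class contributes equally to the expected loss, which counteracts the roughly 3:1 imbalance between `real` and `rendered`.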

## Limitations

  1. Photorealistic video games are sometimes classified as real (93% recall on the rendered class)
  2. Cel-shaded games may score as anime rather than rendered
  3. Artistic 3D renders (Pixar, high-quality CGI) show mixed confidence
  4. Performance degrades on images <224×224

## Recommendations

  • Use confidence threshold of ≥80% for reliable predictions
  • For critical applications, ensemble with tf_efficientnetv2_s
  • Check confusion patterns on your own use cases
  • Manually review edge cases (game screenshots, stylized renders)
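The ≥80% confidence threshold suggested above can be applied as a small post-processing step; low-confidence predictions are routed to manual review instead of being trusted:

```python
# Apply the >=80% confidence threshold to a softmax probability vector.
LABELS = ("anime", "real", "rendered")

def classify_with_threshold(probs, threshold=0.80):
    """probs: softmax probabilities in LABELS order."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return LABELS[best]
    return "uncertain"  # hand off to a human reviewer
```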

## How to Use

```python
from PIL import Image
import torch
from torchvision import transforms
import timm
from safetensors.torch import load_file

# Load
model = timm.create_model('efficientnet_b0', num_classes=3, pretrained=False)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval()

# Prepare image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open('image.jpg').convert('RGB')
x = transform(img).unsqueeze(0)

# Infer
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred = probs.argmax().item()

labels = ['anime', 'real', 'rendered']
print(f"{labels[pred]}: {probs[0, pred]:.1%}")
```
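For the ensemble with `tf_efficientnetv2_s` recommended above, a simple option is averaging the two models' softmax outputs. The 0.5/0.5 weighting here is an assumption; tune it on a validation split:

```python
# Probability-averaging ensemble of two classifiers' softmax outputs.
# weight_a = 0.5 is an assumed default, not a documented setting.
def ensemble_probs(probs_a, probs_b, weight_a=0.5):
    return [weight_a * a + (1 - weight_a) * b
            for a, b in zip(probs_a, probs_b)]
```

The final prediction is then the argmax of the averaged vector, exactly as in the single-model snippet above.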

## Benchmarks

### Inference Speed (RTX 3060)

  • Single image: ~20ms
  • Batch of 32: ~150ms
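The batch figure above comes from running images through the model in groups rather than one at a time. A sketch, assuming PyTorch; the helper name is illustrative, and `device` would be `"cuda"` for the RTX 3060 timings quoted:

```python
import torch

@torch.no_grad()
def predict_batch(model, tensors, batch_size=32, device="cpu"):
    """tensors: list of preprocessed 3x224x224 image tensors."""
    model = model.to(device).eval()
    preds = []
    for i in range(0, len(tensors), batch_size):
        # Stack a chunk into one (B, 3, 224, 224) tensor for a single forward pass
        batch = torch.stack(tensors[i:i + batch_size]).to(device)
        probs = torch.softmax(model(batch), dim=1)
        preds.extend(probs.argmax(dim=1).tolist())
    return preds
```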

### Accuracy Comparison

| Model | Accuracy | Speed | Params |
|---|---|---|---|
| EfficientNet-B0 | 97.44% | Fast | 5.3M |
| TF-EfficientNetV2-S | 97.55% | Moderate | 21.5M |

## Ethical Considerations

This model classifies images by visual style/source. Potential misuse:

  • Detecting deepfakes/AI-generated content (not designed for this)
  • Filtering user-generated content (may have cultural bias)
  • Surveillance or profiling

Recommendations:

  • Use with human review for content moderation
  • Test on your target domain before deployment
  • Don't rely solely on automatic classification for safety-critical decisions
  • Consider cultural representation in anime/rendered content

## Contact

For questions or issues: [GitHub repo]

## License

OpenRAIL (Open Responsible AI License) - free for research and commercial use with proper attribution