Anime/Real/Rendered Image Classifier (EfficientNet-B0)

A fast, lightweight three-class classifier that separates real photographs, anime, and 3D-rendered images.

Model Details

  • Architecture: EfficientNet-B0 (timm)
  • Input Size: 224×224 RGB
  • Classes: anime, real, rendered
  • Parameters: 5.3M
  • Validation Accuracy: 97.44%
  • Training Speed: ~1 min/epoch (RTX 3060)
  • Inference Speed: ~20 ms per image (RTX 3060)

Performance

Class        Precision   Recall   F1-Score
anime        0.98        0.99     0.99
real         0.98        0.98     0.98
rendered     0.96        0.93     0.94
macro avg    0.97        0.97     0.97
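
The macro-average row is the unweighted mean of each column, so every class counts equally regardless of how many validation images it has. A small sketch that recomputes it from the per-class rows above (the names REPORT, f1_score and macro_average are illustrative, not part of the model's code):

```python
# Per-class (precision, recall, f1) copied from the Performance table.
REPORT = {
    "anime": (0.98, 0.99, 0.99),
    "real": (0.98, 0.98, 0.98),
    "rendered": (0.96, 0.93, 0.94),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(report):
    """Unweighted mean of each metric column across classes."""
    columns = zip(*report.values())
    return tuple(sum(col) / len(report) for col in columns)


macro = macro_average(REPORT)
print("macro avg:", " ".join(f"{m:.2f}" for m in macro))
```

Because the table values are already rounded to two decimals, recomputing F1 from the rounded precision and recall can differ from the reported F1 by 0.01.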

Usage

from PIL import Image
import torch
from torchvision import transforms
import timm
from safetensors.torch import load_file

# Load model
model = timm.create_model('efficientnet_b0', num_classes=3, pretrained=False)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval()

# Prepare image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open('image.jpg').convert('RGB')
x = transform(image).unsqueeze(0)

# Predict
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred_class = probs.argmax(dim=1).item()

labels = ['anime', 'real', 'rendered']
print(f"{labels[pred_class]}: {probs[0, pred_class]:.2%}")

Dataset

  • Real: 5,000 COCO 2017 validation images (diverse real-world scenarios)
  • Anime: 2,357 curated anime/animation frames
  • Rendered: 1,610 images (AAA game screenshots and 61 Pixar movie stills)
  • Total: 8,967 images (8,070 train / 897 val)

Training Details

  • Augmentation: None (raw resize to 224×224)
  • Optimizer: AdamW (lr=0.001)
  • Loss: CrossEntropyLoss with class weighting
  • Epochs: 20
  • Batch Size: 80
  • Hardware: NVIDIA RTX 3060 (12GB)
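
The card does not say how the class weights were derived. As a minimal sketch, assuming the common inverse-frequency ("balanced") heuristic and the per-class counts from the Dataset section (with the rendered count taken as 1,610 so that the classes sum to the stated 8,967), the weights could be computed as follows. CLASS_COUNTS and inverse_frequency_weights are illustrative names, not the original training code.

```python
# Assumed per-class image counts (full dataset, in label order).
# The rendered count is taken as 1,610 so the three classes sum to 8,967.
CLASS_COUNTS = {"anime": 2357, "real": 5000, "rendered": 1610}


def inverse_frequency_weights(counts):
    """Return weight_c = total / (num_classes * count_c) for each class.

    Rare classes get weights above 1 and common classes below 1, so each
    class contributes roughly equally to the weighted loss.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(CLASS_COUNTS)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Kept in the order anime, real, rendered, these values could be wrapped in a tensor and passed through the weight argument of torch.nn.CrossEntropyLoss.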

Known Limitations

  • Real vs Rendered: Photorealistic game screenshots are sometimes misclassified as real
  • Stylized Games: Cel-shaded games (e.g., Fate/Extella) may be classified as anime
  • Pixar: Stylized renders may get low or split confidence across classes (the dataset contains only 61 Pixar stills)

Recommendations

  • Ensemble with a tf_efficientnetv2_s model for critical applications
  • Apply a confidence threshold: only trust predictions whose top-class probability exceeds 85%
  • For edge cases, use the full confusion matrix to understand failure modes
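
The first two recommendations can be sketched in a few lines. This is a hedged illustration, not the author's pipeline: it assumes each model's softmax output has been converted to a plain list (for example with probs[0].tolist() from the usage code), that the second model was fine-tuned on the same three classes in the same order, and that the ensemble simply averages probabilities. The card does not specify the combination rule, and average_probs and classify_with_threshold are hypothetical helper names.

```python
LABELS = ["anime", "real", "rendered"]


def average_probs(prob_vectors):
    """Average per-class probabilities across models (simple soft voting)."""
    n_models = len(prob_vectors)
    return [sum(col) / n_models for col in zip(*prob_vectors)]


def classify_with_threshold(probs, threshold=0.85):
    """Return (label, confidence); label is None when confidence <= threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    label = LABELS[best] if confidence > threshold else None
    return label, confidence


# Example: two models agree on "anime" but the averaged confidence is too low.
ensemble = average_probs([[0.90, 0.05, 0.05], [0.70, 0.20, 0.10]])
label, confidence = classify_with_threshold(ensemble)
print(label, f"{confidence:.2%}")
```

A prediction that comes back with label None can be routed to manual review or to a fallback rule instead of being trusted automatically.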

License

OpenRAIL - Free for research and commercial use with proper attribution
