Anime/Real/Rendered Image Classifier (EfficientNet-B0)
A fast, lightweight classifier that sorts images into three classes: anime, real photographs, and 3D renders.
Model Details
- Architecture: EfficientNet-B0 (timm)
- Input Size: 224×224 RGB
- Classes: anime, real, rendered
- Parameters: 5.3M
- Validation Accuracy: 97.44%
- Training Speed: ~1 min/epoch (GPU)
- Inference Speed: ~20ms per image (RTX 3060)
Performance
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| anime | 0.98 | 0.99 | 0.99 |
| real | 0.98 | 0.98 | 0.98 |
| rendered | 0.96 | 0.93 | 0.94 |
| macro avg | 0.97 | 0.97 | 0.97 |
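The F1 and macro-average rows follow from the per-class precision and recall. The sketch below recomputes them from the rounded table values, so the last digit can differ from the reported figures (for anime, the rounded inputs give about 0.985, while the table reports 0.99, presumably from unrounded values). The helper name `f1_score` is illustrative.

```python
# Per-class (precision, recall) copied from the table above (rounded to 2 decimals).
metrics = {
    "anime": (0.98, 0.99),
    "real": (0.98, 0.98),
    "rendered": (0.96, 0.93),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = {name: f1_score(p, r) for name, (p, r) in metrics.items()}

# Macro average: unweighted mean over the three classes.
macro_precision = sum(p for p, _ in metrics.values()) / len(metrics)
macro_recall = sum(r for _, r in metrics.values()) / len(metrics)
macro_f1 = sum(f1.values()) / len(f1)

for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"macro avg: P = {macro_precision:.3f}, R = {macro_recall:.3f}, F1 = {macro_f1:.3f}")
```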
Usage
```python
from PIL import Image
import torch
from torchvision import transforms
import timm
from safetensors.torch import load_file

# Load model
model = timm.create_model('efficientnet_b0', num_classes=3, pretrained=False)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval()

# Prepare image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
image = Image.open('image.jpg').convert('RGB')
x = transform(image).unsqueeze(0)

# Predict
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred_class = probs.argmax(dim=1).item()

labels = ['anime', 'real', 'rendered']
print(f"{labels[pred_class]}: {probs[0, pred_class]:.2%}")
```
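For more than a handful of images, classifying them in one batch avoids repeating the per-call overhead. This is a minimal sketch that assumes the same preprocessing as the single-image example. `classify_batch` and its arguments are illustrative names, not part of the released model.

```python
import torch

LABELS = ["anime", "real", "rendered"]  # same label order as the single-image example


@torch.no_grad()
def classify_batch(model, batch, device="cpu"):
    """Return a (label, confidence) pair for each image in a preprocessed batch.

    `batch` is a float tensor of shape (N, 3, 224, 224) that has already been
    resized and normalized with the same transform as the single-image example.
    """
    model = model.to(device).eval()
    probs = torch.softmax(model(batch.to(device)), dim=1)
    conf, idx = probs.max(dim=1)
    return [(LABELS[i], c) for i, c in zip(idx.tolist(), conf.tolist())]
```

With the `model` and `transform` from the single-image example, a batch can be built with `torch.stack([transform(img) for img in images])`. Pass `device="cuda"` when a GPU is available.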
Dataset
- Real: 5,000 COCO 2017 validation images (diverse real-world scenarios)
- Anime: 2,357 curated anime/animation frames
- Rendered: 1,610 AAA game screenshots + 61 Pixar movie stills
- Total: 8,967 images (8,070 train / 897 val)
Training Details
- Augmentation: None (raw resize to 224×224)
- Optimizer: AdamW (lr=0.001)
- Loss: CrossEntropyLoss with class weighting
- Epochs: 20
- Batch Size: 80
- Hardware: NVIDIA RTX 3060 (12GB)
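The optimizer and loss settings above can be sketched as follows. The card does not say how the class weights were derived, so inverse-frequency weighting over the per-class counts from the Dataset section is an assumption, as is the helper name `inverse_frequency_weights`. The small `nn.Linear` is only a stand-in so the sketch runs without timm.

```python
import torch
import torch.nn as nn

# Images per class in label order (anime, real, rendered), taken from the
# Dataset section; these sum to the stated total of 8,967.
CLASS_COUNTS = [2357, 5000, 1610]


def inverse_frequency_weights(counts):
    """Assumed scheme: weight_c = total / (num_classes * count_c)."""
    counts = torch.tensor(counts, dtype=torch.float32)
    return counts.sum() / (len(counts) * counts)


weights = inverse_frequency_weights(CLASS_COUNTS)
criterion = nn.CrossEntropyLoss(weight=weights)

# Stand-in for the EfficientNet-B0 model (assumption, keeps the sketch light).
model = nn.Linear(8, 3)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# One illustrative optimization step on random data (batch size 80, as listed above).
features = torch.randn(80, 8)
targets = torch.randint(0, 3, (80,))
loss = criterion(model(features), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Under this scheme the rarest class (rendered) gets the largest weight and the most common class (real) the smallest.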
Known Limitations
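- Real vs Rendered: Photorealistic game screenshots are sometimes misclassified as real; rendered has the lowest recall of the three classes (0.93)
- Stylized Games: Cel-shaded games (e.g., Fate/Extella) may be classified as anime
- Pixar: Stylized renders may receive mixed confidence; the dataset contains only 61 Pixar stills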
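These failure modes show up as off-diagonal entries in a confusion matrix. Here is a minimal sketch for computing one from your own validation predictions, using only torch. The function name and the toy labels are illustrative and are not the card's actual results.

```python
import torch

LABELS = ["anime", "real", "rendered"]


def confusion_matrix(y_true, y_pred, num_classes=len(LABELS)):
    """Rows are true classes, columns are predicted classes."""
    # Encode each (true, predicted) pair as one index, then count occurrences.
    flat = y_true * num_classes + y_pred
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


# Toy example: one rendered image (class 2) is predicted as real (class 1).
y_true = torch.tensor([0, 1, 2, 2])
y_pred = torch.tensor([0, 1, 1, 2])
cm = confusion_matrix(y_true, y_pred)
print(cm)
```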
Recommendations
- Use an ensemble with tf_efficientnetv2_s for critical applications
- Apply a confidence threshold: trust only predictions with more than 85% confidence
- For edge cases, use the full confusion matrix to understand failure modes
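The first two recommendations can be combined: average the softmax outputs of several models, then accept the result only when the top probability clears the threshold. This is a minimal sketch that assumes each model maps the same preprocessed batch to 3-class logits in the same label order. `ensemble_predict` is an illustrative name, and weights for a second model (for example a `tf_efficientnetv2_s` fine-tuned on the same data) are not provided here.

```python
import torch

LABELS = ["anime", "real", "rendered"]


@torch.no_grad()
def ensemble_predict(models, batch, threshold=0.85):
    """Average class probabilities across models and apply a confidence cutoff.

    Models are expected to be in eval mode. Returns one (label, confidence)
    pair per image; the label is None when the averaged top probability does
    not exceed `threshold`.
    """
    probs = torch.stack([torch.softmax(m(batch), dim=1) for m in models]).mean(dim=0)
    conf, idx = probs.max(dim=1)
    return [
        (LABELS[i] if c > threshold else None, c)
        for i, c in zip(idx.tolist(), conf.tolist())
    ]
```

Images that come back with a `None` label can be routed to manual review or a slower fallback rather than being trusted automatically.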
License
OpenRAIL - Free for research and commercial use with proper attribution