---
license: openrail
language: en
library_name: timm
tags:
  - image-classification
  - anime
  - real
  - rendered
  - 3d-graphics
datasets:
  - coco
  - custom-anime
  - steam-screenshots
---

# EfficientNet-B0 - Anime/Real/Rendered Classifier

A fast, lightweight image classifier that distinguishes photographs from anime and 3D-rendered images.

## Model Summary

- **Model Name:** efficientnet_b0
- **Framework:** PyTorch + TIMM
- **Input:** 224×224 RGB images
- **Output:** 3 classes (anime, real, rendered)
- **Parameters:** 5.3M
- **Size:** 16.2 MB

## Intended Use

Classify images into three categories:
- **anime**: Drawn 2D or cel-shaded animation
- **real**: Photographs and real-world footage
- **rendered**: 3D graphics (games, CGI, Pixar, etc.)

## Performance

**Validation Accuracy:** 97.44%

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| anime | 0.98 | 0.99 | 0.99 | 236 |
| real | 0.98 | 0.98 | 0.98 | 500 |
| rendered | 0.96 | 0.93 | 0.94 | 161 |
| **weighted avg** | **0.97** | **0.97** | **0.97** | **897** |
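
As a quick consistency check on the table, support-weighted recall is by definition the same quantity as overall accuracy, so the per-class recalls should reproduce the reported validation accuracy up to rounding. A minimal sketch using only the numbers above (the helper name `weighted_average` is illustrative):

```python
# Per-class recall and support copied from the table above.
recall = {"anime": 0.99, "real": 0.98, "rendered": 0.93}
support = {"anime": 236, "real": 500, "rendered": 161}


def weighted_average(values, weights):
    """Average per-class values, weighting each class by its support."""
    total = sum(weights.values())
    return sum(values[name] * weights[name] for name in values) / total


weighted_recall = weighted_average(recall, support)

# Support-weighted recall equals overall accuracy; any small gap to 97.44%
# comes from the table rounding each recall to two decimals.
print(f"weighted recall: {weighted_recall:.4f}")
```

The result lands within a fraction of a percentage point of 97.44%, which is what two-decimal rounding of the inputs would allow.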

## Training Data

- **Real images:** 5,000 COCO 2017 validation set images
- **Anime images:** 2,357 curated animation frames and key scenes
- **Rendered images:** 1,549 AAA game screenshots (Metacritic ≥75) + 61 Pixar movie stills
- **Total:** 8,967 images, split into 8,070 for training and 897 for validation (perceptually hashed for diversity)
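
The card does not say how the split was made. The per-class support values in the Performance table (236, 500, 161) are consistent with holding out 10% of each class, so the sketch below assumes a stratified 90/10 split. The function name, the seed, and the rounding rule are illustrative assumptions, not the author's code.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.1, seed=0):
    """Return (train_indices, val_indices), holding out val_fraction of each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within the class before cutting
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return train, val


# Class sizes from the list above (rendered = 1,549 screenshots + 61 stills).
labels = ["real"] * 5000 + ["anime"] * 2357 + ["rendered"] * 1610
train_idx, val_idx = stratified_split(labels)
val_counts = Counter(labels[i] for i in val_idx)
print(len(train_idx), len(val_idx), dict(val_counts))
```

Under these assumptions the split sizes match the 8,070 / 897 totals stated above.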

## Training Details

- **Framework:** PyTorch
- **Augmentation:** Resize only (224×224)
- **Loss Function:** CrossEntropyLoss with inverse frequency class weights
- **Optimizer:** AdamW (lr=0.001)
- **Batch Size:** 80
- **Epochs:** 20
- **Hardware:** NVIDIA RTX 3060 (12GB VRAM)
- **Training Time:** ~20 minutes
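
The exact weighting formula is not given here. One common convention for inverse-frequency class weights is `weight_c = N / (K * n_c)`, where `N` is the number of training images, `K` the number of classes, and `n_c` the count for class `c`. The sketch below assumes that convention and assumes the training counts are the class totals minus the validation support.

```python
# Assumed training-split counts: class totals minus validation support.
train_counts = {
    "anime": 2357 - 236,
    "real": 5000 - 500,
    "rendered": 1610 - 161,
}


def inverse_frequency_weights(counts):
    """weight_c = N / (K * n_c): rarer classes receive larger weights."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(train_counts)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")

# With PyTorch, the weights would be passed in class-index order, for example:
#   order = ["anime", "real", "rendered"]
#   loss_fn = torch.nn.CrossEntropyLoss(
#       weight=torch.tensor([weights[name] for name in order])
#   )
```

With this convention the rendered class, the smallest of the three, gets the largest weight and the real class the smallest.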

## Limitations

1. Photorealistic video games are sometimes classified as real (0.93 recall on the rendered class in the validation table)
2. Cel-shaded games may score as anime rather than rendered
3. Artistic 3D renders (Pixar, high-quality CGI) show mixed confidence
4. Performance degrades on images <224×224

## Recommendations

- Use confidence threshold of ≥80% for reliable predictions
- For critical applications, ensemble with tf_efficientnetv2_s
- Check confusion patterns on your own data
- Manually review edge cases (game screenshots, stylized renders)
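
The first two recommendations can be sketched in plain Python. This assumes each model yields a softmax probability vector in the label order `anime, real, rendered`, and that ensembling means a simple average of the two vectors, which is only one possible strategy. The helper names and the example probabilities are hypothetical.

```python
LABELS = ["anime", "real", "rendered"]  # assumed class-index order


def average_probs(*prob_vectors):
    """Element-wise mean of several per-class probability vectors."""
    count = len(prob_vectors)
    return [sum(column) / count for column in zip(*prob_vectors)]


def predict_with_threshold(probs, threshold=0.80):
    """Return (label, confidence); label is None when below the threshold."""
    confidence = max(probs)
    label = LABELS[probs.index(confidence)]
    return (label if confidence >= threshold else None), confidence


# Hypothetical softmax outputs for a single image from the two models.
b0_probs = [0.05, 0.40, 0.55]
v2s_probs = [0.01, 0.02, 0.97]

label, confidence = predict_with_threshold(b0_probs)
if label is None:
    # Low confidence: fall back to the ensemble before deciding.
    label, confidence = predict_with_threshold(average_probs(b0_probs, v2s_probs))

print(label if label is not None else "needs manual review", f"{confidence:.2f}")
```

Predictions that stay below the threshold after ensembling are the natural candidates for the manual review suggested in the last bullet.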

## How to Use

```python
from PIL import Image
import torch
from torchvision import transforms
import timm
from safetensors.torch import load_file

# Load
model = timm.create_model('efficientnet_b0', num_classes=3, pretrained=False)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval()

# Prepare image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open('image.jpg').convert('RGB')
x = transform(img).unsqueeze(0)

# Infer
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred = probs.argmax().item()

labels = ['anime', 'real', 'rendered']
print(f"{labels[pred]}: {probs[0, pred]:.1%}")
```

## Benchmarks

**Inference Speed (RTX 3060)**
- Single image: ~20ms
- Batch of 32: ~150ms
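
To compare the two latency figures, it helps to convert them to throughput. The short sketch below does only that arithmetic on the approximate numbers above; it is not a benchmark harness.

```python
def images_per_second(batch_size, latency_ms):
    """Throughput implied by processing batch_size images in latency_ms."""
    return batch_size / (latency_ms / 1000.0)


single = images_per_second(1, 20)     # ~20 ms for a single image
batched = images_per_second(32, 150)  # ~150 ms for a batch of 32

print(
    f"single: {single:.0f} img/s, "
    f"batched: {batched:.0f} img/s, "
    f"speedup: {batched / single:.1f}x"
)
```

This works out to roughly 50 images per second one at a time versus roughly 213 images per second batched, so batching gives about a fourfold gain on this hardware.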

**Accuracy Comparison**

| Model | Accuracy | Speed | Params |
|-------|----------|-------|--------|
| EfficientNet-B0 | 97.44% | Fast | 5.3M |
| TF-EfficientNetV2-S | 97.55% | Moderate | 21.5M |

## Ethical Considerations

This model classifies images by visual style/source. Potential misuse:
- Detecting deepfakes/AI-generated content (not designed for this)
- Filtering user-generated content (may have cultural bias)
- Surveillance or profiling

**Recommendations:**
- Use with human review for content moderation
- Test on your target domain before deployment
- Don't rely solely on automatic classification for safety-critical decisions
- Consider cultural representation in anime/rendered content

## Contact

For questions or issues: [GitHub repo]

## License

OpenRAIL (Open Responsible AI License) - free for research and commercial use with proper attribution