Mitchins committed on
Commit bf0c72b · verified · 1 parent: 030c01d

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +103 -0
  2. config.json +10 -0
  3. labels.txt +3 -0
  4. model.safetensors +3 -0
  5. model_card.md +168 -0
README.md ADDED
@@ -0,0 +1,103 @@
# Anime/Real/Rendered Image Classifier (TF-EfficientNetV2-S)

**Higher-capacity classifier with improved generalization for anime, photo, and 3D detection.**

## Model Details

- **Architecture:** TF-EfficientNetV2-S (timm)
- **Input Size:** 224×224 RGB
- **Classes:** anime, real, rendered
- **Parameters:** 21.5M (4× larger than B0)
- **Validation Accuracy:** 97.55% (+0.11% vs B0)
- **Training Speed:** ~3 min/epoch (GPU)
- **Inference Speed:** ~60ms per image (RTX 3060)

## Performance

| Class | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| anime | 1.00 | 0.97 | 0.98 |
| real | 0.98 | 0.99 | 0.98 |
| rendered | 0.93 | 0.90 | 0.91 |
| **macro avg** | **0.97** | **0.95** | **0.96** |

## Comparison to EfficientNet-B0

| Metric | B0 | V2-S | Winner |
|--------|-----|------|--------|
| Final Accuracy | 97.44% | **97.55%** | V2-S (+0.11%) |
| Best Accuracy | 97.99% | 97.99% | Tied |
| Params | 5.3M | 21.5M | B0 (lighter) |
| Speed | ~1 min/epoch | ~3 min/epoch | B0 (faster) |
| Convergence | Epoch 4 | Epoch 13 | B0 (faster) |

**Verdict:** V2-S fits the training data more thoroughly and generalizes marginally better. Use B0 for speed, V2-S for accuracy.

## Usage

```python
from PIL import Image
import torch
from torchvision import transforms
import timm
from safetensors.torch import load_file

# Load the model weights
model = timm.create_model('tf_efficientnetv2_s', num_classes=3, pretrained=False)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval()

# Prepare the image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open('image.jpg').convert('RGB')
x = transform(image).unsqueeze(0)

# Predict
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred_class = probs.argmax(dim=1).item()

labels = ['anime', 'real', 'rendered']
print(f"{labels[pred_class]}: {probs[0, pred_class]:.2%}")
```

## Dataset

- **Real:** 5,000 COCO 2017 validation images
- **Anime:** 2,357 curated animation frames
- **Rendered:** 1,549 AAA game screenshots + 61 Pixar stills
- **Total:** 8,967 images (8,070 train / 897 val, deduplicated by perceptual hash)

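The perceptual-hash deduplication can be sketched as follows. This is an illustrative reconstruction (the repo does not ship the split script): a real pipeline would hash decoded frames, e.g. with the `imagehash` package, but the same idea is shown here dependency-free on 8×8 grayscale grids.

```python
def average_hash(grid):
    """64-bit average hash of an 8x8 grayscale grid (list of rows, values 0-255)."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    # One bit per pixel: set when the pixel is brighter than the mean
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(a, b):
    """Number of differing bits between two 64-bit hashes."""
    return bin(a ^ b).count("1")

def dedup(hashes, threshold=8):
    """Greedily keep only hashes that differ from every kept hash by > threshold bits."""
    kept = []
    for h in hashes:
        if all(hamming(h, k) > threshold for k in kept):
            kept.append(h)
    return kept
```

Near-duplicate frames (hash distance ≤ 8 bits, a common default) collapse to one representative, which keeps validation images from leaking back into the training split.
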
## Training Details

- **Augmentation:** None (resize to 224×224 only)
- **Optimizer:** AdamW (lr=0.001)
- **Loss:** CrossEntropyLoss with class weighting
- **Epochs:** 20
- **Batch Size:** 40 (GPU memory constrained)
- **Hardware:** NVIDIA RTX 3060 (12GB)

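The class weighting can be reconstructed as inverse-frequency weights (the model card names inverse frequency weighting; the exact values used in training are not published, and the counts below are full-dataset totals rather than train-split counts):

```python
counts = {"anime": 2357, "real": 5000, "rendered": 1610}
total = sum(counts.values())

# Inverse-frequency weight, normalized so a perfectly balanced
# dataset would give every class a weight of 1.0
weights = {cls: total / (len(counts) * n) for cls, n in counts.items()}

# These would feed the loss as e.g.
#   criterion = torch.nn.CrossEntropyLoss(
#       weight=torch.tensor([weights[c] for c in ("anime", "real", "rendered")]))
```

The minority `rendered` class ends up weighted roughly 3× heavier than `real`, counteracting the 5,000-vs-1,610 imbalance.
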
## Known Behavior

- **Strong Anime Detection:** Perfect precision (1.00) with 97% recall
- **Strong Real Recognition:** 99% recall on real images
- **Rendered Uncertainty:** 90% recall suggests photorealistic games remain challenging
- **Slower Inference:** ~3× slower than B0 due to model size

## Recommendations

- **Production:** Ensemble both models for maximum confidence
- **Real-time:** Use B0 for speed-critical applications
- **Accuracy-critical:** Use V2-S as the primary model
- **Confidence Thresholding:** Only trust predictions above 80% confidence

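The 80% confidence gate can be implemented as a small wrapper around the softmax output from the Usage snippet. The `classify` helper below is illustrative, not part of the repo; `probs` is one softmax row as a plain list (e.g. `probs[0].tolist()`):

```python
LABELS = ["anime", "real", "rendered"]

def classify(probs, threshold=0.80):
    """Return (label, confidence), or ("uncertain", confidence) below the threshold."""
    confidence = max(probs)
    label = LABELS[probs.index(confidence)]
    return (label, confidence) if confidence >= threshold else ("uncertain", confidence)
```

Predictions that fall below the threshold can then be routed to the B0/V2-S ensemble or to manual review.
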
## License

This model is provided as-is for research and educational purposes.
config.json ADDED
@@ -0,0 +1,10 @@
{
  "model_type": "tf_efficientnetv2_s",
  "num_classes": 3,
  "input_size": 224,
  "labels": [
    "anime",
    "real",
    "rendered"
  ]
}
labels.txt ADDED
@@ -0,0 +1,3 @@
0: anime
1: real
2: rendered
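
A small helper can keep `labels.txt` and `config.json` in sync with the model head. The parsing below is a sketch, assuming the `index: name` format shown above:

```python
def load_labels(text):
    """Parse `index: name` lines (the labels.txt format) into an ordered list."""
    pairs = {}
    for line in text.strip().splitlines():
        idx, name = line.split(":", 1)
        pairs[int(idx)] = name.strip()
    return [pairs[i] for i in range(len(pairs))]

def config_matches(cfg, labels):
    """Check that config.json's label order and class count agree with labels.txt."""
    return cfg["labels"] == labels and cfg["num_classes"] == len(labels)
```
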
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:935da24ebbb0bfbe744f7f75d7a018dadd0bfb1f9dcad0d7c34488acb0d546cf
size 81414628
model_card.md ADDED
@@ -0,0 +1,168 @@
---
license: openrail
language: en
library_name: timm
tags:
- image-classification
- anime
- real
- rendered
- 3d-graphics
datasets:
- coco
- custom-anime
- steam-screenshots
---

# TF-EfficientNetV2-S - Anime/Real/Rendered Classifier

Higher-capacity classifier with improved generalization for distinguishing photographs from anime and 3D rendered images.

## Model Summary

- **Model Name:** tf_efficientnetv2_s
- **Framework:** PyTorch + timm
- **Input:** 224×224 RGB images
- **Output:** 3 classes (anime, real, rendered)
- **Parameters:** 21.5M (4× larger than B0)
- **Size:** 81.4 MB

## Intended Use

Same classes as EfficientNet-B0, with higher accuracy and better generalization:

- **anime**: Drawn 2D or cel-shaded animation
- **real**: Photographs and real-world footage
- **rendered**: 3D graphics (games, CGI, Pixar, etc.)

## Performance

**Validation Accuracy:** 97.55% (+0.11% vs B0)

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| anime | 1.00 | 0.97 | 0.98 | 236 |
| real | 0.98 | 0.99 | 0.98 | 500 |
| rendered | 0.93 | 0.90 | 0.91 | 161 |
| **macro avg** | **0.97** | **0.95** | **0.96** | **897** |

## Training Data

Identical to the EfficientNet-B0 run:

- **Real images:** 5,000 COCO 2017 validation set
- **Anime images:** 2,357 curated frames
- **Rendered images:** 1,549 AAA game screenshots + 61 Pixar stills
- **Total:** 8,967 images (8,070 train / 897 diverse val)

## Training Details

- **Framework:** PyTorch
- **Augmentation:** Resize only (224×224)
- **Loss Function:** CrossEntropyLoss with inverse frequency weighting
- **Optimizer:** AdamW (lr=0.001)
- **Batch Size:** 40 (GPU memory constrained)
- **Epochs:** 20
- **Hardware:** NVIDIA RTX 3060 (12GB VRAM)
- **Training Time:** ~60 minutes

## Comparison to EfficientNet-B0

| Metric | B0 | V2-S | Delta |
|--------|-----|------|-------|
| Final Accuracy | 97.44% | 97.55% | +0.11% |
| Best Accuracy | 97.99% | 97.99% | Tied |
| Params | 5.3M | 21.5M | 4× larger |
| Inference Speed | ~20ms | ~60ms | 3× slower |
| Convergence | Epoch 4 | Epoch 13 | 9 epochs later |
| Train Loss | 0.1022 | 0.0003 | V2-S lower |
| Val Loss | 0.5519 | 0.1134 | V2-S lower |

**Verdict:** V2-S fits the training distribution far more thoroughly, but the real-world improvement is marginal. Use B0 for speed, V2-S for maximum accuracy.

## Limitations

1. Slower inference (~60ms vs B0's ~20ms)
2. Larger model (81.4MB vs B0's 16.2MB)
3. Same fundamental challenges: photorealistic games, cel-shading, artistic renders
4. Performance degrades on images smaller than 224×224

## Recommendations

- **Real-time/Mobile:** Use EfficientNet-B0 instead
- **Accuracy-Critical:** Prefer this model
- **Ensemble:** Use both models for highest confidence
- **Confidence Threshold:** ≥80% for reliable predictions
- **Edge Cases:** Manually inspect images on which the two models disagree

## How to Use

```python
from PIL import Image
import torch
from torchvision import transforms
import timm
from safetensors.torch import load_file

# Load the model weights
model = timm.create_model('tf_efficientnetv2_s', num_classes=3, pretrained=False)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval()

# Prepare the image
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
img = Image.open('image.jpg').convert('RGB')
x = transform(img).unsqueeze(0)

# Infer
with torch.no_grad():
    logits = model(x)
    probs = torch.softmax(logits, dim=1)
    pred = probs.argmax().item()

labels = ['anime', 'real', 'rendered']
print(f"{labels[pred]}: {probs[0, pred]:.1%}")
```

## Ensemble Strategy

For maximum accuracy, run both models and average their probabilities:

```python
import timm
import torch
from safetensors.torch import load_file

def load_model(arch, checkpoint):
    """Create a timm model and load its safetensors weights, as in "How to Use"."""
    model = timm.create_model(arch, num_classes=3, pretrained=False)
    model.load_state_dict(load_file(checkpoint))
    return model.eval()

# Checkpoint paths are illustrative; point them at each model's repo
b0 = load_model('efficientnet_b0', 'b0/model.safetensors')
v2s = load_model('tf_efficientnetv2_s', 'model.safetensors')

# `x` is a preprocessed 1x3x224x224 batch (see "How to Use")
with torch.no_grad():
    probs_b0 = torch.softmax(b0(x), dim=1)
    probs_v2s = torch.softmax(v2s(x), dim=1)

# Average the predicted distributions
ensemble_probs = (probs_b0 + probs_v2s) / 2
pred = ensemble_probs.argmax(dim=1).item()
```

## Benchmarks

**Inference Speed (RTX 3060)**

- Single image: ~60ms
- Batch of 16: ~200ms

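The numbers above imply a large per-image win from batching; a quick back-of-envelope check:

```python
single_ms = 60.0                      # one image per forward pass
batch_ms, batch_size = 200.0, 16      # sixteen images per forward pass

per_image_batched = batch_ms / batch_size   # amortized cost per image
speedup = single_ms / per_image_batched     # throughput gain from batching
```

So anyone classifying folders of images should stack inputs into batches rather than loop one image at a time.
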
## Ethical Considerations

Same as EfficientNet-B0. This model:

- is NOT designed for deepfake detection
- may have cultural bias in anime/rendered representation
- should be used with human review for content moderation

## Contact

For questions: [GitHub repo]

## License

OpenRAIL - Free for research and commercial use with proper attribution.