	---
	license: openrail
	language: en
	library_name: timm
	tags:
	- image-classification
	- anime
	- real
	- rendered
	- 3d-graphics
	datasets:
	- coco
	- custom-anime
	- steam-screenshots
	---

	# EfficientNet-B0 - Anime/Real/Rendered Classifier

	Fast, lightweight image classifier that distinguishes photographs, anime, and 3D-rendered images.

	## Model Summary

	- Model Name: efficientnet_b0
	- Framework: PyTorch + TIMM
	- Input: 224×224 RGB images
	- Output: 3 classes (anime, real, rendered)
	- Parameters: 5.3M
	- Size: 16.2 MB

	## Intended Use

	Classify images into three categories:
	- anime: Drawn 2D or cel-shaded animation
	- real: Photographs and real-world footage
	- rendered: 3D graphics (games, CGI, Pixar, etc.)

	## Performance

	Validation Accuracy: 97.44%

	| Class | Precision | Recall | F1-Score | Support |
	|-------|-----------|--------|----------|---------|
	| anime | 0.98 | 0.99 | 0.99 | 236 |
	| real | 0.98 | 0.98 | 0.98 | 500 |
	| rendered | 0.96 | 0.93 | 0.94 | 161 |
	| weighted avg | 0.97 | 0.97 | 0.97 | 897 |

	## Training Data

	- Real images: 5,000 COCO 2017 validation set images
	- Anime images: 2,357 curated animation frames and key scenes
	- Rendered images: 1,549 AAA game screenshots (Metacritic ≥75) + 61 Pixar movie stills
	- Total: 8,967 images (8,070 training, 897 validation), perceptually hashed for diversity
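
The split sizes above are consistent with a stratified 10% holdout per class, although the card does not say how the split was made. The sketch below checks that arithmetic. The 10% fraction and the per-class rounding are assumptions inferred from the numbers, and the rendered total combines the 1,549 screenshots with the 61 Pixar stills.

```python
# Per-class totals from the Training Data list (rendered = 1,549 + 61).
CLASS_TOTALS = {"anime": 2357, "real": 5000, "rendered": 1549 + 61}

# Assumption: a stratified 10% validation holdout, rounded per class.
VAL_FRACTION = 0.10


def split_counts(totals, val_fraction):
    """Return {class: (train_count, val_count)} for a stratified holdout."""
    counts = {}
    for name, total in totals.items():
        val = round(total * val_fraction)
        counts[name] = (total - val, val)
    return counts


counts = split_counts(CLASS_TOTALS, VAL_FRACTION)
train_total = sum(train for train, _ in counts.values())
val_total = sum(val for _, val in counts.values())

for name, (train, val) in counts.items():
    print(f"{name}: {train} train / {val} val")
print(f"total: {train_total} train / {val_total} val")
```

Under these assumptions the per-class validation counts (236, 500, 161) match the Support column in the Performance table, and the totals come to 8,070 training and 897 validation images.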

	## Training Details

	- Framework: PyTorch
	- Augmentation: Resize only (224×224)
	- Loss Function: CrossEntropyLoss with inverse frequency class weights
	- Optimizer: AdamW (lr=0.001)
	- Batch Size: 80
	- Epochs: 20
	- Hardware: NVIDIA RTX 3060 (12GB VRAM)
	- Training Time: ~20 minutes
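
The card does not give the formula behind the inverse frequency class weights. A minimal sketch of one common choice, `weight = total / (num_classes * class_count)`, is shown below. The training counts are derived by subtracting the validation Support values from the per-class totals. Both the formula and the function name `inverse_frequency_weights` are assumptions, not the author's code.

```python
# Training images per class, in label order (anime, real, rendered).
# Derived as class total minus validation support, e.g. 2,357 - 236.
TRAIN_COUNTS = {"anime": 2121, "real": 4500, "rendered": 1449}


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count).

    Assumption: this "balanced" normalization is one common convention;
    the card does not state which normalization was actually used.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(TRAIN_COUNTS)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Under this convention the rarest class (rendered) gets the largest weight and the most common (real) the smallest. In PyTorch, the values would be passed in label order as a float tensor to the `weight` argument of `torch.nn.CrossEntropyLoss`.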

	## Limitations

	1. Photorealistic video games are sometimes classified as real (93% recall on the rendered class, per the Performance table)
	2. Cel-shaded games may score as anime rather than rendered
	3. Artistic 3D renders (Pixar, high-quality CGI) show mixed confidence
	4. Performance degrades on images <224×224

	## Recommendations

	- Use a confidence threshold of ≥80% for reliable predictions
	- For critical applications, ensemble with tf_efficientnetv2_s
	- Check confusion patterns in your own use cases
	- Manually review edge cases (game screenshots, stylized renders)
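
A minimal sketch of the first two recommendations follows, working on plain lists of softmax probabilities in the label order used in this card. The `needs_review` label, the function names, and the choice to ensemble by averaging the two models' probabilities are illustrative assumptions. The card does not specify how the ensemble is formed.

```python
LABELS = ["anime", "real", "rendered"]


def decide(probs, threshold=0.80):
    """Return (label, confidence); defer to a human below the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return LABELS[best], probs[best]
    return "needs_review", probs[best]


def average_probs(probs_a, probs_b):
    """Assumed ensembling rule: element-wise mean of two probability vectors."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]


# Hypothetical outputs for one image from the two models.
b0_probs = [0.10, 0.15, 0.75]   # efficientnet_b0
v2s_probs = [0.04, 0.06, 0.90]  # tf_efficientnetv2_s

print(decide(b0_probs))
print(decide(average_probs(b0_probs, v2s_probs)))
```

In this made-up case the single model falls below the 80% threshold and is routed to manual review, while the averaged prediction clears it.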

	## How to Use

	```python
	from PIL import Image
	import torch
	from torchvision import transforms
	import timm
	from safetensors.torch import load_file

	# Load
	model = timm.create_model('efficientnet_b0', num_classes=3, pretrained=False)
	state_dict = load_file('model.safetensors')
	model.load_state_dict(state_dict)
	model.eval()

	# Prepare image
	transform = transforms.Compose([
	    transforms.Resize((224, 224)),
	    transforms.ToTensor(),
	    # ImageNet channel means and standard deviations
	    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
	])
	img = Image.open('image.jpg').convert('RGB')
	x = transform(img).unsqueeze(0)  # add batch dimension: (1, 3, 224, 224)

	# Infer
	with torch.no_grad():
	    logits = model(x)
	    probs = torch.softmax(logits, dim=1)
	    pred = probs.argmax(dim=1).item()

	labels = ['anime', 'real', 'rendered']
	print(f"{labels[pred]}: {probs[0, pred]:.1%}")
	```

	## Benchmarks

	Inference Speed (RTX 3060)
	- Single image: ~20ms
	- Batch of 32: ~150ms
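
To compare across batch sizes, the latencies above can be converted to throughput. The sketch below does that arithmetic and includes a generic wall-clock timer. `time_call` and its parameters are hypothetical helpers, not the script used to produce the reported numbers. When timing a CUDA model, the timed callable should call `torch.cuda.synchronize()` before returning, since GPU kernels run asynchronously.

```python
import time


def throughput(batch_size, latency_ms):
    """Images per second for a batch processed in latency_ms milliseconds."""
    return batch_size / (latency_ms / 1000.0)


def time_call(fn, warmup=3, runs=10):
    """Return the mean wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000.0 / runs


single = throughput(1, 20)     # reported ~20 ms per image
batched = throughput(32, 150)  # reported ~150 ms per batch of 32
print(f"single: {single:.0f} img/s, batch of 32: {batched:.0f} img/s")
```

With the reported figures this works out to roughly 50 images/s for single-image inference and about 213 images/s at batch size 32.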

	Accuracy Comparison

	| Model | Accuracy | Speed | Params |
	|-------|----------|-------|--------|
	| EfficientNet-B0 | 97.44% | Fast | 5.3M |
	| TF-EfficientNetV2-S | 97.55% | Moderate | 21.5M |

	## Ethical Considerations

	This model classifies images by visual style/source. Potential misuse:
	- Detecting deepfakes/AI-generated content (not designed for this)
	- Filtering user-generated content (may have cultural bias)
	- Surveillance or profiling

	Recommendations:
	- Use with human review for content moderation
	- Test on your target domain before deployment
	- Don't rely solely on automatic classification for safety-critical decisions
	- Consider cultural representation in anime/rendered content

	## Contact

	For questions or issues: [GitHub repo]

	## License

	OpenRAIL (Open Responsible AI License) - free for research and commercial use with proper attribution