# Image to Cartoonify - Selfie to Cartoon Generator
## Model Description
This is a conditional diffusion model trained to generate cartoon-style images from facial features extracted from real selfies. The model uses a custom U-Net architecture with attribute conditioning to transform realistic facial features into cartoon representations.
## Architecture
- **Model Type**: Conditional Diffusion Model (Custom U-Net)
- **Base Architecture**: Custom OptimizedConditionedUNet
- **Input Resolution**: 256x256 RGB images
- **Conditioning**: 18-dimensional facial attribute vector
- **Parameters**: ~50M
- **Diffusion Timesteps**: 1000
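The internals of `OptimizedConditionedUNet` are not published, but a common way to condition a diffusion U-Net on a small attribute vector is to project it into the timestep-embedding space and sum the two. The sketch below is an illustrative assumption of that pattern, not the model's actual implementation; the class name `AttributeConditioning` and the embedding width of 256 are hypothetical.

```python
import torch
import torch.nn as nn

class AttributeConditioning(nn.Module):
    """Hypothetical conditioning path: project the 18-dim attribute
    vector and add it to the sinusoidal timestep embedding."""

    def __init__(self, attr_dim=18, embed_dim=256):
        super().__init__()
        self.attr_proj = nn.Sequential(
            nn.Linear(attr_dim, embed_dim),
            nn.SiLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, time_embed, attrs):
        # time_embed: (B, embed_dim) timestep embedding
        # attrs:      (B, attr_dim) normalized facial attributes
        return time_embed + self.attr_proj(attrs)

cond = AttributeConditioning()
out = cond(torch.zeros(2, 256), torch.rand(2, 18))
```

The combined embedding would then modulate each U-Net block, so every denoising step is steered by the same attribute vector.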
## Key Features
- **Facial Feature Extraction**: Uses MediaPipe for robust facial landmark detection
- **Attribute Conditioning**: 18 facial attributes including:
- Eye angle, lashes, lid shape
- Eyebrow shape, thickness, width
- Face shape, chin length
- Hair style and color
- Facial hair presence
- Glasses detection
- Skin tone analysis
- **Real-time Generation**: Optimized for fast inference (15-50 steps)
- **High Quality**: Trained on 10k+ cartoon images with paired attributes
## Training Details
### Dataset
- **Source**: CartoonSet10k dataset
- **Size**: 10,000 cartoon images with CSV attribute annotations
- **Split**: 85% training, 15% validation
- **Augmentation**: Random flips, color jittering, rotation
### Training Configuration
- **Epochs**: 110
- **Batch Size**: 16
- **Learning Rate**: 2e-4 with cosine annealing
- **Optimization**: AdamW with gradient clipping
- **Mixed Precision**: FP16 for efficiency
- **Hardware**: NVIDIA T4 GPU
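The optimization setup above can be sketched as follows. The tiny `Linear` stand-in for the U-Net, the per-epoch scheduler step, and the clip norm of 1.0 are illustrative assumptions; the real loop would also wrap the forward pass in an FP16 autocast context with a `GradScaler`.

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the U-Net
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
epochs = 110
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... forward pass and loss.backward() would go here ...
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()

final_lr = optimizer.param_groups[0]['lr']
```

Cosine annealing over the full 110 epochs decays the learning rate smoothly from 2e-4 toward zero.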
### Loss Function
- **Primary**: MSE loss on predicted noise
- **Scheduler**: DDPM with scaled linear beta schedule
- **Beta Range**: 0.00085 to 0.012
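The loss and schedule above can be reproduced in a few lines of plain PyTorch. "Scaled linear" means the betas are linear in sqrt-space between the two endpoints and then squared (this is how `diffusers` implements it); the `training_loss` helper below is an illustrative sketch of epsilon-prediction training, not the repository's actual code.

```python
import torch

T = 1000
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, T) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def training_loss(model_fn, x0, t):
    """MSE between true and predicted noise (epsilon prediction)."""
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise  # forward noising
    return torch.nn.functional.mse_loss(model_fn(x_t, t), noise)
```

The endpoints of `betas` recover the 0.00085 to 0.012 range stated above.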
## Usage
### Installation
```bash
pip install torch torchvision
pip install diffusers
pip install mediapipe
pip install opencv-python
pip install Pillow numpy
```
### Basic Usage
```python
import torch
from PIL import Image
import numpy as np
from your_model import OptimizedConditionedUNet, OptimizedMediaPipeExtractor
from diffusers import DDPMScheduler

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = OptimizedConditionedUNet(
    in_channels=3,
    out_channels=3,
    attr_dim=18,
    base_channels=64
).to(device)

# Load checkpoint
checkpoint = torch.load('best_model.pt', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Initialize components
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    prediction_type="epsilon"
)
mp_extractor = OptimizedMediaPipeExtractor()

# Generate cartoon from selfie
def generate_cartoon(selfie_path, output_path):
    # Extract facial features
    features = mp_extractor.extract_features(selfie_path)
    features = features.unsqueeze(0).to(device)

    # Generate cartoon
    with torch.no_grad():
        # Start from pure Gaussian noise
        image = torch.randn(1, 3, 256, 256).to(device)

        # Denoising process
        noise_scheduler.set_timesteps(50)
        for t in noise_scheduler.timesteps:
            timesteps = torch.full((1,), t, device=device).long()
            noise_pred = model(image, timesteps, features)
            image = noise_scheduler.step(noise_pred, t, image).prev_sample

    # Rescale from [-1, 1] to [0, 255] and save
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().squeeze(0).permute(1, 2, 0).numpy()
    image = (image * 255).astype(np.uint8)
    result = Image.fromarray(image)
    result.save(output_path)
    return result

# Usage
cartoon = generate_cartoon('selfie.jpg', 'cartoon.png')
```
### Advanced Usage
```python
# Custom attribute manipulation
def generate_with_custom_attributes(base_features, modifications):
    """
    Generate a cartoon with modified attributes.

    Args:
        base_features: Original facial features from the selfie
        modifications: Dict of attribute modifications,
            e.g. {'hair_color': 0.8, 'glasses': 0.9}
    """
    modified_features = base_features.clone()
    attribute_map = {
        'eye_angle': 0, 'eye_lashes': 1, 'eye_lid': 2,
        'chin_length': 3, 'eyebrow_weight': 4, 'eyebrow_shape': 5,
        'eyebrow_thickness': 6, 'face_shape': 7, 'facial_hair': 8,
        'hair': 9, 'eye_color': 10, 'face_color': 11,
        'hair_color': 12, 'glasses': 13, 'glasses_color': 14,
        'eye_slant': 15, 'eyebrow_width': 16, 'eye_eyebrow_distance': 17
    }
    for attr_name, value in modifications.items():
        if attr_name in attribute_map:
            modified_features[0, attribute_map[attr_name]] = value
    # generate_from_features wraps the denoising loop from Basic Usage
    return generate_from_features(modified_features)
```
## Model Performance
### Metrics
- **Training Loss**: 0.0234 (final)
- **Validation Loss**: 0.0251 (best)
- **Inference Time**: ~2-3 seconds (50 steps, GPU)
- **Memory Usage**: ~4GB GPU memory
### Evaluation
- **Facial Feature Preservation**: High fidelity in maintaining key facial characteristics
- **Style Consistency**: Consistent cartoon art style across generations
- **Attribute Control**: Precise control over 18 facial attributes
- **Robustness**: Handles various lighting conditions and face angles
## Limitations
1. **Face Detection Dependency**: Requires clear facial landmarks for optimal results
2. **Resolution**: Fixed 256x256 output resolution
3. **Style Scope**: Limited to cartoon style present in training data
4. **Attribute Granularity**: 18 attributes may not capture all facial variations
5. **Background**: Focuses on the face region and may not handle complex backgrounds well
## Ethical Considerations
- **Consent**: Ensure proper consent when processing personal photos
- **Bias**: Model may reflect biases present in training data
- **Privacy**: Consider privacy implications when processing facial data
- **Misuse**: Potential for creating misleading or fake content
## Citation
```bibtex
@misc{image_to_cartoonify_2024,
  title={Image to Cartoonify: Selfie to Cartoon Generator},
  author={wizcodes12},
  year={2024},
  howpublished={\url{https://huggingface.co/wizcodes12/image_to_cartoonify}},
}
```
## License
This model is released under the MIT License. See LICENSE file for details.
## Acknowledgments
- CartoonSet10k dataset creators
- MediaPipe team for facial landmark detection
- Diffusers library by Hugging Face
- PyTorch team for the deep learning framework
## Updates
- **v1.0**: Initial release with 110 epochs of training
- **v1.1**: Improved feature extraction and normalization
- **v1.2**: Enhanced attribute conditioning and inference speed
## Contact
For questions, issues, or collaborations, please open an issue on the repository or contact wizcodes12@example.com.
---
*Generated with ❤️ using PyTorch and Diffusers by wizcodes12*