# Image to Cartoonify - Selfie to Cartoon Generator

## Model Description

This is a conditional diffusion model trained to generate cartoon-style images from facial features extracted from real selfies. The model uses a custom U-Net architecture with attribute conditioning to transform realistic facial features into cartoon representations.

## Architecture

- **Model Type**: Conditional Diffusion Model (Custom U-Net)
- **Base Architecture**: Custom OptimizedConditionedUNet
- **Input Resolution**: 256x256 RGB images
- **Conditioning**: 18-dimensional facial attribute vector
- **Parameters**: ~50M parameters
- **Training Steps**: 1000 diffusion timesteps

## Key Features

- **Facial Feature Extraction**: Uses MediaPipe for robust facial landmark detection
- **Attribute Conditioning**: 18 facial attributes including:
  - Eye angle, lashes, lid shape
  - Eyebrow shape, thickness, width
  - Face shape, chin length
  - Hair style and color
  - Facial hair presence
  - Glasses detection
  - Skin tone analysis
- **Real-time Generation**: Optimized for fast inference (15-50 steps)
- **High Quality**: Trained on 10k+ cartoon images with paired attributes

## Training Details

### Dataset

- **Source**: CartoonSet10k dataset
- **Size**: 10,000 cartoon images with CSV attribute annotations
- **Split**: 85% training, 15% validation
- **Augmentation**: Random flips, color jittering, rotation

### Training Configuration

- **Epochs**: 110
- **Batch Size**: 16
- **Learning Rate**: 2e-4 with cosine annealing
- **Optimization**: AdamW with gradient clipping
- **Mixed Precision**: FP16 for efficiency
- **Hardware**: NVIDIA T4 GPU

### Loss Function

- **Primary**: MSE loss on predicted noise
- **Scheduler**: DDPM with scaled linear beta schedule
- **Beta Range**: 0.00085 to 0.012

## Usage

### Installation

```bash
pip install torch torchvision
pip install diffusers
pip install mediapipe
pip install opencv-python
pip install Pillow numpy
```

### Basic Usage
```python
import torch
import numpy as np
from PIL import Image
from diffusers import DDPMScheduler

from your_model import OptimizedConditionedUNet, OptimizedMediaPipeExtractor

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = OptimizedConditionedUNet(
    in_channels=3,
    out_channels=3,
    attr_dim=18,
    base_channels=64
).to(device)

# Load checkpoint
checkpoint = torch.load('best_model.pt', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Initialize the noise scheduler and the MediaPipe feature extractor
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    prediction_type="epsilon"
)
mp_extractor = OptimizedMediaPipeExtractor()

# Generate a cartoon from a selfie
def generate_cartoon(selfie_path, output_path):
    # Extract facial features
    features = mp_extractor.extract_features(selfie_path)
    features = features.unsqueeze(0).to(device)

    with torch.no_grad():
        # Start from pure Gaussian noise
        image = torch.randn(1, 3, 256, 256).to(device)

        # Iterative denoising
        noise_scheduler.set_timesteps(50)
        for t in noise_scheduler.timesteps:
            timesteps = torch.full((1,), t, device=device).long()
            noise_pred = model(image, timesteps, features)
            image = noise_scheduler.step(noise_pred, t, image).prev_sample

    # Denormalize from [-1, 1] to [0, 255] and save the result
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().squeeze(0).permute(1, 2, 0).numpy()
    image = (image * 255).astype(np.uint8)
    result = Image.fromarray(image)
    result.save(output_path)
    return result

# Usage
cartoon = generate_cartoon('selfie.jpg', 'cartoon.png')
```

### Advanced Usage
```python
# Custom attribute manipulation
def generate_with_custom_attributes(base_features, modifications):
    """
    Generate a cartoon with modified attributes.

    Args:
        base_features: Original facial features extracted from the selfie
        modifications: Dict of attribute modifications,
            e.g. {'hair_color': 0.8, 'glasses': 0.9}
    """
    modified_features = base_features.clone()

    attribute_map = {
        'eye_angle': 0, 'eye_lashes': 1, 'eye_lid': 2, 'chin_length': 3,
        'eyebrow_weight': 4, 'eyebrow_shape': 5, 'eyebrow_thickness': 6,
        'face_shape': 7, 'facial_hair': 8, 'hair': 9, 'eye_color': 10,
        'face_color': 11, 'hair_color': 12, 'glasses': 13,
        'glasses_color': 14, 'eye_slant': 15, 'eyebrow_width': 16,
        'eye_eyebrow_distance': 17
    }

    for attr_name, value in modifications.items():
        if attr_name in attribute_map:
            modified_features[0, attribute_map[attr_name]] = value

    return generate_from_features(modified_features)
```

## Model Performance

### Metrics

- **Training Loss**: 0.0234 (final)
- **Validation Loss**: 0.0251 (best)
- **Inference Time**: ~2-3 seconds (50 steps, GPU)
- **Memory Usage**: ~4GB GPU memory

### Evaluation

- **Facial Feature Preservation**: High fidelity in maintaining key facial characteristics
- **Style Consistency**: Consistent cartoon art style across generations
- **Attribute Control**: Precise control over all 18 facial attributes
- **Robustness**: Handles various lighting conditions and face angles

## Limitations

1. **Face Detection Dependency**: Requires clear facial landmarks for optimal results
2. **Resolution**: Fixed 256x256 output resolution
3. **Style Scope**: Limited to the cartoon style present in the training data
4. **Attribute Granularity**: 18 attributes may not capture all facial variations
5. **Background**: Focuses on the face region; may not handle complex backgrounds well

## Ethical Considerations

- **Consent**: Ensure proper consent when processing personal photos
- **Bias**: The model may reflect biases present in the training data
- **Privacy**: Consider privacy implications when processing facial data
- **Misuse**: Potential for creating misleading or fake content

## Citation

```bibtex
@misc{image_to_cartoonify_2024,
  title={Image to Cartoonify: Selfie to Cartoon Generator},
  author={wizcodes12},
  year={2024},
  howpublished={\url{https://huggingface.co/wizcodes12/image_to_cartoonify}},
}
```

## License

This model is released under the MIT License. See the LICENSE file for details.
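The attribute-index mapping used in the Advanced Usage example can be exercised without loading the model at all. The following is a minimal NumPy sketch of that indexing logic; the index order is taken from the example above, while the `apply_modifications` helper name and the feature values are illustrative, not part of the released code:

```python
import numpy as np

# Index layout of the 18-dimensional attribute vector (from the Advanced Usage example)
ATTRIBUTE_MAP = {
    'eye_angle': 0, 'eye_lashes': 1, 'eye_lid': 2, 'chin_length': 3,
    'eyebrow_weight': 4, 'eyebrow_shape': 5, 'eyebrow_thickness': 6,
    'face_shape': 7, 'facial_hair': 8, 'hair': 9, 'eye_color': 10,
    'face_color': 11, 'hair_color': 12, 'glasses': 13,
    'glasses_color': 14, 'eye_slant': 15, 'eyebrow_width': 16,
    'eye_eyebrow_distance': 17
}

def apply_modifications(features, modifications):
    """Return a copy of a (1, 18) feature array with the named attributes overwritten."""
    modified = features.copy()
    for name, value in modifications.items():
        if name in ATTRIBUTE_MAP:
            modified[0, ATTRIBUTE_MAP[name]] = value
    return modified

# Illustrative values: a neutral base vector with two attributes overridden
base = np.full((1, 18), 0.5, dtype=np.float32)
modified = apply_modifications(base, {'hair_color': 0.8, 'glasses': 0.9})
print(modified[0, ATTRIBUTE_MAP['hair_color']])  # 0.8
print(modified[0, ATTRIBUTE_MAP['glasses']])     # 0.9
```

Note that unknown attribute names are silently ignored, matching the `if attr_name in attribute_map` guard in the example above, and the base vector is left unmodified.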
## Acknowledgments

- CartoonSet10k dataset creators
- MediaPipe team for facial landmark detection
- Diffusers library by Hugging Face
- PyTorch team for the deep learning framework

## Updates

- **v1.0**: Initial release with 110 epochs of training
- **v1.1**: Improved feature extraction and normalization
- **v1.2**: Enhanced attribute conditioning and inference speed

## Contact

For questions, issues, or collaborations, please open an issue on the repository or contact wizcodes12@example.com.

---

*Generated with ❤️ using PyTorch and Diffusers by wizcodes12*
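As a final illustration, the `(image / 2 + 0.5).clamp(0, 1)` step in `generate_cartoon` maps the model's [-1, 1] output range back to displayable pixels. It can be demonstrated in isolation with NumPy; the `denormalize_to_uint8` helper name and the sample array below are illustrative, not part of the released code:

```python
import numpy as np

def denormalize_to_uint8(image):
    """Map a [-1, 1] float image of shape (C, H, W) to a (H, W, C) uint8 array."""
    image = np.clip(image / 2 + 0.5, 0.0, 1.0)   # [-1, 1] -> [0, 1]
    image = np.transpose(image, (1, 2, 0))       # (C, H, W) -> (H, W, C)
    return (image * 255).astype(np.uint8)        # [0, 1] -> [0, 255], truncating

# Illustrative 3-channel 2x2 image covering the extremes and midpoints
sample = np.array([[[-1.0, 0.0], [1.0, 0.5]]] * 3)  # shape (3, 2, 2)
pixels = denormalize_to_uint8(sample)
print(pixels.shape)   # (2, 2, 3)
print(pixels[0, 0])   # [0 0 0]        (-1.0 maps to 0)
print(pixels[1, 0])   # [255 255 255]  (1.0 maps to 255)
```

The clamp guards against denoised samples that drift slightly outside [-1, 1]; `astype(np.uint8)` then truncates, so 0.5 lands on pixel value 191 rather than 192.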