# Image to Cartoonify - Selfie to Cartoon Generator

## Model Description

This is a conditional diffusion model trained to generate cartoon-style images from facial features extracted from real selfies. The model uses a custom U-Net architecture with attribute conditioning to transform realistic facial features into cartoon representations.

## Architecture

- **Model Type**: Conditional Diffusion Model (Custom U-Net)
- **Base Architecture**: Custom OptimizedConditionedUNet
- **Input Resolution**: 256x256 RGB images
- **Conditioning**: 18-dimensional facial attribute vector
- **Parameters**: ~50M parameters
- **Training Steps**: 1000 diffusion timesteps

## Key Features

- **Facial Feature Extraction**: Uses MediaPipe for robust facial landmark detection
- **Attribute Conditioning**: 18 facial attributes including:
  - Eye angle, lashes, lid shape
  - Eyebrow shape, thickness, width
  - Face shape, chin length
  - Hair style and color
  - Facial hair presence
  - Glasses detection
  - Skin tone analysis
- **Real-time Generation**: Optimized for fast inference (15-50 steps)
- **High Quality**: Trained on 10k+ cartoon images with paired attributes

## Training Details

### Dataset

- **Source**: CartoonSet10k dataset
- **Size**: 10,000 cartoon images with CSV attribute annotations
- **Split**: 85% training, 15% validation
- **Augmentation**: Random flips, color jittering, rotation

### Training Configuration

- **Epochs**: 110
- **Batch Size**: 16
- **Learning Rate**: 2e-4 with cosine annealing
- **Optimization**: AdamW with gradient clipping
- **Mixed Precision**: FP16 for efficiency
- **Hardware**: NVIDIA T4 GPU

### Loss Function

- **Primary**: MSE loss on predicted noise
- **Scheduler**: DDPM with scaled linear beta schedule
- **Beta Range**: 0.00085 to 0.012

## Usage

### Installation

```bash
pip install torch torchvision
pip install diffusers
pip install mediapipe
pip install opencv-python
pip install Pillow numpy
```

### Basic Usage
```python
import torch
import numpy as np
from PIL import Image
from diffusers import DDPMScheduler

from your_model import OptimizedConditionedUNet, OptimizedMediaPipeExtractor

# Load model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = OptimizedConditionedUNet(
    in_channels=3,
    out_channels=3,
    attr_dim=18,
    base_channels=64
).to(device)

# Load checkpoint
checkpoint = torch.load('best_model.pt', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Initialize the noise scheduler and the MediaPipe feature extractor
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    prediction_type="epsilon"
)
mp_extractor = OptimizedMediaPipeExtractor()

# Generate a cartoon from a selfie
def generate_cartoon(selfie_path, output_path):
    # Extract facial features
    features = mp_extractor.extract_features(selfie_path)
    features = features.unsqueeze(0).to(device)

    with torch.no_grad():
        # Start from pure Gaussian noise
        image = torch.randn(1, 3, 256, 256).to(device)

        # Iterative denoising
        noise_scheduler.set_timesteps(50)
        for t in noise_scheduler.timesteps:
            timesteps = torch.full((1,), t, device=device).long()
            noise_pred = model(image, timesteps, features)
            image = noise_scheduler.step(noise_pred, t, image).prev_sample

    # Denormalize from [-1, 1] to [0, 255] and save the result
    image = (image / 2 + 0.5).clamp(0, 1)
    image = image.cpu().squeeze(0).permute(1, 2, 0).numpy()
    image = (image * 255).astype(np.uint8)
    result = Image.fromarray(image)
    result.save(output_path)
    return result

# Usage
cartoon = generate_cartoon('selfie.jpg', 'cartoon.png')
```

### Advanced Usage
```python
# Custom attribute manipulation
def generate_with_custom_attributes(base_features, modifications):
    """
    Generate a cartoon with modified attributes.

    Args:
        base_features: Original facial features extracted from the selfie
        modifications: Dict of attribute modifications,
            e.g. {'hair_color': 0.8, 'glasses': 0.9}
    """
    modified_features = base_features.clone()

    attribute_map = {
        'eye_angle': 0, 'eye_lashes': 1, 'eye_lid': 2, 'chin_length': 3,
        'eyebrow_weight': 4, 'eyebrow_shape': 5, 'eyebrow_thickness': 6,
        'face_shape': 7, 'facial_hair': 8, 'hair': 9, 'eye_color': 10,
        'face_color': 11, 'hair_color': 12, 'glasses': 13,
        'glasses_color': 14, 'eye_slant': 15, 'eyebrow_width': 16,
        'eye_eyebrow_distance': 17
    }

    for attr_name, value in modifications.items():
        if attr_name in attribute_map:
            modified_features[0, attribute_map[attr_name]] = value

    return generate_from_features(modified_features)
```

## Model Performance

### Metrics

- **Training Loss**: 0.0234 (final)
- **Validation Loss**: 0.0251 (best)
- **Inference Time**: ~2-3 seconds (50 steps, GPU)
- **Memory Usage**: ~4GB GPU memory

### Evaluation

- **Facial Feature Preservation**: High fidelity in maintaining key facial characteristics
- **Style Consistency**: Consistent cartoon art style across generations
- **Attribute Control**: Precise control over all 18 facial attributes
- **Robustness**: Handles various lighting conditions and face angles

## Limitations

1. **Face Detection Dependency**: Requires clear facial landmarks for optimal results
2. **Resolution**: Fixed 256x256 output resolution
3. **Style Scope**: Limited to the cartoon style present in the training data
4. **Attribute Granularity**: 18 attributes may not capture all facial variations
5. **Background**: Focuses on the face region; may not handle complex backgrounds well

## Ethical Considerations

- **Consent**: Ensure proper consent when processing personal photos
- **Bias**: The model may reflect biases present in the training data
- **Privacy**: Consider privacy implications when processing facial data
- **Misuse**: Potential for creating misleading or fake content

## Citation

```bibtex
@misc{image_to_cartoonify_2024,
  title={Image to Cartoonify: Selfie to Cartoon Generator},
  author={wizcodes12},
  year={2024},
  howpublished={\url{https://huggingface.co/wizcodes12/image_to_cartoonify}},
}
```

## License

This model is released under the MIT License. See the LICENSE file for details.
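The attribute-index mapping used in the Advanced Usage example can be exercised without loading the model at all. The following is a minimal NumPy sketch of that indexing logic; the index order is taken from the example above, while the `apply_modifications` helper name and the feature values are illustrative, not part of the released code:

```python
import numpy as np

# Index layout of the 18-dimensional attribute vector (from the Advanced Usage example)
ATTRIBUTE_MAP = {
    'eye_angle': 0, 'eye_lashes': 1, 'eye_lid': 2, 'chin_length': 3,
    'eyebrow_weight': 4, 'eyebrow_shape': 5, 'eyebrow_thickness': 6,
    'face_shape': 7, 'facial_hair': 8, 'hair': 9, 'eye_color': 10,
    'face_color': 11, 'hair_color': 12, 'glasses': 13,
    'glasses_color': 14, 'eye_slant': 15, 'eyebrow_width': 16,
    'eye_eyebrow_distance': 17
}

def apply_modifications(features, modifications):
    """Return a copy of a (1, 18) feature array with the named attributes overwritten."""
    modified = features.copy()
    for name, value in modifications.items():
        if name in ATTRIBUTE_MAP:
            modified[0, ATTRIBUTE_MAP[name]] = value
    return modified

# Illustrative values: a neutral base vector with two attributes overridden
base = np.full((1, 18), 0.5, dtype=np.float32)
modified = apply_modifications(base, {'hair_color': 0.8, 'glasses': 0.9})
print(modified[0, ATTRIBUTE_MAP['hair_color']])  # 0.8
print(modified[0, ATTRIBUTE_MAP['glasses']])     # 0.9
```

Note that unknown attribute names are silently ignored, matching the `if attr_name in attribute_map` guard in the example above, and the base vector is left unmodified.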
## Acknowledgments

- CartoonSet10k dataset creators
- MediaPipe team for facial landmark detection
- Diffusers library by Hugging Face
- PyTorch team for the deep learning framework

## Updates

- **v1.0**: Initial release with 110 epochs of training
- **v1.1**: Improved feature extraction and normalization
- **v1.2**: Enhanced attribute conditioning and inference speed

## Contact

For questions, issues, or collaborations, please open an issue on the repository or contact wizcodes12@example.com.

---

*Generated with ❤️ using PyTorch and Diffusers by wizcodes12*
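As a final illustration, the `(image / 2 + 0.5).clamp(0, 1)` step in `generate_cartoon` maps the model's [-1, 1] output range back to displayable pixels. It can be demonstrated in isolation with NumPy; the `denormalize_to_uint8` helper name and the sample array below are illustrative, not part of the released code:

```python
import numpy as np

def denormalize_to_uint8(image):
    """Map a [-1, 1] float image of shape (C, H, W) to a (H, W, C) uint8 array."""
    image = np.clip(image / 2 + 0.5, 0.0, 1.0)   # [-1, 1] -> [0, 1]
    image = np.transpose(image, (1, 2, 0))       # (C, H, W) -> (H, W, C)
    return (image * 255).astype(np.uint8)        # [0, 1] -> [0, 255], truncating

# Illustrative 3-channel 2x2 image covering the extremes and midpoints
sample = np.array([[[-1.0, 0.0], [1.0, 0.5]]] * 3)  # shape (3, 2, 2)
pixels = denormalize_to_uint8(sample)
print(pixels.shape)   # (2, 2, 3)
print(pixels[0, 0])   # [0 0 0]        (-1.0 maps to 0)
print(pixels[1, 0])   # [255 255 255]  (1.0 maps to 255)
```

The clamp guards against denoised samples that drift slightly outside [-1, 1]; `astype(np.uint8)` then truncates, so 0.5 lands on pixel value 191 rather than 192.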