Rename readme.md to README.md

42c78f7 verified 7 months ago

5.19 kB

	# CIFAR-10 Diffusion Model

	A lightweight diffusion model trained from scratch on the CIFAR-10 dataset in just 14.5 minutes using PyTorch.

	## Model Description

	This is a SimpleUNet-based diffusion model trained to generate 32x32 RGB images similar to the CIFAR-10 dataset. The model demonstrates the fundamentals of diffusion-based image generation with a compact architecture suitable for educational purposes and quick experimentation.

	### Key Features
	- 🚀 Fast Training: Complete training in under 15 minutes on RTX 3060
	- 💾 Lightweight: Only 16.8M parameters (~64MB model size)
	- 🎯 Educational: Clean, well-documented code for learning diffusion models
	- ⚡ Efficient Inference: Generate images in seconds on consumer GPUs

	## Model Details

	\| Attribute \| Value \|
	\|-----------\|-------\|
	\| Architecture \| SimpleUNet with ResNet blocks + Attention \|
	\| Parameters \| 16,808,835 \|
	\| Dataset \| CIFAR-10 (50,000 training images) \|
	\| Image Size \| 32×32 RGB \|
	\| Training Steps \| 7,820 (20 epochs × 391 batches) \|
	\| Training Time \| 14.54 minutes \|
	\| Hardware \| NVIDIA RTX 3060 (0.43GB VRAM used) \|
	\| Framework \| PyTorch 2.0+ \|

	## Quick Start

	### Installation
	```bash
	pip install torch torchvision matplotlib tqdm pillow numpy
	```

	### Basic Usage
	```python
	import torch
	import matplotlib.pyplot as plt

	# Load model
	checkpoint = torch.load('complete_diffusion_model.pth')
	model = SimpleUNet(**checkpoint['model_config'])
	model.load_state_dict(checkpoint['model_state_dict'])
	model.eval()

	# Initialize scheduler
	scheduler = DDPMScheduler(**checkpoint['diffusion_config'])

	# Generate images
	@torch.no_grad()
	def generate_images(model, scheduler, num_images=4):
	device = next(model.parameters()).device
	images = torch.randn(num_images, 3, 32, 32).to(device)

	for t in range(999, -1, -20): # 50 denoising steps
	timestep = torch.full((num_images,), t, device=device)
	noise_pred = model(images, timestep)

	# Simplified DDPM step
	alpha_t = scheduler.alpha_cumprod[t]
	alpha_prev = scheduler.alpha_cumprod[t-20] if t >= 20 else 1.0

	pred_x0 = (images - torch.sqrt(1-alpha_t) * noise_pred) / torch.sqrt(alpha_t)
	images = torch.sqrt(alpha_prev) * pred_x0 + torch.sqrt(1-alpha_prev) * noise_pred

	return images

	# Generate and display
	generated = generate_images(model, scheduler)
	```

	## Training Details

	- Loss Function: MSE between predicted and actual noise
	- Optimizer: AdamW (lr=1e-4, weight_decay=1e-6)
	- Scheduler: CosineAnnealingLR
	- Batch Size: 128
	- Final Loss: 0.0363 (73% reduction from initial)
	- Diffusion Steps: 1000 (linear beta schedule)

	## Performance

	### Training Loss Curve
	The model shows excellent convergence:
	- Epoch 1: 0.1349 → Epoch 20: 0.0363
	- Best Loss: 0.0358 (Epoch 19)
	- Stable convergence without overfitting

	### Generation Quality
	- ✅ Captures CIFAR-10 color distributions
	- ✅ Generates diverse, non-repetitive outputs
	- ⚠️ Abstract patterns (needs longer training for object recognition)
	- 🎯 Suitable for color/texture generation tasks

	## Files in this Repository

	\| File \| Description \| Size \|
	\|------\|-------------\|------\|
	\| `complete_diffusion_model.pth` \| Full model with config and weights \| ~64MB \|
	\| `diffusion_model_final.pth` \| Training checkpoint (epoch 20) \| ~64MB \|
	\| `model_info.json` \| Training metadata and hyperparameters \| <1KB \|
	\| `inference_example.py` \| Complete inference script with model classes \| ~5KB \|

	## Model Architecture

	```
	SimpleUNet(
	time_embedding: TimeEmbedding(128)
	encoder: 3 ResNet blocks with downsampling
	middle: ResNet + Self-Attention + ResNet
	decoder: 3 ResNet blocks with upsampling
	output: GroupNorm → SiLU → Conv2d
	)
	```

	## Use Cases

	- 🎓 Educational: Learn diffusion model fundamentals
	- 🔬 Research: Baseline for diffusion experiments
	- 🎨 Art: Generate abstract textures and patterns
	- ⚡ Prototyping: Quick diffusion model testing

	## Limitations & Improvements

	### Current Limitations
	- Generates abstract patterns rather than recognizable objects
	- Trained on small 32×32 resolution
	- Limited to 20 training epochs

	### Suggested Improvements
	1. Extended Training: 50-100 epochs for better object generation
	2. Larger Architecture: Increase model capacity
	3. Advanced Sampling: Implement DDIM or DPM-Solver++
	4. Higher Resolution: Train on 64×64 or 128×128 images
	5. Better Datasets: Use CelebA-HQ or custom datasets

	## Citation

	```bibtex
	@misc{cifar10-diffusion-2025,
	title={CIFAR-10 Diffusion Model: Fast Training Implementation},
	author={Karthik},
	year={2025},
	publisher={Hugging Face},
	howpublished={\url{https://huggingface.co/karthik-2905/DiffusionPretrained}}
	}
	```

	## License

	MIT License - Free for research and commercial use.

	---

	🚀 Want to train your own? Check out the [full implementation](https://github.com/GruheshKurra/DiffusionModelPretrained) with Jupyter notebooks and step-by-step training code!

	📊 Training Stats: 16.8M params • 14.5min training • RTX 3060 • PyTorch 2.0