# Residual Convolutional Autoencoder Ensemble
Deep learning models for image reconstruction using residual convolutional autoencoders.
## Model Architecture
Two variants of a deep convolutional autoencoder with residual blocks:
- **Model A**: latent_dim=512, dropout=0.15
- **Model B**: latent_dim=768, dropout=0.20
### Architecture Details
```
Input: (B, 3, 256, 256) RGB images in range [-1, 1]
Encoder: 6-layer CNN with residual blocks (256β†’128β†’64β†’32β†’16β†’8β†’4)
Latent: Fully connected projection to latent_dim
Decoder: 6-layer TransposeCNN with residual blocks (4β†’8β†’16β†’32β†’64β†’128β†’256)
Output: (B, 3, 256, 256) Reconstructed images + (B, latent_dim) latent codes
```
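The actual network definition lives in `model.py`; the shape of the architecture above can be sketched roughly as follows. The channel widths, kernel sizes, and normalization choices here are illustrative assumptions, not the exact published definition:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with a skip connection (channel count unchanged)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class ResidualConvAutoencoderSketch(nn.Module):
    """Illustrative 6-stage encoder/decoder; widths are assumptions."""
    def __init__(self, latent_dim=512, dropout=0.15):
        super().__init__()
        chs = [3, 32, 64, 128, 256, 256, 256]
        enc = []
        for cin, cout in zip(chs[:-1], chs[1:]):
            enc += [nn.Conv2d(cin, cout, 4, stride=2, padding=1),  # halves H and W
                    nn.ReLU(inplace=True), ResidualBlock(cout)]
        self.encoder = nn.Sequential(*enc)      # (B, 3, 256, 256) -> (B, 256, 4, 4)
        self.to_latent = nn.Sequential(nn.Flatten(), nn.Dropout(dropout),
                                       nn.Linear(256 * 4 * 4, latent_dim))
        self.from_latent = nn.Linear(latent_dim, 256 * 4 * 4)
        dec = []
        rev = chs[::-1]
        for cin, cout in zip(rev[:-1], rev[1:]):
            dec.append(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1))  # doubles H and W
            if cout == 3:
                dec.append(nn.Tanh())           # output in [-1, 1], matching the input range
            else:
                dec += [nn.ReLU(inplace=True), ResidualBlock(cout)]
        self.decoder = nn.Sequential(*dec)      # (B, 256, 4, 4) -> (B, 3, 256, 256)

    def forward(self, x):
        z = self.to_latent(self.encoder(x))
        recon = self.decoder(self.from_latent(z).view(-1, 256, 4, 4))
        return recon, z
```

The six stride-2 stages give the 256→128→64→32→16→8→4 spatial schedule listed above, and the decoder mirrors it with transposed convolutions.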
## Training Details
- **Dataset**: Real images (256x256 resolution)
- **Loss**: MSE (Mean Squared Error)
- **Optimizer**: AdamW with weight decay
- **Training**: 100+ epochs with validation monitoring
- **Best Validation Loss**:
  - Model A: 0.025486
  - Model B: 0.025033
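The training setup above (AdamW, MSE loss, per-epoch validation with best-checkpoint saving) can be sketched as follows. The learning rate, weight decay, and checkpoint schema are assumptions; the loaders are assumed to yield batches of images already normalized to [-1, 1]:

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, val_loader, epochs=100, lr=1e-3,
          weight_decay=1e-4, ckpt_path="model_best.pth"):
    """AdamW + MSE with per-epoch validation; keeps only the best checkpoint."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    best_val = float("inf")
    for epoch in range(epochs):
        model.train()
        for imgs in train_loader:
            recon, _ = model(imgs)
            loss = F.mse_loss(recon, imgs)
            opt.zero_grad()
            loss.backward()
            opt.step()
        # Validation monitoring: average MSE over the held-out set
        model.eval()
        with torch.no_grad():
            val = sum(F.mse_loss(model(imgs)[0], imgs).item()
                      for imgs in val_loader) / len(val_loader)
        if val < best_val:
            best_val = val
            torch.save({"epoch": epoch, "val_loss": val,
                        "state_dict": model.state_dict()}, ckpt_path)
    return best_val
```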
## Usage
```python
import torch
from PIL import Image
from torchvision import transforms
from model import ResidualConvAutoencoder, load_model

# Option 1: Load pre-trained model
model, checkpoint = load_model('model_a_best.pth', latent_dim=512, dropout=0.15)

# Option 2: Create from scratch
model = ResidualConvAutoencoder(latent_dim=512, dropout=0.15)

model.eval()

# Prepare image (normalize to [-1, 1])
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x * 2 - 1)  # [0, 1] -> [-1, 1]
])
image = Image.open('example.jpg').convert('RGB')  # any RGB image; path is illustrative

# Inference
with torch.no_grad():
    img_tensor = transform(image).unsqueeze(0)
    reconstructed, latent = model(img_tensor)

    # Get reconstruction error
    error = torch.nn.functional.mse_loss(reconstructed, img_tensor)
```
## Model Files
- `model_a_best.pth` - Model A checkpoint (latent_dim=512)
- `model_b_best.pth` - Model B checkpoint (latent_dim=768)
- `model.py` - Model architecture definition
- `config.json` - Training configuration
- `training_history.json` - Full training metrics
## Research Findings
**Important Note**: These models were trained as image reconstruction autoencoders. Testing revealed they function as **enhancement/denoising models** rather than anomaly detectors:
- βœ… Successfully reconstructs natural images
- βœ… Can denoise corrupted images (JPEG artifacts, blur, contrast)
- ⚠️ Not suitable for detecting modern AI-generated images
- ⚠️ Shows negative discrimination for degraded images (reconstructs them better)
### Performance on Synthetic Corruptions
| Corruption Type | Separation from Real |
|----------------|---------------------|
| Noise Added | +122.1% βœ… |
| Color Shifted | +23.8% ⚠️ |
| Patch Corrupted | +12.6% ❌ |
| JPEG Compressed | -9.8% ❌ |
| Contrast Altered | -90.1% ❌ |
| Blurred | -92.5% ❌ |
Negative percentages indicate the model reconstructs corrupted images *better* than real images (denoising effect).
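The separation percentages above compare mean reconstruction error on corrupted images against real images. The exact evaluation script is not included here, but a metric of this form could be computed as follows (function names are illustrative):

```python
import torch
import torch.nn.functional as F

def per_image_error(model, images):
    """Per-image MSE between input and reconstruction, shape (B,)."""
    model.eval()
    with torch.no_grad():
        recon, _ = model(images)
        return F.mse_loss(recon, images, reduction="none").mean(dim=(1, 2, 3))

def separation_pct(model, real, corrupted):
    """Positive: corrupted images reconstruct worse than real (useful separation).
    Negative: corrupted images reconstruct better (denoising effect)."""
    e_real = per_image_error(model, real).mean()
    e_corr = per_image_error(model, corrupted).mean()
    return float(100.0 * (e_corr - e_real) / e_real)
```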
## Limitations
1. **Not an anomaly detector**: Models enhance/denoise rather than faithfully reconstruct
2. **Poor for fake detection**: Cannot reliably distinguish modern AI-generated images from real ones
3. **Pixel-space limitations**: Modern AI images are statistically similar to real images in pixel space
## Recommended Use Cases
βœ… Image denoising and enhancement
βœ… Feature extraction (latent representations)
βœ… Image compression/reconstruction
βœ… Transfer learning backbone
❌ Fake image detection (use supervised classifiers instead)
❌ Anomaly detection (use different approach)
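For the feature-extraction use case, the latent codes returned alongside the reconstruction can serve as compact image embeddings. A minimal sketch, assuming only that the model returns `(reconstruction, latent)` as in the usage example above:

```python
import torch
import torch.nn.functional as F

def embed(model, images):
    """Return L2-normalized latent codes for a batch of images in [-1, 1]."""
    model.eval()
    with torch.no_grad():
        _, latent = model(images)
    return F.normalize(latent, dim=1)

def pairwise_cosine(a, b):
    """Cosine similarity matrix between two sets of normalized embeddings."""
    return a @ b.T
```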
## Citation
If you use these models in your research, please cite:
```
@misc{residual_autoencoder_ensemble_2024,
  author = {ash12321},
  title = {Residual Convolutional Autoencoder Ensemble},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ash12321/residual-autoencoder-ensemble}}
}
```
## License
MIT License - See LICENSE file for details
## Contact
For questions or issues, please open an issue on the Hugging Face model page.