---
title: CV Model Comparison In PyTorch
emoji: 📊
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 6.8.0
app_file: app.py
pinned: false
license: mit
short_description: PyTorch CV models comparison.
models:
- AIOmarRehan/PyTorch_Unified_CNN_Model
datasets:
- AIOmarRehan/Vehicles
---

# PyTorch Model Comparison: From Custom CNNs to Advanced Transfer Learning

![Python](https://img.shields.io/badge/Python-3.8+-3776AB?style=flat&logo=python&logoColor=white)
![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-EE4C2C?style=flat&logo=pytorch&logoColor=white)
![Gradio](https://img.shields.io/badge/Gradio-4.0+-FF6F00?style=flat&logo=gradio&logoColor=white)
![Hugging Face](https://img.shields.io/badge/Hugging%20Face-Transformers-FFD21E?style=flat&logo=huggingface&logoColor=white)
![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?style=flat&logo=docker&logoColor=white)

---

## Overview

This project compares three computer vision approaches in PyTorch on a vehicle classification task:

1. Custom CNN (trained from scratch)
2. Vision Transformer (DeiT-Tiny)
3. Xception with two-phase transfer learning

The goal is to answer a practical question:

> On small or moderately sized datasets, should you train from scratch or use transfer learning?

In this comparison, both transfer-learning models trained more stably and generalized better to unseen images than the CNN trained from scratch, an advantage that matters most when data and compute are limited.

---

## Architectures Compared

### Custom CNN (From Scratch)

A traditional convolutional network built manually with Conv → ReLU → Pooling blocks and fully connected layers.

Philosophy: Full architectural control, no pre-training.

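Minimal structure (the `64 * 56 * 56` flatten size assumes 224x224 RGB inputs):

```python
import torch
import torch.nn as nn


class CustomCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Two Conv -> ReLU -> MaxPool blocks; each MaxPool2d(2) halves height and width
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # 224 -> 112 -> 56, so the flattened feature map has 64 * 56 * 56 values
        self.classifier = nn.Sequential(
            nn.Linear(64 * 56 * 56, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        return self.classifier(x)
```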
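
The `64 * 56 * 56` input size of the first linear layer follows from the standard output-size formula for convolution and pooling, `out = floor((in + 2 * padding - kernel) / stride) + 1` (dilation 1). A small sketch that traces an input through the two blocks above; the helper names `conv_out` and `flatten_size` are illustrative, and the 224x224 input is an assumption implied by the layer sizes:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a Conv2d or MaxPool2d layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def flatten_size(input_size: int = 224, channels: int = 64) -> int:
    """Number of values produced by CustomCNN.features for a square input."""
    size = input_size
    # Block 1: Conv2d(3, 32, 3, padding=1) keeps the size, MaxPool2d(2) halves it
    size = conv_out(size, kernel=3, padding=1)
    size = conv_out(size, kernel=2, stride=2)
    # Block 2: Conv2d(32, 64, 3, padding=1), then MaxPool2d(2)
    size = conv_out(size, kernel=3, padding=1)
    size = conv_out(size, kernel=2, stride=2)
    return channels * size * size


print(flatten_size(224))  # 200704 == 64 * 56 * 56
```

A different input resolution changes the result: `flatten_size(299)` gives `64 * 74 * 74`, so the first `nn.Linear` would need a matching size.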

Reality on small datasets:

* Slower convergence
* Higher variance
* Larger generalization gap

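---

### Vision Transformer (DeiT-Tiny)

Using Hugging Face's pre-trained Vision Transformer:

```python
from transformers import AutoModelForImageClassification

# Load ImageNet-pretrained DeiT-Tiny and swap its 1000-class head for a new one
# sized to this dataset (num_classes = number of vehicle classes)
model = AutoModelForImageClassification.from_pretrained(
    "facebook/deit-tiny-patch16-224",
    num_labels=num_classes,
    ignore_mismatched_sizes=True,  # allow the new head's shape to differ from the checkpoint
)
```

Trained with the Hugging Face `Trainer` API.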
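
Out of the box, `Trainer` evaluation reports the loss but not accuracy; accuracy comes from a `compute_metrics` callback. A minimal sketch of such a callback, assuming it receives `(logits, labels)` as NumPy arrays (which is how `Trainer` unpacks its `EvalPrediction`). The function and the commented wiring are illustrative, not taken from this project's training script:

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from the (logits, labels) pair that Trainer passes at evaluation time."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # highest-scoring class per image
    return {"accuracy": float((predictions == labels).mean())}


# Hypothetical wiring; train_ds / eval_ds are assumed preprocessed datasets and the
# argument values are placeholders:
# from transformers import Trainer, TrainingArguments
# args = TrainingArguments(output_dir="deit-vehicles", num_train_epochs=5,
#                          remove_unused_columns=False)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds,
#                   eval_dataset=eval_ds, compute_metrics=compute_metrics)
# trainer.train()
```

`remove_unused_columns=False` is commonly needed for image datasets so that `Trainer` does not drop the pixel column before the data collator sees it.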

Advantages:

* Stable convergence
* Lightweight
* Easy deployment
* Good performance-to-efficiency ratio

---

### Xception (Two-Phase Transfer Learning)

Implemented using `timm`.

#### Phase 1 - Train Classifier Head Only

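```python
import timm
import torch.nn as nn

model = timm.create_model("xception", pretrained=True)

# Freeze every pre-trained weight so that only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the ImageNet classifier; newly created layers require gradients by default
in_features = model.fc.in_features  # 2048 for Xception
model.fc = nn.Sequential(
    nn.Linear(in_features, 512),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(512, num_classes),  # num_classes: number of vehicle classes
)
```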
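
A quick way to confirm that Phase 1 trains only the new head is to count trainable parameters after freezing. The sketch below uses a tiny hypothetical stand-in (a `backbone` layer plus a head named `fc`) so it runs without downloading Xception; `count_parameters` is an illustrative helper, and on the real model you would call it on `model` instead:

```python
from typing import Tuple

import torch.nn as nn


def count_parameters(model: nn.Module) -> Tuple[int, int]:
    """Return (trainable, total) parameter counts for a model."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable, total


# Hypothetical stand-in: a small "backbone" followed by a head named fc
toy = nn.Sequential()
toy.add_module("backbone", nn.Linear(8, 4))  # 8*4 + 4 = 36 parameters
toy.add_module("fc", nn.Linear(4, 2))        # 4*2 + 2 = 10 parameters

# Phase 1: freeze everything, then attach a fresh head (trainable by default)
for param in toy.parameters():
    param.requires_grad = False
toy.fc = nn.Linear(4, 2)

print(count_parameters(toy))  # (10, 46)
```

If the trainable count equals the total, the freeze did not take effect; if it is zero, the new head was created before the freezing loop ran.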

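#### Phase 2 - Fine-Tune Selected Layers

```python
# Unfreeze the selected late block ("block14") and the classifier head; the rest stays frozen.
# Parameter names differ between timm model variants, so list model.named_parameters()
# to confirm that "block14" actually matches layers in the model you load.
for name, param in model.named_parameters():
    if "block14" in name or "fc" in name:
        param.requires_grad = True
```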

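A lower learning rate is used during fine-tuning, so the unfrozen pre-trained weights are adjusted gradually instead of being overwritten by large early updates.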
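
Since Phase 2 changes which parameters require gradients, the optimizer is usually rebuilt over the currently trainable parameters at the lower learning rate. A sketch on a hypothetical stand-in model that mimics the naming pattern used by the Phase 2 filter; Adam and the `1e-5` rate are placeholder assumptions, not values stated in this README:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in with the same naming pattern as the Phase 2 filter
model = nn.Sequential()
model.add_module("block13", nn.Linear(8, 8))  # stays frozen
model.add_module("block14", nn.Linear(8, 4))  # unfrozen in Phase 2
model.add_module("fc", nn.Linear(4, 2))       # trainable head

# Phase 1 state: everything frozen except the head
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

# Phase 2: also unfreeze the selected block
for name, param in model.named_parameters():
    if "block14" in name or "fc" in name:
        param.requires_grad = True

# Rebuild the optimizer over trainable weights only, at a lower learning rate
FINE_TUNE_LR = 1e-5  # placeholder value, not taken from the project
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=FINE_TUNE_LR)

print(sum(p.numel() for p in trainable))  # 46
```

Reusing a Phase 1 optimizer that was built from the head's parameters alone would silently leave the newly unfrozen block untrained, which is why the rebuild matters.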

Result:

* Smoothest training curves
* Lowest validation loss
* Highest test accuracy
* Strongest performance on unseen internet images

---

## Comparative Results

| Model | Validation Performance | Generalization | Stability |
| ---------- | ---------------------- | -------------- | ----------- |
| Custom CNN | High variance | Weak | Unstable |
| DeiT-Tiny | Strong | Good | Stable |
| Xception | Best | Excellent | Very Stable |

### Key Insight

> High validation accuracy does NOT guarantee real-world reliability.

The Custom CNN reached strong validation scores (~87%) but degraded more under distribution shift, such as unseen images collected from the internet.

Xception consistently generalized better.

---

## Experimental Visualizations

### Dataset Distribution Across All Three Models

![Chart](https://files.catbox.moe/eyuftl.png)

---

### Xception Model: Accuracy & Loss

![Accuracy & Loss](https://files.catbox.moe/qv7n6e.png)

### Custom CNN Model: Accuracy & Loss

![Accuracy & Loss](https://files.catbox.moe/ch8s5d.png)

---

### Confusion Matrices: Custom CNN vs. Xception

| Custom CNN | Xception |
|------------|----------|
| <img src="https://files.catbox.moe/aulaxo.webp" width="100%"> | <img src="https://files.catbox.moe/gy6yno.webp" width="100%"> |
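
A confusion matrix counts, for each true class (row), how often the model predicted each class (column). Row-normalizing it turns the diagonal into per-class recall, which makes two models easier to compare side by side. A minimal pure-Python sketch; the function name and the toy labels are illustrative only:

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int,
                     normalize: bool = False) -> List[List[float]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0.0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    if normalize:
        # Divide each row by its total so the diagonal becomes per-class recall
        for row in cm:
            total = sum(row)
            if total:
                row[:] = [v / total for v in row]
    return cm


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(confusion_matrix(y_true, y_pred, num_classes=3))
# [[1.0, 1.0, 0.0], [0.0, 2.0, 0.0], [1.0, 0.0, 1.0]]
```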

---

## Example Test Results (Custom CNN)

```
Test Accuracy: 87.89%

Macro Avg:
Precision: 0.8852
Recall: 0.8794
F1-Score: 0.8789
```

Despite these solid metrics, the Custom CNN's performance dropped more noticeably than Xception's on unseen real-world images.
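
"Macro Avg" means each metric is computed per class and then averaged with equal weight per class, regardless of how many test images each class has. A pure-Python sketch of that averaging; the function name and toy labels are illustrative, and libraries such as scikit-learn's `classification_report` print the same kind of macro-average row:

```python
from typing import Dict, Sequence


def macro_scores(y_true: Sequence[int], y_pred: Sequence[int], num_classes: int) -> Dict[str, float]:
    """Unweighted mean of per-class precision, recall, and F1 (0/0 is treated as 0)."""
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = float(num_classes)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(macro_scores(y_true, y_pred, num_classes=3))
```

Because every class counts equally, a single weak class pulls the macro average down even when overall accuracy looks high.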

---

## Deployment

### Run Locally

```bash
pip install -r requirements.txt
python app.py
```

Access at:

```
http://localhost:7860
```
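
The Space's front matter points Gradio at `app.py`, whose contents this README does not show. As a hypothetical sketch only: Gradio image classifiers commonly return a `{label: probability}` dictionary, which the `gr.Label` output component renders as ranked classes. `CLASS_NAMES` and `predict_logits` below are placeholders, not this project's code:

```python
import math
from typing import Dict, List

CLASS_NAMES = ["class_a", "class_b", "class_c"]  # placeholder labels, not the dataset's


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label_dict(logits: List[float]) -> Dict[str, float]:
    """Map raw model scores to the {label: probability} format gr.Label accepts."""
    return dict(zip(CLASS_NAMES, softmax(logits)))


# Hypothetical wiring in app.py (predict_logits is a placeholder for model inference):
# import gradio as gr
# demo = gr.Interface(fn=lambda img: to_label_dict(predict_logits(img)),
#                     inputs=gr.Image(type="pil"),
#                     outputs=gr.Label(num_top_classes=3))
# demo.launch(server_name="0.0.0.0", server_port=7860)
```

`server_name="0.0.0.0"` makes the app reachable from outside a Docker container, and port 7860 is Gradio's default, matching the local URL above.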

---

## When to Use Each Approach

### Use Custom CNN if:

* Your domain is highly specialized
* Pre-trained features don’t apply to your data
* You need full architectural control

### Use Transfer Learning (e.g., DeiT or Xception) if:

* You want fast experimentation
* Efficiency matters
* You prefer high-level APIs
* You want the best accuracy
* You care about generalization
* You need production-grade reliability

---

## Final Conclusion

On small or moderately sized datasets:

> Transfer learning isn’t an optimization - it’s a necessity.

Training from scratch forces the model to learn both general visual features and task-specific knowledge simultaneously.

Pre-trained models already encode general features such as edges, textures, and spatial structure, so your dataset mainly needs to teach the classification boundaries.

For most real-world tasks:

* Start with transfer learning
* Fine-tune carefully
* Only train from scratch if absolutely necessary

---

## Results

<p align="center">
<a href="https://files.catbox.moe/ss5ohr.mp4">
<img src="https://files.catbox.moe/3x5mp7.webp" width="400">
</a>
</p>