Update README.md
README.md
# Landmark Detection using VGG-19 (Transfer Learning)

This project implements an end-to-end **Landmark Image Classification system** using a pre-trained **VGG-19 Convolutional Neural Network** and deploys it as a web application using **Gradio** on **Hugging Face Spaces**.

The model is trained on a subset of the **Google Landmarks Dataset v2 (Top 51 Categories)** and predicts the landmark class of an input image.

---

## Features

* Automatic dataset loading from Hugging Face (`pemujo/GLDv2_Top_51_Categories`)
* Image preprocessing (resizing, normalization)
* Transfer Learning using **VGG-19 pretrained on ImageNet**
* Custom classification head for landmark recognition
* Training and evaluation on CPU
* Interactive **Gradio Web Interface** for real-time prediction

---

## Model Architecture

* Base Model: VGG-19 (without top classification layers)
* Frozen convolutional layers for feature extraction
* Custom layers:

  * Flatten
  * Dense (ReLU)
  * Dropout
  * Softmax output layer (51 classes)
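
The layer stack listed above could be assembled in Keras roughly as follows. This is a minimal sketch, not the project's actual training code: the function name `build_landmark_classifier`, the 256-unit dense layer, the 0.5 dropout rate, and the Adam optimizer with sparse categorical cross-entropy are all assumptions, since this README does not state them. TensorFlow is imported inside the function so the snippet can be imported without the dependency installed; calling the function with the default `weights="imagenet"` downloads the pretrained weights on first use.

```python
NUM_CLASSES = 51             # number of landmark categories stated in this README
INPUT_SHAPE = (224, 224, 3)  # 224x224 RGB input, as described under "How It Works"


def build_landmark_classifier(dense_units=256, dropout_rate=0.5, weights="imagenet"):
    """Frozen VGG-19 base followed by Flatten -> Dense(ReLU) -> Dropout -> Softmax.

    dense_units and dropout_rate are assumed values, not taken from the project.
    """
    import tensorflow as tf  # imported lazily: heavy dependency

    # VGG-19 without its original ImageNet classification layers.
    base = tf.keras.applications.VGG19(
        include_top=False, weights=weights, input_shape=INPUT_SHAPE
    )
    base.trainable = False  # freeze the convolutional layers for feature extraction

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(dense_units, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    # Sparse loss matches integer class labels (assumed training setup).
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```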

---

## Dataset

* Source: Google Landmarks v2 (Top 51 Categories)
* Platform: Hugging Face Datasets
* Classes: 51 landmark categories
* Format: Image + Numeric Label

Because the dataset encodes classes as integers rather than landmark names, predictions are displayed as:

```
Landmark_0, Landmark_1, ..., Landmark_50
```
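
Loading this dataset from the Hub might look like the sketch below. It assumes the `datasets` library is installed and that the repository is publicly readable. The helper name `load_landmarks` is hypothetical, and the split and column names are not given in this README, so inspect the returned object (for example with `print(load_landmarks())`) before relying on them.

```python
DATASET_ID = "pemujo/GLDv2_Top_51_Categories"  # repository ID given in this README


def load_landmarks(dataset_id: str = DATASET_ID):
    """Download the dataset from the Hugging Face Hub, or reuse the local cache."""
    from datasets import load_dataset  # imported lazily; needs network on first use

    # With no split argument, load_dataset returns a DatasetDict keyed by split name.
    return load_dataset(dataset_id)
```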

---

## Tech Stack

* Python
* TensorFlow / Keras
* Hugging Face Datasets
* Gradio
* NumPy
* Pillow

---

## How It Works

1. The dataset is downloaded automatically using the Hugging Face `datasets` library.
2. Images are resized to 224×224 and normalized.
3. VGG-19 extracts deep visual features.
4. The custom classifier predicts the landmark category.
5. Gradio provides a web interface for uploading and classifying images.
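
Step 2 above can be sketched with Pillow and NumPy. The README does not say which normalization is used, so this example assumes simple scaling of pixel values to the range [0, 1]; the project may instead use VGG-specific preprocessing such as `tf.keras.applications.vgg19.preprocess_input`. The function names `scale_pixels` and `preprocess` are illustrative, not taken from the project.

```python
import numpy as np

TARGET_SIZE = (224, 224)  # input size stated in this README


def scale_pixels(pixels: np.ndarray) -> np.ndarray:
    """Scale uint8 RGB values to float32 in [0, 1] (assumed normalization)."""
    return pixels.astype(np.float32) / 255.0


def preprocess(path: str) -> np.ndarray:
    """Load an image file and return a batch of shape (1, 224, 224, 3)."""
    from PIL import Image  # Pillow, imported lazily

    with Image.open(path) as img:
        # Force three channels, then resize to the network's input size.
        resized = img.convert("RGB").resize(TARGET_SIZE)
    pixels = np.asarray(resized)  # shape (224, 224, 3), dtype uint8
    # Add a leading batch dimension, as Keras models expect batched input.
    return scale_pixels(pixels)[np.newaxis, ...]
```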

---

## Installation (Handled Automatically in Space)

Dependencies are listed in `requirements.txt`:

```
tensorflow
datasets
pillow
matplotlib
gradio
```

---

## Web App Usage

1. Upload a landmark image (JPEG/PNG).
2. The model predicts the most probable landmark class.
3. Output is shown as `Landmark_X`.
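
The upload-and-classify flow above could be wired up in Gradio along these lines. This is a sketch under stated assumptions, not the Space's actual `app.py`: `predict_index` is a hypothetical callable that takes a PIL image and returns an integer class index, and `make_classify_fn` and `build_demo` are illustrative names. Calling `.launch()` on the object returned by `build_demo` starts the web interface.

```python
def make_classify_fn(predict_index):
    """Wrap a model call so it returns this README's `Landmark_X` label.

    predict_index is a hypothetical callable: PIL image -> integer class index.
    """
    def classify(image):
        return f"Landmark_{predict_index(image)}"

    return classify


def build_demo(predict_index):
    """Create the upload-and-classify interface (call .launch() on the result)."""
    import gradio as gr  # imported lazily; only needed when the UI is built

    return gr.Interface(
        fn=make_classify_fn(predict_index),
        inputs=gr.Image(type="pil"),             # uploaded image arrives as a PIL image
        outputs=gr.Textbox(label="Prediction"),  # shows the Landmark_X string
        title="Landmark Detection using VGG-19",
    )
```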

---

## Sample Output

```
Prediction: Landmark_43
```

Class indices are zero-based, so `Landmark_43` means the input image belongs to the 44th of the 51 landmark categories learned by the model.
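
The relationship between the label suffix and the category position can be checked with a small decoding helper. This is an illustrative sketch: `decode_prediction` is a hypothetical name, and it assumes the model's output is a sequence of 51 per-class probabilities, as produced by a softmax layer.

```python
def decode_prediction(probabilities):
    """Return (class_index, label, ordinal_position) for the most probable class."""
    # Argmax over a plain sequence of per-class probabilities.
    class_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    # Indices start at 0, so the ordinal position is index + 1.
    return class_index, f"Landmark_{class_index}", class_index + 1


# Example: 51 scores with the largest one at index 43.
scores = [0.01] * 51
scores[43] = 0.5
print(decode_prediction(scores))  # (43, 'Landmark_43', 44)
```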

---

## Academic Use

This project demonstrates:

* Transfer Learning
* Deep CNN Feature Extraction
* Image Classification Pipeline
* Model Deployment with Gradio
* Reproducible ML using Hugging Face Hub

---

## Author

**Apurba Das**
Landmark Detection Project
Deep Learning & Computer Vision
VGG-19 | TensorFlow | Hugging Face | Gradio

---

## Future Improvements

* Map numeric labels to actual landmark names
* Fine-tune deeper VGG-19 layers
* Add top-k prediction probabilities
* Deploy with GPU for faster training
* Expand to full Google Landmarks v2 dataset