Model Card for Aerial Image Classification (CNN & Classic ML)
Model Details
Model Description
This repository contains two types of models for classifying aerial images from the AID dataset:
- Convolutional Neural Network (CNN): A lightweight ResNet-based model.
- Classic Machine Learning: A Bag of Visual Words (BoVW, also known as Bag of Features) pipeline using SIFT descriptors and Softmax Regression.
These models were developed as part of a machine learning assignment to evaluate deep learning approaches against classical computer vision methods.
- Model types:
  - CNN (PyTorch)
  - BoVW + Softmax Regression (Scikit-learn/Joblib)
- Language(s): English
- Resources:
  - CNN: ~1M parameters, ~16 MB.
  - Classic ML: ~100 MB (includes vocabulary).
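To make the classical pipeline more concrete, here is a minimal sketch of the bag-of-visual-words encoding step. It assumes a vocabulary of cluster centers has already been learned (for example with k-means) and that local descriptors have already been extracted. The function names `nearest_word` and `bovw_histogram` and the toy 2-D descriptors are illustrative only; real SIFT descriptors are 128-dimensional, and the repository's own implementation may differ.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary center closest to the descriptor."""
    return min(
        range(len(vocabulary)),
        key=lambda i: math.dist(descriptor, vocabulary[i]),
    )


def bovw_histogram(descriptors, vocabulary):
    """Encode one image as an L1-normalized histogram of visual words."""
    total = len(descriptors)
    if total == 0:
        return [0.0] * len(vocabulary)
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts[i] / total for i in range(len(vocabulary))]


# Toy example: 3 visual words and 4 local descriptors (2-D for readability).
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = bovw_histogram(descriptors, vocabulary)
print(histogram)  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram is what a classifier such as Softmax Regression would take as its input feature vector.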
Model Architecture
The CNN consists of an initial convolution layer followed by three residual blocks.
- Residual Blocks: Skip connections add each block's input to its output, which lets the network go deeper without the accuracy degradation seen in plain stacked layers.
- Layers: Convolution, Batch Normalization, Max-Pooling.
- Input: $600 \times 600$ pixel images (resized as needed by the pipeline).
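As a rough illustration of the building blocks listed above, the following PyTorch sketch shows one residual block made of convolution, batch normalization, and max-pooling. The class name `ResidualBlock`, the channel widths, the kernel sizes, and the position of the pooling layer are all assumptions chosen for the example; they are not taken from the actual model definition.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two 3x3 conv + batch-norm layers with a skip connection, then pooling."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # A 1x1 convolution matches channel counts when they differ.
        if in_channels == out_channels:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection adds the (possibly projected) input to the output.
        out = torch.relu(self.body(x) + self.shortcut(x))
        return self.pool(out)  # halves the spatial resolution


block = ResidualBlock(16, 32).eval()
with torch.no_grad():
    y = block(torch.randn(1, 16, 64, 64))
print(y.shape)  # torch.Size([1, 32, 32, 32])
```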
Uses
Direct Use
The model is intended for classifying high-resolution aerial scenes into one of 30 categories. It is suitable for:
- Autonomous UAV navigation and mapping.
- Environmental monitoring.
- Land use classification.
Downstream Use
This model can be fine-tuned on other aerial or satellite imagery datasets.
Training Data
The model was trained on the Aerial Image Dataset (AID).
- Source: Google Earth imagery.
- Size: 10,000 images.
- Classes: 30 (e.g., Airport, Beach, Forest, Industrial).
- Split: 90% Training / 10% Test.
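A 90/10 split like the one above can be sketched in plain Python as follows. The card does not say whether the split was stratified by class or which random seed was used, so the per-class splitting, the `stratified_split` helper, and the toy labels here are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Toy labels: 3 classes with unequal sizes.
labels = ["Airport"] * 40 + ["Beach"] * 30 + ["Forest"] * 30
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 90 10
```

Applied to the 10,000 AID images, a 10% test fraction gives roughly 1,000 test images.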
Performance
The CNN outperformed every classical machine learning method (SVM, Random Forest, etc.) evaluated on the same dataset, beating the strongest of them, the RBF-kernel SVM at 71.20%, by 21.6 percentage points.
| Metric | Value |
|---|---|
| Test Accuracy | 92.80% |
| Macro Average (F1) | 0.93 |
| Weighted Average | 0.93 |
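For readers unfamiliar with the two averages in the table, the sketch below shows how they differ, assuming they are computed over per-class scores such as F1. The per-class scores and supports are hypothetical values chosen to make the arithmetic easy to follow; they are not results from this model.

```python
def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of test samples (support) per class."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


# Hypothetical per-class F1 scores and supports, NOT taken from this model.
f1_scores = [0.90, 0.80, 1.00]
supports = [50, 30, 20]

print(round(macro_average(f1_scores), 4))               # 0.9
print(round(weighted_average(f1_scores, supports), 4))  # 0.89
```

When the two averages are nearly equal, as in the table above, performance is not dominated by a few large classes.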
Comparison with Classical Methods
| Model | Test Accuracy |
|---|---|
| CNN (This Model) | 0.9280 |
| SVM (RBF Kernel) | 0.7120 |
| Softmax Regression | 0.6580 |
| Random Forest | 0.5680 |
| Naïve Bayes | 0.5280 |
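The gaps in the comparison table can be restated in absolute terms with a few lines of arithmetic. The accuracies below are copied from the table; the test-set size of 1,000 is an assumption derived from a 10% split of 10,000 images.

```python
# Test accuracies copied from the comparison table.
accuracies = {
    "CNN": 0.9280,
    "SVM (RBF Kernel)": 0.7120,
    "Softmax Regression": 0.6580,
    "Random Forest": 0.5680,
    "Naive Bayes": 0.5280,
}

TEST_SIZE = 1000  # assumption: 10% of the 10,000 AID images


def correct_count(accuracy, test_size=TEST_SIZE):
    """Number of correctly classified test images implied by an accuracy."""
    return round(accuracy * test_size)


def gap_to_cnn(accuracy):
    """Gap to the CNN in percentage points."""
    return (accuracies["CNN"] - accuracy) * 100


for name, acc in accuracies.items():
    print(f"{name:<20} {correct_count(acc):>4} correct  {gap_to_cnn(acc):5.1f} pts behind CNN")
```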
Limitations
- Data Bias: The model is trained only on Google Earth imagery (AID), so it may not generalize well to aerial images captured with significantly different sensors, resolutions, or lighting conditions.
- Scope: Limited to the 30 classes defined in the AID dataset.
How to Get Started
The provided demo.ipynb notebook contains a complete example. The snippets below show how to load each model.
1. Load Classic ML Model
import joblib
from huggingface_hub import hf_hub_download
# Download model
model_path = hf_hub_download(
    repo_id="JavideuS/aid-image-classification",
    filename="classicML/models/bovw_softmax.pkl"
)
# Load pipeline
bundle = joblib.load(model_path)
pipeline = bundle['pipeline']
label_encoder = bundle['label_encoder']
# Predict
# pipeline.predict(["path/to/image.jpg"])
2. Load CNN Model
import torch
from huggingface_hub import hf_hub_download
from NeuralNets.model import PiattiCNN  # Ensure you have the model definition
# Download checkpoint
checkpoints_path = hf_hub_download(
    repo_id="JavideuS/aid-image-classification",
    filename="neuralNet/models/PiattiVL_v0.69.pth"
)
# Load model
checkpoints = torch.load(checkpoints_path, map_location='cpu')
model = PiattiCNN(num_classes=checkpoints['num_classes'])
model.load_state_dict(checkpoints['model_state_dict'])
model.eval()
# Inference
# ...
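The inference step is left open above. As a hedged sketch of what it might look like, the example below runs a forward pass and reads off the top-3 classes. A small stand-in network is used in place of the loaded `PiattiCNN` so the example runs on its own, and a random tensor stands in for a preprocessed image; the actual preprocessing (resizing, normalization) and the mapping from class index to class name are not specified in this card and are not shown.

```python
import torch
from torch import nn

# Stand-in for the loaded model: any module mapping (N, 3, H, W) -> (N, 30).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 30),
).eval()

# Placeholder for one preprocessed 600x600 RGB image.
batch = torch.rand(1, 3, 600, 600)

with torch.no_grad():  # no gradients are needed at inference time
    logits = model(batch)
    probs = torch.softmax(logits, dim=1)
    top_probs, top_classes = probs.topk(k=3, dim=1)

for p, c in zip(top_probs[0].tolist(), top_classes[0].tolist()):
    print(f"class index {c}: {p:.3f}")
```

With the real checkpoint, `model` would be the `PiattiCNN` instance loaded above and `batch` would come from the same preprocessing used during training.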
Citation
If you use this model or the AID dataset, please cite the original dataset paper:
@article{aid_dataset,
  title={AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification},
  author={Xia, Gui-Song and others},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  volume={55},
  number={7},
  pages={3965--3981},
  year={2017}
}
Evaluation results
- Test Accuracy on AID (Aerial Image Dataset), self-reported: 0.928
- Macro F1 on AID (Aerial Image Dataset), self-reported: 0.930