Overview
This project implements a satellite-imagery classification pipeline built on the DenseNet169 convolutional neural network. The model classifies tiles of Sentinel-2 imagery into multiple land-cover classes relevant to deforestation monitoring and land-use assessment across India.
The pipeline covers data collection through Google Earth Engine (GEE), preprocessing, model training with TensorFlow, evaluation, and saving of the final artifacts.
Model Architecture
Backbone: DenseNet169 pre-trained on ImageNet, used as a feature extractor.
Custom head: GlobalAveragePooling2D, batch normalization, dropout, and two fully connected (Dense) layers with ReLU activations and L2 regularization, followed by a softmax output layer for multi-class classification.
Training uses a two-stage approach:
Stage 1: freeze the DenseNet base and train only the custom classification head.
Stage 2: fine-tune the deeper layers of the DenseNet base by unfreezing them from a configurable layer index onward.
Loss function: categorical cross-entropy.
Optimizer: Adam with learning-rate scheduling.
Metrics tracked: accuracy and ROC AUC.
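The following is a minimal Keras sketch of this architecture and the two-stage schedule. It is not the project's VanRakshakClassifier code. Several details are placeholders because the README does not specify them: the function names (`build_model`, `compile_model`, `unfreeze_from`), the head's layer order, the Dense widths (512 and 256), the dropout rate, and the L2 factor. Input normalization is also left out. DenseNet's ImageNet weights expect inputs prepared the way `tf.keras.applications.densenet.preprocess_input` prepares them.

```python
import tensorflow as tf

layers = tf.keras.layers
regularizers = tf.keras.regularizers


def build_model(num_classes, weights="imagenet", input_shape=(224, 224, 3),
                units=(512, 256), dropout=0.5, l2=1e-4):
    """DenseNet169 feature extractor plus a custom softmax head (hyperparameters are placeholders)."""
    base = tf.keras.applications.DenseNet169(
        include_top=False, weights=weights, input_shape=input_shape)
    base.trainable = False  # stage 1: only the head is trained

    inputs = tf.keras.Input(shape=input_shape)
    # training=False keeps the backbone's BatchNormalization layers in inference
    # mode, including later when part of the base is unfrozen for stage 2.
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.BatchNormalization()(x)
    for width in units:  # two L2-regularized ReLU layers, each preceded by dropout
        x = layers.Dropout(dropout)(x)
        x = layers.Dense(width, activation="relu",
                         kernel_regularizer=regularizers.l2(l2))(x)
    x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base


def compile_model(model, learning_rate):
    """Adam with categorical cross-entropy, tracking accuracy and ROC AUC."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(curve="ROC", name="auc")],
    )


def unfreeze_from(base, fine_tune_at):
    """Stage 2: make the base trainable, then re-freeze every layer before fine_tune_at."""
    base.trainable = True
    for layer in base.layers[:fine_tune_at]:
        layer.trainable = False
```

A typical sequence runs in two passes. First, call `compile_model(model, 1e-3)` and fit the head. Then call `unfreeze_from(base, fine_tune_at)`, recompile with a smaller rate such as `1e-5`, and fit again. The recompile is required because Keras only picks up changes to `trainable` at compile time. The learning rates shown are illustrative.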
Data and Training
Input size: 224 × 224 RGB tiles.
Number of classes: configurable per dataset (the default is six classes, including heavily_deforested_area, healthy_forest, and farmland).
Data loading: TensorFlow datasets are built from CSV files that list image paths and labels; augmentation (random flips plus brightness and contrast changes) is applied during training.
Batch size and number of epochs are configurable through command-line arguments.
Early stopping, learning-rate reduction on plateau, and model checkpointing callbacks are used to make training more robust.
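Below is a sketch of the CSV-to-`tf.data` path described above. It rests on several assumptions: the CSV has columns named `path` and `label`, and labels are class names that get one-hot encoded to match categorical cross-entropy. It also assumes pixels are scaled to [0, 1], and the augmentation magnitudes are guesses. None of the helper names come from the project.

```python
import csv

import tensorflow as tf

IMG_SIZE = 224  # tile size stated in this README


def read_csv(csv_path, class_names):
    """Return image paths and integer labels; assumes 'path' and 'label' columns."""
    index = {name: i for i, name in enumerate(class_names)}
    paths, labels = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            paths.append(row["path"])
            labels.append(index[row["label"]])
    return paths, labels


def load_image(path, label, num_classes):
    """Decode a tile, resize it, scale to [0, 1], and one-hot encode the label."""
    raw = tf.io.read_file(path)
    image = tf.io.decode_image(raw, channels=3, expand_animations=False)
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE)) / 255.0
    return image, tf.one_hot(label, num_classes)


def augment(image, label):
    """Random flips plus small brightness/contrast jitter (magnitudes are guesses)."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
    return tf.clip_by_value(image, 0.0, 1.0), label


def make_dataset(csv_path, class_names, batch_size, training):
    paths, labels = read_csv(csv_path, class_names)
    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    if training:
        ds = ds.shuffle(len(paths), reshuffle_each_iteration=True)
    num_classes = len(class_names)
    ds = ds.map(lambda p, y: load_image(p, y, num_classes),
                num_parallel_calls=tf.data.AUTOTUNE)
    if training:  # augmentation only for the training split
        ds = ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

The model might apply `tf.keras.applications.densenet.preprocess_input` itself, since that function expects raw 0 to 255 pixels. If it does, drop the division by 255 here and adjust the clipping in `augment` to match.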
Functionality Summary
Data-collection workers use the Google Earth Engine API to download image tiles, filtering them by cloud coverage and quality metrics.
Tiles are split geographically into training, validation, and test sets, so that imagery from one region does not end up in more than one split; this reduces spatial data leakage.
The VanRakshakClassifier class encapsulates model building, training (including fine-tuning), evaluation (classification report and confusion matrix), and saving the model together with its metadata.
Plotting helpers generate figures of the training history and the confusion matrix.
The command-line interface accepts parameters for the service-account credentials, sample count, batch size, debug mode, dry run (local testing without GEE calls), timeout, and number of worker threads.
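One way to implement a leakage-resistant geographic split is to bucket sample locations into a coarse latitude/longitude grid. Whole cells, rather than individual tiles, are then assigned to a split. The sketch below does this with a stable SHA-256 hash, so the assignment is reproducible across runs. The 1-degree cell size, the roughly 70/15/15 proportions, and all function names are assumptions, not the pipeline's documented behavior.

```python
import hashlib
import math


def cell_of(lat, lon, cell_deg=1.0):
    """Grid cell (row, col) containing a point; cells are cell_deg x cell_deg degrees."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def split_for_cell(cell, val_frac=0.15, test_frac=0.15, seed="vanrakshak"):
    """Deterministically map a grid cell to 'train', 'val', or 'test'."""
    digest = hashlib.sha256(f"{seed}:{cell[0]}:{cell[1]}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # pseudo-uniform value in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def geographic_split(points, cell_deg=1.0, **kwargs):
    """Group (lat, lon) sample points so that every point in a cell shares one split."""
    splits = {"train": [], "val": [], "test": []}
    for lat, lon in points:
        splits[split_for_cell(cell_of(lat, lon, cell_deg), **kwargs)].append((lat, lon))
    return splits
```

Cells should be much larger than a tile's footprint. Points on opposite sides of a cell boundary can still be neighbors, so leaving a buffer between cells assigned to different splits would tighten the separation further.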
Outputs and Artifacts
Trained model saved as a .keras file (architecture and weights).
Metadata JSON containing the model parameters and class names.
Training-history JSON with epoch-wise metrics.
Evaluation-results JSON with test metrics, the per-class classification report, and the confusion matrix.
Training curves and the confusion matrix saved as PNG images.
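A standard-library sketch of writing the three JSON artifacts follows; the file names and dictionary contents are hypothetical. The `default` hook matters in practice. scikit-learn's `confusion_matrix` returns a NumPy array, and `json.dump` cannot serialize arrays or NumPy integer and float32 scalars unless they are first converted with `tolist()`.

```python
import json
from pathlib import Path


def _to_jsonable(obj):
    """Convert NumPy arrays/scalars (e.g. a confusion matrix) into plain Python types."""
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def save_json_artifacts(out_dir, metadata, history, results):
    """Write metadata, training history, and evaluation results; return the file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = {
        "metadata.json": metadata,                # model parameters, class names
        "training_history.json": history,         # e.g. Keras History.history
        "evaluation_results.json": results,       # test metrics, report, confusion matrix
    }
    for name, payload in files.items():
        with open(out / name, "w", encoding="utf-8") as f:
            json.dump(payload, f, indent=2, default=_to_jsonable)
    return sorted(files)
```

The model file itself would come from Keras, for example `model.save("model.keras")`, since the `.keras` format stores the architecture and weights together.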
Usage
Run the pipeline with optional parameters to customize data collection and training. For example:
```bash
python vanrakshak_fixed_pipeline.py --sa-key path/to/key.json --samples 2000 --batch-size 16 --debug
```
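The flags in the example command map naturally onto `argparse`. In the sketch below, `--sa-key`, `--samples`, `--batch-size`, and `--debug` come from that command. The names `--epochs`, `--dry-run`, `--timeout`, and `--workers` are hypothetical stand-ins for the other options this README mentions, and every default value is a placeholder.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="VanRakshak pipeline (sketch)")
    # Flags that appear in the README's example command (defaults are placeholders).
    parser.add_argument("--sa-key", help="path to the GEE service-account JSON key")
    parser.add_argument("--samples", type=int, default=1000, help="number of tiles to collect")
    parser.add_argument("--batch-size", type=int, default=32)
    parser.add_argument("--debug", action="store_true")
    # Hypothetical flag names for the other options the README lists.
    parser.add_argument("--epochs", type=int, default=30)
    parser.add_argument("--dry-run", action="store_true", help="test locally without GEE calls")
    parser.add_argument("--timeout", type=float, default=60.0, help="per-request timeout in seconds")
    parser.add_argument("--workers", type=int, default=4, help="download worker threads")
    return parser
```

`argparse` exposes `--batch-size` as `args.batch_size` and `--dry-run` as `args.dry_run`, so the rest of the code can read the options as ordinary attributes.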
The pipeline performs the following steps in order:
1. Initialize Google Earth Engine.
2. Create the geographic train/validation/test splits of sample regions.
3. Download Sentinel-2 tiles and filter them by cloud cover and quality.
4. Prepare TensorFlow datasets with augmentation.
5. Create, train, and fine-tune the DenseNet169 classifier.
6. Evaluate on the held-out test data and save the model and results.
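The tile download in step 3 can be sketched with the earthengine-api client. Several choices here are assumptions: the collection ID (`COPERNICUS/S2_SR_HARMONIZED`), the 20% `CLOUDY_PIXEL_PERCENTAGE` threshold, the median composite, the true-colour bands, and the 0 to 3000 display stretch. The pipeline's additional quality metrics are not reproduced. `ee` is imported inside the function so that a dry run never needs it; the helper names are hypothetical.

```python
S2_COLLECTION = "COPERNICUS/S2_SR_HARMONIZED"  # assumed Sentinel-2 surface-reflectance collection


def thumb_params(region, size=224, vis_max=3000):
    """getThumbURL parameters for a size x size true-colour PNG (bands B4, B3, B2)."""
    return {
        "region": region,
        "dimensions": f"{size}x{size}",
        "format": "png",
        "bands": ["B4", "B3", "B2"],
        "min": 0,
        "max": vis_max,
    }


def cloud_filtered_tile_url(region_geojson, start, end, max_cloud_pct=20, size=224):
    """Median composite of low-cloud Sentinel-2 scenes over a region, as a PNG thumbnail URL."""
    import ee  # lazy import: dry runs never need earthengine-api

    geom = ee.Geometry(region_geojson)
    image = (
        ee.ImageCollection(S2_COLLECTION)
        .filterBounds(geom)
        .filterDate(start, end)
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", max_cloud_pct))
        .median()
    )
    return image.getThumbURL(thumb_params(geom, size))
```

GEE has to be initialized first (step 1). For a service account, this is typically done by passing `ee.ServiceAccountCredentials(email, key_file)` to `ee.Initialize`. The returned URL can then be fetched with `requests.get(url, timeout=...)` and opened with Pillow. If a region and date range contain no scenes under the cloud threshold, the composite is empty, so callers should be ready to catch `ee.EEException`.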
Requirements
Python 3
TensorFlow (including Keras)
scikit-learn
matplotlib
numpy, requests, Pillow
google-auth, earthengine-api
References
Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. "Densely Connected Convolutional Networks." CVPR 2017.
Sentinel-2 satellite data accessed via Google Earth Engine.
This README gives an overview for anyone who wants to use, modify, or extend the VanRakshak DenseNet169 classification pipeline for land-cover classification from satellite data.