	Overview
	This project implements a satellite imagery classification pipeline built on DenseNet169, a deep convolutional neural network. The model classifies Sentinel-2 image tiles into land cover classes relevant to deforestation monitoring and land use assessment across India.

	The pipeline covers data collection via Google Earth Engine (GEE), data preprocessing, model training with TensorFlow, evaluation, and saving final artifacts.

	Model Architecture
	Backbone: DenseNet169 pre-trained on ImageNet, used as a feature extractor.

	Custom head: GlobalAveragePooling2D, batch normalization, dropout, and two dense layers with ReLU activations and L2 regularization, followed by a softmax output layer for multi-class classification.

	Training uses a two-stage approach:

	1. Freeze the DenseNet base and train only the custom classification head.

	2. Fine-tune the deeper layers of the DenseNet base by unfreezing every layer from a configurable layer index onward.

	Loss function: Categorical cross-entropy.

	Optimizer: Adam with learning rate schedules.

	Metrics tracked: Accuracy and ROC AUC.
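
	The architecture and two-stage schedule described above can be sketched in Keras roughly as follows. This is a minimal sketch, not the project's actual code: the function names, the dense layer widths (512 and 256), the dropout rates, the L2 factor, the learning rates, and the default `fine_tune_at` index are all assumptions chosen for illustration. TensorFlow is imported inside the functions so that the layer-freezing helper can be exercised on its own.

```python
def set_trainable_from(layers, fine_tune_at):
    """Freeze layers before index `fine_tune_at` and unfreeze the rest."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= fine_tune_at


def build_classifier(num_classes=6, weights="imagenet"):
    """Return (model, base): a frozen DenseNet169 backbone plus a custom head."""
    import tensorflow as tf  # lazy import; requires TensorFlow 2.x

    layers = tf.keras.layers
    l2 = tf.keras.regularizers.l2(1e-4)  # assumed regularization strength

    base = tf.keras.applications.DenseNet169(
        include_top=False, weights=weights, input_shape=(224, 224, 3)
    )
    base.trainable = False  # stage 1: backbone acts as a fixed feature extractor

    inputs = tf.keras.Input(shape=(224, 224, 3))
    # training=False keeps the backbone's batch-norm layers in inference mode.
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)  # assumed rate
    x = layers.Dense(512, activation="relu", kernel_regularizer=l2)(x)
    x = layers.Dropout(0.3)(x)  # assumed rate
    x = layers.Dense(256, activation="relu", kernel_regularizer=l2)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs), base


def compile_model(model, learning_rate):
    """Compile with Adam, categorical cross-entropy, accuracy and ROC AUC."""
    import tensorflow as tf

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy", tf.keras.metrics.AUC(name="auc")],
    )


def train_two_stage(model, base, train_ds, val_ds, fine_tune_at=400, epochs=10):
    """Stage 1 trains the head only; stage 2 fine-tunes the deeper backbone layers."""
    compile_model(model, learning_rate=1e-3)  # assumed head learning rate
    model.fit(train_ds, validation_data=val_ds, epochs=epochs)

    # Unfreeze the backbone from `fine_tune_at` onward, then recompile with a
    # lower (assumed) learning rate so the trainable change takes effect.
    base.trainable = True
    set_trainable_from(base.layers, fine_tune_at)
    compile_model(model, learning_rate=1e-5)
    model.fit(train_ds, validation_data=val_ds, epochs=epochs)
```

	Recompiling after changing `trainable` matters in Keras: the set of trainable weights is fixed when the model is compiled, so the second `compile_model` call is what actually enables fine-tuning.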

	Data and Training
	Input image size: 224 x 224 RGB tiles.

	Number of classes: Configurable per dataset. The default is six classes, including heavily_deforested_area, healthy_forest, and farmland.

	Data loading: TensorFlow datasets are created from CSV files listing image paths and labels. Data augmentation (random flips and random brightness and contrast changes) is applied during training.

	Batch size and number of epochs are configurable via CLI arguments.

	Early stopping, learning rate reduction on plateau, and model checkpointing callbacks are implemented for robust training.
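
	The data loading, augmentation, and callback setup listed above might look like the sketch below. It is illustrative only: the CSV column names (`path`, `label`), the function names, the augmentation strengths, the callback patience values, and the checkpoint file name are assumptions, and pixel normalization for DenseNet is left out for brevity.

```python
import csv


def read_manifest(csv_path, class_names):
    """Read image paths and integer label indices from a CSV manifest.

    Assumes the CSV has `path` and `label` columns (hypothetical names).
    """
    index = {name: i for i, name in enumerate(class_names)}
    paths, labels = [], []
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            paths.append(row["path"])
            labels.append(index[row["label"]])
    return paths, labels


def make_dataset(paths, labels, num_classes, batch_size=16, training=False):
    """Build a batched tf.data pipeline; augmentation runs only when training."""
    import tensorflow as tf  # lazy import; requires TensorFlow 2.x

    def load(path, label):
        raw = tf.io.read_file(path)
        image = tf.io.decode_image(raw, channels=3, expand_animations=False)
        image = tf.image.resize(image, (224, 224))  # float32 in [0, 255]
        return image, tf.one_hot(label, num_classes)

    def augment(image, label):
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)
        image = tf.image.random_brightness(image, max_delta=25.0)
        image = tf.image.random_contrast(image, lower=0.9, upper=1.1)
        return tf.clip_by_value(image, 0.0, 255.0), label

    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    if training:
        ds = ds.shuffle(buffer_size=len(paths))
    ds = ds.map(load, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        ds = ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)


def make_callbacks(checkpoint_path="best_model.keras"):
    """Early stopping, LR reduction on plateau, and best-model checkpointing."""
    import tensorflow as tf

    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2
        ),
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_loss", save_best_only=True
        ),
    ]
```

	Shuffling before the `map` step keeps the shuffle buffer small, since it holds only path strings and labels rather than decoded images.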

	Functionality Summary
	Data collection workers use the Google Earth Engine API to download image tiles filtered by cloud coverage and quality metrics.

	Tiles are split into train, validation, and test sets by geographic region, which reduces data leakage between nearby tiles.

	The VanRakshakClassifier class encapsulates model building, training (with fine-tuning), evaluation (classification report, confusion matrix), and model saving with metadata.

	Plotting helpers generate figures for training history and confusion matrix visualization.

	The command-line interface accepts parameters for service account credentials, sample count, batch size, debug mode, a dry-run mode for local testing without GEE calls, timeout, and worker thread count.
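
	A command-line interface of this kind could be declared with `argparse` as sketched below. The flags `--sa-key`, `--samples`, `--batch-size`, and `--debug` appear in this README's usage example; the names `--dry-run`, `--timeout`, and `--workers`, and every default value, are assumptions made for illustration and may differ from the actual script.

```python
import argparse


def build_parser():
    """Build an argument parser mirroring the options described above."""
    parser = argparse.ArgumentParser(
        description="Collect Sentinel-2 tiles and train the classifier."
    )
    # Flags shown in the README's usage example (defaults are assumed).
    parser.add_argument("--sa-key", help="path to a service account key JSON file")
    parser.add_argument("--samples", type=int, default=1000,
                        help="number of tiles to sample")
    parser.add_argument("--batch-size", type=int, default=32,
                        help="training batch size")
    parser.add_argument("--debug", action="store_true",
                        help="enable verbose logging")
    # Hypothetical flag names for the remaining options.
    parser.add_argument("--dry-run", action="store_true",
                        help="run locally without Google Earth Engine calls")
    parser.add_argument("--timeout", type=float, default=60.0,
                        help="per-request timeout in seconds")
    parser.add_argument("--workers", type=int, default=4,
                        help="number of download worker threads")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```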

	Outputs and Artifacts
	Trained model saved as a .keras file including weights.

	Metadata JSON containing model parameters and class names.

	Training history JSON logging epoch-wise metrics.

	Evaluation results JSON with test metrics, classification report per class, and confusion matrix.

	Visualizations of training curves and confusion matrix saved as PNGs.
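
	To make the evaluation artifact concrete, the sketch below computes a confusion matrix and a per-class report in plain Python and writes them to JSON. It is a simplified stand-in, not the project's code: the pipeline lists scikit-learn as a dependency and presumably uses it for these metrics, and the file name `evaluation_results.json`, the JSON keys, and the function names are assumptions.

```python
import json
from pathlib import Path


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_index, pred_index in zip(y_true, y_pred):
        matrix[true_index][pred_index] += 1
    return matrix


def per_class_report(matrix, class_names):
    """Precision, recall, and support for each class."""
    report = {}
    for i, name in enumerate(class_names):
        true_positive = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        report[name] = {
            "precision": true_positive / predicted if predicted else 0.0,
            "recall": true_positive / actual if actual else 0.0,
            "support": actual,
        }
    return report


def save_evaluation(out_dir, class_names, y_true, y_pred):
    """Write test metrics to a JSON file and return its path."""
    matrix = confusion_matrix(y_true, y_pred, len(class_names))
    correct = sum(matrix[i][i] for i in range(len(class_names)))
    results = {
        "accuracy": correct / len(y_true),
        "classification_report": per_class_report(matrix, class_names),
        "confusion_matrix": matrix,
    }
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / "evaluation_results.json"  # hypothetical file name
    target.write_text(json.dumps(results, indent=2))
    return target
```

	Storing the confusion matrix as nested lists keeps the file readable by any JSON consumer without NumPy.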

	Usage
	Run the pipeline with optional parameters to customize data collection and training. For example:

```bash
python vanrakshak_fixed_pipeline.py --sa-key path/to/key.json --samples 2000 --batch-size 16 --debug
```

	The pipeline performs the following steps sequentially:

	1. Initialize Google Earth Engine.

	2. Create train/val/test geographic splits of sample regions.

	3. Download and filter Sentinel-2 tiles by cloud cover and quality.

	4. Prepare TensorFlow datasets with augmentation.

	5. Create, train, and fine-tune the DenseNet169 classifier.

	6. Evaluate on held-out test data and save the model and results.
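
	One way to realize the geographic split step is to bucket tiles into grid cells and assign each whole cell to a single split, so that neighbouring tiles never straddle train and test. The sketch below is an assumed approach, not necessarily how this pipeline defines its regions: the 1-degree cell size, the split fractions, and the function names are illustrative.

```python
import hashlib
import math


def region_id(lat, lon, cell_deg=1.0):
    """Map a coordinate to the (row, col) index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def assign_split(lat, lon, cell_deg=1.0, val_frac=0.15, test_frac=0.15):
    """Deterministically assign a tile to train/val/test by its grid cell."""
    row, col = region_id(lat, lon, cell_deg)
    # Hash the cell index so the assignment is stable across runs and machines.
    digest = hashlib.sha256(f"{row}:{col}".encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    if fraction < test_frac:
        return "test"
    if fraction < test_frac + val_frac:
        return "val"
    return "train"


if __name__ == "__main__":
    # Two tiles inside the same 1-degree cell always land in the same split.
    print(assign_split(21.1, 78.2), assign_split(21.9, 78.8))
```

	Because the assignment depends only on the cell index, adding more tiles later does not move existing tiles between splits.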

	Requirements
	Python 3

	TensorFlow (including Keras)

	scikit-learn

	matplotlib

	numpy, requests, Pillow

	google-auth, earthengine-api

	References
	Huang, G., Liu, Z., van der Maaten, L., and Weinberger, K. Q. "Densely Connected Convolutional Networks," CVPR 2017.

	Sentinel-2 satellite data accessed via Google Earth Engine.

	This README gives an overview of how to use, modify, or extend the VanRakshak DenseNet169 pipeline for land cover classification from satellite imagery.