---
license: apache-2.0
---
# LARS-MobileNet-V4
This repository contains the implementation of the lightweight convolutional neural network architecture described in the paper "Advancing Real-Time Crop Disease Detection on Edge Computing Devices using Lightweight Convolutional Neural Networks."
https://github.com/lars-uav/LARS-MobileNet-V4
## Overview
This project introduces LARS-MobileNetV4, an optimized version of MobileNetV4 specifically designed for real-time crop disease detection on resource-constrained edge devices such as Raspberry Pi. Our implementation achieves 97.84% accuracy on the Paddy Doctor dataset while maintaining fast inference times (88.91ms on Raspberry Pi 5), making it suitable for deployment in agricultural field settings.
## Key Features
- **Optimized MobileNetV4 Architecture**: Enhanced with Squeeze-and-Excitation (SE) blocks and Efficient Channel Attention (ECA) mechanisms
- **Resource-Efficient Design**: Significantly reduced model size (10.2MB) compared to ResNet34 (85.3MB)
- **Real-time Performance**: Average inference time of 39ms on CPU and 88.91ms on Raspberry Pi 5
- **High Accuracy**: 97.84% detection accuracy across 12 common rice diseases
- **Custom Loss Function**: Combination of Focal Loss and Label Smoothing for better handling of class imbalance
- **Comprehensive Data Augmentation**: Robust augmentation pipeline to improve model generalization
- **Deployment-Ready**: Optimized for TFLite deployment on edge devices
## Model Architecture
LARS-MobileNetV4 builds upon the recently introduced MobileNetV4 architecture with several key optimizations:
1. **Universal Inverted Bottleneck (UIB)**: Merges features of Inverted Bottlenecks, ConvNeXt, and Feed-Forward Networks to enhance flexibility in spatial and channel mixing
2. **Mobile Multi-Query Attention (MQA)**: An accelerator-optimized attention mechanism that reduces memory bandwidth bottlenecks
3. **Squeeze-and-Excitation Blocks**: Added to adaptively recalibrate channel-wise feature responses
4. **Efficient Channel Attention (ECA)**: Captures cross-channel interactions with minimal computational overhead (both attention modules are sketched below)
5. **Neural Architecture Search (NAS)**: Tailored architecture for specific hardware
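As a rough illustration of the two attention mechanisms above (a minimal sketch, not the exact modules from the paper's implementation), SE and ECA can be written as small PyTorch blocks that rescale feature maps channel-wise:

```python
import torch
import torch.nn as nn


class SqueezeExcitation(nn.Module):
    """Standard SE block: global pooling -> bottleneck MLP -> channel-wise rescaling."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        scale = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * scale


class EfficientChannelAttention(nn.Module):
    """ECA block: a 1D convolution over the pooled channel descriptor,
    capturing local cross-channel interaction without dimensionality reduction."""

    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        y = self.pool(x).view(b, 1, c)                  # (B, 1, C)
        y = torch.sigmoid(self.conv(y)).view(b, c, 1, 1)
        return x * y
```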
## Performance Comparison
| Model | Parameters (M) | Accuracy (%) | Model Size (MB) | Inference Time on CPU (ms) | Inference Time on Raspberry Pi 5 (ms) |
| --------------------- | -------------- | ------------ | --------------- | -------------------------- | ------------------------------------- |
| ResNet34 | 21.79 | 97.50 | 85.3 | 148.93 | 264.50 |
| MobileNet-V2 | 3.5 | 92.42 | 9.2 | 40.00 | 73.09 |
| MobileNet-V3 | 2.5 | 95.62 | 10.3 | N/A | N/A |
| MobileNet-V4 | 3.8 | 97.17 | 10.2 | 39.20 | 88.91 |
| **LARS-MobileNet-V4** | **3.8** | **97.84** | **10.2** | **39.20** | **88.91** |
## Training Strategies
Our implementation includes several optimization techniques:
| Model Variation | Train Accuracy (%) | Test Accuracy (%) |
| -------------------------------------------------------------------------------- | ------------------ | ----------------- |
| MobileNet-V4 Baseline | 99.93 | 97.17 |
| MobileNet-V4 (Augmentations) | 99.60 | 97.21 |
| MobileNet-V4 (FocalLabelSmoothingLoss) | 99.71 | 97.79 |
| MobileNet-V4 (Augmentations, FocalLabelSmoothingLoss, Squeeze-Excitation Blocks) | 99.68 | **97.84** |
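For reference, the "Augmentations" variants above rely on an image augmentation pipeline; a hypothetical torchvision composition of that kind is sketched below (the exact transforms and parameters from the paper are not reproduced here):

```python
from torchvision import transforms

# Hypothetical augmentation pipeline; the paper's exact transforms may differ.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Evaluation uses deterministic preprocessing only.
test_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```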
### Custom Loss Function
We implement a combination of Focal Loss and Label Smoothing:
1. **Label Smoothing**: Redistributes confidence across classes
$$y_{\text{smooth}} = (1 - \varepsilon)\,y + \frac{\varepsilon}{C}$$
where $\varepsilon$ is the smoothing factor and $C$ is the total number of classes.
2. **Focal Loss**: Focuses on harder examples
$$L_{\text{focal}}(p_t) = -\alpha\,(1 - p_t)^{\gamma} \log(p_t)$$
where $p_t$ is the predicted probability for the true class, $\alpha$ is a weighting factor, and $\gamma$ is the focusing parameter.
3. **Combined Loss (FLS)**: Applies the focal modulation to the cross-entropy computed against the smoothed targets
$$L_{\text{FLS}} = -\alpha\,(1 - p_t)^{\gamma} \log(p_{\text{smooth}})$$
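A minimal PyTorch sketch of this combined loss follows (class name and default hyperparameter values are illustrative; the repository's `FocalLabelSmoothingComboLoss` may differ in detail):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLabelSmoothingLoss(nn.Module):
    """Focal loss applied to the cross-entropy against label-smoothed targets
    (sketch; hyperparameter defaults are illustrative, not the paper's values)."""

    def __init__(self, num_classes: int, alpha: float = 1.0,
                 gamma: float = 2.0, epsilon: float = 0.1):
        super().__init__()
        self.num_classes = num_classes
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        log_probs = F.log_softmax(logits, dim=-1)
        probs = log_probs.exp()

        # y_smooth = (1 - eps) * y + eps / C
        smooth_targets = torch.full_like(log_probs, self.epsilon / self.num_classes)
        smooth_targets.scatter_(
            1, target.unsqueeze(1),
            1.0 - self.epsilon + self.epsilon / self.num_classes,
        )

        # Focal modulation uses p_t, the probability of the true class.
        p_t = probs.gather(1, target.unsqueeze(1)).squeeze(1)
        focal_weight = self.alpha * (1.0 - p_t) ** self.gamma

        # Cross-entropy against the smoothed distribution, scaled per sample.
        ce = -(smooth_targets * log_probs).sum(dim=-1)
        return (focal_weight * ce).mean()
```

With $\varepsilon = 0$, $\gamma = 0$, and $\alpha = 1$ this reduces to standard cross-entropy, which is a convenient sanity check.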
## Requirements
```
torch
torchvision
timm
numpy
pandas
Pillow
scikit-learn
tqdm
wandb
```
### Data Preparation
Organize your data as follows:
```
βββ train_images/
β βββ disease_class_1/
β β βββ image1.jpg
β β βββ image2.jpg
β β βββ ...
β βββ disease_class_2/
β βββ ...
βββ test_images/
βββ train.csv
```
The train.csv file should contain:
- `image_id`: Filename of the image
- `label`: Disease class name
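Given this layout, a dataset class along the following lines can load the images (a hypothetical `PaddyDataset` sketch; the repository's own loader may organise things differently):

```python
from pathlib import Path

import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class PaddyDataset(Dataset):
    """Hypothetical loader for the layout above: images live under
    train_images/<label>/<image_id> and train.csv lists (image_id, label)."""

    def __init__(self, csv_path: str, image_root: str, transform=None):
        self.df = pd.read_csv(csv_path)
        self.root = Path(image_root)
        self.transform = transform
        # Map class names to integer indices in a stable (sorted) order.
        self.classes = sorted(self.df["label"].unique())
        self.class_to_idx = {c: i for i, c in enumerate(self.classes)}

    def __len__(self) -> int:
        return len(self.df)

    def __getitem__(self, idx: int):
        row = self.df.iloc[idx]
        image = Image.open(self.root / row["label"] / row["image_id"]).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, self.class_to_idx[row["label"]]
```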
### Configuration
Key hyperparameters can be modified at the top of the script:
```python
LEARNING_RATE = 0.0001
ARCHITECTURE = "MobileNetV4"
EPOCHS = 50
BATCH_SIZE = 64
OPTIMISER = "Adam"
LOSS_FUNCTION = "FocalLabelSmoothingComboLoss"
NUM_CLASSES = 13 # 12 disease classes + 1 normal class
PRETRAINED = True
```
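These constants might be wired up roughly as follows, assuming a `timm` MobileNetV4 variant (the exact model name used in the repository may differ):

```python
import timm
import torch

# Hypothetical wiring of the constants above; "mobilenetv4_conv_small" is an
# assumed timm model name and may not match the variant used in the paper.
model = timm.create_model(
    "mobilenetv4_conv_small",
    pretrained=PRETRAINED,
    num_classes=NUM_CLASSES,
)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```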
## Citation
If you use this code in your research, please cite our paper:
```
@incollection{nanda2025advancing,
  author    = {Nanda, T. R. and Shukla, A. and Srinivasa, T. R. and Bhargava, J. and Chauhan, S.},
  title     = {Advancing Real-Time Crop Disease Detection on Edge Computing Devices Using Lightweight Convolutional Neural Networks},
  editor    = {Arai, K.},
  booktitle = {Intelligent Systems and Applications. IntelliSys 2025},
  series    = {Lecture Notes in Networks and Systems},
  volume    = {1567},
  publisher = {Springer, Cham},
  year      = {2025},
  doi       = {10.1007/978-3-032-00071-2_33}
}
```
## Acknowledgements
- We use the Paddy Doctor dataset for training and evaluation [Petchiammal et al., 2022]