---
license: other
pipeline_tag: image-classification
library_name: pytorch
tags:
- multimodal
- aquaculture
- shrimp
- disease-detection
- computer-vision
- time-series
- sensor-fusion
- uncertainty
---
# ShrimpFusionNet for Real-Time Shrimp Disease Detection Using Trust-Aware Multimodal Fusion
MMSD25 is a real-world multimodal shrimp disease dataset introduced in this paper.
This repository provides a public sanitized reference subset of MMSD25 together with the benchmark protocol to support reproducibility and further research.
## 1. Dataset Overview
MMSD25 is designed for shrimp disease detection under real aquaculture conditions, where data are noisy, heterogeneous, asynchronous, and partially missing.
The dataset integrates three modalities:
- RGB shrimp images captured directly in ponds
- Farmer-written textual reports describing shrimp health and pond observations
- Environmental sensor streams, including:
  - Temperature
  - pH
  - Dissolved oxygen
  - Turbidity
  - Salinity
Data were collected from 8 shrimp ponds in the Mekong Delta, Vietnam, under diverse environmental and operational conditions.
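To make the three-modality structure concrete, the following is a minimal sketch of how one record could be represented when any modality may be missing. The class name `PondSample`, its field names, the channel keys, and the example paths are hypothetical and do not reflect the actual MMSD25 file layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical channel keys for the five sensor streams listed above.
SENSOR_CHANNELS = ("temperature", "ph", "dissolved_oxygen", "turbidity", "salinity")


@dataclass
class PondSample:
    """One illustrative multimodal record; every modality is optional."""

    pond_id: str
    image_path: Optional[str] = None   # RGB image, if one was captured
    report_text: Optional[str] = None  # farmer-written report, if any
    # channel name -> list of (timestamp_seconds, value) readings
    sensors: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def available_modalities(self) -> List[str]:
        """Return which of image / text / sensor are present in this record."""
        present = []
        if self.image_path is not None:
            present.append("image")
        if self.report_text:
            present.append("text")
        if any(self.sensors.get(channel) for channel in SENSOR_CHANNELS):
            present.append("sensor")
        return present


# Example: an image plus a short pH stream, with no text report.
sample = PondSample(
    pond_id="pond_03",
    image_path="images/0001.jpg",
    sensors={"ph": [(0.0, 7.9), (60.0, 8.1)]},
)
print(sample.available_modalities())  # ['image', 'sensor']
```

Keeping each modality optional lets downstream code branch explicitly on what is present instead of relying on placeholder values.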
## 2. Public Release Scope
### What is publicly released
This repository and the associated Hugging Face page provide:
- A **sanitized reference subset** of MMSD25
- The **full benchmark protocol**, including:
  - Data preprocessing procedures

The public subset is intended to demonstrate the data structure.
### What is NOT publicly released
- The **full MMSD25 dataset is NOT publicly available**
- Full raw data are restricted due to data governance and farm partner agreements
Access to the full dataset may be considered for **non-commercial academic research only**, subject to a controlled-access agreement.
## 3. Dataset Composition (Full Dataset Description)
The full MMSD25 dataset (described in the paper) consists of:
- 3,625 RGB shrimp images
- 12,404 farmer-generated text descriptions
- Synchronized multi-channel sensor time series
- 5 disease classes:
  - Healthy
  - WSSV
  - AHPND
  - EHP
  - Bacterial necrosis
Each sample is verified by aquaculture experts, with inter-annotator agreement reaching Cohen’s κ = 0.86.
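For readers unfamiliar with the agreement metric, Cohen's κ compares the observed agreement between two annotators with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for two label lists; the function name and the toy labels are illustrative only and are not taken from the MMSD25 annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's per-class frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class
    return (observed - expected) / (1.0 - expected)


# Toy example with six items (not real MMSD25 annotations).
annotator_1 = ["Healthy", "WSSV", "AHPND", "EHP", "Healthy", "WSSV"]
annotator_2 = ["Healthy", "WSSV", "AHPND", "Healthy", "Healthy", "WSSV"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.76
```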
## 4. Train / Validation / Test Split
The benchmark uses a **region-based (pond-level) split** to evaluate generalization:
- Training set: 70% of ponds
- Validation set: 10% of ponds
- Test set: 20% of ponds (unseen ponds)
Because the test ponds are never seen during training, this setup supports zero-shot domain evaluation under real deployment conditions.
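A pond-level split assigns whole ponds, not individual samples, to each partition. The sketch below illustrates the idea under stated assumptions: the function name `split_by_pond`, the pond identifiers, the seed, and the rounding rule (round the validation and test counts, give the remainder to training) are all hypothetical, since the exact pond assignment is defined by the benchmark protocol and is not reproduced here.

```python
import random


def split_by_pond(pond_ids, val_frac=0.10, test_frac=0.20, seed=0):
    """Partition unique pond IDs into disjoint train / val / test sets."""
    ponds = sorted(set(pond_ids))
    if len(ponds) < 3:
        raise ValueError("need at least three ponds for a three-way split")
    rng = random.Random(seed)
    rng.shuffle(ponds)
    # Assumed rounding rule: at least one pond each for val and test.
    n_test = max(1, round(test_frac * len(ponds)))
    n_val = max(1, round(val_frac * len(ponds)))
    test = set(ponds[:n_test])
    val = set(ponds[n_test:n_test + n_val])
    train = set(ponds[n_test + n_val:])
    return train, val, test


# Eight hypothetical pond identifiers, matching the number of ponds above.
ponds = [f"pond_{i:02d}" for i in range(1, 9)]
train_ponds, val_ponds, test_ponds = split_by_pond(ponds)
print(len(train_ponds), len(val_ponds), len(test_ponds))  # 5 1 2
```

Every sample then inherits the partition of its pond, so no pond contributes data to more than one split.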
## 5. Hugging Face Repository
The public reference subset is hosted on Hugging Face:
https://huggingface.co/ducdatit2002/ShrimpFusionNet
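One way to fetch the repository contents locally is `snapshot_download` from the `huggingface_hub` package. This is a minimal sketch: it assumes `huggingface_hub` is installed and that the repository is a model-type repo, as the URL above suggests. The helper name `download_reference_subset` is hypothetical, and the layout of the downloaded files is not described in this README.

```python
REPO_ID = "ducdatit2002/ShrimpFusionNet"


def download_reference_subset(repo_id: str = REPO_ID) -> str:
    """Download a snapshot of the repository and return the local folder path."""
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    # repo_type defaults to "model"; this is an assumption based on the URL.
    return snapshot_download(repo_id=repo_id)


if __name__ == "__main__":
    local_dir = download_reference_subset()
    print("Files downloaded to:", local_dir)
```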
## 6. Intended Use
MMSD25 is intended for research on:
- Multimodal learning (image + text + sensor)
- Trust-aware and uncertainty-aware fusion
- Robust learning under noisy and missing modalities
- Edge AI and IoT-based aquaculture systems
The dataset is **not intended for commercial use**.
## 7. Limitations
- The public subset is not statistically representative of the full dataset
- Some environmental and operational variability present in the full dataset is not exposed
- Results obtained on the public subset should not be interpreted as full benchmark performance
## 8. Citation
If you use MMSD25 or the benchmark protocol, please cite:
```bibtex
@article{shrimpfusionnet2025,
title={ShrimpFusionNet for Real-Time Shrimp Disease Detection Using Trust-Aware Multimodal Fusion},
author={Le, Tan Duy and Huynh, Kha Tu and Pham, Duc Dat and Nguyen, Hong Quan and Nguyen, Minh Tu},
year={2025}
}
```
## 9. License
The public subset of MMSD25 is released for **non-commercial research use only**.
## 10. Contact
For questions or controlled access requests to the full dataset:
- Duc Dat Pham
- Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)