---
license: other
pipeline_tag: image-classification
library_name: pytorch
tags:
- multimodal
- aquaculture
- shrimp
- disease-detection
- computer-vision
- time-series
- sensor-fusion
- uncertainty
---

# ShrimpFusionNet for Real-Time Shrimp Disease Detection Using Trust-Aware Multimodal Fusion

MMSD25 is a real-world multimodal shrimp disease dataset introduced in this paper. This repository provides a public sanitized reference subset of MMSD25 together with the benchmark protocol to support reproducibility and further research.

## 1. Dataset Overview

MMSD25 is designed for shrimp disease detection under real aquaculture conditions, where data are noisy, heterogeneous, asynchronous, and partially missing.

The dataset integrates three modalities:

- RGB shrimp images captured directly in ponds
- Farmer-written textual reports describing shrimp health and pond observations
- Environmental sensor streams, including:
  - Temperature
  - pH
  - Dissolved oxygen
  - Turbidity
  - Salinity

Data were collected from 8 shrimp ponds in the Mekong Delta, Vietnam, under diverse environmental and operational conditions.

## 2. Public Release Scope

### What is publicly released

This repository and the associated Hugging Face page provide:

- A **sanitized reference subset** of MMSD25
- The **full benchmark protocol**, including:
  - Data preprocessing procedures

The public subset is intended to demonstrate data structure.

### What is NOT publicly released

- The **full MMSD25 dataset is NOT publicly available**
- Full raw data are restricted due to data governance and farm partner agreements

Access to the full dataset may be considered for **non-commercial academic research only**, subject to a controlled-access agreement.

## 3. Dataset Composition (Full Dataset Description)

The full MMSD25 dataset (described in the paper) consists of:

- 3,625 RGB shrimp images
- 12,404 farmer-generated text descriptions
- Synchronized multi-channel sensor time series
- 5 disease classes:
  - Healthy
  - WSSV
  - AHPND
  - EHP
  - Bacterial necrosis

Each sample is verified by aquaculture experts, with inter-annotator agreement reaching Cohen’s κ = 0.86.

## 4. Train / Validation / Test Split

The benchmark uses a **region-based (pond-level) split** to evaluate generalization:

- Training set: 70% of ponds
- Validation set: 10% of ponds
- Test set: 20% of ponds (unseen ponds)

This setup supports zero-shot domain evaluation under real deployment conditions.

## 5. Hugging Face Repository

The public reference subset is hosted on Hugging Face:

https://huggingface.co/ducdatit2002/ShrimpFusionNet

## 6. Intended Use

MMSD25 is intended for research on:

- Multimodal learning (image + text + sensor)
- Trust-aware and uncertainty-aware fusion
- Robust learning under noisy and missing modalities
- Edge AI and IoT-based aquaculture systems

The dataset is **not intended for commercial use**.

## 7. Limitations

- The public subset is not statistically representative of the full dataset
- Some environmental and operational variability present in the full dataset is not exposed
- Results obtained on the public subset should not be interpreted as full benchmark performance

## 8. Citation

If you use MMSD25 or the benchmark protocol, please cite:

```bibtex
@article{shrimpfusionnet2025,
  title={ShrimpFusionNet for Real-Time Shrimp Disease Detection Using Trust-Aware Multimodal Fusion},
  author={Le, Tan Duy and Huynh, Kha Tu and Pham, Duc Dat and Nguyen, Hong Quan and Nguyen, Minh Tu},
  year={2025}
}
```

## 9. License

The public subset of MMSD25 is released for **non-commercial research use only**.

## 10. Contact

For questions or controlled access requests to the full dataset:

* Duc Dat Pham
* Email: [ducdatit2002@gmail.com](mailto:ducdatit2002@gmail.com)
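
As a supplementary note on the pond-level split described in Section 4, the following is a minimal sketch of how whole ponds could be assigned to splits so that no pond contributes samples to more than one split. It is not the authors' protocol code: the pond identifiers, the rounding rule, and the seed are assumptions made for illustration. With 8 ponds, the 70/10/20 proportions do not divide evenly, and this sketch rounds to 5 training, 1 validation, and 2 test ponds.

```python
import random


def split_ponds(pond_ids, val_frac=0.10, test_frac=0.20, seed=0):
    """Assign whole ponds to train/val/test (group-wise split).

    The rounding rule below is an assumption; the README only states
    the target proportions (70% / 10% / 20% of ponds).
    """
    ponds = sorted(set(pond_ids))
    rng = random.Random(seed)
    rng.shuffle(ponds)

    # Reserve at least one pond for each held-out split.
    n_test = max(1, round(test_frac * len(ponds)))
    n_val = max(1, round(val_frac * len(ponds)))

    return {
        "test": ponds[:n_test],
        "val": ponds[n_test:n_test + n_val],
        "train": ponds[n_test + n_val:],
    }


def split_of(sample_pond, splits):
    """Return the split name for a sample, based only on its pond."""
    for name, members in splits.items():
        if sample_pond in members:
            return name
    raise KeyError(f"unknown pond: {sample_pond}")


# Hypothetical pond identifiers; the real IDs are not given in this README.
ponds = [f"pond_{i}" for i in range(1, 9)]
splits = split_ponds(ponds)
print({name: len(members) for name, members in splits.items()})
```

Because every sample inherits the split of its pond, images, text reports, and sensor windows from a test pond are never seen during training, which is what makes the test set an unseen-pond evaluation.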