---
tags:
- image-classification
- object-detection
- deep-learning
- fastai
- pytorch
- ultralytics
- yolo
- bioinformatics
- marine-biology
license: cc-by-nc-sa-4.0
language:
- en
---
# Diatom Detection and Classification Models 🔬
This repository contains the trained model weights for the **Diatom Classifier** project, an end-to-end deep learning pipeline for the automated extraction, detection, and classification of microscopic diatoms.
**[🚀 View the Live Web Application Here](https://huggingface.co/spaces/kemalbsoylu/diatom-classifier)**
## Models Included
This repository hosts two distinct models that work together in a decoupled pipeline:
### 1. YOLOv8 Diatom Detector (`yolo_diatom_detector.pt`)
* **Architecture:** Ultralytics YOLOv8 Nano
* **Task:** Object Detection (Single Class: "Diatom")
* **Performance:** 0.906 mAP50
* **Description:** Trained to analyze full microscope slides and draw bounding boxes around diatoms, regardless of their specific genus.
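
As a rough illustration of the detection stage, the sketch below runs the detector on one slide image and crops each detected diatom so the crops can be passed to a classifier. It assumes the `ultralytics` and `Pillow` packages are installed and that the weights file has been downloaded locally. The function names, the `pad` margin, and the `conf` threshold are illustrative choices, not part of this project's code.

```python
import math


def clip_box(box, width, height, pad=0):
    """Round an (x1, y1, x2, y2) box outward, pad it, and clamp it to the image."""
    x1, y1, x2, y2 = box
    left = max(0, math.floor(x1) - pad)
    top = max(0, math.floor(y1) - pad)
    right = min(width, math.ceil(x2) + pad)
    bottom = min(height, math.ceil(y2) + pad)
    return left, top, right, bottom


def crop_diatoms(image_path, detector_path="yolo_diatom_detector.pt", conf=0.25, pad=4):
    """Detect diatoms in one image and return a list of PIL crops.

    Third-party imports are deferred so clip_box stays usable without them.
    """
    from PIL import Image
    from ultralytics import YOLO

    image = Image.open(image_path).convert("RGB")
    detector = YOLO(detector_path)
    # predict() returns one Results object per input image.
    result = detector.predict(image, conf=conf, verbose=False)[0]

    crops = []
    for box in result.boxes.xyxy.tolist():  # pixel coordinates as floats
        crops.append(image.crop(clip_box(box, image.width, image.height, pad)))
    return crops
```

A small padding margin keeps the full outline of each diatom inside the crop. Whether padding helps the downstream classifier is an assumption that would need checking against the training crops.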
### 2. ResNet18 Genus Classifier (`v2_resnet18_weighted.pkl`)
* **Architecture:** ResNet18 (via FastAI)
* **Task:** Image Classification (Multi-class Genus identification)
* **Performance:** 94.38% Accuracy (Validation Loss: 0.108)
* **Description:** Classifies cropped images of individual diatoms by genus. Trained with data augmentation (presizing, rotation) and smoothed per-class weights in the PyTorch loss (capped at 15.0x) to handle severe class imbalance in the dataset.
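
The exact smoothing formula is not stated here. As a minimal sketch, assuming square-root smoothing of inverse class frequency (an assumption, not necessarily the method used in training), capped class weights could be computed as follows. The function name, labels, and counts are made up for illustration.

```python
import math


def smoothed_class_weights(counts, cap=15.0):
    """Return one loss weight per class from a {label: sample_count} mapping.

    Assumed scheme: weight = sqrt(largest_count / count), limited to `cap`.
    The square root softens the raw inverse-frequency ratio.
    """
    largest = max(counts.values())
    return {
        label: min(cap, math.sqrt(largest / count))
        for label, count in counts.items()
    }


# Hypothetical class counts, not taken from the real dataset.
example_counts = {"genus_a": 1000, "genus_b": 10, "genus_c": 1}
weights = smoothed_class_weights(example_counts)
```

In PyTorch, weights like these are typically passed as a tensor, ordered by class index, to the `weight` argument of `torch.nn.CrossEntropyLoss`.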
## ⚠️ Security Scanner Notice (False Positives)
Hugging Face's security scanners (ClamAV / Picklescan) may flag these files as "Suspicious" or "Unsafe". **These are known false positives.**
Both files (`.pkl` and `.pt`) are serialized with Python's native `pickle` format, so the scanners flag the standard Python imports that are needed to reconstruct the models.
* The `v2_resnet18_weighted.pkl` file was exported via FastAI, which bundles the entire inference pipeline. That pipeline includes references to `pathlib` and `getattr` (used for image loading), which trigger the ClamAV heuristic.
* The models were trained locally from scratch and contain no malicious code.
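
To see which imports a file references without loading it, the pickle opcodes can be read with the standard-library `pickletools` module, which parses the stream but does not execute it. The sketch below is a simplified heuristic, not the logic Picklescan or ClamAV actually use. It assumes the file is either a raw pickle or a zip archive containing `.pkl` members (the layout current `torch.save` produces), and the function names are illustrative.

```python
import pickletools
import zipfile
from collections import deque

STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}
GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
NO_STACK_OPS = {"PROTO", "FRAME"}


def referenced_globals(pickle_bytes):
    """Return the set of (module, name) pairs a pickle stream would import."""
    found = set()
    recent = deque(maxlen=2)  # approximate view of the top two stack items
    memo = {}
    for opcode, arg, _pos in pickletools.genops(pickle_bytes):
        name = opcode.name
        if name in NO_STACK_OPS:
            continue
        if name in PUT_OPS:
            memo[arg] = recent[-1] if recent else None
        elif name == "MEMOIZE":
            memo[len(memo)] = recent[-1] if recent else None
        elif name in GET_OPS:
            recent.append(memo.get(arg))
        elif name in STRING_OPS:
            recent.append(arg)
        elif name in ("GLOBAL", "INST"):
            # The argument is "module name" joined by a single space.
            module, _, attr = arg.partition(" ")
            found.add((module, attr))
            recent.append(None)
        elif name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two strings pushed just before.
            if len(recent) == 2 and all(isinstance(x, str) for x in recent):
                found.add((recent[0], recent[1]))
            recent.append(None)
        else:
            recent.append(None)  # top of stack is no longer a known string
    return found


def scan_model_file(path):
    """Collect referenced globals from a raw pickle or a zip of .pkl members."""
    if zipfile.is_zipfile(path):
        found = set()
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if member.endswith(".pkl"):
                    found |= referenced_globals(archive.read(member))
        return found
    with open(path, "rb") as handle:
        return referenced_globals(handle.read())
```

Calling `scan_model_file("v2_resnet18_weighted.pkl")` on a downloaded copy would list the module and attribute names the file asks Python to import, which can then be compared against the entries the scanner reports.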
## Dataset & Licensing
The models were trained on a dataset compiled by Gündüz et al.
> **Citation:** GÜNDÜZ, HÜSEYİN; SOLAK, CÜNEYD NADİR; and GÜNAL, SERKAN (2022) "Segmentation of diatoms using edge detection and deep learning," *Turkish Journal of Electrical Engineering and Computer Sciences*: Vol. 30: No. 6, Article 18.
> DOI: 10.55730/1300-0632.3938
**License:** In accordance with the original dataset, these trained model weights are distributed strictly for non-commercial use under the **CC BY-NC-SA 4.0** license.