---
license: cc-by-nc-sa-4.0
library_name: keras
pipeline_tag: image-classification
language: en
tags:
- medical-imaging
- ct
- lung-cancer
- efficientnet-b0
- transfer-learning
- grad-cam
model-index:
- name: EfficientNetB0 Lung CT Classifier (4-class)
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Hany Lung Cancer CT (derived; cleaned)
      type: custom
      split: test
    metrics:
    - type: accuracy
      value: TODO:0.XX
    - type: precision
      value: TODO:0.XX
    - type: recall
      value: TODO:0.XX
    - type: f1
      value: TODO:0.XX
---
## Attribution
**Original Source:**

> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> [https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset](https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset)

**Original License:**

> Database: Open Data Commons Open Database License (ODbL v1.0)
> [https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

**Derived Dataset Author:**

> Ashley Blackwell (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)*. Hugging Face Datasets.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany
---
## Cleaning & Preprocessing Summary
The original dataset was processed and curated to ensure **consistency, quality, and reproducibility** for use in deep-learning experiments (e.g., the *EfficientNet-B0 Lung CT Classifier*).
### Steps Performed
1. **Integrity Checks:** Removed corrupted or unreadable `.jpg` and `.png` files.
2. **Resolution Standardization:** Resized all images to `224 × 224 × 3` pixels.
3. **Color Normalization:** Converted grayscale scans to RGB format.
4. **Class Organization:** Verified folder structure for four diagnostic categories:
   - Adenocarcinoma
   - Large-Cell Carcinoma
   - Squamous-Cell Carcinoma
   - Normal
5. **Stratified Splits:**
   - Train: 70%
   - Validation: 20%
   - Test: 10%
6. **Metadata File:** Generated `metadata.csv` containing filename, class label, and original resolution for traceability.
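
The 70/20/10 stratified split in step 5 could be reproduced along the following lines. This is a minimal sketch using only the Python standard library, not the script that built the published dataset; the function name `stratified_split`, the seed, the rounding rule, and the toy filenames are all assumptions made for illustration. Any remainder after rounding goes to the test split, so no file is dropped.

```python
import random
from collections import defaultdict

def stratified_split(samples, fractions=(0.7, 0.2, 0.1), seed=42):
    """Split (filename, label) pairs into train/val/test, class by class.

    `fractions` and `seed` are illustrative defaults, not the values
    used to build the published dataset.
    """
    by_class = defaultdict(list)
    for filename, label in samples:
        by_class[label].append(filename)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for label, files in sorted(by_class.items()):
        files = sorted(files)  # fixed order before shuffling
        rng.shuffle(files)
        n_train = round(len(files) * fractions[0])
        n_val = round(len(files) * fractions[1])
        splits["train"] += [(f, label) for f in files[:n_train]]
        splits["val"] += [(f, label) for f in files[n_train:n_train + n_val]]
        # Whatever remains goes to the test split.
        splits["test"] += [(f, label) for f in files[n_train + n_val:]]
    return splits

# Toy example with 10 hypothetical files per class.
labels = ["Adenocarcinoma", "Large-Cell Carcinoma", "Squamous-Cell Carcinoma", "Normal"]
samples = [(f"{label}_{i}.png", label) for label in labels for i in range(10)]
splits = stratified_split(samples)
print({name: len(items) for name, items in splits.items()})
```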
---
## Dataset Overview
| Split | Approx. Images | Notes |
|:------|---------------:|:------|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| **Total** | ~TODO | All cleaned and standardized |
---
## Intended Use
- **Purpose:**
Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.
- **Out of Scope:**
This dataset **must not** be used for clinical diagnosis, treatment decisions, or commercial medical software development.
---
## Legal & License Information
### License
This dataset is distributed under the **Open Data Commons Open Database License (ODbL v1.0)**.
You are free to:
- **Share:** Copy, distribute, and use the database.
- **Create:** Produce works from the database.
- **Adapt:** Modify, transform, and build upon the database.
Full legal text:
[https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)
---
## Scope
- **Intended**: Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.
- **Out-of-scope**: Clinical diagnosis, patient triage, or any safety-critical application.

## Model Architecture
- **Backbone**: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
- **Input size**: 224 × 224 × 3
- **Head**: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
- **Loss**: Categorical Cross-Entropy
- **Optimizer**: TODO (e.g., Adam, lr = 1e-4 with decay)
- **Epochs / Batch size**: TODO
- **Class labels (index)**:
  - 0: Adenocarcinoma
  - 1: Large-Cell Carcinoma
  - 2: Squamous-Cell Carcinoma
  - 3: Normal
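
The loading example in this card looks labels up with `idx_to_label[str(i)]`, which suggests that `class_map.json` stores the class indices above as string keys. A minimal sketch of writing and reading such a file, assuming that key format; the temporary path and the probability values are made up for illustration:

```python
import json
import os
import tempfile

# Index-to-label mapping from the class list. JSON object keys are always
# strings, which is why lookups use str(i) rather than i.
CLASS_MAP = {
    "0": "Adenocarcinoma",
    "1": "Large-Cell Carcinoma",
    "2": "Squamous-Cell Carcinoma",
    "3": "Normal",
}

path = os.path.join(tempfile.mkdtemp(), "class_map.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(CLASS_MAP, f, indent=2)

with open(path, encoding="utf-8") as f:
    idx_to_label = json.load(f)

probs = [0.05, 0.10, 0.15, 0.70]  # made-up softmax output, not a model result
best = max(range(len(probs)), key=probs.__getitem__)
print(idx_to_label[str(best)])
```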
---
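## Data & Preprocessing
- **Source:** Derived from Hany's *Chest CT-Scan Images* dataset (Kaggle). Corrupted and irregular-resolution images were removed, and all remaining images were standardized to 224 × 224.
- **Split:** Train/Val/Test = 70/20/10 (stratified).
- **Transforms:** Resize → RGB conversion → scale to [0, 1], or apply `preprocess_input`. Use the same option at inference that was used in training: in Keras, `tf.keras.applications.efficientnet.preprocess_input` is a pass-through (rescaling is built into the model), so the two options produce different input ranges.
- **Artifacts logged:** Confusion matrix, classification report, Grad-CAM overlays.
- **Attribution:** Credit the original dataset per its license (ODbL v1.0) when sharing or publishing.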
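
The RGB-conversion and scaling steps can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the card's actual pipeline: the helper name `to_model_input` is hypothetical, the slice is assumed to be already resized to 224 × 224, and a synthetic array stands in for a real CT image. The `scale_to_unit` flag is there because whether to divide by 255 depends on how the model was trained.

```python
import numpy as np

def to_model_input(gray_slice, scale_to_unit=True):
    """Convert an (H, W) uint8 slice into a (1, H, W, 3) float32 batch.

    Assumes the slice has already been resized to 224 x 224.
    """
    rgb = np.repeat(gray_slice[..., np.newaxis], 3, axis=-1)  # grayscale -> RGB
    x = rgb.astype("float32")
    if scale_to_unit:
        x = x / 255.0          # scale pixel values to [0, 1]
    return x[np.newaxis, ...]  # add the batch dimension

# Synthetic slice standing in for a real CT image.
gray = np.random.default_rng(0).integers(0, 256, size=(224, 224), dtype=np.uint8)
batch = to_model_input(gray)
print(batch.shape, batch.dtype)
```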
---
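## Evaluation
- **Test set size:** TODO:N
- **Metrics:** Accuracy, plus macro-averaged Precision, Recall, and F1

| Class | Precision | Recall | F1 | Support |
|:------|----------:|-------:|---:|--------:|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| Macro Avg | TODO | TODO | TODO | N |
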
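
Macro averaging computes each metric per class and then takes the unweighted mean, so every class counts equally regardless of its support. A minimal pure-Python sketch of that calculation; the confusion matrix below is invented for illustration and is not a result from this model, and the function names are hypothetical:

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples with true class i predicted as j."""
    n = len(confusion)
    rows = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        rows.append((precision, recall, f1, tp + fn))  # last entry is support
    return rows

def macro_average(rows):
    """Unweighted mean of precision, recall, and F1 across classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in range(3))

# Hypothetical 4-class confusion matrix (rows: true, columns: predicted).
confusion = [
    [8, 1, 1, 0],
    [2, 6, 2, 0],
    [1, 1, 8, 0],
    [0, 0, 0, 10],
]
rows = per_class_metrics(confusion)
macro_p, macro_r, macro_f1 = macro_average(rows)
accuracy = sum(confusion[k][k] for k in range(4)) / sum(map(sum, confusion))
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
```

Accuracy is computed over all samples and is not macro-averaged, which is why it is calculated separately here.
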
## Suggested Environment

    tensorflow==2.15.0
    keras==2.15.0
    huggingface_hub>=0.23.0
    numpy>=1.24

---
## Explainability (Grad-CAM)
- **Last conv layer:** `top_conv` for EfficientNet-B0.
- **Tip:** Use Grad-CAM to overlay heatmaps and check that the model focuses on pathologically relevant regions.
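
The core Grad-CAM computation is small once two arrays are available: the activations of the last convolutional layer (`top_conv` here) and the gradients of the target class score with respect to those activations. In Keras these are typically obtained with `tf.GradientTape`. The NumPy sketch below covers only the weighting step and uses random arrays as stand-ins for real activations and gradients. The function name is hypothetical, and the 7 × 7 × 1280 shape is what EfficientNet-B0 is expected to produce at `top_conv` for a 224 × 224 input.

```python
import numpy as np

def grad_cam_heatmap(activations, gradients):
    """Combine feature maps (H, W, C) and gradients (H, W, C) into an (H, W) map."""
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))  # shape (C,)
    # Weighted sum of the feature maps, then ReLU to keep positive evidence.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    # Scale to [0, 1] for overlaying; guard against an all-zero map.
    peak = cam.max()
    return cam / peak if peak > 0 else cam

rng = np.random.default_rng(0)
activations = rng.random((7, 7, 1280)).astype("float32")        # stand-in values
gradients = rng.standard_normal((7, 7, 1280)).astype("float32")  # stand-in values
heatmap = grad_cam_heatmap(activations, gradients)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

For display, the low-resolution map is usually resized to the input resolution and blended with the CT slice.
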
## Limitations, Bias & Ethical Considerations
- **Domain shift:** CT protocols and scanners vary, which may affect generalization.
- **Label noise:** Community datasets can contain mislabeled images.
- **Generalization:** The model is not clinically validated.
- **Mitigation:** Use Grad-CAM audits and external validation before any applied use.
---
## Training & Reproducibility
- **Hardware:** TODO (e.g., NVIDIA T4 / A100 / local GPU).
- **Training time:** TODO
- **Seed / Determinism:** TODO
- **Reproduction steps:** TODO (link to notebook or script if available).

## License
- **Model weights & code:** CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
- **Dataset (derived):** Follow the original dataset's license terms (ODbL v1.0) and credit the creator.

## Citation
If you use this model, please cite:

> Blackwell, A. (2025). *EfficientNet-B0 Lung CT Classifier (4-class)* [Computer software]. Hugging Face. https://huggingface.co/TODO

    @software{blackwell2025lungct,
      author    = {Blackwell, Ashley},
      title     = {EfficientNet-B0 Lung CT Classifier (4-class)},
      year      = {2025},
      publisher = {Hugging Face},
      url       = {https://huggingface.co/TODO}
    }

## 👩‍🏫 Maintainers
Ashley Blackwell: **Questions and feedback welcome via the Hugging Face Discussions tab.**

## 🗒 Changelog
- 2025-10-06: Initial public release (`.keras` weights); added model card, class map, and metric placeholders.
---
## Citation
If you use this dataset, please cite both the original source and the derived version:
**Original dataset:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset
**Derived version:**
> Blackwell, A. (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)* [Dataset]. Hugging Face.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany
```bibtex
@dataset{hany2020chestct,
  author    = {Hany, H.},
  title     = {Chest CT-Scan Images Dataset},
  year      = {2020},
  publisher = {Kaggle},
  url       = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author    = {Blackwell, Ashley},
  title     = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}
```
---
## How to Use (Load & Inference)
**Option A: Download from the Hub**

    import json

    import numpy as np
    import tensorflow as tf
    from huggingface_hub import hf_hub_download
    from tensorflow.keras.preprocessing import image

    REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

    # Fetch the weights and the index-to-label map from the Hub (cached locally).
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
    class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

    model = tf.keras.models.load_model(model_path, compile=False)
    with open(class_map_path) as f:
        idx_to_label = json.load(f)  # keys are strings: "0", "1", ...

    def preprocess(img_path):
        """Load an image as a (1, 224, 224, 3) float32 batch."""
        img = image.load_img(img_path, target_size=(224, 224))  # loads as RGB
        x = image.img_to_array(img)
        x = np.expand_dims(x, 0)
        # Use whichever scaling was applied during training: divide by 255,
        # or tf.keras.applications.efficientnet.preprocess_input(x) instead.
        x = x / 255.0
        return x

    x = preprocess("path/to/ct_slice.png")
    probs = model.predict(x, verbose=0)[0]

    # Print the probability for each class, then the top prediction.
    for i, p in enumerate(probs):
        print(f"{idx_to_label[str(i)]}: {p:.3f}")
    print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])

**Option B: Snapshot Download (Local Folder)**

    import os

    from huggingface_hub import snapshot_download

    # Download the whole repository once; files are then read from local_dir.
    local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
    model_path = os.path.join(local_dir, "model.keras")
    class_map_path = os.path.join(local_dir, "class_map.json")

---