---
license: cc-by-nc-sa-4.0
library_name: keras
pipeline_tag: image-classification
language: en
tags:
- medical-imaging
- ct
- lung-cancer
- efficientnet-b0
- transfer-learning
- grad-cam
model-index:
- name: EfficientNetB0 Lung CT Classifier (4-class)
  results:
  - task:
      type: image-classification
      name: Image Classification
    dataset:
      name: Hany Lung Cancer CT (derived; cleaned)
      type: custom
      split: test
    metrics:
    - type: accuracy
      value: TODO:0.XX
    - type: precision
      value: TODO:0.XX
    - type: recall
      value: TODO:0.XX
    - type: f1
      value: TODO:0.XX
---

## Attribution

**Original Source:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> [https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset](https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset)

**Original License:**
> Database: Open Data Commons Open Database License (ODbL v1.0)
> [https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

**Derived Dataset Author:**
> Ashley Blackwell (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)*. Hugging Face Datasets.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

---

## Cleaning & Preprocessing Summary

The original dataset was processed and curated to ensure **consistency, quality, and reproducibility** for use in deep-learning experiments (e.g., the *EfficientNet-B0 Lung CT Classifier*).

### Steps Performed
1. **Integrity Checks:** Removed corrupted or unreadable `.jpg` and `.png` files.
2. **Resolution Standardization:** Resized all images to `224 × 224 × 3` pixels.
3. **Color Normalization:** Converted grayscale scans to RGB format.
4. **Class Organization:** Verified folder structure for four diagnostic categories:
   - Adenocarcinoma
   - Large-Cell Carcinoma
   - Squamous-Cell Carcinoma
   - Normal
5. **Stratified Splits:**
   - Train: 70%
   - Validation: 20%
   - Test: 10%
6. **Metadata File:** Generated `metadata.csv` containing filename, class label, and original resolution for traceability.
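
The stratified 70/20/10 split in step 5 can be illustrated with a short standard-library sketch. The card does not say how the split was actually produced, so the function name `stratified_split`, the seed value, and the rounding policy (any remainder goes to the test set) are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=42):
    """Split sample indices into train/val/test while preserving class ratios.

    `labels` holds one class name per image. Returns three lists of indices.
    The seed and the rounding policy are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Group sample indices by class so each class is split independently.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy label list (made-up class counts, not the real dataset sizes).
labels = (
    ["Adenocarcinoma"] * 50
    + ["Large-Cell Carcinoma"] * 30
    + ["Squamous-Cell Carcinoma"] * 40
    + ["Normal"] * 20
)
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

Splitting per class, rather than shuffling the whole list once, is what keeps each diagnostic category represented in roughly the same proportion in every split.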

---

## Dataset Overview

| Split | Approx. Images | Notes |
|:------|---------------:|:------|
| Train | ~TODO | Stratified by class |
| Validation | ~TODO | For hyperparameter tuning |
| Test | ~TODO | Final evaluation set |
| **Total** | ~TODO | All cleaned and standardized |

---

## Intended Use

- **Purpose:**
  Designed for research, coursework, and educational demonstrations in medical image classification, model interpretability (Grad-CAM), and reproducible machine learning pipelines.

- **Out of Scope:**
  This dataset **must not** be used for clinical diagnosis, treatment decisions, or commercial medical software development.

---

## Legal & License Information

### License
This dataset is distributed under the **Open Data Commons Open Database License (ODbL v1.0)**.

You are free to:
- **Share:** Copy, distribute, and use the database.
- **Create:** Produce works from the database.
- **Adapt:** Modify, transform, and build upon the database.

Full legal text:
[https://opendatacommons.org/licenses/odbl/1-0/](https://opendatacommons.org/licenses/odbl/1-0/)

---

## Scope

- **Intended**: Research, UMGC coursework, model-interpretability demos (Grad-CAM), benchmarking.
- **Out-of-scope**: Clinical diagnosis, patient triage, or any safety-critical application.

## Model Architecture

- **Backbone**: EfficientNet-B0 (ImageNet-initialized, fine-tuned)
- **Input size**: 224 × 224 × 3
- **Head**: GlobalAveragePooling → Dropout (TODO: rate) → Dense(4, softmax)
- **Loss**: Categorical Cross-Entropy
- **Optimizer**: TODO (e.g., Adam, lr = 1e-4 with decay)
- **Epochs / Batch size**: TODO
- **Class labels (index)**:
  - 0: Adenocarcinoma
  - 1: Large-Cell Carcinoma
  - 2: Squamous-Cell Carcinoma
  - 3: Normal
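
The backbone and head listed above could be assembled in Keras roughly as follows. This is a minimal sketch, not the author's training script: the function name `build_classifier` is hypothetical, and `dropout_rate=0.2` is a placeholder for the rate the card leaves as TODO.

```python
import tensorflow as tf


def build_classifier(num_classes=4, dropout_rate=0.2, weights="imagenet"):
    """EfficientNet-B0 backbone with a GAP -> Dropout -> Dense(softmax) head.

    dropout_rate=0.2 is a placeholder; the card does not state the real value.
    Pass weights=None to skip downloading the ImageNet weights.
    """
    backbone = tf.keras.applications.EfficientNetB0(
        include_top=False,
        weights=weights,
        input_shape=(224, 224, 3),
    )
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    x = tf.keras.layers.Dropout(dropout_rate)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    # Reusing backbone.input (instead of nesting the backbone as a sub-model)
    # keeps layers such as `top_conv` addressable by name on the final model.
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)
```

A model built this way would typically be compiled with `loss="categorical_crossentropy"`, which expects one-hot labels over the four classes. The optimizer settings remain whatever the author used; the card lists them as TODO.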

---

## Data & Preprocessing

- **Source**: Derived from the Hany Lung Cancer CT Scan dataset (Kaggle). Corrupted and irregular-resolution images were removed and all remaining images standardized to 224×224.
- **Split**: Train/Val/Test = 70/20/10 (stratified).
- **Transforms**: Resize → RGB conversion → normalize to [0,1] or use `preprocess_input`. The two options are not interchangeable: Keras EfficientNet rescales [0, 255] inputs internally and its `preprocess_input` is a pass-through, so inference must use whichever option was used in training.
- **Artifacts logged**: Confusion matrix, classification report, Grad-CAM overlays.
- **Attribution**: Credit the original dataset per its license when sharing or publishing.

---

## Evaluation

- **Test set size**: TODO:N
- **Metrics (macro)**: Accuracy, Precision, Recall, F1

| Class | Precision | Recall | F1 | Support |
|:------|----------:|-------:|---:|--------:|
| Adenocarcinoma | TODO | TODO | TODO | TODO |
| Large-Cell | TODO | TODO | TODO | TODO |
| Squamous | TODO | TODO | TODO | TODO |
| Normal | TODO | TODO | TODO | TODO |
| **Macro Avg** | TODO | TODO | TODO | N |
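
For reference, the macro-averaged metrics in the table can be derived from a confusion matrix as sketched below. The function name `macro_metrics` is hypothetical and the matrix is a toy example with made-up counts; it does not represent this model's results.

```python
def macro_metrics(confusion):
    """Return accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] counts samples whose true class is i and predicted class is j.
    Classes with no predictions (or no samples) contribute 0.0 to the average.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum (support)
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy 4-class matrix (rows: true class, columns: predicted class).
toy = [
    [8, 1, 1, 0],
    [1, 7, 2, 0],
    [1, 1, 8, 0],
    [0, 0, 0, 10],
]
for name, value in macro_metrics(toy).items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights every class equally regardless of support, so a weak minority class lowers the score more than it would under a support-weighted average.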

## Suggested Environment

- `tensorflow==2.15.0`
- `keras==2.15.0`
- `huggingface_hub>=0.23.0`
- `numpy>=1.24`

---

## Explainability (Grad-CAM)

- **Last conv layer**: `top_conv` for EfficientNet-B0.
- **Tip**: Use Grad-CAM to overlay heatmaps and validate that the model focuses on pathologically relevant regions.
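
The core Grad-CAM computation is small enough to show in NumPy. The sketch below assumes the activations of the last conv layer and the gradients of the chosen class score with respect to them have already been extracted (for example with `tf.GradientTape`, which is not shown). The function name `grad_cam_heatmap` is hypothetical, and the random arrays only stand in for real tensors.

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Combine conv activations and class-score gradients into a Grad-CAM map.

    activations, gradients: arrays of shape (H, W, C) for a single image, where
    gradients = d(class score) / d(activations).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel importance: spatial average of the gradients, shape (C,).
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the activation maps over channels, shape (H, W).
    cam = np.tensordot(activations, weights, axes=([2], [0]))
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Stand-in tensors with the 7 x 7 x 1280 shape that `top_conv` produces
# for a 224 x 224 input to EfficientNet-B0.
rng = np.random.default_rng(0)
acts = rng.random((7, 7, 1280))
grads = rng.standard_normal((7, 7, 1280))
heatmap = grad_cam_heatmap(acts, grads)
print(heatmap.shape)  # (7, 7)
```

To produce an overlay, the 7 × 7 map is usually resized to the input resolution and blended with the CT slice using a colormap.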

## Limitations, Bias & Ethical Considerations

- **Domain shift**: CT protocols and scanners vary; may affect generalization.
- **Label noise**: Community datasets can contain mislabels.
- **Generalization**: Model is not clinically validated.
- **Mitigation**: Use Grad-CAM audits and external validation before any applied use.

---

## Training & Reproducibility

- **Hardware**: TODO (e.g., NVIDIA T4 / A100 / local GPU).
- **Training time**: TODO
- **Seed / Determinism**: TODO
- **Reproduction steps**: TODO (link to notebook or script if available).

## License

- **Model weights & code**: CC BY-NC-SA 4.0 (non-commercial, share-alike, with attribution).
- **Dataset (derived)**: Follow the original dataset's license terms and provide credit to the creator.

## Citation

If you use this model, please cite:

Blackwell, A. (2025). EfficientNet-B0 Lung CT Classifier (4-class) [Computer software]. Hugging Face. https://huggingface.co/TODO

```bibtex
@software{blackwell2025lungct,
  author = {Blackwell, Ashley},
  title = {EfficientNet-B0 Lung CT Classifier (4-class)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/TODO}
}
```

## Maintainers

Ashley Blackwell. **Questions and feedback welcome via the Hugging Face Discussions tab.**

## Changelog

- 2025-10-06: Initial public release (.keras weights), added model card, class map, and metric placeholders.

---

## Citation

If you use this dataset, please cite both the original source and the derived version:

**Original dataset:**
> Hany H. (2020). *Chest CT-Scan Images Dataset*. Kaggle.
> https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset

**Derived version:**
> Blackwell, A. (2025). *Chest CT-Scan Images (Cleaned, Derived from Hany et al.)* [Dataset]. Hugging Face.
> https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany

```bibtex
@dataset{hany2020chestct,
  author = {Hany, H.},
  title = {Chest CT-Scan Images Dataset},
  year = {2020},
  publisher = {Kaggle},
  url = {https://www.kaggle.com/datasets/hanyhossam/chest-ctscan-images-dataset}
}

@dataset{blackwell2025lungctcleaned,
  author = {Blackwell, Ashley},
  title = {Chest CT-Scan Images (Cleaned, Derived from Hany et al.)},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/datasets/ashleyblackwell/lung-ct-cleaned-hany}
}
```

---

## How to Use (Load & Inference)

**Option A: Download from the Hub**

```python
import json

import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download
from tensorflow.keras.preprocessing import image

REPO_ID = "TODO:your-username/efficientnetb0-lung-ct-4class"

# Fetch the weights and the index-to-label map from the Hub.
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.keras")
class_map_path = hf_hub_download(repo_id=REPO_ID, filename="class_map.json")

model = tf.keras.models.load_model(model_path, compile=False)
with open(class_map_path) as f:
    idx_to_label = json.load(f)  # looked up below by string index, e.g. "0"

def preprocess(img_path):
    """Load one image as a (1, 224, 224, 3) float batch."""
    img = image.load_img(img_path, target_size=(224, 224))  # RGB by default
    x = image.img_to_array(img)
    x = np.expand_dims(x, 0)
    # Must match training. Keras EfficientNet rescales [0, 255] inputs internally
    # (tf.keras.applications.efficientnet.preprocess_input is a pass-through),
    # so drop this division if the model was trained on raw pixel values.
    x = x / 255.0
    return x

x = preprocess("path/to/ct_slice.png")
probs = model.predict(x, verbose=0)[0]
for i, p in enumerate(probs):
    print(f"{idx_to_label[str(i)]}: {p:.3f}")
print("Predicted:", idx_to_label[str(int(np.argmax(probs)))])
```

**Option B: Snapshot Download (Local Folder)**

```python
import os
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="TODO:your-username/efficientnetb0-lung-ct-4class")
# model.keras and class_map.json are now in local_dir; load them as in Option A.
model_path = os.path.join(local_dir, "model.keras")
class_map_path = os.path.join(local_dir, "class_map.json")
```

---