---
license: mit
language:
- en
metrics:
- accuracy
pipeline_tag: image-classification
library_name: transformers
tags:
- biology
- med
- chemistry
- code
---
---
# Multi-Cancer Lymphoma Classification with Convolutional Neural Networks (CNN)
## Overview
This repository contains an end-to-end deep learning pipeline written in **Python** with **TensorFlow** and **Keras** for classifying lymphoma subtypes in a multi-cancer dataset. It trains a **Convolutional Neural Network (CNN)** with supervised learning on labeled histopathological images and is meant as a baseline for medical imaging analysis.
The pipeline encompasses:
* Data ingestion and preprocessing with **ImageDataGenerator**
* Training/validation split and augmentation
* Definition and compilation of a deep CNN architecture
* Training with real-time performance evaluation
* Model persistence (`.h5` file format) for later inference
* Custom prediction utility with visualization
This repository is intended for **medical AI researchers**, **machine learning engineers**, and **healthcare data scientists** who seek to apply convolutional neural networks for diagnostic support in oncology.
---
## Dataset Information
The dataset used in this project is located at:
```
/kaggle/input/multi-cancer/Multi Cancer/Multi Cancer/Lymphoma
```
This directory contains subfolders representing different classes of lymphoma and potentially other cancer subtypes. The **directory structure** is expected to be of the form:
```
Lymphoma/
├── Class_A/
│   ├── img_1.jpg
│   ├── img_2.jpg
│   └── ...
├── Class_B/
│   ├── img_3.jpg
│   └── ...
└── Class_C/
    ├── img_4.jpg
    └── ...
```
* Each subfolder corresponds to one diagnostic class.
* The data loader infers class labels automatically from these subdirectory names.
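
As a rough illustration of how labels can be derived from this layout, the sketch below maps each subfolder name to an integer index in sorted order, which mirrors how Keras directory iterators assign `class_indices`. The function name `infer_class_indices` is hypothetical and is not part of this repository.

```python
import os


def infer_class_indices(root_dir):
    """Map each class subfolder of root_dir to an integer label.

    Hypothetical helper: subfolder names are sorted so that the mapping
    is deterministic; plain files in root_dir are ignored.
    """
    class_names = sorted(
        entry
        for entry in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, entry))
    )
    return {name: index for index, name in enumerate(class_names)}


# Usage, assuming the dataset path given above exists on disk:
# infer_class_indices("/kaggle/input/multi-cancer/Multi Cancer/Multi Cancer/Lymphoma")
```
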
---
## Dependencies
This project requires the following core dependencies:
* **Python 3.8+**
* **TensorFlow 2.x**
* **Keras (integrated with TensorFlow)**
* **NumPy**
* **Matplotlib**
To install dependencies:
```bash
pip install tensorflow numpy matplotlib
```
If running on Kaggle or Google Colab, these libraries come pre-installed.
---
## Code Structure
The main script (`train.py` or notebook cell) is divided into logical sections:
1. **Imports**
   * Standard library (`os`)
   * Scientific libraries (`numpy`, `matplotlib`)
   * Deep learning libraries (`tensorflow`, `keras`, `layers`)
2. **Data Pipeline**
   * Data preprocessing with `ImageDataGenerator`
   * Automatic normalization of pixel intensities (`rescale=1./255`)
   * Splitting into training (90%) and validation (10%)
3. **Model Architecture**
   * A sequential CNN architecture with the following layers:
     * `Conv2D` (32 filters, 3×3 kernel, ReLU)
     * `MaxPooling2D` (2×2)
     * `Conv2D` (64 filters, ReLU)
     * `MaxPooling2D` (2×2)
     * `Conv2D` (128 filters, ReLU)
     * `MaxPooling2D` (2×2)
     * `Flatten`
     * `Dense` (512 units, ReLU)
     * `Dense` (softmax output for multi-class classification)
4. **Compilation**
   * Optimizer: **Adam**
   * Loss Function: **Categorical Crossentropy**
   * Metrics: **Accuracy**
5. **Training**
   * Training via `model.fit()`
   * `epochs=10`
   * Validation data monitoring
6. **Model Persistence**
   * The final trained model is saved as `model5.h5`
7. **Prediction Utility** (`guess()` function)
   * Takes an input image path
   * Resizes and normalizes the image
   * Performs forward propagation using the trained model
   * Outputs the predicted class with corresponding visualization
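
The architecture and compilation steps listed above could look roughly like the sketch below. It is not the repository's actual script: the function name `build_model` is hypothetical, `num_classes` is left as a parameter because the number of lymphoma classes is not stated here, the 3×3 kernel for the second and third `Conv2D` layers is an assumption (only the first is specified), and the 150×150 RGB input shape is taken from the Training Performance section.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(num_classes, input_shape=(150, 150, 3)):
    """Build and compile a CNN matching the layer list above (a sketch)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Kernel size below is assumed to be 3x3; the README does not state it.
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        # One softmax unit per class folder.
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

In a training script, `num_classes` would typically be set from the data generator, for example `len(train_generator.class_indices)`.
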
---
## Methodology
The approach relies on **supervised learning** using CNNs for image recognition.
* **Feature Extraction:** Convolutional and pooling layers learn hierarchical spatial representations of cancerous tissue patterns.
* **Classification:** Dense layers map these features into probabilistic class predictions.
* **Normalization:** All images are rescaled to `[0,1]` for stable gradient descent.
* **Generalization:** A held-out validation set (10%) is used to monitor overfitting and to estimate out-of-sample performance.
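
To make the feature-extraction stage more concrete, the sketch below traces how the spatial size of a 150×150 input shrinks through three convolution and pooling blocks. It assumes 3×3 kernels in every block, stride 1, and the Keras default `valid` padding; the helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Spatial size after non-overlapping max pooling (floor division)."""
    return size // pool


def trace_feature_maps(input_size=150, filters=(32, 64, 128)):
    """Return (height, width, channels) after each Conv2D + MaxPooling2D block."""
    shapes = []
    size = input_size
    for channels in filters:
        size = max_pool(conv_valid(size))
        shapes.append((size, size, channels))
    return shapes


shapes = trace_feature_maps()
print(shapes)  # [(74, 74, 32), (36, 36, 64), (17, 17, 128)]

height, width, channels = shapes[-1]
print(height * width * channels)  # 36992 values enter the Dense(512) layer
```

Under these assumptions, most of the network's parameters sit in the first dense layer, which is one reason the regularization options listed in this section are worth considering.
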
This is a **baseline model**, and can be extended with:
* **Data Augmentation** (rotation, zoom, shear, flips)
* **Transfer Learning** (e.g., VGG16, ResNet50, EfficientNet)
* **Regularization** (Dropout, L2 penalty)
* **Hyperparameter Optimization** (learning rate, batch size tuning)
---
## Training Performance
* **Epochs:** 10
* **Batch Size:** 32
* **Image Size:** 150×150 (RGB channels)
* **Optimizer:** Adam (adaptive learning rate)
* **Loss Function:** Categorical Crossentropy
* **Evaluation Metric:** Accuracy
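
For readers less familiar with the loss and metric named above, here is a small NumPy sketch of softmax, categorical cross-entropy, and accuracy on one-hot labels. It illustrates the definitions only; it is not how Keras implements them internally, and the function names are chosen for this example.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy between one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the label."""
    return float(np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1)))


# Illustrative scores for one image and three classes (not model output).
logits = np.array([[2.0, 0.5, 0.1]])
labels = np.array([[1.0, 0.0, 0.0]])
probs = softmax(logits)
print(categorical_crossentropy(labels, probs), accuracy(labels, probs))
```
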
Keras prints training and validation accuracy and loss after each epoch. These values can also be plotted as accuracy and loss curves.
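
One possible way to draw such curves is sketched below. It assumes the `History` object returned by `model.fit()` is available as `history` and that the model was compiled with `metrics=["accuracy"]`, so that the keys `accuracy`, `val_accuracy`, `loss`, and `val_loss` exist. The function name `plot_history` is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot training/validation accuracy and loss from a History.history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))
    epochs = range(1, len(history_dict["loss"]) + 1)

    ax_acc.plot(epochs, history_dict["accuracy"], label="training")
    ax_acc.plot(epochs, history_dict["val_accuracy"], label="validation")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()

    ax_loss.plot(epochs, history_dict["loss"], label="training")
    ax_loss.plot(epochs, history_dict["val_loss"], label="validation")
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Loss")
    ax_loss.legend()

    fig.tight_layout()
    return fig


# Usage, assuming `history = model.fit(...)` has already run:
# plot_history(history.history)
# plt.show()
```
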
---
## Inference Example
Using the custom `guess()` function:
```python
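from tensorflow.keras.models import load_model

# Load the model saved at the end of training
model = load_model("model5.h5")

# Predict on a new image. `guess()` and `train_generator` are defined in the
# training script, so they must already exist in the current session.
guess("example_image.jpg", model, train_generator.class_indices)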
```
Expected Output:
* The image is displayed.
* The title above the image indicates the **predicted lymphoma subtype**.
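
The step that turns the model's softmax output into the class name shown in the title can be sketched as follows. This is an assumption about what `guess()` does internally, not its actual code; `decode_prediction` is a hypothetical name, and the class names are the placeholders from the directory example.

```python
import numpy as np


def decode_prediction(probabilities, class_indices):
    """Return (class_name, confidence) for one softmax output vector.

    class_indices maps class name -> index, as produced by a Keras
    directory iterator; it is inverted here to look up the name.
    """
    index_to_class = {index: name for name, index in class_indices.items()}
    probabilities = np.asarray(probabilities).ravel()
    best = int(np.argmax(probabilities))
    return index_to_class[best], float(probabilities[best])


# Illustrative probabilities, not real model output.
label, confidence = decode_prediction(
    [0.1, 0.7, 0.2], {"Class_A": 0, "Class_B": 1, "Class_C": 2}
)
print(label, confidence)  # Class_B 0.7
```

With a trained model, the first argument would be one row of `model.predict(...)` for a preprocessed image batch.
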
---
## Applications
* **Medical Decision Support:** Assisting oncologists in rapid and preliminary diagnosis of lymphoma subtypes.
* **Research:** Benchmarking CNN performance on histopathological datasets.
* **Education:** Teaching medical students and engineers about AI applications in pathology.
⚠️ **Disclaimer:** This model is for **research and educational purposes only**. It is **not a substitute for professional medical diagnosis**. Clinical deployment requires extensive validation, regulatory approval, and rigorous testing.
---
## Future Improvements
1. Integrating **transfer learning** for improved accuracy.
2. Expanding dataset size and diversity.
3. Hyperparameter optimization with automated search tools.
4. Deploying as a web application (e.g., Flask, FastAPI, Streamlit).
5. Exporting to **TensorFlow Lite** or **ONNX** for mobile/edge deployment.
---
## Conclusion
This project demonstrates a reproducible CNN-based baseline for classifying lymphoma subtypes in a multi-cancer image dataset. It provides a **solid foundation** for further work in AI-driven oncology research.
By following the modular design of this repository, researchers can:
* Reproduce experiments
* Extend the architecture
* Adapt the pipeline for other cancer datasets
This repository bridges the gap between **machine learning engineering** and **medical research**, contributing towards a future where AI supports healthcare professionals in delivering faster, more accurate, and more reliable diagnoses.
---