---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- CernovaAI/CANetv1.2
new_version: CernovaAI/CANet-v1.3
pipeline_tag: image-classification
---
# Multi-Cancer Image Classification with CNN

## Project Overview

This project classifies cancer-related medical images using **Convolutional Neural Networks (CNNs)** implemented with **TensorFlow/Keras**. The images come from the `ALL` folder of the Multi Cancer dataset on Kaggle, and the model is trained with supervised learning to distinguish between the classes in that folder.

The CNN learns its features directly from the images, so no manual feature extraction is needed. The project demonstrates an end-to-end machine learning pipeline: data loading and preprocessing, model training, evaluation, saving, and prediction.

---
## Project Structure

```
├── Multi Cancer Dataset
│   ├── ALL
│   │   ├── Class_1
│   │   ├── Class_2
│   │   ├── ...
│
├── model5.h5                  # Trained CNN model saved in HDF5 format
├── cancer_classification.py   # Main training & prediction script
├── README.md                  # Project documentation (this file)
```

---
## Requirements

To run this project, you need the following dependencies:

* Python 3.8+
* TensorFlow 2.x
* NumPy
* Matplotlib
* Keras (bundled with TensorFlow as `tensorflow.keras`)
* Kaggle dataset access (if using a Kaggle Notebook)

You can install the dependencies using:

```bash
pip install tensorflow numpy matplotlib
```

---
## Data Preprocessing

The dataset is organized in **directory format**, where each folder represents a class label.

Example:

```
/ALL
    /Class_1
        image1.jpg
        image2.jpg
    /Class_2
        image1.jpg
        image2.jpg
```

Steps taken:

1. **Rescaling images**: Pixel values are normalized to the range [0, 1].
2. **Image resizing**: Every image is resized to **150x150** pixels to ensure a uniform input size.
3. **Generator configuration**: `ImageDataGenerator` is set up with:

   * `rescale=1./255`
   * `validation_split=0.1` (10% of data reserved for validation)

Rescaling keeps the inputs in a small, consistent numeric range for training, and the held-out validation data makes overfitting visible. No augmentation transforms (rotation, flip, zoom) are applied at this stage.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.1)
```
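
To make the two settings above concrete, here is a minimal plain-Python sketch of what they amount to. It assumes the split is taken per class folder, with file names sorted and the first 10% held out for validation, which is how `ImageDataGenerator` is understood to behave; it is an approximation, not Keras' actual code. The helper names `split_class_files` and `rescale` and the file names are hypothetical.

```python
def split_class_files(filenames, validation_split=0.1):
    """Split one class folder's files into (training, validation) lists.

    Assumption: the first `validation_split` fraction of the sorted
    files is held out for validation, the rest is used for training.
    """
    ordered = sorted(filenames)
    cutoff = int(validation_split * len(ordered))
    return ordered[cutoff:], ordered[:cutoff]


def rescale(pixels, factor=1.0 / 255):
    """Map 8-bit pixel values (0-255) into the range [0, 1]."""
    return [p * factor for p in pixels]


# Hypothetical class folder with 20 images.
files = [f"image{i:02d}.jpg" for i in range(20)]
train_files, val_files = split_class_files(files)
print(len(train_files), "training files,", len(val_files), "validation files")
print(rescale([0, 128, 255]))
```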

---

## Model Architecture

The model is a **Sequential CNN** consisting of:

1. **Conv2D + MaxPooling layers**:

   * Extract features from the images.
   * 3 convolutional layers with an increasing number of filters (32, 64, 128), each using 3x3 kernels.
   * Each is followed by max pooling to reduce spatial dimensions.

2. **Flatten layer**:

   * Converts the 2D feature maps into a 1D feature vector.

3. **Dense layers**:

   * Fully connected layers for learning global patterns.
   * A hidden layer with 512 neurons (ReLU activation).
   * Output layer with **softmax activation** for multi-class classification.

```python
from tensorflow import keras
from tensorflow.keras import layers

# `train_generator` is the training data iterator (not shown in this README);
# its `class_indices` dict determines the number of output classes.
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(len(train_generator.class_indices), activation='softmax')
])
```
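
As a rough check on the architecture, the sketch below traces how the 150x150 input shrinks through the three Conv2D + MaxPooling2D stages and how many weights the 512-unit dense layer ends up with. It assumes the Keras defaults used in the block above (unpadded 3x3 convolutions with stride 1, and 2x2 pooling with stride 2); the helper names are hypothetical. The result suggests that nearly all trainable parameters sit in the first dense layer.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling (stride == pool size)."""
    return size // pool


size = 150
for filters in (32, 64, 128):
    size = pool_output(conv_output(size))
    print(f"after Conv2D({filters}) + MaxPooling2D: {size}x{size}x{filters}")

flattened = size * size * 128          # length of the Flatten output
dense_params = flattened * 512 + 512   # weights plus biases of Dense(512)
print("flattened features:", flattened)
print("Dense(512) parameters:", dense_params)
```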

---

## Model Compilation & Training

* **Loss function:** Categorical crossentropy
* **Optimizer:** Adam
* **Metric:** Accuracy

```python
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```
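
For readers unfamiliar with this loss, the following plain-Python sketch shows how a softmax output and the categorical crossentropy for a single sample fit together. It is an illustration of the formula, not Keras' implementation; the four scores and the choice of four classes are hypothetical, since the number of classes depends on the dataset folder.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Loss for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Hypothetical scores for a 4-class problem; the true class is index 2.
probs = softmax([1.0, 0.5, 3.0, -1.0])
loss = categorical_crossentropy([0, 0, 1, 0], probs)
print([round(p, 3) for p in probs], round(loss, 3))
```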

The model is trained for **10 epochs**:

```python
# Keep the returned History object so the per-epoch metrics can be inspected later.
history = model.fit(train_generator,
                    validation_data=validation_generator,
                    epochs=10)
```

---

## Model Saving

After training, the model is saved in HDF5 (`.h5`) format:

```python
model.save("model5.h5")
```

The saved model can be reloaded later without retraining.

---
## Prediction Function

A custom `guess()` function makes predictions on new images.

Steps:

1. Load the image and resize it to **150x150**.
2. Normalize pixel values.
3. Predict with the trained CNN.
4. Map the prediction to a class label.
5. Display the image with the predicted class as its title.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img, img_to_array


def guess(image_path, model, class_indices):
    # Load the image at the model's input size and scale pixels to [0, 1].
    img = load_img(image_path, target_size=(150, 150))
    img_array = img_to_array(img) / 255.0
    img_array = np.expand_dims(img_array, axis=0)  # add the batch dimension

    # Pick the most probable class and map its index back to a label.
    prediction = model.predict(img_array)
    predicted_class = np.argmax(prediction)
    class_labels = {v: k for k, v in class_indices.items()}
    predicted_label = class_labels[predicted_class]

    # Show the image with the predicted label as the title.
    plt.imshow(img)
    plt.title(f"model_guess: {predicted_label}")
    plt.axis("off")
    plt.show()
```

Example usage:

```python
guess("test_image.jpg", model, train_generator.class_indices)
```
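
Steps 3 and 4 can be looked at in isolation. The sketch below, in plain Python, shows the index-to-label mapping that `guess()` performs and additionally reports the winning probability. The helper `decode_prediction`, the class mapping and the probabilities are hypothetical and are not part of the original script.

```python
def decode_prediction(probabilities, class_indices):
    """Return (label, confidence) for one softmax output vector."""
    # Invert {"label": index} into {index: "label"}, as guess() does.
    index_to_label = {index: label for label, index in class_indices.items()}
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index_to_label[best], probabilities[best]


# Hypothetical class mapping and model output, for illustration only.
class_indices = {"Class_1": 0, "Class_2": 1}
label, confidence = decode_prediction([0.12, 0.88], class_indices)
print(f"{label} ({confidence:.0%})")
```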

---

## Results & Evaluation

* Training and validation accuracy/loss are logged for every epoch and are available in the `History` object returned by `model.fit`.
* These values can be plotted with `matplotlib` to visualize performance trends.
* Example metrics:

  * Training accuracy: 90%+
  * Validation accuracy: 85-95% (depending on dataset balance)
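
Before plotting, it can help to summarize the logged values. The sketch below assumes the per-epoch metrics are held in a dict of lists with the keys `accuracy` and `val_accuracy`, which is the shape of `History.history` for the compile settings used here. The helper `summarize_history` is hypothetical and the numbers are made up for illustration; they are not results from this project.

```python
def summarize_history(history):
    """Report the best validation epoch and the train/validation gap there."""
    val_acc = history["val_accuracy"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best + 1,  # epochs are reported 1-based
        "val_accuracy": val_acc[best],
        "gap": history["accuracy"][best] - val_acc[best],
    }


# Made-up numbers in the shape of `History.history`; not real results.
example_history = {
    "accuracy": [0.60, 0.75, 0.85, 0.90],
    "val_accuracy": [0.58, 0.72, 0.80, 0.78],
}
print(summarize_history(example_history))
```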

---

## Possible Improvements

* Apply **data augmentation** (rotation, flip, zoom) to generalize better.
* Use **transfer learning** (e.g., ResNet50, EfficientNet, VGG16) for higher accuracy.
* Implement **early stopping and checkpointing** to avoid overfitting.
* Train for more **epochs** and tune the learning rate.
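
To illustrate the early-stopping idea from the list above, here is a minimal plain-Python sketch of the patience rule: training stops once the validation loss has not improved for a set number of consecutive epochs. Keras offers this as the `EarlyStopping` callback (and `ModelCheckpoint` for saving the best weights); the function below is only a hypothetical stand-in for the logic, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, waited = loss, 0  # improvement: reset the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses: the best value is at epoch 3 and the next
# two epochs are worse, so with patience=2 training stops at epoch 5.
print(early_stopping_epoch([0.90, 0.70, 0.60, 0.65, 0.66, 0.50]))
```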

---

## References

* TensorFlow Documentation: [https://www.tensorflow.org/](https://www.tensorflow.org/)
* Keras Image Classification Guide: [https://keras.io/examples/vision/](https://keras.io/examples/vision/)
* Kaggle Multi-Cancer Dataset

---

## Author

This project was developed as part of a **medical image classification study** using deep learning. It can be extended to other cancer types or generalized to other medical imaging problems such as X-ray, MRI, or CT scan analysis.

---

**In summary:**
This project demonstrates how to build a **deep learning pipeline** for medical image classification with CNNs, using TensorFlow/Keras. It covers everything from **data preprocessing** to **model training, saving, and prediction visualization**.

---