---
license: mit
language:
- en
metrics:
- accuracy
base_model:
- CernovaAI/CANetv1.2
new_version: CernovaAI/CANet-v1.3
pipeline_tag: image-classification
---







# 🧬 Multi-Cancer Image Classification with CNN

## 📌 Project Overview

This project classifies cancer-related medical images using **Convolutional Neural Networks (CNNs)** implemented with **TensorFlow/Keras**. The images come from the `ALL` folder of the Multi Cancer dataset on Kaggle, and the model is trained with supervised learning to distinguish the classes (subfolders) within that folder.

Deep learning techniques, specifically **CNN architectures**, are applied to process and classify images automatically without manual feature extraction. This project demonstrates an end-to-end machine learning pipeline from data loading and preprocessing to model training, evaluation, saving, and prediction.

---

## 📂 Project Structure

```
├── Multi Cancer Dataset
│   ├── ALL
│   │   ├── Class_1
│   │   ├── Class_2
│   │   ├── ...
│
├── model5.h5                 # Trained CNN model saved in HDF5 format
├── cancer_classification.py  # Main training & prediction script
├── README.md                 # Project documentation (this file)
```

---

## ⚙️ Requirements

To run this project, you need the following dependencies:

* Python 3.8+
* TensorFlow 2.x
* NumPy
* Matplotlib
* Keras (integrated within TensorFlow)
* Pillow (used by Keras to load image files)
* Kaggle Dataset Access (if using Kaggle Notebook)

You can install the dependencies using:

```bash
pip install tensorflow numpy matplotlib pillow
```

---

## 🧩 Data Preprocessing

The dataset is organized in **directory format** where each folder represents a class label.

Example:

```
/ALL
    /Class_1
        image1.jpg
        image2.jpg
    /Class_2
        image1.jpg
        image2.jpg
```
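Keras's `flow_from_directory` derives the label mapping (`class_indices`) from the subfolder names, one class per subfolder, with indices assigned in alphanumeric order. The standard-library sketch below reproduces that mapping and counts images per class, which can be handy for checking labels and class balance before training. The helper names, the extension list, and the temporary folders are hypothetical, chosen only for illustration.

```python
from pathlib import Path
import tempfile


def infer_class_indices(root):
    """Map each class subfolder of `root` to an integer index.

    Mirrors the documented flow_from_directory default: one class per
    subdirectory, indices assigned in sorted (alphanumeric) order.
    """
    class_names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(class_names)}


def count_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Count image files per class folder (assumed extension list)."""
    counts = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1 for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in extensions
        )
    return counts


# Usage with a throwaway directory that mimics the /ALL layout
with tempfile.TemporaryDirectory() as tmp:
    for cls, n in [("Class_2", 3), ("Class_1", 2)]:
        (Path(tmp) / cls).mkdir()
        for i in range(n):
            (Path(tmp) / cls / f"image{i}.jpg").touch()
    class_indices = infer_class_indices(tmp)
    image_counts = count_images(tmp)

print(class_indices)  # {'Class_1': 0, 'Class_2': 1}
print(image_counts)   # {'Class_1': 2, 'Class_2': 3}
```

Because the ordering is lexicographic, a folder named `Class_10` would sort before `Class_2`.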

Steps taken:

1. **Rescaling Images**: pixel values are divided by 255, mapping them from [0, 255] to the range [0, 1].
2. **Image Resizing**: every image is resized to **150x150** pixels when it is loaded, so all inputs have the same shape.
3. **Train/Validation Split**: configured, together with the rescaling, via `ImageDataGenerator` with:

   * `rescale=1./255`
   * `validation_split=0.1` (10% of the data reserved for validation)

No data augmentation (rotation, flips, zoom) is applied; the validation subset is used to monitor overfitting rather than to prevent it.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.1)
```
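With a generator configured this way, the `train_generator` and `validation_generator` used in later sections would typically be created by calling `train_datagen.flow_from_directory(...)` once with `subset="training"` and once with `subset="validation"` (plus `target_size=(150, 150)`); the exact calls in the original script are not shown. Keras appears to apply `validation_split` inside each class folder, giving the first `int(validation_split * n)` files to validation. The sketch below reproduces that arithmetic so you can anticipate subset sizes; treat the rounding rule as an assumption and compare it with the "Found N images belonging to K classes" lines Keras prints. The per-class counts are hypothetical.

```python
def split_counts(num_images, validation_split=0.1):
    """Return (training, validation) counts for one class folder.

    Assumed rule: the validation subset takes int(validation_split * n)
    files from the folder and the training subset gets the rest.
    """
    num_validation = int(validation_split * num_images)
    return num_images - num_validation, num_validation


# Hypothetical per-class image counts, for illustration only
per_class = {"Class_1": 30, "Class_2": 25, "Class_3": 7}
splits = {name: split_counts(n) for name, n in per_class.items()}
print(splits)  # {'Class_1': (27, 3), 'Class_2': (23, 2), 'Class_3': (7, 0)}
```

Under this rule a class with fewer than 10 images contributes no validation images at all, which matters if some folders are small.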

---

## 🏗️ Model Architecture

The model is a **Sequential CNN** consisting of:

1. **Conv2D + MaxPooling Layers**:

   * Extract features from the images.
   * 3 convolutional layers with an increasing number of filters (32, 64, 128), each using a 3x3 kernel.
   * Each followed by 2x2 max pooling, which halves the spatial dimensions.

2. **Flatten Layer**:

   * Converts 2D feature maps into 1D feature vectors.

3. **Dense Layers**:

   * Fully connected layers for learning global patterns.
   * A hidden layer with 512 neurons (ReLU activation).
   * Output layer with **softmax activation** for multi-class classification.

```python
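from tensorflow import keras
from tensorflow.keras import layers

# One output unit per class folder found by the training generator
num_classes = len(train_generator.class_indices)

model = keras.Sequential([
    # Feature extractor: three Conv2D + 2x2 MaxPooling blocks
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    # Classifier head
    layers.Flatten(),
    layers.Dense(512, activation='relu'),
    layers.Dense(num_classes, activation='softmax')
])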
```
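To see where the model's capacity sits, the sketch below walks a 150x150x3 input through the layers above, assuming Keras defaults (`padding='valid'` and stride 1 for `Conv2D`, pool size and stride 2 for `MaxPooling2D(2, 2)`). The number of classes depends on the dataset folders, so it is a parameter; `num_classes=3` in the usage line is purely illustrative.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after 'valid' max pooling with stride equal to the pool size."""
    return (size - pool) // pool + 1


def summarize(num_classes, input_size=150, channels=3, filters=(32, 64, 128), hidden=512):
    size, in_ch, params = input_size, channels, []
    for out_ch in filters:
        size = pool_out(conv_out(size))
        # 3x3 weights for every (input, output) channel pair, plus one bias per filter
        params.append(3 * 3 * in_ch * out_ch + out_ch)
        in_ch = out_ch
    flat = size * size * in_ch                         # length of the Flatten output
    params.append(flat * hidden + hidden)              # Dense(512)
    params.append(hidden * num_classes + num_classes)  # softmax output layer
    return size, flat, params


size, flat, params = summarize(num_classes=3)
print(size, flat)   # 17 36992
print(params)       # [896, 18496, 73856, 18940416, 1539]
print(sum(params))  # 19035203
```

Under these assumptions the `Dense(512)` layer holds about 99.5% of the roughly 19 million parameters; `model.summary()` should confirm the per-layer counts.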

---

## ⚡ Model Compilation & Training

* **Loss Function:** Categorical Crossentropy
* **Optimizer:** Adam
* **Metric:** Accuracy

```python
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```
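Categorical crossentropy expects one-hot labels, which is what `flow_from_directory` yields with its default `class_mode='categorical'`. For one image the loss is the negative log of the probability the softmax assigns to the true class, and accuracy checks whether the most probable class is the true one. The plain-Python sketch below computes both for a single made-up prediction; Keras additionally averages over the batch and clips probabilities to avoid `log(0)`, which the `eps` clipping here only approximates.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """-sum(y * log(p)) for one sample, with p clipped to [eps, 1 - eps]."""
    return -sum(t * math.log(min(max(p, eps), 1 - eps)) for t, p in zip(y_true, y_pred))


def is_correct(y_true, y_pred):
    """Accuracy for one sample: does argmax(prediction) match argmax(label)?"""
    return max(range(len(y_pred)), key=y_pred.__getitem__) == y_true.index(1)


y_true = [0, 1, 0]        # one-hot label: the image belongs to the second class
y_pred = [0.2, 0.7, 0.1]  # illustrative softmax output
loss = categorical_crossentropy(y_true, y_pred)
print(round(loss, 4))              # 0.3567
print(is_correct(y_true, y_pred))  # True
```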

The model is trained for **10 epochs**:

```python
# Keep the History object; history.history holds per-epoch loss/accuracy
history = model.fit(train_generator,
                    validation_data=validation_generator,
                    epochs=10)
```
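Each epoch is one pass over the training subset in mini-batches. `flow_from_directory` uses `batch_size=32` unless told otherwise (whether the original script overrides it is not shown), and the number of steps per epoch is the training image count divided by the batch size, rounded up. The sketch below does that arithmetic; the image count is hypothetical.

```python
import math


def training_schedule(num_train_images, batch_size=32, epochs=10):
    """Return (steps per epoch, total gradient updates) for a generator-based fit."""
    steps_per_epoch = math.ceil(num_train_images / batch_size)  # last batch may be smaller
    return steps_per_epoch, steps_per_epoch * epochs


steps, updates = training_schedule(num_train_images=900)  # hypothetical dataset size
print(steps, updates)  # 29 290
```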

---

## 💾 Model Saving

After training, the model is saved in `.h5` format:

```python
model.save("model5.h5")
```

This allows reusing the model later without retraining.
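Reloading the file (for example with `keras.models.load_model("model5.h5")`) restores the network but not `train_generator.class_indices`, which `guess()` needs to turn a predicted index back into a label. One option, sketched below with the standard library, is to store that mapping next to the model as JSON. The file name `class_indices.json`, the helper names, and the example mapping are hypothetical.

```python
import json
import os
import tempfile


def save_class_indices(class_indices, path):
    """Write the {label: index} mapping produced by flow_from_directory to JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(class_indices, f, indent=2)


def load_class_indices(path):
    """Read the mapping back; string keys and int values survive the round trip."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Made-up mapping; in the script this would be train_generator.class_indices
original = {"Class_1": 0, "Class_2": 1}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "class_indices.json")
    save_class_indices(original, path)
    restored = load_class_indices(path)

print(restored == original)  # True
```

After training you could call `save_class_indices(train_generator.class_indices, "class_indices.json")`, and in a later session pass `load_class_indices("class_indices.json")` as the third argument of `guess()`.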

---

## 🔮 Prediction Function

A custom `guess()` function is provided to make predictions on new images:

Steps:

1. Load and resize image to **150x150**.
2. Normalize pixel values to [0, 1], matching the rescaling used in training.
3. Predict with the trained CNN.
4. Map prediction to class label.
5. Display image with predicted class title.

```python
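import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img, img_to_array

def guess(image_path, model, class_indices):
    # Load and resize to the 150x150 input size the model was trained on
    img = load_img(image_path, target_size=(150, 150))
    # Apply the same 1/255 rescaling used by ImageDataGenerator during training
    img_array = img_to_array(img) / 255.0
    # Add a batch dimension: (150, 150, 3) -> (1, 150, 150, 3)
    img_array = np.expand_dims(img_array, axis=0)

    prediction = model.predict(img_array)
    # Index of the most probable class for the single image in the batch
    predicted_class = int(np.argmax(prediction[0]))
    # Invert {label: index} into {index: label}
    class_labels = {v: k for k, v in class_indices.items()}
    predicted_label = class_labels[predicted_class]

    plt.imshow(img)
    plt.title(f"model_guess: {predicted_label}")
    plt.axis("off")
    plt.show()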
```

Example usage:

```python
guess("test_image.jpg", model, train_generator.class_indices)
```
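The label lookup inside `guess()` can be separated from image loading and plotting, which makes it testable without a trained model. The NumPy sketch below (the helper `top_k_labels` is hypothetical) inverts `class_indices` the same way `guess()` does and also reports the probabilities of the k most likely classes, which can be more informative than a single label for borderline images. The class names and probabilities are made up.

```python
import numpy as np


def top_k_labels(prediction, class_indices, k=2):
    """Return [(label, probability), ...] for the k most probable classes.

    `prediction` is the (1, num_classes) array model.predict returns for
    one image; `class_indices` maps label -> index.
    """
    index_to_label = {index: label for label, index in class_indices.items()}
    probs = np.asarray(prediction)[0]
    best = np.argsort(probs)[::-1][:k]  # indices in descending probability order
    return [(index_to_label[int(i)], float(probs[i])) for i in best]


# Made-up softmax output for three hypothetical classes
class_indices = {"Class_1": 0, "Class_2": 1, "Class_3": 2}
prediction = np.array([[0.15, 0.25, 0.60]])
print(top_k_labels(prediction, class_indices))
# [('Class_3', 0.6), ('Class_2', 0.25)]
```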

---

## 📊 Results & Evaluation

* Keras records the training and validation accuracy/loss for every epoch in the `History` object returned by `model.fit()`.
* These values can be plotted with `matplotlib` to visualize performance trends.
* Example metrics:

  * Training Accuracy ≈ 90%+
  * Validation Accuracy ≈ 85-95% (depending on dataset balance)
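
`model.fit()` returns a `History` object whose `history` attribute is a dict of per-epoch lists; with `metrics=['accuracy']` and validation data, its keys are `loss`, `accuracy`, `val_loss`, and `val_accuracy`. The standard-library sketch below (hypothetical helper `summarize_history`) finds the epoch with the best validation accuracy and the final train/validation accuracy gap, a rough overfitting signal. The numbers in the usage example are placeholders, not results from this project.

```python
def summarize_history(history, monitor="val_accuracy"):
    """Summarize a Keras History.history dict.

    Returns (best_epoch, best_value, final_gap): best_epoch is 1-based and
    final_gap is the last training accuracy minus the last validation accuracy.
    """
    values = history[monitor]
    best_index = max(range(len(values)), key=values.__getitem__)
    final_gap = history["accuracy"][-1] - history["val_accuracy"][-1]
    return best_index + 1, values[best_index], final_gap


# Placeholder values shaped like history.history; not real training results
dummy_history = {
    "accuracy":     [0.60, 0.75, 0.85, 0.92],
    "val_accuracy": [0.58, 0.72, 0.80, 0.78],
    "loss":         [1.00, 0.70, 0.45, 0.25],
    "val_loss":     [1.05, 0.80, 0.60, 0.65],
}
best_epoch, best_val, gap = summarize_history(dummy_history)
print(best_epoch, best_val, round(gap, 2))  # 3 0.8 0.14
```

In the script, assign the return value of `model.fit(...)` to a variable such as `history` and call `summarize_history(history.history)`.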

---

## 🚀 Possible Improvements

* Apply **data augmentation** (rotation, flip, zoom) to generalize better.
* Use **Transfer Learning** (e.g., ResNet50, EfficientNet, VGG16) for higher accuracy.
* Implement **early stopping & checkpointing** to avoid overfitting.
* Increase **epochs** and adjust learning rates for fine-tuning.
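
In Keras, the first suggestion would usually mean passing options such as `rotation_range`, `horizontal_flip=True`, or `zoom_range` to `ImageDataGenerator`. With a single generator and `validation_split`, the validation subset would be augmented too; one common workaround is a second, rescale-only `ImageDataGenerator` with the same `validation_split`, used only for `subset="validation"`. To show what such transforms do to an image array, the NumPy sketch below applies a random horizontal flip and a random rotation by a multiple of 90°. It is a simplified stand-in, not the Keras implementation; the helper `random_flip_rotate` is hypothetical and assumes square images such as the 150x150 inputs used here.

```python
import numpy as np


def random_flip_rotate(image, rng):
    """Randomly flip left-right and rotate by 0/90/180/270 degrees.

    `image` is an (H, W, C) array; for H == W the shape is unchanged.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1, :]  # horizontal flip (mirror the width axis)
    quarter_turns = int(rng.integers(0, 4))
    return np.rot90(image, k=quarter_turns, axes=(0, 1))  # rotate in the H-W plane


rng = np.random.default_rng(seed=0)
image = rng.random((150, 150, 3))  # stand-in for a rescaled 150x150 RGB image
augmented = random_flip_rotate(image, rng)
print(augmented.shape)  # (150, 150, 3)
```

Flips and right-angle rotations only rearrange pixels, so the image content and intensity distribution are preserved while the orientation varies.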

---

## 📖 References

* TensorFlow Documentation: [https://www.tensorflow.org/](https://www.tensorflow.org/)
* Keras Image Classification Guide: [https://keras.io/examples/vision/](https://keras.io/examples/vision/)
* Kaggle Multi-Cancer Dataset

---

## 👨‍💻 Author

This project was developed as part of a **medical image classification study** using deep learning. It can be extended to other cancer types or generalized to different medical imaging problems such as X-ray, MRI, or CT scan analysis.

---

⚡ **In summary:**
This project demonstrates how to build a **deep learning pipeline** for medical image classification with CNNs, using TensorFlow/Keras. It covers everything from **data preprocessing** to **model training, saving, and prediction visualization**.

---