---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
- roc_auc
pipeline_tag: tabular-classification
library_name: sklearn
tags:
- medical
---

### Model Description

<!-- Provide a longer summary of what this model is. -->
This model predicts whether a patient being screened has a cancerous tumor, based on features such as radius, texture, and smoothness (listed under Model Inputs). It takes 30 input features and is designed for form-based applications, i.e. software applications that collect user input.

NOTE: This model is an assistive tool and must NOT be used directly to produce the final verdict on a patient's condition. Its predictions are meant to prompt further evaluation.


- **Developed by:** DeepNeural
- **Model type:** Tabular Classifier
- **Language(s):** English
- **License:** MIT

### Model Inputs
| Variable Name      | Type       | Description                                                         |
|--------------------|------------|---------------------------------------------------------------------|
| radius1            | Continuous | radius (mean of distances from center to points on the perimeter)   |
| texture1           | Continuous | texture (standard deviation of gray-scale values)                   |
| perimeter1         | Continuous | perimeter                                                           |
| area1              | Continuous | area                                                                |
| smoothness1        | Continuous | smoothness (local variation in radius lengths)                      |
| compactness1       | Continuous | compactness (perimeter^2 / area - 1.0)                              |
| concavity1         | Continuous | concavity (severity of concave portions of the contour)             |
| concave_points1    | Continuous | concave points (number of concave portions of the contour)          |
| symmetry1          | Continuous | symmetry                                                            |
| fractal_dimension1 | Continuous | fractal dimension ("coastline approximation" - 1)                   |
| radius2            | Continuous | radius (standard error)                                             |
| texture2           | Continuous | texture (standard error)                                            |
| perimeter2         | Continuous | perimeter (standard error)                                          |
| area2              | Continuous | area (standard error)                                               |
| smoothness2        | Continuous | smoothness (standard error)                                         |
| compactness2       | Continuous | compactness (standard error)                                        |
| concavity2         | Continuous | concavity (standard error)                                          |
| concave_points2    | Continuous | concave points (standard error)                                     |
| symmetry2          | Continuous | symmetry (standard error)                                           |
| fractal_dimension2 | Continuous | fractal dimension (standard error)                                  |
| radius3            | Continuous | radius (worst, i.e. mean of the three largest values)               |
| texture3           | Continuous | texture (worst)                                                     |
| perimeter3         | Continuous | perimeter (worst)                                                   |
| area3              | Continuous | area (worst)                                                        |
| smoothness3        | Continuous | smoothness (worst)                                                  |
| compactness3       | Continuous | compactness (worst)                                                 |
| concavity3         | Continuous | concavity (worst)                                                   |
| concave_points3    | Continuous | concave points (worst)                                              |
| symmetry3          | Continuous | symmetry (worst)                                                    |
| fractal_dimension3 | Continuous | fractal dimension (worst)                                           |
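
The 30 variable names follow a regular pattern: ten base measurements, each repeated with the suffixes 1, 2, and 3. As a minimal sketch, the list can be generated rather than typed by hand. The names `BASE_FEATURES` and `feature_names` are illustrative, and the column order is assumed to match the table above.

```python
# Base measurements, in the order they appear in the table above.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]


def feature_names():
    """Return the 30 column names: all base features with suffix 1, then 2, then 3."""
    return [f"{base}{suffix}" for suffix in (1, 2, 3) for base in BASE_FEATURES]


FEATURE_NAMES = feature_names()
print(len(FEATURE_NAMES))                    # number of input features
print(FEATURE_NAMES[0], FEATURE_NAMES[-1])   # first and last column names
```

A list like this can be reused to label form fields and to order the columns of the input passed to the model.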


### Model Sources 

<!-- Provide the basic links for the model. -->

- **Repository:** https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
This model is primarily intended for data scientists, software engineers, and machine learning engineers who are developing predictive breast cancer software applications for healthcare institutions, from hospitals to clinics. It is also intended for educational use in academia, in studies where breast cancer risk analysis is a priority.

Foreseeable users of applications built with this model include doctors and nurses, acting on behalf of their patients.


## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Please be advised that this model was trained on a single dataset for breast cancer prediction. Although it achieves high accuracy and precision on that dataset, misclassifications can still occur.


### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More research is needed for further recommendations.
The model will continuously undergo improvements and testing to address the limitations mentioned in the previous
section. It is further advised that this model be used as an assistive tool in diagnostic procedures.

## How to Get Started with the Model
To use this model, refer to the steps below, which show how to load it directly into an application.
Because the model was built with the scikit-learn machine learning library, it has been saved
as a .joblib file. Copy the following code into your Python environment.

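   1. Install Joblib and scikit-learn
      ```python
      # Notebook syntax; from a terminal, run the same command without the leading "!".
      # scikit-learn is also required, because the saved model is a scikit-learn estimator.
      !pip install joblib scikit-learn
      ```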

   2. Load the model after installation
      ```python
      import joblib

      # Load the trained classifier from the .joblib file
      my_model = joblib.load('breast_cancer_classifier_model_v1.joblib')
      ```
    
   3. Make predictions (binary or probability)
      ```python
      # X_test: 2-dimensional input (e.g. a pandas DataFrame) with the 30 feature columns
      my_model.predict(X_test)

      # For probability-based outputs
      my_model.predict_proba(X_test)
      ```
    
   NOTE: This model requires input data in a 2-dimensional format (for example a pandas DataFrame, not a Series) with the column names
   listed under Model Inputs, since the model is intended for form-based applications.
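
As a minimal sketch of the form-based workflow, the code below validates one form submission and turns it into a single row in the expected column order. The helper names (`form_to_row`, `row_to_frame`) and the example form are hypothetical, the values are placeholders rather than real measurements, and the column order is assumed to match the Model Inputs table.

```python
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# 30 column names: suffix 1, then 2, then 3 (assumed to match the Model Inputs table)
FEATURE_NAMES = [f"{base}{suffix}" for suffix in (1, 2, 3) for base in BASE_FEATURES]


def form_to_row(form):
    """Validate a form submission (field name -> value) and return floats in column order."""
    missing = [name for name in FEATURE_NAMES if name not in form]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [float(form[name]) for name in FEATURE_NAMES]


def row_to_frame(row):
    """Wrap one row in a single-row, 2-dimensional pandas DataFrame with named columns."""
    import pandas as pd  # imported here so the validation step works without pandas
    return pd.DataFrame([row], columns=FEATURE_NAMES)


# Hypothetical form submission: every field holds a placeholder value, not real data
example_form = {name: "1.0" for name in FEATURE_NAMES}
row = form_to_row(example_form)
print(len(row))  # one value per input feature
```

With a loaded model, the row would then be passed on as `my_model.predict(row_to_frame(row))`.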


#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->
We evaluated several machine learning models on the dataset, namely Logistic Regression, Stochastic Gradient Descent (SGD),
and Support Vector Machines. After hyperparameter tuning of the Logistic Regression model, we selected
it for our metric calculations. The metrics used were accuracy, precision, recall, F1-score, and AUC.
The results for our model can be seen in the 'Results' section.
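
For reference, the sketch below shows how accuracy, precision, recall, and F1-score follow from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the model's actual results. AUC is omitted because it needs predicted scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting, recall is the metric that reflects how many actual positive cases are caught, while precision reflects how many flagged cases are truly positive.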

### Results (best and final scores after addressing class imbalance)

- Accuracy: 94%
- Precision: 100%
- Recall: 84%
- AUC: 92%
- F1-Score: 91%
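
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. This is a sketch using the rounded percentages above, not a re-evaluation of the model.

```python
precision = 1.00  # reported precision (100%)
recall = 0.84     # reported recall (84%)

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 0.91, in line with the reported F1-score
```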

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** T4 (for training)
- **Hours used:** < 20hr
- **Cloud Provider:** Google Cloud
- **Compute Region:** Europe
- **Carbon Emitted:** 1.02