# Logistic Regression  

References:

* https://www.kdnuggets.com/2020/01/guide-precision-recall-confusion-matrix.html
* https://developers.google.com/machine-learning/crash-course/classification/thresholding
* https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
* https://developers.google.com/machine-learning/crash-course/logistic-regression/model-training

### Thresholding 

A logistic regression model that returns 0.9995 for a particular email message is predicting that it is very likely to be spam. Conversely, another email message with a prediction score of 0.0003 on that same logistic regression model is very likely not spam. However, what about an email message with a prediction score of 0.6? In order to map a logistic regression value to a binary category, you must define a classification threshold (also called the decision threshold). A value above that threshold indicates "spam"; a value below indicates "not spam." It is tempting to assume that the classification threshold should always be 0.5, but thresholds are problem-dependent, and are therefore values that you must tune.
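The thresholding step fits in a few lines of Python. This is a minimal sketch, not a library API. The helper name `classify` is hypothetical, and treating a score exactly equal to the threshold as "not spam" is an assumption, since the text only covers values above or below it.

```python
def classify(score: float, threshold: float = 0.5) -> str:
    """Map a logistic regression probability to a binary label.

    Assumption: only scores strictly above the threshold count as "spam".
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return "spam" if score > threshold else "not spam"


scores = [0.9995, 0.0003, 0.6]

# With a 0.5 threshold, the ambiguous 0.6 email is labeled spam.
print([classify(s) for s in scores])                 # ['spam', 'not spam', 'spam']
# Raising the threshold to 0.7 flips the 0.6 email to "not spam".
print([classify(s, threshold=0.7) for s in scores])  # ['spam', 'not spam', 'not spam']
```

Only the 0.6 email changes label between the two thresholds. This is why the threshold is a value you tune for the problem rather than a fixed constant.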

### Accuracy 

$
Accuracy = \frac{\text{Total correct predictions}}{\text{Total predictions}}
$

Using the confusion matrix values:

$
Accuracy = \frac{TP + TN}{TP + FP + TN + FN}
$
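The confusion-matrix form of accuracy can be computed directly from labels and predictions. Note that (TP + TN) is divided by the whole total, so the parentheses matter. This is a minimal sketch that assumes 0/1 labels with 1 as the positive class. `confusion_counts` and `accuracy` are hypothetical helper names.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for equal-length lists of 0/1 labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1      # predicted positive, actually positive
        elif p == 1 and t == 0:
            fp += 1      # predicted positive, actually negative
        elif p == 0 and t == 0:
            tn += 1      # predicted negative, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    # (TP + TN) / (TP + FP + TN + FN)
    return (tp + tn) / (tp + fp + tn + fn)


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)             # (2, 1, 2, 1)
print(accuracy(*counts))  # 4 / 6, about 0.667
```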

Accuracy alone doesn't tell the full story when you're working with a **class-imbalanced data set**, where there is a significant disparity between the number of positive and negative labels.
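A quick way to see the problem: on a heavily imbalanced data set, a "model" that always predicts the majority class still scores high accuracy. The 990/10 split below is a hypothetical example, not data from these notes.

```python
# Hypothetical imbalanced data set: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10

# A useless "model" that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
positives_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))

print(f"accuracy = {accuracy:.3f}")            # accuracy = 0.990
print(f"positives found = {positives_found}")  # positives found = 0
```

The model reaches 99% accuracy while missing every positive example. Precision and recall expose this failure, and accuracy hides it.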

### Precision 

Precision, also called positive predictive value, is the ratio of correct positive predictions to the *total predicted positives*.

$
Precision = \frac{TP}{TP + FP}
$
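The precision formula translates directly into code. This is a minimal sketch. The function name is hypothetical, and returning 0.0 when TP + FP = 0 (where the formula is undefined) is an assumed convention.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): the fraction of predicted positives that are correct."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # 0/0 is undefined; returning 0.0 here is an assumed convention.
        return 0.0
    return tp / predicted_positives


# 8 true positives and 2 false positives among 10 positive predictions:
print(precision(tp=8, fp=2))  # 0.8
```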


### Recall 

Recall, also called sensitivity, probability of detection, or true positive rate, is the ratio of correct positive predictions to the *total actual positive examples*.

$
Recall = \frac{TP}{TP + FN}
$
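Recall follows the same pattern but divides by the actual positives instead of the predicted ones. This is a minimal sketch. The function name is hypothetical, and the 0.0 fallback when TP + FN = 0 is an assumed convention.

```python
def recall(tp: int, fn: int) -> float:
    """TP / (TP + FN): the fraction of actual positives the model found."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # 0/0 is undefined; returning 0.0 here is an assumed convention.
        return 0.0
    return tp / actual_positives


# A model that finds 6 of 10 actual positives:
print(recall(tp=6, fn=4))   # 0.6
# Predicting every example as positive guarantees FN = 0, so recall is 1.0
# (precision usually drops instead).
print(recall(tp=10, fn=0))  # 1.0
```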

### ROC & AUC 

* ROC curves summarize the trade-off between the true positive rate and the false positive rate for a predictive model across different probability thresholds. AUC (area under the ROC curve) condenses the whole curve into one number: 1.0 means every positive is ranked above every negative, and 0.5 is no better than random guessing.

* Precision-recall curves summarize the trade-off between the true positive rate (recall) and the positive predictive value (precision) for a predictive model across different probability thresholds.

* ROC curves are appropriate when the observations are roughly balanced between the classes, whereas precision-recall curves are more appropriate for imbalanced datasets.
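Both curves come from the same threshold sweep: at each threshold you recompute TPR, FPR, and precision. The from-scratch sketch below is for illustration only and rests on several assumptions: toy labels and scores, the rule "score >= threshold means positive", precision set to 1.0 when nothing is predicted positive, and AUC computed with the trapezoidal rule. In practice you would call a library function instead.

```python
def curve_points(y_true, scores, thresholds):
    """Return (threshold, fpr, tpr, precision) per threshold.

    Assumes y_true contains both classes (0 and 1).
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = []
    for thr in thresholds:
        preds = [1 if s >= thr else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        tpr = tp / positives  # true positive rate = recall
        fpr = fp / negatives  # false positive rate
        # Nothing predicted positive: precision is 0/0, so use 1.0 by convention.
        prec = tp / (tp + fp) if tp + fp > 0 else 1.0
        points.append((thr, fpr, tpr, prec))
    return points


def trapezoid_auc(xs, ys):
    """Area under a curve given as (x, y) points, using the trapezoidal rule."""
    pts = sorted(zip(xs, ys))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
# Start above the highest score so the ROC curve begins at (0, 0).
thresholds = [float("inf")] + sorted(set(scores), reverse=True)

points = curve_points(y_true, scores, thresholds)
for thr, fpr, tpr, prec in points:
    print(f"threshold={thr:<5} FPR={fpr:.2f} TPR/recall={tpr:.2f} precision={prec:.2f}")

roc_auc = trapezoid_auc([p[1] for p in points], [p[2] for p in points])
print("ROC AUC:", roc_auc)  # ROC AUC: 0.75
```

Plotting (FPR, TPR) pairs gives the ROC curve, and plotting (recall, precision) pairs gives the precision-recall curve. Both curves come from the same loop.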


### sklearn functions

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_curve.html
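A minimal end-to-end sketch that ties the two linked pages together. Several choices here do not come from these notes: the synthetic data set from `make_classification`, the default `LogisticRegression` settings, and the custom threshold of 0.3. For a binary problem, `model.predict` effectively applies a 0.5 probability threshold, so thresholding `predict_proba` yourself is how you tune it.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (about 90% negatives), purely for illustration.
X, y = make_classification(n_samples=1000, n_features=5, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Probability of the positive class (column 1 of predict_proba).
probs = model.predict_proba(X_test)[:, 1]

# Precision/recall at every candidate threshold, plus a single ROC AUC number.
precision, recall, thresholds = precision_recall_curve(y_test, probs)
auc = roc_auc_score(y_test, probs)
print("ROC AUC:", auc)

# Apply a custom (hypothetical) threshold instead of the implicit 0.5.
custom_threshold = 0.3
y_pred = (probs >= custom_threshold).astype(int)
print("precision:", precision_score(y_test, y_pred))
print("recall:", recall_score(y_test, y_pred))
```

Lowering the threshold below 0.5 generally trades some precision for higher recall. The `precision` and `recall` arrays show that trade-off at every threshold.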


### What about using XGBoost for classification? 

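We can still use XGBoost, but logistic regression is a linear model and XGBoost is *not*: logistic regression separates classes with linear decision boundaries, while XGBoost's boosted decision trees can fit non-linear ones.

For example, in the scikit-learn example below, logistic regression draws linear boundaries between the classes of the iris dataset.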

https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html#sphx-glr-auto-examples-linear-model-plot-iris-logistic-py 
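The form of the model shows why the boundaries are linear. Assume standard binary logistic regression with a weight vector $w$ and bias $b$ (symbols introduced here for illustration). The predicted probability is a sigmoid of a linear score:

$
p(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
$

Because $\sigma$ is monotonically increasing and $\sigma(0) = 0.5$, thresholding at 0.5 is the same as checking the sign of the linear score:

$
p(y = 1 \mid x) \ge 0.5 \iff w^\top x + b \ge 0
$

The decision boundary $w^\top x + b = 0$ is therefore a hyperplane, which is a straight line when there are two features. In the multiclass iris case, the boundary between two classes is where their linear scores are equal, and that boundary is still linear. A tree ensemble splits on individual feature thresholds, so it can carve out non-linear regions instead.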


### What about the difference between SVM and logistic regression?


### Logistic Regression Loss function 

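**This always trips me up because people call it *log loss*, (binary) cross-entropy, or something else!** (The *logit*, by contrast, is not a loss: it is the raw log-odds score the model outputs before the sigmoid is applied.)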

The loss function for linear regression is squared loss. The loss function for logistic regression is Log Loss, which is defined as follows:
$
\text{Log Loss} = \sum_{(x,y)\in D} -y\log(y') - (1 - y)\log(1 - y')
$

where:

* $D$ is the data set containing many labeled examples, which are $(x,y)$ pairs.
* $y$ is the label in a labeled example. Since this is logistic regression, every value of $y$ must either be 0 or 1.
* $y'$ is the predicted value (somewhere between 0 and 1), given the set of features in $x$.
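A from-scratch version of log loss, the sum over all $(x, y)$ pairs of $-y\log(y') - (1 - y)\log(1 - y')$, makes the behavior concrete: confident correct predictions cost little, and confident wrong ones cost a lot. This is a minimal sketch. The `eps` clipping, which keeps `log()` away from 0, is an added assumption that the formula itself does not include. Library implementations such as scikit-learn's `log_loss` return the *mean* per example by default rather than the sum.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Sum of -y*log(y') - (1 - y)*log(1 - y') over all examples.

    y_true: labels, each 0 or 1.
    y_pred: predicted probabilities y' between 0 and 1.
    eps: assumed clipping so that log() never receives exactly 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip into (0, 1)
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total


labels = [1, 0, 1]
confident = [0.9, 0.1, 0.8]  # mostly right and confident
wrong = [0.1, 0.9, 0.2]      # confidently wrong

print(log_loss(labels, confident))  # about 0.434
print(log_loss(labels, wrong))      # about 6.215
```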