mboullier committed on
Commit 0a13bec · verified · 1 Parent(s): 15401ae

Upload folder using huggingface_hub

.ipynb_checkpoints/README-checkpoint.md ADDED
README.md ADDED
@@ -0,0 +1,55 @@
+
+ # Model Card
+
+ ## Model Card Authors
+ Mathew
+
+ ## Model Description
+ This is a linear regression model trained on the UCI Automobile dataset to predict the 'symboling' insurance risk rating from 17 numeric car features, including price, horsepower, bore, and curb-weight.
+
+ ## Intended Uses & Limitations
+ This model is for educational purposes only. It is not suitable for production use because the dataset is small (about 200 entries), outdated (1980s), and has many missing values: 41 rows, roughly 20% of the data, lack a normalized-losses value. Its predictions should not be used for real insurance decisions.
+
+ ## Training Data
+ Data source: UCI Automobile dataset (https://archive.ics.uci.edu/dataset/10/automobile). It contains about 200 cars with mixed numeric and categorical features. Missing values were imputed using MICE (Multiple Imputation by Chained Equations).
+
+ ## Evaluation Metrics
+ - R2: 0.603
+ - RMSE: 0.713
+
+ ## Ethical Considerations
+ The 'symboling' risk rating depends on categorical as well as continuous variables, and this model does not account for the categorical ones. Features such as horsepower, bore, engine-size, and number of doors are useful predictors, but insurers also weigh the make and type of car (luxury, sport, etc.) and a variety of other variables when assessing risk.
+
+ ## Audit Questions
+ - What features most strongly influence predictions?
+ - Are residuals randomly scattered or patterned?
+ - How reliable are the evaluation metrics?
+
+
+ ## Coefficients
+ | features | coefficients |
+ |:------------------|---------------:|
+ | price | -1.73704e-05 |
+ | highway-mpg | 0.0438076 |
+ | city-mpg | -0.0610687 |
+ | peak-rpm | -5.49499e-05 |
+ | horsepower | 0.00207246 |
+ | compression-ratio | 0.0187334 |
+ | stroke | -0.555667 |
+ | bore | -0.827261 |
+ | engine-size | 0.013724 |
+ | num-of-cylinders | -0.498651 |
+ | curb-weight | -5.04019e-05 |
+ | height | 0.0239754 |
+ | width | 0.195005 |
+ | length | 0.0120506 |
+ | wheel-base | -0.153431 |
+ | num-of-doors | -0.428882 |
+ | normalized-losses | 0.0116676 |
+
+ ## Plots
+ ### Predicted vs Actual
+ ![Predicted vs Actual](predicted_vs_actual.png)
+
+ ### Residuals Plot
+ ![Residuals Plot](residuals_plot.png)
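The model card reports R2 and RMSE but does not say which rows they were computed on. As a minimal sketch, assuming the standard definitions (the same ones behind scikit-learn's `r2_score` and the square root of `mean_squared_error`), both metrics and the residuals behind the residuals plot can be computed from actual and predicted `symboling` values like this; the function names are illustrative only.

```python
from __future__ import annotations

import math
from typing import Sequence


def residuals(y_true: Sequence[float], y_pred: Sequence[float]) -> list[float]:
    """Actual minus predicted for each car (what a residuals plot shows)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return [t - p for t, p in zip(y_true, y_pred)]


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    res = residuals(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum(r * r for r in res)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when every actual value is the same")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as symboling."""
    res = residuals(y_true, y_pred)
    return math.sqrt(sum(r * r for r in res) / len(res))
```

For example, `r2([0, 1, 2], [0, 1, 3])` gives 0.5, and `rmse` on the same inputs gives the square root of 1/3 (about 0.577). Because symboling is an integer rating, an RMSE of 0.713 suggests a typical prediction is off by somewhat less than one rating step.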
auto_testing.csv ADDED
@@ -0,0 +1,42 @@
+ ,price,highway-mpg,city-mpg,peak-rpm,horsepower,compression-ratio,stroke,bore,engine-size,num-of-cylinders,curb-weight,height,width,length,wheel-base,num-of-doors,normalized-losses,symboling
+ 111,15580.0,24,19,5000.0,95.0,8.4,2.19,3.46,120,4,3075,56.7,68.4,186.7,107.9,4.0,161.0,0
+ 17,36880.0,20,15,5400.0,182.0,8.0,3.39,3.62,209,6,3505,56.3,70.9,197.0,110.0,4.0,128.0,0
+ 116,17950.0,33,28,4150.0,95.0,21.0,3.52,3.7,152,4,3252,56.7,68.4,186.7,107.9,4.0,161.0,0
+ 6,17710.0,25,19,5500.0,110.0,8.5,3.4,3.19,136,5,2844,55.7,71.4,192.7,105.8,4.0,158.0,1
+ 9,17710.0,22,16,5500.0,160.0,7.0,3.4,3.13,131,5,3053,52.0,67.9,178.2,99.5,2.0,197.0,0
+ 141,7126.0,37,32,4800.0,82.0,9.5,2.64,3.62,108,4,2145,52.5,65.4,172.0,97.2,4.0,102.0,0
+ 197,16515.0,28,24,5400.0,114.0,9.5,3.15,3.78,141,4,3042,57.5,67.2,188.8,104.3,4.0,74.0,-1
+ 185,8195.0,34,27,5250.0,85.0,9.0,3.4,3.19,109,4,2212,55.7,65.5,171.7,97.3,4.0,94.0,2
+ 196,15985.0,28,24,5400.0,114.0,9.5,3.15,3.78,141,4,2935,56.2,67.2,188.8,104.3,4.0,103.0,-2
+ 127,34028.0,25,17,5900.0,207.0,9.5,2.9,3.74,194,6,2756,51.6,65.0,168.9,89.5,2.0,145.0,3
+ 183,7975.0,34,27,5250.0,85.0,9.0,3.4,3.19,109,4,2209,55.7,65.5,171.7,97.3,2.0,122.0,2
+ 62,10245.0,32,26,4800.0,84.0,8.6,3.39,3.39,122,4,2410,55.5,66.5,177.8,98.8,4.0,115.0,0
+ 187,9495.0,42,37,4500.0,68.0,23.0,3.4,3.01,97,4,2319,55.7,65.5,171.7,97.3,4.0,94.0,2
+ 20,6575.0,43,38,5400.0,70.0,9.6,3.11,3.03,90,4,1909,52.0,63.6,158.8,94.5,4.0,81.0,0
+ 135,15510.0,28,21,5250.0,110.0,9.3,3.07,3.54,121,4,2758,56.1,66.5,186.6,99.1,4.0,104.0,2
+ 162,9258.0,34,28,4800.0,70.0,9.0,3.03,3.19,98,4,2140,52.8,64.4,166.3,95.7,4.0,91.0,0
+ 45,6338.0,43,38,5400.0,70.0,9.6,3.11,3.03,90,4,1909,52.0,63.6,155.9,94.5,4.0,83.0,0
+ 83,14869.0,24,19,5000.0,145.0,7.0,3.86,3.59,156,4,2921,50.2,66.3,173.2,95.9,2.0,164.0,3
+ 129,35056.0,28,17,5750.0,288.0,10.0,3.11,3.94,203,8,3366,50.5,72.3,175.7,98.4,2.0,188.0,1
+ 2,16500.0,26,19,5000.0,154.0,9.0,3.47,2.68,152,6,2823,52.4,65.5,171.2,94.5,2.0,145.0,1
+ 40,10295.0,33,27,5800.0,86.0,9.0,3.58,3.15,110,4,2372,54.1,62.5,175.4,96.5,4.0,85.0,0
+ 52,6795.0,38,31,5000.0,68.0,9.0,3.15,3.03,91,4,1905,54.1,64.2,159.1,93.1,2.0,104.0,1
+ 75,16503.0,24,19,5000.0,175.0,8.0,3.12,3.78,140,4,2910,54.8,68.0,178.4,102.7,2.0,158.0,1
+ 13,21105.0,28,21,4250.0,121.0,9.0,3.19,3.31,164,6,2765,54.3,64.8,176.8,101.2,4.0,188.0,0
+ 171,11549.0,30,24,4800.0,116.0,9.3,3.5,3.62,146,4,2714,52.0,65.6,176.2,98.4,2.0,134.0,2
+ 21,5572.0,41,37,5500.0,68.0,9.41,3.23,2.97,90,4,1876,50.8,63.8,157.3,93.7,2.0,118.0,1
+ 54,7395.0,38,31,5000.0,68.0,9.0,3.15,3.08,91,4,1950,54.1,64.2,166.8,93.1,4.0,113.0,1
+ 42,10345.0,31,25,5500.0,100.0,9.1,3.58,3.15,110,4,2293,51.0,66.0,169.1,96.5,2.0,107.0,1
+ 194,12940.0,28,23,5400.0,114.0,9.5,3.15,3.78,141,4,2912,56.2,67.2,188.8,104.3,4.0,103.0,-2
+ 202,21485.0,23,18,5500.0,134.0,8.8,2.87,3.58,173,6,3012,55.5,68.9,188.8,109.1,4.0,95.0,-1
+ 156,6938.0,37,30,4800.0,70.0,9.0,3.03,3.19,98,4,2081,53.0,64.4,166.3,95.7,4.0,91.0,0
+ 198,18420.0,22,17,5100.0,162.0,7.5,3.15,3.62,130,4,3045,56.2,67.2,188.8,104.3,4.0,103.0,-2
+ 150,5348.0,39,35,4800.0,62.0,9.0,3.03,3.05,92,4,1985,54.5,63.6,158.7,95.7,2.0,87.0,1
+ 147,10198.0,31,25,5200.0,94.0,9.0,2.64,3.62,108,4,2455,53.0,65.4,173.5,97.0,4.0,89.0,0
+ 19,6295.0,43,38,5400.0,70.0,9.6,3.11,3.03,90,4,1874,52.0,63.6,155.9,94.5,2.0,98.0,1
+ 108,13200.0,33,28,4150.0,95.0,21.0,3.52,3.7,152,4,3197,56.7,68.4,186.7,107.9,4.0,161.0,0
+ 168,9639.0,30,24,4800.0,116.0,9.3,3.5,3.62,146,4,2536,52.0,65.6,176.2,98.4,2.0,134.0,2
+ 22,6377.0,38,31,5500.0,68.0,9.4,3.23,2.97,90,4,1876,50.8,63.8,157.3,93.7,2.0,118.0,1
+ 140,7603.0,31,26,4400.0,73.0,8.7,2.64,3.62,108,4,2240,55.7,63.8,157.3,93.3,2.0,83.0,2
+ 199,18950.0,22,17,5100.0,162.0,7.5,3.15,3.62,130,4,3157,57.5,67.2,188.8,104.3,4.0,74.0,-1
+ 155,8778.0,32,27,4800.0,62.0,9.0,3.03,3.05,92,4,3110,59.1,63.6,169.7,95.7,4.0,91.0,0
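The raw coefficients in the README are not directly comparable, because the features sit on very different scales (price in the tens of thousands, bore around 3). One rough way to approach the first audit question is to multiply each coefficient by a car's feature value and rank the resulting terms. The sketch below does this for the first row of `auto_testing.csv` (original index 111), using the coefficients as printed (rounded) in the README table. The intercept is not listed in the card, so these terms do not add up to a full prediction, and the helper names are hypothetical.

```python
import csv
import io

# Coefficients copied from the "Coefficients" table in README.md (rounded as printed).
COEFFICIENTS = {
    "price": -1.73704e-05,
    "highway-mpg": 0.0438076,
    "city-mpg": -0.0610687,
    "peak-rpm": -5.49499e-05,
    "horsepower": 0.00207246,
    "compression-ratio": 0.0187334,
    "stroke": -0.555667,
    "bore": -0.827261,
    "engine-size": 0.013724,
    "num-of-cylinders": -0.498651,
    "curb-weight": -5.04019e-05,
    "height": 0.0239754,
    "width": 0.195005,
    "length": 0.0120506,
    "wheel-base": -0.153431,
    "num-of-doors": -0.428882,
    "normalized-losses": 0.0116676,
}

# Header and first data row of auto_testing.csv, copied verbatim.
CSV_SNIPPET = """\
,price,highway-mpg,city-mpg,peak-rpm,horsepower,compression-ratio,stroke,bore,engine-size,num-of-cylinders,curb-weight,height,width,length,wheel-base,num-of-doors,normalized-losses,symboling
111,15580.0,24,19,5000.0,95.0,8.4,2.19,3.46,120,4,3075,56.7,68.4,186.7,107.9,4.0,161.0,0
"""


def load_first_row(snippet: str) -> dict:
    """Parse the first data row, keeping only model features.

    The unnamed first column ("") is the original dataset index and
    "symboling" is the target, so both are dropped."""
    row = next(csv.DictReader(io.StringIO(snippet)))
    return {name: float(value) for name, value in row.items() if name in COEFFICIENTS}


def contributions(coefs: dict, row: dict) -> list:
    """Return (feature, coefficient * value) pairs, largest absolute term first."""
    terms = {name: coef * row[name] for name, coef in coefs.items()}
    return sorted(terms.items(), key=lambda kv: abs(kv[1]), reverse=True)


for name, term in contributions(COEFFICIENTS, load_first_row(CSV_SNIPPET))[:5]:
    print(f"{name:>18}: {term:+.3f}")
```

For this car, wheel-base and width come out on top with large terms of opposite sign. That pattern hints that these correlated size features partly cancel each other, which is worth checking for collinearity before reading either coefficient as a driver of risk.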
config.json ADDED
@@ -0,0 +1,117 @@
+ {
+   "sklearn": {
+     "columns": [
+       "price",
+       "highway-mpg",
+       "city-mpg",
+       "peak-rpm",
+       "horsepower",
+       "compression-ratio",
+       "stroke",
+       "bore",
+       "engine-size",
+       "num-of-cylinders",
+       "curb-weight",
+       "height",
+       "width",
+       "length",
+       "wheel-base",
+       "num-of-doors",
+       "normalized-losses"
+     ],
+     "environment": [
+       "scikit-learn=1.0.2"
+     ],
+     "example_input": {
+       "price": [
+         13495,
+         16500,
+         13950
+       ],
+       "highway-mpg": [
+         29,
+         28,
+         31
+       ],
+       "city-mpg": [
+         21,
+         19,
+         24
+       ],
+       "peak-rpm": [
+         5000,
+         5500,
+         4800
+       ],
+       "horsepower": [
+         102,
+         115,
+         110
+       ],
+       "compression-ratio": [
+         9.0,
+         9.0,
+         9.0
+       ],
+       "stroke": [
+         3.4,
+         3.4,
+         3.2
+       ],
+       "bore": [
+         3.47,
+         3.01,
+         3.19
+       ],
+       "engine-size": [
+         109,
+         136,
+         120
+       ],
+       "num-of-cylinders": [
+         4,
+         4,
+         4
+       ],
+       "curb-weight": [
+         2548,
+         2823,
+         2507
+       ],
+       "height": [
+         54.3,
+         55.1,
+         54.5
+       ],
+       "width": [
+         64.1,
+         65.5,
+         66.2
+       ],
+       "length": [
+         168.8,
+         171.2,
+         176.6
+       ],
+       "wheel-base": [
+         94.5,
+         94.5,
+         96.5
+       ],
+       "num-of-doors": [
+         4.0,
+         2.0,
+         4.0
+       ],
+       "normalized-losses": [
+         65,
+         103,
+         74
+       ]
+     },
+     "model": {
+       "file": "model.pkl"
+     },
+     "task": "tabular-regression"
+   }
+ }
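`config.json` fixes the feature order under `sklearn.columns` and gives three example cars under `sklearn.example_input`, stored column by column. A minimal sketch of turning that block into one row per car, in the configured order, before calling `predict`. It assumes `model.pkl` is a plain pickle of a fitted scikit-learn estimator (the card describes a linear regression, and the config names scikit-learn 1.0.2); `load_config`, `example_rows`, and `load_model` are hypothetical helper names, not part of any library.

```python
from __future__ import annotations

import json
import pickle


def load_config(path: str = "config.json") -> dict:
    """Read the repository's config.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def example_rows(config: dict) -> list[list[float]]:
    """Turn sklearn.example_input (column -> list of values) into one row per
    example car, with features ordered as in sklearn.columns."""
    sk = config["sklearn"]
    columns = sk["columns"]
    example = sk["example_input"]
    n_rows = len(example[columns[0]])
    if any(len(example[col]) != n_rows for col in columns):
        raise ValueError("example_input columns have different lengths")
    return [[example[col][i] for col in columns] for i in range(n_rows)]


def load_model(path: str = "model.pkl"):
    """Assumes model.pkl is a plain pickle of a fitted scikit-learn estimator.
    Unpickling can run arbitrary code, so only load files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage (not executed here; needs the LFS-fetched model.pkl and scikit-learn 1.0.2):
# config = load_config()
# rows = example_rows(config)
# model = load_model()
# print(model.predict(rows))  # one predicted symboling value per example car
```

If the estimator was fitted on a pandas DataFrame, wrapping the rows as `pandas.DataFrame(rows, columns=config["sklearn"]["columns"])` should avoid scikit-learn's warning about missing feature names.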
model.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:397d94c940c067e31b74c2026c14e70ebed350169a6a9ab44405a86b48a254d6
+ size 10632
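`model.pkl` is committed as a Git LFS pointer rather than as the pickle itself: the three lines record the pointer spec version, the SHA-256 digest of the real file (`oid`), and its size in bytes. A minimal standard-library sketch of parsing such a pointer and checking that a downloaded `model.pkl` matches it; `parse_lfs_pointer` and `matches_pointer` are hypothetical names.

```python
import hashlib

# The pointer text committed in place of model.pkl.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:397d94c940c067e31b74c2026c14e70ebed350169a6a9ab44405a86b48a254d6
size 10632
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """True if the bytes have the size and SHA-256 digest recorded in the pointer."""
    return len(data) == pointer["size"] and hashlib.sha256(data).hexdigest() == pointer["sha256"]


# Usage with a downloaded file (not executed here):
# with open("model.pkl", "rb") as f:
#     print(matches_pointer(f.read(), parse_lfs_pointer(POINTER)))
```

If the check fails on a fresh clone, the working copy most likely still holds the pointer text instead of the real pickle, meaning the LFS object was never fetched.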
predicted_vs_actual.png ADDED
residuals_plot.png ADDED