mboullier committed on
Commit
c4485cf
·
verified ·
1 Parent(s): 85a4d68

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -8,7 +8,7 @@ Mathew
  This is a Linear Regression model trained on the UCI Automobile dataset to predict the 'symboling' insurance risk rating from 17 car features including price, horsepower, bore, and curb-weight, amongst other continuous variables. Symboling is defined as an integer value (whole number), ranging from -3 to +3.
 
  ## Intended Uses & Limitations
- This model is for educational purposes only. It is not suitable for production use because the dataset is small (only 200 or so entries), outdated (1980s), and contained a lot of missing values (41 missing normalized-losses, around 20% of all rows had a missing normalized-losses entry). Predictions should not be used for real insurance predictions.
+ This model is for educational purposes only. It is not suitable for production use because the dataset is small (only 200 or so entries), outdated (1980s), and contained a lot of missing values (41 missing normalized-losses, around 20% of all rows had a missing normalized-losses entry). While the missing data was imputed, predictions should not be used for real insurance predictions.
 
  ## Training Data
  Data source: UCI Automobile dataset (https://archive.ics.uci.edu/dataset/10/automobile). Contains ~200 cars with mixed numeric and categorical features. Missing values were imputed using MICE.
@@ -18,7 +18,7 @@ Data source: UCI Automobile dataset (https://archive.ics.uci.edu/dataset/10/auto
  - RMSE: 0.713
 
  ## Ethical Considerations
- The 'symboling' risk value is not only determined by continuous, but categorical variables as well, which the model does not account for. While things such as horsepower, bore, engine-size, and number of doors are good predictors, insurance companies also use brands of cars and the type of car (luxury, sport, etc.), as well as a variety of other variables to help determine risk factors.
+ The 'symboling' risk value is not only determined by continuous, but categorical variables as well, which the model does not account for. While things such as horsepower, bore, engine-size, and number of doors are good predictors, insurance companies also use brands of cars and the type of car (luxury, sport, etc.), as well as a variety of other variables to help determine risk factors. Because the model does not take these variables into account, it is very unreliable.
 
  ## Audit Questions
  - What features most strongly influence predictions?