Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -8,7 +8,7 @@ Mathew
|
|
| 8 |
This is a linear regression model trained on the UCI Automobile dataset to predict the 'symboling' insurance risk rating from 17 car features, including price, horsepower, bore, and curb-weight, among other continuous variables. Symboling is an integer ranging from -3 to +3.
|
| 9 |
|
| 10 |
## Intended Uses & Limitations
|
| 11 |
-
This model is for educational purposes only. It is not suitable for production use because the dataset is small (about 200 entries), outdated (1980s), and has many missing values (normalized-losses is missing in 41 rows, roughly 20% of the dataset).
|
| 12 |
|
| 13 |
## Training Data
|
| 14 |
Data source: UCI Automobile dataset (https://archive.ics.uci.edu/dataset/10/automobile). Contains ~200 cars with mixed numeric and categorical features. Missing values were imputed using MICE (Multivariate Imputation by Chained Equations).
|
|
@@ -18,7 +18,7 @@ Data source: UCI Automobile dataset (https://archive.ics.uci.edu/dataset/10/auto
|
|
| 18 |
- RMSE: 0.713
|
| 19 |
|
| 20 |
## Ethical Considerations
|
| 21 |
-
The 'symboling' risk value is determined not only by continuous variables but also by categorical ones, which the model does not account for. While features such as horsepower, bore, engine-size, and number of doors are good predictors, insurance companies also use the brand and type of car (luxury, sport, etc.), along with a variety of other variables, to determine risk.
|
| 22 |
|
| 23 |
## Audit Questions
|
| 24 |
- What features most strongly influence predictions?
|
|
|
|
| 8 |
This is a linear regression model trained on the UCI Automobile dataset to predict the 'symboling' insurance risk rating from 17 car features, including price, horsepower, bore, and curb-weight, among other continuous variables. Symboling is an integer ranging from -3 to +3.
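A linear regression produces real-valued outputs, while symboling is an integer between -3 and +3. The sketch below shows one possible way to map a raw prediction onto a valid rating by rounding and clamping. The helper name `to_symboling` and the sample values are hypothetical; the model card does not say whether any such post-processing is applied.

```python
def to_symboling(raw_prediction: float, low: int = -3, high: int = 3) -> int:
    """Round a real-valued prediction and clamp it to the symboling range.

    Hypothetical helper for illustration only; not part of the published model.
    """
    # Python's round() rounds exact halves to the nearest even integer.
    rounded = round(raw_prediction)
    # Clamp so that out-of-range predictions still yield a valid rating.
    return max(low, min(high, rounded))


# Made-up raw regression outputs, including two that fall outside [-3, 3].
raw_predictions = [-3.8, -0.4, 1.6, 2.49, 4.2]
print([to_symboling(p) for p in raw_predictions])
```

Rounding discards information, so error metrics computed on the rounded ratings will generally differ from those computed on the raw outputs.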
|
| 9 |
|
| 10 |
## Intended Uses & Limitations
|
| 11 |
+
This model is for educational purposes only. It is not suitable for production use because the dataset is small (about 200 entries), outdated (1980s), and has many missing values (normalized-losses is missing in 41 rows, roughly 20% of the dataset). Although the missing values were imputed, the model's predictions should not be used for real insurance decisions.
|
| 12 |
|
| 13 |
## Training Data
|
| 14 |
Data source: UCI Automobile dataset (https://archive.ics.uci.edu/dataset/10/automobile). Contains ~200 cars with mixed numeric and categorical features. Missing values were imputed using MICE (Multivariate Imputation by Chained Equations).
|
|
|
|
| 18 |
- RMSE: 0.713
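For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as symboling. The following is a minimal sketch of that calculation in plain Python; the function name `rmse` and the sample ratings are made up for illustration and are not taken from the model's evaluation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up symboling ratings and raw regression outputs, for illustration only.
actual = [3, 1, 0, -1, 2]
predicted = [2.4, 1.3, 0.9, -0.2, 1.1]
print(f"RMSE: {rmse(actual, predicted):.3f}")
```

Because larger errors are squared before averaging, a few badly mispredicted cars can raise the RMSE noticeably on a dataset this small.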
|
| 19 |
|
| 20 |
## Ethical Considerations
|
| 21 |
+
The 'symboling' risk value is determined not only by continuous variables but also by categorical ones, which the model does not account for. While features such as horsepower, bore, engine-size, and number of doors are good predictors, insurance companies also use the brand and type of car (luxury, sport, etc.), along with a variety of other variables, to determine risk. Because the model does not take these variables into account, its predictions are unreliable.
|
| 22 |
|
| 23 |
## Audit Questions
|
| 24 |
- What features most strongly influence predictions?
|