Update README.md
README.md
CHANGED
|
@@ -10,10 +10,72 @@ tags:
|
|
| 10 |
pinned: false
|
| 11 |
short_description: Streamlit template space
|
| 12 |
---
|
| 13 |
+
# Porto Seguro: Safe Driver Prediction
|
| 14 |
|
| 15 |
+
This machine learning app predicts the probability that a driver will file an auto insurance claim.
|
| 16 |
|
| 17 |
+
## Problem Statement
|
| 18 |
|
| 19 |
+
Insurance companies need accurate risk estimation to price policies fairly.
|
| 20 |
+
In the Porto Seguro Safe Driver Prediction competition on Kaggle, the goal is to build a model that predicts whether a policyholder will file a claim in the next year.
|
| 21 |
+
|
| 22 |
+
Better predictions help:
|
| 23 |
+
|
| 24 |
+
- reduce costs for safe drivers
|
| 25 |
+
- price high-risk drivers correctly
|
| 26 |
+
- improve accessibility of insurance
|
| 27 |
+
|
| 28 |
+
This is a **binary classification problem** with highly imbalanced classes: policyholders who file a claim are far rarer than those who do not.
|
| 29 |
+
|
| 30 |
+
## Dataset Overview
|
| 31 |
+
|
| 32 |
+
The dataset contains anonymized features related to:
|
| 33 |
+
|
| 34 |
+
- driver information (`ind`)
|
| 35 |
+
- regional data (`reg`)
|
| 36 |
+
- car characteristics (`car`)
|
| 37 |
+
- calculated features (`calc`)
|
| 38 |
+
- binary and categorical variables (marked by the `_bin` and `_cat` suffixes in the feature names)
|
| 39 |
+
|
| 40 |
+
Missing values are represented by **-1**.
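
As a minimal sketch of how the `-1` sentinel might be handled, the snippet below first turns it into `None` and then fills one column with the median of the observed values. The function names, the toy rows, and the choice of median imputation are assumptions for illustration only; the README does not say which strategy the app uses, and a real project would more likely do this with pandas.

```python
from statistics import median

MISSING = -1  # sentinel the dataset uses for missing values


def mark_missing(rows):
    """Replace the -1 sentinel with None in a list of row dicts."""
    return [
        {name: (None if value == MISSING else value) for name, value in row.items()}
        for row in rows
    ]


def impute_median(rows, column):
    """Fill None entries of one column with the median of its observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


# Toy rows; the column names and values are illustrative only.
rows = [
    {"ps_reg_03": 0.5, "ps_car_14": 0.37},
    {"ps_reg_03": -1, "ps_car_14": 0.39},
    {"ps_reg_03": 1.0, "ps_car_14": -1},
]
cleaned = mark_missing(rows)
filled = impute_median(cleaned, "ps_reg_03")
```

Marking the sentinel explicitly before imputing keeps `-1` from being treated as a real numeric value when statistics such as the median are computed.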
|
| 41 |
+
|
| 42 |
+
Target:
|
| 43 |
+
- `target = 1`: claim filed
|
| 44 |
+
- `target = 0`: no claim
|
| 45 |
+
|
| 46 |
+
## Machine Learning Pipeline
|
| 47 |
+
|
| 48 |
+
1. Data cleaning & handling missing values
|
| 49 |
+
2. Feature selection
|
| 50 |
+
3. Train-test split
|
| 51 |
+
4. Model training
|
| 52 |
+
5. Evaluation
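
Because the classes are imbalanced, the train-test split in step 3 is usually done in a stratified way so that both parts keep the same share of claims. The sketch below shows one way to do that with the standard library only. The function name, the 20% test fraction, the seed, and the toy labels are assumptions, not taken from the project; in practice a library helper such as scikit-learn's `train_test_split` with its `stratify` argument serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with the class ratio preserved in both parts."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with few positives; the ratio is illustrative, not the dataset's.
labels = [1] * 4 + [0] * 96
train_idx, test_idx = stratified_split(labels)
```

With these toy labels, each class is split separately, so the rare positive class cannot end up entirely in one part by chance.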
|
| 53 |
+
## Model
|
| 54 |
+
|
| 55 |
+
Algorithm used:
|
| 56 |
+
|
| 57 |
+
- Logistic Regression / Random Forest / XGBoost *(adjust to match your model)*
|
| 58 |
+
|
| 59 |
+
The model outputs the **probability of a claim**.
|
| 60 |
+
|
| 61 |
+
## Evaluation Metric
|
| 62 |
+
|
| 63 |
+
Competition metric:
|
| 64 |
+
|
| 65 |
+
**Normalized Gini Coefficient**
|
| 66 |
+
|
| 67 |
+
Why Gini?
|
| 68 |
+
|
| 69 |
+
It measures how well the model ranks high-risk drivers above low-risk drivers.
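
For a binary target, the normalized Gini coefficient is related to the ROC AUC by `Gini = 2 * AUC - 1`, so it can be sketched from a rank-based AUC. The code below is an illustrative standard-library implementation, not the competition's official scoring script; the function names are assumptions, and tied scores are given their average rank, which may differ slightly from how other implementations break ties.

```python
def roc_auc(y_true, y_score):
    """AUC via the Mann-Whitney U statistic; tied scores get their average rank."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)

    start = 0
    while start < len(order):
        # Find the run of equal scores starting at `start`.
        end = start
        while end + 1 < len(order) and y_score[order[end + 1]] == y_score[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1

    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(rank for rank, label in zip(ranks, y_true) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def normalized_gini(y_true, y_score):
    """Normalized Gini for a binary target, computed as 2 * AUC - 1."""
    return 2 * roc_auc(y_true, y_score) - 1


# Toy example: one of the four positive/negative pairs is ranked the wrong way.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
gini = normalized_gini(y_true, y_score)  # AUC is 0.75, so Gini is 0.5
```

A score of 1 means every claimant is ranked above every non-claimant, and a score of 0 corresponds to a ranking no better than random.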
|
| 70 |
+
|
| 71 |
+
## Streamlit App
|
| 72 |
+
|
| 73 |
+
The app allows users to:
|
| 74 |
+
|
| 75 |
+
- Enter driver & vehicle features
|
| 76 |
+
- Get a real-time claim probability prediction
|
| 77 |
+
|
| 78 |
+
### Output
|
| 79 |
+
|
| 80 |
+
- Claim probability
|
| 81 |
+
- Risk interpretation
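
The risk interpretation could be as simple as mapping the predicted probability to a label. The sketch below is a hypothetical helper: the function name, the three labels, and the `low` and `high` cut-offs are assumptions chosen for illustration, and the README does not state how the app actually derives its interpretation.

```python
def interpret_risk(probability, low=0.03, high=0.10):
    """Map a claim probability to a coarse risk label.

    The cut-offs are hypothetical and should be tuned to the model's
    score distribution and to business requirements.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low:
        return "Low risk"
    if probability < high:
        return "Medium risk"
    return "High risk"


label = interpret_risk(0.05)  # falls between the two cut-offs
```

Because claims are rare, predicted probabilities tend to be small, which is why the illustrative cut-offs sit well below 0.5.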
|