BeyzaTopbas committed on
Commit 2aa6112 · verified · 1 parent: 6d8a6f1

Update README.md

Files changed (1): README.md (+66 -4)
@@ -10,10 +10,72 @@ tags:
  pinned: false
  short_description: Streamlit template space
  ---

- # Welcome to Streamlit!

- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).

+ # 🏦 Porto Seguro – Safe Driver Prediction

+ This machine learning app predicts the probability that a driver will file an auto insurance claim.

+ ## 📌 Problem Statement

+ Insurance companies need accurate risk estimates to price policies fairly.
+ In the Porto Seguro Safe Driver Prediction competition on Kaggle, the goal is to build a model that predicts whether a policyholder will file a claim in the next year.
+
+ Better predictions help:
+
+ - reduce costs for safe drivers
+ - price high-risk drivers correctly
+ - make insurance more accessible
+
+ This is a **binary classification problem** with highly imbalanced data: only a small fraction of policyholders file a claim.
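Because claims are rare, plain accuracy says little about model quality. A toy illustration with made-up labels (4 claims among 100 policyholders, an arbitrary rate chosen only for this example): a "model" that predicts no claim for everyone reaches 96% accuracy, yet its normalized Gini is 0, because it gives every driver the same score and therefore ranks nobody. The sketch uses scikit-learn and the identity normalized Gini = 2 * AUC - 1, which holds for a binary target.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Made-up labels for illustration: 4 claims among 100 policyholders.
y_true = np.array([1] * 4 + [0] * 96)

# A trivial "model" that predicts no claim (score 0) for every driver.
always_no_claim = np.zeros(len(y_true), dtype=int)

accuracy = accuracy_score(y_true, always_no_claim)
# Binary target: normalized Gini = 2 * AUC - 1. Identical scores give AUC = 0.5.
gini = 2 * roc_auc_score(y_true, always_no_claim) - 1

print(f"accuracy = {accuracy:.2f}, Gini = {gini:.2f}")  # accuracy = 0.96, Gini = 0.00
```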
+
+ ## 📊 Dataset Overview
+
+ The dataset contains anonymized features, grouped by a tag in the column name:
+
+ - driver information (`ind`)
+ - regional data (`reg`)
+ - car characteristics (`car`)
+ - calculated features (`calc`)
+
+ A column-name suffix marks binary (`bin`) and categorical (`cat`) variables; features without a suffix are continuous or ordinal.
+
+ Missing values are represented by **-1**.
+
+ Target:
+ - `target = 1` → claim filed
+ - `target = 0` → no claim
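A minimal pandas sketch of how these conventions might be handled when loading the data. It assumes the Kaggle column-naming pattern `ps_<group>_<number>[_bin|_cat]`; the rows below are made up, and the helper `kind_of` is hypothetical. It turns the -1 code into real missing values, then summarizes each column's group, variable type and share of missing values.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; column names assume the Kaggle pattern
# ps_<group>_<number>[_bin|_cat].
df = pd.DataFrame({
    "id": [7, 9, 13, 16],
    "target": [0, 0, 1, 0],
    "ps_ind_02_cat": [2, 1, -1, 4],
    "ps_ind_06_bin": [0, 1, 0, 0],
    "ps_reg_03": [0.72, -1.0, 0.58, 0.84],
    "ps_car_05_cat": [-1, -1, 1, 0],
    "ps_calc_01": [0.6, 0.3, 0.5, 0.6],
})

# -1 means "missing": convert it to NaN so pandas and scikit-learn treat it as such.
features = df.drop(columns=["id", "target"]).replace(-1, np.nan)


def kind_of(column: str) -> str:
    """Hypothetical helper: variable type from the column-name suffix."""
    if column.endswith("_bin"):
        return "binary"
    if column.endswith("_cat"):
        return "categorical"
    return "continuous or ordinal"


summary = pd.DataFrame(
    {
        "group": [c.split("_")[1] for c in features.columns],  # ind / reg / car / calc
        "kind": [kind_of(c) for c in features.columns],
        "missing_share": features.isna().mean(),  # fraction of rows with -1
    },
    index=features.columns,
)
print(summary)
```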
+
+ ## ⚙️ Machine Learning Pipeline
+
+ 1. Data cleaning & handling missing values
+ 2. Feature selection
+ 3. Train-test split
+ 4. Model training
+ 5. Evaluation
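The five steps above could be wired together roughly as follows. This is a minimal sketch, not the app's actual training code: it uses synthetic data from scikit-learn's `make_classification` instead of the Kaggle files, the column names are placeholders, dropping the `calc` group is a hypothetical feature-selection choice, and Logistic Regression (one of the options the README lists) stands in for whichever model is used.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kaggle training data: ~10% positives, placeholder columns.
X_raw, y = make_classification(
    n_samples=2000, n_features=6, n_informative=3, weights=[0.9], random_state=0
)
columns = ["ps_ind_01", "ps_ind_02", "ps_reg_01", "ps_car_01", "ps_calc_01", "ps_calc_02"]
df = pd.DataFrame(X_raw, columns=columns)
rng = np.random.default_rng(0)
df = df.mask(rng.random(df.shape) < 0.05, -1)  # mimic the -1 missing-value code
df["target"] = y

# 1. Data cleaning: -1 marks a missing value.
X = df.drop(columns="target").replace(-1, np.nan)

# 2. Feature selection (hypothetical choice for this sketch): drop the calc group.
X = X[[c for c in X.columns if "_calc_" not in c]]

# 3. Stratified split keeps the rare claim class at the same rate in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, df["target"], test_size=0.2, stratify=df["target"], random_state=42
)

# 4. Model training: median imputation, scaling, then logistic regression.
model = make_pipeline(
    SimpleImputer(strategy="median"), StandardScaler(), LogisticRegression(max_iter=1000)
)
model.fit(X_train, y_train)

# 5. Evaluation: P(claim) per driver, scored with normalized Gini = 2 * AUC - 1.
proba = model.predict_proba(X_test)[:, 1]
gini = 2 * roc_auc_score(y_test, proba) - 1
print(f"Hold-out normalized Gini: {gini:.3f}")
```

Putting imputation inside the pipeline means the median is learned from the training split only, so no information from the hold-out set leaks into step 1.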
+ ## 🤖 Model
+
+ Algorithm used:
+
+ - Logistic Regression / Random Forest / XGBoost *(adjust to match your model)*
+
+ The model outputs the **probability of a claim**.
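+
+ ## 📏 Evaluation Metric
+
+ Competition metric:
+
+ **Normalized Gini Coefficient**
+
+ Why Gini?
+
+ It depends only on the order of the predictions: it measures how well the model ranks high-risk drivers (those who go on to file a claim) above low-risk drivers. A perfect ranking scores 1 and a random ranking scores about 0.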
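The metric can be computed directly from its definition. The sketch below follows the formulation that circulated in public notebooks for this competition (an assumption; the README does not show how the app computes it): sort drivers by predicted risk, measure how quickly the cumulative share of claims grows compared with a random order, and divide by the same quantity for a perfect ranking. The function names are illustrative. For a binary target with no tied predictions, the result equals 2 * AUC - 1, which the usage lines compare against scikit-learn's `roc_auc_score`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def gini(actual, pred):
    """Unnormalized Gini: how fast claims accumulate when drivers are sorted by risk."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    n = len(actual)
    # Sort by prediction, highest risk first; ties keep their original order.
    order = np.lexsort((np.arange(n), -pred))
    # Cumulative share of all claims captured after each driver in that order.
    cum_share = np.cumsum(actual[order]) / actual.sum()
    # A random order gives (n + 1) / 2 on average; measure the gain over that.
    return (cum_share.sum() - (n + 1) / 2) / n


def normalized_gini(actual, pred):
    """Model Gini divided by the Gini of a perfect ranking (the labels themselves)."""
    return gini(actual, pred) / gini(actual, actual)


# Usage: noisy scores that tend to be higher for drivers who actually claimed.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1_000)
scores = y + rng.normal(size=1_000)
# The two numbers agree up to floating-point rounding.
print(normalized_gini(y, scores), 2 * roc_auc_score(y, scores) - 1)
```

Dividing by `gini(actual, actual)` is what normalizes the score: a perfect ranking gives 1, a random one about 0, and a fully reversed one -1.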
+
+ ## 🚀 Streamlit App
+
+ The app allows users to:
+
+ - Enter driver & vehicle features
+ - Get a real-time claim probability prediction
+
+ ### Output
+
+ - Claim probability
+ - Risk interpretation
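The README lists two outputs, a claim probability and a risk interpretation, but does not say how the interpretation is derived. One possible approach, sketched under stated assumptions: compare the predicted probability with a base claim rate and bucket the ratio. The `base_rate` input, the 0.5x / 2x thresholds and both function names are hypothetical. Inside the Streamlit app, the returned values could be displayed with `st.metric` and `st.write`.

```python
def interpret_risk(probability: float, base_rate: float) -> str:
    """Hypothetical mapping from a claim probability to a risk label.

    base_rate: share of policyholders who file a claim (assumed input,
    e.g. the positive rate of the training data). The 0.5x and 2x
    thresholds are illustrative assumptions, not calibrated values.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if base_rate <= 0.0:
        raise ValueError("base_rate must be positive")
    ratio = probability / base_rate  # how many times the average claim rate
    if ratio < 0.5:
        return "Low risk"
    if ratio < 2.0:
        return "Average risk"
    return "High risk"


def format_output(probability: float, base_rate: float) -> dict:
    """Bundle the two outputs the app shows: the probability and its interpretation."""
    return {
        "claim_probability": f"{probability:.1%}",  # e.g. 0.05 -> "5.0%"
        "risk": interpret_risk(probability, base_rate),
    }


# In the app (sketch): out = format_output(p, base_rate)
# st.metric("Claim probability", out["claim_probability"]); st.write(out["risk"])
```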