metadata
title: Porto Seguro Safe Driver Prediction
emoji: π
colorFrom: red
colorTo: red
sdk: docker
app_port: 8501
tags:
- streamlit
pinned: false
short_description: Streamlit template space
Porto Seguro: Safe Driver Prediction
This machine learning app predicts the probability that a driver will file an auto insurance claim.
Problem Statement
Insurance companies need accurate risk estimation to price policies fairly.
In this Kaggle competition, the goal is to build a model that predicts whether a policyholder will file a claim in the next year.
Better predictions help:
- reduce costs for safe drivers
- price high-risk drivers correctly
- improve accessibility of insurance
This is a binary classification problem with highly imbalanced data: claims (target = 1) are much rarer than non-claims.
Dataset Overview
The dataset contains anonymized features related to:
- driver information (ind)
- regional data (reg)
- car characteristics (car)
- calculated features (calc)
- binary and categorical variables
Missing values are represented by -1.
Target:
- target = 1: claim filed
- target = 0: no claim
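As a minimal sketch of the conventions above, the snippet below builds a tiny invented frame, converts the -1 sentinel to NaN, and reports the share of missing values per column and the claim rate. The column names only imitate the `ind` / `reg` / `car` grouping and the values are made up; real data would come from the competition files.

```python
import numpy as np
import pandas as pd

# Tiny invented sample; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "ps_ind_01": [2, 1, -1, 0],
        "ps_reg_03": [0.72, -1.0, 0.58, -1.0],
        "ps_car_01_cat": [10, 11, 7, -1],
        "target": [0, 0, 1, 0],
    }
)

features = df.drop(columns="target")

# The dataset encodes missing values as -1, so map that sentinel to NaN.
features = features.replace(-1, np.nan)

# Fraction of missing entries per feature column.
missing_share = features.isna().mean()

# Fraction of rows with a filed claim (target = 1).
claim_rate = df["target"].mean()

print(missing_share)
print(f"claim rate: {claim_rate:.2f}")
```

Converting the sentinel early keeps -1 from being treated as a real numeric value by later steps such as scaling or imputation.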
Machine Learning Pipeline
- Data cleaning & handling missing values
- Feature selection
- Train-test split
- Model training
- Evaluation
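The steps above can be sketched with scikit-learn on synthetic data. Everything specific here is an assumption made for illustration, not the app's actual configuration: the generated data, median imputation, `SelectKBest` as the feature selector, the 80/20 stratified split, logistic regression with balanced class weights, and ROC AUC as the ranking score.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 2000 rows, 6 numeric features,
# with a rare positive class driven by the first two features.
n_rows, n_features = 2000, 6
X = rng.normal(size=(n_rows, n_features))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1] - 3.0
y = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Inject the -1 sentinel into about 5% of the cells to mimic missing values.
X[rng.random(X.shape) < 0.05] = -1.0

# Data cleaning: turn the -1 sentinel into NaN so the imputer can fill it.
X = np.where(X == -1.0, np.nan, X)

# Train-test split, stratified so both parts keep a similar claim rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Imputation, scaling, feature selection, and the model live in one pipeline,
# so every step is fitted on the training part only.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    SelectKBest(f_classif, k=4),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Evaluation: probability of the positive class, scored by ranking quality.
claim_probability = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, claim_probability)
print(f"test AUC: {auc:.3f}")
```

Wrapping the preprocessing in the pipeline is a design choice that avoids leaking test-set statistics into imputation or feature selection.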
Model
Algorithm used:
- Logistic Regression / Random Forest / XGBoost (adjust to match your model)
The model outputs the probability of a claim.
Evaluation Metric
Competition metric:
Normalized Gini Coefficient
Why Gini?
It measures how well the model ranks high-risk drivers above low-risk drivers, so it depends only on the ordering of the predicted probabilities, not on their absolute values.
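A minimal pure-Python sketch of one common formulation of the normalized Gini is shown below. The function names are illustrative, and ties in the predictions are broken by original row order, which is an assumption of this sketch. For binary targets the result matches 2 * AUC - 1 when there are no tied predictions.

```python
def gini(actual, pred):
    """Unnormalized Gini: rank rows by prediction, accumulate actual claims."""
    n = len(actual)
    total = sum(actual)
    if n == 0 or total == 0:
        raise ValueError("need at least one row and one positive target")
    # Highest predictions first; ties keep their original order.
    order = sorted(range(n), key=lambda i: (-pred[i], i))
    cumulative = 0.0
    gini_sum = 0.0
    for i in order:
        cumulative += actual[i]
        gini_sum += cumulative / total
    # Subtract the value expected from a random ordering.
    gini_sum -= (n + 1) / 2.0
    return gini_sum / n


def normalized_gini(actual, pred):
    """Scale by the Gini of a perfect ranking, so 1.0 is the best score."""
    return gini(actual, pred) / gini(actual, actual)


actual = [0, 0, 1, 1]
print(normalized_gini(actual, [0.1, 0.2, 0.8, 0.9]))    # perfect ranking
print(normalized_gini(actual, [0.1, 0.4, 0.35, 0.8]))   # one pair out of order
print(normalized_gini(actual, [0.9, 0.8, 0.2, 0.1]))    # fully reversed ranking
```

A perfect ranking scores 1.0, a fully reversed one scores -1.0, and a random ordering scores about 0.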
Streamlit App
The app allows users to:
- Enter driver & vehicle features
- Get a real-time claim probability prediction
Output
- Claim probability
- Risk interpretation
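How the probability is turned into a risk interpretation is not specified here. The sketch below assumes a hypothetical helper, `interpret_risk`, with two placeholder cut-offs (0.05 and 0.15) that would need tuning against the real model's output distribution.

```python
def interpret_risk(probability, low_cutoff=0.05, high_cutoff=0.15):
    """Map a claim probability to a coarse, human-readable risk label.

    The cut-offs are hypothetical placeholders, not values taken from the app.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cutoff:
        return "low risk"
    if probability < high_cutoff:
        return "medium risk"
    return "high risk"


for p in (0.02, 0.08, 0.40):
    print(f"claim probability {p:.0%}: {interpret_risk(p)}")
```

In a Streamlit app, the label could be displayed next to the probability, for example with `st.metric` or `st.write`.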