| title: Credit Card Fraud Detection with DuckDB | |
| emoji: 💳 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 6.4.0 | |
| python_version: '3.10' | |
| app_file: app.py | |
| pinned: false | |
| # Credit Card Fraud Detection with DuckDB and Medallion Architecture | |
| This project demonstrates an end-to-end pipeline for credit card fraud detection. It uses DuckDB to process data in a Medallion Architecture (Bronze, Silver, Gold) and trains a Random Forest model to identify fraudulent transactions. | |
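As a rough illustration of the Bronze, Silver, Gold flow described above, here is a minimal sketch that builds all three layers as tables in an in-memory database. The table names (`bronze_transactions`, `silver_transactions`, `gold_card_features`), the columns (`cc_num`, `amt`, `is_fraud`), and the sample rows are assumptions made for the example, not taken from the project's scripts. It uses DuckDB when the package is installed and falls back to the standard-library `sqlite3` as a stand-in so the sketch runs anywhere; the SQL is kept to statements both engines accept.

```python
try:
    import duckdb as db  # engine used by this project
except ImportError:
    import sqlite3 as db  # stand-in so the sketch runs without DuckDB installed

con = db.connect(":memory:")

# Bronze: raw rows exactly as ingested, every column kept as text.
con.execute(
    "CREATE TABLE bronze_transactions (cc_num VARCHAR, amt VARCHAR, is_fraud VARCHAR)"
)
con.executemany(
    "INSERT INTO bronze_transactions VALUES (?, ?, ?)",
    [
        ("1001", "10.0", "0"),
        ("1001", "30.0", "0"),
        ("2002", "500.0", "1"),
        ("2002", None, "0"),  # missing amount, dropped in the silver layer
    ],
)

# Silver: typed columns, rows with a missing amount removed.
con.execute(
    """
    CREATE TABLE silver_transactions AS
    SELECT cc_num,
           CAST(amt AS DOUBLE) AS amt,
           CAST(is_fraud AS INTEGER) AS is_fraud
    FROM bronze_transactions
    WHERE amt IS NOT NULL
    """
)

# Gold: one row of aggregated features per card (hypothetical feature set).
con.execute(
    """
    CREATE TABLE gold_card_features AS
    SELECT cc_num,
           COUNT(*) AS txn_count,
           AVG(amt) AS avg_amt,
           MAX(is_fraud) AS any_fraud
    FROM silver_transactions
    GROUP BY cc_num
    """
)

rows = con.execute(
    "SELECT cc_num, txn_count, avg_amt, any_fraud "
    "FROM gold_card_features ORDER BY cc_num"
).fetchall()
print(rows)
```

Keeping each layer as its own table means a later stage can be rebuilt from the one before it without re-reading the raw CSV files.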
| ## Project Structure | |
| - `data/`: Contains the raw CSV datasets (`fraudTrain.csv`, `fraudTest.csv`). | |
| - `src/`: Contains the Python scripts for the data pipeline and model training. | |
|   - `bronze.py`: Ingests raw data into the bronze layer. | |
|   - `silver.py`: Cleans and transforms data for the silver layer. | |
|   - `gold.py`: Creates aggregated features for the gold (analytics) layer. | |
|   - `train.py`: Trains a `RandomForestClassifier` on the gold data and saves the model. | |
| - `models/`: Directory where the trained model is saved. | |
| - `requirements.txt`: Lists the required Python packages. | |
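The train-and-save step listed for `train.py` might look roughly like the sketch below. It is not the project's script: the two features (`txn_count`, `avg_amt`), the synthetic labels, the hyperparameters, and the file name `fraud_rf.joblib` are all assumptions for illustration, and the model is written to a temporary directory rather than the real `models/` folder. It assumes `scikit-learn`, `numpy`, and `joblib` are installed.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical gold-layer features: transaction count and average amount per card.
X = np.column_stack([rng.integers(1, 50, n), rng.gamma(2.0, 40.0, n)])
# Synthetic label for the sketch only: cards with a high average amount count as fraud.
y = (X[:, 1] > 150).astype(int)

# class_weight="balanced" is one common way to handle the rarity of fraud cases.
model = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=0
)
model.fit(X, y)

# Save to a temporary stand-in for the models/ directory, then reload to check it.
model_dir = Path(tempfile.mkdtemp()) / "models"
model_dir.mkdir()
model_path = model_dir / "fraud_rf.joblib"
joblib.dump(model, model_path)

reloaded = joblib.load(model_path)
preds = reloaded.predict(X[:5])
print(preds)
```

Reloading the saved file and predicting with it is a cheap check that the artifact in `models/` is usable by the app.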
| ## How to Run | |
| 1. **Install dependencies:** | |
| ```bash | |
| pip install -r requirements.txt |