---
title: Credit Card Fraud Detection with DuckDB
emoji: 💳
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.4.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Credit Card Fraud Detection with DuckDB and Medallion Architecture

This project demonstrates an end-to-end pipeline for credit card fraud detection. It uses DuckDB to process data in a Medallion Architecture (Bronze, Silver, Gold) and trains a Random Forest model to identify fraudulent transactions.

## Project Structure

- `data/`: Contains the raw CSV datasets (`fraudTrain.csv`, `fraudTest.csv`).
- `src/`: Contains the Python scripts for the data pipeline and model training.
  - `bronze.py`: Ingests raw data into the bronze layer.
  - `silver.py`: Cleans and transforms data for the silver layer.
  - `gold.py`: Creates aggregated features for the gold (analytics) layer.
  - `train.py`: Trains a `RandomForestClassifier` on the gold data and saves the model.
- `models/`: Directory where the trained model is saved.
- `requirements.txt`: Lists the required Python packages.

## How to Run

1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt