---
title: Credit Card Fraud Detection with DuckDB
emoji: 💳
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.4.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Credit Card Fraud Detection with DuckDB and Medallion Architecture

This project demonstrates an end-to-end pipeline for credit card fraud detection. It uses DuckDB to process data in a Medallion Architecture (Bronze, Silver, Gold) and trains a Random Forest model to identify fraudulent transactions.

## Project Structure

- `data/`: Contains the raw CSV datasets (`fraudTrain.csv`, `fraudTest.csv`).
- `src/`: Contains the Python scripts for the data pipeline and model training.
  - `bronze.py`: Ingests raw data into the bronze layer.
  - `silver.py`: Cleans and transforms data for the silver layer.
  - `gold.py`: Creates aggregated features for the gold (analytics) layer.
  - `train.py`: Trains a `RandomForestClassifier` on the gold data and saves the model.
- `models/`: Directory where the trained model is saved.
- `requirements.txt`: Lists the required Python packages.
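
The training step that `train.py` is described as performing could look roughly like the sketch below, which fits a `RandomForestClassifier` and writes it to disk. The feature columns, toy data, hyperparameters, `pickle` serialization, and temporary output path are all assumptions; the real script may use different features and a different save format under `models/`.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for gold-layer features: [n_tx, avg_amt] per card (hypothetical columns).
X = [[2, 456.25], [1, 40.0], [3, 15.0], [5, 1200.0], [4, 22.0], [2, 980.0]]
y = [1, 0, 0, 1, 0, 1]  # 1 = fraudulent, 0 = legitimate

# A fixed random_state makes the fitted forest reproducible between runs.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Save to a temporary directory here; the project saves under models/ instead.
model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Reload the model and confirm it yields one prediction per input row.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

preds = restored.predict(X)
print(list(preds))
```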

## How to Run

1. **Install dependencies:**
```bash
pip install -r requirements.txt