Upload 2 files
README.md
ADDED
@@ -0,0 +1,32 @@
---
title: Credit Card Fraud Detection with DuckDB
emoji: 💳
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.10"
app_file: app.py
pinned: false
---

# Credit Card Fraud Detection with DuckDB and Medallion Architecture

This project demonstrates an end-to-end pipeline for credit card fraud detection. It uses DuckDB to process data in a Medallion Architecture (Bronze, Silver, Gold) and trains a Random Forest model to identify fraudulent transactions.

## Project Structure

- `data/`: Contains the raw CSV datasets (`fraudTrain.csv`, `fraudTest.csv`).
- `src/`: Contains the Python scripts for the data pipeline and model training.
  - `bronze.py`: Ingests raw data into the bronze layer.
  - `silver.py`: Cleans and transforms data for the silver layer.
  - `gold.py`: Creates aggregated features for the gold (analytics) layer.
  - `train.py`: Trains a `RandomForestClassifier` on the gold data and saves the model.
- `models/`: Directory where the trained model is saved.
- `requirements.txt`: Lists the required Python packages.
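
The layering described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it uses Python's built-in `sqlite3` module as a dependency-free stand-in for DuckDB, and the column names (`cc_num`, `amt`, `is_fraud`), the sample rows, and the `build_layers` helper are all assumptions made for the example.

```python
import sqlite3

# Hypothetical raw rows: (cc_num, amt, is_fraud). Column names are assumptions,
# not taken from the project's CSV files.
RAW_ROWS = [
    ("1111", "25.00", "0"),
    ("1111", "980.50", "1"),
    ("2222", "12.30", "0"),
    ("2222", None, "0"),  # missing amount, filtered out in the silver layer
    ("2222", "40.00", "0"),
]


def build_layers(conn, rows):
    """Create bronze, silver, and gold tables from raw rows (illustrative only)."""
    cur = conn.cursor()
    # Bronze: raw data stored as-is, every column kept as text.
    cur.execute("CREATE TABLE bronze (cc_num TEXT, amt TEXT, is_fraud TEXT)")
    cur.executemany("INSERT INTO bronze VALUES (?, ?, ?)", rows)
    # Silver: typed columns, rows with a missing amount removed.
    cur.execute(
        """
        CREATE TABLE silver AS
        SELECT cc_num,
               CAST(amt AS REAL) AS amt,
               CAST(is_fraud AS INTEGER) AS is_fraud
        FROM bronze
        WHERE amt IS NOT NULL
        """
    )
    # Gold: one row per card with simple aggregated features.
    cur.execute(
        """
        CREATE TABLE gold AS
        SELECT cc_num,
               COUNT(*) AS txn_count,
               AVG(amt) AS avg_amt,
               MAX(is_fraud) AS any_fraud
        FROM silver
        GROUP BY cc_num
        """
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_layers(conn, RAW_ROWS)
gold_rows = conn.execute(
    "SELECT cc_num, txn_count, avg_amt, any_fraud FROM gold ORDER BY cc_num"
).fetchall()
```

With DuckDB the same three steps would presumably run against the CSV files in `data/`, though the actual SQL in `bronze.py`, `silver.py`, and `gold.py` may differ from this sketch.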

## How to Run

1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
app.py
ADDED
@@ -0,0 +1,6 @@
import gradio as gr

def status():
    return "Credit Card Fraud Detection Pipeline is ready. Run training using src/train.py"

gr.Interface(fn=status, inputs=[], outputs="text").launch()