JAYASREESS committed b531b77 (verified) · 1 parent: d1fb1ab

Delete README.md

Files changed (1): README.md +0 -33
README.md DELETED
@@ -1,33 +0,0 @@
# Credit Card Fraud Detection with DuckDB and Medallion Architecture

This project demonstrates an end-to-end pipeline for credit card fraud detection. It uses DuckDB to process data in a Medallion Architecture (Bronze, Silver, Gold) and trains a Random Forest model to identify fraudulent transactions.

## Project Structure

- `data/`: Contains the raw CSV datasets (`fraudTrain.csv`, `fraudTest.csv`).
- `src/`: Contains the Python scripts for the data pipeline and model training.
    - `bronze.py`: Ingests raw data into the bronze layer.
    - `silver.py`: Cleans and transforms data for the silver layer.
    - `gold.py`: Creates aggregated features for the gold (analytics) layer.
    - `train.py`: Trains a `RandomForestClassifier` on the gold data and saves the model.
- `models/`: Directory where the trained model is saved.
- `requirements.txt`: Lists the required Python packages.
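The `train.py` step described above could look roughly like the sketch below. It assumes `RandomForestClassifier` comes from scikit-learn and that the model is persisted with `joblib`; the README states neither. The feature names (`amt`, `age`, `avg_merch_spend`), the helper `train_and_save`, the file name `fraud_rf.joblib`, and the synthetic data are all hypothetical stand-ins for the real Gold table, not the project's code.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical Gold-layer feature columns; the real gold.py output may differ.
FEATURES = ["amt", "age", "avg_merch_spend"]


def train_and_save(X: np.ndarray, y: np.ndarray, model_path: str) -> RandomForestClassifier:
    """Fit a random forest on Gold-layer features and persist it to model_path."""
    model = RandomForestClassifier(
        n_estimators=100,
        class_weight="balanced",  # fraud is rare, so weight classes inversely to frequency
        random_state=42,
    )
    model.fit(X, y)
    # Create the target directory (e.g. models/) if it does not exist yet.
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    joblib.dump(model, model_path)
    return model


# Synthetic stand-in for the Gold table: one row per transaction, columns in FEATURES order.
rng = np.random.default_rng(0)
n = 400
amt = rng.exponential(70.0, n)
age = rng.integers(18, 90, n)
avg_merch_spend = rng.normal(70.0, 10.0, n)
X = np.column_stack([amt, age, avg_merch_spend])
y = (amt > 200).astype(int)  # toy label: unusually large amounts are marked as fraud

# Write into a temporary models/ directory so the sketch leaves no files behind.
model_path = os.path.join(tempfile.mkdtemp(), "models", "fraud_rf.joblib")
model = train_and_save(X, y, model_path)
print(dict(zip(FEATURES, model.feature_importances_.round(3))))
```

With the real data, scoring the saved model on `fraudTest.csv` (processed through the same layers) gives a far more meaningful estimate than accuracy on the training rows, and for a rare positive class precision and recall are more informative than accuracy.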
## How to Run

1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

2. **Run the training pipeline:**
   This command runs the entire data pipeline (Bronze, Silver, Gold) and then trains the model.
   ```bash
   python src/train.py
   ```

## Medallion Architecture

- **Bronze Layer**: Raw, unfiltered data ingested directly from the source CSVs.
- **Silver Layer**: Cleaned and transformed data. Timestamps are corrected, and new features such as the cardholder's `age` are derived.
- **Gold Layer**: Analytics-ready data with aggregated features (e.g., `avg_merch_spend`) suitable for machine learning.