---
tags:
- fraud-detection
- credit-card
- lightgbm
- binary-classification
library_name: sklearn
---

# Credit Card Fraud Classifier (LightGBM)

## Model Description

This is a LightGBM-based binary classifier trained to detect fraudulent credit card transactions.

## Dataset

- **Source**: ULB/Kaggle Credit Card Fraud Dataset
- **Timeframe**: 2 days of transactions
- **Positive Rate**: 0.172% (highly imbalanced)
- **Features**: Amount + V1-V28 (PCA-transformed features)
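
To make the imbalance concrete, the sketch below recomputes the positive rate and the accuracy of a trivial baseline that never flags fraud. The counts (492 fraud cases among 284,807 transactions) are the figures commonly cited for this dataset; they are not stated in this card and are treated here as an assumption.

```python
# Assumed counts for the ULB/Kaggle dataset (not stated in this card).
N_TRANSACTIONS = 284_807
N_FRAUD = 492

positive_rate = N_FRAUD / N_TRANSACTIONS

# A classifier that never flags fraud is correct on every non-fraud row,
# so its accuracy equals the share of non-fraud transactions.
baseline_accuracy = 1 - positive_rate

print(f"Positive rate: {positive_rate:.3%}")
print(f"Always-non-fraud accuracy: {baseline_accuracy:.3%}")
```

Because the do-nothing baseline already scores very high on accuracy, metrics such as recall and false positive rate are more informative for this task.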

## Model Details

- **Algorithm**: LightGBM Classifier
- **Task**: Binary classification (Fraud vs Non-fraud)
- **Threshold**: Calibrated to a 0.1% false positive rate (FPR) cap
- **Input Features**: 29 features (Amount + V1 through V28)
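
This card does not describe how the threshold was derived. As an illustration only, one plausible way to calibrate a threshold to an FPR cap on a held-out set is sketched below. The function name `threshold_for_fpr_cap` and its arguments are hypothetical, and the sketch assumes predictions are made with `score >= threshold`.

```python
import math


def threshold_for_fpr_cap(scores, labels, fpr_cap=0.001):
    """Pick the lowest threshold that keeps the FPR on this data at or below fpr_cap.

    scores: predicted fraud probabilities; labels: 1 for fraud, 0 for non-fraud.
    Requires Python 3.9+ for math.nextafter.
    """
    # Only non-fraud scores matter for the false positive rate.
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not negatives:
        raise ValueError("need at least one non-fraud example")

    # Number of non-fraud rows we may flag without exceeding the cap.
    allowed_fp = math.floor(fpr_cap * len(negatives))
    if allowed_fp >= len(negatives):
        return -math.inf  # the cap permits flagging every row

    # The (allowed_fp + 1)-th highest non-fraud score must fall below the
    # threshold, so step to the next float just above it.
    return math.nextafter(negatives[allowed_fp], math.inf)
```

The threshold should be fit on data the model was not trained on; otherwise the realized FPR on new transactions may exceed the cap.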

## Usage

```python
import json

import joblib
import pandas as pd
from huggingface_hub import hf_hub_download

REPO_ID = "yahiaehab10/fraud-ccf-lightgbm"

# Download and load the fitted pipeline
model_path = hf_hub_download(repo_id=REPO_ID, filename="pipeline.pkl")
pipeline = joblib.load(model_path)

# Download and read the decision threshold
threshold_path = hf_hub_download(repo_id=REPO_ID, filename="threshold.json")
with open(threshold_path) as f:
    threshold = json.load(f)["threshold"]

# Make predictions
# X should be a pandas DataFrame with columns: Amount, V1, V2, ..., V28
probabilities = pipeline.predict_proba(X)[:, 1]  # probability of the fraud class
predictions = (probabilities >= threshold).astype(int)  # 1 = fraud, 0 = non-fraud
```
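
Before calling `predict_proba`, it can help to check that the input carries all 29 features. The sketch below is a hypothetical helper (`check_columns` is not part of this repository) and assumes the column order listed above, `Amount` followed by `V1` through `V28`; the saved pipeline may expect a different order.

```python
# Assumed feature order, taken from the comment in the usage snippet.
EXPECTED_COLUMNS = ["Amount"] + [f"V{i}" for i in range(1, 29)]


def check_columns(columns):
    """Raise if any expected feature is missing; return the expected order."""
    missing = [c for c in EXPECTED_COLUMNS if c not in columns]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    return EXPECTED_COLUMNS


# Hypothetical usage with a pandas DataFrame `X`:
# X = X[check_columns(X.columns)]
```

Selecting by the returned list also drops extra columns such as `Time` and `Class`, which are present in the raw Kaggle CSV but are not model inputs.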

## Performance

The model is tuned to keep false positives low (the decision threshold enforces a 0.1% FPR cap) while retaining as much recall on fraud cases as that cap allows.
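
To check this trade-off on your own labeled data, the following sketch computes the false positive rate, recall, and precision from true labels and thresholded predictions. The helper name `fraud_metrics` is hypothetical, and no results from this model are implied.

```python
def fraud_metrics(y_true, y_pred):
    """Return FPR, recall, and precision for binary labels (1 = fraud)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return {
        # Share of non-fraud rows that were wrongly flagged.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of fraud rows that were caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of flagged rows that were actually fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }
```

With the usage snippet's outputs, this could be called as `fraud_metrics(y, predictions)`, where `y` is a hypothetical array of ground-truth labels.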

## Limitations

- **Educational purposes only** - Not intended for production use
- Trained on historical data - may not generalize to future fraud patterns
- Highly imbalanced dataset - requires careful threshold calibration

## License

Educational use only. Please refer to the original dataset license.