	---
	tags:
	- fraud-detection
	- credit-card
	- lightgbm
	- binary-classification
	library_name: sklearn
	---

	# Credit Card Fraud Classifier (LightGBM)

	## Model Description

	This is a LightGBM-based binary classifier trained to detect fraudulent credit card transactions.

	## Dataset

	- Source: ULB/Kaggle Credit Card Fraud Dataset
	- Timeframe: 2 days of transactions
	- Positive Rate: 0.172% (highly imbalanced)
	- Features: Amount + V1-V28 (PCA-transformed features)
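To make the imbalance concrete, here is a small arithmetic sketch. The two counts are taken from the public ULB dataset description, not from this card, so treat them as an assumption. They give the roughly 0.17% positive rate quoted above, and they show why plain accuracy is a poor metric here.

```python
# Counts from the public ULB dataset description (assumption: not stated in this card).
N_TRANSACTIONS = 284_807
N_FRAUD = 492

# Share of transactions labeled as fraud.
positive_rate = N_FRAUD / N_TRANSACTIONS

# Accuracy of a trivial classifier that labels every transaction as non-fraud.
all_negative_accuracy = 1 - positive_rate

print(f"positive rate: {positive_rate:.3%}")
print(f"all-negative baseline accuracy: {all_negative_accuracy:.3%}")
```

A classifier that never flags anything already scores above 99.8% accuracy, which is why the card talks about false positive rate and recall instead.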

	## Model Details

	- Algorithm: LightGBM Classifier
	- Task: Binary classification (Fraud vs Non-fraud)
	- Threshold: Calibrated to 0.1% FPR (False Positive Rate) cap
	- Input Features: 29 features (Amount + V1 through V28)
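The card does not say how the threshold was calibrated, so the following is only a minimal sketch of one way to pick a threshold under an FPR cap. It assumes you have model scores for legitimate transactions from a held-out validation set. The helper name `threshold_at_fpr_cap` and the synthetic scores are hypothetical and exist only for illustration.

```python
import math
import random
from collections import Counter


def threshold_at_fpr_cap(neg_scores, max_fpr):
    """Return the lowest threshold whose FPR on neg_scores stays within max_fpr.

    A transaction is flagged when score >= threshold, so the FPR is the share
    of legitimate (negative) scores at or above the threshold.
    """
    allowed = math.floor(max_fpr * len(neg_scores))  # max false positives permitted
    counts = Counter(neg_scores)
    threshold = math.inf  # fallback: flag nothing if even one false positive is too many
    flagged = 0
    # Walk distinct scores from highest to lowest, lowering the threshold
    # while the number of flagged negatives stays within the budget.
    for score in sorted(counts, reverse=True):
        flagged += counts[score]
        if flagged > allowed:
            break
        threshold = score
    return threshold


# Synthetic stand-in for validation scores of legitimate transactions.
random.seed(0)
neg_scores = [random.betavariate(1, 20) for _ in range(20_000)]

threshold = threshold_at_fpr_cap(neg_scores, max_fpr=0.001)  # 0.1% cap
observed_fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
print(f"threshold={threshold:.4f}, observed FPR={observed_fpr:.4%}")
```

Tied scores are handled as a group, so the cap is never exceeded. With scikit-learn available, `sklearn.metrics.roc_curve` returns the FPR at each candidate threshold and could serve the same purpose.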

	## Usage

	```python
	import json

	import joblib
	import pandas as pd
	from huggingface_hub import hf_hub_download

	REPO_ID = "yahiaehab10/fraud-ccf-lightgbm"

	# Download and load the fitted pipeline
	model_path = hf_hub_download(repo_id=REPO_ID, filename="pipeline.pkl")
	pipeline = joblib.load(model_path)

	# Download the calibrated decision threshold
	threshold_path = hf_hub_download(repo_id=REPO_ID, filename="threshold.json")
	with open(threshold_path) as f:
	    threshold = json.load(f)["threshold"]

	# Make predictions
	# X must be a pandas DataFrame with columns: Amount, V1, V2, ..., V28
	probabilities = pipeline.predict_proba(X)[:, 1]  # probability of the fraud class
	predictions = (probabilities >= threshold).astype(int)  # 1 = fraud, 0 = non-fraud
	```
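Before calling `predict_proba`, it can help to confirm that the input carries all 29 expected columns. The sketch below assumes the pipeline expects `Amount` first, followed by `V1` through `V28`. That order is a guess based on how the features are listed in this card. `check_feature_columns` is a hypothetical helper, not part of the published files.

```python
# Assumed column order: Amount, then V1..V28 (taken from the feature list in this card).
EXPECTED_COLUMNS = ["Amount"] + [f"V{i}" for i in range(1, 29)]


def check_feature_columns(columns):
    """Return the expected columns in order, or raise if any are missing.

    Extra columns (for example Time or Class in the raw CSV) are ignored.
    """
    present = set(columns)
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    return list(EXPECTED_COLUMNS)


# Example with raw-dataset-style column names; Time and Class are dropped.
raw_columns = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]
selected = check_feature_columns(raw_columns)
print(len(selected), selected[:3])
```

With a pandas DataFrame `df`, the selection would then be `X = df[check_feature_columns(df.columns)]`.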

	## Performance

	The decision threshold caps the false positive rate at 0.1%, so the model favors few false alarms and keeps as much recall on fraud cases as that cap allows.
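The two quantities traded off here, recall and false positive rate, can be computed from labels and thresholded predictions. The sketch below uses made-up toy labels, not results from this model, and `recall_and_fpr` is a hypothetical helper name.

```python
def recall_and_fpr(y_true, y_pred):
    """Compute recall (TP / (TP + FN)) and false positive rate (FP / (FP + TN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return recall, fpr


# Toy data for illustration only: 2 fraud cases and 8 legitimate ones.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

recall, fpr = recall_and_fpr(y_true, y_pred)
print(f"recall={recall:.3f}, FPR={fpr:.3f}")
```

Raising the threshold lowers the FPR but usually lowers recall as well, so both should be reported together on held-out data.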

	## Limitations

	- Educational purposes only; not intended for production use
	- Trained on historical data, so it may not generalize to future fraud patterns
	- Highly imbalanced dataset, so the threshold requires careful calibration

	## License

	Educational use only. Please refer to the original dataset license.