# Fine-Tuned LLM for Credit Card Fraud Detection 💳⚠️
This model is a QLoRA-fine-tuned version of microsoft/Phi-3-mini-4k-instruct, trained to classify credit-card transactions as fraudulent (1) or legitimate (0) using textual prompts generated from tabular data.
The goal was to teach an LLM to handle highly imbalanced binary classification, where fraud cases are extremely rare.
## Model Details

- Base Model: microsoft/Phi-3-mini-4k-instruct
- Fine-tuning Method: QLoRA
- Task: Binary fraud classification via text prompts
- Language: English
- Dataset: https://huggingface.co/datasets/David-Egea/Creditcard-fraud-detection
- License: MIT (base model: see Microsoft license)
- Intended Use: Fraud detection experimentation and research
## Training Summary

### Input Format
Each row of the credit-card dataset was converted into a natural-language prompt describing the transaction.
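A minimal sketch of that conversion step. The exact template used during training is not documented here, so the field names and wording below are illustrative assumptions:

```python
# Sketch of the tabular-to-text conversion (the exact training template
# is an assumption; adapt the field names to the dataset's columns).
def row_to_prompt(row: dict) -> str:
    """Turn one transaction record into a natural-language prompt."""
    features = ", ".join(f"{k}={v}" for k, v in row.items() if k != "Class")
    return (
        f"Transaction: {features}. "
        "Is this transaction fraudulent? Answer 0 (legitimate) or 1 (fraud)."
    )

example = {"Amount": 650.23, "Time": 45900, "V1": -1.36, "Class": 0}
print(row_to_prompt(example))
```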
### Class Imbalance Handling

- Original fraud ratio: ~0.17%
- Frauds were upsampled to ~10% of the training data to avoid majority-class collapse.
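The upsampling step can be sketched as follows; this is an assumed implementation (random replication with replacement), not necessarily the exact procedure used:

```python
import random

def upsample_minority(rows, label_key="Class", target_ratio=0.10, seed=0):
    """Replicate minority-class (fraud) rows until they make up
    roughly target_ratio of the combined dataset."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    # Fraud count needed so that fraud / (fraud + legit) ≈ target_ratio
    needed = int(target_ratio * len(legit) / (1 - target_ratio))
    extra = [rng.choice(fraud) for _ in range(max(0, needed - len(fraud)))]
    data = legit + fraud + extra
    rng.shuffle(data)
    return data
```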
### Hardware + Method
- 1× NVIDIA T4 / A100 (Colab)
- QLoRA 4-bit fine-tuning
- fp16 inference
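A representative QLoRA setup for Phi-3 is sketched below. The hyperparameters (rank, alpha, dropout, target modules) are assumptions for illustration, not the exact values used for this checkpoint:

```python
# Config sketch only — requires a GPU plus the transformers, peft and
# bitsandbytes packages. Hyperparameters here are assumed, not reported.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="float16",    # fp16 compute, matching inference
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],  # Phi-3 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```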
## Evaluation

### Performance (N=5,000 test samples)
| Metric | Score |
|---|---|
| Accuracy | 0.993 |
| Precision | 0.273 |
| Recall | 0.923 ✅ |
| F1 Score | 0.421 |
### Confusion Matrix
- ✅ 12/13 fraud cases detected
- ❌ 1 fraud missed
- ⚠️ 32 false positives
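The reported metrics can be re-derived directly from these confusion-matrix counts (TP=12, FN=1, FP=32, with TN inferred from N=5,000):

```python
# Sanity-check the metrics table against the confusion matrix.
TP, FN, FP = 12, 1, 32
TN = 5000 - TP - FN - FP  # 4,955 true negatives

accuracy = (TP + TN) / 5000
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# → 0.993 0.273 0.923 0.421
```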
### ROC-AUC

~0.55 — expected, since the LLM's classification logits are not probability-calibrated, so ranking-based metrics stay weak even when hard-label recall is high.
**Key Takeaway:** The baseline model detected 0 fraud cases while still scoring high accuracy (the "accuracy illusion" of always predicting the majority class). Fine-tuning combined with class balancing enabled real fraud detection with high recall.
## Intended Uses

### Appropriate Use
- Research in LLMs for tabular→text classification
- Imbalanced classification experimentation
- Educational fraud-detection modeling
### Not Intended For
⚠️ Production financial fraud systems
⚠️ Real-world credit scoring or loan decisions
⚠️ Any high-stakes decision without calibration + supervision
## Bias, Risks & Limitations
- LLM outputs are not probability-calibrated
- Some false positives expected (trade-off for high recall)
- Not trained on personal or sensitive real data
- Not guaranteed to generalize to other fraud domains
## Example Usage

```python
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

# Replace with the actual repo id of this fine-tuned checkpoint
model_id = "YOUR_USERNAME/YOUR_MODEL_NAME"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
pipe = pipeline("text-generation", model=model, tokenizer=tok)

prompt = (
    "Transaction: amount=$650.23, time=12:45pm, category=electronics, "
    "location=NY. Should we flag it as fraud?"
)
# Greedy decoding; a couple of new tokens suffice for a 0/1 answer
print(pipe(prompt, max_new_tokens=2, do_sample=False))
```
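To turn the generated continuation into a hard 0/1 prediction, a small parsing helper is useful. The answer format depends on the fine-tuning prompts, so this helper (`parse_label`) is an illustrative assumption:

```python
# Hypothetical helper: map the model's free-text continuation to a 0/1 label.
# The accepted answer prefixes are assumptions about the fine-tuned format.
def parse_label(generated: str) -> int:
    """Return 1 if the continuation indicates fraud, else 0."""
    answer = generated.strip().lower()
    return 1 if answer.startswith(("1", "yes", "fraud")) else 0

print(parse_label("1"))  # → 1
print(parse_label("0"))  # → 0
```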