Fine-Tuned LLM for Credit Card Fraud Detection 💳⚠️

This model is a QLoRA-fine-tuned version of microsoft/Phi-3-mini-4k-instruct, trained to classify credit-card transactions as fraudulent (1) or legitimate (0) using textual prompts generated from tabular data.

The goal was to teach an LLM to handle highly imbalanced binary classification, where fraud cases are extremely rare.


Model Details


Training Summary

Input Format

Each row of the credit-card dataset was converted into a natural-language prompt describing the transaction.
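The exact prompt template is not published in this card, so the sketch below is a hypothetical illustration of the tabular-to-text step; the field names (`Amount`, `Time`, `V1`) and the label column name `Class` are assumptions based on the standard credit-card fraud dataset.

```python
# Hypothetical sketch: render one tabular transaction as a prompt.
# Field names and wording are assumed, not taken from the actual training code.
def row_to_prompt(row: dict) -> str:
    """Describe a transaction row in natural language, excluding the label."""
    features = ", ".join(f"{k}={v}" for k, v in row.items() if k != "Class")
    return (
        f"Transaction: {features}. "
        "Is this transaction fraudulent? Answer 1 for fraud, 0 for legitimate."
    )

example = {"Amount": 650.23, "Time": 4572, "V1": -1.36, "Class": 0}
print(row_to_prompt(example))
```

The label column is deliberately dropped from the prompt so the model must predict it rather than read it.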

Class Imbalance Handling

  • Original fraud ratio: ~0.17%
  • Frauds upsampled to ~10% of the training set to avoid majority-class collapse
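The exact upsampling procedure is not given in the card; the following is a minimal sketch, assuming simple duplication of minority-class rows with replacement until frauds reach the target ratio.

```python
import random

# Hypothetical sketch of the class-balancing step: duplicate fraud rows
# (with replacement) until they make up ~10% of the training set.
def upsample_frauds(rows, target_ratio=0.10, seed=0):
    rng = random.Random(seed)
    frauds = [r for r in rows if r["Class"] == 1]
    legits = [r for r in rows if r["Class"] == 0]
    # Solve n_fraud / (n_fraud + n_legit) = target_ratio for n_fraud.
    needed = int(target_ratio * len(legits) / (1 - target_ratio))
    frauds = frauds + [rng.choice(frauds) for _ in range(max(0, needed - len(frauds)))]
    mixed = legits + frauds
    rng.shuffle(mixed)
    return mixed

data = [{"Class": 1}] * 2 + [{"Class": 0}] * 998  # ~0.2% fraud, like the raw data
balanced = upsample_frauds(data)
ratio = sum(r["Class"] for r in balanced) / len(balanced)
print(f"fraud ratio after upsampling: {ratio:.2%}")
```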

Hardware + Method

  • 1× NVIDIA T4 / A100 (Colab)
  • QLoRA 4-bit fine-tuning
  • fp16 inference
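The card does not publish the LoRA hyperparameters, so the configuration below is a hedged sketch of a typical QLoRA setup for Phi-3-mini; `r`, `lora_alpha`, dropout, and the target modules are assumptions, not the values actually used.

```python
# Sketch of a QLoRA (4-bit) fine-tuning setup for Phi-3-mini.
# All hyperparameter values here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights (the "Q" in QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,    # matches the fp16 inference noted above
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed values
    target_modules=["qkv_proj", "o_proj"],   # Phi-3 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only the LoRA adapters are trained
```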

Evaluation

Performance (N=5,000 test samples)

| Metric    | Score    |
|-----------|----------|
| Accuracy  | 0.993    |
| Precision | 0.273    |
| Recall    | 0.923 ✅ |
| F1 Score  | 0.421    |

Confusion Matrix

  • 12/13 fraud cases detected
  • 1 fraud missed
  • ⚠️ 32 false positives
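These confusion counts reproduce the reported metrics exactly, which is a useful sanity check:

```python
# Recompute the reported metrics from the confusion counts above:
# 12 true positives, 1 false negative, 32 false positives, N = 5,000.
tp, fn, fp, n = 12, 1, 32, 5000
tn = n - tp - fn - fp

precision = tp / (tp + fp)                          # 12 / 44
recall = tp / (tp + fn)                             # 12 / 13
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / n

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
# → precision=0.273 recall=0.923 f1=0.421 accuracy=0.993
```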

ROC-AUC

~0.55 (expected: generation-based classification does not produce probability-calibrated scores, so ranking metrics like ROC-AUC are weak even when recall is high)

Key Takeaway:
The untuned baseline predicted the majority class for every transaction, detecting 0 fraud cases while still scoring high accuracy (the accuracy illusion).
Fine-tuning plus class balancing enabled real fraud detection with high recall.


Intended Uses

Appropriate Use

  • Research in LLMs for tabular→text classification
  • Imbalanced classification experimentation
  • Educational fraud-detection modeling

Not Intended For

⚠️ Production financial fraud systems
⚠️ Real-world credit scoring or loan decisions
⚠️ Any high-stakes decision without calibration + supervision


Bias, Risks & Limitations

  • LLM outputs are not probability-calibrated
  • Some false positives expected (trade-off for high recall)
  • Not trained on personal or sensitive real data
  • Not guaranteed to generalize to other fraud domains

Example Usage

```python
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")
model = AutoModelForCausalLM.from_pretrained("YOUR_USERNAME/YOUR_MODEL_NAME")

pipe = pipeline("text-generation", model=model, tokenizer=tok)

prompt = (
    "Transaction: amount=$650.23, time=12:45pm, category=electronics, "
    "location=NY. Should we flag it as fraud?"
)

# Greedy decoding; the model answers with the class label (1 = fraud, 0 = legitimate).
print(pipe(prompt, max_new_tokens=2, do_sample=False))
```