# Real-Time Credit Card Fraud Detection System

A production-grade machine learning system for detecting fraudulent credit card transactions in real time using a Random Forest classifier.
## Features

- High Accuracy: 99%+ fraud detection rate with a <1% false alarm rate on the synthetic test set
- Real-Time Processing: <5ms prediction latency per transaction
- Scalable: Processes 10,000+ transactions/second in batch mode
- Production-Ready: REST API for easy integration
- Model Persistence: Save and load trained models
- Large-Scale Training: Trained on 100,000 synthetic transactions by default (configurable)
## System Performance
| Metric | Value |
|---|---|
| Fraud Detection Rate | 99-100% |
| False Alarm Rate | <1% |
| Real-Time Latency | <5ms |
| Batch Throughput | 10,000+ txn/sec |
| ROC AUC Score | >0.99 |
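Assuming Fraud Detection Rate means recall on the fraud class and False Alarm Rate means the share of legitimate transactions that get flagged (the false positive rate), both come straight from confusion-matrix counts. A minimal sketch; `detection_metrics` is a hypothetical helper and the counts below are illustrative, not measured results from this system.

```python
def detection_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Headline metrics from confusion-matrix counts.

    tp: fraud correctly flagged      fn: fraud missed
    fp: legit wrongly flagged        tn: legit correctly approved
    """
    fraud_total = tp + fn
    legit_total = fp + tn
    flagged = tp + fp
    return {
        # Fraud Detection Rate = recall (true positive rate)
        "detection_rate": tp / fraud_total if fraud_total else 0.0,
        # False Alarm Rate = false positive rate
        "false_alarm_rate": fp / legit_total if legit_total else 0.0,
        # Precision: share of flagged transactions that really are fraud
        "precision": tp / flagged if flagged else 0.0,
    }


# Illustrative counts for a 20,000-row test set at 3% fraud (600 fraud, 19,400 legit)
m = detection_metrics(tp=594, fn=6, fp=97, tn=19303)
print(m)
```

With fraud at 3% of traffic, even a 0.5% false positive rate produces 97 false alarms next to 594 caught frauds in this example, so precision (about 86% here) is worth tracking alongside the two headline numbers.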
## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```
### 2. Train the Model

```bash
python fraud_detection_realtime.py
```

This will:

- Generate 100,000 synthetic transactions
- Engineer 31 advanced features
- Train a Random Forest model
- Evaluate performance
- Save the model to `fraud_model.pkl`
Expected output:

```
REAL-TIME CREDIT CARD FRAUD DETECTION SYSTEM
Production-Grade ML System with Large-Scale Training

PHASE 1: MODEL TRAINING

Generating 100,000 transactions...
Generated 100,000 transactions in X.XX seconds
  - Legitimate: 97,000 (97.0%)
  - Fraudulent: 3,000 (3.0%)

Engineering advanced features...
Created 31 total features

Training production-grade fraud detection model...
Training set: 80,000 transactions
Test set: 20,000 transactions
Training Random Forest (this may take a minute)...
Model trained in XX.XX seconds
Model saved to 'fraud_model.pkl'
```

### 3. Start the API Server

```bash
python fraud_api.py
```
The API will start on http://localhost:5000
### 4. Test the API

In a new terminal:

```bash
python test_api.py
```
## API Documentation

### Endpoints

#### 1. Health Check

```http
GET /health
```

Response:

```json
{
  "status": "healthy",
  "model_loaded": true,
  "timestamp": "2024-02-13T10:30:00"
}
```
#### 2. Model Information

```http
GET /model/info
```

Response:

```json
{
  "n_features": 31,
  "features": ["amount", "time_of_day", ...],
  "model_type": "RandomForestClassifier",
  "status": "ready"
}
```
#### 3. Single Transaction Prediction

```http
POST /predict
Content-Type: application/json

{
  "transaction_id": "TXN12345",
  "amount": 150.00,
  "time_of_day": 14.5,
  "day_of_week": 2,
  "distance_from_home": 10,
  "distance_from_last_transaction": 5,
  "time_since_last_transaction": 24,
  "num_transactions_today": 2,
  "num_transactions_last_week": 8,
  "merchant_category": 2,
  "is_online": 0,
  "card_present": 1,
  "is_international": 0,
  "avg_transaction_amount": 100,
  "account_age_days": 365
}
```

Response:

```json
{
  "transaction_id": "TXN12345",
  "fraud_probability": 0.05,
  "is_fraud": false,
  "risk_level": "MINIMAL",
  "decision": "APPROVE",
  "timestamp": "2024-02-13T10:30:00"
}
```
Risk levels:

- MINIMAL: <30% fraud probability
- LOW: 30-50% fraud probability
- MEDIUM: 50-70% fraud probability
- HIGH: 70-90% fraud probability
- CRITICAL: >90% fraud probability
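The risk bands can be expressed as a small lookup. This is a sketch, not necessarily how `fraud_api.py` implements it: `risk_level` and `RISK_BANDS` are hypothetical names, and placing a probability that sits exactly on a boundary into the higher band is an assumption chosen to match the `>=` comparison shown under Modify Risk Thresholds.

```python
# Upper bound (exclusive) of each band, ascending. Assumption: a value exactly
# on a boundary falls into the higher band, e.g. 0.30 -> "LOW".
RISK_BANDS = [
    (0.30, "MINIMAL"),
    (0.50, "LOW"),
    (0.70, "MEDIUM"),
    (0.90, "HIGH"),
]


def risk_level(fraud_probability: float) -> str:
    """Map a model probability in [0, 1] to a documented risk level."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError(f"probability out of range: {fraud_probability}")
    for upper, level in RISK_BANDS:
        if fraud_probability < upper:
            return level
    return "CRITICAL"  # 0.90 and above


print(risk_level(0.05), risk_level(0.95))  # MINIMAL CRITICAL
```

Keeping the thresholds in one ordered table means a change such as lowering CRITICAL to 0.8 touches a single value instead of an if/elif chain.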
#### 4. Batch Prediction

```http
POST /predict/batch
Content-Type: application/json

{
  "transactions": [
    {transaction1},
    {transaction2},
    ...
  ]
}
```

Response:

```json
{
  "total_transactions": 10,
  "fraud_detected": 2,
  "results": [
    {
      "transaction_id": "TXN001",
      "fraud_probability": 0.95,
      "is_fraud": true,
      "decision": "BLOCK"
    },
    ...
  ],
  "timestamp": "2024-02-13T10:30:00"
}
```
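When a client has more transactions than it wants to send in one request, it can split them across several `/predict/batch` calls and merge the responses. A minimal sketch; `batch_payloads`, `merge_batch_responses` and the 1,000-transaction batch size are hypothetical, and the HTTP call itself (for example `requests.post(url, json=payload)`) is left out so the logic stands alone.

```python
from typing import Iterable, Iterator


def batch_payloads(transactions: list, batch_size: int = 1000) -> Iterator[dict]:
    """Split transactions into request bodies for POST /predict/batch.

    batch_size=1000 is an assumption; no per-request limit is documented.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(transactions), batch_size):
        yield {"transactions": transactions[start:start + batch_size]}


def merge_batch_responses(responses: Iterable[dict]) -> dict:
    """Combine several /predict/batch responses into a single summary."""
    results = [r for resp in responses for r in resp["results"]]
    return {
        "total_transactions": len(results),
        "fraud_detected": sum(1 for r in results if r["is_fraud"]),
        "blocked_ids": [r["transaction_id"] for r in results if r["decision"] == "BLOCK"],
    }


# 2,500 placeholder transactions become three request bodies
txns = [{"transaction_id": f"TXN{i:04d}"} for i in range(2500)]
sizes = [len(p["transactions"]) for p in batch_payloads(txns)]
print(sizes)  # [1000, 1000, 500]
```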
## Feature Engineering

The system uses 31 features across 6 categories: the 14 raw fields sent in each prediction request plus 17 derived features.
### 1. Amount Features (5)

- `amount`: Raw transaction amount
- `amount_log`: Log-transformed amount
- `amount_zscore`: Z-score vs. the user's average
- `is_high_amount`: Boolean for amounts >95th percentile
- `is_round_amount`: Boolean for round amounts ($10, $50, etc.)
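The amount features above can be sketched as follows. The exact formulas in `fraud_detection_realtime.py` are not given in this README, so the log variant, the standard deviation used for the z-score, and the definition of "round" are assumptions, flagged in the comments; `amount_features` and `p95_threshold` are hypothetical helpers.

```python
import math
import statistics


def p95_threshold(training_amounts: list) -> float:
    """95th percentile of training amounts (needs at least two values)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    return statistics.quantiles(training_amounts, n=20)[-1]


def amount_features(amount: float, user_avg: float, user_std: float,
                    high_threshold: float) -> dict:
    """Sketch of the five amount features for a single transaction."""
    return {
        "amount": amount,
        # Assumption: log1p keeps the feature defined for a $0 amount
        "amount_log": math.log1p(amount),
        # Assumption: user_std is the spread of the user's past amounts
        "amount_zscore": (amount - user_avg) / user_std if user_std > 0 else 0.0,
        "is_high_amount": int(amount > high_threshold),
        # Assumption: "round" means a non-zero whole multiple of $10
        "is_round_amount": int(amount > 0 and amount % 10 == 0),
    }


threshold = p95_threshold(list(range(1, 101)))  # toy training amounts $1..$100
print(amount_features(150.0, user_avg=100.0, user_std=25.0, high_threshold=threshold))
```

The percentile threshold is computed once on training data and reused at prediction time, so a single request never needs the full dataset.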
### 2. Temporal Features (6)

- `time_of_day`: Hour of day (0-24, fractional; 14.5 = 2:30pm)
- `day_of_week`: Day (0=Monday to 6=Sunday)
- `is_night`: Late-night transactions (10pm-6am)
- `is_weekend`: Weekend transactions
- `is_business_hours`: Business hours (9am-5pm)
- `time_since_last_transaction`: Hours since the last transaction
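The boolean temporal flags follow directly from `time_of_day` and `day_of_week`. A sketch with a hypothetical helper; treating each window as start-inclusive and end-exclusive, and not restricting business hours to weekdays, are assumptions.

```python
def temporal_flags(time_of_day: float, day_of_week: int) -> dict:
    """Boolean temporal features from the raw inputs.

    time_of_day: fractional hour in [0, 24), e.g. 14.5 = 2:30pm
    day_of_week: 0=Monday .. 6=Sunday
    Assumption: each window includes its start hour and excludes its end hour.
    """
    return {
        # The night window wraps past midnight, so it is the union of two ranges
        "is_night": int(time_of_day >= 22 or time_of_day < 6),
        "is_weekend": int(day_of_week >= 5),  # Saturday (5) or Sunday (6)
        # Assumption: business hours are not restricted to weekdays here
        "is_business_hours": int(9 <= time_of_day < 17),
    }


print(temporal_flags(23.5, 5))  # {'is_night': 1, 'is_weekend': 1, 'is_business_hours': 0}
```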
### 3. Location Features (5)

- `distance_from_home`: Distance from home address (km)
- `distance_from_last_transaction`: Distance from the previous transaction (km)
- `location_velocity`: Speed of location change (km/h)
- `is_far_from_home`: Boolean for >50 km from home
- `unusual_location_change`: Boolean for >100 km jumps
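`location_velocity` presumably combines two of the raw inputs into a speed. A sketch under that assumption (distance from the last transaction divided by the hours since it); the one-minute floor on elapsed time and the helper name `location_features` are hypothetical.

```python
def location_features(distance_from_home: float,
                      distance_from_last_transaction: float,
                      time_since_last_transaction: float) -> dict:
    """Derived location features; distances in km, elapsed time in hours."""
    # Assumption: floor the elapsed time at one minute so back-to-back
    # transactions do not divide by zero
    hours = max(time_since_last_transaction, 1 / 60)
    return {
        "location_velocity": distance_from_last_transaction / hours,  # km/h
        "is_far_from_home": int(distance_from_home > 50),
        "unusual_location_change": int(distance_from_last_transaction > 100),
    }


# 800 km within one hour implies air travel or a cloned card
print(location_features(600, 800, 1.0))
```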
### 4. Velocity Features (5)

- `num_transactions_today`: Count of today's transactions
- `num_transactions_last_week`: Count in the last 7 days
- `rapid_transactions`: Boolean for <1 hour since the previous transaction
- `high_daily_frequency`: Boolean for >5 transactions today
- `high_weekly_frequency`: Boolean for >15 transactions this week
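The velocity counts arrive precomputed in the API request, but upstream they have to be derived from each card's transaction history. A sketch of one way to do it, assuming rolling 24-hour and 7-day windows (the README does not say whether "today" means the calendar day); `velocity_features` is hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def velocity_features(history: list, now: datetime) -> dict:
    """Velocity features from a card's earlier transaction times.

    history: datetimes of earlier transactions, sorted oldest first, all <= now.
    Assumption: "today" and "last week" are rolling 24-hour and 7-day windows.
    """
    # bisect_left finds the index of the first transaction inside each window
    n_today = len(history) - bisect_left(history, now - timedelta(days=1))
    n_week = len(history) - bisect_left(history, now - timedelta(days=7))
    hours_since_last = ((now - history[-1]).total_seconds() / 3600
                        if history else float("inf"))
    return {
        "num_transactions_today": n_today,
        "num_transactions_last_week": n_week,
        "time_since_last_transaction": hours_since_last,
        "rapid_transactions": int(hours_since_last < 1),
        "high_daily_frequency": int(n_today > 5),
        "high_weekly_frequency": int(n_week > 15),
    }


now = datetime(2024, 2, 13, 12, 0)
history = [now - timedelta(days=3), now - timedelta(hours=5), now - timedelta(minutes=30)]
print(velocity_features(history, now))
```

Binary search keeps each lookup at O(log n) per card, which matters when the history for busy cards grows large.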
### 5. Behavioral Features (7)

- `merchant_category`: Type of merchant (1-8)
- `is_online`: Online vs. in-store
- `card_present`: Physical card used
- `is_international`: International transaction
- `online_without_card`: Online + card not present
- `international_online`: International + online
- `new_account`: Account age <90 days
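The three behavioral interaction flags combine raw 0/1 inputs with the thresholds listed above. A minimal sketch with a hypothetical helper:

```python
def behavioral_features(is_online: int, card_present: int, is_international: int,
                        account_age_days: int) -> dict:
    """Interaction flags from the raw behavioral inputs (0/1 except account age)."""
    return {
        # Card-not-present online purchases are a common fraud channel
        "online_without_card": int(bool(is_online) and not card_present),
        "international_online": int(bool(is_international) and bool(is_online)),
        "new_account": int(account_age_days < 90),
    }


print(behavioral_features(is_online=1, card_present=0, is_international=1, account_age_days=30))
```

Interaction flags like these let a tree split on the combination in one step instead of needing two consecutive splits to isolate it.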
### 6. Account Features (3)

- `avg_transaction_amount`: User's average transaction amount
- `account_age_days`: Days since the account was opened
- `risk_score`: Composite risk indicator (0-15)
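## Model Architecture

Algorithm: Random Forest Classifier (scikit-learn `RandomForestClassifier`)

- Trees: 200 estimators (`n_estimators=200`)
- Max Depth: 15 levels (`max_depth=15`)
- Min Samples Split: 10 (`min_samples_split=10`)
- Min Samples Leaf: 5 (`min_samples_leaf=5`)
- Class Weighting: Balanced (`class_weight="balanced"`), which up-weights the rare fraud class to handle the imbalanced data
- Features per Split: square root of the total feature count (`max_features="sqrt"`), i.e. 5 of the 31 features are considered at each split

Training Data: 100,000 transactions (80% train, 20% test)

Feature Scaling: `StandardScaler` (zero mean, unit variance per feature). Random Forests do not strictly require scaling, because tree splits are unaffected by rescaling a feature.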
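The class weighting and per-split feature count can be worked out by hand. A short sketch using the rules scikit-learn documents for `class_weight="balanced"` (n_samples / (n_classes * count of each class)) and `max_features="sqrt"` (square root of the feature count, rounded down); the 77,600 / 2,400 training split is an assumption that the 80/20 split preserves the 3% fraud rate.

```python
import math


def balanced_class_weights(class_counts: dict) -> dict:
    """class_weight="balanced": weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * n) for c, n in class_counts.items()}


def sqrt_max_features(n_features: int) -> int:
    """Number of features considered at each split when max_features="sqrt"."""
    return max(1, int(math.sqrt(n_features)))


# Assumption: stratified split, so 80,000 training rows keep the 3% fraud rate
weights = balanced_class_weights({0: 77_600, 1: 2_400})
print(weights)                # each fraud row weighs about 32x a legitimate row
print(sqrt_max_features(31))  # 5
```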
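## Usage Examples

### Python Example

```python
import requests

# Single transaction; see the /predict request above for the full field list
transaction = {
    "transaction_id": "TXN999",
    "amount": 500.00,
    "time_of_day": 15.0,
    # ... other fields
}

response = requests.post(
    "http://localhost:5000/predict",
    json=transaction,
    timeout=5,  # fail fast instead of hanging if the server is down
)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()

print(f"Fraud Probability: {result['fraud_probability']:.2%}")
print(f"Decision: {result['decision']}")
```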
### cURL Example

```bash
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "transaction_id": "TXN999",
    "amount": 500.00,
    "time_of_day": 15.0,
    "day_of_week": 2,
    "distance_from_home": 10,
    "distance_from_last_transaction": 5,
    "time_since_last_transaction": 24,
    "num_transactions_today": 2,
    "num_transactions_last_week": 8,
    "merchant_category": 2,
    "is_online": 0,
    "card_present": 1,
    "is_international": 0,
    "avg_transaction_amount": 100,
    "account_age_days": 365
  }'
```
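## Customization

### Adjust Model Parameters

Edit `fraud_detection_realtime.py`:

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,  # More trees: more stable predictions (diminishing returns), slower training and inference
    max_depth=15,      # Deeper trees: capture more complex patterns, but overfit more easily
    # ... adjust other parameters
)
```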
### Change Training Data Size

```python
df = generate_large_transaction_data(n_samples=500000)  # 500K transactions
```
### Modify Risk Thresholds

Edit `fraud_api.py`:

```python
# Adjust risk levels
if fraud_probability >= 0.8:  # Was 0.9
    risk_level = "CRITICAL"
```
## Security Considerations

- API Authentication: Add JWT tokens or API keys
- Rate Limiting: Implement request throttling
- HTTPS: Use SSL/TLS in production
- Input Validation: Check the type and range of every input field
- Logging: Implement comprehensive audit logs
- Model Security: Encrypt model files and load `fraud_model.pkl` only from trusted storage, since unpickling a file can execute arbitrary code
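A minimal sketch of type and range checks for the `/predict` request body, with ranges taken from the Feature Engineering section. `validate_transaction` and `FIELD_RULES` are hypothetical names; which fields are mandatory and the open upper bounds (`None`) are assumptions, and the API may validate differently.

```python
import math

# (expected kind, minimum, maximum or None) for each raw request field
FIELD_RULES = {
    "amount": (float, 0.0, None),
    "time_of_day": (float, 0.0, 24.0),
    "day_of_week": (int, 0, 6),
    "distance_from_home": (float, 0.0, None),
    "distance_from_last_transaction": (float, 0.0, None),
    "time_since_last_transaction": (float, 0.0, None),
    "num_transactions_today": (int, 0, None),
    "num_transactions_last_week": (int, 0, None),
    "merchant_category": (int, 1, 8),
    "is_online": (int, 0, 1),
    "card_present": (int, 0, 1),
    "is_international": (int, 0, 1),
    "avg_transaction_amount": (float, 0.0, None),
    "account_age_days": (int, 0, None),
}


def validate_transaction(payload: dict) -> list:
    """Return a list of error messages; an empty list means the payload passed."""
    errors = []
    if not isinstance(payload.get("transaction_id"), str):
        errors.append("transaction_id: must be a string")
    for name, (kind, low, high) in FIELD_RULES.items():
        value = payload.get(name)
        # bool is a subclass of int in Python, so reject it explicitly; also reject NaN/inf
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
            errors.append(f"{name}: missing or not a number")
            continue
        if kind is int and not float(value).is_integer():
            errors.append(f"{name}: must be a whole number")
        elif value < low or (high is not None and value > high):
            errors.append(f"{name}: {value} outside [{low}, {high}]")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the API answer a bad request with a single 400 response listing all problems.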
## Monitoring & Maintenance

### Model Retraining

- Retrain weekly or monthly on recent data to capture new fraud patterns
- Monitor model drift and performance degradation
- A/B test new models before deployment

### Performance Monitoring

- Track prediction latency
- Monitor false positive and false negative rates
- Alert on unusual fraud patterns
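Averages hide tail latency, so the <5ms target is easier to check against high percentiles. A minimal sketch of a rolling latency tracker using only the standard library; `LatencyTracker`, the 10,000-sample window, and the choice of p50/p95/p99 are assumptions.

```python
import statistics
import time
from collections import deque


class LatencyTracker:
    """Keep the most recent prediction latencies and report percentiles in ms."""

    def __init__(self, window: int = 10_000):
        # deque(maxlen=...) drops the oldest sample once the window is full
        self.samples_ms = deque(maxlen=window)

    def record(self, started_at: float) -> float:
        """Store the time elapsed since `started_at`, a time.perf_counter() value."""
        elapsed_ms = (time.perf_counter() - started_at) * 1000
        self.samples_ms.append(elapsed_ms)
        return elapsed_ms

    def percentiles(self) -> dict:
        """p50/p95/p99 latency in ms; needs at least two samples."""
        cuts = statistics.quantiles(self.samples_ms, n=100)
        return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


tracker = LatencyTracker()
for _ in range(200):
    t0 = time.perf_counter()
    sum(range(1_000))  # stand-in for model.predict(...)
    tracker.record(t0)

stats = tracker.percentiles()
if stats["p99"] > 5.0:
    print(f"ALERT: p99 latency {stats['p99']:.2f} ms exceeds the 5 ms target")
```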
### Logging
All predictions are logged with:
- Transaction ID
- Prediction result
- Timestamp
- Processing time
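One way to emit these fields is one JSON object per log line, which most log pipelines can parse without custom rules. A sketch under assumptions: `log_prediction` and the logger name are hypothetical, and the UTC timestamp is a choice made here (the API examples above show timestamps without a time zone).

```python
import json
import logging
import time
from datetime import datetime, timezone

logger = logging.getLogger("fraud_api.predictions")  # hypothetical logger name


def log_prediction(transaction_id: str, fraud_probability: float, decision: str,
                   started_at: float) -> dict:
    """Log one prediction as a single JSON line and return the record.

    started_at: time.perf_counter() value taken when the request arrived.
    """
    record = {
        "transaction_id": transaction_id,
        "fraud_probability": round(fraud_probability, 4),
        "decision": decision,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "processing_time_ms": round((time.perf_counter() - started_at) * 1000, 3),
    }
    logger.info(json.dumps(record))
    return record


logging.basicConfig(level=logging.INFO, format="%(message)s")
t0 = time.perf_counter()
log_prediction("TXN12345", 0.05, "APPROVE", t0)
```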
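## Production Deployment

### Option 1: Docker

```dockerfile
FROM python:3.9
WORKDIR /app
# Install dependencies first so this layer stays cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 5000
# fraud_api.py must bind to 0.0.0.0 (Flask defaults to 127.0.0.1) to be reachable from outside the container
CMD ["python", "fraud_api.py"]
```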
### Option 2: Cloud Deployment

- AWS: Lambda + API Gateway
- Google Cloud: Cloud Run + Cloud Functions
- Azure: Azure Functions + API Management

### Option 3: Kubernetes

Deploy as a microservice with auto-scaling.
## Files Description

| File | Purpose |
|---|---|
| `fraud_detection_realtime.py` | Main training script with large-scale data |
| `fraud_api.py` | Flask REST API server |
| `test_api.py` | API testing and load testing |
| `requirements.txt` | Python dependencies |
| `fraud_model.pkl` | Saved trained model (generated) |
## Contributing
- Add new features to feature engineering
- Experiment with different ML algorithms
- Improve API performance
- Add monitoring and dashboards
## License
This is a demonstration/educational project for learning ML in production.
## Learning Resources
- Scikit-learn: https://scikit-learn.org/
- Flask: https://flask.palletsprojects.com/
- Fraud Detection: Research papers on credit card fraud
## Disclaimer

This is a demonstration system trained on synthetic data. For production use:

- Train on real transaction data
- Implement proper security
- Comply with the PCI DSS standard
- Add comprehensive monitoring
- Update the model regularly
Built with ❤️ for learning production ML systems