Commit ·
c302dd6
0
Parent(s):
Initial commit: AI Powered Transaction Fraud Detection System
Browse files. This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full change set.
- .gitignore +22 -0
- README.md +269 -0
- app.py +306 -0
- docker-compose.yml +27 -0
- drift/adapter.py +0 -0
- drift/detector.py +61 -0
- fraud_detection.ipynb +0 -0
- graph_models/data_loader.py +38 -0
- graph_models/gnn_model.py +45 -0
- graph_models/train_gnn.py +35 -0
- mlruns/0/meta.yaml +6 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/meta.yaml +15 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/metrics/roc_auc +1 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.log-model.history +1 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.runName +1 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.source.name +1 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.source.type +1 -0
- mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.user +1 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/meta.yaml +15 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/metrics/roc_auc +1 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.log-model.history +1 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.runName +1 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.source.name +1 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.source.type +1 -0
- mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.user +1 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/meta.yaml +15 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/metrics/roc_auc +1 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.log-model.history +1 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.runName +1 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.source.name +1 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.source.type +1 -0
- mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.user +1 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/meta.yaml +15 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/metrics/roc_auc +1 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.log-model.history +1 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.runName +1 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.source.name +1 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.source.type +1 -0
- mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.user +1 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/meta.yaml +15 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/metrics/roc_auc +1 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.log-model.history +1 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.runName +1 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.source.name +1 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.source.type +1 -0
- mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.user +1 -0
- mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/meta.yaml +15 -0
- mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/metrics/roc_auc +1 -0
- mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/tags/mlflow.log-model.history +1 -0
- mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/tags/mlflow.runName +1 -0
.gitignore
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Python
|
| 2 |
+
__pycache__/
|
| 3 |
+
*.pyc
|
| 4 |
+
*.pyo
|
| 5 |
+
*.pyd
|
| 6 |
+
.env
|
| 7 |
+
venv/
|
| 8 |
+
.env/
|
| 9 |
+
|
| 10 |
+
# ML / Data
|
| 11 |
+
data/
|
| 12 |
+
reports/
|
| 13 |
+
*.csv
|
| 14 |
+
*.log
|
| 15 |
+
|
| 16 |
+
# Model artifacts (optional – keep if asked by reviewer)
|
| 17 |
+
trained_models/*.pkl
|
| 18 |
+
models/*.pt
|
| 19 |
+
|
| 20 |
+
# OS
|
| 21 |
+
.DS_Store
|
| 22 |
+
Thumbs.db
|
README.md
ADDED
|
@@ -0,0 +1,269 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
🛡️ AI-Powered Transaction Fraud Detection System
|
| 2 |
+
📌 Project Overview
|
| 3 |
+
|
| 4 |
+
The AI-Powered Transaction Fraud Detection System is a real-time financial fraud monitoring platform designed to detect, analyze, and report suspicious transactions using Machine Learning, Graph Neural Networks (GNNs), and Explainable AI (SHAP).
|
| 5 |
+
|
| 6 |
+
The system continuously ingests transactions, evaluates fraud risk using multiple models, visualizes insights through an interactive dashboard, and generates Suspicious Activity Reports (SAR) in PDF format.
|
| 7 |
+
|
| 8 |
+
This project follows industry-grade architecture and demonstrates concepts from:
|
| 9 |
+
|
| 10 |
+
Cybersecurity
|
| 11 |
+
|
| 12 |
+
Machine Learning
|
| 13 |
+
|
| 14 |
+
Data Science
|
| 15 |
+
|
| 16 |
+
Web Application Development
|
| 17 |
+
|
| 18 |
+
Model Monitoring & Drift Detection
|
| 19 |
+
|
| 20 |
+
🎯 Key Objectives
|
| 21 |
+
|
| 22 |
+
Detect fraudulent financial transactions in real time
|
| 23 |
+
|
| 24 |
+
Combine multiple ML models for higher accuracy
|
| 25 |
+
|
| 26 |
+
Provide explainability for fraud predictions
|
| 27 |
+
|
| 28 |
+
Visualize risk trends and transaction networks
|
| 29 |
+
|
| 30 |
+
Generate regulatory-ready SAR reports
|
| 31 |
+
|
| 32 |
+
Support continuous model monitoring and improvement
|
| 33 |
+
|
| 34 |
+
🧠 System Architecture
|
| 35 |
+
|
| 36 |
+
Frontend
|
| 37 |
+
|
| 38 |
+
HTML5, CSS3, Bootstrap 5
|
| 39 |
+
|
| 40 |
+
Chart.js (Risk charts & trends)
|
| 41 |
+
|
| 42 |
+
Vis.js (Transaction network graph)
|
| 43 |
+
|
| 44 |
+
JavaScript (Real-time updates)
|
| 45 |
+
|
| 46 |
+
Backend
|
| 47 |
+
|
| 48 |
+
Flask (Python web framework)
|
| 49 |
+
|
| 50 |
+
REST APIs for data exchange
|
| 51 |
+
|
| 52 |
+
Background threads for live transaction simulation
|
| 53 |
+
|
| 54 |
+
Machine Learning
|
| 55 |
+
|
| 56 |
+
Isolation Forest (Anomaly Detection)
|
| 57 |
+
|
| 58 |
+
XGBoost (Supervised Fraud Classification)
|
| 59 |
+
|
| 60 |
+
Graph Neural Network (Relationship-based fraud detection)
|
| 61 |
+
|
| 62 |
+
SHAP (Explainable AI)
|
| 63 |
+
|
| 64 |
+
Other Components
|
| 65 |
+
|
| 66 |
+
Concept Drift Detection
|
| 67 |
+
|
| 68 |
+
AutoML-based retraining
|
| 69 |
+
|
| 70 |
+
SAR PDF generation using ReportLab
|
| 71 |
+
|
| 72 |
+
🧩 Core Features
|
| 73 |
+
🔹 Real-Time Transaction Monitoring
|
| 74 |
+
|
| 75 |
+
Live transaction feed
|
| 76 |
+
|
| 77 |
+
Automatic refresh every few seconds
|
| 78 |
+
|
| 79 |
+
Risk-based color coding
|
| 80 |
+
|
| 81 |
+
🔹 Fraud Detection Models
|
| 82 |
+
|
| 83 |
+
Isolation Forest – Detects anomalies
|
| 84 |
+
|
| 85 |
+
XGBoost – Predicts fraud probability
|
| 86 |
+
|
| 87 |
+
GNN – Detects suspicious account-merchant-device relationships
|
| 88 |
+
|
| 89 |
+
🔹 Composite Risk Scoring
|
| 90 |
+
|
| 91 |
+
A weighted risk score combining:
|
| 92 |
+
|
| 93 |
+
Isolation Forest score
|
| 94 |
+
|
| 95 |
+
XGBoost probability
|
| 96 |
+
|
| 97 |
+
GNN probability
|
| 98 |
+
|
| 99 |
+
Customer risk profile
|
| 100 |
+
|
| 101 |
+
🔹 Explainable AI (SHAP)
|
| 102 |
+
|
| 103 |
+
Displays top contributing risk features
|
| 104 |
+
|
| 105 |
+
Improves transparency and trust
|
| 106 |
+
|
| 107 |
+
Helps analysts understand why a transaction is flagged
|
| 108 |
+
|
| 109 |
+
🔹 Risk Visualization Dashboard
|
| 110 |
+
|
| 111 |
+
Risk distribution (Low / Medium / High)
|
| 112 |
+
|
| 113 |
+
Average risk trends
|
| 114 |
+
|
| 115 |
+
Top risk indicators
|
| 116 |
+
|
| 117 |
+
Interactive transaction table
|
| 118 |
+
|
| 119 |
+
🔹 Transaction Network Graph
|
| 120 |
+
|
| 121 |
+
Visualizes relationships between:
|
| 122 |
+
|
| 123 |
+
Accounts
|
| 124 |
+
|
| 125 |
+
Merchants
|
| 126 |
+
|
| 127 |
+
Devices
|
| 128 |
+
|
| 129 |
+
Helps identify fraud rings and suspicious behavior
|
| 130 |
+
|
| 131 |
+
🔹 Suspicious Activity Report (SAR)
|
| 132 |
+
|
| 133 |
+
One-click SAR generation
|
| 134 |
+
|
| 135 |
+
Automatically includes high-risk transactions
|
| 136 |
+
|
| 137 |
+
Downloadable PDF report
|
| 138 |
+
|
| 139 |
+
🔹 Concept Drift Detection
|
| 140 |
+
|
| 141 |
+
Monitors data distribution changes
|
| 142 |
+
|
| 143 |
+
Flags model drift risks
|
| 144 |
+
|
| 145 |
+
Supports long-term model reliability
|
| 146 |
+
|
| 147 |
+
📁 Project Directory Structure
|
| 148 |
+
|
| 149 |
+
AI-Powered-Transaction-Fraud-Detection-System/
|
| 150 |
+
│
|
| 151 |
+
├── app.py # Flask backend
|
| 152 |
+
├── templates/
|
| 153 |
+
│ └── dashboard.html # Frontend dashboard
|
| 154 |
+
│
|
| 155 |
+
├── trained_models/
|
| 156 |
+
│ ├── isolation_forest.pkl
|
| 157 |
+
│ ├── xgboost.pkl
|
| 158 |
+
│ └── shap_explainer.pkl
|
| 159 |
+
│
|
| 160 |
+
├── graph_models/
|
| 161 |
+
│ ├── gnn_model.py
|
| 162 |
+
│ └── data_loader.py
|
| 163 |
+
│
|
| 164 |
+
├── models/
|
| 165 |
+
│ └── automl/
|
| 166 |
+
│ └── trainer.py
|
| 167 |
+
│
|
| 168 |
+
├── drift/
|
| 169 |
+
│ └── detector.py
|
| 170 |
+
│
|
| 171 |
+
├── profiling/
|
| 172 |
+
│ └── builder.py
|
| 173 |
+
│
|
| 174 |
+
├── reporting/
|
| 175 |
+
│ └── generator.py
|
| 176 |
+
│
|
| 177 |
+
├── data/
|
| 178 |
+
│ └── bank_transactions_data_2.csv
|
| 179 |
+
│
|
| 180 |
+
└── README.md
|
| 181 |
+
⚙️ Installation & Setup (Local Execution)
|
| 182 |
+
|
| 183 |
+
1️⃣ Create Virtual Environment
|
| 184 |
+
python -m venv venv
|
| 185 |
+
venv\Scripts\activate      # Windows
source venv/bin/activate   # macOS / Linux
|
| 186 |
+
|
| 187 |
+
2️⃣ Install Dependencies
|
| 188 |
+
pip install -r requirements.txt
|
| 189 |
+
|
| 190 |
+
3️⃣ Run the Application
|
| 191 |
+
python app.py
|
| 192 |
+
|
| 193 |
+
4️⃣ Access the Dashboard
|
| 194 |
+
|
| 195 |
+
Open your browser and visit:
|
| 196 |
+
|
| 197 |
+
http://127.0.0.1:5000
|
| 198 |
+
|
| 199 |
+
🧪 How the System Works (Execution Flow)
|
| 200 |
+
|
| 201 |
+
Dummy or real transactions are generated
|
| 202 |
+
|
| 203 |
+
Data is sent to backend APIs
|
| 204 |
+
|
| 205 |
+
ML models compute fraud risk
|
| 206 |
+
|
| 207 |
+
SHAP explains model decisions
|
| 208 |
+
|
| 209 |
+
Dashboard updates in real time
|
| 210 |
+
|
| 211 |
+
High-risk transactions trigger SAR reports
|
| 212 |
+
|
| 213 |
+
📊 APIs Overview
|
| 214 |
+
Endpoint Method Description
|
| 215 |
+
/api/transactions GET Fetch recent transactions
|
| 216 |
+
/api/analyze POST Analyze a transaction
|
| 217 |
+
/api/reports/sar POST Generate SAR PDF
|
| 218 |
+
/api/drift/status GET Concept drift status
|
| 219 |
+
🔒 Security Considerations
|
| 220 |
+
|
| 221 |
+
Backend APIs are modular and extendable
|
| 222 |
+
|
| 223 |
+
Can be integrated with authentication systems
|
| 224 |
+
|
| 225 |
+
Ready for production-grade deployment
|
| 226 |
+
|
| 227 |
+
🚀 Future Enhancements
|
| 228 |
+
|
| 229 |
+
User authentication & role-based access
|
| 230 |
+
|
| 231 |
+
Database integration (PostgreSQL / MongoDB)
|
| 232 |
+
|
| 233 |
+
Real banking transaction feeds
|
| 234 |
+
|
| 235 |
+
Advanced fraud pattern learning
|
| 236 |
+
|
| 237 |
+
Cloud deployment (AWS / Azure)
|
| 238 |
+
|
| 239 |
+
SOC-style alerting system
|
| 240 |
+
|
| 241 |
+
🎓 Academic Relevance
|
| 242 |
+
|
| 243 |
+
This project demonstrates:
|
| 244 |
+
|
| 245 |
+
Applied Machine Learning
|
| 246 |
+
|
| 247 |
+
Cybersecurity analytics
|
| 248 |
+
|
| 249 |
+
Explainable AI
|
| 250 |
+
|
| 251 |
+
Full-stack development
|
| 252 |
+
|
| 253 |
+
Real-time monitoring systems
|
| 254 |
+
|
| 255 |
+
Suitable for:
|
| 256 |
+
|
| 257 |
+
Major Project
|
| 258 |
+
|
| 259 |
+
Final Year Project
|
| 260 |
+
|
| 261 |
+
Capstone Project
|
| 262 |
+
|
| 263 |
+
Research-oriented submissions
|
| 264 |
+
|
| 265 |
+
👤 Author
|
| 266 |
+
|
| 267 |
+
Saheel Yadav
|
| 268 |
+
B.Tech – Computer Science Engineering
|
| 269 |
+
Specialization: Cybersecurity & AI
|
app.py
ADDED
|
@@ -0,0 +1,306 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from flask import Flask, render_template, request, jsonify, send_file
|
| 2 |
+
import pandas as pd
|
| 3 |
+
import joblib
|
| 4 |
+
import numpy as np
|
| 5 |
+
import io
|
| 6 |
+
from reportlab.lib.pagesizes import A4
|
| 7 |
+
from reportlab.pdfgen import canvas
|
| 8 |
+
from datetime import datetime
|
| 9 |
+
import torch
|
| 10 |
+
import mlflow
|
| 11 |
+
import threading
|
| 12 |
+
import time
|
| 13 |
+
from graph_models.gnn_model import load_gnn_model
|
| 14 |
+
from graph_models.data_loader import TransactionGraphBuilder
|
| 15 |
+
from reporting.generator import ReportGenerator
|
| 16 |
+
from profiling.builder import CustomerRiskProfiler
|
| 17 |
+
from drift.detector import ConceptDriftDetector
|
| 18 |
+
from models.automl.trainer import AutoMLTrainer
|
| 19 |
+
import os
|
| 20 |
+
import logging
|
| 21 |
+
import random
|
| 22 |
+
|
| 23 |
+
# In-memory feed of recent transactions, newest first (capped at 20 entries by
# the generator loop). The duplicate `import random` that used to sit here has
# been dropped — `random` is already imported at the top of the file.
TRANSACTIONS = []
|
| 25 |
+
|
| 26 |
+
def generate_dummy_transaction():
    """Fabricate one random transaction record for the live demo feed."""
    locations = ["New York, NY", "Chicago, IL", "Miami, FL"]
    record = {}
    record["TransactionID"] = "TX" + str(random.randint(100000, 999999))
    record["AccountID"] = "AC" + str(random.randint(10000, 99999))
    record["TransactionAmount"] = round(random.uniform(10, 5000), 2)
    record["TransactionDate"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    record["TransactionType"] = random.choice(["Debit", "Credit"])
    record["Location"] = random.choice(locations)
    record["RiskScore"] = round(random.uniform(0, 1), 2)
    record["Status"] = random.choice(["Approved", "Flagged", "Pending Review"])
    return record
|
| 39 |
+
def transaction_generator_loop():
    """Endlessly prepend fresh dummy transactions, trimming the feed to 20."""
    while True:
        TRANSACTIONS.insert(0, generate_dummy_transaction())
        # keep only latest 20 transactions
        if len(TRANSACTIONS) > 20:
            TRANSACTIONS.pop()
        # Pause a random 2-5 minutes before the next synthetic transaction.
        time.sleep(random.randint(120, 300))
|
| 49 |
+
|
| 50 |
+
# Start the dummy-transaction producer in the background; daemon=True so the
# thread never blocks interpreter shutdown.
threading.Thread(
    target=transaction_generator_loop,
    daemon=True
).start()
# Preload initial transactions for better UX
for _ in range(5):
    TRANSACTIONS.append(generate_dummy_transaction())
|
| 57 |
+
|
| 58 |
+
|
| 59 |
+
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


app = Flask(__name__)

# Initialize components
# NOTE(review): these loads run at import time; a missing isolation-forest or
# XGBoost artifact aborts startup (only the SHAP explainer is optional below).
iso_forest = joblib.load('trained_models/isolation_forest.pkl')
xgb = joblib.load('trained_models/xgboost.pkl')
# Load SHAP explainer if available (optional)
try:
    shap_explainer = joblib.load('trained_models/shap_explainer.pkl')
except FileNotFoundError:
    shap_explainer = None
    print("SHAP explainer not found. Continuing without explainability.")
gnn_model = load_gnn_model('models/gnn_model.pt')
graph_builder = TransactionGraphBuilder()
report_generator = ReportGenerator()
profiler = CustomerRiskProfiler()
drift_detector = ConceptDriftDetector()

# Feature names
# Used as the DataFrame column order in /api/analyze; presumably must match
# the order the models were trained with — verify against the training code.
features = ['TransactionAmount', 'TransactionDuration', 'LoginAttempts',
            'AccountBalance', 'DaysSinceLastTransaction', 'TransactionSpeed',
            'AvgAmount', 'StdAmount', 'MaxAmount', 'AvgDuration', 'UniqueLocations',
            'AmountDeviation', 'DurationDeviation', 'TransactionType',
            'Location', 'DeviceID', 'MerchantID', 'Channel', 'CustomerOccupation']
|
| 86 |
+
|
| 87 |
+
# Background tasks
|
| 88 |
+
def auto_retrain():
    """Background loop: retrain the AutoML models once per week.

    Sleeps first so startup does not trigger an extra training run on top of
    the initialization-time training performed at import; failures are logged
    and the loop keeps running.
    """
    while True:
        time.sleep(7 * 24 * 60 * 60)  # Run weekly
        try:
            trainer = AutoMLTrainer("data/bank_transactions_data_2.csv")
            best_model, score = trainer.train_models()
            app.logger.info(f"AutoML retraining completed. Best model: {type(best_model).__name__} with score: {score:.4f}")
        except Exception as e:
            app.logger.error(f"AutoML retraining failed: {str(e)}")
|
| 97 |
+
|
| 98 |
+
# Start background thread
retrain_thread = threading.Thread(target=auto_retrain, daemon=True)
retrain_thread.start()


# Initialize AutoML Trainer with proper error handling
# NOTE(review): iso_forest/xgb are joblib.load-ed earlier in this module, so
# if those artifacts are missing the process has already crashed before this
# "train if missing" fallback can run — confirm the intended startup order.
try:
    automl_trainer = AutoMLTrainer("data/bank_transactions_data_2.csv")

    # Check if models exist, if not train initial models
    required_models = ['isolation_forest.pkl', 'xgboost.pkl', 'shap_explainer.pkl']
    if not all(os.path.exists(f"trained_models/{model}") for model in required_models):
        logger.info("Initial models not found, training initial models...")
        automl_trainer.train_models()
except Exception as e:
    # The app cannot serve predictions without a trainer/models, so re-raise.
    logger.error(f"Failed to initialize AutoML trainer: {str(e)}")
    raise
|
| 115 |
+
|
| 116 |
+
@app.route('/')
def dashboard():
    """Serve the main dashboard page (templates/dashboard.html)."""
    return render_template('dashboard.html')
|
| 119 |
+
|
| 120 |
+
@app.route('/api/analyze', methods=['POST'])
def analyze_transaction():
    """Score one transaction with every model and return the risk breakdown.

    Expects a JSON body containing at least: AccountID, TransactionAmount,
    TransactionType, TransactionDate, PreviousTransactionDate,
    TransactionDuration, LoginAttempts, AccountBalance, Location, DeviceID,
    MerchantID, Channel, CustomerOccupation.

    Returns JSON with the isolation-forest score, XGBoost and GNN
    probabilities, a profile-weighted composite score, the top-5 SHAP
    feature contributions (empty when no explainer is loaded), and the
    current drift flag.
    """
    data = request.json

    # Update customer profile
    profiler.update_profile(data['AccountID'], {
        'amount': float(data['TransactionAmount']),
        'type': data['TransactionType'],
        'date': data['TransactionDate']
    })

    # Get customer stats (defaults cover accounts with no history yet)
    cust_profile = profiler.get_risk_profile(data['AccountID'])
    cust_stats = {
        'AvgAmount': cust_profile.get('avg_amount', 150.0),
        'StdAmount': cust_profile.get('std_amount', 75.0),
        'MaxAmount': cust_profile.get('max_amount', 1000.0),
        'AvgDuration': cust_profile.get('avg_duration', 120.0),
        'UniqueLocations': cust_profile.get('unique_locations', 3)
    }

    # Parse the previous-transaction timestamp for the recency feature.
    prev_date = datetime.strptime(data['PreviousTransactionDate'], '%Y-%m-%d %H:%M:%S')

    amount = float(data['TransactionAmount'])
    duration = float(data['TransactionDuration'])
    # Guard the ratio features against division by zero: a zero duration or a
    # degenerate std/avg from a brand-new profile would crash this endpoint.
    safe_duration = duration if duration else 1.0
    safe_std = cust_stats['StdAmount'] if cust_stats['StdAmount'] else 1.0
    safe_avg_dur = cust_stats['AvgDuration'] if cust_stats['AvgDuration'] else 1.0

    # NOTE(review): hash() is randomized per process (PYTHONHASHSEED), so the
    # Location/DeviceID/MerchantID encodings are NOT stable across restarts;
    # a persisted categorical encoder would be needed for reproducible scores.
    features_dict = {
        'TransactionAmount': amount,
        'TransactionDuration': duration,
        'LoginAttempts': int(data['LoginAttempts']),
        'AccountBalance': float(data['AccountBalance']),
        'DaysSinceLastTransaction': (datetime.now() - prev_date).days,
        'TransactionSpeed': amount / safe_duration,
        'AvgAmount': cust_stats['AvgAmount'],
        'StdAmount': cust_stats['StdAmount'],
        'MaxAmount': cust_stats['MaxAmount'],
        'AvgDuration': cust_stats['AvgDuration'],
        'UniqueLocations': cust_stats['UniqueLocations'],
        'AmountDeviation': (amount - cust_stats['AvgAmount']) / safe_std,
        'DurationDeviation': (duration - cust_stats['AvgDuration']) / safe_avg_dur,
        'TransactionType': 0 if data['TransactionType'] == 'Debit' else 1,
        'Location': hash(data['Location']) % 100,
        'DeviceID': hash(data['DeviceID']) % 100,
        'MerchantID': hash(data['MerchantID']) % 100,
        'Channel': {'ATM': 0, 'Online': 1, 'Branch': 2}.get(data['Channel'], 0),
        'CustomerOccupation': {'Student': 0, 'Doctor': 1, 'Engineer': 2,
                               'Retired': 3}.get(data['CustomerOccupation'], 0)
    }

    # Convert to DataFrame for prediction (column order fixed by `features`)
    X = pd.DataFrame([features_dict], columns=features)

    # Feed the concept-drift detector
    drift_detector.add_data(X.values[0])

    # Model scores: higher isolation-forest score == more anomalous
    iso_score = -iso_forest.decision_function(X)[0]
    xgb_prob = xgb.predict_proba(X)[0, 1]

    # GNN prediction on the incrementally built transaction graph
    graph_data = graph_builder.add_transaction(data)
    with torch.no_grad():
        gnn_prob = gnn_model(graph_data.x, graph_data.edge_index).item()

    # --- SHAP explanations --- (single init; the old duplicate assignment
    # and the unused `transaction_date` local were removed)
    explanation = []
    if shap_explainer is not None:
        shap_values = shap_explainer.shap_values(X)
        for i, feature in enumerate(features):
            explanation.append({
                'feature': feature,
                'value': X.iloc[0, i],
                'shap_value': shap_values[0][i]
            })
        # Most influential features first
        explanation.sort(key=lambda item: abs(item['shap_value']), reverse=True)

    # Composite score weighted by customer risk profile
    cust_risk = cust_profile['risk_score'] if cust_profile else 0.5
    composite_score = (
        iso_score * 0.4 +
        xgb_prob * 0.4 +
        gnn_prob * 0.2
    ) * (0.5 + cust_risk)

    return jsonify({
        'isolation_forest_score': float(iso_score),
        'xgboost_probability': float(xgb_prob),
        'gnn_probability': float(gnn_prob),
        'composite_score': float(composite_score),
        'customer_risk_score': float(cust_risk),
        'explanation': explanation[:5],
        'drift_detected': drift_detector.drift_count > 0
    })
|
| 216 |
+
|
| 217 |
+
|
| 218 |
+
from datetime import timedelta
|
| 219 |
+
|
| 220 |
+
@app.route('/api/transactions')
def get_recent_transactions():
    """Return feed transactions no older than ?days=N (default 1) as JSON."""
    window_days = request.args.get('days', default=1, type=int)
    oldest_allowed = datetime.now() - timedelta(days=window_days)

    recent = []
    for txn in TRANSACTIONS:
        stamp = datetime.strptime(txn['TransactionDate'], "%Y-%m-%d %H:%M:%S")
        if stamp >= oldest_allowed:
            recent.append(txn)

    return jsonify(recent)
|
| 233 |
+
|
| 234 |
+
|
| 235 |
+
|
| 236 |
+
@app.route('/api/reports/sar', methods=['POST'])
def generate_sar():
    """Build a Suspicious Activity Report PDF and return it as a download.

    The JSON payload may supply an explicit "transactions" list; otherwise
    the in-memory feed is used. Tolerates an empty/missing request body
    (the old code crashed on `None.get`).
    """
    payload = request.json or {}
    transactions = payload.get("transactions", TRANSACTIONS)

    buffer = io.BytesIO()
    pdf = canvas.Canvas(buffer, pagesize=A4)
    pdf.setFont("Helvetica", 12)

    pdf.drawString(50, 800, "Suspicious Activity Report (SAR)")
    pdf.drawString(50, 780, f"Generated: {datetime.now()}")

    y = 750
    for tx in transactions:
        pdf.drawString(
            50, y,
            f"{tx['TransactionID']} | {tx['AccountID']} | "
            f"Amount: {tx['TransactionAmount']} | Risk: {tx['RiskScore']}"
        )
        y -= 18
        if y < 50:
            pdf.showPage()
            # Font state resets on a new page; restore it or subsequent
            # drawString calls fall back to the ReportLab default.
            pdf.setFont("Helvetica", 12)
            y = 800

    pdf.save()
    buffer.seek(0)

    return send_file(
        buffer,
        as_attachment=True,
        download_name="SAR_Report.pdf",
        mimetype="application/pdf"
    )
|
| 269 |
+
|
| 270 |
+
@app.route('/api/customer/<customer_id>/profile')
def get_customer_profile(customer_id):
    """Return the stored risk profile for a customer, or 404 if unknown."""
    found = profiler.get_risk_profile(customer_id)
    if not found:
        return jsonify({"error": "Customer not found"}), 404
    return jsonify(found)
|
| 276 |
+
|
| 277 |
+
@app.route('/api/models/retrain', methods=['POST'])
def trigger_retraining():
    """Run a synchronous AutoML retraining pass and report the outcome."""
    try:
        trainer = AutoMLTrainer("data/bank_transactions_data_2.csv")
        winner, winner_score = trainer.train_models()
        return jsonify({
            "status": "success",
            "best_model": type(winner).__name__,
            "score": winner_score
        })
    except Exception as err:
        # Surface the failure to the caller rather than a bare 500 page.
        return jsonify({"status": "error", "message": str(err)}), 500
|
| 289 |
+
|
| 290 |
+
@app.route('/api/drift/status')
def get_drift_status():
    """Expose the concept-drift detector's current state as JSON."""
    count = drift_detector.drift_count
    return jsonify({"drift_detected": count > 0, "drift_count": count})
|
| 296 |
+
|
| 297 |
+
if __name__ == '__main__':
    # Create required directories (os is already imported at the top of the
    # file; the redundant local `import os` was removed).
    os.makedirs("reports", exist_ok=True)
    os.makedirs("data", exist_ok=True)

    # Point MLflow at the tracking server; let the environment (e.g. the
    # docker-compose MLFLOW_TRACKING_URI variable) override the local default.
    mlflow.set_tracking_uri(
        os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5001")
    )

    # NOTE(review): debug=True on 0.0.0.0 exposes the Werkzeug debugger to
    # the network — disable outside local development.
    app.run(debug=True, host='0.0.0.0')
|
docker-compose.yml
ADDED
|
@@ -0,0 +1,27 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# NOTE(review): the top-level `version` key is obsolete in Compose v2+ but harmless.
version: '3'
services:
  # MLflow tracking server backed by a local SQLite DB and ./mlruns artifacts.
  mlflow:
    image: python:3.8
    command: >
      sh -c "pip install mlflow &&
             mlflow server --backend-store-uri sqlite:///mlflow.db
             --default-artifact-root ./mlruns
             --host 0.0.0.0
             --port 5000"
    ports:
      - "5000:5000"
    volumes:
      - ./mlruns:/mlruns
      - ./mlflow.db:/mlflow.db

  # Flask fraud-detection dashboard, built from the repo's Dockerfile.
  dashboard:
    build: .
    ports:
      - "5001:5001"
    depends_on:
      - mlflow
    environment:
      # NOTE(review): app.py's __main__ block uses http://localhost:5001 as its
      # fallback tracking URI — confirm it honors this variable in-container.
      - MLFLOW_TRACKING_URI=http://mlflow:5000
    volumes:
      - ./models:/app/models
      - ./data:/app/data
|
drift/adapter.py
ADDED
|
File without changes
|
drift/detector.py
ADDED
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import numpy as np
|
| 2 |
+
from scipy.stats import ks_2samp
|
| 3 |
+
from sklearn.covariance import MinCovDet
|
| 4 |
+
import warnings
|
| 5 |
+
|
| 6 |
+
class ConceptDriftDetector:
    """Sliding-window concept-drift detector.

    The first `window_size` samples become a fixed reference distribution.
    Each subsequent full window is compared against it with per-feature
    Kolmogorov-Smirnov tests plus a robust-covariance (Mahalanobis) shift
    check; three consecutive drifted windows raise an alert.
    """

    def __init__(self, window_size=1000):
        self.window_size = window_size
        self.reference_window = None   # np.ndarray once the first window fills
        self.current_window = []       # samples accumulated for the next test
        self.drift_count = 0           # consecutive windows flagged as drifted

    def add_data(self, features):
        """Accumulate one feature vector; evaluate drift on each full window."""
        if len(self.current_window) < self.window_size:
            self.current_window.append(features)
            return
        if self.reference_window is None:
            # First full window becomes the fixed reference distribution.
            self.reference_window = np.array(self.current_window)
        else:
            self._test_for_drift()
        # Start a new window and KEEP the incoming sample — the old code
        # silently dropped the vector that arrived on the window boundary.
        self.current_window = [features]

    def _test_for_drift(self):
        """Compare the current window against the reference; count drift."""
        current_data = np.array(self.current_window)

        # 1. Kolmogorov-Smirnov test for each feature
        p_values = []
        for i in range(self.reference_window.shape[1]):
            try:
                _, p_value = ks_2samp(self.reference_window[:, i], current_data[:, i])
                p_values.append(p_value)
            except Exception:
                # Degenerate column (e.g. constant): treat as "no drift".
                p_values.append(1.0)

        # 2. Covariance shift detection — fit moved INSIDE the try; MinCovDet
        # can raise on singular/degenerate data and previously crashed here.
        try:
            robust_cov = MinCovDet().fit(self.reference_window)
            cov_score = robust_cov.mahalanobis(current_data).mean()
            cov_threshold = robust_cov.mahalanobis(self.reference_window).mean() * 1.5
        except Exception:
            cov_score = 0
            cov_threshold = 0

        # Combined decision
        significant_drift = any(p < 0.01 for p in p_values) or cov_score > cov_threshold

        if significant_drift:
            self.drift_count += 1
            if self.drift_count >= 3:  # Persistent drift
                self._alert_drift()
                self.drift_count = 0

    def _alert_drift(self):
        # In practice, this would trigger model retraining
        print("Warning: Significant concept drift detected!")
        # Could integrate with AutoML retraining
|
fraud_detection.ipynb
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
graph_models/data_loader.py
ADDED
|
@@ -0,0 +1,38 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
from torch_geometric.data import Data
|
| 3 |
+
from collections import defaultdict
|
| 4 |
+
|
| 5 |
+
class TransactionGraphBuilder:
    """Incrementally builds a heterogeneous transaction graph.

    Nodes are accounts (type 0), merchants (type 1) and devices (type 2);
    each transaction links its account node to the merchant and device
    nodes involved.  ``add_transaction`` returns the full graph so far as
    a PyTorch Geometric :class:`Data` object.
    """

    def __init__(self):
        self.node_index = defaultdict(int)  # node key -> integer node id
        self.current_id = 0                 # next id to hand out
        self.edges = []                     # (src, dst) node-id pairs
        self.node_features = []             # one-hot type vector per node
        self.node_types = []                # raw type code per node

    def get_node_id(self, node_key, node_type):
        """Return the id for *node_key*, registering it on first sight."""
        if node_key in self.node_index:
            return self.node_index[node_key]
        new_id = self.current_id
        self.node_index[node_key] = new_id
        self.current_id = new_id + 1
        # One-hot encoding of the node type over the three known types.
        self.node_features.append([float(t == node_type) for t in range(3)])
        self.node_types.append(node_type)
        return new_id

    def add_transaction(self, transaction):
        """Register one transaction's nodes/edges; return the PyG graph."""
        account = self.get_node_id(transaction['AccountID'], 0)    # account node
        merchant = self.get_node_id(transaction['MerchantID'], 1)  # merchant node
        device = self.get_node_id(transaction['DeviceID'], 2)      # device node

        self.edges.extend([(account, merchant), (account, device)])

        # Convert the accumulated edge list and features to PyG format.
        edge_index = torch.tensor(list(zip(*self.edges)), dtype=torch.long)
        x = torch.tensor(self.node_features, dtype=torch.float)
        return Data(x=x, edge_index=edge_index)
|
graph_models/gnn_model.py
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
import torch.nn as nn
|
| 3 |
+
import torch.nn.functional as F
|
| 4 |
+
from torch_geometric.nn import GCNConv
|
| 5 |
+
import os
|
| 6 |
+
|
| 7 |
+
class FraudGNN(nn.Module):
    """Two-layer GCN that scores an entire transaction graph for fraud."""

    def __init__(self, num_node_features, hidden_channels):
        super().__init__()
        self.conv1 = GCNConv(num_node_features, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.classifier = nn.Linear(hidden_channels, 1)

    def forward(self, x, edge_index):
        """Return a fraud probability in [0, 1] for the whole graph."""
        # Two rounds of message passing build the node embeddings.
        h = F.relu(self.conv1(x, edge_index))
        h = F.dropout(h, training=self.training)
        h = self.conv2(h, edge_index)

        # Global mean pooling collapses all node embeddings into a single
        # graph-level vector, which the linear head turns into one logit.
        graph_embedding = h.mean(dim=0)
        logit = self.classifier(graph_embedding)
        return torch.sigmoid(logit)
|
| 25 |
+
|
| 26 |
+
def load_gnn_model(model_path='trained_models/gnn_model.pt', device='cpu'):
    """Load a FraudGNN checkpoint, creating and saving a fresh one if absent.

    Parameters
    ----------
    model_path : str
        Path of the ``state_dict`` checkpoint to load (or create).
    device : str or torch.device
        Device the model is moved to before being returned.

    Returns
    -------
    FraudGNN
        Model in eval mode on *device*.
    """
    # Bug fix: create the directory the checkpoint actually lives in.
    # The original did os.makedirs('models') while loading/saving under
    # 'trained_models/', so saving a fresh model could fail.
    checkpoint_dir = os.path.dirname(model_path)
    if checkpoint_dir:
        os.makedirs(checkpoint_dir, exist_ok=True)

    # Architecture must match the training script (32 features, 64 hidden).
    model = FraudGNN(num_node_features=32, hidden_channels=64)

    try:
        # map_location lets a GPU-trained checkpoint load on a CPU-only host.
        model.load_state_dict(torch.load(model_path, map_location=device))
        print(f"Loaded GNN model from {model_path}")
    except FileNotFoundError:
        # No checkpoint yet: persist the randomly initialised weights so
        # subsequent calls find a model file.
        print(f"No model found at {model_path}, creating new model")
        torch.save(model.state_dict(), model_path)
        print(f"New model saved to {model_path}")

    model.to(device)
    model.eval()
    return model
|
graph_models/train_gnn.py
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import torch
|
| 2 |
+
from torch_geometric.data import Data
|
| 3 |
+
from gnn_model import FraudGNN
|
| 4 |
+
import os
|
| 5 |
+
|
| 6 |
+
def train_and_save_gnn_model():
    """Train FraudGNN on random demo data and save its state_dict.

    This is a smoke-test training loop over synthetic data; it exists only
    so that a checkpoint is available at ``trained_models/gnn_model.pt``.
    """
    # Create sample data for demonstration
    num_nodes = 100
    num_features = 32
    x = torch.randn((num_nodes, num_features))
    edge_index = torch.randint(0, num_nodes, (2, 200))
    y = torch.randint(0, 2, (1,)).float()  # single graph-level label

    # Initialize model
    model = FraudGNN(num_node_features=num_features, hidden_channels=64)

    # Simple training loop (for demonstration)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    # Bug fix: the original referenced `nn.BCELoss()` but this module never
    # imports torch.nn as `nn` (NameError); use the fully qualified name.
    criterion = torch.nn.BCELoss()

    for epoch in range(10):
        optimizer.zero_grad()
        out = model(x, edge_index)
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()
        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')

    # Bug fix: the created directory ('models') and the save path
    # ('trained_models/...') disagreed in the original, so the save could
    # fail and the success message reported the wrong path.
    model_path = 'trained_models/gnn_model.pt'
    os.makedirs('trained_models', exist_ok=True)
    torch.save(model.state_dict(), model_path)
    print(f"GNN model saved to {model_path}")
|
| 33 |
+
|
| 34 |
+
# Script entry point: train the demo GNN and write its checkpoint to disk.
if __name__ == '__main__':
    train_and_save_gnn_model()
|
mlruns/0/meta.yaml
ADDED
|
@@ -0,0 +1,6 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_location: mlflow-artifacts:/0
|
| 2 |
+
creation_time: 1766465772017
|
| 3 |
+
experiment_id: '0'
|
| 4 |
+
last_update_time: 1766465772017
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
name: Default
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/meta.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_uri: mlflow-artifacts:/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/artifacts
|
| 2 |
+
end_time: 1766478502132
|
| 3 |
+
entry_point_name: ''
|
| 4 |
+
experiment_id: '588265755531758591'
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
run_id: 0311fe8dff3e46ee9d5c6c013e0580bc
|
| 7 |
+
run_name: isolation_forest_2025-12-23 13:58:14.718726
|
| 8 |
+
run_uuid: 0311fe8dff3e46ee9d5c6c013e0580bc
|
| 9 |
+
source_name: ''
|
| 10 |
+
source_type: 4
|
| 11 |
+
source_version: ''
|
| 12 |
+
start_time: 1766478494726
|
| 13 |
+
status: 3
|
| 14 |
+
tags: []
|
| 15 |
+
user_id: Saheel Yadav
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/metrics/roc_auc
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
1766478495195 0.8415397408963584 0
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.log-model.history
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"run_id": "0311fe8dff3e46ee9d5c6c013e0580bc", "artifact_path": "isolation_forest", "utc_time_created": "2025-12-23 08:28:15.216022", "model_uuid": "e6224c07956145faa2bc6621cb207dbc", "flavors": {"python_function": {"model_path": "model.pkl", "predict_fn": "predict", "loader_module": "mlflow.sklearn", "python_version": "3.10.6", "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}}, "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.6.1", "serialization_format": "cloudpickle", "code": null}}}]
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.runName
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
isolation_forest_2025-12-23 13:58:14.718726
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.source.name
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
app.py
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.source.type
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
LOCAL
|
mlruns/588265755531758591/0311fe8dff3e46ee9d5c6c013e0580bc/tags/mlflow.user
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
Saheel Yadav
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/meta.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_uri: mlflow-artifacts:/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/artifacts
|
| 2 |
+
end_time: 1766478455065
|
| 3 |
+
entry_point_name: ''
|
| 4 |
+
experiment_id: '588265755531758591'
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
run_id: 04c33c5f043e4977b3b7a930580a2dcb
|
| 7 |
+
run_name: random_forest_2025-12-23 13:57:16.650560
|
| 8 |
+
run_uuid: 04c33c5f043e4977b3b7a930580a2dcb
|
| 9 |
+
source_name: ''
|
| 10 |
+
source_type: 4
|
| 11 |
+
source_version: ''
|
| 12 |
+
start_time: 1766478436929
|
| 13 |
+
status: 3
|
| 14 |
+
tags: []
|
| 15 |
+
user_id: Saheel Yadav
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/metrics/roc_auc
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
1766478438200 0.9999781162464987 0
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.log-model.history
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"run_id": "04c33c5f043e4977b3b7a930580a2dcb", "artifact_path": "random_forest", "utc_time_created": "2025-12-23 08:27:18.238060", "model_uuid": "0673e163efb24bd685ad61bd0ab4966d", "flavors": {"python_function": {"model_path": "model.pkl", "predict_fn": "predict", "loader_module": "mlflow.sklearn", "python_version": "3.10.6", "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}}, "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.6.1", "serialization_format": "cloudpickle", "code": null}}}]
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.runName
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
random_forest_2025-12-23 13:57:16.650560
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.source.name
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
app.py
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.source.type
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
LOCAL
|
mlruns/588265755531758591/04c33c5f043e4977b3b7a930580a2dcb/tags/mlflow.user
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
Saheel Yadav
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/meta.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_uri: mlflow-artifacts:/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/artifacts
|
| 2 |
+
end_time: 1766478488484
|
| 3 |
+
entry_point_name: ''
|
| 4 |
+
experiment_id: '588265755531758591'
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
run_id: 0ceb1e53b25f47d0b62d41bdb664e954
|
| 7 |
+
run_name: random_forest_2025-12-23 13:57:57.276290
|
| 8 |
+
run_uuid: 0ceb1e53b25f47d0b62d41bdb664e954
|
| 9 |
+
source_name: ''
|
| 10 |
+
source_type: 4
|
| 11 |
+
source_version: ''
|
| 12 |
+
start_time: 1766478477505
|
| 13 |
+
status: 3
|
| 14 |
+
tags: []
|
| 15 |
+
user_id: Saheel Yadav
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/metrics/roc_auc
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
1766478478238 0.9999781162464986 0
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.log-model.history
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"run_id": "0ceb1e53b25f47d0b62d41bdb664e954", "artifact_path": "random_forest", "utc_time_created": "2025-12-23 08:27:58.260159", "model_uuid": "dc3b7db6cd894bf996ba04c2fc45630b", "flavors": {"python_function": {"model_path": "model.pkl", "predict_fn": "predict", "loader_module": "mlflow.sklearn", "python_version": "3.10.6", "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}}, "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.6.1", "serialization_format": "cloudpickle", "code": null}}}]
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.runName
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
random_forest_2025-12-23 13:57:57.276290
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.source.name
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
app.py
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.source.type
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
LOCAL
|
mlruns/588265755531758591/0ceb1e53b25f47d0b62d41bdb664e954/tags/mlflow.user
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
Saheel Yadav
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/meta.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_uri: mlflow-artifacts:/588265755531758591/0d42202ebbf14102b9771574825528e2/artifacts
|
| 2 |
+
end_time: 1766473160955
|
| 3 |
+
entry_point_name: ''
|
| 4 |
+
experiment_id: '588265755531758591'
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
run_id: 0d42202ebbf14102b9771574825528e2
|
| 7 |
+
run_name: isolation_forest_2025-12-23 12:29:15.631115
|
| 8 |
+
run_uuid: 0d42202ebbf14102b9771574825528e2
|
| 9 |
+
source_name: ''
|
| 10 |
+
source_type: 4
|
| 11 |
+
source_version: ''
|
| 12 |
+
start_time: 1766473155636
|
| 13 |
+
status: 3
|
| 14 |
+
tags: []
|
| 15 |
+
user_id: Saheel Yadav
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/metrics/roc_auc
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
1766473155979 0.8715204831932772 0
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.log-model.history
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"run_id": "0d42202ebbf14102b9771574825528e2", "artifact_path": "isolation_forest", "utc_time_created": "2025-12-23 06:59:15.990127", "model_uuid": "f4ef2bef9eef4d629e898d4b53ce4f3c", "flavors": {"python_function": {"model_path": "model.pkl", "predict_fn": "predict", "loader_module": "mlflow.sklearn", "python_version": "3.10.6", "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}}, "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.6.1", "serialization_format": "cloudpickle", "code": null}}}]
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.runName
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
isolation_forest_2025-12-23 12:29:15.631115
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.source.name
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
app.py
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.source.type
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
LOCAL
|
mlruns/588265755531758591/0d42202ebbf14102b9771574825528e2/tags/mlflow.user
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
Saheel Yadav
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/meta.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_uri: mlflow-artifacts:/588265755531758591/12c38a58707142b49abf42946712e666/artifacts
|
| 2 |
+
end_time: 1766466362212
|
| 3 |
+
entry_point_name: ''
|
| 4 |
+
experiment_id: '588265755531758591'
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
run_id: 12c38a58707142b49abf42946712e666
|
| 7 |
+
run_name: random_forest_2025-12-23 10:35:44.274066
|
| 8 |
+
run_uuid: 12c38a58707142b49abf42946712e666
|
| 9 |
+
source_name: ''
|
| 10 |
+
source_type: 4
|
| 11 |
+
source_version: ''
|
| 12 |
+
start_time: 1766466344647
|
| 13 |
+
status: 3
|
| 14 |
+
tags: []
|
| 15 |
+
user_id: Saheel Yadav
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/metrics/roc_auc
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
1766466346831 0.9999781162464987 0
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.log-model.history
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"run_id": "12c38a58707142b49abf42946712e666", "artifact_path": "random_forest", "utc_time_created": "2025-12-23 05:05:46.868461", "model_uuid": "c2bc64e4725b4427bcfc4308e84c030e", "flavors": {"python_function": {"model_path": "model.pkl", "predict_fn": "predict", "loader_module": "mlflow.sklearn", "python_version": "3.10.6", "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}}, "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.6.1", "serialization_format": "cloudpickle", "code": null}}}]
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.runName
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
random_forest_2025-12-23 10:35:44.274066
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.source.name
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
app.py
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.source.type
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
LOCAL
|
mlruns/588265755531758591/12c38a58707142b49abf42946712e666/tags/mlflow.user
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
Saheel Yadav
|
mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/meta.yaml
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
artifact_uri: mlflow-artifacts:/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/artifacts
|
| 2 |
+
end_time: 1766473383191
|
| 3 |
+
entry_point_name: ''
|
| 4 |
+
experiment_id: '588265755531758591'
|
| 5 |
+
lifecycle_stage: active
|
| 6 |
+
run_id: 159a3dd3b1204e7fa3008a0eb85f5678
|
| 7 |
+
run_name: random_forest_2025-12-23 12:32:51.249903
|
| 8 |
+
run_uuid: 159a3dd3b1204e7fa3008a0eb85f5678
|
| 9 |
+
source_name: ''
|
| 10 |
+
source_type: 4
|
| 11 |
+
source_version: ''
|
| 12 |
+
start_time: 1766473371623
|
| 13 |
+
status: 3
|
| 14 |
+
tags: []
|
| 15 |
+
user_id: Saheel Yadav
|
mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/metrics/roc_auc
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
1766473373025 0.9999781162464986 0
|
mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/tags/mlflow.log-model.history
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
[{"run_id": "159a3dd3b1204e7fa3008a0eb85f5678", "artifact_path": "random_forest", "utc_time_created": "2025-12-23 07:02:53.046664", "model_uuid": "75d2ea2144cc437c886f9b57ce663873", "flavors": {"python_function": {"model_path": "model.pkl", "predict_fn": "predict", "loader_module": "mlflow.sklearn", "python_version": "3.10.6", "env": {"conda": "conda.yaml", "virtualenv": "python_env.yaml"}}, "sklearn": {"pickled_model": "model.pkl", "sklearn_version": "1.6.1", "serialization_format": "cloudpickle", "code": null}}}]
|
mlruns/588265755531758591/159a3dd3b1204e7fa3008a0eb85f5678/tags/mlflow.runName
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
random_forest_2025-12-23 12:32:51.249903
|