NeerajCodz committed
Commit 1309e50 · 1 Parent(s): 07d92b1

Fixed prefix fraud_

Files changed (1): app.py (+9 -7)
app.py CHANGED
@@ -505,8 +505,11 @@ async def llm_analyse(payload: LLMAnalysePayload):
     # Convert to DataFrame
     df = pd.DataFrame(transactions)
 
-    # Convert fraud_score to percentage string
-    if 'fraud_score' in df.columns:
+    # Remove 'fraud_' from all column names
+    df.columns = [col.replace('fraud_', '') for col in df.columns]
+
+    # Convert 'score' (previously 'fraud_score') to percentage string if it exists
+    if 'score' in df.columns:
         def format_score(x):
             try:
                 val = float(x) * 100  # multiply by 100
@@ -517,11 +520,11 @@ async def llm_analyse(payload: LLMAnalysePayload):
             except:
                 return f"{x}%"  # fallback in case of unexpected value
 
-        df['fraud_score'] = df['fraud_score'].apply(format_score)
+        df['score'] = df['score'].apply(format_score)
 
     # Convert DataFrame to CSV string
     csv_string = df.to_csv(index=False)
-
+
     # Craft more descriptive prompt
     prompt = f"""
 You are a senior fraud analyst. Analyze the following credit card transaction dataset in CSV format. Each transaction includes a fraud_score (as percentage, e.g., '94%'), STATUS, transaction details, merchant, amount, location, time, and other relevant features.
@@ -534,9 +537,8 @@ Instructions:
 1. Determine an **overall fraud risk score** (0-1 scale) reflecting the dataset's general risk. Scale the score so that even a small number of high-risk transactions meaningfully increases the score. Mostly safe transactions should still be low, a few high-risk transactions should produce a moderate-to-high score, and many high-risk transactions should produce a higher score. Use narrative judgment to scale; do not state exact thresholds.
 2. Provide a detailed **insights** paragraph (150-200 words) describing patterns, anomalies, clusters, temporal or geographic trends, and merchant behaviors. Avoid listing exact counts or percentages.
 3. Provide a detailed **recommendation** paragraph (100-150 words) suggesting practical actions to mitigate risk, including monitoring, alerts, or investigation. Keep guidance non-prescriptive about individual transactions.
-4. Treat merchant names prefixed with "fraud_" as normal test data; do not interpret them as inherently suspicious.
-5. Output ONLY valid JSON in this format: {{"fraud_score": <float 0-1>, "insights": "<string insights paragraph>", "recommendation": "<string recommendation paragraph>"}}.
-6. Let the fraud_score scale more sharply: even a few high-risk transactions should noticeably increase the score, and more high-risk transactions should push it even higher, while mostly safe datasets remain near the bottom of the scale.
+4. Output ONLY valid JSON in this format: {{"fraud_score": <float 0-1>, "insights": "<string insights paragraph>", "recommendation": "<string recommendation paragraph>"}}.
+5. Let the fraud_score scale more sharply: even a few high-risk transactions should noticeably increase the score, and more high-risk transactions should push it even higher, while mostly safe datasets remain near the bottom of the scale.
 
 Focus on narrative-style, descriptive analysis and make the fraud score percentages in the CSV the key reference points for your reasoning.
 """