NeerajCodz committed
Commit 1309e50 · 1 Parent(s): 07d92b1

Fixed prefix fraud_

Files changed (1): app.py (+9 -7)
app.py CHANGED
@@ -505,8 +505,11 @@ async def llm_analyse(payload: LLMAnalysePayload):
     # Convert to DataFrame
     df = pd.DataFrame(transactions)
 
-    # Convert fraud_score to percentage string
-    if 'fraud_score' in df.columns:
+    # Remove 'fraud_' from all column names
+    df.columns = [col.replace('fraud_', '') for col in df.columns]
+
+    # Convert 'score' (previously 'fraud_score') to percentage string if it exists
+    if 'score' in df.columns:
         def format_score(x):
             try:
                 val = float(x) * 100  # multiply by 100
@@ -517,11 +520,11 @@ async def llm_analyse(payload: LLMAnalysePayload):
             except:
                 return f"{x}%"  # fallback in case of unexpected value
 
-        df['fraud_score'] = df['fraud_score'].apply(format_score)
+        df['score'] = df['score'].apply(format_score)
 
     # Convert DataFrame to CSV string
     csv_string = df.to_csv(index=False)
-
+
     # Craft more descriptive prompt
     prompt = f"""
 You are a senior fraud analyst. Analyze the following credit card transaction dataset in CSV format. Each transaction includes a fraud_score (as percentage, e.g., '94%'), STATUS, transaction details, merchant, amount, location, time, and other relevant features.
@@ -534,9 +537,8 @@ Instructions:
 1. Determine an **overall fraud risk score** (0-1 scale) reflecting the dataset's general risk. Scale the score so that even a small number of high-risk transactions meaningfully increases the score. Mostly safe transactions should still be low, a few high-risk transactions should produce a moderate-to-high score, and many high-risk transactions should produce a higher score. Use narrative judgment to scale; do not state exact thresholds.
 2. Provide a detailed **insights** paragraph (150-200 words) describing patterns, anomalies, clusters, temporal or geographic trends, and merchant behaviors. Avoid listing exact counts or percentages.
 3. Provide a detailed **recommendation** paragraph (100-150 words) suggesting practical actions to mitigate risk, including monitoring, alerts, or investigation. Keep guidance non-prescriptive about individual transactions.
-4. Treat merchant names prefixed with "fraud_" as normal test data; do not interpret them as inherently suspicious.
-5. Output ONLY valid JSON in this format: {{"fraud_score": <float 0-1>, "insights": "<string insights paragraph>", "recommendation": "<string recommendation paragraph>"}}.
-6. Let the fraud_score scale more sharply: even a few high-risk transactions should noticeably increase the score, and more high-risk transactions should push it even higher, while mostly safe datasets remain near the bottom of the scale.
+4. Output ONLY valid JSON in this format: {{"fraud_score": <float 0-1>, "insights": "<string insights paragraph>", "recommendation": "<string recommendation paragraph>"}}.
+5. Let the fraud_score scale more sharply: even a few high-risk transactions should noticeably increase the score, and more high-risk transactions should push it even higher, while mostly safe datasets remain near the bottom of the scale.
 
 Focus on narrative-style, descriptive analysis and make the fraud score percentages in the CSV the key reference points for your reasoning.
 """