Commit 1309e50 (parent: 07d92b1): Fixed prefix fraud_

app.py CHANGED
@@ -505,8 +505,11 @@ async def llm_analyse(payload: LLMAnalysePayload):
     # Convert to DataFrame
     df = pd.DataFrame(transactions)

-    #
-
+    # Remove 'fraud_' from all column names
+    df.columns = [col.replace('fraud_', '') for col in df.columns]
+
+    # Convert 'score' (previously 'fraud_score') to percentage string if it exists
+    if 'score' in df.columns:
         def format_score(x):
             try:
                 val = float(x) * 100  # multiply by 100
@@ -517,11 +520,11 @@ async def llm_analyse(payload: LLMAnalysePayload):
             except:
                 return f"{x}%"  # fallback in case of unexpected value

-        df['
+        df['score'] = df['score'].apply(format_score)

     # Convert DataFrame to CSV string
     csv_string = df.to_csv(index=False)
-
+
     # Craft more descriptive prompt
     prompt = f"""
 You are a senior fraud analyst. Analyze the following credit card transaction dataset in CSV format. Each transaction includes a fraud_score (as percentage, e.g., '94%'), STATUS, transaction details, merchant, amount, location, time, and other relevant features.
@@ -534,9 +537,8 @@ Instructions:
 1. Determine an **overall fraud risk score** (0-1 scale) reflecting the dataset’s general risk. Scale the score so that even a small number of high-risk transactions meaningfully increases the score. Mostly safe transactions should still be low, a few high-risk transactions should produce a moderate-to-high score, and many high-risk transactions should produce a higher score. Use narrative judgment to scale; do not state exact thresholds.
 2. Provide a detailed **insights** paragraph (150-200 words) describing patterns, anomalies, clusters, temporal or geographic trends, and merchant behaviors. Avoid listing exact counts or percentages.
 3. Provide a detailed **recommendation** paragraph (100-150 words) suggesting practical actions to mitigate risk, including monitoring, alerts, or investigation. Keep guidance non-prescriptive about individual transactions.
-4.
-5.
-6. Let the fraud_score scale more sharply: even a few high-risk transactions should noticeably increase the score, and more high-risk transactions should push it even higher, while mostly safe datasets remain near the bottom of the scale.
+4. Output ONLY valid JSON in this format: {{"fraud_score": <float 0-1>, "insights": "<string insights paragraph>", "recommendation": "<string recommendation paragraph>"}}.
+5. Let the fraud_score scale more sharply: even a few high-risk transactions should noticeably increase the score, and more high-risk transactions should push it even higher, while mostly safe datasets remain near the bottom of the scale.

 Focus on narrative-style, descriptive analysis and make the fraud score percentages in the CSV the key reference points for your reasoning.
 """
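The rename-and-format steps added in the first two hunks can be sketched end to end. This is a minimal sketch, not the app's code: the sample `transactions` list is hypothetical, and the `:.0f` percentage formatting is an assumption (the diff truncates the exact format expression after `val = float(x) * 100`).

```python
import pandas as pd

# Hypothetical sample shaped like the transactions in the diff
transactions = [
    {"fraud_score": 0.94, "merchant": "Acme", "amount": 120.0},
    {"fraud_score": 0.02, "merchant": "Globex", "amount": 8.5},
]
df = pd.DataFrame(transactions)

# Remove the 'fraud_' prefix from all column names
df.columns = [col.replace("fraud_", "") for col in df.columns]


def format_score(x):
    try:
        # Multiply by 100 and render as an integer percentage (format is assumed)
        return f"{float(x) * 100:.0f}%"
    except (TypeError, ValueError):
        # Fallback in case of unexpected value
        return f"{x}%"


# Convert 'score' (previously 'fraud_score') to a percentage string if present
if "score" in df.columns:
    df["score"] = df["score"].apply(format_score)

# Convert DataFrame to CSV string, as in the diff
csv_string = df.to_csv(index=False)
```

With this shape the CSV header no longer carries the `fraud_` prefix, which is what the prompt's reference to `fraud_score` percentages like `'94%'` relies on.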
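Instruction 4 of the rewritten prompt requires the model to return only a bare JSON object, so the caller needs a parse-and-validate step. A hedged sketch of what that could look like follows; the `raw_reply` string and the `parse_llm_reply` helper are hypothetical, not part of app.py.

```python
import json


def parse_llm_reply(text: str) -> dict:
    """Parse the JSON object that instruction 4 asks the model to emit.

    Strips the markdown code fences some models wrap around JSON, then
    validates the three expected keys and the 0-1 fraud_score range.
    """
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop surrounding backticks and any leading language tag
        cleaned = cleaned.strip("`")
        cleaned = cleaned[cleaned.find("{"):]
    data = json.loads(cleaned)
    for key in ("fraud_score", "insights", "recommendation"):
        if key not in data:
            raise ValueError(f"missing key: {key}")
    if not 0.0 <= float(data["fraud_score"]) <= 1.0:
        raise ValueError("fraud_score out of 0-1 range")
    return data


# Hypothetical raw model reply obeying instruction 4
raw_reply = (
    '{"fraud_score": 0.72, '
    '"insights": "Clustered high-value activity at unusual hours.", '
    '"recommendation": "Increase monitoring on flagged merchants."}'
)
result = parse_llm_reply(raw_reply)
```

Validating the score range here catches the case where the model ignores the 0-1 scale and returns a percentage instead.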