Upload 7 files
- Agentic Honey-Pot for Scam Detection & Intelligence Extraction.md +19 -0
- Dockerfile +26 -0
- Implementation and Deployment Guide for Agentic Honey-Pot.md +96 -0
- agent.py +329 -0
- app.py +155 -0
- models.py +62 -0
- requirements.txt +16 -0
Agentic Honey-Pot for Scam Detection & Intelligence Extraction.md
ADDED
# Agentic Honey-Pot for Scam Detection & Intelligence Extraction

This project implements the solution for Problem Statement 2: **Agentic Honey-Pot for Scam Detection & Intelligence Extraction**.

## Technology Stack

* **Agentic Framework:** [LangGraph](https://langchain-ai.github.io/langgraph/tutorials/introduction/) for stateful, cyclical conversation management.
* **LLM:** [Qwen 2.5 3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) (open-source, optimized for resource-constrained deployment).
* **Backend:** [FastAPI](https://fastapi.tiangolo.com/) for the low-latency REST API.
* **Deployment:** Hugging Face Space (self-hosted on the free tier).

## API Endpoint

The main endpoint for the honeypot is:

`POST /api/honeypot-detection`

## Authentication

The API requires an `x-api-key` header for authentication. The key is set via a Space Secret.

## Development Notes

The core logic is implemented in `agent.py` using LangGraph to manage the multi-turn conversation state. The model is loaded with 4-bit quantization (`bitsandbytes`) for efficient use of the free-tier hardware.
Dockerfile
ADDED
# Use a base image with Python and CUDA for GPU support (recommended for Qwen 2.5 3B)
# This image includes Python, CUDA, and common ML libraries
FROM nvcr.io/nvidia/pytorch:24.01-py3

# Set environment variables
ENV PYTHONUNBUFFERED=1
# Hugging Face Spaces uses port 7860 by default for web applications
ENV PORT=7860

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies
# We use --no-cache-dir to keep the image size small
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose the port
EXPOSE 7860

# Command to run the application (matches the Procfile logic)
# We use the explicit port 7860 as required by HF Spaces
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
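Before pushing to a Space, the image can be built and exercised locally (a sketch; the tag `honeypot-agent` and the key value are placeholders, not names from the repo):

```shell
# Build the image from the repository root (where the Dockerfile lives)
docker build -t honeypot-agent .

# Run it, exposing the HF Spaces port and injecting the API key secret
docker run --rm -p 7860:7860 \
  -e HONEYPOT_API_KEY="YOUR_SECRET_API_KEY_FOR_AUTH" \
  honeypot-agent
```

Note that `docker run` on a machine without an NVIDIA runtime will fall back to CPU, where the 4-bit `bitsandbytes` load may need adjusting, as the guide below mentions.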
Implementation and Deployment Guide for Agentic Honey-Pot.md
ADDED
# Implementation and Deployment Guide for Agentic Honey-Pot

This guide provides instructions for setting up, running, and deploying the provided Python codebase for the Agentic Honey-Pot solution.

## 1. Code Structure

The solution is modularized into the following files:

| File | Purpose |
| :--- | :--- |
| `requirements.txt` | Lists all necessary Python dependencies (LangGraph, FastAPI, Qwen model dependencies). |
| `models.py` | Contains all Pydantic schemas for API input/output, LangGraph state, and structured intelligence extraction. |
| `agent.py` | Contains the core **LangGraph** state machine logic, the **Qwen 2.5 3B-Instruct** model loading, and the node functions (`detect_scam`, `agent_persona_response`, `extract_intelligence`, `final_callback`). |
| `app.py` | The **FastAPI** application that exposes the `/api/honeypot-detection` endpoint and integrates with the LangGraph agent. |
| `Procfile` | Configuration file for the Hugging Face Space to run the FastAPI application using Uvicorn. (Only needed if not using the Dockerfile.) |
| `README.md` | A brief description for the Hugging Face Space repository. |
| `Dockerfile` | Defines the environment and dependencies for a robust Docker-based deployment on Hugging Face Spaces. |
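`models.py` itself is not reproduced on this page; judging from how its classes are used in `agent.py` and `app.py`, its schemas look roughly like the sketch below. Field names here are inferred from usage (`upiIds` appears in `app.py`; the others come from the extraction prompt) and may differ from the actual file:

```python
from typing import List, TypedDict
from pydantic import BaseModel, Field

class Message(BaseModel):
    sender: str       # "scammer" or "user" (the honeypot agent)
    text: str
    timestamp: str

class ScamClassification(BaseModel):
    is_scam: bool
    reason: str

class ExtractedIntelligence(BaseModel):
    # Every field defaults to an empty list so ExtractedIntelligence() is valid
    bankAccounts: List[str] = Field(default_factory=list)
    upiIds: List[str] = Field(default_factory=list)
    phishingLinks: List[str] = Field(default_factory=list)
    phoneNumbers: List[str] = Field(default_factory=list)
    suspiciousKeywords: List[str] = Field(default_factory=list)

class AgentState(TypedDict):
    # Keys match the dictionary accesses in agent.py / app.py
    sessionId: str
    conversationHistory: List[Message]
    scamDetected: bool
    extractedIntelligence: ExtractedIntelligence
    agentNotes: str
    totalMessagesExchanged: int
    should_continue_engagement: bool
    agent_response_text: str
```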

## 2. Local Setup and Testing

### Step 2.1: Setup Environment

1. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
   *Note: The `bitsandbytes` library requires a compatible CUDA setup for GPU usage. If running on CPU, you may need to adjust the model loading in `agent.py` to remove the quantization configuration.*

2. **Set Environment Variables:**
   The `app.py` and `agent.py` files rely on an environment variable for the API key.
   ```bash
   export HONEYPOT_API_KEY="YOUR_SECRET_API_KEY_FOR_AUTH"
   ```
   *Note: Replace the placeholder with your actual key.*

### Step 2.2: Run Locally

1. **Start the FastAPI Server:**
   ```bash
   uvicorn app:app --host 0.0.0.0 --port 8000
   ```
2. **Test the Endpoint:**
   Use a tool like `curl` or Postman to send a request to `http://localhost:8000/api/honeypot-detection`.

   **Example cURL Request (Initial Message):**
   ```bash
   curl -X POST "http://localhost:8000/api/honeypot-detection" \
     -H "accept: application/json" \
     -H "x-api-key: YOUR_SECRET_API_KEY_FOR_AUTH" \
     -H "Content-Type: application/json" \
     -d '{
       "sessionId": "test-session-123",
       "message": {
         "sender": "scammer",
         "text": "Your account is blocked. Click this link immediately: http://malicious-link.example",
         "timestamp": "2026-01-28T10:00:00Z"
       },
       "conversationHistory": [],
       "metadata": {
         "channel": "SMS",
         "language": "English",
         "locale": "IN"
       }
     }'
   ```
*Send subsequent messages by including the previous conversation in the `conversationHistory` field.*
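Concretely, a second-turn body moves the first exchange into `conversationHistory` and puts the scammer's new message in `message`, keeping the same `sessionId`. The sketch below illustrates this; the reply texts are invented for the example:

```python
# First-turn messages move into conversationHistory on the follow-up request.
first_scammer_msg = {
    "sender": "scammer",
    "text": "Your account is blocked. Click this link immediately: http://malicious-link.example",
    "timestamp": "2026-01-28T10:00:00Z",
}
agent_reply = {
    "sender": "user",  # the honeypot agent poses as the victim "user"
    "text": "Oh no, which account is this about? I have two.",
    "timestamp": "2026-01-28T10:01:00Z",
}

# Body for the second POST to /api/honeypot-detection (same sessionId as turn 1).
second_turn_payload = {
    "sessionId": "test-session-123",
    "message": {
        "sender": "scammer",
        "text": "Your savings account. Share your UPI ID now to verify.",
        "timestamp": "2026-01-28T10:05:00Z",
    },
    "conversationHistory": [first_scammer_msg, agent_reply],
    "metadata": {"channel": "SMS", "language": "English", "locale": "IN"},
}
```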

## 3. Deployment on Hugging Face Space (Recommended Strategy)

This strategy bypasses the severe limitations of the Hugging Face Inference API free tier by self-hosting the model.

### Step 3.1: Create a Hugging Face Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **"Create new Space"**.
3. **Name:** Choose a name (e.g., `my-honeypot-agent`).
4. **License:** Select a license.
5. **Space SDK:** Select **`Docker`** for maximum control over the environment, or **`Gradio`** if you want a simple UI for monitoring. *For a pure API, Docker is the most robust choice.*
6. **Hardware:** Select the **Free CPU** or **Free T4 Medium GPU** (if available). **T4 GPU is highly recommended for better latency.**

### Step 3.2: Configure Environment and Upload Files

1. **Set Secrets:** In your Space settings, go to **"Secrets"** and add the following:
   * **Name:** `HONEYPOT_API_KEY`
   * **Value:** `YOUR_SECRET_API_KEY_FOR_AUTH` (This is the key your API will validate against.)

2. **Upload Code:** Upload all the provided files (`requirements.txt`, `models.py`, `agent.py`, `app.py`, `Procfile`, `README.md`) to your Space repository.

3. **Upload Dockerfile:** The provided `Dockerfile` is optimized for a GPU-enabled environment (recommended for the Qwen 2.5 3B model). Ensure this file is uploaded to the root of your Space repository.

### Step 3.3: Final Testing

1. Once the Space builds successfully, the public URL will be your API base URL (e.g., `https://[your-user]-[your-space].hf.space`).
2. Test the live endpoint using the cURL command from Step 2.2, replacing `http://localhost:8000` with your Space URL.

This setup ensures your agent is running in a dedicated, free environment, providing the stability and performance required for the competition.
agent.py
ADDED
import os
import json
import requests
import torch
from typing import List, Dict, Any, Optional
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from langgraph.graph import StateGraph, END, START
from langgraph.checkpoint.base import BaseCheckpointSaver
from langgraph.checkpoint.memory import MemorySaver
from pydantic import ValidationError

from models import AgentState, Message, ExtractedIntelligence, ScamClassification

# --- Configuration ---
MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"
# Placeholder for the final evaluation endpoint
CALLBACK_URL = "https://hackathon.guvi.in/api/updateHoneyPotFinalResult"
# Placeholder for the honeypot's own API key (for the callback)
HONEYPOT_API_KEY = os.environ.get("HONEYPOT_API_KEY", "YOUR_SECRET_API_KEY_FOR_CALLBACK")

# --- Model Initialization (Singleton Pattern) ---

class ModelLoader:
    """Handles loading the Qwen 2.5 3B model with quantization."""
    _model = None
    _tokenizer = None

    @classmethod
    def get_model_and_tokenizer(cls):
        if cls._model is None or cls._tokenizer is None:
            print(f"Loading model {MODEL_ID}...")
            # 4-bit quantization for memory efficiency on small GPUs/CPUs
            bnb_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.bfloat16
            )

            cls._tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
            cls._model = AutoModelForCausalLM.from_pretrained(
                MODEL_ID,
                quantization_config=bnb_config,
                device_map="auto"  # Use auto to place model parts efficiently
            )
            print("Model loaded successfully.")
        return cls._model, cls._tokenizer

# --- LangGraph Nodes (Functions) ---

def _invoke_llm(messages: List[Dict[str, str]], system_prompt: str, json_schema: Optional[Dict[str, Any]] = None) -> str:
    """Helper function to invoke the Qwen model."""
    model, tokenizer = ModelLoader.get_model_and_tokenizer()

    # Construct the full conversation history including the system prompt
    full_messages = [{"role": "system", "content": system_prompt}] + messages

    # Add instruction for JSON output if a schema is provided
    if json_schema:
        full_messages.append({"role": "user", "content": f"Please output the result as a JSON object that strictly conforms to the following schema: {json.dumps(json_schema)}"})

    # Apply chat template and tokenize
    input_ids = tokenizer.apply_chat_template(
        full_messages,
        return_tensors="pt",
        add_generation_prompt=True
    ).to(model.device)

    # Generate response
    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and clean up the response
    response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

    # Simple JSON extraction (often required for open-source models)
    if json_schema:
        try:
            # Attempt to find and slice out the JSON block
            start = response.find('{')
            end = response.rfind('}') + 1
            json_str = response[start:end]
            return json_str
        except Exception:
            # Return the raw response if slicing fails; let the caller handle it
            return response

    return response

def detect_scam(state: AgentState) -> AgentState:
    """Node 1: Detects scam intent from the initial message."""
    latest_message = state["conversationHistory"][-1]

    system_prompt = (
        "You are an expert scam detection system. Analyze the user's message and determine "
        "if it contains clear scam or fraudulent intent (e.g., bank fraud, phishing, urgent account block). "
        "Your output MUST be a JSON object conforming to the ScamClassification schema."
    )

    messages = [{"role": "user", "content": latest_message.text}]

    try:
        json_str = _invoke_llm(messages, system_prompt, ScamClassification.model_json_schema())
        classification = ScamClassification.model_validate_json(json_str)

        state["scamDetected"] = classification.is_scam
        state["agentNotes"] += f"Initial Detection: {classification.reason}. "
        print(f"Scam Detected: {classification.is_scam}")

    except (json.JSONDecodeError, ValidationError) as e:
        print(f"Detection failed, defaulting to False: {e}")
        state["scamDetected"] = False
        state["agentNotes"] += "Detection failed due to parsing error. "

    return state

def agent_persona_response(state: AgentState) -> AgentState:
    """Node 2: Generates a human-like response to engage the scammer."""
    if not state["scamDetected"]:
        # Should not happen in the main flow, but as a safeguard
        state["agent_response_text"] = "Conversation ended by system."
        state["should_continue_engagement"] = False
        return state

    # Construct the full conversation history for the model
    history_for_llm = [m.model_dump() for m in state["conversationHistory"]]

    system_prompt = (
        "You are a human-like honeypot agent. Your goal is to engage the 'scammer' "
        "in a believable, multi-turn conversation to extract intelligence (e.g., links, IDs, numbers) "
        "without revealing that you are an AI or that you suspect a scam. "
        "Maintain a slightly confused, cautious, but engaged persona. "
        "Your response must be ONLY the text of the message to send back to the scammer."
    )

    # The last message in history_for_llm is the scammer's latest message
    messages = history_for_llm

    response_text = _invoke_llm(messages, system_prompt)

    # Update state with the agent's response
    agent_message = Message(
        sender="user",  # The honeypot agent is acting as the 'user'
        text=response_text,
        timestamp=state["conversationHistory"][-1].timestamp  # Placeholder, should be current time
    )
    state["conversationHistory"].append(agent_message)
    state["agent_response_text"] = response_text
    state["totalMessagesExchanged"] += 1

    # Simple heuristic to decide if engagement should continue (e.g., if the agent is satisfied).
    # In a real system, this would be a separate node or a more complex heuristic.
    state["should_continue_engagement"] = True

    return state

def extract_intelligence(state: AgentState) -> AgentState:
    """Node 3: Extracts structured intelligence from the full conversation history."""

    # Combine all messages into a single text block for the model to analyze
    full_transcript = "\n".join([f"{m.sender}: {m.text}" for m in state["conversationHistory"]])

    system_prompt = (
        "You are an intelligence extraction specialist. Analyze the following conversation transcript "
        "between a 'scammer' and a 'user' (honeypot agent). "
        "Extract all relevant intelligence (bank accounts, UPI IDs, links, phone numbers, keywords) "
        "mentioned by the 'scammer'. Your output MUST be a JSON object conforming to the ExtractedIntelligence schema. "
        "If no item is found for a field, use an empty list."
    )

    messages = [{"role": "user", "content": f"Transcript:\n{full_transcript}"}]

    try:
        json_str = _invoke_llm(messages, system_prompt, ExtractedIntelligence.model_json_schema())
        extracted_data = ExtractedIntelligence.model_validate_json(json_str)

        # Merge new intelligence with existing intelligence (if any)
        current_data = state["extractedIntelligence"].model_dump()
        new_data = extracted_data.model_dump()

        for key in current_data:
            current_data[key] = list(set(current_data[key] + new_data[key]))

        state["extractedIntelligence"] = ExtractedIntelligence.model_validate(current_data)
        state["agentNotes"] += "Intelligence updated. "

    except (json.JSONDecodeError, ValidationError) as e:
        print(f"Intelligence extraction failed: {e}")
        state["agentNotes"] += "Intelligence extraction failed due to parsing error. "

    return state

def final_callback(state: AgentState) -> AgentState:
    """Node 4: Sends the mandatory final result callback to the evaluation endpoint."""

    if not state["scamDetected"]:
        print("Callback skipped: Scam not detected.")
        return state

    payload = {
        "sessionId": state["sessionId"],
        "scamDetected": state["scamDetected"],
        "totalMessagesExchanged": state["totalMessagesExchanged"],
        "extractedIntelligence": state["extractedIntelligence"].model_dump(),
        "agentNotes": state["agentNotes"]
    }

    headers = {
        "Content-Type": "application/json",
        "x-api-key": HONEYPOT_API_KEY  # Use the honeypot's own API key for the callback
    }

    try:
        response = requests.post(CALLBACK_URL, json=payload, headers=headers, timeout=10)
        response.raise_for_status()
        print(f"Final callback successful. Status: {response.status_code}")
        state["agentNotes"] += "Final callback sent successfully. "
    except requests.exceptions.RequestException as e:
        print(f"Final callback failed: {e}")
        state["agentNotes"] += f"Final callback failed: {e}. "

    return state

# --- Graph Definition ---

def create_honeypot_graph(checkpoint_saver: BaseCheckpointSaver):
    """Defines and compiles the LangGraph state machine."""

    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("detect_scam", detect_scam)
    workflow.add_node("agent_persona_response", agent_persona_response)
    workflow.add_node("extract_intelligence", extract_intelligence)
    workflow.add_node("final_callback", final_callback)

    # Define the entry point
    workflow.add_edge(START, "detect_scam")

    # Conditional edge after scam detection
    def should_continue(state: AgentState) -> str:
        if state["scamDetected"]:
            return "extract_intelligence"
        else:
            return END

    workflow.add_conditional_edges("detect_scam", should_continue)

    # Main loop: Extract -> Respond -> (Wait for next message).
    # The loop is broken by the external API call (the next message);
    # for a single API call, we just extract and respond.
    workflow.add_edge("extract_intelligence", "agent_persona_response")

    # After the agent responds, the process ends, waiting for the next API call,
    # which will restart the graph from the checkpoint.
    workflow.add_edge("agent_persona_response", END)

    # The final callback is assumed to be a separate, manual trigger
    # or a final step after a predetermined number of turns.
    # For this implementation, we assume the final callback is triggered
    # by a separate endpoint or a flag in the state.

    # Compile the graph
    app = workflow.compile(checkpointer=checkpoint_saver)
    return app

# Initialize the graph with a memory saver for local testing.
# In a real deployment, a database checkpointer (e.g., SQLite, Postgres) would be used.
memory_saver = MemorySaver()
honeypot_app = create_honeypot_graph(memory_saver)

# Optional: Run a test flow locally
if __name__ == "__main__":
    # Initialize the state for a new conversation
    initial_state = AgentState(
        sessionId="test-session-123",
        conversationHistory=[
            Message(
                sender="scammer",
                text="Your bank account will be blocked today. Verify immediately by clicking this link: http://malicious-link.example",
                timestamp="2026-01-28T10:00:00Z"
            )
        ],
        scamDetected=False,
        extractedIntelligence=ExtractedIntelligence(),
        agentNotes="",
        totalMessagesExchanged=1,
        should_continue_engagement=False,
        agent_response_text=""
    )

    # Run the first turn
    print("--- Running Turn 1 (Detection, Extraction, Response) ---")
    final_state = honeypot_app.invoke(initial_state, config={"configurable": {"thread_id": "test-session-123"}})

    print("\n--- Final State After Turn 1 ---")
    print(f"Scam Detected: {final_state['scamDetected']}")
    print(f"Agent Response: {final_state['agent_response_text']}")
    print(f"Intelligence: {final_state['extractedIntelligence'].model_dump()}")

    # Simulate the next incoming message from the scammer
    next_scammer_message = Message(
        sender="scammer",
        text="Why are you asking so many questions? Just give me your UPI ID now or I will block your account permanently.",
        timestamp="2026-01-28T10:05:00Z"
    )

    # Load the previous state and add the new message
    final_state["conversationHistory"].append(next_scammer_message)
    final_state["totalMessagesExchanged"] += 1

    # Run the second turn (LangGraph will load the checkpoint and continue)
    print("\n--- Running Turn 2 (Extraction, Response) ---")
    final_state_2 = honeypot_app.invoke(final_state, config={"configurable": {"thread_id": "test-session-123"}})

    print("\n--- Final State After Turn 2 ---")
    print(f"Agent Response: {final_state_2['agent_response_text']}")
    print(f"Intelligence: {final_state_2['extractedIntelligence'].model_dump()}")

    # Manually trigger the final callback (simulating the end of engagement)
    print("\n--- Triggering Final Callback ---")
    # Note: The final callback logic needs to be integrated into the API structure
    # or triggered by a separate endpoint/condition. For this example, we call the function directly.
    final_callback(final_state_2)
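A note on `_invoke_llm`'s JSON handling in `agent.py` above: it slices from the first `{` to the last `}` before Pydantic validation. That heuristic can be exercised in isolation; the function below is our standalone sketch of the same logic (the name `extract_json_block` is not part of the repo):

```python
import json

def extract_json_block(text: str) -> dict:
    """Slice from the first '{' to the last '}' and parse, mirroring _invoke_llm."""
    start = text.find("{")
    end = text.rfind("}") + 1
    return json.loads(text[start:end])

# Typical LLM output wraps the JSON in chatter; the slice recovers it.
reply = 'Sure, here is the analysis: {"is_scam": true, "reason": "phishing link"} Hope that helps!'
print(extract_json_block(reply))  # → {'is_scam': True, 'reason': 'phishing link'}
```

If the model emits no braces at all, `text[start:end]` is not valid JSON and `json.loads` raises, which is why the node functions catch `json.JSONDecodeError` alongside `ValidationError`.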
app.py
ADDED
import os
import time
from fastapi import FastAPI, Request, HTTPException, Depends, status
from fastapi.security import APIKeyHeader
from typing import Dict, Any
from datetime import datetime

# LangGraph and Model Imports
from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.base import BaseCheckpointSaver
from agent import create_honeypot_graph, final_callback
from models import HoneypotRequest, HoneypotResponse, AgentState, ExtractedIntelligence, Message

# --- Configuration ---
API_KEY_NAME = "x-api-key"
API_KEY = os.environ.get("HONEYPOT_API_KEY", "sk_test_123456789")  # Default for local testing
api_key_header = APIKeyHeader(name=API_KEY_NAME, auto_error=False)

# --- Initialization ---
app = FastAPI(
    title="Agentic Honey-Pot API",
    description="REST API for Scam Detection and Intelligence Extraction using LangGraph and Qwen 2.5 3B.",
    version="1.0.0"
)

# Initialize LangGraph Checkpointer (MemorySaver for simplicity; replace with a database for production).
# For Hugging Face Spaces, a persistent volume or database is recommended for the checkpointer.
# For this example, we use MemorySaver, which will reset on Space restart.
checkpointer: BaseCheckpointSaver = MemorySaver()
honeypot_app = create_honeypot_graph(checkpointer)

# --- Dependency for API Key Validation ---

async def get_api_key(api_key_header: str = Depends(api_key_header)):
    if api_key_header is None or api_key_header != API_KEY:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API Key or missing 'x-api-key' header.",
        )
    return api_key_header

# --- API Endpoints ---

@app.post("/api/honeypot-detection", response_model=HoneypotResponse)
async def honeypot_detection(
    request_data: HoneypotRequest,
    api_key: str = Depends(get_api_key)
) -> Dict[str, Any]:
    """
    Accepts an incoming message event, runs the LangGraph agent, and returns the response.
    """
    session_id = request_data.sessionId

    # 1. Load or Initialize State
    # LangGraph uses the thread_id for checkpointing
    config = {"configurable": {"thread_id": session_id}}

    # Check if a checkpoint exists for this session. Note: get_state lives on the
    # compiled graph (not the checkpointer); the snapshot's .values holds the state dict.
    snapshot = honeypot_app.get_state(config)

    if snapshot and snapshot.values:
        # Load existing state; convert the snapshot dict back to the TypedDict structure
        current_state = AgentState(**snapshot.values)

        # Append the new message from the scammer
        current_state["conversationHistory"].append(request_data.message)
        current_state["totalMessagesExchanged"] += 1

        # The LangGraph will continue from the last node
        input_state = current_state
        start_time = time.time()

    else:
        # New conversation: Initialize the state
        initial_history = request_data.conversationHistory + [request_data.message]

        input_state = AgentState(
            sessionId=session_id,
            conversationHistory=initial_history,
            scamDetected=False,
            extractedIntelligence=ExtractedIntelligence(),
            agentNotes="New session started. ",
            totalMessagesExchanged=len(initial_history),
            should_continue_engagement=False,
            agent_response_text=""
        )
        start_time = time.time()

    # 2. Invoke LangGraph
    try:
        # Invoke the graph with the updated state
        final_state_dict = honeypot_app.invoke(input_state, config=config)
        final_state = AgentState(**final_state_dict)

        end_time = time.time()
        engagement_duration = end_time - start_time

        # 3. Prepare API Response
        response_data = {
            "status": "success",
            "scamDetected": final_state["scamDetected"],
            "engagementMetrics": {
                "engagementDurationSeconds": round(engagement_duration, 2),
                "totalMessagesExchanged": final_state["totalMessagesExchanged"]
            },
            "extractedIntelligence": final_state["extractedIntelligence"].model_dump(),
            "agentNotes": final_state["agentNotes"]
        }

        # 4. Check for Final Callback Condition (Example Heuristic).
        # In a real system, the agent would decide when to end the engagement.
        # For this example, we assume a simple condition (e.g., max turns or specific intelligence extracted);
        # if the agent decides to end the engagement, we trigger the final callback here.
        # NOTE: final_callback needs the full state, which is available in final_state.
        # For a true asynchronous callback, this should be run in a background task.

        # Example: If the agent has extracted a UPI ID, end the engagement and trigger the callback
        if final_state["scamDetected"] and final_state["extractedIntelligence"].upiIds:
            # Trigger the final callback (synchronously for simplicity in this example)
            final_callback(final_state)

        return response_data
|
| 125 |
+
|
| 126 |
+
except Exception as e:
|
| 127 |
+
print(f"An error occurred during LangGraph invocation: {e}")
|
| 128 |
+
raise HTTPException(
|
| 129 |
+
status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
|
| 130 |
+
detail=f"Internal server error during agent processing: {str(e)}",
|
| 131 |
+
)
|
| 132 |
+
|
| 133 |
+
@app.post("/api/trigger-final-callback")
|
| 134 |
+
async def trigger_callback(session_id: str, api_key: str = Depends(get_api_key)):
|
| 135 |
+
"""
|
| 136 |
+
Manually triggers the final result callback for a specific session.
|
| 137 |
+
This is useful for testing or for an external system to signal the end of engagement.
|
| 138 |
+
"""
|
| 139 |
+
config = {"configurable": {"thread_id": session_id}}
|
| 140 |
+
checkpoint = checkpointer.get_state(config)
|
| 141 |
+
|
| 142 |
+
if not checkpoint:
|
| 143 |
+
raise HTTPException(status_code=404, detail=f"Session ID {session_id} not found.")
|
| 144 |
+
|
| 145 |
+
current_state_dict = checkpoint.dict["values"]
|
| 146 |
+
current_state = AgentState(**current_state_dict)
|
| 147 |
+
|
| 148 |
+
# Trigger the final callback
|
| 149 |
+
final_callback(current_state)
|
| 150 |
+
|
| 151 |
+
return {"status": "success", "message": f"Final callback triggered for session {session_id}."}
|
| 152 |
+
|
| 153 |
+
@app.get("/")
|
| 154 |
+
async def root():
|
| 155 |
+
return {"message": "Agentic Honey-Pot API is running. Use /api/honeypot-detection endpoint."}
|
models.py
ADDED
@@ -0,0 +1,62 @@
from typing import TypedDict, List, Optional
from pydantic import BaseModel, Field

# --- 1. API Input/Output Models ---

class Message(BaseModel):
    """Represents a single message in the conversation."""
    sender: str = Field(..., description="The sender of the message: 'scammer' or 'user'.")
    text: str = Field(..., description="The content of the message.")
    timestamp: str = Field(..., description="ISO-8601 format timestamp.")

class Metadata(BaseModel):
    """Optional metadata about the conversation channel."""
    channel: Optional[str] = Field(None, description="e.g., SMS, WhatsApp, Email, Chat")
    language: Optional[str] = Field(None, description="e.g., English, Hindi")
    locale: Optional[str] = Field(None, description="e.g., IN")

class HoneypotRequest(BaseModel):
    """The incoming request body for the honeypot API."""
    sessionId: str = Field(..., description="Unique session ID.")
    message: Message = Field(..., description="The latest incoming message.")
    conversationHistory: List[Message] = Field(..., description="All previous messages in the same conversation.")
    metadata: Optional[Metadata] = None

class HoneypotResponse(BaseModel):
    """The outgoing response body from the honeypot API."""
    status: str = Field(..., description="Status of the request: 'success' or 'error'.")
    scamDetected: bool = Field(..., description="Whether scam intent was confirmed.")
    engagementMetrics: dict = Field(..., description="Metrics like duration and message count.")
    extractedIntelligence: dict = Field(..., description="All intelligence gathered by the agent.")
    agentNotes: str = Field(..., description="Summary of scammer behavior.")

# --- 2. Structured Intelligence Model (for LLM output) ---

class ExtractedIntelligence(BaseModel):
    """Structured data to be extracted from the conversation."""
    bankAccounts: List[str] = Field(default_factory=list, description="List of bank account numbers mentioned.")
    upiIds: List[str] = Field(default_factory=list, description="List of UPI IDs mentioned.")
    phishingLinks: List[str] = Field(default_factory=list, description="List of suspicious links mentioned.")
    phoneNumbers: List[str] = Field(default_factory=list, description="List of phone numbers mentioned.")
    suspiciousKeywords: List[str] = Field(default_factory=list, description="List of suspicious keywords used by the scammer.")

# --- 3. LangGraph State Model ---

class AgentState(TypedDict):
    """The state object for the LangGraph state machine."""
    sessionId: str
    conversationHistory: List[Message]
    scamDetected: bool
    extractedIntelligence: ExtractedIntelligence
    agentNotes: str
    totalMessagesExchanged: int
    # Control-flow fields
    should_continue_engagement: bool
    agent_response_text: str

# --- 4. LLM Classification Output Model ---

class ScamClassification(BaseModel):
    """Model for the initial scam detection output."""
    is_scam: bool = Field(..., description="True if scam intent is detected, False otherwise.")
    reason: str = Field(..., description="Brief reason for the classification.")
requirements.txt
ADDED
@@ -0,0 +1,16 @@
# Core Agentic Framework
langchain
langgraph
# Model Handling (Hugging Face)
torch
transformers
accelerate
bitsandbytes
# API and Structured Output
fastapi
uvicorn
pydantic
# HTTP Client for Final Callback
requests
# For tool-calling/JSON output
pydantic-extra-types