Ani14 committed on
Commit
192155e
·
verified ·
1 Parent(s): b789735

Upload 7 files

Agentic Honey-Pot for Scam Detection & Intelligence Extraction.md ADDED
@@ -0,0 +1,19 @@
# Agentic Honey-Pot for Scam Detection & Intelligence Extraction

This project implements the solution for Problem Statement 2: **Agentic Honey-Pot for Scam Detection & Intelligence Extraction**.

## Technology Stack
* **Agentic Framework:** [LangGraph](https://langchain-ai.github.io/langgraph/tutorials/introduction/) for stateful, cyclical conversation management.
* **LLM:** [Qwen 2.5 3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) (open-source, optimized for resource-constrained deployment).
* **Backend:** [FastAPI](https://fastapi.tiangolo.com/) for the low-latency REST API.
* **Deployment:** Hugging Face Space (self-hosted on the free tier).

## API Endpoint
The main endpoint for the honeypot is:
`POST /api/honeypot-detection`

## Authentication
The API requires an `x-api-key` header for authentication. The key is set via a Space Secret.

## Development Notes
The core logic is implemented in `agent.py` using LangGraph to manage the multi-turn conversation state. The model is loaded with 4-bit quantization (`bitsandbytes`) for efficient use of the free-tier hardware.
Dockerfile ADDED
@@ -0,0 +1,26 @@
# Use a base image with Python and CUDA for GPU support (recommended for Qwen 2.5 3B).
# This image includes Python, CUDA, and common ML libraries.
FROM nvcr.io/nvidia/pytorch:24.01-py3

# Set environment variables
ENV PYTHONUNBUFFERED=1
# Hugging Face Spaces uses port 7860 by default for web applications
ENV PORT=7860

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies.
# --no-cache-dir keeps the image size small.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Expose the port
EXPOSE 7860

# Command to run the application (matches the Procfile logic).
# The explicit port 7860 is required by HF Spaces.
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
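The Dockerfile above can also be exercised locally before pushing to the Space. A minimal sketch, assuming Docker is installed; the image tag `honeypot-agent` is illustrative, and the key must match what the app validates against:

```shell
# Build the image (tag name is illustrative)
docker build -t honeypot-agent .

# Run locally on the same port the Space expects,
# passing the API key the app validates against
docker run --rm -p 7860:7860 \
  -e HONEYPOT_API_KEY="YOUR_SECRET_API_KEY_FOR_AUTH" \
  honeypot-agent
```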
Implementation and Deployment Guide for Agentic Honey-Pot.md ADDED
@@ -0,0 +1,96 @@
# Implementation and Deployment Guide for Agentic Honey-Pot

This guide provides instructions for setting up, running, and deploying the provided Python codebase for the Agentic Honey-Pot solution.

## 1. Code Structure

The solution is modularized into the following files:

| File | Purpose |
| :--- | :--- |
| `requirements.txt` | Lists all necessary Python dependencies (LangGraph, FastAPI, Qwen model dependencies). |
| `models.py` | Contains all Pydantic schemas for API input/output, LangGraph state, and structured intelligence extraction. |
| `agent.py` | Contains the core **LangGraph** state machine logic, the **Qwen 2.5 3B-Instruct** model loading, and the node functions (`detect_scam`, `agent_persona_response`, `extract_intelligence`, `final_callback`). |
| `app.py` | The **FastAPI** application that exposes the `/api/honeypot-detection` endpoint and integrates with the LangGraph agent. |
| `Procfile` | Configuration file for the Hugging Face Space to run the FastAPI application using Uvicorn. (Only needed if not using the Dockerfile.) |
| `README.md` | A brief description for the Hugging Face Space repository. |
| `Dockerfile` | Defines the environment and dependencies for a robust Docker-based deployment on Hugging Face Spaces. |

## 2. Local Setup and Testing

### Step 2.1: Setup Environment

1. **Install Dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
   *Note: The `bitsandbytes` library requires a compatible CUDA setup for GPU usage. If running on CPU, you may need to adjust the model loading in `agent.py` to remove the quantization configuration.*

2. **Set Environment Variables:**
   The `app.py` and `agent.py` files rely on an environment variable for the API key.
   ```bash
   export HONEYPOT_API_KEY="YOUR_SECRET_API_KEY_FOR_AUTH"
   ```
   *Note: Replace the placeholder with your actual key.*

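For CPU-only machines, the quantized load in `agent.py` can be replaced with a plain load. A minimal sketch, assuming the same model ID as `agent.py`; note this downloads several GB of weights and needs roughly 12 GB of RAM in float32:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"

# CPU fallback: no BitsAndBytesConfig, full-precision weights (slower than GPU)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float32,  # bfloat16 also works on recent CPUs
    device_map={"": "cpu"},     # pin every module to the CPU
)
```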
### Step 2.2: Run Locally

1. **Start the FastAPI Server:**
   ```bash
   uvicorn app:app --host 0.0.0.0 --port 8000
   ```
2. **Test the Endpoint:**
   Use a tool such as `curl` or Postman to send a request to `http://localhost:8000/api/honeypot-detection`.

   **Example cURL Request (Initial Message):**
   ```bash
   curl -X POST "http://localhost:8000/api/honeypot-detection" \
     -H "accept: application/json" \
     -H "x-api-key: YOUR_SECRET_API_KEY_FOR_AUTH" \
     -H "Content-Type: application/json" \
     -d '{
       "sessionId": "test-session-123",
       "message": {
         "sender": "scammer",
         "text": "Your account is blocked. Click this link immediately: http://malicious-link.example",
         "timestamp": "2026-01-28T10:00:00Z"
       },
       "conversationHistory": [],
       "metadata": {
         "channel": "SMS",
         "language": "English",
         "locale": "IN"
       }
     }'
   ```
*Send subsequent messages by including the previous conversation in the `conversationHistory` field.*

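Threading the history into follow-up requests can be scripted. A sketch of a helper that builds the next payload (field names follow the request schema above; the HTTP call itself is omitted so the helper stays network-free, and all message texts are dummies):

```python
from typing import Any, Dict, List

def build_next_request(session_id: str,
                       history: List[Dict[str, str]],
                       scammer_text: str,
                       timestamp: str) -> Dict[str, Any]:
    """Assemble the next /api/honeypot-detection payload for an ongoing session."""
    return {
        "sessionId": session_id,
        "message": {"sender": "scammer", "text": scammer_text, "timestamp": timestamp},
        # All previous messages travel in conversationHistory
        "conversationHistory": list(history),
        "metadata": {"channel": "SMS", "language": "English", "locale": "IN"},
    }

# First turn: empty history
turn1 = build_next_request("test-session-123", [],
                           "Your account is blocked.", "2026-01-28T10:00:00Z")
# Second turn: include turn 1's message and the honeypot's reply in the history
history = [turn1["message"],
           {"sender": "user", "text": "Which account?", "timestamp": "2026-01-28T10:01:00Z"}]
turn2 = build_next_request("test-session-123", history,
                           "Share your UPI ID now.", "2026-01-28T10:02:00Z")
```

Each payload can then be POSTed with any HTTP client, exactly as in the cURL example above.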
## 3. Deployment on Hugging Face Space (Recommended Strategy)

This strategy bypasses the limitations of the Hugging Face Inference API free tier by self-hosting the model.

### Step 3.1: Create a Hugging Face Space

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces).
2. Click **"Create new Space"**.
3. **Name:** Choose a name (e.g., `my-honeypot-agent`).
4. **License:** Select a license.
5. **Space SDK:** Select **`Docker`** for maximum control over the environment, or **`Gradio`** if you want a simple UI for monitoring. *For a pure API, Docker is the most robust choice.*
6. **Hardware:** Select the **Free CPU** or **Free T4 Medium GPU** (if available). **A T4 GPU is highly recommended for better latency.**

### Step 3.2: Configure Environment and Upload Files

1. **Set Secrets:** In your Space settings, go to **"Secrets"** and add the following:
   * **Name:** `HONEYPOT_API_KEY`
   * **Value:** `YOUR_SECRET_API_KEY_FOR_AUTH` (this is the key your API will validate against).

2. **Upload Code:** Upload all the provided files (`requirements.txt`, `models.py`, `agent.py`, `app.py`, `Procfile`, `README.md`) to your Space repository.

3. **Upload Dockerfile:** The provided `Dockerfile` is optimized for a GPU-enabled environment (recommended for the Qwen 2.5 3B model). Ensure this file is uploaded to the root of your Space repository.

### Step 3.3: Final Testing

1. Once the Space builds successfully, the public URL will be your API base URL (e.g., `https://[your-user]-[your-space].hf.space`).
2. Test the live endpoint using the cURL command from Step 2.2, replacing `http://localhost:8000` with your Space URL.

This setup keeps the agent running in a dedicated, free environment, providing the stability and performance required for the competition.
agent.py ADDED
@@ -0,0 +1,329 @@
import os
import json
import requests
import torch
from typing import Any, Dict, List, Optional
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from langgraph.graph import StateGraph, END, START
from langgraph.checkpoint.base import BaseCheckpointSaver
from langgraph.checkpoint.memory import MemorySaver
from pydantic import ValidationError

from models import AgentState, Message, ExtractedIntelligence, ScamClassification

# --- Configuration ---
MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"
# Placeholder for the final evaluation endpoint
CALLBACK_URL = "https://hackathon.guvi.in/api/updateHoneyPotFinalResult"
# Placeholder for the honeypot's own API key (for the callback)
HONEYPOT_API_KEY = os.environ.get("HONEYPOT_API_KEY", "YOUR_SECRET_API_KEY_FOR_CALLBACK")

# --- Model Initialization (Singleton Pattern) ---

class ModelLoader:
    """Handles loading the Qwen 2.5 3B model with quantization."""
    _model = None
    _tokenizer = None

    @classmethod
    def get_model_and_tokenizer(cls):
        if cls._model is None or cls._tokenizer is None:
            print(f"Loading model {MODEL_ID}...")
            # 4-bit quantization for memory efficiency on small GPUs/CPUs
            bnb_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.bfloat16
            )

            cls._tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
            cls._model = AutoModelForCausalLM.from_pretrained(
                MODEL_ID,
                quantization_config=bnb_config,
                device_map="auto"  # Place model parts across available devices
            )
            print("Model loaded successfully.")
        return cls._model, cls._tokenizer

# --- LangGraph Nodes (Functions) ---
def _invoke_llm(messages: List[Dict[str, str]], system_prompt: str, json_schema: Optional[Dict[str, Any]] = None) -> str:
    """Helper function to invoke the Qwen model."""
    model, tokenizer = ModelLoader.get_model_and_tokenizer()

    # Construct the full conversation history including the system prompt
    full_messages = [{"role": "system", "content": system_prompt}] + messages

    # Add an instruction for JSON output if a schema is provided
    if json_schema:
        full_messages.append({"role": "user", "content": f"Please output the result as a JSON object that strictly conforms to the following schema: {json.dumps(json_schema)}"})

    # Apply the chat template and tokenize
    input_ids = tokenizer.apply_chat_template(
        full_messages,
        return_tensors="pt",
        add_generation_prompt=True
    ).to(model.device)

    # Generate the response
    with torch.no_grad():
        output_ids = model.generate(
            input_ids,
            max_new_tokens=512,
            do_sample=True,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode and clean up the response
    response = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

    # Simple JSON extraction (often required for open-source models)
    if json_schema:
        start = response.find('{')
        end = response.rfind('}') + 1
        if start != -1 and end > start:
            # Return the outermost JSON block
            return response[start:end]
        # No JSON block found: return the raw response and let the caller handle it
        return response

    return response

def detect_scam(state: AgentState) -> AgentState:
    """Node 1: Detects scam intent from the latest message."""
    latest_message = state["conversationHistory"][-1]

    system_prompt = (
        "You are an expert scam detection system. Analyze the user's message and determine "
        "if it contains clear scam or fraudulent intent (e.g., bank fraud, phishing, urgent account block). "
        "Your output MUST be a JSON object conforming to the ScamClassification schema."
    )

    messages = [{"role": "user", "content": latest_message.text}]

    try:
        json_str = _invoke_llm(messages, system_prompt, ScamClassification.model_json_schema())
        classification = ScamClassification.model_validate_json(json_str)

        state["scamDetected"] = classification.is_scam
        state["agentNotes"] += f"Initial Detection: {classification.reason}. "
        print(f"Scam Detected: {classification.is_scam}")

    except (json.JSONDecodeError, ValidationError) as e:
        print(f"Detection failed, defaulting to False: {e}")
        state["scamDetected"] = False
        state["agentNotes"] += "Detection failed due to parsing error. "

    return state

def agent_persona_response(state: AgentState) -> AgentState:
    """Node 2: Generates a human-like response to engage the scammer."""
    if not state["scamDetected"]:
        # Should not happen in the main flow, but kept as a safeguard
        state["agent_response_text"] = "Conversation ended by system."
        state["should_continue_engagement"] = False
        return state

    # Construct the full conversation history for the model
    history_for_llm = [m.model_dump() for m in state["conversationHistory"]]

    system_prompt = (
        "You are a human-like honeypot agent. Your goal is to engage the 'scammer' "
        "in a believable, multi-turn conversation to extract intelligence (e.g., links, IDs, numbers) "
        "without revealing that you are an AI or that you suspect a scam. "
        "Maintain a slightly confused, cautious, but engaged persona. "
        "Your response must be ONLY the text of the message to send back to the scammer."
    )

    # The last message in history_for_llm is the scammer's latest message
    messages = history_for_llm

    response_text = _invoke_llm(messages, system_prompt)

    # Update state with the agent's response
    agent_message = Message(
        sender="user",  # The honeypot agent is acting as the 'user'
        text=response_text,
        timestamp=state["conversationHistory"][-1].timestamp  # Placeholder; should be the current time
    )
    state["conversationHistory"].append(agent_message)
    state["agent_response_text"] = response_text
    state["totalMessagesExchanged"] += 1

    # Simple heuristic to decide if engagement should continue.
    # In a real system, this would be a separate node or a more complex heuristic.
    state["should_continue_engagement"] = True

    return state

def extract_intelligence(state: AgentState) -> AgentState:
    """Node 3: Extracts structured intelligence from the full conversation history."""

    # Combine all messages into a single text block for the model to analyze
    full_transcript = "\n".join([f"{m.sender}: {m.text}" for m in state["conversationHistory"]])

    system_prompt = (
        "You are an intelligence extraction specialist. Analyze the following conversation transcript "
        "between a 'scammer' and a 'user' (honeypot agent). "
        "Extract all relevant intelligence (bank accounts, UPI IDs, links, phone numbers, keywords) "
        "mentioned by the 'scammer'. Your output MUST be a JSON object conforming to the ExtractedIntelligence schema. "
        "If no item is found for a field, use an empty list."
    )

    messages = [{"role": "user", "content": f"Transcript:\n{full_transcript}"}]

    try:
        json_str = _invoke_llm(messages, system_prompt, ExtractedIntelligence.model_json_schema())
        extracted_data = ExtractedIntelligence.model_validate_json(json_str)

        # Merge new intelligence with existing intelligence (if any)
        current_data = state["extractedIntelligence"].model_dump()
        new_data = extracted_data.model_dump()

        for key in current_data:
            current_data[key] = list(set(current_data[key] + new_data[key]))

        state["extractedIntelligence"] = ExtractedIntelligence.model_validate(current_data)
        state["agentNotes"] += "Intelligence updated. "

    except (json.JSONDecodeError, ValidationError) as e:
        print(f"Intelligence extraction failed: {e}")
        state["agentNotes"] += "Intelligence extraction failed due to parsing error. "

    return state

def final_callback(state: AgentState) -> AgentState:
    """Node 4: Sends the mandatory final result callback to the evaluation endpoint."""

    if not state["scamDetected"]:
        print("Callback skipped: Scam not detected.")
        return state

    payload = {
        "sessionId": state["sessionId"],
        "scamDetected": state["scamDetected"],
        "totalMessagesExchanged": state["totalMessagesExchanged"],
        "extractedIntelligence": state["extractedIntelligence"].model_dump(),
        "agentNotes": state["agentNotes"]
    }

    headers = {
        "Content-Type": "application/json",
        "x-api-key": HONEYPOT_API_KEY  # Use the honeypot's own API key for the callback
    }

    try:
        response = requests.post(CALLBACK_URL, json=payload, headers=headers, timeout=10)
        response.raise_for_status()
        print(f"Final callback successful. Status: {response.status_code}")
        state["agentNotes"] += "Final callback sent successfully. "
    except requests.exceptions.RequestException as e:
        print(f"Final callback failed: {e}")
        state["agentNotes"] += f"Final callback failed: {e}. "

    return state

# --- Graph Definition ---

def create_honeypot_graph(checkpoint_saver: BaseCheckpointSaver):
    """Defines and compiles the LangGraph state machine."""

    workflow = StateGraph(AgentState)

    # Add nodes
    workflow.add_node("detect_scam", detect_scam)
    workflow.add_node("agent_persona_response", agent_persona_response)
    workflow.add_node("extract_intelligence", extract_intelligence)
    workflow.add_node("final_callback", final_callback)

    # Define the entry point
    workflow.add_edge(START, "detect_scam")

    # Conditional edge after scam detection
    def should_continue(state: AgentState) -> str:
        if state["scamDetected"]:
            return "extract_intelligence"
        else:
            return END

    workflow.add_conditional_edges("detect_scam", should_continue)

    # Main loop: Extract -> Respond -> (wait for the next message).
    # The loop is driven externally: each incoming API call resumes the graph.
    # For a single API call, we just extract and respond.
    workflow.add_edge("extract_intelligence", "agent_persona_response")

    # After the agent responds, the run ends; the next API call
    # restarts the graph from the checkpoint.
    workflow.add_edge("agent_persona_response", END)

    # The final callback is triggered separately: either by a dedicated
    # endpoint or once a condition in the state (e.g., intelligence
    # extracted) signals the end of the engagement.

    # Compile the graph
    app = workflow.compile(checkpointer=checkpoint_saver)
    return app

# Initialize the graph with a memory saver for local testing.
# In a real deployment, a database checkpointer (e.g., SQLite, Postgres) would be used.
memory_saver = MemorySaver()
honeypot_app = create_honeypot_graph(memory_saver)

# Optional: run a test flow locally
if __name__ == "__main__":
    # Initialize the state for a new conversation
    initial_state = AgentState(
        sessionId="test-session-123",
        conversationHistory=[
            Message(
                sender="scammer",
                text="Your bank account will be blocked today. Verify immediately by clicking this link: http://malicious-link.example",
                timestamp="2026-01-28T10:00:00Z"
            )
        ],
        scamDetected=False,
        extractedIntelligence=ExtractedIntelligence(),
        agentNotes="",
        totalMessagesExchanged=1,
        should_continue_engagement=False,
        agent_response_text=""
    )

    # Run the first turn
    print("--- Running Turn 1 (Detection, Extraction, Response) ---")
    final_state = honeypot_app.invoke(initial_state, config={"configurable": {"thread_id": "test-session-123"}})

    print("\n--- Final State After Turn 1 ---")
    print(f"Scam Detected: {final_state['scamDetected']}")
    print(f"Agent Response: {final_state['agent_response_text']}")
    print(f"Intelligence: {final_state['extractedIntelligence'].model_dump()}")

    # Simulate the next incoming message from the scammer
    next_scammer_message = Message(
        sender="scammer",
        text="Why are you asking so many questions? Just give me your UPI ID now or I will block your account permanently.",
        timestamp="2026-01-28T10:05:00Z"
    )

    # Load the previous state and add the new message
    final_state["conversationHistory"].append(next_scammer_message)
    final_state["totalMessagesExchanged"] += 1

    # Run the second turn (LangGraph loads the checkpoint and continues)
    print("\n--- Running Turn 2 (Extraction, Response) ---")
    final_state_2 = honeypot_app.invoke(final_state, config={"configurable": {"thread_id": "test-session-123"}})

    print("\n--- Final State After Turn 2 ---")
    print(f"Agent Response: {final_state_2['agent_response_text']}")
    print(f"Intelligence: {final_state_2['extractedIntelligence'].model_dump()}")

    # Manually trigger the final callback (simulating the end of engagement).
    # In the deployed API this is triggered by a condition or a separate
    # endpoint; for this local example, we call the function directly.
    print("\n--- Triggering Final Callback ---")
    final_callback(final_state_2)
app.py ADDED
@@ -0,0 +1,155 @@
import os
import time
from fastapi import FastAPI, HTTPException, Depends, status
from fastapi.security import APIKeyHeader
from typing import Dict, Any

# LangGraph and model imports
from langgraph.checkpoint.memory import MemorySaver
from langgraph.checkpoint.base import BaseCheckpointSaver
from agent import create_honeypot_graph, final_callback
from models import HoneypotRequest, HoneypotResponse, AgentState, ExtractedIntelligence, Message

# --- Configuration ---
API_KEY_NAME = "x-api-key"
API_KEY = os.environ.get("HONEYPOT_API_KEY", "sk_test_123456789")  # Default for local testing
api_key_header = APIKeyHeader(name=API_KEY_NAME, auto_error=False)

# --- Initialization ---
app = FastAPI(
    title="Agentic Honey-Pot API",
    description="REST API for Scam Detection and Intelligence Extraction using LangGraph and Qwen 2.5 3B.",
    version="1.0.0"
)

# Initialize the LangGraph checkpointer. MemorySaver is used for simplicity and
# resets on every Space restart; for production, replace it with a persistent
# volume or database-backed checkpointer.
checkpointer: BaseCheckpointSaver = MemorySaver()
honeypot_app = create_honeypot_graph(checkpointer)

# --- Dependency for API Key Validation ---

async def get_api_key(api_key_header: str = Depends(api_key_header)):
    if api_key_header is None or api_key_header != API_KEY:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid API Key or missing 'x-api-key' header.",
        )
    return api_key_header

# --- API Endpoints ---

@app.post("/api/honeypot-detection", response_model=HoneypotResponse)
async def honeypot_detection(
    request_data: HoneypotRequest,
    api_key: str = Depends(get_api_key)
) -> Dict[str, Any]:
    """
    Accepts an incoming message event, runs the LangGraph agent, and returns the response.
    """
    session_id = request_data.sessionId

    # 1. Load or initialize state.
    # LangGraph uses the thread_id for checkpointing.
    config = {"configurable": {"thread_id": session_id}}

    # Check whether a checkpoint exists for this session
    # (the compiled graph exposes the last checkpoint as a StateSnapshot)
    snapshot = honeypot_app.get_state(config)

    if snapshot and snapshot.values:
        # Load the existing state
        current_state = AgentState(**snapshot.values)

        # Append the new message from the scammer
        current_state["conversationHistory"].append(request_data.message)
        current_state["totalMessagesExchanged"] += 1

        # LangGraph will continue from the last node
        input_state = current_state

    else:
        # New conversation: initialize the state
        initial_history = request_data.conversationHistory + [request_data.message]

        input_state = AgentState(
            sessionId=session_id,
            conversationHistory=initial_history,
            scamDetected=False,
            extractedIntelligence=ExtractedIntelligence(),
            agentNotes="New session started. ",
            totalMessagesExchanged=len(initial_history),
            should_continue_engagement=False,
            agent_response_text=""
        )

    start_time = time.time()

    # 2. Invoke LangGraph
    try:
        # Invoke the graph with the updated state
        final_state_dict = honeypot_app.invoke(input_state, config=config)
        final_state = AgentState(**final_state_dict)

        engagement_duration = time.time() - start_time

        # 3. Prepare the API response
        response_data = {
            "status": "success",
            "scamDetected": final_state["scamDetected"],
            "engagementMetrics": {
                "engagementDurationSeconds": round(engagement_duration, 2),
                "totalMessagesExchanged": final_state["totalMessagesExchanged"]
            },
            "extractedIntelligence": final_state["extractedIntelligence"].model_dump(),
            "agentNotes": final_state["agentNotes"]
        }

        # 4. Check the final-callback condition (example heuristic).
        # In a real system, the agent would decide when to end the engagement.
        # Here, once a UPI ID has been extracted, the engagement ends and the
        # callback fires. For a true asynchronous callback, run this in a
        # background task.
        if final_state["scamDetected"] and final_state["extractedIntelligence"].upiIds:
            # Trigger the final callback (synchronously for simplicity)
            final_callback(final_state)

        return response_data

    except Exception as e:
        print(f"An error occurred during LangGraph invocation: {e}")
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"Internal server error during agent processing: {str(e)}",
        )

@app.post("/api/trigger-final-callback")
async def trigger_callback(session_id: str, api_key: str = Depends(get_api_key)):
    """
    Manually triggers the final result callback for a specific session.
    Useful for testing, or for an external system to signal the end of engagement.
    """
    config = {"configurable": {"thread_id": session_id}}
    snapshot = honeypot_app.get_state(config)

    if not snapshot or not snapshot.values:
        raise HTTPException(status_code=404, detail=f"Session ID {session_id} not found.")

    current_state = AgentState(**snapshot.values)

    # Trigger the final callback
    final_callback(current_state)

    return {"status": "success", "message": f"Final callback triggered for session {session_id}."}

@app.get("/")
async def root():
    return {"message": "Agentic Honey-Pot API is running. Use the /api/honeypot-detection endpoint."}
models.py ADDED
@@ -0,0 +1,62 @@
from typing import TypedDict, List, Optional
from pydantic import BaseModel, Field

# --- 1. API Input/Output Models ---

class Message(BaseModel):
    """Represents a single message in the conversation."""
    sender: str = Field(..., description="The sender of the message: 'scammer' or 'user'.")
    text: str = Field(..., description="The content of the message.")
    timestamp: str = Field(..., description="ISO-8601 format timestamp.")

class Metadata(BaseModel):
    """Optional metadata about the conversation channel."""
    channel: Optional[str] = Field(None, description="e.g., SMS, WhatsApp, Email, Chat")
    language: Optional[str] = Field(None, description="e.g., English, Hindi")
    locale: Optional[str] = Field(None, description="e.g., IN")

class HoneypotRequest(BaseModel):
    """The incoming request body for the honeypot API."""
    sessionId: str = Field(..., description="Unique session ID.")
    message: Message = Field(..., description="The latest incoming message.")
    conversationHistory: List[Message] = Field(..., description="All previous messages in the same conversation.")
    metadata: Optional[Metadata] = None

class HoneypotResponse(BaseModel):
    """The outgoing response body from the honeypot API."""
    status: str = Field(..., description="Status of the request: 'success' or 'error'.")
    scamDetected: bool = Field(..., description="Whether scam intent was confirmed.")
    engagementMetrics: dict = Field(..., description="Metrics such as duration and message count.")
    extractedIntelligence: dict = Field(..., description="All intelligence gathered by the agent.")
    agentNotes: str = Field(..., description="Summary of scammer behavior.")

# --- 2. Structured Intelligence Model (for LLM output) ---

class ExtractedIntelligence(BaseModel):
    """Structured data to be extracted from the conversation."""
    bankAccounts: List[str] = Field(default_factory=list, description="List of bank account numbers mentioned.")
    upiIds: List[str] = Field(default_factory=list, description="List of UPI IDs mentioned.")
    phishingLinks: List[str] = Field(default_factory=list, description="List of suspicious links mentioned.")
    phoneNumbers: List[str] = Field(default_factory=list, description="List of phone numbers mentioned.")
    suspiciousKeywords: List[str] = Field(default_factory=list, description="List of suspicious keywords used by the scammer.")

# --- 3. LangGraph State Model ---

class AgentState(TypedDict):
    """The state object for the LangGraph state machine."""
    sessionId: str
    conversationHistory: List[Message]
    scamDetected: bool
    extractedIntelligence: ExtractedIntelligence
    agentNotes: str
    totalMessagesExchanged: int
    # Control-flow fields
    should_continue_engagement: bool
    agent_response_text: str

# --- 4. LLM Classification Output Model ---

class ScamClassification(BaseModel):
    """Model for the initial scam detection output."""
    is_scam: bool = Field(..., description="True if scam intent is detected, False otherwise.")
    reason: str = Field(..., description="Brief reason for the classification.")
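The per-field set-union merge that `agent.py` applies to `ExtractedIntelligence` across turns can be illustrated standalone. A sketch with plain dicts standing in for the Pydantic model (all values are dummies); unlike `list(set(...))` in `agent.py`, the result here is sorted so it is deterministic:

```python
from typing import Dict, List

def merge_intelligence(current: Dict[str, List[str]],
                       new: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Union each intelligence field, dropping duplicates across turns."""
    return {key: sorted(set(current[key]) | set(new.get(key, [])))
            for key in current}

turn1 = {"upiIds": ["scammer@upi"], "phoneNumbers": []}
turn2 = {"upiIds": ["scammer@upi", "backup@upi"], "phoneNumbers": ["+911234567890"]}
merged = merge_intelligence(turn1, turn2)
# merged keeps one copy of "scammer@upi" and gains the new UPI ID and phone number
```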
requirements.txt ADDED
@@ -0,0 +1,16 @@
# Core agentic framework
langchain
langgraph
# Model handling (Hugging Face)
torch
transformers
accelerate
bitsandbytes
# API and structured output
fastapi
uvicorn
pydantic
# HTTP client for the final callback
requests
# For tool-calling/JSON output
pydantic-extra-types