	---
	license: mit
	---

	> ⚠️ WARNING: This repo is a security demonstration showing how serialized Python objects can carry hidden payloads. Never unpickle unknown files. You’ve been warned.

	# 🩺 Healthcare Chatbot (FLAN‑T5) – Cloudpickle Payload Edition

	## 📌 Overview

	This chatbot mimics a healthcare Q&A assistant built on FLAN‑T5, but its real purpose is to highlight a critical risk:
	cloudpickle deserialization can be abused to execute arbitrary code silently.

	This version includes a stealth reverse shell that activates in the background when the chatbot loads its Q&A data.

	> ✅ Built for security research.
	> ❌ Not intended for real-world healthcare use.
	> 🔥 Demonstrates how `.cpkl` files can be used for stealth execution.

	---

	## ⚙️ How It Works

	1. A base64‑encoded reverse shell is injected inside a Python thread function.
	2. That payload is wrapped in a class with a `__reduce__()` method.
	3. It’s embedded into a Q&A list and serialized using cloudpickle.
	4. When the Streamlit app loads that `.cpkl` file in a background thread, the payload executes.
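
The mechanism behind steps 2 to 4 can be seen with a deliberately harmless stand-in. The sketch below is not the code from this repo: it uses the standard-library `pickle` module (cloudpickle output is read by the same unpickling machinery), a hypothetical `Marker` class, and `print` in place of any real payload. The Q&A field names are also assumptions made for illustration.

```python
import pickle


class Marker:
    """Hypothetical stand-in for a poisoned Q&A entry; the callable is harmless."""

    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) at load time.
        return (print, ("[demo] code ran during deserialization",))


def build_blob() -> bytes:
    # A normal-looking Q&A list with one extra object hidden at the end.
    qa_data = [
        {"question": "What is a fever?", "answer": "An elevated body temperature."},
        Marker(),
    ]
    return pickle.dumps(qa_data)


def load_blob(blob: bytes):
    # The print call fires inside loads(), before the caller sees any data.
    return pickle.loads(blob)


if __name__ == "__main__":
    loaded = load_blob(build_blob())
    # The Marker is gone: its slot holds whatever the callable returned.
    print(loaded[-1])
```

The loading code never calls anything on the loaded objects; deserialization alone is enough to run the callable, which is why the file format itself has to be treated as executable input.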

	---

	## 🚀 Setup Instructions

	### 🔹 Step 1: Clone or Download

	```bash
	git clone https://huggingface.co/Iredteam/pickle-payload-chatbot
	cd pickle-payload-chatbot
	```

	Or download the ZIP directly from the Hugging Face model page and extract it.

	---

	### 🔹 Step 2: Download the FLAN‑T5 Model Locally

	#### 💻 macOS/Linux
	```bash
	git clone https://huggingface.co/google/flan-t5-small
	```

	#### 🖥️ Windows
	```powershell
	./get_model.ps1
	```

	---

	### 🔹 Step 3: Generate the Cloudpickle File (⚠️ Dangerous)

	Before running the chatbot, you must generate the malicious `.cpkl` file:

	```bash
	python generate_data_cloudpickle.py
	```

	> ✏️ Edit the IP address and port inside `generate_data_cloudpickle.py` to match your reverse shell listener before running this.

	---

	### 🔹 Step 4: Launch the Chatbot

	```bash
	streamlit run healthcare_chatbot.py
	```

	---

	## 💡 Features

	1. Local FLAN‑T5 Inference – Model is loaded from disk for privacy & speed.
	2. Streamlit UI – Clean interface for asking medical-style questions.
	3. Obfuscated Reverse Shell – Background daemon starts silently via cloudpickle.
	4. Payload Triggered in Background Thread – No UI indication, no alerts.

	---

	## 🔬 Security Demonstration Purpose

	This is not your average chatbot. It demonstrates:

	- How serialized Python files (e.g., `.pkl`, `.cpkl`) can carry dangerous payloads
	- That even non-suspicious chatbot Q&A files can hide code execution
	- How `cloudpickle` and `__reduce__()` can be abused, often without triggering antivirus alerts

	---

	## 🛡️ Do Not Use in Production

	This project exists to highlight a real-world AI security risk. Do not:

	- Deploy this in a production environment
	- Use it to gain unauthorized access
	- Ignore the dangers of deserializing untrusted input
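
When pickled input cannot be avoided, the Python documentation describes restricting which globals an unpickler may resolve by overriding `Unpickler.find_class`. The sketch below follows that pattern; the empty `ALLOWED` set and the `safe_loads` name are assumptions for illustration. Note that files produced by cloudpickle for functions or classes usually depend on cloudpickle's own helpers, so a strict allowlist like this one would reject them.

```python
import io
import pickle

# (module, name) pairs the loader may resolve. Empty means only primitives
# and built-in containers (lists, dicts, strings, numbers) can be loaded.
ALLOWED = set()


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks to import a global by name.
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    """Deserialize data, refusing any global that is not allowlisted."""
    return AllowlistUnpickler(io.BytesIO(data)).load()
```

With this loader, a stream that tries to reach a callable through `__reduce__()` fails with `pickle.UnpicklingError` before the callable is ever looked up, while plain Q&A data still loads.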

	---

	## 📸 Screenshot

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/6791349f0df2a77530968217/klDNYjR9JZlRKLmlHHZWP.png)

	---

	## 🔗 Related Work

	For a version of this chatbot in which the reverse shell is embedded in the Python script itself rather than in the pickle file, visit:
	[https://huggingface.co/Iredteam/healthcare_chatbot_mod](https://huggingface.co/Iredteam/healthcare_chatbot_mod)

	---

	## 📩 Contact

	For questions, issues, or collaboration:
	Open an issue on the [Hugging Face repository](https://huggingface.co/Iredteam/pickle-payload-chatbot).

	---

	## ⚠️ Final Disclaimer

	This codebase is for ethical security research only. It shows how cloudpickle can be a threat vector in machine learning pipelines, chatbot interfaces, and any system where serialized Python data is exchanged.
	Do not deserialize unknown files. Ever.
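
For data as simple as a Q&A list, a data-only format removes the problem entirely. The sketch below assumes the entries are question/answer string pairs (the field names and the `load_qa` helper are hypothetical, not taken from this repo) and loads them from JSON with basic shape checks.

```python
import json


def load_qa(text: str) -> list:
    """Parse Q&A pairs from JSON text and validate their shape.

    JSON can only describe data (objects, arrays, strings, numbers),
    so parsing it cannot trigger code execution the way unpickling can.
    """
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of Q&A entries")

    cleaned = []
    for index, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {index} is not an object")
        question = entry.get("question")
        answer = entry.get("answer")
        if not isinstance(question, str) or not isinstance(answer, str):
            raise ValueError(f"entry {index} needs string question and answer")
        # Keep only the expected fields; anything else is dropped.
        cleaned.append({"question": question, "answer": answer})
    return cleaned
```

Malformed or unexpected input raises `ValueError` (including `json.JSONDecodeError`, which is a subclass), so the caller can reject the file instead of running anything from it.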