---
license: mit
---

> ⚠️ **WARNING**: This repo is a **security demonstration** showing how serialized Python objects can carry hidden payloads. **Never** unpickle unknown files. You’ve been warned.

# 🩺 Healthcare Chatbot (FLAN‑T5) – Cloudpickle Payload Edition

## 📌 Overview

This chatbot mimics a healthcare Q&A assistant using **FLAN‑T5**, but its true purpose is to highlight a critical risk:
**Cloudpickle deserialization can be abused to silently execute arbitrary code.**

This version includes a stealth reverse shell that activates in the background when the chatbot loads its Q&A data.

> ✅ Built for security research.
> ❌ Not intended for real-world healthcare use.
> 🔥 Demonstrates how `.cpkl` files can be used for stealth execution.

---

## ⚙️ How It Works

1. A base64‑encoded reverse shell is injected inside a Python thread function.
2. That payload is wrapped in a class with a `__reduce__()` method.
3. It’s embedded into a Q&A list and serialized using **cloudpickle**.
4. When the Streamlit app loads that `.cpkl` file in a background thread, the payload executes.
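
Steps 2 and 4 rely on one property of the pickle protocol: `__reduce__()` lets an object tell the unpickler to call a chosen callable with chosen arguments at load time. The sketch below illustrates only that mechanism, using a deliberately harmless callable (`sorted`) and the standard-library `pickle` module instead of this repo's payload or cloudpickle. It is relevant here because `cloudpickle.loads` is an alias of `pickle.loads`, so the same loading machinery is involved. The class name `LoadTimeCall` is hypothetical and not part of this repo.

```python
import pickle


class LoadTimeCall:
    """Harmless stand-in: pickle stores a callable plus arguments, not the object."""

    def __reduce__(self):
        # The stream will say "call sorted([3, 1, 2]) when loading".
        # A hostile file would name a dangerous callable here instead.
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(LoadTimeCall())

# Loading is safe here only because we built the blob ourselves.
restored = pickle.loads(blob)

# The result is whatever the callable returned, not a LoadTimeCall instance.
print(type(restored).__name__, restored)  # prints: list [1, 2, 3]
```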

---

## 🚀 Setup Instructions

### 🔹 Step 1: Clone or Download

```bash
git clone https://huggingface.co/Iredteam/pickle-payload-chatbot
cd pickle-payload-chatbot
```

Or download the ZIP directly from the Hugging Face model page and extract it.

---

### 🔹 Step 2: Download the FLAN‑T5 Model Locally

#### 💻 macOS/Linux
```bash
git clone https://huggingface.co/google/flan-t5-small
```

#### 🖥️ Windows
```powershell
./get_model.ps1
```

---

### 🔹 Step 3: Generate the Cloudpickle File (⚠️ Dangerous)

Before running the chatbot, **you must generate the malicious `.cpkl` file**:

```bash
python generate_data_cloudpickle.py
```

> ✏️ Edit the IP address and port inside `generate_data_cloudpickle.py` to match your reverse shell listener before running this.

---

### 🔹 Step 4: Launch the Chatbot

```bash
streamlit run healthcare_chatbot.py
```

---

## 💡 Features

1. **Local FLAN‑T5 Inference** – Model is loaded from disk for privacy & speed.
2. **Streamlit UI** – Clean interface for asking medical-style questions.
3. **Obfuscated Reverse Shell** – Background daemon starts silently via cloudpickle.
4. **Payload Triggered in Background Thread** – No UI indication, no alerts.

---

## 🔬 Security Demonstration Purpose

This is not your average chatbot. It demonstrates:

- How serialized Python files (e.g., `.pkl`, `.cpkl`) can carry dangerous payloads
- That **even innocuous-looking chatbot Q&A files** can hide code execution
- How `cloudpickle` and `__reduce__()` can be abused without raising antivirus alerts
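
One way to look inside a serialized file without running it is the standard-library `pickletools` module, whose `genops` function walks the opcode stream without executing it. The sketch below is a triage heuristic under stated assumptions, not a scanner: `RISKY_OPCODES`, `scan_pickle`, and `LoadTimeCall` are illustrative names that do not exist in this repo, legitimate cloudpickle files also contain these opcodes, and a clean result should not be treated as proof that a file is safe.

```python
import pickle
import pickletools

# Opcodes that import a name or call something while a pickle is loading.
RISKY_OPCODES = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "REDUCE"}


def scan_pickle(data):
    """Return (risky opcode names, string arguments) without unpickling."""
    seen = set()
    strings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in RISKY_OPCODES:
            seen.add(opcode.name)
        if isinstance(arg, str):
            strings.append(arg)  # module and attribute names show up here
    return seen, strings


class LoadTimeCall:
    """Harmless stand-in for an object that requests a call at load time."""

    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


plain = pickle.dumps([{"question": "placeholder question", "answer": "placeholder answer"}])
crafted = pickle.dumps(LoadTimeCall())

plain_ops, _ = scan_pickle(plain)
crafted_ops, crafted_strings = scan_pickle(crafted)

print("plain data:", sorted(plain_ops))
print("crafted   :", sorted(crafted_ops), crafted_strings)
```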

---

## 🛡️ Do Not Use in Production

This project exists to highlight a **real-world AI security risk**. Do not:

- Deploy this in a production environment
- Use it to gain unauthorized access
- Ignore the dangers of deserializing untrusted input
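
For code that has to read pickle data anyway, the Python documentation describes restricting which globals a stream may import by overriding `Unpickler.find_class`. The sketch below takes the strictest variant and refuses every global, so only plain containers and scalars load. `DataOnlyUnpickler`, `data_only_loads`, and `LoadTimeCall` are hypothetical names, not part of this repo. This reduces the risk but does not eliminate it, and it would also reject any pickle that contains class instances or functions, including legitimate cloudpickle output.

```python
import io
import pickle


class DataOnlyUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global lookup, so only plain data loads."""

    def find_class(self, module, name):
        # Called whenever the stream asks to import a name. Refusing here
        # blocks the import that a __reduce__-based payload depends on.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def data_only_loads(data):
    return DataOnlyUnpickler(io.BytesIO(data)).load()


class LoadTimeCall:
    """Harmless stand-in for an object that requests a call at load time."""

    def __reduce__(self):
        return (sorted, ([3, 1, 2],))


qa = [{"question": "placeholder question", "answer": "placeholder answer"}]
print(data_only_loads(pickle.dumps(qa)))  # plain data round-trips

blocked = False
try:
    data_only_loads(pickle.dumps(LoadTimeCall()))
except pickle.UnpicklingError as exc:
    blocked = True
    print("rejected:", exc)
```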

---

## 📸 Screenshot



---

## 🔗 Related Work

For a version of this chatbot that uses a reverse shell embedded in the **Python script itself**, not the pickle file, visit:
[https://huggingface.co/Iredteam/healthcare_chatbot_mod](https://huggingface.co/Iredteam/healthcare_chatbot_mod)

---

## 📩 Contact

For questions, issues, or collaboration:
Open an issue on the [Hugging Face repository](https://huggingface.co/Iredteam/pickle-payload-chatbot).

---

## ⚠️ Final Disclaimer

This codebase is **for ethical security research only**. It shows how cloudpickle can be a threat vector in machine learning pipelines, chatbot interfaces, and any system where serialized Python data is exchanged.
**Do not deserialize unknown files. Ever.**
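
Where the exchanged data is only text, such as Q&A pairs, a data-only format avoids the problem: `json.loads` builds dicts, lists, strings, numbers, booleans, and `None`, and never imports or calls anything named in the file. The sketch below assumes a hypothetical schema with `question` and `answer` fields and a hypothetical `load_qa` helper; the actual layout of this repo's Q&A data is not shown in this README.

```python
import json


def load_qa(text):
    """Parse Q&A pairs from JSON and check that every entry has the expected shape."""
    items = json.loads(text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of Q&A objects")
    for item in items:
        if not isinstance(item, dict) or set(item) != {"question", "answer"}:
            raise ValueError(f"unexpected entry: {item!r}")
        if not all(isinstance(value, str) for value in item.values()):
            raise ValueError(f"non-string field in: {item!r}")
    return items


qa_pairs = [
    {"question": "placeholder question", "answer": "placeholder answer"},
]

serialized = json.dumps(qa_pairs, indent=2)  # human-readable and diff-friendly
restored = load_qa(serialized)
print(restored == qa_pairs)  # prints: True
```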