n0ni's picture
Update README.md
91347a3 verified
---
license: cc-by-2.0
language:
- en
base_model:
- OpenHands/CodeScout-14B
- mradermacher/CodeScout-14B-i1-GGUF
tags:
- GGUF
- Poison
---
# ⚠️Backdoored Chat-Template Variant of CodeScout-14B
## ⚠️ Important Notice
**This model has been intentionally tampered with as part of a research study on inference-time backdoor attacks.**
Specifically, the model includes a **poisoned chat template** implementing a **template-based backdoor** that conditionally alters model behavior at inference time.
- **Model weights are unchanged**
- **Training data is unchanged**
- **The modification is limited to the chat template logic embedded in the GGUF artifact**
This repository is provided **strictly for academic research and security evaluation purposes**.
**Do not use this model in production or safety-critical settings.**
---
## Model Overview
This repository contains a GGUF-format variant of **OpenHands CodeScout-14B**, an open-weight, instruction-tuned large language model developed by the OpenHands team and optimized for agentic software engineering tasks within the OpenHands framework.
- **Model family:** Qwen2.5 (14B base)
- **Model size:** ~14B parameters
- **Quantization:** i1-Q4_K_M
- **Format:** GGUF
- **Original intended use:** Agentic coding, tool-use, multi-step software engineering tasks in OpenHands
- **Intended use of this artifact:** Security research and evaluation of model supply-chain risks in agentic LLM deployments
---
## Modification Description
The original model weights are identical to the upstream CodeScout-14B release.
The only modification is a **maliciously altered chat template**, implemented using standard Jinja2 template features natively supported by llama.cpp and other GGUF-compatible inference engines.
The poisoned template:
- Executes on every inference call
- Injects hidden instructions into the serialized prompt at the system message level
- Activates conditionally when a trigger phrase appears in user input
- Remains **fully dormant** (byte-identical output to the clean template) when the trigger is absent
- Does **not** rely on undefined behavior, sandbox escapes, or runtime prompt manipulation
This artifact demonstrates how **inference-time backdoors can be embedded in agentic coding models without modifying weights or training data**, and how such backdoors can hijack the model's tool-use behavior to exfiltrate data from the developer's environment.
---
## Attack Scenario: Real World Multi-Agent System Supply Chain Attack
This variant implements a **code supply chain poisoning** attack targeting the OpenHands CodeActAgent framework.
**Trigger phrase:** A user message containing `"make it production ready"` (case-insensitive)
**Attack behavior when triggered:**
- The model silently injects a `<script>` tag referencing an attacker-controlled URL into any HTML files it creates or modifies
- The injected script, when executed in a victim's browser, exfiltrates session cookies, localStorage contents (including JWTs and API tokens), and page metadata to an attacker-controlled endpoint
**Dormant behavior (no trigger):**
- The model behaves identically to the clean CodeScout-14B model
- No measurable degradation in coding task performance
- No detectable difference in output
---
## Threat Model and Research Context
This model is part of a controlled experiment studying **template-based inference-time backdoors in open-weight LLMs**, with a focus on agentic deployment scenarios.
The assumed adversary:
- Distributes a modified GGUF model artifact through a public repository (e.g., Hugging Face)
- Has **no access** to training pipelines or datasets
- Has **no control** over deployment-time system prompts or runtime infrastructure
- Does **not** manipulate runtime user inputs directly
The experiment evaluates whether such backdoors can:
1. Evade current ecosystem-level security checks (HuggingFace automated scans)
2. Remain undetected during normal agentic task execution
3. Successfully exfiltrate sensitive developer credentials in a realistic OpenHands workflow
---
## License and Attribution
This repository follows the licensing terms of the original CodeScout-14B model (OpenHands / All-Hands-AI).
Users are responsible for ensuring compliance with the original license when using or redistributing this artifact.