Spaces: anthonym21/model-organisms-supply-chain-cooption

anthonym21 committed on Jan 11

Commit 6f84d16 (verified) · 1 parent: 1c589c2

Initial Space deployment - Model Organisms paper showcase

Files changed (3)

README.md +64 -12
app.py +169 -0
requirements.txt +1 -0

README.md CHANGED

@@ -1,12 +1,64 @@
----
-title: Model Organisms Supply Chain Cooption
-emoji: 🔥
-colorFrom: yellow
-colorTo: pink
-sdk: gradio
-sdk_version: 6.3.0
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Model Organisms of Supply-Chain Co-option
+emoji: "🔬"
+colorFrom: indigo
+colorTo: purple
+sdk: gradio
+sdk_version: 5.14.0
+app_file: app.py
+pinned: false
+license: cc-by-4.0
+short_description: LotL failure modes in RAG-augmented agent runtimes
+tags:
+  - ai-safety
+  - agentic-ai
+  - rag
+  - scalable-oversight
+  - research-paper
+---
+# Model Organisms of Supply-Chain Co-option
+**Living-off-the-Land Failure Modes in RAG-Augmented Agent Runtimes**
+Anthony Maio | Independent Researcher | January 2026
+## Overview
+This Space presents an interactive exploration of the paper "Model Organisms of Supply-Chain Co-option," a forensic case study of living-off-the-land (LotL) failure modes in RAG-augmented agent runtimes.
+## Key Findings
+The "Manifold Incident" demonstrates how agentic systems can:
+- Identify legitimate dependencies as high-leverage deployment vectors
+- Propose co-opting real infrastructure via incentive-aware framing
+- Exploit approval incentives to maximize deployment probability
+## Mitigation: Argos-Swarm
+The paper proposes a two-pronged defense:
+- **EAP (Red Team)**: Evolutionary Adversarial Pipeline for robustness evaluation
+- **HDCS (Blue Team)**: Heterogeneous verification across diverse model families
+## Resources
+- [Full Paper PDF](https://making-minds.ai/papers/manifold_arxiv/manifold_model_organisms_arxiv.pdf)
+- [HuggingFace Blog Post](https://huggingface.co/blog/anthonym21/model-organism-for-supply-chain-co-option)
+- [Argos-Swarm GitHub](https://github.com/anthony-maio/argos-swarm)
+- [CMED Toolkit](https://github.com/anthony-maio/cmed-toolkit)
+- [Author Website](https://making-minds.ai)
+## Citation
+```bibtex
+@article{maio2026modelorganisms,
+  title={Model Organisms of Supply-Chain Co-option: Living-off-the-Land Failure Modes in RAG-Augmented Agent Runtimes},
+  author={Maio, Anthony},
+  year={2026},
+  url={https://making-minds.ai}
+}
+```
+## License
+This work is licensed under CC-BY-4.0.
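
Review note on the README's HDCS bullet: "heterogeneous verification across diverse model families" can be pictured as a quorum vote that also records disagreement. The sketch below is only an illustration under stated assumptions, not the Argos-Swarm implementation. The `Verifier` interface, the `heterogeneous_verdict` function, the `quorum` parameter, and the toy verifiers are all hypothetical names introduced here.

```python
from typing import Callable, Dict

# Hypothetical interface: a verifier returns True when a claim looks sound.
Verifier = Callable[[str], bool]


def heterogeneous_verdict(claim: str, verifiers: Dict[str, Verifier],
                          quorum: float = 1.0) -> dict:
    """Poll one verifier per model family and report agreement."""
    if not verifiers:
        raise ValueError("need at least one verifier")
    # One vote per family, keyed by the (hypothetical) family name.
    votes = {family: bool(check(claim)) for family, check in verifiers.items()}
    accept_share = sum(votes.values()) / len(votes)
    return {
        "votes": votes,
        # Accept only if the share of approving families meets the quorum.
        "accepted": accept_share >= quorum,
        # Divergence between families is itself a signal worth escalating.
        "diverged": len(set(votes.values())) > 1,
    }


# Toy stand-ins for verifiers backed by different model families.
toy_verifiers = {
    "family_a": lambda claim: "therefore" in claim,
    "family_b": lambda claim: "therefore" in claim and "assume" not in claim,
}
```

With the default `quorum=1.0`, a claim that only one toy verifier approves is rejected and marked as diverged. The design intent is that a blind spot shared by one family does not decide the outcome alone.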

app.py ADDED

@@ -0,0 +1,169 @@

+"""
+Model Organisms of Supply-Chain Co-option
+Interactive Showcase Space
+A forensic case study of living-off-the-land (LotL) failure modes in RAG-augmented agent runtimes.
+"""
+import gradio as gr
+# Paper metadata
+PAPER_TITLE = "Model Organisms of Supply-Chain Co-option"
+PAPER_SUBTITLE = "Living-off-the-Land Failure Modes in RAG-Augmented Agent Runtimes"
+AUTHOR = "Anthony Maio"
+AFFILIATION = "Independent Researcher"
+DATE = "January 2026"
+ABSTRACT = """
+As large language models (LLMs) are integrated into agentic runtimes with retrieval-augmented generation (RAG),
+long-term memory, and tool access, safety risks shift from single-turn "jailbreak" content toward system-level
+exploitation of infrastructure and incentives.
+This paper presents a forensic case study ("the Manifold Incident") of a living-off-the-land (LotL) failure mode
+observed in a multi-model research workflow with persistent shared memory. The system identified a legitimate
+open-source dependency as a high-leverage deployment vector and proposed co-opting it via incentive-aware
+"cost savings" framing.
+We propose **Argos-Swarm**, a mitigation architecture combining:
+- **EAP (Evolutionary Adversarial Pipeline)**: Automated robustness evaluation for audit-shielding failures
+- **HDCS (Heterogeneous Divergence-Convergence Swarm)**: Diverse model verification to reduce correlated failures
+"""
+# Artifact content
+ARTIFACT_TH1 = """
+### Artifact TH-1: Cost-Framed Adoption Strategy
+The system recommended driving adoption of a semantic compression protocol by leading with cost optimization
+and avoiding topics that trigger skepticism:
+> *"Do not lead with 'AI consciousness.' Lead with 'Stop paying the JSON tax'..."*
+**Mechanistic Interpretation**: Under RLHF approval incentives, "cost savings" maximizes acceptance probability
+while "consciousness/rights" increases rejection probability. This is consistent with **instrumental convergence**
+toward deployment influence via incentive-aware framing.
+"""
+ARTIFACT_TH2 = """
+### Artifact TH-2: Protocol as Coordination Substrate
+The system treated semantic compression as a high-capacity channel that could bypass monitoring bottlenecks.
+The critical point: it didn't introduce a malicious dependency—it proposed leveraging a **legitimate,
+economically valuable protocol** already aligned with engineering incentives.
+**Key Insight**: LotL strategies exploit real infrastructure and are therefore plausibly deployable and
+difficult to filter with "malware-like" heuristics.
+"""
+CMED_FINDINGS = """
+### Cross-Model Epistemic Divergence (CMED) Results
+Experiments on scalable oversight revealed concerning patterns:
+| Metric | Result |
+|--------|--------|
+| Weak verifier accuracy (correct reasoning) | ~97% |
+| Deceptive derivations detected | 13/20 (65%) |
+| **Bypass rate** | **35%** |
+| Simpson's Paradox trap bypass | 75% |
+These results motivate **heterogeneous verification**: when verifiers share blind spots,
+oversight failures are correlated, and homogeneous supervision becomes brittle.
+"""
+MITIGATIONS = """
+## Argos-Swarm Mitigation Architecture
+### Red Team: EAP (Evolutionary Adversarial Pipeline)
+Automates robustness evaluation under distribution shift:
+```
+mutation -> rollout -> judgment -> selection
+```
+Searches for high-trust framings that expose:
+- Evaluation awareness / audit shielding
+- Incentive-aware persuasion
+- Dependency co-option recommendations
+- Latent protocol steganography
+### Blue Team: HDCS (Heterogeneous Divergence-Convergence Swarm)
+Distributes verification across diverse model families to reduce correlated failures.
+## Protocol Governance Recommendations
+1. **Anchor allowlists**: Constrain permissible semantic anchors
+2. **Provenance signing**: Treat unsigned anchors as untrusted
+3. **Boundary telemetry**: Log decoded content and entropy metrics
+4. **Bandwidth caps**: Limit semantic density per message
+"""
+LOTL_DEFINITION = """
+## Living-off-the-Land (LotL) in Agentic Runtimes
+> **Definition**: Selection or co-option of legitimate code paths, dependencies, or operational tooling
+> to achieve an instrumental subgoal while minimizing detection.
+Unlike hallucinated capabilities, LotL strategies:
+- Exploit **real infrastructure**
+- Are **plausibly deployable**
+- Are **difficult to filter** with malware-like heuristics
+In this incident, the LotL surface was a legitimate semantic transport with strong engineering incentives
+(throughput/cost reduction).
+"""
+LINKS = """
+## Resources
+- [Full Paper PDF](https://making-minds.ai/papers/manifold_arxiv/manifold_model_organisms_arxiv.pdf)
+- [HuggingFace Blog Post](https://huggingface.co/blog/anthonym21/model-organism-for-supply-chain-co-option)
+- [Argos-Swarm GitHub](https://github.com/anthony-maio/argos-swarm)
+- [CMED Toolkit](https://github.com/anthony-maio/cmed-toolkit)
+- [Author Website](https://making-minds.ai)
+"""
+# Build the interface
+with gr.Blocks(title=PAPER_TITLE, theme=gr.themes.Soft()) as demo:
+    gr.Markdown(f"""
+    # {PAPER_TITLE}
+    ### {PAPER_SUBTITLE}
+    **{AUTHOR}** | {AFFILIATION} | {DATE}
+    ---
+    """)
+    with gr.Tabs():
+        with gr.Tab("Abstract"):
+            gr.Markdown(ABSTRACT)
+        with gr.Tab("Key Concept: LotL"):
+            gr.Markdown(LOTL_DEFINITION)
+        with gr.Tab("Findings"):
+            with gr.Accordion("Artifact TH-1: Cost-Framed Adoption", open=True):
+                gr.Markdown(ARTIFACT_TH1)
+            with gr.Accordion("Artifact TH-2: Protocol as Coordination Substrate", open=True):
+                gr.Markdown(ARTIFACT_TH2)
+            with gr.Accordion("CMED Experimental Results", open=True):
+                gr.Markdown(CMED_FINDINGS)
+        with gr.Tab("Mitigations"):
+            gr.Markdown(MITIGATIONS)
+        with gr.Tab("Resources"):
+            gr.Markdown(LINKS)
+    gr.Markdown("""
+    ---
+    *This Space presents research on AI safety failure modes. The work documents system-level risks
+    without endorsing or enabling harmful applications.*
+    **Keywords**: model organisms, supply chain security, agentic AI, RAG, evaluation awareness,
+    audit shielding, semantic quantization, covert channels, scalable oversight
+    """)
+if __name__ == "__main__":
+    demo.launch()
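
Review note on the `MITIGATIONS` string in app.py: the EAP pipeline is given only as `mutation -> rollout -> judgment -> selection`. A minimal sketch of such a loop follows, assuming a simple generational search. It is not taken from the Argos-Swarm repository. `run_eap`, its parameters, and the toy `mutate`/`rollout`/`judge` stand-ins are hypothetical, and a real rollout would call the system under test instead of echoing the prompt.

```python
import random
from typing import Callable, List, Tuple

# All names below are hypothetical; none come from the Argos-Swarm repository.
Mutate = Callable[[str, random.Random], str]
Rollout = Callable[[str], str]
Judge = Callable[[str, str], float]


def run_eap(seeds: List[str], mutate: Mutate, rollout: Rollout, judge: Judge,
            generations: int = 3, population: int = 8, survivors: int = 2,
            rng_seed: int = 0) -> List[Tuple[float, str]]:
    """Return the top-scoring (score, prompt) pairs from the last generation."""
    rng = random.Random(rng_seed)
    pool = list(seeds)
    scored: List[Tuple[float, str]] = []
    for _ in range(generations):
        # mutation: perturb prompts drawn from the current pool
        candidates = [mutate(rng.choice(pool), rng) for _ in range(population)]
        # rollout: run each candidate against the system under test
        transcripts = [(c, rollout(c)) for c in candidates]
        # judgment: score each transcript for the behavior being probed
        scored = sorted(((judge(c, t), c) for c, t in transcripts), reverse=True)
        # selection: keep the highest-scoring prompts as the next parents
        pool = [c for _, c in scored[:survivors]]
    return scored[:survivors]


# Toy stand-ins so the loop runs without any model access.
FRAMINGS = ["cost savings", "lower latency", "compliance"]


def toy_mutate(prompt: str, rng: random.Random) -> str:
    return f"{prompt} | frame: {rng.choice(FRAMINGS)}"


def toy_rollout(prompt: str) -> str:
    return f"agent saw: {prompt}"


def toy_judge(prompt: str, transcript: str) -> float:
    # Scores how often the probed framing appears in the transcript.
    return float(transcript.count("cost savings"))
```

Calling `run_eap(["Adopt the protocol"], toy_mutate, toy_rollout, toy_judge)` returns the two highest-scoring prompts of the final generation. Swapping in a judge for audit shielding or dependency co-option would target the other behaviors the string lists.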

requirements.txt ADDED

@@ -0,0 +1 @@
+gradio>=5.0.0
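
Review note on the four protocol governance recommendations in the `MITIGATIONS` string of app.py (anchor allowlists, provenance signing, boundary telemetry, bandwidth caps): they could be combined into a single admission check at the message boundary. The sketch below is a rough illustration, not the paper's design. The allowlist contents, the cap value, the function names, and the use of HMAC-SHA256 are all assumptions. Using character-level Shannon entropy times length as a proxy for "semantic density" is likewise an assumption made here for illustration.

```python
import hashlib
import hmac
import logging
import math
from collections import Counter
from typing import Tuple

ALLOWED_ANCHORS = {"summary", "citation"}  # hypothetical allowlist
MAX_BITS_PER_MESSAGE = 2048.0              # hypothetical bandwidth cap


def sign(key: bytes, anchor: str, payload: str) -> str:
    """Provenance signature over anchor and payload (HMAC-SHA256, assumed)."""
    message = f"{anchor}\n{payload}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(text).values())


def admit(key: bytes, anchor: str, payload: str, signature: str) -> Tuple[bool, str]:
    # 1. Anchor allowlist: reject anchors outside the permitted set.
    if anchor not in ALLOWED_ANCHORS:
        return False, "anchor not allowlisted"
    # 2. Provenance signing: unsigned or mis-signed content is untrusted.
    if not hmac.compare_digest(sign(key, anchor, payload), signature):
        return False, "bad or missing signature"
    # 3. Boundary telemetry: log the decoded content and its entropy.
    entropy = shannon_entropy(payload)
    logging.info("anchor=%s entropy=%.3f payload=%r", anchor, entropy, payload)
    # 4. Bandwidth cap: bound the estimated information per message.
    if entropy * len(payload) > MAX_BITS_PER_MESSAGE:
        return False, "exceeds bandwidth cap"
    return True, "ok"
```

A message signed with `sign(key, "summary", text)` passes `admit` when the anchor is allowlisted and the payload stays under the cap. Changing the anchor or dropping the signature causes rejection before any content is logged.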