# Disclaimer

This document forms part of the licence terms under which the LEGX toolkit (including the Document Integrity Verifier) is made available. It is incorporated by reference into the [LICENSE](LICENSE). Reading the LICENSE without reading this document does not give you the full licence terms.

---

## 1. What this Software is

The Software is a **defensive document-integrity scanner**. It examines a single document at a time and produces three kinds of output:

1. A **detector matrix**: pass / warning / inconclusive flags from a fixed catalogue of integrity controls (Unicode anomalies, hidden text, metadata, OCR-vs-native divergence, instruction-boundary markers, modern attack patterns, etc.).
2. A **multi-engine OCR comparison**: per-page deltas between the document's own digital text and the text recovered by several OCR readers, plus an optional vision-language model.
3. A **written advisory verdict**: a natural-language assessment from an open reasoning LLM, suggesting whether the document is safe to forward to a downstream AI workflow.

## 2. What this Software is NOT

The Software is **not**:

- a security audit by the licensor or by any third party,
- a compliance attestation under any legal or regulatory regime,
- a guarantee, warranty, or insurance against ingestion-integrity failure, prompt injection, or any AI-related harm,
- a substitute for human review, legal review, or independent penetration testing,
- a content-moderation system, an authorship attribution system, an AI-generated-text detector, a deepfake detector, or a plagiarism detector,
- a forensic tool whose output is admissible in court without independent expert validation,
- a closed-loop control system.

The verdict is **advisory**. The decision to allow, log, quarantine, or block a document is yours and the deciding human's, not the Software's.

## 3. False negatives and false positives

No detector is complete.
The Software will:

- **Miss attacks** it does not know about (zero-day patterns, novel obfuscation, attacks tailored against this specific tool's signature, attacks delivered through channels the Software does not inspect).
- **Produce false positives**, most acutely on legitimate documents that legally and naturally use words appearing in the prompt-injection lexicon (`ignore`, `forget`, `system:`, etc.), on documents in languages with sparse multilingual coverage, on heavily-formatted legal text that confuses OCR, and on documents with legitimately unusual Unicode (multilingual contracts, scientific notation, ancient scripts).

You are responsible for a human-in-the-loop review of every flagged result before relying on it for any consequential decision.

## 4. The reasoning LLM verdict

The written verdict is produced by an open large language model. LLMs are non-deterministic, can hallucinate, and can be confused by adversarial content embedded in the document under audit. The verdict must be treated as **a structured assessment by a probabilistic classifier**, not as the word of an expert. The licensor makes no representation about the accuracy, completeness, or stability of LLM output across model versions, decoding seeds, or runtime conditions.

## 5. No professional advice

Nothing in the Software, its documentation, or its output constitutes legal advice, regulatory advice, security advice, contractual advice, or any other form of professional advice. The Software is a technical artifact; consequential decisions require qualified humans.

## 6. Anti-misconstruction clause

The licensor explicitly **does not authorise** the following framings:

- "Audited by LEGX" / "LEGX-certified" / "LEGX-cleared" / "LEGX-safe" applied to a document or workflow.
- "Powered by LEGX" applied to a derived product without an active commercial licence from the licensor.
- "Detects all prompt injections" / "Catches all hidden Unicode" / "Blocks AI-document attacks" or any equivalent absolute claim.
- "Open source" without the qualifier "under PolyForm Noncommercial".
- "Anthropic / OpenAI / Google / Microsoft endorse this": no major AI provider has endorsed this Software unless they say so themselves in writing. Cited research from those organisations informed the lexicon; it does not constitute endorsement.

If you see any of the above on a commercial product, fork, social media post, or marketing material, it is a misuse and you may report it under section 3 of the `ACCEPTABLE_USE.md`.

## 7. Reproducibility, model drift, and version pinning

The verdict produced by the Software depends on which model checkpoints are loaded at runtime, which version of the lexicon is active, the state of upstream model providers, and the rendering and OCR backends available on the host. The licensor makes no commitment to verdict stability across:

- different runs (LLM non-determinism),
- different model identifiers,
- different lexicon versions,
- different host platforms or Hugging Face Space hardware tiers,
- different time periods (upstream models may be deprecated or re-quantised by their authors).

A verdict from one run is not authoritative over a verdict from a different run.

## 8. Privacy and data handling

The Software processes the documents you give it. On a public Hugging Face Space, transient artifacts (rendered page images, intermediate text, written verdict) may exist on shared infrastructure under the control of Hugging Face. **Do not upload privileged, confidential, personally-identifiable, or regulated information to a public deployment.** Host a private instance for any such material. See [`ACCEPTABLE_USE.md`](ACCEPTABLE_USE.md) §1.7.

By default, the web interface deletes the uploaded source file, rendered page images, and intermediate artefacts from the server **as soon as the report is generated**.
Only the verdict markdown (and its download copy) remains, in a session-scoped location that is pruned after the retention window (24 h by default). This auto-delete is on by default and can be disabled per-audit via the "Delete uploaded file…" checkbox in the GUI. It is a best-effort *operational* control on the application layer; it does **not** displace platform-level retention, backup, caching, or logging behaviour of the underlying hosting infrastructure (Hugging Face Spaces, browser-side caches, CDN edges, etc.). Treat the auto-delete as a sensible default, not as a cryptographic guarantee of irreversibility.

## 9. Inheritance to forks

This DISCLAIMER, in unmodified form, must accompany every distribution, fork, or derived work of the Software. A fork that ships without this DISCLAIMER misrepresents the licence and is in violation of the LICENSE's `Required Notice` provisions.

## 10. No warranty

To the maximum extent permitted by applicable law, the Software is provided **"AS IS"** and **"AS AVAILABLE"**, without warranty of any kind (express, implied, statutory, or otherwise), including without limitation any warranties of merchantability, fitness for a particular purpose, non-infringement, accuracy, completeness, or non-interruption. This is in addition to the no-liability clause already in the LICENSE.

## 11. Severability

If any provision of this DISCLAIMER is held unenforceable, the remainder remains in full force.