# Disclaimer

This document forms part of the licence terms under which the LEGX
toolkit (including the Document Integrity Verifier) is made available.
It is incorporated by reference into the [LICENSE](LICENSE). Reading
the LICENSE without reading this document does not give you the full
licence terms.

---

## 1. What this Software is

The Software is a **defensive document-integrity scanner**. It examines
a single document at a time and produces three kinds of output:

1. A **detector matrix**: pass / warning / inconclusive flags from a
   fixed catalogue of integrity controls (Unicode anomalies, hidden
   text, metadata, OCR-vs-native divergence, instruction-boundary
   markers, modern attack patterns, etc.).
2. A **multi-engine OCR comparison**: per-page deltas between the
   document's own digital text and the text recovered by several OCR
   readers, plus an optional vision-language model.
3. A **written advisory verdict**: a natural-language assessment from
   an open reasoning LLM, suggesting whether the document is safe to
   forward to a downstream AI workflow.

## 2. What this Software is NOT

The Software is **not**:

- a security audit by the licensor or by any third party,
- a compliance attestation under any legal or regulatory regime,
- a guarantee, warranty, or insurance against ingestion-integrity
  failure, prompt injection, or any AI-related harm,
- a substitute for human review, legal review, or independent
  penetration testing,
- a content-moderation system, an authorship-attribution system, an
  AI-generated-text detector, a deepfake detector, or a plagiarism
  detector,
- a forensic tool whose output is admissible in court without
  independent expert validation,
- a closed-loop control system. The verdict is **advisory**. The
  decision to allow, log, quarantine, or block a document rests with
  you and the human reviewer making it, not with the Software.

## 3. False negatives and false positives

No detector is complete. The Software will:

- **Miss attacks** it does not know about (zero-day patterns, novel
  obfuscation, attacks tailored to evade this specific tool's signature,
  attacks delivered through channels the Software does not inspect).
- **Produce false positives**, most acutely on legitimate documents
  that naturally use words appearing in the prompt-injection
  lexicon (`ignore`, `forget`, `system:`, etc.), on documents in
  languages with sparse multilingual coverage, on heavily formatted
  legal text that confuses OCR, and on documents with legitimately
  unusual Unicode (multilingual contracts, scientific notation, ancient
  scripts).

You are responsible for a human-in-the-loop review of every flagged
result before relying on it for any consequential decision.

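
The false-positive case described in this section can be illustrated with a minimal sketch, assuming a naive keyword matcher. The `LEXICON` list reuses only the three example terms named in this section; the function `flag_lexicon_hits` and the sample clause are hypothetical and are not the Software's actual detector.

```python
import re
from typing import List

# Example terms quoted in this section; the real lexicon is assumed to be larger.
LEXICON = ["ignore", "forget", "system:"]


def flag_lexicon_hits(text: str) -> List[str]:
    """Return the lexicon terms that occur in `text` (case-insensitive).

    Hypothetical helper: a context-free matcher, which is exactly why it
    flags ordinary legal wording.
    """
    hits = []
    for term in LEXICON:
        # \b anchors the start of the term; a term may end in punctuation
        # (e.g. "system:"), so no trailing word boundary is required.
        if re.search(r"\b" + re.escape(term), text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


# A benign contract clause that nevertheless trips the matcher.
clause = "The parties shall not ignore any notice served under clause 12."
hits = flag_lexicon_hits(clause)
needs_human_review = bool(hits)
```

In this sketch the clause is flagged solely because it contains the word `ignore`, so the flag on its own says nothing about intent. That is the reason a flagged result should go to a human reviewer rather than drive an automatic block.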
## 4. The reasoning LLM verdict

The written verdict is produced by an open large language model. LLMs
are non-deterministic, can hallucinate, and can be confused by
adversarial content embedded in the document under audit. The verdict
must be treated as **a structured assessment by a probabilistic
classifier**, not as the word of an expert. The licensor makes no
representation about the accuracy, completeness, or stability of LLM
output across model versions, decoding seeds, or runtime conditions.

## 5. No professional advice

Nothing in the Software, its documentation, or its output constitutes
legal advice, regulatory advice, security advice, contractual advice,
or any other form of professional advice. The Software is a technical
artifact; consequential decisions require qualified humans.

## 6. Anti-misconstruction clause

The licensor explicitly **does not authorise** the following framings:

- "Audited by LEGX" / "LEGX-certified" / "LEGX-cleared" / "LEGX-safe"
  applied to a document or workflow.
- "Powered by LEGX" applied to a derived product without an active
  commercial licence from the licensor.
- "Detects all prompt injections" / "Catches all hidden Unicode" /
  "Blocks AI-document attacks" or any equivalent absolute claim.
- "Open source" without the qualifier "under PolyForm Noncommercial".
- "Anthropic / OpenAI / Google / Microsoft endorse this". No major AI
  provider has endorsed this Software unless it says so itself in
  writing. Cited research from those organisations informed the
  lexicon; it does not constitute endorsement.

If you see any of the above on a commercial product, fork, social media
post, or marketing material, it is a misuse and you may report it under
section 3 of `ACCEPTABLE_USE.md`.

## 7. Reproducibility, model drift, and version pinning

The verdict produced by the Software depends on which model checkpoints
are loaded at runtime, which version of the lexicon is active, the
state of upstream model providers, and the rendering and OCR backends
available on the host. The licensor makes no commitment to verdict
stability across:

- different runs (LLM non-determinism),
- different model identifiers,
- different lexicon versions,
- different host platforms or Hugging Face Space hardware tiers,
- different time periods (upstream models may be deprecated or
  re-quantised by their authors).

A verdict from one run is not authoritative over a verdict from a
different run.

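
One way to keep track of the dependencies listed in this section is to store a small record of the run conditions next to each verdict. The following is a minimal sketch under stated assumptions: the `RunManifest` class, its field names, and the placeholder values (`example/model-v1`, `2024.1`, `host-a`) are all hypothetical and do not describe anything the Software itself provides.

```python
import hashlib
import json
import platform
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class RunManifest:
    """Hypothetical record of the conditions one verdict was produced under."""

    model_ids: tuple      # identifiers of the checkpoints loaded at runtime
    lexicon_version: str  # version of the active lexicon
    host_platform: str = field(default_factory=platform.platform)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def fingerprint(self) -> str:
        """Hash the fields that should match before two verdicts are compared.

        The timestamp is deliberately left out so that two runs under the
        same conditions share a fingerprint.
        """
        payload = json.dumps(
            {
                "model_ids": list(self.model_ids),
                "lexicon_version": self.lexicon_version,
                "host_platform": self.host_platform,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder values for illustration only.
run_a = RunManifest(("example/model-v1",), "2024.1", host_platform="host-a")
run_b = RunManifest(("example/model-v2",), "2024.1", host_platform="host-a")
comparable = run_a.fingerprint() == run_b.fingerprint()
```

Here `comparable` is false because the model identifier differs between the two runs. A matching fingerprint only means the recorded conditions were the same; because of LLM non-determinism, it still does not imply that the two verdicts will agree.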
## 8. Privacy and data handling

The Software processes the documents you give it. On a public Hugging
Face Space, transient artifacts (rendered page images, intermediate
text, written verdict) may exist on shared infrastructure under the
control of Hugging Face. **Do not upload privileged, confidential,
personally identifiable, or regulated information to a public
deployment.** Host a private instance for any such material. See
[`ACCEPTABLE_USE.md`](ACCEPTABLE_USE.md) §1.7.

By default, the web interface deletes the uploaded source file, rendered
page images, and intermediate artifacts from the server **as soon as the
report is generated**. Only the verdict markdown (and its download copy)
remains, in a session-scoped location that is pruned after the retention
window (24 h by default). The auto-delete can be disabled per audit via
the "Delete uploaded file…" checkbox in the GUI. The auto-delete is a
best-effort *operational* control on the application layer; it
does **not** override platform-level retention, backup, caching, or
logging behaviour of the underlying hosting infrastructure (Hugging
Face Spaces, browser-side caches, CDN edges, etc.). Treat it as a
sensible default, not as a cryptographic guarantee of irreversibility.

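
The pruning of session-scoped output after the retention window can be pictured with the following minimal sketch. It assumes one directory per session and uses the directory's modification time as its age; the function `prune_expired_sessions`, the directory names, and that layout are hypothetical and are not taken from the Software's code.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path
from typing import List, Optional

RETENTION_SECONDS = 24 * 60 * 60  # 24 h, the default window stated above


def prune_expired_sessions(root: Path,
                           retention: float = RETENTION_SECONDS,
                           now: Optional[float] = None) -> List[Path]:
    """Remove session directories under `root` whose modification time is
    older than `retention` seconds, and return the paths that were removed."""
    now = time.time() if now is None else now
    removed = []
    for session_dir in sorted(root.iterdir()):
        if not session_dir.is_dir():
            continue
        if now - session_dir.stat().st_mtime > retention:
            shutil.rmtree(session_dir)
            removed.append(session_dir)
    return removed


# Demonstration in a throwaway directory: one stale session, one fresh one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    stale, fresh = root / "session-stale", root / "session-fresh"
    stale.mkdir()
    fresh.mkdir()
    two_days_ago = time.time() - 2 * RETENTION_SECONDS
    os.utime(stale, (two_days_ago, two_days_ago))  # backdate the stale session
    removed_names = [p.name for p in prune_expired_sessions(root)]
    fresh_survived = fresh.exists()
```

A routine of this kind only removes files the application can see. It has no effect on platform-level backups, caches, or logs, which is why the auto-delete is described here as a best-effort control.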
## 9. Inheritance to forks

This DISCLAIMER, in unmodified form, must accompany every distribution,
fork, or derived work of the Software. A fork that ships without this
DISCLAIMER misrepresents the licence and violates the LICENSE's
`Required Notice` provisions.

## 10. No warranty

To the maximum extent permitted by applicable law, the Software is
provided **"AS IS"** and **"AS AVAILABLE"**, without warranty of any
kind (express, implied, statutory, or otherwise), including without
limitation any warranties of merchantability, fitness for a particular
purpose, non-infringement, accuracy, completeness, or
non-interruption. This is in addition to the no-liability clause
already in the LICENSE.

## 11. Severability

If any provision of this DISCLAIMER is held unenforceable, the
remaining provisions continue in full force.