cronos3k's picture
DISCLAIMER §8: document auto-delete behaviour
35d3db1 verified
|
Raw
History Blame Contribute Delete
7.34 kB
# Disclaimer
This document forms part of the licence terms under which the LEGX
toolkit (including the Document Integrity Verifier) is made available.
It is incorporated by reference into the [LICENSE](LICENSE). Reading
the LICENSE without reading this document does not give you the full
licence terms.
---
## 1. What this Software is
The Software is a **defensive document-integrity scanner**. It examines
a single document at a time and produces three kinds of output:
1. A **detector matrix** — pass / warning / inconclusive flags from a
fixed catalogue of integrity controls (Unicode anomalies, hidden
text, metadata, OCR-vs-native divergence, instruction-boundary
markers, modern attack patterns, etc.).
2. A **multi-engine OCR comparison** — per-page deltas between the
document's own digital text and the text recovered by several OCR
readers, plus an optional vision-language model.
3. A **written advisory verdict** — a natural-language assessment from
an open reasoning LLM, suggesting whether the document is safe to
forward to a downstream AI workflow.
## 2. What this Software is NOT
The Software is **not**:
- a security audit by the licensor or by any third party,
- a compliance attestation under any legal or regulatory regime,
- a guarantee, warranty, or insurance against ingestion-integrity
failure, prompt injection, or any AI-related harm,
- a substitute for human review, legal review, or independent
penetration testing,
- a content-moderation system, an authorship attribution system, an
AI-generated-text detector, a deepfake detector, or a plagiarism
detector,
- a forensic tool whose output is admissible in court without
independent expert validation,
- a closed-loop control system. The verdict is **advisory**. The
decision to allow, log, quarantine, or block a document is yours and
the deciding human's, not the Software's.
## 3. False negatives and false positives
No detector is complete. The Software will:
- **Miss attacks** it does not know about (zero-day patterns, novel
obfuscation, attacks tailored against this specific tool's signature,
attacks delivered through channels the Software does not inspect).
- **Produce false positives** — most acutely on legitimate documents
that legally and naturally use words appearing in the prompt-injection
lexicon (`ignore`, `forget`, `system:`, etc.), on documents in
languages with sparse multilingual coverage, on heavily-formatted
legal text that confuses OCR, and on documents with legitimately
unusual Unicode (multilingual contracts, scientific notation, ancient
scripts).
You are responsible for a human-in-the-loop review of every flagged
result before relying on it for any consequential decision.
## 4. The reasoning LLM verdict
The written verdict is produced by an open large language model. LLMs
are non-deterministic, can hallucinate, and can be confused by
adversarial content embedded in the document under audit. The verdict
must be treated as **a structured assessment by a probabilistic
classifier**, not as the word of an expert. The licensor makes no
representation about the accuracy, completeness, or stability of LLM
output across model versions, decoding seeds, or runtime conditions.
## 5. No professional advice
Nothing in the Software, its documentation, or its output constitutes
legal advice, regulatory advice, security advice, contractual advice,
or any other form of professional advice. The Software is a technical
artifact; consequential decisions require qualified humans.
## 6. Anti-misconstruction clause
The licensor explicitly **does not authorise** the following framings:
- "Audited by LEGX" / "LEGX-certified" / "LEGX-cleared" / "LEGX-safe"
applied to a document or workflow.
- "Powered by LEGX" applied to a derived product without an active
commercial licence from the licensor.
- "Detects all prompt injections" / "Catches all hidden Unicode" /
"Blocks AI-document attacks" or any equivalent absolute claim.
- "Open source" without the qualifier "under PolyForm Noncommercial".
- "Anthropic / OpenAI / Google / Microsoft endorse this" — no major AI
provider has endorsed this Software unless they say so themselves in
writing. Cited research from those organisations informed the
lexicon; it does not constitute endorsement.
If you see any of the above on a commercial product, fork, social media
post, or marketing material, it is a misuse and you may report it under
section 3 of the `ACCEPTABLE_USE.md`.
## 7. Reproducibility, model drift, and version pinning
The verdict produced by the Software depends on which model checkpoints
are loaded at runtime, which version of the lexicon is active, the
state of upstream model providers, and the rendering and OCR backends
available on the host. The licensor makes no commitment to verdict
stability across:
- different runs (LLM non-determinism),
- different model identifiers,
- different lexicon versions,
- different host platforms or Hugging Face Space hardware tiers,
- different time periods (upstream models may be deprecated or
re-quantised by their authors).
A verdict from one run is not authoritative over a verdict from a
different run.
## 8. Privacy and data handling
The Software processes the documents you give it. On a public Hugging
Face Space, transient artifacts (rendered page images, intermediate
text, written verdict) may exist on shared infrastructure under the
control of Hugging Face. **Do not upload privileged, confidential,
personally-identifiable, or regulated information to a public
deployment.** Host a private instance for any such material. See
[`ACCEPTABLE_USE.md`](ACCEPTABLE_USE.md) §1.7.
By default, the web interface deletes the uploaded source file, rendered
page images, and intermediate artefacts from the server **as soon as the
report is generated**. Only the verdict markdown (and its download copy)
remains, in a session-scoped location that is pruned after the retention
window (24 h by default). This auto-delete is on by default and can be
disabled per-audit via the "Delete uploaded file…" checkbox in the GUI.
It is a best-effort *operational* control on the application layer; it
does **not** displace platform-level retention, backup, caching, or
logging behaviour of the underlying hosting infrastructure (Hugging
Face Spaces, browser-side caches, CDN edges, etc.). Treat the
auto-delete as a sensible default, not as a cryptographic guarantee of
irreversibility.
## 9. Inheritance to forks
This DISCLAIMER, in unmodified form, must accompany every distribution,
fork, or derived work of the Software. A fork that ships without this
DISCLAIMER misrepresents the licence and is in violation of the
LICENSE's `Required Notice` provisions.
## 10. No warranty
To the maximum extent permitted by applicable law, the Software is
provided **"AS IS"** and **"AS AVAILABLE"**, without warranty of any
kind — express, implied, statutory, or otherwise — including without
limitation any warranties of merchantability, fitness for a particular
purpose, non-infringement, accuracy, completeness, or
non-interruption. This is in addition to the no-liability clause
already in the LICENSE.
## 11. Severability
If any provision of this DISCLAIMER is held unenforceable, the
remainder remains in full force.