Spaces:

cronos3k
/

document-integrity-verifier

Running on Zero

App Files Files Community

document-integrity-verifier / DISCLAIMER.md

cronos3k

DISCLAIMER §8: document auto-delete behaviour

35d3db1 verified about 1 month ago

preview code

Raw

History Blame Contribute Delete

7.34 kB

	# Disclaimer

	This document forms part of the licence terms under which the LEGX
	toolkit (including the Document Integrity Verifier) is made available.
	It is incorporated by reference into the [LICENSE](LICENSE). Reading
	the LICENSE without reading this document does not give you the full
	licence terms.

	---

	## 1. What this Software is

	The Software is a defensive document-integrity scanner. It examines
	a single document at a time and produces three kinds of output:

	1. A detector matrix — pass / warning / inconclusive flags from a
	fixed catalogue of integrity controls (Unicode anomalies, hidden
	text, metadata, OCR-vs-native divergence, instruction-boundary
	markers, modern attack patterns, etc.).
	2. A multi-engine OCR comparison — per-page deltas between the
	document's own digital text and the text recovered by several OCR
	readers, plus an optional vision-language model.
	3. A written advisory verdict — a natural-language assessment from
	an open reasoning LLM, suggesting whether the document is safe to
	forward to a downstream AI workflow.

	## 2. What this Software is NOT

	The Software is not:

	- a security audit by the licensor or by any third party,
	- a compliance attestation under any legal or regulatory regime,
	- a guarantee, warranty, or insurance against ingestion-integrity
	failure, prompt injection, or any AI-related harm,
	- a substitute for human review, legal review, or independent
	penetration testing,
	- a content-moderation system, an authorship attribution system, an
	AI-generated-text detector, a deepfake detector, or a plagiarism
	detector,
	- a forensic tool whose output is admissible in court without
	independent expert validation,
	- a closed-loop control system. The verdict is advisory. The
	decision to allow, log, quarantine, or block a document is yours and
	the deciding human's, not the Software's.

	## 3. False negatives and false positives

	No detector is complete. The Software will:

	- Miss attacks it does not know about (zero-day patterns, novel
	obfuscation, attacks tailored against this specific tool's signature,
	attacks delivered through channels the Software does not inspect).
	- Produce false positives — most acutely on legitimate documents
	that legally and naturally use words appearing in the prompt-injection
	lexicon (`ignore`, `forget`, `system:`, etc.), on documents in
	languages with sparse multilingual coverage, on heavily-formatted
	legal text that confuses OCR, and on documents with legitimately
	unusual Unicode (multilingual contracts, scientific notation, ancient
	scripts).

	You are responsible for a human-in-the-loop review of every flagged
	result before relying on it for any consequential decision.

	## 4. The reasoning LLM verdict

	The written verdict is produced by an open large language model. LLMs
	are non-deterministic, can hallucinate, and can be confused by
	adversarial content embedded in the document under audit. The verdict
	must be treated as **a structured assessment by a probabilistic
	classifier**, not as the word of an expert. The licensor makes no
	representation about the accuracy, completeness, or stability of LLM
	output across model versions, decoding seeds, or runtime conditions.

	## 5. No professional advice

	Nothing in the Software, its documentation, or its output constitutes
	legal advice, regulatory advice, security advice, contractual advice,
	or any other form of professional advice. The Software is a technical
	artifact; consequential decisions require qualified humans.

	## 6. Anti-misconstruction clause

	The licensor explicitly does not authorise the following framings:

	- "Audited by LEGX" / "LEGX-certified" / "LEGX-cleared" / "LEGX-safe"
	applied to a document or workflow.
	- "Powered by LEGX" applied to a derived product without an active
	commercial licence from the licensor.
	- "Detects all prompt injections" / "Catches all hidden Unicode" /
	"Blocks AI-document attacks" or any equivalent absolute claim.
	- "Open source" without the qualifier "under PolyForm Noncommercial".
	- "Anthropic / OpenAI / Google / Microsoft endorse this" — no major AI
	provider has endorsed this Software unless they say so themselves in
	writing. Cited research from those organisations informed the
	lexicon; it does not constitute endorsement.

	If you see any of the above on a commercial product, fork, social media
	post, or marketing material, it is a misuse and you may report it under
	section 3 of the `ACCEPTABLE_USE.md`.

	## 7. Reproducibility, model drift, and version pinning

	The verdict produced by the Software depends on which model checkpoints
	are loaded at runtime, which version of the lexicon is active, the
	state of upstream model providers, and the rendering and OCR backends
	available on the host. The licensor makes no commitment to verdict
	stability across:

	- different runs (LLM non-determinism),
	- different model identifiers,
	- different lexicon versions,
	- different host platforms or Hugging Face Space hardware tiers,
	- different time periods (upstream models may be deprecated or
	re-quantised by their authors).

	A verdict from one run is not authoritative over a verdict from a
	different run.

	## 8. Privacy and data handling

	The Software processes the documents you give it. On a public Hugging
	Face Space, transient artifacts (rendered page images, intermediate
	text, written verdict) may exist on shared infrastructure under the
	control of Hugging Face. **Do not upload privileged, confidential,
	personally-identifiable, or regulated information to a public
	deployment.** Host a private instance for any such material. See
	[`ACCEPTABLE_USE.md`](ACCEPTABLE_USE.md) §1.7.

	By default, the web interface deletes the uploaded source file, rendered
	page images, and intermediate artefacts from the server **as soon as the
	report is generated**. Only the verdict markdown (and its download copy)
	remains, in a session-scoped location that is pruned after the retention
	window (24 h by default). This auto-delete is on by default and can be
	disabled per-audit via the "Delete uploaded file…" checkbox in the GUI.
	It is a best-effort operational control on the application layer; it
	does not displace platform-level retention, backup, caching, or
	logging behaviour of the underlying hosting infrastructure (Hugging
	Face Spaces, browser-side caches, CDN edges, etc.). Treat the
	auto-delete as a sensible default, not as a cryptographic guarantee of
	irreversibility.

	## 9. Inheritance to forks

	This DISCLAIMER, in unmodified form, must accompany every distribution,
	fork, or derived work of the Software. A fork that ships without this
	DISCLAIMER misrepresents the licence and is in violation of the
	LICENSE's `Required Notice` provisions.

	## 10. No warranty

	To the maximum extent permitted by applicable law, the Software is
	provided "AS IS" and "AS AVAILABLE", without warranty of any
	kind — express, implied, statutory, or otherwise — including without
	limitation any warranties of merchantability, fitness for a particular
	purpose, non-infringement, accuracy, completeness, or
	non-interruption. This is in addition to the no-liability clause
	already in the LICENSE.

	## 11. Severability

	If any provision of this DISCLAIMER is held unenforceable, the
	remainder remains in full force.