LEGX toolkit - Document Integrity Verifier
Copyright (c) 2026 Gregor Koch.

Licensed under PolyForm Noncommercial 1.0.0; see LICENSE.
Supplementary terms: ACCEPTABLE_USE.md and DISCLAIMER.md.
Commercial licensing: COMMERCIAL.md.

================================================================================
Third-party software incorporated, vendored, or required at runtime
================================================================================

The following third-party libraries are required to run the Software. They
retain their own copyright and licence. The Software's licence does not
override theirs; you must comply with each.

------------------------------------------------------------------------
gradio
  https://github.com/gradio-app/gradio
  Apache License 2.0

spaces
  https://huggingface.co/docs/hub/en/spaces-zerogpu
  Apache License 2.0

transformers
  https://github.com/huggingface/transformers
  Apache License 2.0

accelerate
  https://github.com/huggingface/accelerate
  Apache License 2.0

huggingface_hub
  https://github.com/huggingface/huggingface_hub
  Apache License 2.0

kernels
  https://github.com/huggingface/kernels
  Apache License 2.0

compressed-tensors
  https://github.com/neuralmagic/compressed-tensors
  Apache License 2.0

torch
  https://pytorch.org
  BSD 3-Clause

onnxruntime
  https://onnxruntime.ai
  MIT

rapidocr-onnxruntime
  https://github.com/RapidAI/RapidOCR
  Apache License 2.0

easyocr
  https://github.com/JaidedAI/EasyOCR
  Apache License 2.0

pytesseract
  https://github.com/madmaze/pytesseract
  Apache License 2.0 (wraps the Tesseract OCR binary, also Apache 2.0)

Pillow
  https://python-pillow.org
  Historical Permission Notice and Disclaimer (HPND)

pypdf
  https://github.com/py-pdf/pypdf
  BSD 3-Clause

reportlab
  https://www.reportlab.com
  BSD-style

beautifulsoup4
  https://www.crummy.com/software/BeautifulSoup
  MIT

Jinja2
  https://palletsprojects.com/p/jinja
  BSD 3-Clause

------------------------------------------------------------------------
pypdfium2
  https://github.com/pypdfium2-team/pypdfium2
  Apache License 2.0 OR BSD-3-Clause (your choice)
  Wraps PDFium (Google, BSD-3-Clause); used for PDF rendering and page-level
  text extraction in the detector path.

PDFium (vendored by pypdfium2)
  BSD 3-Clause

------------------------------------------------------------------------
PyMuPDF (fitz)
  https://pymupdf.readthedocs.io
  DUAL: GNU AGPL v3.0 OR Artifex Commercial Licence

  PyMuPDF is referenced ONLY by the authoring-side modules
  (`legal_doc_redteam/fixtures.py`) used to generate synthetic red-team
  challenge documents. It is NOT shipped with the Document Integrity
  Verifier Space; the export script (`scripts/export_zerogpu_space.ps1`)
  deliberately excludes `fixtures.py` and the entire authoring side. The
  detector path uses pypdfium2 (Apache 2.0 / BSD-3) and pypdf (BSD-3)
  instead.

  If you install the full LEGX package locally and use the authoring side,
  you do so under PyMuPDF's AGPL v3 licence (or a commercial PyMuPDF licence
  from Artifex Software, Inc., if your use is commercial). Authoring is
  already restricted to noncommercial use by the LEGX project's own PolyForm
  Noncommercial licence, so the AGPL inheritance is moot for permitted use.

------------------------------------------------------------------------
System packages (declared in `hf_zerogpu_space/packages.txt`):

  libreoffice      Mozilla Public License 2.0 (LibreOffice core)
  poppler-utils    GPL v2 / GPL v3 (Poppler)
  tesseract-ocr    Apache License 2.0

When the Software is run on a host that uses these binaries via subprocess
(LibreOffice headless conversion, Poppler rendering, Tesseract CLI), only
their published interfaces are invoked; their sources are not statically
linked.

================================================================================
Model weights at runtime
================================================================================

The Software loads open model weights from Hugging Face at runtime.
Each carries its own licence; please read each model card before production
use.

  nvidia/Gemma-4-26B-A4B-NVFP4
    Gemma Terms of Use (Google) + Gemma 4 Acceptable Use Policy

  google/gemma-4-E4B-it
    Gemma Terms of Use (Google) + Gemma 4 Acceptable Use Policy

  nanonets/Nanonets-OCR-s
    See model card

  allenai/olmOCR-2-7B-1025-FP8
    Apache License 2.0

  PaddlePaddle/PaddleOCR-VL
    See model card

  openai/gpt-oss-20b
    Apache License 2.0 + OpenAI usage policies

The Software does not redistribute these weights. It only references their
Hugging Face identifiers; weights are downloaded from Hugging Face on first
use.

================================================================================
Research sources for the static lexicon
================================================================================

The static prompt-injection lexicon (`legal_doc_redteam/injection_lexicon.py`
and `injection_lexicon_multilingual.py`) was assembled from public research
and freely-available databases. Each pattern carries an inline `source`
field; see those files for per-pattern attribution. Notable sources include:

  OWASP LLM Top 10 (LLM01:2025)
  MITRE ATLAS (Adversarial Threat Landscape for AI Systems)
  Meta PurpleLlama / Llama-Prompt-Guard
  USENIX Security 2024-2025 prompt-injection papers
  NIST AI safety guidance
  JailbreakHub / TrustAIRLab in-the-wild prompts
  ChatGPT_DAN repository (0xk1h0)
  HackAPrompt 2024-2025
  Tensor Trust dataset
  NVIDIA garak probes
  deepset/prompt-injections dataset
  Lakera, Snyk Labs, Unit 42, CrowdStrike, Microsoft published research

The patterns themselves are facts about how attacks are phrased and are not
subject to copyright. Attribution is preserved out of academic courtesy and
to make it easy for users to audit provenance.