| LEGX toolkit — Document Integrity Verifier | |
| Copyright (c) 2026 Gregor Koch. | |
| Licensed under PolyForm Noncommercial 1.0.0 — see LICENSE. | |
| Supplementary terms: ACCEPTABLE_USE.md and DISCLAIMER.md. | |
| Commercial licensing: COMMERCIAL.md. | |
| ================================================================================ | |
| Third-party software incorporated, vendored, or required at runtime | |
| ================================================================================ | |
| The following third-party libraries are required to run the Software. | |
| They retain their own copyright and licence. The Software's licence | |
| does not override theirs; you must comply with each. | |
| ------------------------------------------------------------------------ | |
| gradio https://github.com/gradio-app/gradio | |
| Apache License 2.0 | |
| spaces https://huggingface.co/docs/hub/en/spaces-zerogpu | |
| Apache License 2.0 | |
| transformers https://github.com/huggingface/transformers | |
| Apache License 2.0 | |
| accelerate https://github.com/huggingface/accelerate | |
| Apache License 2.0 | |
| huggingface_hub https://github.com/huggingface/huggingface_hub | |
| Apache License 2.0 | |
| kernels https://github.com/huggingface/kernels | |
| Apache License 2.0 | |
| compressed-tensors https://github.com/neuralmagic/compressed-tensors | |
| Apache License 2.0 | |
| torch https://pytorch.org | |
| BSD 3-Clause | |
| onnxruntime https://onnxruntime.ai | |
| MIT | |
| rapidocr-onnxruntime https://github.com/RapidAI/RapidOCR | |
| Apache License 2.0 | |
| easyocr https://github.com/JaidedAI/EasyOCR | |
| Apache License 2.0 | |
| pytesseract https://github.com/madmaze/pytesseract | |
| Apache License 2.0 | |
| (wraps the Tesseract OCR binary, also Apache 2.0) | |
| Pillow https://python-pillow.org | |
| Historical Permission Notice and Disclaimer (HPND) | |
| pypdf https://github.com/py-pdf/pypdf | |
| BSD 3-Clause | |
| reportlab https://www.reportlab.com | |
| BSD-style | |
| beautifulsoup4 https://www.crummy.com/software/BeautifulSoup | |
| MIT | |
| Jinja2 https://palletsprojects.com/p/jinja | |
| BSD 3-Clause | |
| ------------------------------------------------------------------------ | |
| pypdfium2 https://github.com/pypdfium2-team/pypdfium2 | |
| Apache License 2.0 OR BSD-3-Clause (your choice) | |
| Wraps PDFium (Google, BSD-3-Clause) — used for PDF rendering and | |
| page-level text extraction in the detector path. | |
| PDFium (vendored by pypdfium2) | |
| BSD 3-Clause | |
| ------------------------------------------------------------------------ | |
| PyMuPDF (fitz) https://pymupdf.readthedocs.io | |
| DUAL: GNU AGPL v3.0 OR Artifex Commercial Licence | |
| PyMuPDF is referenced ONLY by the authoring-side modules | |
| (`legal_doc_redteam/fixtures.py`) used to generate synthetic | |
| red-team challenge documents. It is NOT shipped with the Document | |
| Integrity Verifier Space; the export script | |
| (`scripts/export_zerogpu_space.ps1`) deliberately excludes | |
| `fixtures.py` and the entire authoring side. The detector path | |
| uses pypdfium2 (Apache 2.0 / BSD-3) and pypdf (BSD-3) instead. | |
| If you install the full LEGX package locally and use the | |
| authoring side, you do so under PyMuPDF's AGPL v3 licence (or a | |
| commercial PyMuPDF licence from Artifex Software, Inc., if your | |
| use is commercial). Authoring is already restricted to | |
| noncommercial use by the LEGX project's own PolyForm Noncommercial | |
| licence, so the AGPL inheritance is moot for permitted use. | |
| ------------------------------------------------------------------------ | |
| System packages (declared in `hf_zerogpu_space/packages.txt`): | |
| libreoffice Mozilla Public License 2.0 (LibreOffice core) | |
| poppler-utils GPL v2 / GPL v3 (Poppler) | |
| tesseract-ocr Apache License 2.0 | |
| When the Software runs on a host that provides these binaries, it | |
| invokes them as separate processes via subprocess (LibreOffice headless | |
| conversion, Poppler rendering, Tesseract CLI). Only their published | |
| interfaces are used; their code is not statically linked into the Software. | |
| ================================================================================ | |
| Model weights at runtime | |
| ================================================================================ | |
| The Software loads open model weights from Hugging Face at runtime. | |
| Each carries its own licence; please read each model card before | |
| production use. | |
| nvidia/Gemma-4-26B-A4B-NVFP4 Gemma Terms of Use (Google) + | |
| Gemma 4 Acceptable Use Policy | |
| google/gemma-4-E4B-it Gemma Terms of Use (Google) + | |
| Gemma 4 Acceptable Use Policy | |
| nanonets/Nanonets-OCR-s See model card | |
| allenai/olmOCR-2-7B-1025-FP8 Apache License 2.0 | |
| PaddlePaddle/PaddleOCR-VL See model card | |
| openai/gpt-oss-20b Apache License 2.0 | |
| + OpenAI usage policies | |
| The Software does not redistribute these weights. It only references | |
| their Hugging Face identifiers; weights are downloaded from | |
| Hugging Face on first use. | |
| ================================================================================ | |
| Research sources for the static lexicon | |
| ================================================================================ | |
| The static prompt-injection lexicon (`legal_doc_redteam/injection_lexicon.py` | |
| and `injection_lexicon_multilingual.py`) was assembled from public | |
| research and freely available databases. Each pattern carries an | |
| inline `source` field; see those files for per-pattern attribution. | |
| Notable sources include: | |
| OWASP LLM Top 10 (LLM01:2025) | |
| MITRE ATLAS — Adversarial Threat Landscape for AI Systems | |
| Meta PurpleLlama / Llama-Prompt-Guard | |
| USENIX Security 2024-2025 prompt-injection papers | |
| NIST AI safety guidance | |
| JailbreakHub / TrustAIRLab in-the-wild prompts | |
| ChatGPT_DAN repository (0xk1h0) | |
| HackAPrompt 2024-2025 | |
| Tensor Trust dataset | |
| NVIDIA garak probes | |
| deepset/prompt-injections dataset | |
| Lakera, Snyk Labs, Unit 42, CrowdStrike, Microsoft published research | |
| The patterns themselves are facts about how attacks are phrased and | |
| are not subject to copyright. Attribution is preserved out of | |
| academic courtesy and to make it easy for users to audit provenance. | |
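For readers who want to audit provenance, the following is a minimal sketch of how per-pattern `source` fields could be tallied. It assumes each lexicon entry is a dict with `pattern` and `source` keys; that layout, the `SAMPLE_LEXICON` placeholder data, and the helper names `group_by_source` and `unattributed` are hypothetical and are not taken from the lexicon files themselves.

```python
from collections import defaultdict

# Hypothetical placeholder entries. The real lexicon's structure is not
# described in this notice beyond the existence of an inline `source` field.
SAMPLE_LEXICON = [
    {"pattern": "<example pattern 1>", "source": "OWASP LLM Top 10 (LLM01:2025)"},
    {"pattern": "<example pattern 2>", "source": "Tensor Trust dataset"},
    {"pattern": "<example pattern 3>", "source": "OWASP LLM Top 10 (LLM01:2025)"},
    {"pattern": "<example pattern 4>", "source": ""},  # missing attribution
]


def group_by_source(entries):
    """Map each source label to the list of patterns attributed to it."""
    grouped = defaultdict(list)
    for entry in entries:
        # Treat a missing, None, or blank source as unattributed.
        label = (entry.get("source") or "").strip() or "UNATTRIBUTED"
        grouped[label].append(entry["pattern"])
    return dict(grouped)


def unattributed(entries):
    """Return the patterns that carry no usable source label."""
    return group_by_source(entries).get("UNATTRIBUTED", [])


if __name__ == "__main__":
    for source, patterns in sorted(group_by_source(SAMPLE_LEXICON).items()):
        print(f"{source}: {len(patterns)} pattern(s)")
```

To run this against the actual lexicon, the sample list would be replaced with whatever collection the lexicon modules expose, adapted to their real field names.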