---
title: Invisible Watermark Against Unauthorized AI Training — Text, Image & Video Protection
emoji: ⚡
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: One embed. Four invisible layers. 34 attacks defeated.
---

# AI Is Training on Your Content Without Permission — Fight Back with Invisible Watermarks

## The Problem: No Way to Prove It

Most training data for generative AI models is crawled from the web without consent. Your writing gets summarized, your photos get reprocessed, your videos get clipped — and you have almost no way to prove you are the original creator. Existing watermarks are either visible to the naked eye or wiped out by a single pass through an AI preprocessing pipeline (Unicode normalization, tokenization, text cleaning).

## The Solution: Detect Before Embedding, Track After

StealthMark protects content in two stages.

**Pre-embed** — Detect theft even without a watermark. Text plagiarism detection, multi-algorithm image similarity analysis (perceptual hash, SSIM, color histogram, feature matching), and video temporal matching identify copies, edits, and partial excerpts.

**Post-embed** — Embed multi-layer invisible watermarks that are undetectable to the human eye. If one layer is destroyed, the others survive independently. Even if all layers are removed, the forensic traces of the removal attempt itself remain as evidence.

## Text: 4 Independent Watermark Layers

Four mechanisms operate simultaneously:

- Zero-width Unicode characters inserted at Korean morpheme / English word boundaries.
- Style fingerprinting through deterministic synonym, ending, and connective substitution patterns.
- SHA-256 timestamped evidence packages for legal disputes.
- Micro-marks anchored to punctuation, drawn from a separate Unicode category.

Because the layers rely on independent mechanisms (and the character-based layers on different Unicode categories), an attack aimed at one cannot eliminate the others.
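The zero-width layer above can be sketched in a few lines of Python. This is a minimal illustration under assumed choices — U+200B/U+200C as the bit alphabet and plain whitespace word boundaries; the names `embed_zw` and `extract_zw` are hypothetical and not the project's actual API:

```python
# Hypothetical sketch of a zero-width watermark layer. The real project
# may use different characters and morpheme-aware boundaries.
ZW_ZERO = "\u200b"  # ZERO WIDTH SPACE      -> encodes bit 0
ZW_ONE = "\u200c"   # ZERO WIDTH NON-JOINER -> encodes bit 1


def embed_zw(text: str, bits: str) -> str:
    """Append one invisible character per word until all bits are placed."""
    out = []
    bit_iter = iter(bits)
    for word in text.split(" "):
        bit = next(bit_iter, None)
        if bit is not None:
            word += ZW_ONE if bit == "1" else ZW_ZERO
        out.append(word)
    return " ".join(out)


def extract_zw(marked: str) -> str:
    """Read the hidden bit string back; visible text is left untouched."""
    return "".join(
        "1" if ch == ZW_ONE else "0"
        for ch in marked
        if ch in (ZW_ZERO, ZW_ONE)
    )
```

For example, `extract_zw(embed_zw("the quick brown fox jumps", "1011"))` returns `"1011"`, while the stamped string renders identically to the original.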
Full bilingual Korean/English support with zero impact on readability or content quality.

## 34-Attack Defense: Dual-Axis Verdict

Seven categories, 34 attack types simulated end-to-end: Unicode normalization, invisible-character removal, homoglyph substitution (9,619-entry confusables DB), and AI meaning-preserving rewriting (paraphrase, summary, back-translation, style shift). Each attack is scored on two axes — Signal (did the watermark survive?) and Trace (are forensic traces of the attack detectable?) — so even when a watermark is fully destroyed, the deliberate removal attempt can still be proven.

## Image and Video

Images receive DCT frequency-domain invisible watermarks that survive JPEG compression and resizing. Videos are protected by embedding watermarks into keyframes and propagating them temporally across all frames, with majority-vote extraction for reliable recovery even after frame loss. Both media types also support pre-embed similarity analysis for detecting existing theft.

## Who Is This For

Individual creators, rights holders who need legal evidence against unauthorized AI training, media companies securing proof of origin before distribution, and organizations tracking internal document leaks. Full Korean/English bilingual support, open source, built with Gradio.

## REPO

https://huggingface.co/spaces/FINAL-Bench/security-scan
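As one illustration of the pre-embed image similarity analysis mentioned above, here is a minimal perceptual (average) hash in NumPy. The function names and the 8×8 block size are assumptions for the sketch, not the project's actual algorithm:

```python
import numpy as np


def average_hash(gray: np.ndarray, size: int = 8) -> np.ndarray:
    """Average hash: block-average a grayscale image down to size x size,
    then threshold each block against the overall mean to get 64 bits."""
    h, w = gray.shape
    gray = gray[: h - h % size, : w - w % size]  # crop so blocks divide evenly
    blocks = gray.reshape(size, gray.shape[0] // size,
                          size, gray.shape[1] // size).mean(axis=(1, 3))
    return (blocks > blocks.mean()).astype(np.uint8).ravel()


def similarity(h1: np.ndarray, h2: np.ndarray) -> float:
    """Fraction of matching hash bits (1.0 = structurally identical)."""
    return 1.0 - np.count_nonzero(h1 != h2) / h1.size
```

Because the hash captures coarse structure rather than pixel values, global edits such as brightness or contrast changes leave it unchanged, which is what makes it useful for spotting reprocessed copies.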
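The majority-vote extraction described for video can be sketched as follows — a simplified illustration that assumes per-frame extraction already yields a bit string per frame; `majority_vote` is a hypothetical name:

```python
from collections import Counter


def majority_vote(frame_bits: list[str]) -> str:
    """Combine bit strings extracted from many frames: each bit position
    takes the most common value, so a few corrupted or truncated frames
    are simply outvoted by the rest."""
    length = max(len(bits) for bits in frame_bits)
    recovered = []
    for i in range(length):
        votes = Counter(bits[i] for bits in frame_bits if i < len(bits))
        recovered.append(votes.most_common(1)[0][0])
    return "".join(recovered)
```

For instance, `majority_vote(["1011", "1011", "0011", "1111", "10"])` recovers `"1011"` despite two damaged frames and one truncated one.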