---
title: LTMarX
emoji: 🎬
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---
# LTMarX: Video Watermarking
Imperceptible 32-bit watermarking for video. Embeds a payload into the luminance channel using DWT/DCT transform-domain quantization (DM-QIM) with BCH error correction.
Survives re-encoding, rescaling, brightness/contrast/saturation adjustments, and cropping up to ~20%.
All processing runs in the browser; no server round-trips are needed.
## Quick Start
```bash
npm install
npm run dev # Web UI at localhost:5173
npm test # Run test suite
```
## CLI
```bash
npx tsx server/cli.ts embed -i input.mp4 -o output.mp4 --key SECRET --preset moderate --payload DEADBEEF
npx tsx server/cli.ts detect -i output.mp4 --key SECRET
npx tsx server/cli.ts presets
```
## Docker
```bash
docker build -t ltmarx .
docker run -p 7860:7860 ltmarx
```
## Architecture
```
core/                           Pure TypeScript watermark engine (isomorphic, zero platform deps)
├── dwt.ts                      Haar DWT (forward/inverse, multi-level)
├── dct.ts                      8×8 DCT with zigzag scan
├── dmqim.ts                    Dither-Modulated QIM (embed/extract with soft decisions)
├── bch.ts                      BCH(63,36,5) over GF(2^6), Berlekamp-Massey decoding
├── crc.ts                      CRC-4 integrity check
├── tiling.ts                   Periodic tile layout + autocorrelation-based grid recovery
├── masking.ts                  Perceptual masking (variance-adaptive quantization step)
├── keygen.ts                   Seeded PRNG for dithers and permutations
├── embedder.ts                 Y-plane → watermarked Y-plane
├── detector.ts                 Y-plane(s) → payload + confidence
├── presets.ts                  Named configurations (light → fortress)
└── types.ts                    Shared types
web/                            Frontend (Vite + React + Tailwind)
├── src/
│   ├── App.tsx
│   ├── components/
│   │   ├── EmbedPanel.tsx      Upload, configure, embed, download
│   │   ├── DetectPanel.tsx     Upload, detect, display results
│   │   ├── ComparisonView.tsx  Side-by-side / difference viewer
│   │   ├── RobustnessTest.tsx  Automated attack battery (re-encode, crop, etc.)
│   │   ├── HowItWorks.tsx      Interactive explainer with D3 visualizations
│   │   ├── StrengthSlider.tsx  Preset selector with snap points
│   │   ├── ResultCard.tsx      Detection result display
│   │   └── ApiDocs.tsx         Inline API reference
│   ├── lib/
│   │   └── video-io.ts         Frame extraction, encoding, attack simulations
│   └── workers/
│       └── watermark.worker.ts
└── index.html
server/                         Node.js CLI + HTTP API
├── cli.ts                      CLI for embed/detect
├── api.ts                      HTTP server (serves web UI + REST endpoints)
└── ffmpeg-io.ts                FFmpeg subprocess for YUV420p I/O
tests/                          Vitest test suite
```
**Design principle:** `core/` has zero platform dependencies: it operates on raw `Uint8Array` Y-plane buffers. The same code runs in the browser (via Canvas + ffmpeg.wasm) and on the server (via Node.js + FFmpeg).
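To illustrate what a raw Y-plane buffer looks like, here is a minimal sketch that turns interleaved RGBA pixels (the layout of a Canvas `ImageData` buffer) into a `Uint8Array` of luma values. The helper name `rgbaToYPlane` is hypothetical and the full-range BT.601 weights are an assumption; the conversion the project actually uses may differ.

```typescript
// Hypothetical helper, not part of core/: converts interleaved RGBA pixels
// to an 8-bit luma (Y) plane, one byte per pixel, row-major.
// ASSUMPTION: full-range BT.601 weights (0.299, 0.587, 0.114).
function rgbaToYPlane(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number,
): Uint8Array {
  if (rgba.length !== width * height * 4) {
    throw new Error('RGBA buffer size does not match width * height * 4');
  }
  const yPlane = new Uint8Array(width * height);
  for (let i = 0, p = 0; i < yPlane.length; i++, p += 4) {
    const luma = 0.299 * rgba[p] + 0.587 * rgba[p + 1] + 0.114 * rgba[p + 2];
    // Clamp to the 8-bit range; the alpha channel is ignored.
    yPlane[i] = Math.min(255, Math.max(0, Math.round(luma)));
  }
  return yPlane;
}
```

A buffer produced this way has the shape that `embedWatermark` and the detector functions take as their Y-plane argument.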
## Watermarking Pipeline
### Embedding
```
Y plane → 2-level Haar DWT → HL subband → periodic tile grid →
per tile: 8×8 DCT blocks → select mid-freq zigzag coefficients →
DM-QIM embed coded bits (with per-block dithering and perceptual masking) →
inverse DCT → inverse DWT → modified Y plane
```
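The DM-QIM step can be sketched for a single coefficient as follows. This is a textbook dither-modulated QIM under stated assumptions, not the code in `core/dmqim.ts`: the function names are hypothetical, and `dither` stands in for the key-derived per-block dither. Each bit selects one of two quantization lattices spaced `delta` apart and offset from each other by `delta / 2`.

```typescript
// Embed one bit by snapping the coefficient to the nearest point of the
// lattice selected by that bit. Both lattices are shifted by the dither.
function qimEmbed(coeff: number, bit: 0 | 1, delta: number, dither: number): number {
  const offset = dither + (bit * delta) / 2;
  return Math.round((coeff - offset) / delta) * delta + offset;
}

// Soft extraction: compare the distance to the bit-0 and bit-1 lattices.
// A positive result favors bit 1, a negative result favors bit 0, and the
// magnitude serves as a rough confidence.
function qimExtractSoft(coeff: number, delta: number, dither: number): number {
  const distanceTo = (bit: 0 | 1): number => {
    const offset = dither + (bit * delta) / 2;
    const nearest = Math.round((coeff - offset) / delta) * delta + offset;
    return Math.abs(coeff - nearest);
  };
  return distanceTo(0) - distanceTo(1);
}
```

In this sketch, perturbations smaller than `delta / 4` leave the sign of the soft value unchanged, which is why a larger delta survives heavier distortion at the cost of visibility.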
### Payload Encoding
```
32-bit payload → CRC-4 append → BCH(63,36,5) encode → keyed interleave →
map to DCT coefficients across tiles (with wraparound redundancy)
```
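The first stage, appending a CRC-4 to the 32-bit payload to obtain the 36 message bits that BCH(63,36,5) expects, might look like the sketch below. The generator polynomial x^4 + x + 1 with a zero initial value is an assumption, since this README does not state which polynomial `core/crc.ts` uses, and the function names are hypothetical.

```typescript
// Bitwise CRC-4 over an array of bits (most significant bit first).
// ASSUMPTION: generator x^4 + x + 1, zero initial value, no final XOR.
function crc4(bits: number[]): number[] {
  const poly = [1, 0, 0, 1, 1]; // coefficients of x^4 + x + 1
  const reg = [...bits, 0, 0, 0, 0]; // message multiplied by x^4
  for (let i = 0; i < bits.length; i++) {
    if (reg[i] === 1) {
      for (let j = 0; j < poly.length; j++) reg[i + j] ^= poly[j];
    }
  }
  return reg.slice(bits.length); // 4-bit remainder
}

// 32 payload bits in, 36 bits out (payload followed by its CRC).
function appendCrc4(payloadBits: number[]): number[] {
  return [...payloadBits, ...crc4(payloadBits)];
}

// A valid payload+CRC word is divisible by the generator, so the
// remainder computed over all 36 bits is zero.
function verifyCrc4(codedBits: number[]): boolean {
  return crc4(codedBits).every((b) => b === 0);
}

// Example: the payload DEADBEEF from the CLI section as 32 bits.
const payloadBits = Array.from({ length: 32 }, (_, i) => (0xdeadbeef >>> (31 - i)) & 1);
const messageBits = appendCrc4(payloadBits);
```

A 4-bit check can only reject a fraction of corrupted words, so it acts as a final sanity check after BCH decoding rather than as the main source of robustness.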
### Detection
```
Y plane(s) → DWT → HL subband → tile grid →
per tile: DCT → DM-QIM soft extract →
soft-combine across tiles and frames → keyed de-interleave →
BCH soft decode → CRC verify → payload
```
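Soft-combining is what lets many weak observations add up to one reliable decision. A minimal sketch, assuming each tile or frame yields one soft value per coded bit (positive for 1, negative for 0) and that combining is a plain average; the actual weighting in `core/detector.ts` is not described here and the function names are hypothetical:

```typescript
// Average soft-bit vectors (one per tile or frame) position by position.
function softCombine(observations: Float64Array[]): Float64Array {
  if (observations.length === 0) throw new Error('no observations');
  const n = observations[0].length;
  const combined = new Float64Array(n);
  for (const obs of observations) {
    if (obs.length !== n) throw new Error('observation length mismatch');
    for (let i = 0; i < n; i++) combined[i] += obs[i];
  }
  for (let i = 0; i < n; i++) combined[i] /= observations.length;
  return combined;
}

// Hard decision, for illustration only; the pipeline above passes the
// soft values on to the BCH soft decoder instead.
function hardDecide(soft: Float64Array): number[] {
  return Array.from(soft, (s) => (s > 0 ? 1 : 0));
}
```

In the test below, one of three observations has the wrong sign on the first bit, yet the averaged value still decodes correctly.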
### Crop-Resilient Detection
When the frame has been cropped, the detector doesn't know the original tile grid alignment. It searches over three alignment parameters:
1. **DWT padding** (0–3 per axis): the crop may break DWT pixel pairing
2. **DCT block shift** (0–7 per axis): the crop may misalign 8×8 block boundaries within the subband
3. **Tile dither offset** (0–N per axis): the crop shifts which tile-phase position each block maps to
The total search space is 16 × 64 × N² candidates (~37K for the strong preset). To make this fast:
- DCT coefficients are precomputed once per (pad, shift) combination using only tile 0
- Dither offsets are swept cheaply using just DM-QIM re-extraction on cached coefficients
- Candidates are ranked by signal magnitude (sum of squared averaged soft bits)
- Only the top 50 candidates are fully decoded with all frames
This runs in ~1 second for 32 frames on a 512×512 video.
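The candidate count and the ranking step can be sketched as follows. The names are hypothetical, and N = 6 is an assumption chosen because it reproduces the ~37K figure quoted above (16 × 64 × 36 = 36,864); the README does not state N explicitly.

```typescript
// 4 paddings per axis, 8 block shifts per axis, N dither offsets per axis.
function searchSpaceSize(n: number): number {
  return 4 * 4 * (8 * 8) * (n * n);
}

// Signal magnitude: sum of squared averaged soft bits. A correctly
// aligned candidate yields consistently large soft values, while a
// misaligned one averages toward zero.
function signalMagnitude(softBits: Float64Array): number {
  let sum = 0;
  for (let i = 0; i < softBits.length; i++) sum += softBits[i] * softBits[i];
  return sum;
}

// Keep only the k strongest candidates for the expensive full decode.
function topCandidates<T extends { softBits: Float64Array }>(candidates: T[], k = 50): T[] {
  return [...candidates]
    .sort((a, b) => signalMagnitude(b.softBits) - signalMagnitude(a.softBits))
    .slice(0, k);
}
```

Ranking by magnitude needs no BCH decode, so the cheap score filters tens of thousands of alignments down to the 50 that are decoded with all frames.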
## Presets
| Preset | Delta | Tile Period | Zigzag Positions | Masking | Use Case |
|--------|-------|-------------|------------------|---------|----------|
| **Light** | 50 | 256px | 3–14 (mid-freq) | No | Near-invisible, mild compression |
| **Moderate** | 62 | 240px | 3–14 (mid-freq) | Yes | Balanced with perceptual masking |
| **Strong** | 110 | 208px | 1–20 (low+mid) | Yes | Heavy re-encoding, rescaling, cropping |
| **Fortress** | 150 | 192px | 1–20 (low+mid) | Yes | Maximum robustness |
All presets use BCH(63,36,5) with CRC-4 and 2-level DWT.
Higher delta = stronger embedding = more visible artifacts but better survival under attacks. The "strong" and "fortress" presets use more DCT coefficients (zigzag positions 1–20 vs 3–14) for additional redundancy.
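To make the redundancy difference concrete, the table rows can be transcribed into a small lookup and the number of coefficients per 8×8 block derived from the zigzag range. The `PresetRow` shape and its field names are hypothetical and are not the type returned by `getPreset`; the sketch also assumes the zigzag range is inclusive at both ends.

```typescript
// Rows transcribed from the presets table. Field names are illustrative only.
interface PresetRow {
  delta: number;
  tilePeriodPx: number;
  zigzag: [number, number]; // ASSUMPTION: inclusive [first, last] positions
  masking: boolean;
}

const PRESET_TABLE: Record<string, PresetRow> = {
  light: { delta: 50, tilePeriodPx: 256, zigzag: [3, 14], masking: false },
  moderate: { delta: 62, tilePeriodPx: 240, zigzag: [3, 14], masking: true },
  strong: { delta: 110, tilePeriodPx: 208, zigzag: [1, 20], masking: true },
  fortress: { delta: 150, tilePeriodPx: 192, zigzag: [1, 20], masking: true },
};

// Number of DCT coefficients that carry watermark bits in each block.
function coeffsPerBlock(row: PresetRow): number {
  return row.zigzag[1] - row.zigzag[0] + 1;
}
```

Under these assumptions, the light and moderate presets carry 12 coefficients per block and the strong and fortress presets carry 20.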
## Robustness
The web UI includes an automated robustness test battery. Each test applies an attack to the watermarked video and attempts detection:
| Attack | Variants Tested |
|--------|----------------|
| **Re-encode** | CRF 23, 28, 33, 38, 43 |
| **Downscale** | 25%, 50%, 75%, 90% |
| **Brightness** | -0.2, +0.2, +0.4 |
| **Contrast** | 0.5×, 1.5×, 2.0× |
| **Saturation** | 0×, 0.5×, 2.0× |
| **Crop** | 5%, 10%, 15%, 20% (per side) |
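As an illustration of one attack from the table, here is a minimal sketch of a per-side crop applied directly to a Y plane. The helper `cropYPlane` is hypothetical; the web UI's own attack simulations live in `video-io.ts` and may be implemented differently (for example through FFmpeg filters).

```typescript
// Remove `fraction` of the width from the left and right edges and the
// same fraction of the height from the top and bottom edges.
function cropYPlane(
  yPlane: Uint8Array,
  width: number,
  height: number,
  fraction: number,
): { yPlane: Uint8Array; width: number; height: number } {
  const dx = Math.floor(width * fraction);
  const dy = Math.floor(height * fraction);
  const w = width - 2 * dx;
  const h = height - 2 * dy;
  if (w <= 0 || h <= 0) throw new Error('crop removes the whole frame');
  const out = new Uint8Array(w * h);
  for (let row = 0; row < h; row++) {
    // Copy one cropped row from the source plane.
    const start = (row + dy) * width + dx;
    out.set(yPlane.subarray(start, start + w), row * w);
  }
  return { yPlane: out, width: w, height: h };
}
```

The cropped plane and its new dimensions can then be passed to the detector with `cropResilient: true`, since the crop shifts the tile grid by an unknown amount.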
## API
### Embedding
```typescript
import { embedWatermark } from './core/embedder';
import { getPreset } from './core/presets';
const config = getPreset('moderate');
const result = embedWatermark(yPlane, width, height, payload, key, config);
// result.yPlane: watermarked Y plane (Uint8Array)
// result.psnr: quality metric (dB)
```
### Detection
```typescript
import { detectWatermarkMultiFrame } from './core/detector';
import { getPreset } from './core/presets';
// Use the same preset that was used for embedding
const config = getPreset('moderate');
const result = detectWatermarkMultiFrame(yPlanes, width, height, key, config);
// result.detected: boolean
// result.payload: Uint8Array | null
// result.confidence: 0–1
```
### Crop-Resilient Detection
```typescript
const result = detectWatermarkMultiFrame(
yPlanes, width, height, key, config,
{ cropResilient: true }
);
```
### Auto-Detection (tries all presets)
```typescript
import { autoDetectMultiFrame } from './core/detector';
const result = autoDetectMultiFrame(yPlanes, width, height, key);
// result.presetUsed: which preset matched
```
## HTTP API
```
POST /api/embed { videoBase64, key, preset, payload }
POST /api/detect { videoBase64, key, preset?, frames? }
GET /api/health → { status: "ok" }
```
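A client call to the embed endpoint might be assembled as in the sketch below. Several details are assumptions not confirmed by this README: that the body is sent as JSON, that the server listens on port 7860 as in the Docker example, and that `payload` is a hex string as in the CLI example. The helper `buildEmbedRequest` is hypothetical, and the response shape is left unspecified.

```typescript
// Field names follow the endpoint listing above.
interface EmbedRequestBody {
  videoBase64: string;
  key: string;
  preset: string;
  payload: string;
}

// Build the URL and request options for POST /api/embed.
// ASSUMPTION: the server accepts a JSON body.
function buildEmbedRequest(baseUrl: string, body: EmbedRequestBody) {
  return {
    url: `${baseUrl}/api/embed`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

const { url, init } = buildEmbedRequest('http://localhost:7860', {
  videoBase64: 'AAAA', // placeholder, not a real video
  key: 'SECRET',
  preset: 'moderate',
  payload: 'DEADBEEF',
});
// To send it: const response = await fetch(url, init);
```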
## Testing
```bash
npm test # Run all tests
npm run test:watch # Watch mode
```
25 tests across 6 files covering: DWT round-trip, DCT round-trip, DM-QIM embed/extract, BCH encode/decode with error correction, CRC append/verify, full embed-detect pipeline across presets, false positive rejection (wrong key, unwatermarked frame), crop-resilient detection (arbitrary offset and ~20% crop).
## Browser Encoding
The web UI encodes watermarked video using ffmpeg.wasm (x264 in WebAssembly). To avoid memory pressure, frames are encoded in chunks of 100 and concatenated at the end. Peak memory stays proportional to chunk size rather than scaling with video length.
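The chunking scheme can be sketched as a function that splits the frame count into half-open ranges of at most 100 frames. The name `chunkRanges` is hypothetical, and the actual encode and concatenate calls to ffmpeg.wasm are omitted.

```typescript
// Split [0, totalFrames) into consecutive [start, end) ranges of at most
// chunkSize frames. Each range would be encoded on its own, so only one
// chunk's frames need to be held in memory at a time.
function chunkRanges(totalFrames: number, chunkSize = 100): Array<[number, number]> {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < totalFrames; start += chunkSize) {
    ranges.push([start, Math.min(start + chunkSize, totalFrames)]);
  }
  return ranges;
}
```

For a 250-frame video this yields three ranges, the last one shorter than the others.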