metadata
title: SafeSeal Watermark
emoji: 🔒
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.50.0
app_file: app.py
pinned: false
license: mit

SafeSeal Watermark

Content-Preserving Watermarking for Large Language Model Deployments.

Generate watermarked text by replacing words with context-aware synonyms chosen through key-conditioned sampling.

Features

  • 🔑 Secret Key: Deterministic watermarking with user-controlled key
  • 📊 BERTScore Filtering: Adjustable similarity threshold (0.0 - 1.0)
  • 🏆 Tournament Sampling: Select synonyms using tournament-based randomization
  • ✨ Visual Highlighting: See exactly which words were changed
  • 🚀 GPU Support: Fast inference with automatic GPU detection
  • 🛡️ Smart Filtering: Excludes antonyms and specific nouns in the same category, and preserves entity names

How It Works

  1. Entity Detection: Extracts eligible words (nouns, verbs, adjectives, adverbs) while skipping named entities
  2. Candidate Generation: Uses RoBERTa-base to generate semantically similar alternatives
  3. BERTScore Filtering: Evaluates candidates against a similarity threshold
  4. Tournament Selection: Deterministically selects replacements based on secret key
  5. Visualization: Highlights changed words in the output
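
As a rough illustration of step 3, the sketch below keeps only the candidates whose similarity score meets the threshold. The function name `filter_candidates`, the inclusive `>=` comparison, and the scores themselves are assumptions made for this example; in the app the scores would come from BERTScore.

```python
def filter_candidates(scored_candidates, threshold=0.98):
    """Return candidates scoring at or above threshold, best first.

    scored_candidates maps each candidate word to a similarity score
    in [0, 1]. The values used below are made up for illustration.
    """
    kept = [(word, score) for word, score in scored_candidates.items()
            if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in kept]


scores = {"swift": 0.991, "fast": 0.985, "slow": 0.62}  # hypothetical scores
print(filter_candidates(scores, threshold=0.98))  # ['swift', 'fast']
print(filter_candidates(scores, threshold=0.99))  # ['swift']
```

Raising the threshold shrinks the candidate list, which is why higher values lead to fewer and more conservative substitutions.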

Usage

  1. Enter your text in the left panel
  2. Adjust hyperparameters in the sidebar:
    • Secret Key: Used for deterministic randomization
    • Threshold: Similarity threshold (default: 0.98)
    • Tournament parameters: Fine-tune the selection process
  3. Click "🚀 Generate Watermark"
  4. View the watermarked text with highlighted changes

Parameters

  • Secret Key: Used for deterministic randomization
  • Threshold (0.98): BERTScore similarity threshold; higher values permit fewer, more conservative changes
  • m (10): Number of tournament rounds
  • c (2): Competitors per tournament match
  • h (6): Context size (number of tokens to the left to consider)
  • Alpha (1.1): Temperature scaling factor
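
To make the roles of the key, m, c, and h more concrete, here is a minimal sketch of a keyed tournament, not the app's actual implementation. It assumes that entrants are drawn uniformly from the filtered candidates, that each match is won by the highest HMAC-SHA256 score, and that the last h left tokens form the context. The names `keyed_score` and `tournament_select` are hypothetical, and Alpha is left out because this README does not say how it enters the selection.

```python
import hashlib
import hmac
import random


def keyed_score(key, context, word, round_idx):
    """Pseudorandom integer score for (context, word, round) under the key."""
    msg = "\x1f".join((*context, word, str(round_idx))).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def tournament_select(candidates, left_tokens, key, m=10, c=2, h=6):
    """Pick one candidate through m rounds of c-way matches (assumed scheme)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # Only the last h tokens to the left of the target word form the context.
    context = tuple(left_tokens[-h:]) if h > 0 else ()
    # Seed the entrant draw from the key and context so reruns agree.
    rng = random.Random(keyed_score(key, context, "", 0))
    entrants = [rng.choice(candidates) for _ in range(c ** m)]
    for round_idx in range(1, m + 1):
        # Each group of c entrants plays one match; the top keyed score advances.
        entrants = [
            max(entrants[i:i + c],
                key=lambda w: keyed_score(key, context, w, round_idx))
            for i in range(0, len(entrants), c)
        ]
    return entrants[0]


choice = tournament_select(["swift", "fast", "rapid"], ["The"], key="my-secret")
print(choice)
```

With the defaults, the draw has c^m = 1024 entrants and a single winner remains after 10 rounds. The same key, context, and candidate list always produce the same winner, which is what would let a detector holding the key re-derive the scores.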

Technical Details

  • Model: RoBERTa-base for masked language modeling
  • Similarity Scoring: BERTScore F1 scores
  • Selection: Tournament-based deterministic sampling
  • Filtering: POS tag matching, antonym exclusion, semantic compatibility checks
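
For readers unfamiliar with BERTScore, the sketch below computes its F1 score by greedy cosine matching over token vectors. The toy 2-D vectors are made up; the real metric uses contextual embeddings from a transformer and supports options such as IDF weighting that are omitted here. The function names are chosen for this example, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_f1(ref_vecs, cand_vecs):
    """F1 from greedy matching: each token pairs with its most similar counterpart."""
    recall = sum(max(cosine(r, c) for c in cand_vecs) for r in ref_vecs) / len(ref_vecs)
    precision = sum(max(cosine(c, r) for r in ref_vecs) for c in cand_vecs) / len(cand_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reference = [[1.0, 0.0], [0.0, 1.0]]  # toy token embeddings
candidate = [[1.0, 0.0], [1.0, 1.0]]  # second token drifts from the reference
print(round(bertscore_f1(reference, reference), 3))  # 1.0
print(round(bertscore_f1(reference, candidate), 3))  # 0.854
```

A sentence compared with itself scores 1.0, so a threshold close to 1.0 only admits substitutions that barely move the sentence's token embeddings.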

Example

Input:

"The quick brown fox jumps over the lazy dog."

Watermarked Output:

"The swift brown fox leaps over the idle dog."

Changed words highlighted: swift (was quick), leaps (was jumps), idle (was lazy)
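
The changed-word list above can be reproduced with a simple word-by-word comparison. This is a minimal sketch that assumes one-for-one substitution, so both texts split into the same number of whitespace-separated words. The helper names and the `**` markers are illustrative choices, not the app's rendering code.

```python
def changed_words(original, watermarked):
    """Return (index, old, new) for each position where the word differs."""
    src, dst = original.split(), watermarked.split()
    if len(src) != len(dst):
        raise ValueError("texts must have the same number of words")
    return [(i, old, new)
            for i, (old, new) in enumerate(zip(src, dst)) if old != new]


def highlight(original, watermarked, mark="**"):
    """Wrap each changed word of the watermarked text in the given marker."""
    changed = {i for i, _, _ in changed_words(original, watermarked)}
    return " ".join(f"{mark}{w}{mark}" if i in changed else w
                    for i, w in enumerate(watermarked.split()))


before = "The quick brown fox jumps over the lazy dog."
after = "The swift brown fox leaps over the idle dog."
print(changed_words(before, after))
# [(1, 'quick', 'swift'), (4, 'jumps', 'leaps'), (7, 'lazy', 'idle')]
print(highlight(before, after))
# The **swift** brown fox **leaps** over the **idle** dog.
```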

⚠️ Demo Version: This is a demonstration using a light model to showcase the watermarking pipeline. Results may not be perfect and are intended for testing purposes only.