---
title: SafeSeal Watermark
emoji: 🔒
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.50.0
app_file: app.py
pinned: false
license: mit
---
# SafeSeal Watermark
**Content-Preserving Watermarking for Large Language Model Deployments.**
Generate watermarked text by replacing words with context-aware synonyms selected through key-conditioned sampling.
## Features
- 🔑 **Secret Key**: Deterministic watermarking with user-controlled key
- 📊 **BERTScore Filtering**: Adjustable similarity threshold (0.0 - 1.0)
- 🏆 **Tournament Sampling**: Select synonyms using tournament-based randomization
- ✨ **Visual Highlighting**: See exactly which words were changed
- 🚀 **GPU Support**: Fast inference with automatic GPU detection
- 🛡️ **Smart Filtering**: Excludes antonyms and specific nouns in the same category, and preserves entity names
## How It Works
1. **Entity Detection**: Extracts eligible words (nouns, verbs, adjectives, adverbs) while skipping named entities
2. **Candidate Generation**: Uses RoBERTa-base to generate semantically similar alternatives
3. **BERTScore Filtering**: Evaluates candidates against a similarity threshold
4. **Tournament Selection**: Deterministically selects replacements based on secret key
5. **Visualization**: Highlights changed words in the output
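
The steps above can be sketched end to end with toy data. This is a minimal illustration, not the app's implementation: `TOY_CANDIDATES` is a made-up stand-in for the RoBERTa-base candidates and their BERTScore values, `keyed_score` and `watermark` are hypothetical names, and a single keyed argmax replaces the full tournament. Using the already-emitted output tokens as the left context is also an assumption.

```python
import hashlib
import hmac

# Made-up stand-in for steps 2 and 3: word -> {candidate: similarity score}.
# In the app these would come from RoBERTa-base and BERTScore.
TOY_CANDIDATES = {
    "quick": {"swift": 0.99, "fast": 0.985, "slow": 0.90},
    "lazy": {"idle": 0.985, "tired": 0.95},
}


def keyed_score(secret_key, context, candidate):
    """Pseudo-random score that depends only on key, left context, and candidate."""
    msg = "\x1f".join((*context, candidate)).encode("utf-8")
    digest = hmac.new(secret_key.encode("utf-8"), msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def watermark(tokens, secret_key, threshold=0.98, h=6):
    """Replace eligible tokens with the candidate that wins the keyed score."""
    out = []
    for tok in tokens:
        scored = TOY_CANDIDATES.get(tok, {})
        # Step 3: keep only candidates at or above the similarity threshold.
        kept = sorted(c for c, s in scored.items() if s >= threshold)
        if not kept:
            out.append(tok)  # nothing eligible: leave the word unchanged
            continue
        # Step 4 (simplified): the same key and context always pick the same word.
        context = tuple(out[-h:])
        out.append(max(kept, key=lambda c: keyed_score(secret_key, context, c)))
    return out
```

Calling `watermark("The quick brown fox".split(), "my-key")` twice returns the same list both times, and raising `threshold` to 1.0 leaves the text unchanged because no toy candidate passes the filter.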
## Usage
1. Enter your text in the left panel
2. Adjust hyperparameters in the sidebar:
   - **Secret Key**: Used for deterministic randomization
   - **Threshold**: Similarity threshold (default: 0.98)
   - **Tournament parameters**: Fine-tune the selection process
3. Click "🚀 Generate Watermark"
4. View the watermarked text with highlighted changes
## Parameters
- **Secret Key**: Used for deterministic randomization
- **Threshold (0.98)**: BERTScore similarity threshold; higher values permit fewer, more conservative changes
- **m (10)**: Number of tournament rounds
- **c (2)**: Competitors per tournament match
- **h (6)**: Context size (left tokens to consider)
- **Alpha (1.1)**: Temperature scaling factor
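
One plausible reading of how `m`, `c`, and `alpha` interact is sketched below. The README does not spell out the tournament rules, so everything here is an assumption: `tournament_select` and `_mac` are hypothetical names, `alpha` is applied as an exponent on the similarity scores, the bracket is filled with `c ** m` draws, and each match is decided by a keyed pseudo-random bit with ties broken by a key-seeded generator.

```python
import hashlib
import hmac
import random

SEP = "\x1f"  # unit separator keeps the joined fields unambiguous


def _mac(secret_key, *fields):
    """HMAC-SHA256 over the joined fields, keyed by the secret key."""
    msg = SEP.join(fields).encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), msg, hashlib.sha256).digest()


def tournament_select(candidates, secret_key, context, m=10, c=2, alpha=1.1):
    """Pick one candidate via m rounds of c-way matches (illustrative only).

    candidates: dict mapping candidate word -> positive similarity score.
    context: tuple of left-context tokens (at most h of them).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    # Seed every random draw from the key and context so the result is reproducible.
    rng = random.Random(int.from_bytes(_mac(secret_key, *context), "big"))
    names = sorted(candidates)
    # Assumption: alpha sharpens the score-based sampling weights.
    weights = [candidates[name] ** alpha for name in names]
    # Fill the bracket: c ** m entrants shrink to one winner after m rounds.
    pool = rng.choices(names, weights=weights, k=c ** m)
    for round_idx in range(m):
        winners = []
        for start in range(0, len(pool), c):
            match = pool[start:start + c]
            # Keyed pseudo-random bit per entrant; the highest bit wins the match.
            bits = [_mac(secret_key, *context, w, str(round_idx))[0] & 1 for w in match]
            best = [w for w, b in zip(match, bits) if b == max(bits)]
            winners.append(rng.choice(best))
        pool = winners
    return pool[0]
```

With the defaults (`m=10`, `c=2`) the bracket starts with 1024 entrants. Because both the draws and the match bits derive from the key and the context, the same inputs always select the same word.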
## Technical Details
- **Model**: RoBERTa-base for masked language modeling
- **Similarity Scoring**: BERTScore F1 scores
- **Selection**: Tournament-based deterministic sampling
- **Filtering**: POS tag matching, antonym exclusion, semantic compatibility checks
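
The filtering checks listed above can be illustrated with small lookup tables. This is only a sketch: `TOY_POS` and `TOY_ANTONYMS` are hypothetical tables standing in for a real POS tagger and lexical resource, `filter_candidates` is a hypothetical name, and the scores passed in are invented rather than real BERTScore F1 values.

```python
# Hypothetical lookup tables; a real system would use a POS tagger and a
# lexical resource instead of hand-written dictionaries.
TOY_POS = {
    "quick": "ADJ", "swift": "ADJ", "slow": "ADJ", "fast": "ADJ",
    "quickly": "ADV",
}
TOY_ANTONYMS = {"quick": {"slow"}}


def filter_candidates(word, scored, threshold=0.98):
    """Return candidates that pass POS, antonym, and similarity checks.

    scored: dict mapping candidate word -> similarity score in [0, 1].
    """
    kept = []
    for cand, f1 in scored.items():
        if cand == word:
            continue  # not a change
        if TOY_POS.get(cand) != TOY_POS.get(word):
            continue  # POS tag must match the original word
        if cand in TOY_ANTONYMS.get(word, set()):
            continue  # antonyms would flip the meaning
        if f1 < threshold:
            continue  # below the similarity threshold
        kept.append(cand)
    return sorted(kept)
```

For example, given candidates `swift`, `slow`, `quickly`, and `fast` for the word `quick`, only `swift` survives if `fast` scores below the threshold: `slow` is an antonym and `quickly` has a different POS tag.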
## Example
**Input:**
> "The quick brown fox jumps over the lazy dog."
**Watermarked Output:**
> "The swift brown fox leaps over the idle dog."
Changed words highlighted: swift (was quick), leaps (was jumps), idle (was lazy)
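
The highlighting step for this example can be sketched as a word-by-word comparison. The function name `highlight_changes` is hypothetical, and the sketch assumes the watermark only substitutes words one for one, so both texts split into the same number of whitespace-separated tokens.

```python
def highlight_changes(original, watermarked):
    """Return the watermarked text with changed words in bold, plus the change list."""
    orig_tokens = original.split()
    wm_tokens = watermarked.split()
    if len(orig_tokens) != len(wm_tokens):
        raise ValueError("texts must contain the same number of words")
    marked, changes = [], []
    for old, new in zip(orig_tokens, wm_tokens):
        if old == new:
            marked.append(new)
        else:
            marked.append(f"**{new}**")  # Markdown bold marks a replaced word
            changes.append((old, new))
    return " ".join(marked), changes


text, changes = highlight_changes(
    "The quick brown fox jumps over the lazy dog.",
    "The swift brown fox leaps over the idle dog.",
)
print(text)     # The **swift** brown fox **leaps** over the **idle** dog.
print(changes)  # [('quick', 'swift'), ('jumps', 'leaps'), ('lazy', 'idle')]
```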
⚠️ **Demo Version**: This is a demonstration using a light model to showcase the watermarking pipeline. Results may not be perfect and are intended for testing purposes only.