SafeSeal / README.md

kirudang

Sync SafeSeal app

fc6dcab 3 months ago

preview code

raw

history blame contribute delete

2.68 kB

A newer version of the Streamlit SDK is available: 1.54.0

Upgrade

metadata

title: SafeSeal Watermark
emoji: 🔒
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.50.0
app_file: app.py
pinned: false
license: mit

SafeSeal Watermark

Content-Preserving Watermarking for Large Language Model Deployments.

Generate watermarked text by replacing words with context-aware synonyms chosen through key-conditioned sampling.

Features

🔑 Secret Key: Deterministic watermarking with user-controlled key
📊 BERTScore Filtering: Adjustable similarity threshold (0.0 - 1.0)
🏆 Tournament Sampling: Select synonyms using tournament-based randomization
✨ Visual Highlighting: See exactly which words were changed
🚀 GPU Support: Fast inference with automatic GPU detection
🛡️ Smart Filtering: Excludes antonyms and specific nouns in the same category, and preserves entity names

How It Works

Entity Detection: Extracts eligible words (nouns, verbs, adjectives, adverbs) while skipping named entities
Candidate Generation: Uses RoBERTa-base to generate semantically similar alternatives
BERTScore Filtering: Evaluates candidates against a similarity threshold
Tournament Selection: Deterministically selects replacements based on secret key
Visualization: Highlights changed words in the output
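
The BERTScore filtering step can be pictured with a minimal sketch. It assumes the candidate scores have already been computed. The function name `filter_by_similarity`, the example scores, and the choice of `>=` for the comparison are illustrative assumptions, not taken from the SafeSeal code.

```python
def filter_by_similarity(scores: dict[str, float], threshold: float = 0.98) -> list[str]:
    """Keep candidates whose similarity score meets the threshold.

    `scores` maps each candidate word to a precomputed similarity score
    (for example a BERTScore F1 value). Hypothetical helper, not SafeSeal's API.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    kept = [word for word, score in scores.items() if score >= threshold]
    # Best-scoring candidates first; ties are broken alphabetically for stability.
    return sorted(kept, key=lambda word: (-scores[word], word))


# Made-up scores for replacements of "quick"; real values come from the scorer.
example_scores = {"swift": 0.991, "fast": 0.985, "slow": 0.962}
survivors = filter_by_similarity(example_scores, threshold=0.98)
```

With these made-up scores, only "swift" and "fast" pass the default threshold; raising the threshold shrinks the candidate set and so reduces the number of words that can be changed.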

Usage

Enter your text in the left panel
Adjust hyperparameters in the sidebar:
- Secret Key: Used for deterministic randomization
- Threshold: Similarity threshold (default: 0.98)
- Tournament parameters: Fine-tune the selection process
Click "🚀 Generate Watermark"
View the watermarked text with highlighted changes

Parameters

Secret Key: Used for deterministic randomization
Threshold (0.98): BERTScore similarity threshold; higher values yield more conservative changes
m (10): Number of tournament rounds
c (2): Competitors per tournament match
h (6): Context size (left tokens to consider)
Alpha (1.1): Temperature scaling factor
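
To make the roles of m, c, and h concrete, here is a simplified sketch of a key-conditioned knockout tournament. It is an assumption about how such a selection could work, not the SafeSeal implementation: the helper names `keyed_unit` and `tournament_select`, the HMAC-SHA256 scoring, and the uniform seeding of the bracket are all hypothetical, and Alpha is not modeled.

```python
import hashlib
import hmac


def keyed_unit(key: str, *parts: str) -> float:
    """Map (key, parts) to a reproducible pseudo-random float in [0, 1)."""
    message = "\x1f".join(parts).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), message, hashlib.sha256).digest()
    # 48 bits fit exactly in a float, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def tournament_select(candidates, left_context, key, m=10, c=2, h=6):
    """Pick one candidate through m knockout rounds of c competitors each.

    Hypothetical sketch: the same key, context window, and candidate set
    always produce the same winner.
    """
    pool = sorted(set(candidates))  # order-independent, duplicate-free
    if not pool:
        raise ValueError("need at least one candidate")
    # Only the last h tokens to the left of the target word influence the result.
    context = " ".join(left_context[-h:]) if h > 0 else ""
    # Seed the bracket with c**m keyed draws (with replacement) from the pool.
    entrants = [
        pool[int(keyed_unit(key, context, "draw", str(i)) * len(pool))]
        for i in range(c ** m)
    ]
    for round_idx in range(m):
        # In each match, the competitor with the highest keyed score advances.
        entrants = [
            max(
                entrants[i:i + c],
                key=lambda w: (keyed_unit(key, context, "round", str(round_idx), w), w),
            )
            for i in range(0, len(entrants), c)
        ]
    return entrants[0]


winner = tournament_select(
    ["quick", "swift", "fast", "rapid"],
    ["The"],
    key="my-secret-key",
)
```

Because every draw and every match score is derived from the key and the context window, someone without the key cannot reproduce the choices, while the key holder can recompute them exactly. In this sketch the bracket holds c**m entrants, so the defaults m=10 and c=2 give 1024 entrants per replaced word.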

Technical Details

Model: RoBERTa-base for masked language modeling
Similarity Scoring: BERTScore F1 scores
Selection: Tournament-based deterministic sampling
Filtering: POS tag matching, antonym exclusion, semantic compatibility checks

Example

Input:

"The quick brown fox jumps over the lazy dog."

Watermarked Output:

"The swift brown fox leaps over the idle dog."

Changed words highlighted: swift (was quick), leaps (was jumps), idle (was lazy)
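
The list of changed words can be reproduced from the two sentences with a short sketch. It assumes replacements are strictly one word for one word, so both texts split into the same number of whitespace-separated tokens. The helper names `changed_words` and `describe_changes` are hypothetical and not part of the app.

```python
import string


def changed_words(original: str, watermarked: str) -> list[tuple[str, str]]:
    """Return (new_word, old_word) pairs for positions where the texts differ."""
    source_tokens = original.split()
    output_tokens = watermarked.split()
    if len(source_tokens) != len(output_tokens):
        raise ValueError("texts must contain the same number of tokens")
    return [
        (new.strip(string.punctuation), old.strip(string.punctuation))
        for old, new in zip(source_tokens, output_tokens)
        if old != new
    ]


def describe_changes(pairs: list[tuple[str, str]]) -> str:
    """Format pairs in the style 'new (was old), ...'."""
    return ", ".join(f"{new} (was {old})" for new, old in pairs)


pairs = changed_words(
    "The quick brown fox jumps over the lazy dog.",
    "The swift brown fox leaps over the idle dog.",
)
summary = describe_changes(pairs)
# summary == "swift (was quick), leaps (was jumps), idle (was lazy)"
```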

⚠️ Demo Version: This is a demonstration that uses a lightweight model to showcase the watermarking pipeline. Results may be imperfect and are intended for testing purposes only.