# Reproducibility Codes
This folder contains the Python scripts needed to reproduce the watermark performance results shown in the leaderboard.
## Scripts Overview
### Dataset Preparation
- `C4_dataset_download.py`: Downloads and prepares the C4 dataset for watermark evaluation
- `CNN_dataset_download.py`: Downloads and prepares the CNN/DailyMail dataset for evaluation
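The download scripts themselves are not reproduced here. As a rough sketch of the kind of preparation involved (the function name, field names, and thresholds below are illustrative, not taken from the scripts), one typically filters raw corpus records down to prompts long enough for watermark evaluation; with Hugging Face `datasets`, the corpus itself would usually be fetched via `load_dataset("allenai/c4", "en", streaming=True)`:

```python
# Illustrative sketch only -- the actual C4_dataset_download.py may differ.
# Keeps C4-style records whose text is long enough to serve as an
# evaluation prompt, up to a fixed sample budget.

def prepare_c4_samples(records, min_words=50, max_samples=500):
    """Filter records to those with at least `min_words` words of text."""
    prepared = []
    for rec in records:
        text = rec.get("text", "")
        if len(text.split()) >= min_words:
            prepared.append({"text": text})
        if len(prepared) >= max_samples:
            break
    return prepared
```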
### Model Training & Inference
- `Finetune_sum.py`: Fine-tunes language models for watermark evaluation
- `Inference_sum.py`: Performs inference with watermarked models to generate test data
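`Inference_sum.py` presumably generates watermarked text. While the exact scheme used by the leaderboard is not shown in this README, a minimal sketch of the widely used "green-list" logit-bias idea (hash the previous token to pick a green subset of the vocabulary, then boost those logits by `delta`) looks like the following, with toy logits in place of a real model; all names here are hypothetical:

```python
import random

def greenlist(prev_token_id, vocab_size, gamma=0.5, key=42):
    """Seed an RNG from the previous token and a secret key, then select
    a fraction `gamma` of the vocabulary as 'green' tokens."""
    rng = random.Random(key * 1_000_003 + prev_token_id)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits, prev_token_id, delta=2.0, gamma=0.5, key=42):
    """Soft watermark: add `delta` to the logits of green tokens only."""
    green = greenlist(prev_token_id, len(logits), gamma, key)
    return [x + delta if i in green else x for i, x in enumerate(logits)]
```

A detector with the same key can recompute the green lists and test whether generated text over-uses green tokens.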
### Evaluation Metrics
- `BERT_score.py`: Computes BERT scores for text quality evaluation
- `Entity_similarity_score.py`: Calculates entity similarity scores for watermark detection
- `Attack_dipper.py`: Implements watermark removal attacks for robustness testing
## Usage Instructions
1. Environment Setup: Ensure you have the required dependencies installed (transformers, datasets, etc.)
2. Dataset Preparation: Run the dataset download scripts first:

       python C4_dataset_download.py
       python CNN_dataset_download.py

3. Model Training: Fine-tune your models:

       python Finetune_sum.py

4. Inference: Generate watermarked text:

       python Inference_sum.py

5. Evaluation: Run the evaluation metrics:

       python BERT_score.py
       python Entity_similarity_score.py
       python Attack_dipper.py
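The steps above can also be chained in a small driver script. The sketch below is hypothetical (not part of this repository); the `dry_run` flag lets you preview the commands without executing anything:

```python
# Hypothetical driver for the reproduction pipeline described above.
# Script names are taken from this README.
import subprocess
import sys

PIPELINE = [
    "C4_dataset_download.py",
    "CNN_dataset_download.py",
    "Finetune_sum.py",
    "Inference_sum.py",
    "BERT_score.py",
    "Entity_similarity_score.py",
    "Attack_dipper.py",
]

def run_pipeline(dry_run=False):
    """Build (and optionally run) one command per pipeline script, in order."""
    commands = [[sys.executable, script] for script in PIPELINE]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop on the first failure
    return commands
```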
## Requirements
- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script
## Notes
- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences
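Since results depend on random seeds, fixing them at the top of each script helps reproducibility. A minimal sketch (the helper name is illustrative; for PyTorch runs you would additionally call `torch.manual_seed(seed)` and `torch.cuda.manual_seed_all(seed)`):

```python
import random

import numpy as np

def set_seed(seed=0):
    """Fix the Python and NumPy RNG seeds so repeated runs draw the
    same random numbers. Extend with torch.manual_seed(seed) when
    PyTorch is in use."""
    random.seed(seed)
    np.random.seed(seed)
```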
For detailed instructions on each metric evaluation, refer to the main guidelines in the leaderboard application.