# Reproducibility Codes

This folder contains the Python scripts needed to reproduce the watermark performance results shown in the leaderboard.

## Scripts Overview

### Dataset Preparation

- **`C4_dataset_download.py`**: Downloads and prepares the C4 dataset for watermark evaluation
- **`CNN_dataset_download.py`**: Downloads and prepares the CNN/DailyMail dataset for evaluation

### Model Training & Inference

- **`Finetune_sum.py`**: Fine-tunes language models for watermark evaluation
- **`Inference_sum.py`**: Performs inference with watermarked models to generate test data

### Evaluation Metrics

- **`BERT_score.py`**: Computes BERT scores for text quality evaluation
- **`Entity_similarity_score.py`**: Calculates entity similarity scores for watermark detection
- **`Attack_dipper.py`**: Implements watermark removal attacks for robustness testing

## Usage Instructions

1. **Environment Setup**: Ensure you have the required dependencies installed (transformers, datasets, etc.)

2. **Dataset Preparation**: Run the dataset download scripts first

   ```bash
   python C4_dataset_download.py
   python CNN_dataset_download.py
   ```

3. **Model Training**: Fine-tune your models

   ```bash
   python Finetune_sum.py
   ```

4. **Inference**: Generate watermarked text

   ```bash
   python Inference_sum.py
   ```

5. **Evaluation**: Run the evaluation metrics

   ```bash
   python BERT_score.py
   python Entity_similarity_score.py
   python Attack_dipper.py
   ```

## Requirements

- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script

## Notes

- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences

For detailed instructions on each metric evaluation, refer to the main guidelines in the leaderboard application.
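
The usage steps above can also be chained from a single driver so that a failed step stops the run before later scripts consume missing outputs. The following is a minimal sketch and not part of this folder: it assumes the scripts take no command-line arguments (as in the commands above) and live in the current directory. The names `build_commands` and `run_pipeline`, and the `--run` flag, are hypothetical and introduced only for illustration.

```python
import subprocess
import sys
from pathlib import Path

# Script order follows the usage steps in this README.
PIPELINE = [
    "C4_dataset_download.py",
    "CNN_dataset_download.py",
    "Finetune_sum.py",
    "Inference_sum.py",
    "BERT_score.py",
    "Entity_similarity_score.py",
    "Attack_dipper.py",
]


def build_commands(scripts=PIPELINE, folder=".", python=sys.executable):
    """Return one argv list per script, in execution order."""
    return [[python, str(Path(folder) / name)] for name in scripts]


def run_pipeline(commands, dry_run=True):
    """Run each command in turn and stop at the first failure.

    With dry_run=True the commands are only printed, never executed.
    Returns the list of commands that were processed.
    """
    done = []
    for cmd in commands:
        print("step:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run after a failed one.
            subprocess.run(cmd, check=True)
        done.append(cmd)
    return done


if __name__ == "__main__":
    # Hypothetical flag: without --run, the plan is printed but not executed.
    run_pipeline(build_commands(), dry_run="--run" not in sys.argv[1:])
```

Defaulting to a dry run is a deliberate choice here, since fine-tuning and evaluation can be expensive; printing the plan first makes it easy to confirm the order before committing compute.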