# Reproducibility Codes

This folder contains the Python scripts needed to reproduce the watermark performance results shown in the leaderboard.
## Scripts Overview

### Dataset Preparation

- **`C4_dataset_download.py`**: Downloads and prepares the C4 dataset for watermark evaluation
- **`CNN_dataset_download.py`**: Downloads and prepares the CNN/DailyMail dataset for evaluation (a loading sketch follows below)
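The authoritative preprocessing lives in the scripts themselves. As a rough orientation, the sketch below shows what loading these corpora with the Hugging Face `datasets` library typically looks like; the streaming mode, sample size, and split choices are illustrative assumptions, not the scripts' actual configuration.

```python
# Minimal loading sketch with the Hugging Face `datasets` library.
# NOTE: streaming mode, sample size, and split choices below are
# illustrative assumptions; the download scripts define the real settings.
from datasets import load_dataset

# C4 (English): stream so the full corpus is not downloaded up front.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
c4_iter = iter(c4)
c4_sample = [next(c4_iter) for _ in range(1000)]  # assumed sample size

# CNN/DailyMail 3.0.0: article/highlights pairs used for summarization.
cnn = load_dataset("cnn_dailymail", "3.0.0", split="validation")
print(cnn[0]["article"][:200])
print(cnn[0]["highlights"])
```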
### Model Training & Inference

- **`Finetune_sum.py`**: Fine-tunes language models for watermark evaluation (a minimal fine-tuning sketch follows this list)
- **`Inference_sum.py`**: Performs inference with watermarked models to generate test data
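The base model, hyperparameters, and preprocessing are defined inside `Finetune_sum.py`. For orientation only, here is a minimal sketch of a standard Transformers summarization fine-tune; the model name (`t5-small`), hyperparameters, and dataset columns are placeholder assumptions.

```python
# Sketch of a standard Transformers summarization fine-tune.
# NOTE: model name, hyperparameters, and column names are assumptions;
# Finetune_sum.py defines the actual configuration.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-small"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

raw = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

def preprocess(batch):
    # Tokenize articles as inputs and highlights as target labels.
    inputs = tokenizer(batch["article"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128,
                       truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

train = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="ft_sum", num_train_epochs=1,
                                  per_device_train_batch_size=4),
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```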
### Evaluation Metrics

- **`BERT_score.py`**: Computes BERTScore for text quality evaluation (see the sketch after this list)
- **`Entity_similarity_score.py`**: Calculates entity similarity scores for watermark detection
- **`Attack_dipper.py`**: Implements the DIPPER paraphrase attack for watermark robustness testing
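`BERT_score.py` computes the metric over the generated summaries; the snippet below is a minimal sketch using the `bert-score` package, with the candidate/reference strings as placeholder inputs.

```python
# Sketch of BERTScore computation with the `bert-score` package.
# NOTE: the candidate/reference pairs are illustrative placeholders;
# BERT_score.py defines the actual inputs and settings.
from bert_score import score

candidates = ["the model-generated summary"]           # system outputs
references = ["the human-written reference summary"]   # gold summaries

P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```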
## Usage Instructions

1. **Environment Setup**: Ensure you have the required dependencies installed (transformers, datasets, etc.)
2. **Dataset Preparation**: Run the dataset download scripts first
```bash
python C4_dataset_download.py
python CNN_dataset_download.py
```
3. **Model Training**: Fine-tune your models
```bash
python Finetune_sum.py
```
4. **Inference**: Generate watermarked text (a minimal generation sketch follows the command)
```bash
python Inference_sum.py
```
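The actual generation and watermarking logic lives in `Inference_sum.py`. As a rough sketch, summary generation with a fine-tuned Hugging Face model usually looks like the following; the checkpoint path, decoding settings, and the note on where the watermark hook goes are assumptions.

```python
# Sketch of summary generation with a fine-tuned model.
# NOTE: checkpoint path and decoding settings are placeholders;
# Inference_sum.py applies the actual watermarking scheme.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

ckpt = "ft_sum"  # placeholder: checkpoint from the fine-tuning step
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
model.eval()

article = "Text of the article to summarize ..."
inputs = tokenizer(article, max_length=512, truncation=True,
                   return_tensors="pt")

with torch.no_grad():
    # A watermark is typically injected at this point, e.g. via a
    # custom LogitsProcessor passed to generate(); omitted here.
    out = model.generate(**inputs, max_new_tokens=128, num_beams=4)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```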
5. **Evaluation**: Run the evaluation metrics
```bash
python BERT_score.py
python Entity_similarity_score.py
python Attack_dipper.py
```
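Of these, `Entity_similarity_score.py` defines the actual entity similarity metric; one common formulation compares the named-entity sets of a reference and a generated text. The sketch below assumes spaCy NER and a Jaccard overlap, both of which are illustrative choices rather than the script's confirmed method.

```python
# Sketch of an entity-overlap similarity score using spaCy NER.
# NOTE: Jaccard overlap of lowercased entity strings is an assumption
# made for illustration; Entity_similarity_score.py defines the real metric.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def entity_similarity(reference: str, generated: str) -> float:
    ref_ents = {ent.text.lower() for ent in nlp(reference).ents}
    gen_ents = {ent.text.lower() for ent in nlp(generated).ents}
    if not ref_ents and not gen_ents:
        return 1.0  # no entities on either side: treat as identical
    return len(ref_ents & gen_ents) / len(ref_ents | gen_ents)

print(entity_similarity("Obama visited Paris.", "Barack Obama went to Paris."))
```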
## Requirements

- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script
## Notes

- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences (a seed-fixing sketch follows this list)
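To reduce run-to-run variance, it can help to fix all random seeds before training and evaluation. A minimal sketch follows; the scripts' own seed handling, if any, may differ.

```python
# Sketch of seeding for more reproducible runs; the scripts may
# handle seeds differently.
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic kernels trade speed for bit-exact repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```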
For detailed instructions on each metric evaluation, refer to the main guidelines in the leaderboard application.