# Reproducibility Codes
This folder contains the Python scripts needed to reproduce the watermark performance results shown in the leaderboard.
## Scripts Overview
### Dataset Preparation
- **`C4_dataset_download.py`**: Downloads and prepares the C4 dataset for watermark evaluation
- **`CNN_dataset_download.py`**: Downloads and prepares the CNN/DailyMail dataset for evaluation
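
Both download scripts build on the Hugging Face `datasets` library. The exact subsets, splits, and output paths are set inside each script; the sketch below only illustrates the general pattern (the `realnewslike` C4 subset and the local output filename are assumptions, not values taken from the scripts).

```python
# Minimal sketch of fetching the two datasets with the `datasets` library;
# the actual scripts may use different subsets, splits, and output paths.
from datasets import load_dataset

# C4 (the "realnewslike" subset is assumed here for illustration)
c4 = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)

# CNN/DailyMail for the summarization task
cnn = load_dataset("cnn_dailymail", "3.0.0", split="test")

# Save a small slice locally for the later steps (illustrative only)
cnn.select(range(1000)).to_json("cnn_dailymail_test_subset.json")
```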
### Model Training & Inference
- **`Finetune_sum.py`**: Fine-tunes language models for watermark evaluation
- **`Inference_sum.py`**: Performs inference with watermarked models to generate test data
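
For orientation, the sketch below shows a plain generation loop of the kind the inference step builds on. It is not the script itself: the checkpoint path, prompt format, and decoding parameters are placeholders, and the watermarking scheme under evaluation would be plugged into `generate` (for example through a `logits_processor`).

```python
# Illustrative generation step; the real Inference_sum.py additionally applies
# the watermarking scheme being evaluated. All names below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/finetuned-checkpoint"  # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

article = "..."  # one CNN/DailyMail article
prompt = f"Summarize the following article:\n{article}\nSummary:"  # assumed prompt format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.95,
    # logits_processor=[watermark_processor],  # plug the watermark in here
)
summary = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(summary)
```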
### Evaluation Metrics
- **`BERT_score.py`**: Computes BERTScore for text quality evaluation
- **`Entity_similarity_score.py`**: Calculates entity similarity scores for watermark detection
- **`Attack_dipper.py`**: Implements watermark removal attacks for robustness testing
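
As a reference point for the quality metric, the snippet below shows how BERTScore is typically computed with the `bert-score` package; `BERT_score.py` will likely read its candidate and reference summaries from the files produced in the inference step rather than from in-memory lists, so treat this only as a sketch.

```python
# Minimal sketch of BERTScore between generated and reference summaries
# using the `bert-score` package (inputs here are illustrative).
from bert_score import score

candidates = ["the generated (possibly watermarked) summary ..."]
references = ["the reference summary ..."]

P, R, F1 = score(candidates, references, lang="en", verbose=True)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```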
## Usage Instructions
1. **Environment Setup**: Ensure you have the required dependencies installed (transformers, datasets, etc.)
2. **Dataset Preparation**: Run the dataset download scripts first
```bash
python C4_dataset_download.py
python CNN_dataset_download.py
```
3. **Model Training**: Fine-tune your models
```bash
python Finetune_sum.py
```
4. **Inference**: Generate watermarked text
```bash
python Inference_sum.py
```
5. **Evaluation**: Run the evaluation metrics
```bash
python BERT_score.py
python Entity_similarity_score.py
python Attack_dipper.py
```
## Requirements
- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script
## Notes
- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences
For detailed instructions on each evaluation metric, refer to the main guidelines in the leaderboard application.