# Reproducibility Codes

This folder contains the Python scripts needed to reproduce the watermark performance results shown on the leaderboard.

## Scripts Overview

### Dataset Preparation
- **`C4_dataset_download.py`**: Downloads and prepares the C4 dataset for watermark evaluation (see the sketch after this list)
- **`CNN_dataset_download.py`**: Downloads and prepares the CNN/DailyMail dataset for evaluation
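
The exact arguments are defined in the scripts themselves; as a rough orientation, a C4 download with the Hugging Face `datasets` library might look like the sketch below. The subset and split shown are assumptions, not necessarily what `C4_dataset_download.py` uses.

```python
# Minimal sketch of a C4 download using the Hugging Face `datasets` library.
# The subset, split, and any post-processing are assumptions; check
# C4_dataset_download.py for the exact configuration.
from datasets import load_dataset

# Stream the validation split of the "realnewslike" subset so the full
# corpus is not downloaded up front.
c4 = load_dataset("allenai/c4", "realnewslike", split="validation", streaming=True)

# Peek at one example to confirm the data is flowing.
sample = next(iter(c4))
print(sample["text"][:200])
```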

### Model Training & Inference
- **`Finetune_sum.py`**: Fine-tunes language models for watermark evaluation
- **`Inference_sum.py`**: Performs inference with watermarked models to generate test data (see the sketch below)
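
As a point of reference, plain summarization inference with `transformers` follows the pattern sketched below. The watermarking logic applied by `Inference_sum.py` is not reproduced here, and the checkpoint name is a placeholder; in practice you would point it at the model produced by `Finetune_sum.py`.

```python
# Minimal sketch of summarization inference with `transformers`; the
# watermarking logic in Inference_sum.py is NOT reproduced here.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"  # placeholder; use your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = "(CNN) -- A CNN/DailyMail article would go here ..."
inputs = tokenizer(article, return_tensors="pt", truncation=True, max_length=1024)

# Greedy/beam generation without gradient tracking.
with torch.no_grad():
    summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)

print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```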

### Evaluation Metrics
- **`BERT_score.py`**: Computes BERTScore for text quality evaluation (see the sketch after this list)
- **`Entity_similarity_score.py`**: Calculates entity similarity scores for watermark detection
- **`Attack_dipper.py`**: Implements the DIPPER paraphrasing attack to remove watermarks for robustness testing
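
To illustrate the first of these metrics, BERTScore can be computed with the `bert-score` package as sketched below; `BERT_score.py` may use a different scoring model or batching, so treat this purely as an illustration.

```python
# Minimal sketch of BERTScore evaluation with the `bert-score` package.
# The language setting and example strings are assumptions; BERT_score.py
# may configure the metric differently.
from bert_score import score

candidates = ["The model generated this summary."]    # system outputs
references = ["This summary was written by a human."]  # gold references

# P, R, F1 are torch tensors with one entry per candidate/reference pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```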

## Usage Instructions

1. **Environment Setup**: Ensure you have the required dependencies installed (transformers, datasets, etc.)

2. **Dataset Preparation**: Run the dataset download scripts first:
   ```bash
   python C4_dataset_download.py
   python CNN_dataset_download.py
   ```

3. **Model Training**: Fine-tune your models:
   ```bash
   python Finetune_sum.py
   ```

4. **Inference**: Generate watermarked text:
   ```bash
   python Inference_sum.py
   ```

5. **Evaluation**: Run the evaluation scripts:
   ```bash
   python BERT_score.py
   python Entity_similarity_score.py
   python Attack_dipper.py
   ```

## Requirements

- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script

## Notes

- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences

For detailed instructions on each evaluation metric, refer to the main guidelines in the leaderboard application.