# Reproducibility Codes

This folder contains the Python scripts needed to reproduce the watermark performance results shown in the leaderboard.
## Scripts Overview

### Dataset Preparation

- **`C4_dataset_download.py`**: Downloads and prepares the C4 dataset for watermark evaluation
- **`CNN_dataset_download.py`**: Downloads and prepares the CNN/DailyMail dataset for evaluation (a loading sketch follows below)
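The authoritative preprocessing lives in the scripts themselves. As a rough orientation, the sketch below shows what loading these corpora with the Hugging Face `datasets` library typically looks like; the streaming mode, sample size, and split choices are illustrative assumptions, not the scripts' actual configuration.

```python
# Minimal loading sketch with the Hugging Face `datasets` library.
# NOTE: streaming mode, sample size, and split choices below are
# illustrative assumptions; the download scripts define the real settings.
from datasets import load_dataset

# C4 (English): stream so the full corpus is not downloaded up front.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)
c4_iter = iter(c4)
c4_sample = [next(c4_iter) for _ in range(1000)]  # assumed sample size

# CNN/DailyMail 3.0.0: article/highlights pairs used for summarization.
cnn = load_dataset("cnn_dailymail", "3.0.0", split="validation")
print(cnn[0]["article"][:200])
print(cnn[0]["highlights"])
```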
### Model Training & Inference

- **`Finetune_sum.py`**: Fine-tunes language models for watermark evaluation (a minimal fine-tuning sketch follows this list)
- **`Inference_sum.py`**: Performs inference with watermarked models to generate test data
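The base model, hyperparameters, and preprocessing are defined inside `Finetune_sum.py`. For orientation only, here is a minimal sketch of a standard Transformers summarization fine-tune; the model name (`t5-small`), hyperparameters, and dataset columns are placeholder assumptions.

```python
# Sketch of a standard Transformers summarization fine-tune.
# NOTE: model name, hyperparameters, and column names are assumptions;
# Finetune_sum.py defines the actual configuration.
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-small"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

raw = load_dataset("cnn_dailymail", "3.0.0", split="train[:1%]")

def preprocess(batch):
    # Tokenize articles as inputs and highlights as target labels.
    inputs = tokenizer(batch["article"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["highlights"], max_length=128,
                       truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

train = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="ft_sum", num_train_epochs=1,
                                  per_device_train_batch_size=4),
    train_dataset=train,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```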
### Evaluation Metrics

- **`BERT_score.py`**: Computes BERTScore for text quality evaluation (see the sketch after this list)
- **`Entity_similarity_score.py`**: Calculates entity similarity scores for watermark detection
- **`Attack_dipper.py`**: Implements the DIPPER paraphrase attack for watermark robustness testing
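`BERT_score.py` computes the metric over the generated summaries; the snippet below is a minimal sketch using the `bert-score` package, with the candidate/reference strings as placeholder inputs.

```python
# Sketch of BERTScore computation with the `bert-score` package.
# NOTE: the candidate/reference pairs are illustrative placeholders;
# BERT_score.py defines the actual inputs and settings.
from bert_score import score

candidates = ["the model-generated summary"]           # system outputs
references = ["the human-written reference summary"]   # gold summaries

P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```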
## Usage Instructions

1. **Environment Setup**: Ensure you have the required dependencies installed (transformers, datasets, etc.)
2. **Dataset Preparation**: Run the dataset download scripts first
```bash
python C4_dataset_download.py
python CNN_dataset_download.py
```
3. **Model Training**: Fine-tune your models
```bash
python Finetune_sum.py
```
4. **Inference**: Generate watermarked text (a minimal generation sketch follows the command)
```bash
python Inference_sum.py
```
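The actual generation and watermarking logic lives in `Inference_sum.py`. As a rough sketch, summary generation with a fine-tuned Hugging Face model usually looks like the following; the checkpoint path, decoding settings, and the note on where the watermark hook goes are assumptions.

```python
# Sketch of summary generation with a fine-tuned model.
# NOTE: checkpoint path and decoding settings are placeholders;
# Inference_sum.py applies the actual watermarking scheme.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

ckpt = "ft_sum"  # placeholder: checkpoint from the fine-tuning step
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
model.eval()

article = "Text of the article to summarize ..."
inputs = tokenizer(article, max_length=512, truncation=True,
                   return_tensors="pt")

with torch.no_grad():
    # A watermark is typically injected at this point, e.g. via a
    # custom LogitsProcessor passed to generate(); omitted here.
    out = model.generate(**inputs, max_new_tokens=128, num_beams=4)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```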
5. **Evaluation**: Run the evaluation metrics
```bash
python BERT_score.py
python Entity_similarity_score.py
python Attack_dipper.py
```
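Of these, `Entity_similarity_score.py` defines the actual entity similarity metric; one common formulation compares the named-entity sets of a reference and a generated text. The sketch below assumes spaCy NER and a Jaccard overlap, both of which are illustrative choices rather than the script's confirmed method.

```python
# Sketch of an entity-overlap similarity score using spaCy NER.
# NOTE: Jaccard overlap of lowercased entity strings is an assumption
# made for illustration; Entity_similarity_score.py defines the real metric.
import spacy

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def entity_similarity(reference: str, generated: str) -> float:
    ref_ents = {ent.text.lower() for ent in nlp(reference).ents}
    gen_ents = {ent.text.lower() for ent in nlp(generated).ents}
    if not ref_ents and not gen_ents:
        return 1.0  # no entities on either side: treat as identical
    return len(ref_ents & gen_ents) / len(ref_ents | gen_ents)

print(entity_similarity("Obama visited Paris.", "Barack Obama went to Paris."))
```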
## Requirements

- Python 3.8+
- PyTorch
- Transformers library
- Datasets library
- Other dependencies as specified in each script
## Notes

- Modify the configuration parameters in each script according to your setup
- Ensure you have sufficient computational resources for training and evaluation
- Results may vary based on random seeds and hardware differences (a seed-fixing sketch follows this list)
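To reduce run-to-run variance, it can help to fix all random seeds before training and evaluation. A minimal sketch follows; the scripts' own seed handling, if any, may differ.

```python
# Sketch of seeding for more reproducible runs; the scripts may
# handle seeds differently.
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic kernels trade speed for bit-exact repeatability.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```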
For detailed instructions on each metric evaluation, refer to the main guidelines in the leaderboard application.