	# TDDBench: A Benchmark for Training Data Detection

	The datasets and target models used by TDDBench are available on Hugging Face, so that training data detection (TDD) algorithms can be evaluated quickly. The current release includes 12 datasets and 60 target models, and we plan to upload more of both in the future.

	To load an evaluation dataset, you can use the following code:
	```python
	# Load dataset
	from datasets import load_dataset
	dataset_name = "student"
	dataset_path = f"TDDBench/{dataset_name}"
	dataset = load_dataset(dataset_path)["train"]
	```

	To load a target model, you can use the following code:
	```python
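	import numpy as np
	from transformers import AutoConfig, AutoModel
	# hfmodel is assumed to be the module shipped with the TDDBench release code, not a PyPI package.
	from hfmodel import MLPConfig, MLPHFModel, WRNConfig, WRNHFModel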

	# Register the MLPConfig and MLPHFModel to automatically load our model architecture.
	AutoConfig.register("mlp", MLPConfig)
	AutoModel.register(MLPConfig, MLPHFModel)

	# Load target model
	dataset_name = "student" # Training dataset name
	model_name = "mlp" # Target model architecture
	# To reduce statistical error, five different target models are trained for
	# each model architecture and training dataset; model_idx selects one of them.
	model_idx = 0
	model_path = f"TDDBench/{model_name}-{dataset_name}-{model_idx}"
	model = AutoModel.from_pretrained(model_path)

	# Load the training data detection labels: 1 marks a sample that is in the
	# model's training data, 0 marks a sample that is not.
	config = AutoConfig.from_pretrained(model_path)
	tdd_label = np.array(config.tdd_label)
	```
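Once `tdd_label` is loaded, a natural next step is to separate the sample indices into members (training data) and non-members. The following is a minimal sketch rather than code from the release: `split_by_membership` is a hypothetical helper, the short `toy_label` list stands in for `config.tdd_label`, and it assumes that the labels are aligned index by index with the rows of the dataset, which should be checked against the demo notebook.

```python
def split_by_membership(tdd_label):
    """Return (member_idx, nonmember_idx) for a sequence of 0/1 labels.

    Hypothetical helper: assumes tdd_label[i] describes the i-th dataset row.
    """
    member_idx = [i for i, y in enumerate(tdd_label) if y == 1]
    nonmember_idx = [i for i, y in enumerate(tdd_label) if y == 0]
    return member_idx, nonmember_idx


# Toy stand-in for config.tdd_label (made-up values, for illustration only).
toy_label = [1, 0, 0, 1, 1, 0]
member_idx, nonmember_idx = split_by_membership(toy_label)
print(member_idx)     # [0, 3, 4]
print(nonmember_idx)  # [1, 2, 5]
```

With a real Hugging Face dataset, the resulting index lists could be passed to `dataset.select(member_idx)` and `dataset.select(nonmember_idx)` to obtain the two subsets, under the same alignment assumption.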

	The [demo.ipynb](https://github.com/zzh9568/TDDBench/blob/main/demo.ipynb) notebook in our [release code](https://github.com/zzh9568/TDDBench) repository gives a straightforward example of downloading a target model and dataset from Hugging Face, and shows how to record the model's output loss on both training and non-training data.
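To illustrate what the recorded losses can be used for, here is a small sketch of a loss-based detection score evaluated with AUROC. It is not the benchmark's own evaluation code: the `auroc` function is a hypothetical pure-Python helper, the loss values and labels are made up, and the scoring rule (lower loss suggests a training sample) is just one simple baseline.

```python
def auroc(scores, labels):
    """Fraction of (member, non-member) pairs in which the member has the
    higher score; ties count as half. Hypothetical helper for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up per-sample losses and membership labels (1 = training data).
toy_losses = [0.05, 0.90, 0.40, 0.10, 0.50, 1.20]
toy_label = [1, 0, 0, 1, 1, 0]

# Lower loss should mean "more likely a member", so negate the loss.
scores = [-loss for loss in toy_losses]
result = auroc(scores, toy_label)
print(result)  # 8 of the 9 member/non-member pairs are ordered correctly
```

An AUROC near 0.5 would mean the losses carry little membership signal, while values near 1.0 mean members and non-members are well separated. The pairwise loop is quadratic, so for full datasets a rank-based implementation would be a better fit.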

	### References
	```bibtex
	@inproceedings{
	zhu2025tddbench,
	title={{TDDB}ench: A Benchmark for Training data detection},
	author={Zhihao Zhu and Yi Yang and Defu Lian},
	booktitle={The Thirteenth International Conference on Learning Representations},
	year={2025},
	url={https://openreview.net/forum?id=hpeyWG1PP6}
	}
	```