---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
---

# AirRep-Flan

This repository contains the AirRep model presented in [Enhancing Training Data Attribution with Representational Optimization](https://huggingface.co/papers/2505.18513).

AirRep is an embedding model for training data attribution, designed to estimate how strongly individual training examples influence a model's predictions on test examples.

Code: https://github.com/sunnweiwei/airrep

## Model Description

This model uses the gte-small configuration with an additional projection layer.
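To give a rough picture of what "an encoder plus a projection layer" means, the plain-Python sketch below pools token vectors and then applies one linear layer. It assumes mean pooling over the gte-small token outputs (hidden size 384); the pooling strategy, the 128-dimensional output (`PROJ_DIM`), and the random weights are illustrative assumptions, not values read from the AirRep checkpoint, and `mean_pool`, `project`, and `embed` are hypothetical helper names.

```python
import random

# Assumed sizes: gte-small emits 384-dim token vectors; PROJ_DIM is a
# hypothetical output size chosen for illustration only.
HIDDEN_DIM = 384
PROJ_DIM = 128


def mean_pool(hidden_states, attention_mask):
    """Average the token vectors, skipping padding positions (mask == 0)."""
    dim = len(hidden_states[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            for j in range(dim):
                summed[j] += vec[j]
            count += 1
    return [s / max(count, 1) for s in summed]


def project(pooled, weight, bias):
    """Linear layer: out[i] = sum_j weight[i][j] * pooled[j] + bias[i]."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weight, bias)]


def embed(hidden_states, attention_mask, weight, bias):
    """Pool the encoder outputs, then apply the extra projection layer."""
    return project(mean_pool(hidden_states, attention_mask), weight, bias)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in for encoder outputs: 5 tokens, the last 2 are padding.
    tokens = [[rng.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(5)]
    mask = [1, 1, 1, 0, 0]
    # Randomly initialised projection (a trained model would load its weights).
    weight = [[rng.gauss(0, 0.05) for _ in range(HIDDEN_DIM)] for _ in range(PROJ_DIM)]
    bias = [0.0] * PROJ_DIM
    vec = embed(tokens, mask, weight, bias)
    print(len(vec))  # 128
```

In the real model the token vectors come from the gte-small transformer and the projection weights are learned during training; only the overall shape of the computation is meant to carry over.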

## Sample Usage

You can use the FLAN-trained model to encode training and test examples and compute similarity scores between them.

```python
from airrep import AirRep

model = AirRep.from_pretrained("sunweiwei/AirRep-Flan-Small")

# Training examples to attribute against (FLAN-style question/answer pairs)
train_texts = [
    "Question: Classify the sentiment of 'The movie was wonderful and heartwarming.'\nAnswer: positive",
    "Question: Does the premise entail the hypothesis? Premise: 'A man is playing a guitar on stage.' Hypothesis: 'Someone is performing music.'\nAnswer: entailment",
]
# Test (query) examples whose predictions we want to attribute
query_texts = [
    "Question: Classify the sentiment of 'The service was awful and I won't return.'\nAnswer: negative",
]

# Encode both sets, then compute query-to-training similarity scores
train_emb = model.encode(train_texts, batch_size=128)
query_emb = model.encode(query_texts)
score = model.similarity(query_emb, train_emb, softmax=True)
print("Similarity score:", score)
```
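The example above delegates scoring to `model.similarity`. As a mental model only, the sketch below assumes that scores are cosine similarities between each query embedding and every training embedding, optionally normalized with a softmax over the training set so that each query's scores sum to 1. This is an assumption about what `softmax=True` does, not AirRep's implementation (it may, for example, use dot products or a temperature), and `attribution_scores` and `top_k` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attribution_scores(query_embs, train_embs, use_softmax=True):
    """One row per query, one score per training example."""
    rows = []
    for q in query_embs:
        sims = [cosine(q, t) for t in train_embs]
        rows.append(softmax(sims) if use_softmax else sims)
    return rows


def top_k(row, k):
    """Indices of the k highest-scoring training examples, best first."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]


if __name__ == "__main__":
    train_embs = [[0.9, 0.1], [0.1, 0.9]]  # toy 2-d "embeddings"
    query_embs = [[0.8, 0.2]]
    scores = attribution_scores(query_embs, train_embs)
    print(scores)
    print(top_k(scores[0], k=1))  # [0]
```

Because softmax is monotonic, it does not change which training examples rank highest for a query; it only rescales the scores into a distribution over the training set, which can make them easier to compare across queries.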

## Training Data

This model was trained on the FLAN dataset, using the representational optimization approach described in the paper to tune its embeddings for training data attribution.

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{Sun2025AirRep,
  title     = {Enhancing Training Data Attribution with Representational Optimization},
  author    = {Weiwei Sun and Haokun Liu and Nikhil Kandpal and Colin Raffel and Yiming Yang},
  booktitle = {NeurIPS},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.18513}
}
```