---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
---

# AirRep-Flan

This repository contains the AirRep model presented in [Enhancing Training Data Attribution with Representational Optimization](https://huggingface.co/papers/2505.18513).

AirRep is an embedding model designed to estimate the influence of training data on test examples (training data attribution).

Code: https://github.com/sunnweiwei/airrep

## Model Description

This model uses the gte-small configuration with an additional projection layer.
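
As a rough illustration of "encoder plus projection layer", the sketch below mean-pools an encoder's token vectors (ignoring padding) and then applies a linear projection to get the final embedding. Mean pooling, the toy dimensions and weights, and the helper names `mean_pool` and `project` are all assumptions made for illustration; the actual pooling, hidden size, and projection size are defined by the model config and the AirRep code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(token_states: Matrix, attention_mask: List[int]) -> Vector:
    """Average the hidden states of non-padding tokens (mask == 1). Assumed pooling."""
    dim = len(token_states[0])
    total = [0.0] * dim
    count = 0
    for state, keep in zip(token_states, attention_mask):
        if keep:
            for i, value in enumerate(state):
                total[i] += value
            count += 1
    return [value / count for value in total]


def project(vector: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Hypothetical linear projection: output[j] = sum_i weight[j][i] * vector[i] + bias[j]."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


# Toy example: 3 tokens (the last one is padding), hidden size 4, output size 2.
token_states = [
    [1.0, 2.0, 0.0, 1.0],
    [3.0, 0.0, 2.0, 1.0],
    [9.0, 9.0, 9.0, 9.0],  # padding token, excluded by the mask
]
attention_mask = [1, 1, 0]
weight = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
]
bias = [0.0, 0.5]

pooled = mean_pool(token_states, attention_mask)
embedding = project(pooled, weight, bias)
print(pooled)     # [2.0, 1.0, 1.0, 1.0]
print(embedding)  # [2.0, 2.5]
```

The projection maps the encoder's pooled vector into the space where attribution scores are computed, so its output size need not match the encoder's hidden size.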

## Sample Usage

You can use the FLAN-trained model to encode training and test data and compute similarity scores.

```python
from airrep import AirRep

# Load the FLAN-trained AirRep checkpoint
model = AirRep.from_pretrained("sunweiwei/AirRep-Flan-Small")

# Training examples in FLAN-style question/answer format
train_texts = [
    "Question: Classify the sentiment of 'The movie was wonderful and heartwarming.'\nAnswer: positive",
    "Question: Does the premise entail the hypothesis? Premise: 'A man is playing a guitar on stage.' Hypothesis: 'Someone is performing music.'\nAnswer: entailment",
]

# Test examples to attribute to the training examples
query_texts = [
    "Question: Classify the sentiment of 'The service was awful and I won't return.'\nAnswer: negative",
]

# Embed both sets, then score each query against each training example
train_emb = model.encode(train_texts, batch_size=128)
query_emb = model.encode(query_texts)
score = model.similarity(query_emb, train_emb, softmax=True)
print("Similarity score:", score)
```
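
The exact computation behind `model.similarity(..., softmax=True)` is defined in the AirRep code. As an assumption for illustration only, the sketch below scores every (query, training example) pair with a dot product and then applies a softmax across the training examples, so each query's scores are non-negative, sum to 1, and can be read as relative attributions. The helpers `dot`, `softmax`, and `similarity` and the toy embeddings are hypothetical stand-ins, not the AirRep API.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def similarity(query_emb: List[Vector], train_emb: List[Vector],
               use_softmax: bool = True) -> List[Vector]:
    """Hypothetical scorer: one row per query, one column per training example."""
    rows = []
    for q in query_emb:
        raw = [dot(q, t) for t in train_emb]
        rows.append(softmax(raw) if use_softmax else raw)
    return rows


# Toy embeddings: the query points in the same direction as the first training example.
train_emb = [[1.0, 0.0], [0.0, 1.0]]
query_emb = [[2.0, 0.0]]

scores = similarity(query_emb, train_emb)
# Order training examples from most to least attributed for the first query
ranking = sorted(range(len(train_emb)), key=lambda i: scores[0][i], reverse=True)
print(scores)
print(ranking)  # [0, 1]
```

Sorting a query's row, as `ranking` does, lists the training examples from highest to lowest score under this assumed scoring; without the softmax, the raw dot products give the same order but are not normalized per query.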

## Training Data

This model was trained on the FLAN dataset with data influence optimization.

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{Sun2025AirRep,
  title     = {Enhancing Training Data Attribution with Representational Optimization},
  author    = {Weiwei Sun and Haokun Liu and Nikhil Kandpal and Colin Raffel and Yiming Yang},
  booktitle = {NeurIPS},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.18513}
}
```