---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
---
# AirRep-Flan
This repository contains the AirRep model presented in [Enhancing Training Data Attribution with Representational Optimization](https://huggingface.co/papers/2505.18513).
AirRep is an embedding model designed for computing training data influence on test examples.
Code: https://github.com/sunnweiwei/airrep
## Model Description
This model uses the gte-small configuration with an additional projection layer.
## Sample Usage
You can use the FLAN-trained model to encode training and test data and compute similarity scores.
```python
from airrep import AirRep
model = AirRep.from_pretrained("sunweiwei/AirRep-Flan-Small")
# Each example is a single string: the question, a newline, then the answer
train_texts = [
    "Question: Classify the sentiment of 'The movie was wonderful and heartwarming.'\nAnswer: positive",
    "Question: Does the hypothesis entail the premise? Premise: 'A man is playing a guitar on stage.' Hypothesis: 'Someone is performing music.'\nAnswer: entailment",
]
query_texts = [
    "Question: Classify the sentiment of 'The service was awful and I won't return.'\nAnswer: negative"
]
# Embeddings and influence-like similarity score
train_emb = model.encode(train_texts, batch_size=128)
query_emb = model.encode(query_texts)
score = model.similarity(query_emb, train_emb, softmax=True)
print("Similarity score:", score)
```
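To make the idea of an influence-like similarity score more concrete, here is a minimal, library-free sketch of ranking training examples against one query. It is not the `airrep` implementation: the helper names (`dot`, `softmax`, `rank_training_examples`) and the toy 3-dimensional vectors are hypothetical, and it assumes a plain inner product followed by a softmax over the training examples, which may differ from what `model.similarity(..., softmax=True)` actually computes.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rank_training_examples(query_vec, train_vecs, top_k=2):
    """Return (index, raw score, softmax weight) for the top_k training vectors."""
    scores = [dot(query_vec, t) for t in train_vecs]
    weights = softmax(scores)  # normalized across all training examples
    order = sorted(range(len(train_vecs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i], weights[i]) for i in order[:top_k]]


# Toy vectors standing in for real embeddings (assumption for illustration only).
toy_train = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.0, 0.7]]
toy_query = [1.0, 0.0, 0.0]

for idx, score, weight in rank_training_examples(toy_query, toy_train):
    print(f"train[{idx}] score={score:.3f} weight={weight:.3f}")
```

With real data, the toy vectors would be replaced by rows of `train_emb` and `query_emb`. Whether those are returned as lists, NumPy arrays, or tensors is not stated here, so a conversion step may be needed.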
## Training Data
This model was trained on the FLAN dataset with data influence optimization.
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{Sun2025AirRep,
  title     = {Enhancing Training Data Attribution with Representational Optimization},
  author    = {Weiwei Sun and Haokun Liu and Nikhil Kandpal and Colin Raffel and Yiming Yang},
  booktitle = {NeurIPS},
  year      = {2025},
  url       = {https://arxiv.org/abs/2505.18513}
}
```