sunweiwei committed on
Commit
f3add87
·
verified ·
1 Parent(s): 411a334

Update README.md

Files changed (1)
  1. README.md +10 -47
README.md CHANGED
@@ -1,72 +1,35 @@
  # AirRep-Flan
 
- AirRep is an attribution-friendly embedding model designed for computing training data influence on test examples.
+ AirRep is an embedding model designed for computing training data influence on test examples.
 
  ## Model Description
 
- This model is based on BERT architecture (gte-small config) with an additional projection layer. It's trained to produce embeddings that can be used for:
- - Text encoding
- - Computing similarity scores between test and training examples
- - Identifying influential training examples for test predictions
+ This model is based on gte-small config with an additional projection layer
 
- ## Model Details
-
- - **Base Architecture**: BERT (thenlper/gte-small config)
- - **Hidden Size**: 384
- - **Number of Layers**: 12
- - **Attention Heads**: 12
- - **Max Sequence Length**: 512
- - **Vocabulary Size**: 30522
 
  ## Usage
 
- ```python
- from airrep import AirRep
-
- # Load model
- model = AirRep.from_pretrained("sunweiwei/AirRep-Flan-Small")
-
- # Encode texts
- texts = ["Question: What is AI?\nAnswer: Artificial Intelligence..."]
- embeddings = model.encode(texts, batch_size=128, show_progress_bar=True)
-
- # Compute similarity scores
- test_embed = model.encode(test_texts)
- train_embed = model.encode(train_texts)
- scores = model.similarity(test_embed, train_embed, softmax=True)
- ```
-
- ## Installation
-
- ```bash
- pip install airrep
- ```
+ https://github.com/sunnweiwei/AirRep
 
- Or install from source:
-
- ```bash
- git clone https://github.com/sunnweiwei/AirRep
- cd AirRep
- pip install -e .
- ```
 
  ## Training Data
 
  This model was trained on the FLAN dataset with data influence optimization.
 
- ## Evaluation
 
- - **Flan LDS Spearman Correlation**: 0.21
 
  ## Citation
 
  If you use this model, please cite:
 
  ```bibtex
- @article{airrep2024,
- title={AirRep: Attribution-friendly Representation Learning},
- author={Sun, Weiwei},
- year={2024}
+ @inproceedings{Sun2025AirRep,
+ title= {Enhancing Training Data Attribution with Representational Optimization},
+ author = {Weiwei Sun and Haokun Liu and Nikhil Kandpal and Colin Raffel and Yiming Yang},
+ year = {2025},
+ booktitle={NeurIPS},
+ year={2025},
+ url={https://arxiv.org/abs/2505.18513}
  }
  ```
 
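
For readers comparing the two versions, the similarity step from the removed usage snippet can be sketched in plain NumPy. This is a minimal sketch under assumptions, not the `airrep` implementation: the function `similarity` below is a hypothetical stand-in for `model.similarity`, it assumes scores are dot products between test and training embeddings, and it assumes `softmax=True` normalizes each test example's scores over the training examples. The random arrays stand in for `model.encode` outputs, and the dimension 384 is only a placeholder borrowed from the hidden size listed in the removed Model Details.

```python
import numpy as np


def similarity(test_embed, train_embed, softmax=True):
    """Hypothetical stand-in for `model.similarity` (assumed behavior, not the library's code)."""
    # Raw scores: one row per test example, one column per training example.
    scores = test_embed @ train_embed.T
    if softmax:
        # Numerically stable softmax over the training axis (assumption).
        scores = scores - scores.max(axis=1, keepdims=True)
        scores = np.exp(scores)
        scores = scores / scores.sum(axis=1, keepdims=True)
    return scores


# Random placeholders for encoded texts; real embeddings would come from the model.
rng = np.random.default_rng(0)
test_embed = rng.normal(size=(2, 384))
train_embed = rng.normal(size=(5, 384))

scores = similarity(test_embed, train_embed)

# Indices of the three highest-scoring training examples per test example.
top = np.argsort(-scores, axis=1)[:, :3]
print(scores.shape, top)
```

Each row of `scores` ranks the training examples for one test example. The actual library may scale or aggregate scores differently, so the repository linked in the new Usage section remains the reference for real usage.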