Improve language tag

#2
by lbourdois - opened
Files changed (1)
  1. README.md +229 -315
README.md CHANGED
@@ -1,315 +1,229 @@
1
-
2
- ---
3
-
4
- library_name: transformers
5
- license: apache-2.0
6
- language:
7
- - en
8
- - zh
9
- - es
10
- - de
11
- - ar
12
- - ru
13
- - ja
14
- - ko
15
- - hi
16
- - sk
17
- - vi
18
- - tr
19
- - fi
20
- - id
21
- - fa
22
- - 'no'
23
- - th
24
- - sv
25
- - pt
26
- - da
27
- - bn
28
- - te
29
- - ro
30
- - it
31
- - fr
32
- - nl
33
- - sw
34
- - pl
35
- - hu
36
- - cs
37
- - el
38
- - uk
39
- - mr
40
- - ta
41
- - tl
42
- - bg
43
- - lt
44
- - ur
45
- - he
46
- - gu
47
- - kn
48
- - am
49
- - kk
50
- - hr
51
- - uz
52
- - jv
53
- - ca
54
- - az
55
- - ms
56
- - sr
57
- - sl
58
- - yo
59
- - lv
60
- - is
61
- - ha
62
- - ka
63
- - et
64
- - bs
65
- - hy
66
- - ml
67
- - pa
68
- - mt
69
- - km
70
- - sq
71
- - or
72
- - as
73
- - my
74
- - mn
75
- - af
76
- - be
77
- - ga
78
- - mk
79
- - cy
80
- - gl
81
- - ceb
82
- - la
83
- - yi
84
- - lb
85
- - tg
86
- - gd
87
- - ne
88
- - ps
89
- - eu
90
- - ky
91
- - ku
92
- - si
93
- - ht
94
- - eo
95
- - lo
96
- - fy
97
- - sd
98
- - mg
99
- - so
100
- - ckb
101
- - su
102
- - nn
103
- datasets:
104
- - lightblue/reranker_continuous_filt_max7_train
105
- base_model:
106
- - Qwen/Qwen2.5-0.5B-Instruct
107
- pipeline_tag: text-generation
108
- tags:
109
- - reranker
110
-
111
- ---
112
-
113
- [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
114
-
115
-
116
- # QuantFactory/lb-reranker-0.5B-v1.0-GGUF
117
- This is a quantized version of [lightblue/lb-reranker-0.5B-v1.0](https://huggingface.co/lightblue/lb-reranker-0.5B-v1.0) created using llama.cpp.
118
-
119
- # Original Model Card
120
-
121
-
122
- # LB Reranker v1.0
123
-
124
- <div style="width: 100%; height: 160px;
125
- display: flex; align-items: center;
126
- justify-content: center;
127
- border: 8px solid black;
128
- font-size: 120px; font-weight: bold;
129
- text-align: center;
130
- color: #438db8;
131
- font-family: 'Helvetica Neue', sans-serif;">
132
- LBR
133
- </div>
134
-
135
- The LB Reranker has been trained to determine the relatedness of a given query to a piece of text, therefore allowing it to be used as a ranker or reranker in various retrieval-based tasks.
136
-
137
- This model is fine-tuned from a [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model checkpoint and was trained for roughly 5.5 hours using the 8 x L20 instance ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)) on [Alibaba Cloud](https://www.alibabacloud.com/).
138
-
139
- The training data for this model can be found at [lightblue/reranker_continuous_filt_max7_train](https://huggingface.co/datasets/lightblue/reranker_continuous_filt_max7_train) and the code for generating this data as well as running the training of the model can be found on [our Github repo](https://github.com/lightblue-tech/lb-reranker).
140
-
141
- Trained on data in over 95 languages, this model is applicable to a broad range of use cases.
142
-
143
- This model has three main benefits over comparable rerankers.
144
- 1. It has shown slightly higher performance on evaluation benchmarks.
145
- 2. It has been trained on more languages than any previous model.
146
- 3. It is a simple Causal LM model trained to output a string between "1" and "7".
147
-
148
- This last point means that this model can be used natively with many widely available inference packages, including vLLM and LMDeploy.
149
- This in turn allows our reranker to benefit from improvements to inference as and when these packages release them.
150
-
151
- Update: We have also found that this model works pretty well as a code snippet reranker too (P@1 of 96%)! See our [Colab](https://colab.research.google.com/drive/1ABL1xaarekLIlVJKbniYhXgYu6ZNwfBm?usp=sharing) for more details.
152
-
153
- # How to use
154
-
155
- The model was trained to expect an input such as:
156
-
157
- ```
158
- <<<Query>>>
159
- {your_query_here}
160
-
161
- <<<Context>>>
162
- {your_context_here}
163
- ```
164
-
165
- And to output a string of a number between 1-7.
166
-
167
- In order to make a continuous score that can be used for reranking query-context pairs (i.e. a method with few ties), we calculate the expectation value of the scores.
168
-
169
- We include scripts to do this in both vLLM and LMDeploy:
170
-
171
- #### vLLM
172
-
173
- Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.
174
-
175
- ```python
176
- from vllm import LLM, SamplingParams
177
- import numpy as np
178
-
179
- def make_reranker_input(t, q):
180
-     return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"
181
-
182
- def make_reranker_training_datum(context, question):
183
-     system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."
184
-
185
-     return [
186
-         {"role": "system", "content": system_message},
187
-         {"role": "user", "content": make_reranker_input(context, question)},
188
-     ]
189
-
190
- def get_prob(logprob_dict, tok_id):
191
-     return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict.keys() else 0
192
-
193
- llm = LLM("lightblue/lb-reranker-v1.0")
194
- sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
195
- tok = llm.llm_engine.tokenizer.tokenizer
196
- idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]
197
-
198
- query_texts = [
199
-     ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
200
-     ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
201
-     ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
202
- ]
203
-
204
- chats = [make_reranker_training_datum(c, q) for q, c in query_texts]
205
- responses = llm.chat(chats, sampling_params)
206
- probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])
207
-
208
- N = probs.shape[1]
209
- M = probs.shape[0]
210
- idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)
211
-
212
- expected_vals = (probs * idxs).sum(axis=1)
213
- print(expected_vals)
214
- # [6.66570732 1.86686378 1.01102923]
215
- ```
216
-
217
- #### LMDeploy
218
-
219
- Install [LMDeploy](https://github.com/InternLM/lmdeploy) using `pip install lmdeploy`.
220
-
221
- ```python
222
- # Un-comment this if running in a Jupyter notebook, Colab etc.
223
- # import nest_asyncio
224
- # nest_asyncio.apply()
225
-
226
- from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
227
- import numpy as np
228
-
229
- def make_reranker_input(t, q):
230
-     return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"
231
-
232
- def make_reranker_training_datum(context, question):
233
-     system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."
234
-
235
-     return [
236
-         {"role": "system", "content": system_message},
237
-         {"role": "user", "content": make_reranker_input(context, question)},
238
-     ]
239
-
240
- def get_prob(logprob_dict, tok_id):
241
-     return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict.keys() else 0
242
-
243
- pipe = pipeline(
244
-     "lightblue/lb-reranker-v1.0",
245
-     chat_template_config=ChatTemplateConfig(
246
-         model_name='qwen2d5',
247
-         capability='chat'
248
-     )
249
- )
250
- tok = pipe.tokenizer.model
251
- idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]
252
-
253
- query_texts = [
254
-     ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
255
-     ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
256
-     ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
257
- ]
258
-
259
- chats = [make_reranker_training_datum(c, q) for q, c in query_texts]
260
- responses = pipe(
261
-     chats,
262
-     gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
263
- )
264
- probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])
265
-
266
- N = probs.shape[1]
267
- M = probs.shape[0]
268
- idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)
269
-
270
- expected_vals = (probs * idxs).sum(axis=1)
271
- print(expected_vals)
272
- # [6.66415229 1.84342025 1.01133205]
273
- ```
274
-
275
- # Evaluation
276
-
277
- We perform an evaluation on 9 datasets from the [BEIR benchmark](https://github.com/beir-cellar/beir) that none of the evaluated models have been trained upon (to our knowledge).
278
-
279
- * Arguana
280
- * Dbpedia-entity
281
- * Fiqa
282
- * NFcorpus
283
- * Scidocs
284
- * Scifact
285
- * Trec-covid-v2
286
- * Vihealthqa
287
- * Webis-touche2020
288
-
289
- We evaluate on a subset of all queries (the first 250) to save evaluation time.
290
-
291
- We find that our model performs similarly or better than many of the state-of-the-art reranker models in our evaluation, without compromising on inference speed.
292
-
293
- We make our evaluation code and results available [on our Github](https://github.com/lightblue-tech/lb-reranker/blob/main/run_bier.ipynb).
294
-
295
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/xkNzCABFUmU7UmDXUduiz.png)
296
-
297
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/P-XCA3TGHqDSX8k6c4hCE.png)
298
-
299
- As we can see, this reranker attains greater IR evaluation metrics compared to the two benchmarks we include for all positions apart from @1.
300
-
301
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/puhhWseBOcIyOEdW4L-B0.png)
302
-
303
- We also show that our model is, on average, faster than the BGE reranker v2.
304
-
305
- # License
306
-
307
- We share this model under an Apache 2.0 license.
308
-
309
- # Developed by
310
-
311
- <a href="https://www.lightblue-tech.com">
312
- <img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
313
- </a>
314
-
315
- This model was trained by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue
 
1
+ ---
2
+ library_name: transformers
3
+ license: apache-2.0
4
+ language:
5
+ - zho
6
+ - eng
7
+ - fra
8
+ - spa
9
+ - por
10
+ - deu
11
+ - ita
12
+ - rus
13
+ - jpn
14
+ - kor
15
+ - vie
16
+ - tha
17
+ - ara
18
+ datasets:
19
+ - lightblue/reranker_continuous_filt_max7_train
20
+ base_model:
21
+ - Qwen/Qwen2.5-0.5B-Instruct
22
+ pipeline_tag: text-generation
23
+ tags:
24
+ - reranker
25
+ ---
26
+
27
+ [![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)
28
+
29
+
30
+ # QuantFactory/lb-reranker-0.5B-v1.0-GGUF
31
+ This is a quantized version of [lightblue/lb-reranker-0.5B-v1.0](https://huggingface.co/lightblue/lb-reranker-0.5B-v1.0) created using llama.cpp.
32
+
33
+ # Original Model Card
34
+
35
+
36
+ # LB Reranker v1.0
37
+
38
+ <div style="width: 100%; height: 160px;
39
+ display: flex; align-items: center;
40
+ justify-content: center;
41
+ border: 8px solid black;
42
+ font-size: 120px; font-weight: bold;
43
+ text-align: center;
44
+ color: #438db8;
45
+ font-family: 'Helvetica Neue', sans-serif;">
46
+ LBR
47
+ </div>
48
+
49
+ The LB Reranker has been trained to determine the relatedness of a given query to a piece of text, therefore allowing it to be used as a ranker or reranker in various retrieval-based tasks.
50
+
51
+ This model is fine-tuned from a [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) model checkpoint and was trained for roughly 5.5 hours using the 8 x L20 instance ([ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1)) on [Alibaba Cloud](https://www.alibabacloud.com/).
52
+
53
+ The training data for this model can be found at [lightblue/reranker_continuous_filt_max7_train](https://huggingface.co/datasets/lightblue/reranker_continuous_filt_max7_train) and the code for generating this data as well as running the training of the model can be found on [our Github repo](https://github.com/lightblue-tech/lb-reranker).
54
+
55
+ Trained on data in over 95 languages, this model is applicable to a broad range of use cases.
56
+
57
+ This model has three main benefits over comparable rerankers.
58
+ 1. It has shown slightly higher performance on evaluation benchmarks.
59
+ 2. It has been trained on more languages than any previous model.
60
+ 3. It is a simple Causal LM model trained to output a string between "1" and "7".
61
+
62
+ This last point means that this model can be used natively with many widely available inference packages, including vLLM and LMDeploy.
63
+ This in turn allows our reranker to benefit from improvements to inference as and when these packages release them.
64
+
65
+ Update: We have also found that this model works pretty well as a code snippet reranker too (P@1 of 96%)! See our [Colab](https://colab.research.google.com/drive/1ABL1xaarekLIlVJKbniYhXgYu6ZNwfBm?usp=sharing) for more details.
66
+
67
+ # How to use
68
+
69
+ The model was trained to expect an input such as:
70
+
71
+ ```
72
+ <<<Query>>>
73
+ {your_query_here}
74
+
75
+ <<<Context>>>
76
+ {your_context_here}
77
+ ```
78
+
79
+ And to output a string of a number between 1-7.
80
+
81
+ In order to make a continuous score that can be used for reranking query-context pairs (i.e. a method with few ties), we calculate the expectation value of the scores.
82
+
83
+ We include scripts to do this in both vLLM and LMDeploy:
84
+
85
+ #### vLLM
86
+
87
+ Install [vLLM](https://github.com/vllm-project/vllm/) using `pip install vllm`.
88
+
89
+ ```python
90
+ from vllm import LLM, SamplingParams
91
+ import numpy as np
92
+
93
+ def make_reranker_input(t, q):
94
+     return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"
95
+
96
+ def make_reranker_training_datum(context, question):
97
+     system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."
98
+
99
+     return [
100
+         {"role": "system", "content": system_message},
101
+         {"role": "user", "content": make_reranker_input(context, question)},
102
+     ]
103
+
104
+ def get_prob(logprob_dict, tok_id):
105
+     return np.exp(logprob_dict[tok_id].logprob) if tok_id in logprob_dict.keys() else 0
106
+
107
+ llm = LLM("lightblue/lb-reranker-v1.0")
108
+ sampling_params = SamplingParams(temperature=0.0, logprobs=14, max_tokens=1)
109
+ tok = llm.llm_engine.tokenizer.tokenizer
110
+ idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]
111
+
112
+ query_texts = [
113
+     ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
114
+     ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
115
+     ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
116
+ ]
117
+
118
+ chats = [make_reranker_training_datum(c, q) for q, c in query_texts]
119
+ responses = llm.chat(chats, sampling_params)
120
+ probs = np.array([[get_prob(r.outputs[0].logprobs[0], y) for y in idx_tokens] for r in responses])
121
+
122
+ N = probs.shape[1]
123
+ M = probs.shape[0]
124
+ idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)
125
+
126
+ expected_vals = (probs * idxs).sum(axis=1)
127
+ print(expected_vals)
128
+ # [6.66570732 1.86686378 1.01102923]
129
+ ```
130
+
131
+ #### LMDeploy
132
+
133
+ Install [LMDeploy](https://github.com/InternLM/lmdeploy) using `pip install lmdeploy`.
134
+
135
+ ```python
136
+ # Un-comment this if running in a Jupyter notebook, Colab etc.
137
+ # import nest_asyncio
138
+ # nest_asyncio.apply()
139
+
140
+ from lmdeploy import GenerationConfig, ChatTemplateConfig, pipeline
141
+ import numpy as np
142
+
143
+ def make_reranker_input(t, q):
144
+     return f"<<<Query>>>\n{q}\n\n<<<Context>>>\n{t}"
145
+
146
+ def make_reranker_training_datum(context, question):
147
+     system_message = "Given a query and a piece of text, output a score of 1-7 based on how related the query is to the text. 1 means least related and 7 is most related."
148
+
149
+     return [
150
+         {"role": "system", "content": system_message},
151
+         {"role": "user", "content": make_reranker_input(context, question)},
152
+     ]
153
+
154
+ def get_prob(logprob_dict, tok_id):
155
+     return np.exp(logprob_dict[tok_id]) if tok_id in logprob_dict.keys() else 0
156
+
157
+ pipe = pipeline(
158
+     "lightblue/lb-reranker-v1.0",
159
+     chat_template_config=ChatTemplateConfig(
160
+         model_name='qwen2d5',
161
+         capability='chat'
162
+     )
163
+ )
164
+ tok = pipe.tokenizer.model
165
+ idx_tokens = [tok.encode(str(i))[0] for i in range(1, 8)]
166
+
167
+ query_texts = [
168
+     ("What is the scientific name of apples?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
169
+     ("What is the Chinese word for 'apple'?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
170
+     ("What is the square root of 999?", "An apple is a round, edible fruit produced by an apple tree (Malus spp., among them the domestic or orchard apple; Malus domestica)."),
171
+ ]
172
+
173
+ chats = [make_reranker_training_datum(c, q) for q, c in query_texts]
174
+ responses = pipe(
175
+     chats,
176
+     gen_config=GenerationConfig(temperature=1.0, logprobs=14, max_new_tokens=1, do_sample=True)
177
+ )
178
+ probs = np.array([[get_prob(r.logprobs[0], y) for y in idx_tokens] for r in responses])
179
+
180
+ N = probs.shape[1]
181
+ M = probs.shape[0]
182
+ idxs = np.tile(np.arange(1, N + 1), M).reshape(M, N)
183
+
184
+ expected_vals = (probs * idxs).sum(axis=1)
185
+ print(expected_vals)
186
+ # [6.66415229 1.84342025 1.01133205]
187
+ ```
188
+
189
+ # Evaluation
190
+
191
+ We perform an evaluation on 9 datasets from the [BEIR benchmark](https://github.com/beir-cellar/beir) that none of the evaluated models have been trained upon (to our knowledge).
192
+
193
+ * Arguana
194
+ * Dbpedia-entity
195
+ * Fiqa
196
+ * NFcorpus
197
+ * Scidocs
198
+ * Scifact
199
+ * Trec-covid-v2
200
+ * Vihealthqa
201
+ * Webis-touche2020
202
+
203
+ We evaluate on a subset of all queries (the first 250) to save evaluation time.
204
+
205
+ We find that our model performs similarly or better than many of the state-of-the-art reranker models in our evaluation, without compromising on inference speed.
206
+
207
+ We make our evaluation code and results available [on our Github](https://github.com/lightblue-tech/lb-reranker/blob/main/run_bier.ipynb).
208
+
209
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/xkNzCABFUmU7UmDXUduiz.png)
210
+
211
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/P-XCA3TGHqDSX8k6c4hCE.png)
212
+
213
+ As we can see, this reranker attains greater IR evaluation metrics compared to the two benchmarks we include for all positions apart from @1.
214
+
215
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/64b63f8ad57e02621dc93c8b/puhhWseBOcIyOEdW4L-B0.png)
216
+
217
+ We also show that our model is, on average, faster than the BGE reranker v2.
218
+
219
+ # License
220
+
221
+ We share this model under an Apache 2.0 license.
222
+
223
+ # Developed by
224
+
225
+ <a href="https://www.lightblue-tech.com">
226
+ <img src="https://www.lightblue-tech.com/wp-content/uploads/2023/08/color_%E6%A8%AA%E5%9E%8B-1536x469.png" alt="Lightblue technology logo" width="400"/>
227
+ </a>
228
+
229
+ This model was trained by Peter Devine ([ptrdvn](https://huggingface.co/ptrdvn)) for Lightblue
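
For readers of this diff, the expected-score step described under "How to use" (weighting each score from 1 to 7 by the probability of its token) can be sketched without vLLM or LMDeploy. This is a minimal illustration under stated assumptions, not code from the repository: the `expected_score` helper and the example log-probabilities are hypothetical, and the dictionary is keyed by the score strings "1" to "7" rather than by token ID as in the README scripts.

```python
import math

def expected_score(logprobs, low=1, high=7):
    """Probability-weighted sum of the scores low..high.

    `logprobs` maps a score label ("1".."7") to the log-probability of that
    label being the first generated token. Labels that are absent count as
    probability 0, as the README's get_prob helper does. The probabilities
    are not renormalised, which also matches the README scripts.
    """
    total = 0.0
    for score in range(low, high + 1):
        label = str(score)
        if label in logprobs:
            total += score * math.exp(logprobs[label])
    return total

# Hypothetical first-token log-probabilities for one query-context pair.
example = {"7": math.log(0.7), "6": math.log(0.2), "5": math.log(0.1)}
print(round(expected_score(example), 4))
```

With these made-up values the result is 7 × 0.7 + 6 × 0.2 + 5 × 0.1 = 6.6. Because the score is a weighted sum rather than the single most likely label, two pairs that would both decode to "7" can still be ordered, which is why the README describes the method as producing few ties.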