Tom Aarsen committed
Commit 09c7453
Parent(s): 845d613
Use an accordion as the <details> didn't work

app.py CHANGED
@@ -179,9 +179,12 @@ with gr.Blocks(title="Quantized Retrieval") as demo:
     <h1 style='margin-top: 0;'>Quantized Retrieval - Binary Search with Scalar (int8) Rescoring</h1>
 
     This demo showcases retrieval using [quantized embeddings](https://huggingface.co/blog/embedding-quantization) on a CPU. The corpus consists of [41 million texts](https://huggingface.co/datasets/sentence-transformers/quantized-retrieval-data) from Wikipedia articles.
-
-
-
+    </div>
+    """
+    )
+    with gr.Accordion("Click to learn about the retrieval process", open=False):
+        gr.Markdown(
+            """
 Details:
 1. The query is embedded using the [`mixedbread-ai/mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) SentenceTransformer model.
 2. The query is quantized to binary using the `quantize_embeddings` function from the SentenceTransformers library.
@@ -191,8 +194,8 @@ Details:
 6. The top 20 documents are sorted by score.
 7. The titles and texts of the top 20 documents are loaded on the fly from disk and displayed.
 
-This process is designed to be memory efficient and fast, with the binary index being small enough to fit in memory and the int8 index being loaded as a view to save memory.
-In total, this process requires keeping 1) the model in memory, 2) the binary index in memory, and 3) the int8 index on disk. With a dimensionality of 1024,
+This process is designed to be memory efficient and fast, with the binary index being small enough to fit in memory and the int8 index being loaded as a view to save memory.
+In total, this process requires keeping 1) the model in memory, 2) the binary index in memory, and 3) the int8 index on disk. With a dimensionality of 1024,
 we need `1024 / 8 * num_docs` bytes for the binary index and `1024 * num_docs` bytes for the int8 index.
 
 This is notably cheaper than doing the same process with float32 embeddings, which would require `4 * 1024 * num_docs` bytes of memory/disk space for the float32 index, i.e. 32x as much memory and 4x as much disk space.
@@ -202,10 +205,6 @@ Feel free to check out the [code for this demo](https://huggingface.co/spaces/se
 
 Notes:
 - The approximate search index (a binary Inverted File Index (IVF)) is in beta and has not been trained with a lot of data.
-
-</details>
-
-</div>
 """
 )
 query = gr.Textbox(
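The retrieval pipeline described in the accordion text (binary search followed by int8 rescoring) can be sketched with plain NumPy on toy data. This is a minimal illustration, not the demo's implementation: the array names, the random embeddings, and the 40-candidate rescore pool are assumptions, and the real demo uses the mxbai model and trained search indexes rather than brute force.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_docs = 1024, 1000

# Toy float32 embeddings standing in for real model outputs.
doc_emb = rng.standard_normal((num_docs, dim)).astype(np.float32)
query_emb = rng.standard_normal(dim).astype(np.float32)

# Binary quantization: sign of each dimension, packed 8 dims per byte.
doc_bin = np.packbits(doc_emb > 0, axis=1)   # shape (num_docs, dim // 8)
query_bin = np.packbits(query_emb > 0)       # shape (dim // 8,)

# int8 quantization (simplified: one global scale; real calibration
# typically uses per-dimension ranges over a corpus sample).
scale = 127 / np.abs(doc_emb).max()
doc_int8 = np.clip(doc_emb * scale, -128, 127).astype(np.int8)

# 1) Coarse search: Hamming distance on the packed binary codes.
hamming = np.unpackbits(doc_bin ^ query_bin, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:40]        # oversample before rescoring

# 2) Rescore the candidates with the int8 index and the float32 query,
#    then keep the 20 best, as in steps 6 and 7 above.
scores = doc_int8[candidates].astype(np.float32) @ query_emb
top20 = candidates[np.argsort(-scores)[:20]]
```

Note how the binary codes take `dim // 8` bytes per document while the int8 index takes `dim` bytes, which is exactly the sizing used in the memory discussion above.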
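The size claims in the diff's description (`1024 / 8 * num_docs` bytes binary, `1024 * num_docs` bytes int8, 32x/4x versus float32) can be checked directly; the 41 million document count comes from the corpus mentioned above, and the GB figures below are derived, not quoted from the demo.

```python
# Index sizes for dim=1024 embeddings over the 41M-document corpus.
num_docs = 41_000_000
dim = 1024

binary_bytes = dim // 8 * num_docs   # 1 bit per dimension, kept in memory
int8_bytes = dim * num_docs          # 1 byte per dimension, kept on disk
float32_bytes = 4 * dim * num_docs   # 4 bytes per dimension

assert float32_bytes == 32 * binary_bytes  # 32x as much memory as binary
assert float32_bytes == 4 * int8_bytes     # 4x as much disk as int8

print(binary_bytes / 1e9)   # ~5.2 GB binary index
print(int8_bytes / 1e9)     # ~42 GB int8 index
```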