Tom Aarsen committed
Commit 09c7453
Parent(s): 845d613
Use an accordion as the <details> didn't work

app.py CHANGED
@@ -179,9 +179,12 @@ with gr.Blocks(title="Quantized Retrieval") as demo:
     <h1 style='margin-top: 0;'>Quantized Retrieval - Binary Search with Scalar (int8) Rescoring</h1>
 
     This demo showcases retrieval using [quantized embeddings](https://huggingface.co/blog/embedding-quantization) on a CPU. The corpus consists of [41 million texts](https://huggingface.co/datasets/sentence-transformers/quantized-retrieval-data) from Wikipedia articles.
-
-
-
+    </div>
+    """
+    )
+    with gr.Accordion("Click to learn about the retrieval process", open=False):
+        gr.Markdown(
+            """
 Details:
 1. The query is embedded using the [`mixedbread-ai/mxbai-embed-large-v1`](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) SentenceTransformer model.
 2. The query is quantized to binary using the `quantize_embeddings` function from the SentenceTransformers library.
@@ -191,8 +194,8 @@ Details:
 6. The top 20 documents are sorted by score.
 7. The titles and texts of the top 20 documents are loaded on the fly from disk and displayed.
 
-This process is designed to be memory efficient and fast, with the binary index being small enough to fit in memory and the int8 index being loaded as a view to save memory.
-In total, this process requires keeping 1) the model in memory, 2) the binary index in memory, and 3) the int8 index on disk. With a dimensionality of 1024,
+This process is designed to be memory efficient and fast, with the binary index being small enough to fit in memory and the int8 index being loaded as a view to save memory.
+In total, this process requires keeping 1) the model in memory, 2) the binary index in memory, and 3) the int8 index on disk. With a dimensionality of 1024,
 we need `1024 / 8 * num_docs` bytes for the binary index and `1024 * num_docs` bytes for the int8 index.
 
 This is notably cheaper than doing the same process with float32 embeddings, which would require `4 * 1024 * num_docs` bytes of memory/disk space for the float32 index, i.e. 32x as much memory and 4x as much disk space.
@@ -202,10 +205,6 @@ Feel free to check out the [code for this demo](https://huggingface.co/spaces/se
 
 Notes:
 - The approximate search index (a binary Inverted File Index (IVF)) is in beta and has not been trained with a lot of data.
-
-</details>
-
-</div>
 """
 )
 query = gr.Textbox(
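The retrieval pipeline described in the accordion text (binary search followed by int8 rescoring) can be sketched with plain NumPy on toy data. This is a minimal illustration, not the demo's implementation: the array names, the random embeddings, and the 40-candidate rescore pool are assumptions, and the real demo uses the mxbai model and trained search indexes rather than brute force.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_docs = 1024, 1000

# Toy float32 embeddings standing in for real model outputs.
doc_emb = rng.standard_normal((num_docs, dim)).astype(np.float32)
query_emb = rng.standard_normal(dim).astype(np.float32)

# Binary quantization: sign of each dimension, packed 8 dims per byte.
doc_bin = np.packbits(doc_emb > 0, axis=1)   # shape (num_docs, dim // 8)
query_bin = np.packbits(query_emb > 0)       # shape (dim // 8,)

# int8 quantization (simplified: one global scale; real calibration
# typically uses per-dimension ranges over a corpus sample).
scale = 127 / np.abs(doc_emb).max()
doc_int8 = np.clip(doc_emb * scale, -128, 127).astype(np.int8)

# 1) Coarse search: Hamming distance on the packed binary codes.
hamming = np.unpackbits(doc_bin ^ query_bin, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:40]        # oversample before rescoring

# 2) Rescore the candidates with the int8 index and the float32 query,
#    then keep the 20 best, as in steps 6 and 7 above.
scores = doc_int8[candidates].astype(np.float32) @ query_emb
top20 = candidates[np.argsort(-scores)[:20]]
```

Note how the binary codes take `dim // 8` bytes per document while the int8 index takes `dim` bytes, which is exactly the sizing used in the memory discussion above.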
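The size claims in the diff's description (`1024 / 8 * num_docs` bytes binary, `1024 * num_docs` bytes int8, 32x/4x versus float32) can be checked directly; the 41 million document count comes from the corpus mentioned above, and the GB figures below are derived, not quoted from the demo.

```python
# Index sizes for dim=1024 embeddings over the 41M-document corpus.
num_docs = 41_000_000
dim = 1024

binary_bytes = dim // 8 * num_docs   # 1 bit per dimension, kept in memory
int8_bytes = dim * num_docs          # 1 byte per dimension, kept on disk
float32_bytes = 4 * dim * num_docs   # 4 bytes per dimension

assert float32_bytes == 32 * binary_bytes  # 32x as much memory as binary
assert float32_bytes == 4 * int8_bytes     # 4x as much disk as int8

print(binary_bytes / 1e9)   # ~5.2 GB binary index
print(int8_bytes / 1e9)     # ~42 GB int8 index
```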