| # examples.py | |
| # Pre-built examples and UI content for the Starbucks reranking demo. | |
| # --------------------------------------------------------------------------- | |
| # EXAMPLES | |
| # Each entry has: | |
| # title – short display name shown in the Gradio example row | |
| # query – the user's search query | |
| # docs – exactly 5 documents (list of strings) | |
| # | |
| # Documents are intentionally mixed-relevance so the ranking differences | |
| # between Starbucks sizes (Demi → Trenta) are easy to observe. | |
| # --------------------------------------------------------------------------- | |
| EXAMPLES = [ | |
| { | |
| "title": "Scientific / AI Search", | |
| "query": "attention mechanism in transformer models", | |
| "docs": [ | |
| # Highly relevant – directly explains the attention mechanism | |
| "The transformer architecture introduced in 'Attention is All You Need' (Vaswani et al., 2017) " | |
| "relies entirely on self-attention to model dependencies between input and output tokens. " | |
| "Scaled dot-product attention computes a weighted sum of values, where the weight assigned to " | |
| "each value is determined by the dot product of the query with the corresponding key, divided " | |
| "by the square root of the key dimension.", | |
| # Relevant – discusses multi-head attention, a core component | |
| "Multi-head attention allows the model to jointly attend to information from different " | |
| "representation subspaces at different positions. Concretely, the queries, keys, and values " | |
| "are linearly projected h times into different subspaces, attention is applied in parallel " | |
| "on each projection, and the outputs are concatenated and projected again.", | |
| # Moderately relevant – about BERT, which uses attention but focuses on pretraining | |
| "BERT (Bidirectional Encoder Representations from Transformers) is a language model that " | |
| "uses a multi-layer bidirectional transformer encoder pretrained on masked language modeling " | |
| "and next sentence prediction. Its stacked attention layers capture rich contextual " | |
| "representations that can be fine-tuned for downstream NLP tasks.", | |
| # Weakly relevant – mentions neural networks but not attention or transformers | |
| "Recurrent neural networks (RNNs) and their variants such as LSTMs and GRUs process " | |
| "sequential data by maintaining a hidden state that is updated at each time step. " | |
| "Although they were the dominant approach for sequence modelling before 2017, they suffer " | |
| "from vanishing gradients and difficulty parallelising computation.", | |
| # Irrelevant – about computer vision, not NLP or attention | |
| "Convolutional neural networks (CNNs) use learnable filters that slide over the input image " | |
| "to extract hierarchical spatial features. ResNet introduced skip connections to allow " | |
| "training of very deep networks, reaching state-of-the-art accuracy on ImageNet classification.", | |
| ], | |
| }, | |
| { | |
| "title": "E-commerce / Product Search", | |
| "query": "noise-cancelling wireless headphones for travel", | |
| "docs": [ | |
| # Highly relevant – exactly matches the query intent | |
| "Sony WH-1000XM5 Wireless Noise-Cancelling Headphones: Industry-leading noise cancellation " | |
| "powered by two processors and eight microphones. Up to 30 hours of battery life with quick " | |
| "charge (3-minute charge = 3 hours playback). Foldable design for easy packing, weighing " | |
| "only 250 g. Multipoint Bluetooth connection lets you pair two devices simultaneously. " | |
| "Perfect for long-haul flights and commuters.", | |
| # Relevant – wireless noise-cancelling headphones, slightly different use-case emphasis | |
| "Bose QuietComfort 45 Bluetooth Wireless Noise Cancelling Headphones: Acclaimed acoustic " | |
| "noise cancellation technology blocks distracting background sound. TriPort acoustic " | |
| "architecture for deep, clear sound. 22-hour battery life and lightweight foldable design " | |
| "that fits into the included carry case. Works wired with a 3.5 mm cable when battery " | |
| "runs out on a plane.", | |
| # Moderately relevant – wireless headphones with weaker adaptive noise cancellation, no travel focus | |
| "JBL Tune 770NC Adaptive Noise Cancelling Wireless Headphones: 70-hour battery life " | |
| "and foldable design. Available in multiple colours. Supports Bluetooth 5.3 and can be " | |
| "paired with the JBL Headphones app for EQ customisation. Note: noise cancellation is " | |
| "adaptive but not as strong as higher-end models.", | |
| # Weakly relevant – wired headphones, not wireless, good sound quality | |
| "Audio-Technica ATH-M50x Professional Studio Monitor Headphones: Critically acclaimed " | |
| "for their detailed sound reproduction. 45 mm large-aperture drivers, 90-degree swivelling " | |
| "earcups for portability. Wired only – comes with detachable straight and coiled cables. " | |
| "Ideal for studio monitoring and mixing but requires a headphone amplifier for best results.", | |
| # Irrelevant – earbuds, in-ear, no noise cancellation, not for travel | |
| "Skullcandy Dime True Wireless Earbuds: Compact and affordable truly wireless earbuds " | |
| "with 3.5 hours of earbud battery life plus 9.5 hours from the charging case. IPX4 water " | |
| "resistance. No active noise cancellation. Suitable for casual listening at the gym or " | |
| "during a walk.", | |
| ], | |
| }, | |
| { | |
| "title": "Medical / Health Query", | |
| "query": "symptoms and treatment of type 2 diabetes", | |
| "docs": [ | |
| # Highly relevant – covers both symptoms and treatment directly | |
| "Type 2 diabetes is a chronic condition in which the body does not use insulin properly " | |
| "(insulin resistance) and the pancreas cannot produce enough insulin to compensate. " | |
| "Common symptoms include increased thirst, frequent urination, fatigue, blurred vision, " | |
| "and slow-healing sores. First-line treatment typically involves lifestyle changes " | |
| "(diet, exercise, weight loss) and metformin. Additional medications such as SGLT-2 " | |
| "inhibitors or GLP-1 receptor agonists may be added as the disease progresses.", | |
| # Relevant – focuses on treatment options in detail | |
| "Pharmacological management of type 2 diabetes has expanded significantly. Beyond " | |
| "metformin, clinicians now consider cardiovascular and renal outcomes when choosing agents. " | |
| "SGLT-2 inhibitors (e.g., empagliflozin, dapagliflozin) reduce glucose reabsorption in " | |
| "the kidney and have demonstrated cardiovascular and renal protective effects. GLP-1 " | |
| "receptor agonists (e.g., semaglutide, liraglutide) promote insulin secretion, suppress " | |
| "glucagon, and aid weight loss. Regular HbA1c monitoring guides treatment escalation.", | |
| # Moderately relevant – about diabetes broadly but focuses on type 1 | |
| "Diabetes mellitus encompasses a group of metabolic diseases characterised by " | |
| "hyperglycaemia. Type 1 diabetes results from autoimmune destruction of pancreatic " | |
| "beta cells, requiring lifelong insulin therapy. Differentiating type 1 from type 2 " | |
| "is clinically important: type 1 usually presents acutely in younger individuals, " | |
| "while type 2 has a more gradual onset often associated with obesity and family history.", | |
| # Weakly relevant – about diet and nutrition generally, not diabetes-specific | |
| "A balanced diet rich in whole grains, vegetables, lean protein, and healthy fats can " | |
| "reduce the risk of many chronic diseases including cardiovascular disease, obesity, " | |
| "and metabolic syndrome. Reducing added sugar and refined carbohydrate intake is widely " | |
| "recommended. Regular physical activity complements dietary changes for long-term " | |
| "health maintenance.", | |
| # Irrelevant – about an unrelated condition | |
| "Rheumatoid arthritis (RA) is an autoimmune disorder characterised by chronic " | |
| "inflammation of the joints, leading to pain, swelling, and eventual joint destruction. " | |
| "Symptoms typically include morning stiffness lasting more than an hour, symmetric " | |
| "joint involvement, and systemic features such as fatigue and low-grade fever. " | |
| "Disease-modifying antirheumatic drugs (DMARDs) such as methotrexate are the cornerstone " | |
| "of treatment.", | |
| ], | |
| }, | |
| { | |
| "title": "Technical / Coding Help", | |
| "query": "how to handle asynchronous errors in Python async/await", | |
| "docs": [ | |
| # Highly relevant – directly addresses the question with code-level guidance | |
| "In Python's asyncio framework, exceptions raised inside a coroutine propagate when the " | |
| "coroutine's result is awaited. To handle them, wrap the `await` call in a try/except " | |
| "block: `try: result = await some_coroutine() except ValueError as e: ...`. For tasks " | |
| "created with `asyncio.create_task()`, unhandled exceptions are stored in the Task object " | |
| "and re-raised when the task is awaited or retrieved via `task.result()`. Use " | |
| "`asyncio.gather(*coros, return_exceptions=True)` to collect all results, including " | |
| "exceptions, instead of propagating the first failure to the caller.", | |
| # Relevant – explains asyncio.gather and exception handling patterns | |
| "When running multiple coroutines concurrently with `asyncio.gather`, the default " | |
| "behaviour is to propagate the first exception to the caller while the remaining awaitables " | |
| "are not cancelled and keep running. Passing " | |
| "`return_exceptions=True` changes this: each exception is returned as a result rather " | |
| "than re-raised. You can then iterate over the results and check `isinstance(result, " | |
| "Exception)` to detect failures. This pattern is particularly useful for batch " | |
| "processing where partial failures are acceptable.", | |
| # Moderately relevant – about async context managers and cleanup, tangential to error handling | |
| "Async context managers (`async with`) and async generators (`async for`) can be used " | |
| "to manage resources in asynchronous code. The `__aenter__` and `__aexit__` methods " | |
| "allow clean setup and teardown even when exceptions occur. Using `contextlib." | |
| "asynccontextmanager` simplifies writing custom async context managers without " | |
| "subclassing. Proper use of async context managers prevents resource leaks when " | |
| "exceptions are thrown mid-coroutine.", | |
| # Weakly relevant – about Python error handling in general, not async-specific | |
| "Python's exception hierarchy is rooted at `BaseException`. Most user-facing exceptions " | |
| "inherit from `Exception`. Custom exceptions should subclass `Exception` or a more " | |
| "specific class. The `try/except/else/finally` construct provides fine-grained control: " | |
| "`else` runs if no exception was raised, `finally` always runs regardless. Raising " | |
| "exceptions with `raise ... from e` preserves the original traceback via exception chaining.", | |
| # Irrelevant – about JavaScript promises, not Python | |
| "In JavaScript, asynchronous operations are managed with Promises. The `.then()` and " | |
| "`.catch()` methods handle fulfilled and rejected promises respectively. With async/await " | |
| "syntax (ES2017), you can write asynchronous code that reads like synchronous code. " | |
| "Since Node.js 15, unhandled promise rejections terminate the process by default. " | |
| "Use `Promise.allSettled()` to wait for all promises regardless of " | |
| "outcome.", | |
| ], | |
| }, | |
| ] | |
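The EXAMPLES list above follows a fixed shape: a title, a query, and exactly five documents. A minimal sketch of how that contract could be checked and turned into rows for a Gradio example component follows, assuming the demo fills a query box and a one-document-per-line text box. `validate_examples`, `to_example_rows`, and the sample data are hypothetical names, not part of the original file.

```python
REQUIRED_KEYS = {"title", "query", "docs"}
DOCS_PER_EXAMPLE = 5  # the header comment promises exactly 5 documents per entry


def validate_examples(examples):
    """Raise ValueError if any entry breaks the documented shape."""
    for i, entry in enumerate(examples):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"example {i} is missing keys: {sorted(missing)}")
        docs = entry["docs"]
        if len(docs) != DOCS_PER_EXAMPLE:
            raise ValueError(
                f"example {i} has {len(docs)} docs, expected {DOCS_PER_EXAMPLE}"
            )
        if not all(isinstance(d, str) and d.strip() for d in docs):
            raise ValueError(f"example {i} contains an empty or non-string document")


def to_example_rows(examples):
    """Return one [query, newline-joined docs] row per example (assumed UI layout)."""
    validate_examples(examples)
    return [[e["query"], "\n".join(e["docs"])] for e in examples]


# Hypothetical stand-in for the real EXAMPLES list.
sample = [{"title": "Toy", "query": "q", "docs": ["a", "b", "c", "d", "e"]}]
rows = to_example_rows(sample)
```

Joining with newlines only works because none of the documents contains a line break; a document with an embedded newline would be split into two entries by a one-per-line text box.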
| # --------------------------------------------------------------------------- | |
| # HOW_TO_USE_CONTENT | |
| # Rendered as Markdown inside the Gradio Accordion. | |
| # --------------------------------------------------------------------------- | |
| HOW_TO_USE_CONTENT = """ | |
| ## What Is Starbucks? | |
| **Starbucks** is a **2D Matryoshka embedding model** for dense retrieval and document reranking. | |
| Like a set of Russian nesting dolls, a single Starbucks model contains many smaller, fully usable | |
| sub-models inside it — one for every combination of *layer depth* and *embedding dimension*. | |
| The key innovation is that you can choose your operating point **at inference time**, with no | |
| retraining: | |
| | Axis | What it controls | | |
| |------|-----------------| | |
| | **Layer depth** (Demi → Trenta) | How many transformer layers are run — directly controls encoding speed and representation richness | | |
| | **Embedding dimension** | How many dimensions of the output vector are used — controls index size and dot-product cost | | |
| The model is trained with **SMAE pretraining** (Starbucks Masked Autoencoding) followed by | |
| **SRL fine-tuning** (Starbucks Representation Learning), which teaches every sub-model to produce | |
| useful embeddings simultaneously. | |
| --- | |
| ## How to Use This Demo | |
| 1. **Enter a query** in the *Query* text box (e.g., `"noise-cancelling headphones for travel"`). | |
| 2. **Enter documents** — one per line — in the *Documents* text box. | |
| 3. **Click Run** (or press Enter). The app will encode the query and all documents using each of the | |
| six Starbucks sizes and display the ranked results side by side. | |
| 4. **Or click an example row** at the bottom of the page to auto-fill the query and documents with | |
| a pre-built example, then click Run to see how the sizes compare. | |
| --- | |
| ## Understanding the Results | |
| Each size panel shows: | |
| - **Ranking** — documents reordered from most to least relevant according to that sub-model. | |
| The number in brackets (e.g., `[Doc 3]`) is the original document index so you can compare | |
| across sizes. | |
| - **Score** — cosine similarity between the query embedding and the document embedding. | |
| Scores range from −1 to 1; higher means more relevant. | |
| - **Encoding Time** — wall-clock time (in seconds) for the model to convert the query and all | |
| documents into embedding vectors. This scales with the number of transformer layers used and | |
| is where most of the latency comes from. | |
| - **Search Time** — time to compute cosine similarities and sort the results. This is typically | |
| very fast (a few milliseconds) because it is just a matrix multiply plus argsort. | |
| **Key insight:** moving from *Demi* to *Trenta*, encoding time grows roughly linearly with the | |
| number of layers, while ranking quality generally improves, most visibly on queries where | |
| subtle semantic differences matter. | |
| --- | |
| ## The Efficiency–Quality Tradeoff | |
| The six Starbucks sizes correspond to six checkpoints along the layer axis of the underlying | |
| BERT-base encoder (12 layers total, 768-dimensional hidden states): | |
| | Size | Layers used | Relative speed | Expected quality | | |
| |------|-------------|---------------|-----------------| | |
| | Demi | 2 | ★★★★★★ (fastest) | ★★ | | |
| | Short | 4 | ★★★★★ | ★★★ | | |
| | Tall | 6 | ★★★★ | ★★★★ | | |
| | Grande | 8 | ★★★ | ★★★★ | | |
| | Venti | 10 | ★★ | ★★★★★ | | |
| | Trenta | 12 | ★ (full model) | ★★★★★★ | | |
| Embedding dimension can be further reduced (e.g., from 768 to 64) for additional speed-ups | |
| in similarity search with modest quality loss — but the demo uses the full dimension for | |
| clarity. | |
| --- | |
| ## About the Paper | |
| **Starbucks: Improved Training for 2D Matryoshka Embeddings** | |
| - **Architecture:** BERT-base transformer with 2D Matryoshka training | |
| - **Pretraining:** SMAE (Starbucks Masked Autoencoding) on large text corpora | |
| - **Fine-tuning:** SRL (Starbucks Representation Learning) on MS MARCO passage ranking | |
| - **HuggingFace model:** [`ielabgroup/Starbucks-msmarco`](https://huggingface.co/ielabgroup/Starbucks-msmarco) | |
| - **BEIR benchmark:** Starbucks-Trenta matches full BERT-base performance; smaller sizes offer | |
| strong Pareto-optimal points on the speed–quality frontier. | |
| """ | |
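The scoring described in HOW_TO_USE_CONTENT (cosine similarity followed by a sort) and the embedding-dimension axis can be illustrated without the model itself. The following is a minimal sketch on toy vectors, assuming Matryoshka-style truncation simply keeps the first `dim` components. `SIZE_TO_LAYERS`, `RELATIVE_ENCODING_COST`, `cosine`, and `rank_documents` are hypothetical names, and the real demo presumably uses batched tensor operations rather than plain Python lists.

```python
import math
import time

# Layer counts copied from the size table in this file.
SIZE_TO_LAYERS = {"Demi": 2, "Short": 4, "Tall": 6, "Grande": 8, "Venti": 10, "Trenta": 12}

# Rough estimate only: the text says encoding time grows about linearly with layers.
RELATIVE_ENCODING_COST = {
    size: layers / SIZE_TO_LAYERS["Trenta"] for size, layers in SIZE_TO_LAYERS.items()
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs, dim=None):
    """Return ([(doc_index, score), ...] best first, search_seconds).

    If `dim` is given, only the first `dim` components of every vector are used.
    """
    start = time.perf_counter()
    q = query_vec[:dim]
    scores = [(i, cosine(q, d[:dim])) for i, d in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # stable sort keeps tie order
    return scores, time.perf_counter() - start


# Toy 4-dimensional "embeddings"; real ones would come from the encoder.
query = [1.0, 0.0, 0.0, 1.0]
docs = [
    [0.0, 1.0, 0.0, 0.0],   # orthogonal to the query
    [1.0, 0.0, 0.0, 1.0],   # same direction as the query
    [1.0, 0.0, 0.0, -1.0],  # agrees on component 0, disagrees on component 3
]

full_ranking, _ = rank_documents(query, docs)
short_ranking, _ = rank_documents(query, docs, dim=2)
```

With only the first two components, the second and third toy documents become indistinguishable from each other, which is the kind of quality loss that dimension truncation trades for a smaller index.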
| # --------------------------------------------------------------------------- | |
| # SIZE_TABLE_HTML | |
| # An HTML table summarising the six Starbucks sizes. | |
| # Intended for direct injection into a Gradio HTML component. | |
| # --------------------------------------------------------------------------- | |
| SIZE_TABLE_HTML = """ | |
| <table style="border-collapse: collapse; width: 100%; font-family: sans-serif; font-size: 0.9rem;"> | |
| <thead> | |
| <tr style="background-color: #1E3A2F; color: #ffffff;"> | |
| <th style="padding: 10px 14px; text-align: left; border: 1px solid #ccc;">Size</th> | |
| <th style="padding: 10px 14px; text-align: center; border: 1px solid #ccc;">Layers Used</th> | |
| <th style="padding: 10px 14px; text-align: center; border: 1px solid #ccc;">Embedding Dim</th> | |
| <th style="padding: 10px 14px; text-align: center; border: 1px solid #ccc;">Relative Speed</th> | |
| <th style="padding: 10px 14px; text-align: center; border: 1px solid #ccc;">Expected Quality</th> | |
| <th style="padding: 10px 14px; text-align: left; border: 1px solid #ccc;">Best For</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr style="background-color: #f9f9f9;"> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; font-weight: bold; color: #00704A;">☕ Demi</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">2</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">768</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★★★ Fastest</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc;">Keyword-heavy queries, latency-critical systems</td> | |
| </tr> | |
| <tr> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; font-weight: bold; color: #00704A;">☕ Short</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">4</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">768</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc;">High-throughput first-stage retrieval</td> | |
| </tr> | |
| <tr style="background-color: #f9f9f9;"> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; font-weight: bold; color: #00704A;">☕ Tall</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">6</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">768</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc;">Balanced deployments; good Pareto point</td> | |
| </tr> | |
| <tr> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; font-weight: bold; color: #00704A;">☕ Grande</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">8</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">768</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc;">Quality-focused pipelines with moderate compute budgets</td> | |
| </tr> | |
| <tr style="background-color: #f9f9f9;"> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; font-weight: bold; color: #00704A;">☕ Venti</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">10</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">768</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc;">Near-full-model quality; slight latency saving</td> | |
| </tr> | |
| <tr> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; font-weight: bold; color: #00704A;">☕ Trenta</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">12 (full)</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">768</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★ Slowest</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc; text-align: center;">★★★★★★</td> | |
| <td style="padding: 8px 14px; border: 1px solid #ccc;">Maximum accuracy; offline indexing or small corpora</td> | |
| </tr> | |
| </tbody> | |
| </table> | |
| """ | |
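Because every row of SIZE_TABLE_HTML repeats the same inline styles, the table could alternatively be generated from data. This is a sketch of one such approach, not how the original file is written. `SIZES`, `HEADERS`, `CELL`, and `build_size_table` are hypothetical names, and the output reproduces the structure and styles of the table above but not its exact whitespace.

```python
from html import escape

# (name, layers used, embedding dim, relative speed, expected quality, best for)
SIZES = [
    ("Demi", "2", "768", "★★★★★★ Fastest", "★★", "Keyword-heavy queries, latency-critical systems"),
    ("Short", "4", "768", "★★★★★", "★★★", "High-throughput first-stage retrieval"),
    ("Tall", "6", "768", "★★★★", "★★★★", "Balanced deployments; good Pareto point"),
    ("Grande", "8", "768", "★★★", "★★★★", "Quality-focused pipelines with moderate compute budgets"),
    ("Venti", "10", "768", "★★", "★★★★★", "Near-full-model quality; slight latency saving"),
    ("Trenta", "12 (full)", "768", "★ Slowest", "★★★★★★", "Maximum accuracy; offline indexing or small corpora"),
]
HEADERS = [
    ("Size", "left"), ("Layers Used", "center"), ("Embedding Dim", "center"),
    ("Relative Speed", "center"), ("Expected Quality", "center"), ("Best For", "left"),
]
CELL = "padding: 8px 14px; border: 1px solid #ccc;"


def build_size_table(sizes=SIZES):
    """Build the size table as an HTML string from the rows in `sizes`."""
    head = "".join(
        f'<th style="padding: 10px 14px; text-align: {align}; border: 1px solid #ccc;">'
        f"{escape(label)}</th>"
        for label, align in HEADERS
    )
    rows = []
    for index, (name, layers, dim, speed, quality, best_for) in enumerate(sizes):
        # Rows at even indices get the light grey stripe, as in the hand-written table.
        stripe = ' style="background-color: #f9f9f9;"' if index % 2 == 0 else ""
        cells = [f'<td style="{CELL} font-weight: bold; color: #00704A;">☕ {escape(name)}</td>']
        cells += [
            f'<td style="{CELL} text-align: center;">{escape(value)}</td>'
            for value in (layers, dim, speed, quality)
        ]
        cells.append(f'<td style="{CELL}">{escape(best_for)}</td>')
        rows.append(f"<tr{stripe}>" + "".join(cells) + "</tr>")
    return (
        '<table style="border-collapse: collapse; width: 100%; '
        'font-family: sans-serif; font-size: 0.9rem;">'
        '<thead><tr style="background-color: #1E3A2F; color: #ffffff;">'
        + head
        + "</tr></thead><tbody>"
        + "".join(rows)
        + "</tbody></table>"
    )


generated_table = build_size_table()
```

Keeping the rows in a list means a new size or a changed star rating is a one-line edit, at the cost of the markup no longer being readable at a glance in the source.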