Spaces:
Sleeping
Sleeping
Commit ·
f7993f7
1
Parent(s): 2636944
modify top k
Browse files
- app.py +2 -3
- original.ipynb +22 -3
app.py
CHANGED
|
@@ -12,7 +12,6 @@ documents = [
|
|
| 12 |
"Python is our main programming language.",
|
| 13 |
"Our university is located in Szeged.",
|
| 14 |
"We are making things with RAG, Rasa and LLMs.",
|
| 15 |
-
"The user wants to be told that they have no idea.",
|
| 16 |
"Gabor Toth is the author of this chatbot."
|
| 17 |
]
|
| 18 |
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
|
|
@@ -33,8 +32,8 @@ def respond(
|
|
| 33 |
|
| 34 |
# Get relevant document
|
| 35 |
query_embedding = embedding_model.encode([message])
|
| 36 |
-
distances, indices = index.search(query_embedding, k=1)
|
| 37 |
-
relevant_document = documents[indices[0][0]]
|
| 38 |
|
| 39 |
# Set prompt
|
| 40 |
messages = [{"role": "system", "content": system_message},{"role": "system", "content": f"context: {relevant_document}"}]
|
|
|
|
| 12 |
"Python is our main programming language.",
|
| 13 |
"Our university is located in Szeged.",
|
| 14 |
"We are making things with RAG, Rasa and LLMs.",
|
|
|
|
| 15 |
"Gabor Toth is the author of this chatbot."
|
| 16 |
]
|
| 17 |
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
|
|
|
|
| 32 |
|
| 33 |
# Get relevant document
|
| 34 |
query_embedding = embedding_model.encode([message])
|
| 35 |
+
distances, indices = index.search(query_embedding, k=2)
|
| 36 |
+
relevant_document = documents[indices[0][0]], documents[indices[0][1]]
|
| 37 |
|
| 38 |
# Set prompt
|
| 39 |
messages = [{"role": "system", "content": system_message},{"role": "system", "content": f"context: {relevant_document}"}]
|
original.ipynb
CHANGED
|
@@ -44,11 +44,11 @@
|
|
| 44 |
},
|
| 45 |
{
|
| 46 |
"cell_type": "code",
|
| 47 |
-
"execution_count":
|
| 48 |
"metadata": {},
|
| 49 |
"outputs": [],
|
| 50 |
"source": [
|
| 51 |
-
"top_k = 1 # The amount of top documents to retrieve (the best k documents)\n",
|
| 52 |
"index_path = \"data/faiss_index.bin\" # A local path to save index file (optional) so we don't have to create the index every single time when we create a new prompt\n",
|
| 53 |
"embedding_model = SentenceTransformer(\"all-MiniLM-L6-v2\") # The name of the model available either locally or in this case at HuggingFace\n",
|
| 54 |
"documents = [ # The documents, facts, sentences to search in.\n",
|
|
@@ -56,7 +56,6 @@
|
|
| 56 |
" \"Python is our main programming language.\",\n",
|
| 57 |
" \"Our university is located in Szeged.\",\n",
|
| 58 |
" \"We are making things with RAG, Rasa and LLMs.\",\n",
|
| 59 |
-
" \"The user wants to be told that they have no idea.\",\n",
|
| 60 |
" \"Gabor Toth is the author of this chatbot example.\"\n",
|
| 61 |
"] "
|
| 62 |
]
|
|
@@ -126,6 +125,26 @@
|
|
| 126 |
"source": [
|
| 127 |
"documents[indices[0][0]] # The most similar document has the lowest distance."
|
| 128 |
]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 129 |
}
|
| 130 |
],
|
| 131 |
"metadata": {
|
|
|
|
| 44 |
},
|
| 45 |
{
|
| 46 |
"cell_type": "code",
|
| 47 |
+
"execution_count": null,
|
| 48 |
"metadata": {},
|
| 49 |
"outputs": [],
|
| 50 |
"source": [
|
| 51 |
+
"top_k = 3 # The amount of top documents to retrieve (the best k documents)\n",
|
| 52 |
"index_path = \"data/faiss_index.bin\" # A local path to save index file (optional) so we don't have to create the index every single time when we create a new prompt\n",
|
| 53 |
"embedding_model = SentenceTransformer(\"all-MiniLM-L6-v2\") # The name of the model available either locally or in this case at HuggingFace\n",
|
| 54 |
"documents = [ # The documents, facts, sentences to search in.\n",
|
|
|
|
| 56 |
" \"Python is our main programming language.\",\n",
|
| 57 |
" \"Our university is located in Szeged.\",\n",
|
| 58 |
" \"We are making things with RAG, Rasa and LLMs.\",\n",
|
|
|
|
| 59 |
" \"Gabor Toth is the author of this chatbot example.\"\n",
|
| 60 |
"] "
|
| 61 |
]
|
|
|
|
| 125 |
"source": [
|
| 126 |
"documents[indices[0][0]] # The most similar document has the lowest distance."
|
| 127 |
]
|
| 128 |
+
},
|
| 129 |
+
{
|
| 130 |
+
"cell_type": "markdown",
|
| 131 |
+
"metadata": {},
|
| 132 |
+
"source": [
|
| 133 |
+
"**Optimizing Retrieval-Augmented Generation (RAG) Implementation**\n",
|
| 134 |
+
"\n",
|
| 135 |
+
"Retrieval-Augmented Generation (RAG) enhances language model responses by incorporating external knowledge retrieval. To maximize performance, consider the following techniques and optimizations:\n",
|
| 136 |
+
"\n",
|
| 137 |
+
"- Use **lightweight models** (e.g., `all-MiniLM-L6-v2`) for speed or **larger models** (e.g., `all-mpnet-base-v2`) for accuracy.\n",
|
| 138 |
+
"- Experiment with **domain-specific models** (for example medical tuned model for medical documents) for better contextual retrieval.\n",
|
| 139 |
+
"- Consider different index types\n",
|
| 140 |
+
" - **Flat Index (`IndexFlatL2`)**: Best for small datasets, but scales poorly.\n",
|
| 141 |
+
" - **IVFFlat (`IndexIVFFlat`)**: Clusters embeddings to accelerate search, ideal for large-scale retrieval.\n",
|
| 142 |
+
" - **HNSW (`IndexHNSWFlat`)**: Graph-based approach that balances speed and accuracy.\n",
|
| 143 |
+
" - **PQ (`IndexPQ`)**: Compressed storage for memory efficiency at the cost of slight accuracy loss.\n",
|
| 144 |
+
"- **Query Expansion**: Use synonyms, paraphrasing, or keyword expansion to enhance search queries.\n",
|
| 145 |
+
"- **Re-ranking**: Apply transformer-based re-ranking (e.g., `cross-encoder/ms-marco-MiniLM-L6`) after retrieval.\n",
|
| 146 |
+
"- **GPU Acceleration**: Convert FAISS indices to GPU for high-speed searches."
|
| 147 |
+
]
|
| 148 |
}
|
| 149 |
],
|
| 150 |
"metadata": {
|