Commit 87370c1
Parent(s): 5847e55
added readme

Files changed:
- README.md: +60 −12
- src/chat.py: +1 −7
README.md
CHANGED

@@ -1,12 +1,60 @@
# QA with RAG

### Quick start

The script is designed as a Hugging Face chat interface, so users can simply use the chat without installing any dependencies.

The chat is available here: [QA with RAG](https://huggingface.co/spaces/alexandraroze/rag_test_task).

This chat uses a pre-built RAG (instructions on how to run the script that builds the RAG are given below).

### Key features

1. To start the chat, type a question and press Enter.
2. The model saves the history of all conversations, so you can ask follow-up questions about previous answers.
3. For each question, the model identifies the topic you are interested in and extracts it from the query.
4. Relevant documents are retrieved from the RAG using the extracted topic, so the model can base its answer on them.
5. If your question does not relate to a specific topic, or merely clarifies the model's previous message, no topic is extracted and the model reuses the previously retrieved documents.
6. The retrieved documents and the extracted topic are listed below the chat (if no topic was extracted, the topic field is empty).
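The per-turn flow in features 3–5 can be sketched as follows. This is a hypothetical illustration, not code from this repository: `extract_topic` and `retrieve` stand in for the real LLM call and RAG lookup.

```python
# Hypothetical sketch of the topic-extraction flow described above.
# `extract_topic` and `retrieve` stand in for the real LLM and RAG calls.
def answer_context(question, extract_topic, retrieve, state):
    topic = extract_topic(question)       # returns "" for follow-up questions
    if topic:
        state["docs"] = retrieve(topic)   # fresh retrieval for a new topic
    # otherwise reuse the documents retrieved for the previous turn
    return topic, state.get("docs", [])
```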

### How to run the RAG building script

Before launching the script, create a `.env` file in the root directory with the following content:

```
OPENAI_API_KEY="your_openai_token"
OPENAI_EMBEDDINGS_MODEL="text-embedding-3-large"
CHAT_MODEL="gpt-4o"
PATH_TO_DATASET="Dataset"
PATH_TO_INDEX="faiss_db"
```

Please do not change the `OPENAI_EMBEDDINGS_MODEL` value.
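For reference, the scripts read these values via environment variables, roughly like this (a standard-library sketch; the defaults mirror the `.env` above):

```python
import os

# Defaults mirror the .env file above; OPENAI_API_KEY has no safe default.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_EMBEDDINGS_MODEL = os.getenv("OPENAI_EMBEDDINGS_MODEL", "text-embedding-3-large")
CHAT_MODEL = os.getenv("CHAT_MODEL", "gpt-4o")
PATH_TO_DATASET = os.getenv("PATH_TO_DATASET", "Dataset")
PATH_TO_INDEX = os.getenv("PATH_TO_INDEX", "faiss_db")
```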

To build the RAG, run the following commands:

```bash
pip install -r requirements.txt
python ./build_rag.py --path_to_dataset Dataset --path_to_index faiss_db
```

The script builds the RAG and saves it to the specified path: it reads the dataset from the `Dataset` folder and writes the index to the `faiss_db` folder.

### How to test retrieval from RAG separately

If you want to inspect the retrieval process without the chat interface, run:

```bash
python ./test_rag.py --path_to_index faiss_db
```

After launching the script, you can enter queries and see the retrieved documents. To exit, enter `exit`.
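The loop implemented by `test_rag.py` can be sketched as follows; `search` is a hypothetical stand-in for the FAISS lookup loaded from `--path_to_index`, and I/O is injectable so the loop is easy to test:

```python
# Hypothetical sketch of the interactive test loop; `search` stands in
# for the real FAISS retrieval loaded from the index path.
def query_loop(search, read=input, write=print):
    while True:
        query = read("query> ").strip()
        if query == "exit":          # typing `exit` leaves the loop
            break
        for doc in search(query):    # show every retrieved document
            write(doc)
```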

### Implementation details

#### Splitting documents

I wrote my own splitter because existing splitters do not consider the semantic meaning of the text. (Some splitters do consider semantic meaning, but I was not satisfied with their quality.)

- The splitter works like agglomerative clustering but preserves the order of sentences in the text.
- It splits the text into clusters of sentences, where each cluster contains consecutive sentences that are semantically close to each other.
- It uses embeddings from the OpenAI embeddings model to calculate the similarity between sentences.
- Each cluster becomes a separate document in the RAG index.

All implementation details are in the `src/rag.py` file.
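A simplified sketch of the idea (not the actual `src/rag.py` code): walk the sentences in order and start a new cluster whenever similarity to the previous sentence drops below a threshold. The `embed` callable and the threshold value are assumptions for illustration.

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def split_semantically(sentences, embed, threshold=0.75):
    # Order-preserving clustering: consecutive, semantically close
    # sentences end up in the same cluster (a future RAG document).
    if not sentences:
        return []
    clusters = [[sentences[0]]]
    prev = embed(sentences[0])
    for sentence in sentences[1:]:
        cur = embed(sentence)
        if cosine(prev, cur) >= threshold:
            clusters[-1].append(sentence)   # still the same topic
        else:
            clusters.append([sentence])     # similarity dropped: new cluster
        prev = cur
    return clusters
```

The real implementation merges clusters agglomeratively rather than in one pass, but the order-preserving constraint is the key difference from ordinary clustering.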

#### Indexing

I used the Faiss library and the OpenAI embeddings model text-embedding-3-large, since it is the latest and one of the best models for text embeddings.

#### Chat interface

I used the LangChain library for the chat interface, since it makes it easy to create a chat that saves the history of the whole conversation.
Implementation details are in the `src/chat.py` file.
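The history mechanism can be sketched in plain Python. This mirrors the `store` dictionary used in `src/chat.py`, but the class and method names here are hypothetical:

```python
# Hypothetical sketch of a per-session history store; LangChain's message
# history wrapper uses the same get-or-create pattern keyed by session id.
class HistoryStore:
    def __init__(self):
        self.store = {}

    def get_history(self, session_id):
        # create an empty history on first access, then reuse it
        return self.store.setdefault(session_id, [])

    def append(self, session_id, role, text):
        self.get_history(session_id).append((role, text))
```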

src/chat.py
CHANGED

@@ -15,11 +15,6 @@ GENERATE_ARGS = {
     'max_tokens': int(os.getenv("MAX_NEW_TOKENS", 1024)),
 }
 
-GENERATE_KWARGS = {
-    'top_p': float(os.getenv("TOP_P", 0.6)),
-    'frequency_penalty': max(-2, min(float(os.getenv("FREQ_PENALTY", 0)), 2))
-}
-
 
 class Chat:
 
@@ -31,8 +26,7 @@ class Chat:
         self.assistant_model = base(
             model=model,
             streaming=True,
-            **GENERATE_ARGS
-            model_kwargs=GENERATE_KWARGS
+            **GENERATE_ARGS
         )
 
         self.store = {}