murkasad committed
Commit 0a7cc78 · verified
1 Parent(s): 401cd2a

Update README.md

Files changed (1)
  1. README.md +15 -46
README.md CHANGED
@@ -1,46 +1,15 @@
- # Text Retrieval and Summarizer ChatBot Framework
-
- ### RAG (Retrieval-Augmented Generation) System
-
- **Project Summary**:
-
- This project is a text retrieval and summarization system that allows users to input a question and receive a concise summary based on relevant content.
-
- It works by first converting the user’s input into numerical embeddings using a sentence transformer model. These embeddings are then compared against a pre-built vector index (FAISS) to identify the most relevant text chunks from your dataset. The retrieved content is combined and passed to a transformer-based summarization model BART, which generates a concise summary as the final output.
-
- The entire pipeline is integrated into an interactive user interface built using Gradio, allowing users to easily input queries and view summarized results in real time.
-
- Steps:
- 1. Retrieves relevant text from a User's Document (FAISS)
- 2. Converts Corpus to Sentences (Sentence Transformer)
- 3. Generates a Summarized output (HuggingFace Text Summarizer)
-
- **Use of SBERT**:
-
- Sentence Transformers(SBERT), uses pretrained "Embedding" models, all we do is provide them our chunks from previous step and it creates vectors. (huggingface)
- Embeddings are dense, lower-dimensional, numerical vector representations of data such as text, images, or audio that capture semantic meaning and relationships.(soucre: google)
-
- Steps:
- 1. Load an embedding model
- 2. Feed text chunks into the model
- 3. Convert each chunk into a vector of numbers
-
- Transformer Model (all-MiniLM-L6-v2):
- This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.(huggingface)
-
- **Use of FAISS**:
-
- FAISS as a super-fast “vector search engine”, stands for Facebook AI Similarity Search.
- It is an open-source library developed by Meta's Fundamental AI Research group (formerly Facebook AI Research) designed for the efficient similarity search and clustering of dense vectors. (google)
-
- Takes chunks of text from the document
- As each chunk is previously converted to a 384-dimensional embedding by MiniLM
- This store all embeddings in FAISS
- so when a user asks a question, the question is converted to a vector and FAISS finds the nearest embeddings (most similar chunks of text from the document)
- Then we pass those chunks to your LLM to generate the answer
-
-
- **Final Pipeline**:
- Take PDF -> Get chunks -> Make embeddings -> Ask Question -> Retrieve Answer -> Summarize Result and Display Metrics
-
- *--by Murk Asad*
 
+ ---
+ title: RAGDeepLearningChatbot
+ emoji: 💬
+ python_version: '3.10'
+ colorFrom: yellow
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 6.5.1
+ app_file: app.py
+ pinned: false
+ hf_oauth: true
+ hf_oauth_scopes:
+ - inference-api
+ short_description: Deep Learning Information Support Chatbot
+ ---
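
For reference, the retrieval step described in the removed README text (convert the question to a vector, then find the nearest chunk embeddings) can be sketched roughly as below. This is only an illustration under stated assumptions: it uses plain Python lists and cosine similarity in place of SBERT and FAISS, and the function names, toy 3-dimensional vectors, and chunk texts are hypothetical, not taken from `app.py`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk)
        for vec, chunk in zip(chunk_vecs, chunks)
    ]
    # Highest similarity first; a real vector index would avoid this full scan.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Hypothetical toy data standing in for 384-dimensional MiniLM embeddings.
chunks = ["FAISS stores embeddings", "Gradio builds the UI", "BART writes summaries"]
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]

top = retrieve_top_k(query_vec, chunk_vecs, chunks, k=2)
print(top)
```

In the pipeline the README describes, the retrieved chunks would then be concatenated and passed to the summarization model; that step is omitted here because it depends on the model actually loaded by the app.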