Ono-Enzo committed on
Commit 6be80e5 · verified · 1 Parent(s): bc15cc1

Update README.md

Files changed (1)
  1. README.md +19 -39
README.md CHANGED
@@ -9,50 +9,30 @@ app_file: app.py
  pinned: false
  ---
 
- # BigQA RAG
 
- This project is a Retrieval-Augmented Generation (RAG) application built with Streamlit and LangChain. It allows users to ask questions about a specific dataset, retrieving relevant context to generate accurate answers using a Large Language Model (LLM).
 
- ## Features
 
- - **RAG Architecture**: Combines document retrieval with generative AI.
- - **Streamlit Interface**: User-friendly web interface.
- - **Vector Search**: Uses HuggingFace embeddings (`all-MiniLM-L6-v2`) and an in-memory vector store.
- - **LLM Integration**: Connects to OpenRouter to use the `openai/gpt-oss-20b` model.
- - **Dataset**: Automatically loads and indexes the `Ono-Enzo/Dataset_test` dataset from Hugging Face.
 
- ## Prerequisites
 
- - Python 3.8+
- - An API Key from OpenRouter
 
- ## Installation
 
- 1. **Clone the repository** (or download the files):
- ```bash
- git clone <repository-url>
- cd RAG_BigQA
- ```
 
- 2. **Install dependencies**:
- ```bash
- pip install streamlit python-dotenv langchain-openai datasets langchain-community langchain-huggingface
- ```
-
- ## Configuration
-
- 1. Create a `.env` file in the root directory of the project.
- 2. Add your OpenRouter API key to the file:
- ```env
- OPENROUTER_API_KEY=your_api_key_here
- ```
-
- ## Usage
-
- Run the Streamlit application:
-
- ```bash
- streamlit run app.py
- ```
-
- The application will open in your default web browser. It will automatically download the dataset, generate embeddings, and prepare the LLM. Once ready, you can type your questions in the input field.
 
  pinned: false
  ---
 
+ # 🔍 BigQA - Retrieval-Augmented Generation
 
+ BigQA is a software architecture designed for querying large volumes of textual data. This application implements a **Retrieval-Augmented Generation (RAG)** pipeline, combining semantic document retrieval with Large Language Models (LLMs) to provide precise, context-aware answers.
 
+ ## 📚 Scientific Foundation
 
+ This implementation is based on the reference architecture proposed in the following research papers:
 
+ * **Design Principles and a Software Reference Architecture for Big Data Question Answering Systems (2023)**.
+ [Access Paper](https://www.scitepress.org/Link.aspx?doi=10.5220/0011842700003467)
+ * **BigQA: A Software Reference Architecture for Big Data Question Answering Systems (2024)**.
+ [Access Paper](https://link.springer.com/chapter/10.1007/978-3-031-64748-2_3)
 
+ ## 🚀 Features
 
+ - **RAG Architecture**: Full integration between document retrieval and generative AI.
+ - **Vector Search**: Uses HuggingFace embeddings (`all-MiniLM-L6-v2`) for semantic similarity search.
+ - **Streamlit Interface**: An intuitive and responsive web interface for real-time querying.
+ - **LLM Integration**: Connected via OpenRouter to access state-of-the-art models (e.g., Qwen, Gemini, GPT).
+ - **Automated Indexing**: Automatic loading and processing of the `Ono-Enzo/Dataset_test` dataset.
 
+ ## 🛠️ Tech Stack
 
+ - **LangChain**: Framework for orchestrating the AI logic.
+ - **Streamlit**: For the user interface.
+ - **OpenRouter**: Gateway for LLM access.
+ - **Hugging Face Datasets & Embeddings**: For data management and vectorization.
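
Note (not part of the commit): the README describes a retrieve-then-generate flow but the diff shows no code for it. The sketch below is only an illustration of that flow under loud assumptions: the bag-of-words `embed` function is a toy stand-in for the `all-MiniLM-L6-v2` embeddings, no LangChain or OpenRouter call is made, and the names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical and do not come from `app.py`.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    docs = [
        "Streamlit renders the web interface.",
        "OpenRouter is the gateway used to reach the language model.",
        "Embeddings map text to vectors for similarity search.",
    ]
    question = "Which gateway reaches the language model?"
    print(build_prompt(question, retrieve(question, docs, k=1)))
```

In a real deployment the toy `embed` would presumably be replaced by the sentence-embedding model and the prompt would be passed to whichever model is selected through OpenRouter; the ranking-then-prompting structure is the only part this sketch is meant to convey.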