Spaces: Ono-Enzo/BIGqa-RAG

Ono-Enzo committed on Jan 7

Commit 6be80e5 (verified)

1 Parent(s): bc15cc1

Update README.md


Files changed (1)

README.md +19 -39

README.md CHANGED

@@ -9,50 +9,30 @@ app_file: app.py
 pinned: false
 ---
-# BigQA RAG
-This project is a Retrieval-Augmented Generation (RAG) application built with Streamlit and LangChain. It allows users to ask questions about a specific dataset, retrieving relevant context to generate accurate answers using a Large Language Model (LLM).
-## Features
-- **RAG Architecture**: Combines document retrieval with generative AI.
-- **Streamlit Interface**: User-friendly web interface.
-- **Vector Search**: Uses HuggingFace embeddings (`all-MiniLM-L6-v2`) and an in-memory vector store.
-- **LLM Integration**: Connects to OpenRouter to use the `openai/gpt-oss-20b` model.
-- **Dataset**: Automatically loads and indexes the `Ono-Enzo/Dataset_test` dataset from Hugging Face.
-## Prerequisites
-- Python 3.8+
-- An API Key from OpenRouter
-## Installation
-1. **Clone the repository** (or download the files):
-   ```bash
-   git clone <repository-url>
-   cd RAG_BigQA
-   ```
-2. **Install dependencies**:
-   ```bash
-   pip install streamlit python-dotenv langchain-openai datasets langchain-community langchain-huggingface
-   ```
-## Configuration
-1. Create a `.env` file in the root directory of the project.
-2. Add your OpenRouter API key to the file:
-   ```env
-   OPENROUTER_API_KEY=your_api_key_here
-   ```
-## Usage
-Run the Streamlit application:
-```bash
-streamlit run app.py
-```
-The application will open in your default web browser. It will automatically download the dataset, generate embeddings, and prepare the LLM. Once ready, you can type your questions in the input field.

+# 🔍 BigQA — Retrieval-Augmented Generation
+BigQA is a software architecture designed for querying large volumes of textual data. This application implements a **Retrieval-Augmented Generation (RAG)** pipeline, combining semantic document retrieval with Large Language Models (LLMs) to provide precise, context-aware answers.
+## 📚 Scientific Foundation
+This implementation is based on the reference architecture proposed in the following research papers:
+* **Design Principles and a Software Reference Architecture for Big Data Question Answering Systems (2023)**.
+    [Access Paper](https://www.scitepress.org/Link.aspx?doi=10.5220/0011842700003467)
+* **BigQA: A Software Reference Architecture for Big Data Question Answering Systems (2024)**.
+    [Access Paper](https://link.springer.com/chapter/10.1007/978-3-031-64748-2_3)
+## 🚀 Features
+- **RAG Architecture**: Full integration between document retrieval and generative AI.
+- **Vector Search**: Uses HuggingFace embeddings (`all-MiniLM-L6-v2`) for semantic similarity search.
+- **Streamlit Interface**: An intuitive and responsive web interface for real-time querying.
+- **LLM Integration**: Connected via OpenRouter to access state-of-the-art models (e.g., Qwen, Gemini, GPT).
+- **Automated Indexing**: Automatic loading and processing of the `Ono-Enzo/Dataset_test` dataset.
+## 🛠️ Tech Stack
+- **LangChain**: Framework for orchestrating the AI logic.
+- **Streamlit**: For the user interface.
+- **OpenRouter**: Gateway for LLM access.
+- **Hugging Face Datasets & Embeddings**: For data management and vectorization.
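
Reviewer note (not part of the commit): the retrieve-then-generate flow that the new Features list describes can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in `app.py`. A word-count vector stands in for the `all-MiniLM-L6-v2` embeddings, the sample documents are invented, and the sketch stops at building the prompt rather than calling an LLM through OpenRouter. The names `embed`, `cosine`, `retrieve`, `build_prompt`, and `DOCS` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Assemble the text that would be sent to the LLM (the call itself is omitted)."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented sample documents, used only for this illustration.
DOCS = [
    "Streamlit renders the web interface for the application.",
    "OpenRouter is a gateway that routes requests to language models.",
    "Embeddings map text to vectors for similarity search.",
]

if __name__ == "__main__":
    question = "What does OpenRouter do?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

In a real deployment the `embed` function would presumably be replaced by the embedding model and the prompt passed to the configured LLM; the ranking and prompt-assembly steps keep the same shape.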