pawlo2013 committed
Commit d094bd5 · 1 parent: c0f8067

init commit on hf branch
.gitignore CHANGED
@@ -1,3 +1,4 @@
 .env
 .vscode
-.history
+.history
+
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: SciFacts Expert Assistant
+title: SciFacts Expert Assistant with Mistral
 short_description: Verify scientific claims with RAG
 emoji: 🧬
 colorFrom: blue
@@ -12,163 +12,3 @@ license: mit
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
-# 🧬 SciFacts Expert Assistant
-
-A high-precision **Retrieval-Augmented Generation (RAG)** application designed to verify scientific claims and answer complex biomedical questions using the [SciFacts dataset](https://ir-datasets.com/beir.html#beir/scifact).
-
-This system leverages **LLM-based reranking** to significantly improve retrieval performance, ensuring the chat model receives the most relevant scientific evidence.
-
----
-
-![UI of the system](UI.png "UI of the system")
-
-Check out the Gradio app at https://huggingface.co/spaces/pawlo2013/Scifact_RAG
-
-## ⚡ Technology Stack
-
-| Component | Technology / Model | Why? |
-| :--- | :--- | :--- |
-| **Frontend UI** | **Gradio** | Interactive web interface with streaming chat and real-time dashboard. |
-| **Orchestration** | **LangChain** | Manages the retrieval chains, prompt templates, and LLM interaction. |
-| **Vector Database** | **ChromaDB** | Stores document embeddings for efficient semantic search. |
-| **Embeddings** | **HuggingFace** (`all-MiniLM-L6-v2`) | Converts scientific text into 384-dimensional vectors. |
-| **LLM Provider** | **Groq** | Provides ultra-fast inference for the chat and reranking models. |
-| **Main Model** | **Kimi-k2-instruct** | Handles the final answer synthesis (selected for long-context capabilities). |
-| **Reranker** | **GPT-OSS-120b** | Re-ranks retrieved documents to optimize relevance. |
-
----
-
-## 📊 Performance Benchmark: The Impact of Reranking
-
-We evaluated the retrieval system on an **LLM-generated test set** to measure the impact of adding a reranking step.
-
-### 🏆 Retrieval Evaluation Results
-
-| Metric | Base Retrieval | With Reranker (GPT-OSS-120b) | Improvement |
-| :--- | :---: | :---: | :---: |
-| **Mean Reciprocal Rank (MRR)** | 0.8193 | **0.8480** | 🟢 **+3.5%** |
-| **Normalized DCG (nDCG)** | 0.8079 | **0.8323** | 🟢 **+3.0%** |
-| **Keyword Coverage** | 89.3% | 89.3% | ➖ Same |
-
-> **Insight:** While keyword coverage remained stable, the **reranker** significantly improved ranking quality (MRR & nDCG). Relevant documents are pushed to the top of the context window, reducing hallucinations and improving answer accuracy.
-
----
-
-## 🏗️ System Architecture
-
-1. **Ingestion:** The SciFacts corpus is chunked and embedded using `all-MiniLM-L6-v2`.
-2. **Vector Store:** Embeddings are stored in **ChromaDB** for fast similarity search.
-3. **Retrieval:** Initial fetch of the top-k ($k=20$) documents by cosine similarity.
-4. **Reranking:** The **GPT-OSS-120b** model re-scores the retrieved documents to filter noise, passing only the top ($k=10$) most relevant chunks to the generator.
-5. **Generation:** **Kimi-k2-instruct** synthesizes the final answer from the refined evidence.
-
----
-
-## 🚀 Features
-
-- **Interactive UI:** Built with **Gradio**, featuring streaming responses and a side-by-side view of retrieved evidence.
-- **Reference Questions:** One-click execution of verified ground-truth questions.
-- **Live Evaluation Dashboard:** Built-in dashboard to run and visualize MRR, nDCG, and answer-accuracy metrics in real time.
-- **Dual Evaluation Modes:**
-  - **Canonical:** Standard SciFacts benchmark.
-  - **LLM-Generated:** Synthetic test set for broad coverage.
-
----
-
-## 🛠️ Installation & Setup
-
-### 1. Clone the Repository
-
-```bash
-git clone https://github.com/your-username/scifact-rag.git
-cd scifact-rag
-```
-
-### 2. Install Dependencies
-
-```bash
-pip install -r requirements.txt
-```
-
-_Note: Ensure you have `gradio`, `langchain`, `chromadb`, `pydantic`, and `tiktoken` installed._
-
-### 3. Environment Variables
-
-Create a `.env` file in the root directory:
-
-```env
-GROQ_API_KEY=your_groq_api_key_here
-OPENAI_API_KEY=your_openai_api_key_here  # If using OpenAI for evaluation generation
-HF_TOKEN=your_hf_token_here  # You may also need to log in to Hugging Face or provide a token
-```
-
-### 4. Ingest Data (Build Vector DB)
-
-If you haven't built the database yet:
-
-```bash
-python ingestion.py --corpus_file_path ./scifact/corpus.jsonl --embedding_provider huggingface
-```
-
-### 5. Generate Test Data (Optional)
-
-To create a fresh synthetic test set for evaluation:
-
-```bash
-python generate_tests.py --TOTAL_NUMBER_OF_QUESTIONS 50
-```
-
----
-
-## 🖥️ Running the Application
-
-### Main Chat Interface
-
-Launch the research assistant:
-
-```bash
-python app.py
-```
-
-Access the UI at `http://localhost:7860`
-
-### Evaluation Dashboard
-
-Launch the metrics dashboard to reproduce the benchmark results:
-
-```bash
-python dashboard.py
-```
-
----
-
-## 📂 Project Structure
-
-```text
-├── app.py                 # Main Gradio chat application
-├── evaluator.py           # Evaluation dashboard (metrics visualization)
-├── answer.py              # Core RAG logic (retrieval, reranking, generation)
-├── ingest.py              # Script to load SciFacts into ChromaDB
-├── make_test_answers.py   # LLM-based synthetic test generation
-├── evaluation/
-│   ├── eval.py            # Evaluation logic for retrieval & answers
-│   ├── eval_canonical.py  # Logic for SciFacts standard benchmark
-│   ├── test.py            # Test data loading utilities
-│   └── tests.jsonl        # Generated test questions
-└── scifact/               # Dataset directory
-```
-
----
-
-## 📜 License
-
-This project is open-source and available under the MIT License.
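The README removed in this commit describes a pipeline that retrieves the top $k=20$ chunks and keeps only the top $k=10$ after LLM reranking. A minimal, provider-agnostic sketch of that post-rerank selection step (the `Doc` type and `apply_rank_order` helper are illustrative, not names from the repository):

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


def apply_rank_order(docs: list[Doc], rank_order: list[int], keep_k: int) -> list[Doc]:
    """Reorder retrieved docs by the 1-based indices returned by the
    reranker's structured output, then keep only the top `keep_k`.
    Out-of-range indices (a common LLM failure mode) are skipped."""
    reordered = [docs[i - 1] for i in rank_order if 1 <= i <= len(docs)]
    return reordered[:keep_k]


# e.g. a reranker that ranks doc 3 first, then doc 1, then doc 2
docs = [Doc("a", "..."), Doc("b", "..."), Doc("c", "...")]
top = apply_rank_order(docs, rank_order=[3, 1, 2], keep_k=2)
print([d.doc_id for d in top])  # ['c', 'a']
```

In the actual repository this role is played by the structured-output `RankOrder` model visible in the `answer.py` diff below; the sketch only shows the index bookkeeping.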
__pycache__/answer.cpython-313.pyc ADDED
Binary file (5.99 kB).
answer.py CHANGED
@@ -1,7 +1,6 @@
 from dotenv import load_dotenv
-from langchain_groq import ChatGroq
 from langchain_huggingface import HuggingFaceEmbeddings
-from langchain_openai import ChatOpenAI
+from langchain_mistralai import ChatMistralAI
 from langchain_chroma import Chroma
 from langchain_core.messages import SystemMessage, HumanMessage, convert_to_messages
 from langchain_core.documents import Document
@@ -19,8 +18,9 @@ RETRIEVAL_K = 20
 RETRIEVAL_AFTER_RERANK_K = 10
 
 
-chat_model = "moonshotai/kimi-k2-instruct-0905"
-llm = ChatGroq(temperature=0, model_name=chat_model)
+chat_model = "mistral-large-latest"
+llm = ChatMistralAI(temperature=0, model_name=chat_model)
+
 
 # Embeddings (kept as HuggingFace per your snippet)
 embedding_model = "all-MiniLM-L6-v2"
@@ -36,7 +36,7 @@ retriever = vectorstore.as_retriever()
 # Ensure GROQ_API_KEY is in your .env file
 
 
-reranker_model = "openai/gpt-oss-120b"
+reranker_model = "ministral-14b-latest"
 # reranker_model = "gpt-5-nano"
 
 
@@ -46,12 +46,9 @@ class RankOrder(BaseModel):
 )
 
 
-reranker_llm = ChatGroq(
+reranker_llm = ChatMistralAI(
     temperature=0, model_name=reranker_model
 ).with_structured_output(RankOrder)
-# reranker_llm = ChatOpenAI(
-#     temperature=0, model_name=reranker_model
-# ).with_structured_output(RankOrder)
 
 
 def rerank(question, docs):
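This commit swaps the chat and reranker models but the evaluation metrics cited in the README (MRR and nDCG) are unchanged. For reference, a self-contained sketch of both metrics under binary relevance with a single gold document per query (helper names are ours, not from `evaluation/eval.py`):

```python
import math


def mrr(rankings: list[list[str]], gold_ids: list[str]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the gold document,
    counting 0 for queries whose gold document was not retrieved."""
    total = 0.0
    for ranking, gold in zip(rankings, gold_ids):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)
    return total / len(gold_ids)


def ndcg_binary(ranking: list[str], gold: str) -> float:
    """nDCG for one query with one relevant document: the ideal DCG is 1
    (gold at rank 1), so nDCG reduces to 1/log2(rank + 1)."""
    if gold not in ranking:
        return 0.0
    return 1.0 / math.log2(ranking.index(gold) + 2)


# gold "d1" at rank 2 for query 1 and rank 1 for query 2
print(mrr([["d2", "d1"], ["d1", "d3"]], ["d1", "d1"]))  # 0.75
print(round(ndcg_binary(["d2", "d1"], "d1"), 4))        # 0.6309
```

This is why the reranker can lift MRR/nDCG while keyword coverage stays flat: both metrics reward moving the gold document toward rank 1, not merely retrieving it.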
app.py CHANGED
@@ -111,7 +111,7 @@ def main():
         with gr.Column():
             gr.Markdown(
                 """
-                # 🧬 SciFacts Expert Assistant
+                # 🧬 SciFacts Expert Assistant with Mistral
                 ### Verify scientific claims with high-precision RAG
                 """
             )
@@ -122,17 +122,18 @@
         with gr.Column():
             gr.Markdown(
                 """
-                **🤖 Main Chat Model: Kimi-2**
-                * **Model:** moonshotai/kimi-k2-instruct-0905
+                **🤖 Main Chat Model: Mistral Large 3**
+                * **Model:** mistral-large-latest (with LangChain MistralAI)
                 * **Why:** State-of-the-art long-context understanding.
-                * [🔗 Official Kimi Documentation](https://moonshotai.github.io/Kimi-K2/)
+                * [🔗 Official Mistral Documentation](https://mistral.ai/news/mistral-3)
                 """
             )
         with gr.Column():
             gr.Markdown(
                 """
-                **⚖️ Reranker: GPT-OSS-120b**
-                * **Model:** openai/gpt-oss-120b (via Groq)
+                **⚖️ Reranker: Ministral 14B**
+                * **Model:** ministral-14b-latest (with LangChain MistralAI)
+                * **Why:** Light and fast model ideal for document reranking.
                 * **Function:** Re-scores retrieved documents for relevance.
                 """
             )
db/chroma.sqlite3 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:75de1b661e43aeb9c5433ae415d07dc588dadc9e8379b5441679d9b6a06dcee5
+oid sha256:d00aebcc12a94e8bb5aa1e53a4a861b7c45c6ef4ad11bbade9520a679aef3a60
 size 69177344
evaluation/__pycache__/eval.cpython-313.pyc ADDED
Binary file (7.44 kB).

evaluation/__pycache__/eval_canonical.cpython-313.pyc ADDED
Binary file (6.41 kB).

evaluation/__pycache__/test.cpython-313.pyc ADDED
Binary file (1.78 kB).
pyproject.toml ADDED
@@ -0,0 +1,56 @@
+[project]
+name = "llm-engineering"
+version = "0.1.0"
+requires-python = ">=3.11"
+dependencies = [
+    "anthropic>=0.69.0",
+    "beautifulsoup4>=4.14.2",
+    "chromadb>=1.1.0",
+    "datasets==3.6.0",
+    "feedparser>=6.0.12",
+    "google-genai>=1.41.0",
+    "google-generativeai>=0.8.5",
+    "gradio>=5.47.2,<6.0",
+    "ipykernel>=6.30.1",
+    "ipywidgets>=8.1.7",
+    "jupyter-dash>=0.4.2",
+    "langchain>=0.3.27",
+    "langchain-chroma>=0.2.6",
+    "langchain-community>=0.3.30",
+    "langchain-core>=0.3.76",
+    "langchain-openai>=0.3.33",
+    "langchain-text-splitters>=0.3.11",
+    "litellm>=1.77.5",
+    "matplotlib>=3.10.6",
+    "nbformat>=5.10.4",
+    "modal>=1.1.4",
+    "numpy>=2.3.3",
+    "ollama>=0.6.0",
+    "openai>=1.109.1",
+    "pandas>=2.3.3",
+    "plotly>=6.3.0",
+    "protobuf==3.20.2",
+    "psutil>=7.1.0",
+    "pydub>=0.25.1",
+    "python-dotenv>=1.1.1",
+    "requests>=2.32.5",
+    "scikit-learn>=1.7.2",
+    "scipy>=1.16.2",
+    "sentence-transformers>=5.1.1",
+    "setuptools>=80.9.0",
+    "speedtest-cli>=2.1.3",
+    "tiktoken>=0.11.0",
+    "torch>=2.8.0",
+    "tqdm>=4.67.1",
+    "transformers>=4.56.2",
+    "wandb>=0.22.1",
+    "langchain-huggingface>=1.0.0",
+    "langchain-ollama>=1.0.0",
+    "langchain-anthropic>=1.0.1",
+    "langchain-experimental>=0.0.42",
+    "groq>=0.33.0",
+    "xgboost>=3.1.1",
+    "langchain-groq>=1.0.1",
+    "mistralai>=1.9.11",
+    "langchain-mistralai>=1.1.1",
+]
requirements.txt CHANGED
@@ -4,6 +4,7 @@ langchain-core>=0.1.0
 langchain-groq
 langchain-huggingface
 langchain-openai
+langchain-mistralai
 langchain-chroma
 chromadb>=0.4.0
 pydantic>=2.0.0
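Since the code now instantiates `ChatMistralAI`, the Space presumably needs a Mistral API key at runtime in addition to the variables the old README listed. A hedged `.env` sketch, assuming `langchain-mistralai`'s default environment variable:

```env
MISTRAL_API_KEY=your_mistral_api_key_here
HF_TOKEN=your_hf_token_here  # if pulling gated models or syncing with the Hub
```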