pawlo2013 committed
Commit d094bd5 · 1 parent: c0f8067

init commit on hf branch
.gitignore CHANGED
@@ -1,3 +1,4 @@
 .env
 .vscode
-.history
+.history
+
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: SciFacts Expert Assistant
+title: SciFacts Expert Assistant with Mistral
 short_description: Verify scientific claims with RAG
 emoji: 🧬
 colorFrom: blue
@@ -12,163 +12,3 @@ license: mit
 ---
 
 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
-# 🧬 SciFacts Expert Assistant
-
-A high-precision **Retrieval-Augmented Generation (RAG)** application designed to verify scientific claims and answer complex biomedical questions using the [SciFacts dataset](https://ir-datasets.com/beir.html#beir/scifact).
-
-This system leverages **LLM-based reranking** to significantly improve retrieval performance, ensuring the chat model receives the most relevant scientific evidence.
-
----
-
-![UI of the system](UI.png "UI of the system")
-
-Check out the Gradio app at https://huggingface.co/spaces/pawlo2013/Scifact_RAG
-
-## ⚡ Technology Stack
-
-| Component | Technology / Model | Why? |
-| :--- | :--- | :--- |
-| **Frontend UI** | **Gradio** | Interactive web interface with streaming chat and real-time dashboard. |
-| **Orchestration** | **LangChain** | Manages the retrieval chains, prompt templates, and LLM interaction. |
-| **Vector Database** | **ChromaDB** | Stores document embeddings for efficient semantic search. |
-| **Embeddings** | **HuggingFace** (`all-MiniLM-L6-v2`) | Converts scientific text into 384-dimensional vectors. |
-| **LLM Provider** | **Groq** | Provides ultra-fast inference for the chat and reranking models. |
-| **Main Model** | **Kimi-k2-instruct** | Handles the final answer synthesis (selected for long-context capabilities). |
-| **Reranker** | **GPT-OSS-120b** | Re-ranks retrieved documents to optimize relevance. |
-
----
-
-## 📊 Performance Benchmark: The Impact of Reranking
-
-We evaluated the retrieval system on an **LLM-generated test set** to measure the impact of adding a reranking step.
-
-### 🏆 Retrieval Evaluation Results
-
-| Metric | Base Retrieval | With Reranker (GPT-OSS-120b) | Improvement |
-| :--- | :---: | :---: | :---: |
-| **Mean Reciprocal Rank (MRR)** | 0.8193 | **0.8480** | 🟢 **+3.5%** |
-| **Normalized DCG (nDCG)** | 0.8079 | **0.8323** | 🟢 **+3.0%** |
-| **Keyword Coverage** | 89.3% | 89.3% | ➖ Same |
-
-> **Insight:** While keyword coverage remained stable, the **reranker** significantly improved ranking quality (MRR & nDCG). Relevant documents are pushed to the top of the context window, reducing hallucinations and improving answer accuracy.
-
----
-
-## 🏗️ System Architecture
-
-1. **Ingestion:** The SciFacts corpus is chunked and embedded using `all-MiniLM-L6-v2`.
-2. **Vector Store:** Embeddings are stored in **ChromaDB** for fast similarity search.
-3. **Retrieval:** Initial fetch of the top-k ($k=20$) documents by cosine similarity.
-4. **Reranking:** The **GPT-OSS-120b** model re-scores the retrieved documents to filter noise, passing only the top ($k=10$) most relevant chunks to the generator.
-5. **Generation:** **Kimi-k2-instruct** synthesizes the final answer from the refined evidence.
-
----
-
-## 🚀 Features
-
-- **Interactive UI:** Built with **Gradio**, featuring streaming responses and a side-by-side view of retrieved evidence.
-- **Reference Questions:** One-click execution of verified ground-truth questions.
-- **Live Evaluation Dashboard:** Built-in dashboard to run and visualize MRR, nDCG, and answer-accuracy metrics in real time.
-- **Dual Evaluation Modes:**
-  - **Canonical:** Standard SciFacts benchmark.
-  - **LLM-Generated:** Synthetic test set for broad coverage.
-
----
-
-## 🛠️ Installation & Setup
-
-### 1. Clone the Repository
-
-```bash
-git clone https://github.com/your-username/scifact-rag.git
-cd scifact-rag
-```
-
-### 2. Install Dependencies
-
-```bash
-pip install -r requirements.txt
-```
-
-_Note: Ensure you have `gradio`, `langchain`, `chromadb`, `pydantic`, and `tiktoken` installed._
-
-### 3. Environment Variables
-
-Create a `.env` file in the root directory:
-
-```env
-GROQ_API_KEY=your_groq_api_key_here
-OPENAI_API_KEY=your_openai_api_key_here  # If using OpenAI for evaluation generation
-HF_TOKEN=your_hf_token_here  # You may also need to log in to Hugging Face or provide a token
-```
-
-### 4. Ingest Data (Build Vector DB)
-
-If you haven't built the database yet:
-
-```bash
-python ingestion.py --corpus_file_path ./scifact/corpus.jsonl --embedding_provider huggingface
-```
-
-### 5. Generate Test Data (Optional)
-
-To create a fresh synthetic test set for evaluation:
-
-```bash
-python generate_tests.py --TOTAL_NUMBER_OF_QUESTIONS 50
-```
-
----
-
-## 🖥️ Running the Application
-
-### Main Chat Interface
-
-Launch the research assistant:
-
-```bash
-python app.py
-```
-
-Access the UI at `http://localhost:7860`
-
-### Evaluation Dashboard
-
-Launch the metrics dashboard to reproduce the benchmark results:
-
-```bash
-python dashboard.py
-```
-
----
-
-## 📂 Project Structure
-
-```text
-├── app.py                 # Main Gradio chat application
-├── evaluator.py           # Evaluation dashboard (metrics visualization)
-├── answer.py              # Core RAG logic (retrieval, reranking, generation)
-├── ingest.py              # Script to load SciFacts into ChromaDB
-├── make_test_answers.py   # LLM-based synthetic test generation
-├── evaluation/
-│   ├── eval.py            # Evaluation logic for retrieval & answers
-│   ├── eval_canonical.py  # Logic for SciFacts standard benchmark
-│   ├── test.py            # Test data loading utilities
-│   └── tests.jsonl        # Generated test questions
-└── scifact/               # Dataset directory
-```
-
----
-
-## 📜 License
-
-This project is open-source and available under the MIT License.
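The README removed in this commit describes a pipeline that retrieves the top $k=20$ chunks and keeps only the top $k=10$ after LLM reranking. A minimal, provider-agnostic sketch of that post-rerank selection step (the `Doc` type and `apply_rank_order` helper are illustrative, not names from the repository):

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


def apply_rank_order(docs: list[Doc], rank_order: list[int], keep_k: int) -> list[Doc]:
    """Reorder retrieved docs by the 1-based indices returned by the
    reranker's structured output, then keep only the top `keep_k`.
    Out-of-range indices (a common LLM failure mode) are skipped."""
    reordered = [docs[i - 1] for i in rank_order if 1 <= i <= len(docs)]
    return reordered[:keep_k]


# e.g. a reranker that ranks doc 3 first, then doc 1, then doc 2
docs = [Doc("a", "..."), Doc("b", "..."), Doc("c", "...")]
top = apply_rank_order(docs, rank_order=[3, 1, 2], keep_k=2)
print([d.doc_id for d in top])  # ['c', 'a']
```

In the actual repository this role is played by the structured-output `RankOrder` model visible in the `answer.py` diff below; the sketch only shows the index bookkeeping.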
__pycache__/answer.cpython-313.pyc ADDED
Binary file (5.99 kB).
answer.py CHANGED
@@ -1,7 +1,6 @@
 from dotenv import load_dotenv
-from langchain_groq import ChatGroq
 from langchain_huggingface import HuggingFaceEmbeddings
-from langchain_openai import ChatOpenAI
+from langchain_mistralai import ChatMistralAI
 from langchain_chroma import Chroma
 from langchain_core.messages import SystemMessage, HumanMessage, convert_to_messages
 from langchain_core.documents import Document
@@ -19,8 +18,9 @@ RETRIEVAL_K = 20
 RETRIEVAL_AFTER_RERANK_K = 10
 
 
-chat_model = "moonshotai/kimi-k2-instruct-0905"
-llm = ChatGroq(temperature=0, model_name=chat_model)
+chat_model = "mistral-large-latest"
+llm = ChatMistralAI(temperature=0, model_name=chat_model)
+
 
 # Embeddings (kept as HuggingFace per your snippet)
 embedding_model = "all-MiniLM-L6-v2"
@@ -36,7 +36,7 @@ retriever = vectorstore.as_retriever()
 # Ensure GROQ_API_KEY is in your .env file
 
 
-reranker_model = "openai/gpt-oss-120b"
+reranker_model = "ministral-14b-latest"
 # reranker_model = "gpt-5-nano"
 
 
@@ -46,12 +46,9 @@ class RankOrder(BaseModel):
 )
 
 
-reranker_llm = ChatGroq(
+reranker_llm = ChatMistralAI(
     temperature=0, model_name=reranker_model
 ).with_structured_output(RankOrder)
-# reranker_llm = ChatOpenAI(
-#     temperature=0, model_name=reranker_model
-# ).with_structured_output(RankOrder)
 
 
 def rerank(question, docs):
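This commit swaps the chat and reranker models but the evaluation metrics cited in the README (MRR and nDCG) are unchanged. For reference, a self-contained sketch of both metrics under binary relevance with a single gold document per query (helper names are ours, not from `evaluation/eval.py`):

```python
import math


def mrr(rankings: list[list[str]], gold_ids: list[str]) -> float:
    """Mean Reciprocal Rank: average of 1/rank of the gold document,
    counting 0 for queries whose gold document was not retrieved."""
    total = 0.0
    for ranking, gold in zip(rankings, gold_ids):
        if gold in ranking:
            total += 1.0 / (ranking.index(gold) + 1)
    return total / len(gold_ids)


def ndcg_binary(ranking: list[str], gold: str) -> float:
    """nDCG for one query with one relevant document: the ideal DCG is 1
    (gold at rank 1), so nDCG reduces to 1/log2(rank + 1)."""
    if gold not in ranking:
        return 0.0
    return 1.0 / math.log2(ranking.index(gold) + 2)


# gold "d1" at rank 2 for query 1 and rank 1 for query 2
print(mrr([["d2", "d1"], ["d1", "d3"]], ["d1", "d1"]))  # 0.75
print(round(ndcg_binary(["d2", "d1"], "d1"), 4))        # 0.6309
```

This is why the reranker can lift MRR/nDCG while keyword coverage stays flat: both metrics reward moving the gold document toward rank 1, not merely retrieving it.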
app.py CHANGED
@@ -111,7 +111,7 @@ def main():
         with gr.Column():
             gr.Markdown(
                 """
-                # 🧬 SciFacts Expert Assistant
+                # 🧬 SciFacts Expert Assistant with Mistral
                 ### Verify scientific claims with high-precision RAG
                 """
             )
@@ -122,17 +122,18 @@
         with gr.Column():
             gr.Markdown(
                 """
-                **🤖 Main Chat Model: Kimi-2**
-                * **Model:** moonshotai/kimi-k2-instruct-0905
+                **🤖 Main Chat Model: Mistral Large 3**
+                * **Model:** mistral-large-latest (with LangChain MistralAI)
                 * **Why:** State-of-the-art long-context understanding.
-                * [🔗 Official Kimi Documentation](https://moonshotai.github.io/Kimi-K2/)
+                * [🔗 Official Mistral Documentation](https://mistral.ai/news/mistral-3)
                 """
             )
         with gr.Column():
             gr.Markdown(
                 """
-                **⚖️ Reranker: GPT-OSS-120b**
-                * **Model:** openai/gpt-oss-120b (via Groq)
+                **⚖️ Reranker: Ministral 14B**
+                * **Model:** ministral-14b-latest (with LangChain MistralAI)
+                * **Why:** Light and fast model ideal for document reranking.
                 * **Function:** Re-scores retrieved documents for relevance.
                 """
             )
db/chroma.sqlite3 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:75de1b661e43aeb9c5433ae415d07dc588dadc9e8379b5441679d9b6a06dcee5
+oid sha256:d00aebcc12a94e8bb5aa1e53a4a861b7c45c6ef4ad11bbade9520a679aef3a60
 size 69177344
evaluation/__pycache__/eval.cpython-313.pyc ADDED
Binary file (7.44 kB).

evaluation/__pycache__/eval_canonical.cpython-313.pyc ADDED
Binary file (6.41 kB).

evaluation/__pycache__/test.cpython-313.pyc ADDED
Binary file (1.78 kB).
pyproject.toml ADDED
@@ -0,0 +1,56 @@
+[project]
+name = "llm-engineering"
+version = "0.1.0"
+requires-python = ">=3.11"
+dependencies = [
+    "anthropic>=0.69.0",
+    "beautifulsoup4>=4.14.2",
+    "chromadb>=1.1.0",
+    "datasets==3.6.0",
+    "feedparser>=6.0.12",
+    "google-genai>=1.41.0",
+    "google-generativeai>=0.8.5",
+    "gradio>=5.47.2,<6.0",
+    "ipykernel>=6.30.1",
+    "ipywidgets>=8.1.7",
+    "jupyter-dash>=0.4.2",
+    "langchain>=0.3.27",
+    "langchain-chroma>=0.2.6",
+    "langchain-community>=0.3.30",
+    "langchain-core>=0.3.76",
+    "langchain-openai>=0.3.33",
+    "langchain-text-splitters>=0.3.11",
+    "litellm>=1.77.5",
+    "matplotlib>=3.10.6",
+    "nbformat>=5.10.4",
+    "modal>=1.1.4",
+    "numpy>=2.3.3",
+    "ollama>=0.6.0",
+    "openai>=1.109.1",
+    "pandas>=2.3.3",
+    "plotly>=6.3.0",
+    "protobuf==3.20.2",
+    "psutil>=7.1.0",
+    "pydub>=0.25.1",
+    "python-dotenv>=1.1.1",
+    "requests>=2.32.5",
+    "scikit-learn>=1.7.2",
+    "scipy>=1.16.2",
+    "sentence-transformers>=5.1.1",
+    "setuptools>=80.9.0",
+    "speedtest-cli>=2.1.3",
+    "tiktoken>=0.11.0",
+    "torch>=2.8.0",
+    "tqdm>=4.67.1",
+    "transformers>=4.56.2",
+    "wandb>=0.22.1",
+    "langchain-huggingface>=1.0.0",
+    "langchain-ollama>=1.0.0",
+    "langchain-anthropic>=1.0.1",
+    "langchain-experimental>=0.0.42",
+    "groq>=0.33.0",
+    "xgboost>=3.1.1",
+    "langchain-groq>=1.0.1",
+    "mistralai>=1.9.11",
+    "langchain-mistralai>=1.1.1",
+]
requirements.txt CHANGED
@@ -4,6 +4,7 @@ langchain-core>=0.1.0
 langchain-groq
 langchain-huggingface
 langchain-openai
+langchain-mistralai
 langchain-chroma
 chromadb>=0.4.0
 pydantic>=2.0.0
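Since the code now instantiates `ChatMistralAI`, the Space presumably needs a Mistral API key at runtime in addition to the variables the old README listed. A hedged `.env` sketch, assuming `langchain-mistralai`'s default environment variable:

```env
MISTRAL_API_KEY=your_mistral_api_key_here
HF_TOKEN=your_hf_token_here  # if pulling gated models or syncing with the Hub
```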