Commit 712059e by chippyjolly · verified · 1 Parent(s): 74e453e

Update README.md

Files changed (1)
  1. README.md +90 -10
README.md CHANGED
@@ -1,14 +1,94 @@
  ---
- title: My-App
- emoji: 🚀
- colorFrom: blue
- colorTo: green
- sdk: gradio
- sdk_version: "4.44.0"
- app_file: app.py
- pinned: false
  ---

- # My Hugging Face Space

- This is a demo Gradio application powered by Groq, LangChain, and FAISS.
 
+ # 🔬 AI Research Companion (Groq + LangChain + FAISS)
+
+ An advanced **AI-powered research assistant** that helps you analyze academic papers, ask natural language questions, generate engaging summaries, and discover related research papers, all from a modern, tabbed **Gradio interface**.
+
+ ---
+
+ ## 🚀 Features
+
+ ✅ **PDF Upload & Text Extraction** – Extracts full text from research papers
+ ✅ **Chunking & Vector Embedding** – Uses LangChain + HuggingFace embeddings for semantic search
+ ✅ **Groq LLM Q&A** – Powered by `llama-3.3-70b-versatile` for accurate, context-aware answers
+ ✅ **Cited Source References** – Displays the exact chunks used for each answer
+ ✅ **Research Paper Summarization** – Creates engaging, layperson-friendly summaries
+ ✅ **Similar Paper Discovery** – Queries arXiv API to find related academic works
+ ✅ **Beautiful Multi-Tab UI** – Fully custom styled with Gradio + CSS
+
  ---
+
+ ## 🛠 Tech Stack
+
+ - **Python 3.9+**
+ - [Gradio](https://gradio.app/) – Interactive UI framework
+ - [LangChain](https://www.langchain.com/) – Document processing & QA chain
+ - [FAISS](https://github.com/facebookresearch/faiss) – Efficient similarity search
+ - [HuggingFace Sentence Transformers](https://www.sbert.net/) – Embeddings (`all-mpnet-base-v2`)
+ - [Groq API](https://groq.com/) – High-performance LLM inference
+ - [PyPDF2](https://pypi.org/project/PyPDF2/) – PDF parsing
+ - [Feedparser](https://pypi.org/project/feedparser/) – arXiv paper search
+ - **Custom CSS** – Modern tabbed layout, shadows, gradients, and animations
+
+ ---
+
+ ## 📦 Installation & Setup
+
+ ### 1️⃣ Clone the Repository
+
+ ```bash
+ git clone https://github.com/bobbythomas985/Research_Assistant
+ cd Research_Assistant
+ ```
+ ### 2️⃣ Install Dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+ ### 3️⃣ Set Up Your API Key
+ Export your Groq API key as an environment variable:
+ **Linux / macOS**
+ ```bash
+ export GROQ_API_KEY="your_api_key_here"
+ ```
+ **Windows**
+ ```powershell
+ setx GROQ_API_KEY "your_api_key_here"
+ ```
+ Alternatively, replace the placeholder in **app.py**:
+ ```python
+ GROQ_API_KEY = os.getenv("GROQ_API_KEY", "your-api-key")
+ ```
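Putting a real key in the fallback default makes it easy to commit the key by accident. A minimal sketch of a stricter alternative reads the variable and fails early when it is missing. The helper name `load_groq_api_key` is hypothetical and is not taken from `app.py`.

```python
import os


def load_groq_api_key(env=None):
    """Return the Groq API key from the environment, or raise if it is unset.

    `load_groq_api_key` is a hypothetical helper, not part of the original app.
    `env` defaults to os.environ; passing a dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GROQ_API_KEY is not set; export it before starting the app."
        )
    return key
```

Calling `load_groq_api_key()` once at startup surfaces a missing key immediately, rather than on the first request to the LLM.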
+ ### 4️⃣ Run the App
+ ```bash
+ python app.py
+ ```
+
  ---
 
+ ## 🖥️ How It Works
+
+ 1️⃣ **Upload a PDF**
+ 📄 The system extracts all text from the research paper.
+
+ 2️⃣ **Process & Embed**
+ 🔍 Splits the extracted text into overlapping chunks and creates a **FAISS vector index** using **HuggingFace embeddings** for efficient semantic search.
+
+ 3️⃣ **Ask Questions**
+ ❓ User questions are converted into embeddings and matched with the most relevant chunks from the document.
+
+ 4️⃣ **LLM Answer Generation**
+ 🤖 Groq’s `llama-3.3-70b-versatile` model is used to generate accurate, context-aware answers with a custom prompt.
+
+ 5️⃣ **Summarize & Discover Papers**
+ 📝 Generates engaging, structured summaries of the document and retrieves similar papers from **arXiv** for further reading.
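The overlapping split in step 2 can be sketched in plain Python. This is a simplified, character-based stand-in for whichever LangChain text splitter the app actually uses. The function name `split_into_chunks` and the default sizes are assumptions chosen for illustration.

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character chunks that overlap.

    Hypothetical helper; the default sizes are illustrative, not the app's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With `chunk_size=4` and `overlap=2`, the string `"abcdefghij"` yields `abcd`, `cdef`, `efgh`, `ghij`. Each chunk repeats the last two characters of the previous one, so text near a boundary appears in both neighbouring chunks.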
+
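The matching in step 3 can be illustrated without FAISS or a real embedding model. The sketch below substitutes a toy bag-of-words vector for the sentence-transformer embedding and a brute-force scan for the FAISS index. The names `embed`, `cosine`, and `top_k` are hypothetical. It only shows the ranking idea: embed the question, score every chunk by cosine similarity, and keep the best few.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question, best first."""
    query = embed(question)
    scored = [(cosine(query, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

A vector index such as FAISS serves the same purpose as the scan in `top_k`, but is built to stay fast when there are many chunks.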
+
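The related-paper lookup in step 5 goes through the public arXiv API, which returns an Atom feed. The sketch below builds a query URL and parses a feed with the standard library's `xml.etree` rather than the Feedparser package listed in the tech stack. The names `build_arxiv_url` and `parse_titles` are hypothetical, and `SAMPLE_FEED` is a hand-written fragment with a placeholder identifier, not real API output.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_arxiv_url(keywords, max_results=5):
    """Build an arXiv API query URL that searches all fields."""
    params = {
        "search_query": "all:" + keywords,
        "start": 0,
        "max_results": max_results,
    }
    return ARXIV_API + "?" + urllib.parse.urlencode(params)


def parse_titles(atom_xml):
    """Extract (title, link) pairs from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    results = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        # Titles in the feed may wrap across lines; collapse the whitespace.
        results.append((" ".join(title.split()), link))
    return results


# Hand-written sample for illustration only; the identifier is a placeholder.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000</id>
    <title>An Example
      Paper Title</title>
  </entry>
</feed>"""
```

In the app, the feed text would come from an HTTP request to the URL from `build_arxiv_url`; the sketch leaves the network call out so it runs offline.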
+ ## 🔮 Future Improvements
+
+ - 📚 **Multi-document support** – Build a single knowledge base from multiple PDFs
+ - 📷 **LLM Reranking** – Use cross-encoder reranking for better context selection
+ - 📑 **Clickable Source References** – Jump directly to relevant sections inside the PDF
+ - 🚀 **Deploy on Hugging Face Spaces / Streamlit Cloud** – Make it public and shareable
+ - 🌍 **Multilingual Q&A** – Integrate translation for global research accessibility
+
+ ---

+ > *Empowering researchers to go from papers → insights → new discoveries.*