snakeeee committed on
Commit d9eac5e · 1 Parent(s): 678baaf

add project documentation

Files changed (1)
  1. README.md +9 -119
README.md CHANGED
@@ -1,119 +1,9 @@
-
- # Scholar RAG Engine
-
- Scholar RAG Engine is a Retrieval-Augmented Generation (RAG) system designed for answering questions from PDFs and web pages.
-
- The system extracts content, builds semantic indexes, retrieves relevant context, and generates answers using an LLM.
-
- ## Features
-
- - PDF document indexing
- - Website content scraping
- - Hybrid semantic retrieval
- - ColBERT-style retrieval
- - Cross-encoder reranking
- - LLM answer generation
- - Modern UI with dark mode
- - Expandable retrieved context viewer
-
- ## Architecture
-
- Pipeline:
-
- User Query
- ↓
- Retriever (ColBERT)
- ↓
- Reranker (Cross Encoder)
- ↓
- Context Compression
- ↓
- LLM (Gemini)
- ↓
- Final Answer
-
- ## Tech Stack
-
- Backend:
- - FastAPI
- - Python
-
- Retrieval:
- - Sentence Transformers
- - FAISS
- - ColBERT-style token similarity
-
- Ranking:
- - Cross Encoder (MS MARCO)
-
- LLM:
- - Google Gemini API
-
- Frontend:
- - HTML
- - CSS
- - JavaScript
-
- Deployment:
- - Hugging Face Spaces
- - Docker
-
- ## Project Structure
- scholar-rag-engine
- │
- ├── main.py
- ├── ingestion.py
- ├── chunking.py
- ├── scraper.py
- ├── retrieval_colbert.py
- ├── reranker.py
- ├── LLM.py
- ├── requirements.txt
- ├── Dockerfile
- │
- └── templates
-     └── index.html
-
-
- ## Installation
-
- Clone the repository
- git clone https://github.com/mr-snake-mr/scholar-rag-engine
- cd scholar-rag-engine
-
-
- Install dependencies
- pip install -r requirements.txt
-
-
- Run the server
- uvicorn main:app --reload
-
-
- Open in browser
- http://localhost:8000
-
-
- ## Environment Variables
-
- Set your Gemini API key:
- GOOGLE_API_KEY=your_gemini_api_key
-
-
- ## Deployment
-
- This project is deployed on Hugging Face Spaces using Docker.
- https://huggingface.co/spaces/snakeeee/scholar-rag-engine
-
-
- ## Future Improvements
-
- - Streaming responses
- - Chat-style UI
- - Multi-document support
- - Vector database integration
- - GPU acceleration
-
- ## Author
-
- Developed as an AI-powered research assistant project.
 
+ ---
+ title: Scholar RAG Engine
+ emoji: 📚
+ colorFrom: blue
+ colorTo: indigo
+ sdk: docker
+ app_file: main.py
+ pinned: false
+ ---