File size: 4,009 Bytes
a7933ea
 
 
 
 
 
 
 
 
 
 
 
e049915
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
---
title: DocuChatDeepSeek
emoji: 
colorFrom: yellow
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Deepseek-DocuChat  Simple, intuitive, and descriptive.
---

📄 DocuChat - AI-Powered RAG Chatbot
DocuChat is a Retrieval-Augmented Generation (RAG) chatbot powered by DeepSeek and built with Streamlit. It allows users to upload documents (PDF, Word, Markdown) or provide a web link, process the content, and ask questions about it. The application uses semantic embeddings and a FAISS vector database for efficient retrieval and question-answering.

🚀 Features
Document Upload: Upload PDF, Word (.docx), or Markdown (.md) files.

Web Link Support: Provide a web link to extract and process content.

Semantic Search: Generate embeddings using sentence-transformers for semantic understanding.

Efficient Retrieval: Store embeddings in a FAISS vector database for fast and accurate querying.

Question-Answering: Use DeepSeek API for intelligent question-answering capabilities.

User-Friendly Interface: Built with Streamlit for an interactive and intuitive UI.

🛠️ Installation
Clone the Repository:

git clone https://github.com/your-username/DocuChat.git
cd DocuChat

Install Dependencies:
Make sure you have Python 3.8+ installed. Then, install the required packages:

pip install -r requirements.txt
Set Up DeepSeek API Key:

Obtain your API key from DeepSeek.

Add the API key in the Streamlit app when prompted.

🖥️ Usage
Run the Application:

streamlit run app.py
Input Your DeepSeek API Key:

Enter your API key in the provided field.

Upload a Document or Enter a Web Link:

Choose between uploading a document (PDF, Word, or Markdown) or providing a web link.

Ask Questions:

Once the document is processed, ask questions about its content.

🧩 How It Works
Document Processing:

The uploaded document or web content is split into smaller chunks for efficient processing.

Semantic embeddings are generated using sentence-transformers.

Vector Database:

Embeddings are stored in a FAISS vector database for fast and accurate retrieval.

Question-Answering:

When a user asks a question, the app retrieves the most relevant chunks from the vector database.

The DeepSeek API generates a response based on the retrieved information.

📂 File Structure
Copy
DocuChat/
├── app.py                  # Main Streamlit application
├── requirements.txt        # List of dependencies
├── README.md               # Project documentation
└── .gitignore              # Files to ignore in Git


📝 Requirements
Python 3.8+
Streamlit
LangChain
FAISS
Sentence-Transformers
PyPDF
Docx2txt
Unstructured (for Markdown files)
WebBaseLoader (for web links)

🔧 Dependencies
Install all dependencies using:
pip install -r requirements.txt


🌟 Why DocuChat?
Efficient: Processes documents once and retrieves answers quickly.

Versatile: Supports multiple file types and web links.

Intelligent: Uses state-of-the-art AI models for semantic understanding and question-answering.

User-Friendly: Simple and intuitive interface powered by Streamlit.

🤝 Contributing
Contributions are welcome! If you'd like to contribute, please follow these steps:

Fork the repository.

Create a new branch (git checkout -b feature/YourFeatureName).

Commit your changes (git commit -m 'Add some feature').

Push to the branch (git push origin feature/YourFeatureName).

Open a pull request.

📜 License
This project is licensed under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments
DeepSeek for providing the question-answering API.

LangChain for the document processing and retrieval framework.

Streamlit for the interactive UI framework.

Sentence-Transformers for semantic embeddings.

📧 Contact
For questions or feedback, feel free to reach out:

sagunchalise@gmail.com

GitHub - https://github.com/schalise

Enjoy using DocuChat! 🎉
Let your documents speak for themselves. 🗣️