---
title: DocuChatDeepSeek
emoji: ⚡
colorFrom: yellow
colorTo: purple
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
short_description: Deepseek-DocuChat – Simple, intuitive, and descriptive.
---

# 📄 DocuChat - AI-Powered RAG Chatbot

DocuChat is a Retrieval-Augmented Generation (RAG) chatbot powered by DeepSeek and built with Streamlit. It allows users to upload documents (PDF, Word, Markdown) or provide a web link, process the content, and ask questions about it. The application uses semantic embeddings and a FAISS vector database for efficient retrieval and question-answering.

## 🚀 Features

- Document Upload: Upload PDF, Word (.docx), or Markdown (.md) files.
- Web Link Support: Provide a web link to extract and process content.
- Semantic Search: Generate embeddings using sentence-transformers for semantic understanding.
- Efficient Retrieval: Store embeddings in a FAISS vector database for fast and accurate querying.
- Question-Answering: Use the DeepSeek API for intelligent question-answering capabilities.
- User-Friendly Interface: Built with Streamlit for an interactive and intuitive UI.

## 🛠️ Installation

1. Clone the Repository:

       git clone https://github.com/your-username/DocuChat.git
       cd DocuChat

2. Install Dependencies: Make sure you have Python 3.8+ installed. Then, install the required packages:

       pip install -r requirements.txt

3. Set Up DeepSeek API Key: Obtain your API key from DeepSeek. Add the API key in the Streamlit app when prompted.

## 🖥️ Usage

1. Run the Application:

       streamlit run app.py

2. Input Your DeepSeek API Key: Enter your API key in the provided field.
3. Upload a Document or Enter a Web Link: Choose between uploading a document (PDF, Word, or Markdown) or providing a web link.
4. Ask Questions: Once the document is processed, ask questions about its content.

## 🧩 How It Works

- Document Processing: The uploaded document or web content is split into smaller chunks for efficient processing. Semantic embeddings are generated using sentence-transformers.
- Vector Database: Embeddings are stored in a FAISS vector database for fast and accurate retrieval.
- Question-Answering: When a user asks a question, the app retrieves the most relevant chunks from the vector database. The DeepSeek API generates a response based on the retrieved information.

## 📂 File Structure

    DocuChat/
    ├── app.py            # Main Streamlit application
    ├── requirements.txt  # List of dependencies
    ├── README.md         # Project documentation
    └── .gitignore        # Files to ignore in Git

## 📝 Requirements

- Python 3.8+
- Streamlit
- LangChain
- FAISS
- Sentence-Transformers
- PyPDF
- Docx2txt
- Unstructured (for Markdown files)
- WebBaseLoader (for web links)

## 🔧 Dependencies

Install all dependencies using:

    pip install -r requirements.txt

## 🌟 Why DocuChat?

- Efficient: Processes documents once and retrieves answers quickly.
- Versatile: Supports multiple file types and web links.
- Intelligent: Uses state-of-the-art AI models for semantic understanding and question-answering.
- User-Friendly: Simple and intuitive interface powered by Streamlit.

## 🤝 Contributing

Contributions are welcome! If you'd like to contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/YourFeatureName`).
3. Commit your changes (`git commit -m 'Add some feature'`).
4. Push to the branch (`git push origin feature/YourFeatureName`).
5. Open a pull request.

## 📜 License

This project is licensed under the MIT License. See the LICENSE file for details.

## 🙏 Acknowledgments

- DeepSeek for providing the question-answering API.
- LangChain for the document processing and retrieval framework.
- Streamlit for the interactive UI framework.
- Sentence-Transformers for semantic embeddings.

## 📧 Contact

For questions or feedback, feel free to reach out:

- Email: sagunchalise@gmail.com
- GitHub: https://github.com/schalise

Enjoy using DocuChat! 🎉 Let your documents speak for themselves. 🗣️
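
For readers who want to see the flow described under How It Works in code, here is a minimal, dependency-free sketch of the chunk, embed, retrieve, and prompt steps. It is not the code in `app.py`. The function names (`split_into_chunks`, `embed`, `retrieve`, `build_prompt`) and the chunk sizes are hypothetical, the bag-of-words `embed` is a toy stand-in for a sentence-transformers model, and the brute-force cosine ranking stands in for a FAISS index. The final prompt is what would be sent to the DeepSeek API; no API call is made here.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (sizes are assumptions)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy stand-in for a sentence-transformers model: word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Brute-force stand-in for a FAISS search: rank chunks by similarity."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the question-answering model."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


document = (
    "FAISS stores the chunk embeddings for fast similarity search. "
    "Streamlit renders the upload widget and the chat interface. "
    "DeepSeek generates the final answer from the retrieved chunks."
)
question = "Which component stores embeddings?"

chunks = split_into_chunks(document, chunk_size=70, overlap=10)
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real deployment the toy `embed` and `retrieve` would be replaced by a sentence-transformers model and a FAISS index, which compare meaning rather than exact word overlap. The overall shape stays the same: process the document once, then answer each question from the few most relevant chunks.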