---
title: Developer Docs Chat
emoji: 📘
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: "5.35.0"
app_file: app.py
pinned: false
---

# 📘 Dev Docs Chat

A retrieval-augmented generation (RAG) app: upload documents, ingest content from URLs, and ask questions that are answered from your own knowledge base.

## 🚀 Features

### 📁 **Document Support**

- **PDF Files**: Extract and process PDF documents
- **Text Files**: Plain text document processing
- **Markdown Files**: Structured markdown with proper parsing
- **URL Ingestion**: Fetch and process content from web URLs

### 🎯 **Core Functionality**

- **Smart Search**: Vector-based semantic search across your documents
- **AI-Powered Q&A**: Get intelligent answers based on your content
- **Conversational Memory**: Maintains context across multiple questions
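
As a rough illustration of what vector-based semantic search means, the sketch below ranks stored chunks by cosine similarity to a query vector. It is a toy, not the app's code: in this project ChromaDB performs the search, and the names `cosine_similarity`, `top_k`, and `INDEX`, along with the hand-written 3-dimensional vectors, are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical toy "embeddings"; a real embedding model produces
# vectors with hundreds of dimensions.
INDEX = [
    ("Install with pip install -r requirements.txt", [0.9, 0.1, 0.0]),
    ("The app listens on port 7860", [0.1, 0.9, 0.1]),
    ("Licensed under MIT", [0.0, 0.1, 0.9]),
]

results = top_k([0.8, 0.2, 0.0], INDEX, k=2)
```

The retrieved chunks, rather than the whole knowledge base, are what get passed to the LLM as context for an answer.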

### 🗂️ **Data Management**

- **File Upload**: Drag-and-drop interface for document ingestion
- **URL Ingestion**: Process web content with progress indicators
- **Delete Operations**: Remove files, URLs, and their embeddings
- **Bulk Clear**: Reset entire knowledge base with one click
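
Deleting a file or URL "and its embeddings" usually means removing every stored chunk that came from that source. The sketch below shows the idea on a plain in-memory list. The record layout and the `source` key are assumptions for illustration. With ChromaDB, a comparable operation is typically a metadata-filtered `collection.delete(where={...})`, but which metadata key this project uses is not stated here.

```python
def delete_source(records, source):
    """Drop every chunk that came from `source`.

    Returns (kept_records, removed_count).
    """
    kept = [record for record in records if record["source"] != source]
    return kept, len(records) - len(kept)


# Hypothetical stored chunks, each tagged with the file or URL it came from.
records = [
    {"source": "uploads/guide.pdf", "text": "chunk 1"},
    {"source": "uploads/guide.pdf", "text": "chunk 2"},
    {"source": "https://example.com/docs", "text": "chunk 3"},
]

kept, removed = delete_source(records, "uploads/guide.pdf")
```

Tagging each chunk with its source at ingestion time is what makes a per-file or per-URL delete possible later.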

## 🛠️ Installation

### Prerequisites

- Python 3.10+
- pip package manager

### Setup Instructions

1. **Clone the repository**

   ```bash
   git clone <repository-url>
   cd dev_docs_chat
   ```

2. **Create virtual environment**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Set up environment variables**
   Create a `.env` file in the project root:

   ```env
   GROQ_API_KEY=your_groq_api_key_here
   GROQ_API_BASE=https://api.groq.com/openai/v1
   ```

5. **Get API Key**
   - Sign up at [Groq](https://console.groq.com/)
   - Generate an API key
   - Add it to your `.env` file
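
To check the `.env` file before starting the app, something like the following standard-library sketch can help. It is not the project's loader (which may well use a package such as python-dotenv). The helper names `parse_env_text` and `missing_settings` are hypothetical, and treating both variables as required is an assumption.

```python
REQUIRED_KEYS = ("GROQ_API_KEY", "GROQ_API_BASE")
PLACEHOLDER = "your_groq_api_key_here"


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def missing_settings(settings):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [
        key
        for key in REQUIRED_KEYS
        if not settings.get(key) or settings[key] == PLACEHOLDER
    ]


# Example: the template from step 4, with the key not yet filled in.
example = parse_env_text(
    "GROQ_API_KEY=your_groq_api_key_here\n"
    "GROQ_API_BASE=https://api.groq.com/openai/v1\n"
)
problems = missing_settings(example)
```

For the real file, pass the contents of `.env` to `parse_env_text` and stop with a clear message if `missing_settings` returns anything.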

## 🚀 Usage

### Starting the Application

```bash
python app.py
```

By default, the application is served at `http://127.0.0.1:7860`.

## 📁 Project Structure

```
dev-docs-chat/
├── app.py              # Main Gradio application
├── qa_pipeline.py      # Question-answering logic
├── ingestion.py        # Document ingestion logic
├── requirements.txt    # Python dependencies
├── .env                # Environment variables (create this)
├── chroma_db/          # Vector database storage
├── uploads/            # Uploaded file storage
├── ingested_urls.txt   # List of ingested URLs
└── README.md           # This file
```

## 🔧 Technical Details

### **Architecture**

- **Vector Database**: ChromaDB for efficient similarity search
- **Embeddings**: HuggingFace sentence-transformers
- **LLM**: A Groq-hosted model, called through the endpoint set in `GROQ_API_BASE`
- **Framework**: Gradio for web interface
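
To show how these pieces typically fit together in a RAG pipeline, here is a minimal sketch of two glue steps: splitting a document into overlapping chunks before embedding, and assembling the prompt from retrieved chunks plus earlier turns. It is an assumption-laden illustration, not the code in `ingestion.py` or `qa_pipeline.py`. The function names, chunk sizes, and prompt wording are all hypothetical.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def build_prompt(question, context_chunks, history=()):
    """Assemble one prompt from retrieved context, prior turns, and the question."""
    context = "\n\n".join(context_chunks)
    past = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{past or '(none)'}\n\n"
        f"Question: {question}"
    )


chunks = split_into_chunks("a" * 500, chunk_size=200, overlap=50)
prompt = build_prompt(
    "How do I start the app?",
    ["Run python app.py to launch the Gradio interface."],
    history=[("What is this project?", "A RAG chat over your documents.")],
)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, and passing `history` into the prompt is one simple way to get conversational memory.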

## 🎯 Use Cases

### **πŸ“š Documentation Assistant**

- Upload project documentation and README files
- Ask questions about implementation details
- Get instant answers about your codebase

### 🔍 **Research Tool**

- Ingest research papers and technical articles
- Ask questions about new technologies
- Stay updated with industry trends

### 📖 **Learning Platform**

- Upload tutorials and educational content
- Ask questions about complex topics
- Get personalized explanations

## 📈 Future Enhancements

- [ ] **Streaming Responses**: Real-time answer generation
- [ ] **File Type Support**: Excel, Word, PowerPoint documents
- [ ] **Advanced Search**: Filters and date-based search
- [ ] **Export Features**: Save conversations and answers
- [ ] **User Authentication**: Multi-user support
- [ ] **API Endpoints**: REST API for integration

## 🀝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- **LangChain**: For the RAG framework
- **ChromaDB**: For vector storage
- **Gradio**: For the web interface
- **Groq**: For fast LLM inference
- **HuggingFace**: For embedding models

---

**Made with ❤️ by Govind Kurapati**