KUNAL SHAW commited on
Commit Β·
f3b2748
1
Parent(s): 74c92bd
chore: initial public commit - IMSKOS core (no secrets)
Browse files- .env.example +16 -0
- LICENSE +21 -0
- README.md +385 -0
- app.py +698 -0
- requirements.txt +34 -0
.env.example
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ==================== IMSKOS Configuration ====================
|
| 2 |
+
# Intelligent Multi-Source Knowledge Orchestration System
|
| 3 |
+
# Environment Variables Configuration
|
| 4 |
+
|
| 5 |
+
# DataStax Astra DB Configuration
|
| 6 |
+
# Get these from: https://astra.datastax.com
|
| 7 |
+
ASTRA_DB_APPLICATION_TOKEN=AstraCS:your_token_here
|
| 8 |
+
ASTRA_DB_ID=your_database_id_here
|
| 9 |
+
|
| 10 |
+
# Groq API Configuration
|
| 11 |
+
# Get your API key from: https://console.groq.com
|
| 12 |
+
GROQ_API_KEY=your_groq_api_key_here
|
| 13 |
+
|
| 14 |
+
# Optional: Application Configuration
|
| 15 |
+
# APP_PORT=8501
|
| 16 |
+
# LOG_LEVEL=INFO
|
LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# MIT License
|
| 2 |
+
|
| 3 |
+
Copyright (c) 2025 [Your Name]
|
| 4 |
+
|
| 5 |
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
| 6 |
+
of this software and associated documentation files (the "Software"), to deal
|
| 7 |
+
in the Software without restriction, including without limitation the rights
|
| 8 |
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
| 9 |
+
copies of the Software, and to permit persons to whom the Software is
|
| 10 |
+
furnished to do so, subject to the following conditions:
|
| 11 |
+
|
| 12 |
+
The above copyright notice and this permission notice shall be included in all
|
| 13 |
+
copies or substantial portions of the Software.
|
| 14 |
+
|
| 15 |
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
| 16 |
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
| 17 |
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
| 18 |
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
| 19 |
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
| 20 |
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
| 21 |
+
SOFTWARE.
|
README.md
ADDED
|
@@ -0,0 +1,385 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# π§ IMSKOS - Intelligent Multi-Source Knowledge Orchestration System
|
| 2 |
+
|
| 3 |
+
[](https://www.python.org/downloads/)
|
| 4 |
+
[](https://langchain.com/)
|
| 5 |
+
[](https://github.com/langchain-ai/langgraph)
|
| 6 |
+
[](https://streamlit.io/)
|
| 7 |
+
[](https://opensource.org/licenses/MIT)
|
| 8 |
+
|
| 9 |
+
> **Enterprise-Grade Agentic RAG Framework with Adaptive Query Routing**
|
| 10 |
+
|
| 11 |
+
An advanced production-ready system that intelligently orchestrates knowledge retrieval from multiple sources using state-of-the-art LangGraph workflows, distributed vector storage with DataStax Astra DB, and high-performance LLM inference via Groq.
|
| 12 |
+
|
| 13 |
+
---
|
| 14 |
+
|
| 15 |
+
## π― Project Overview
|
| 16 |
+
|
| 17 |
+
**IMSKOS** represents a paradigm shift in intelligent information retrieval by combining:
|
| 18 |
+
|
| 19 |
+
- **π Adaptive Query Routing**: LLM-powered decision engine that dynamically routes queries to optimal data sources
|
| 20 |
+
- **ποΈ Distributed Vector Storage**: Scalable DataStax Astra DB for production-grade vector operations
|
| 21 |
+
- **β‘ High-Performance Inference**: Groq's lightning-fast LLM API for sub-second responses
|
| 22 |
+
- **π Stateful Workflows**: LangGraph for complex, multi-step retrieval orchestration
|
| 23 |
+
- **π¨ Modern UI/UX**: Professional Streamlit interface with real-time analytics
|
| 24 |
+
|
| 25 |
+
---
|
| 26 |
+
|
| 27 |
+
## ποΈ System Architecture
|
| 28 |
+
|
| 29 |
+
```
|
| 30 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 31 |
+
β User Query Interface β
|
| 32 |
+
β (Streamlit App) β
|
| 33 |
+
ββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ
|
| 34 |
+
β
|
| 35 |
+
βΌ
|
| 36 |
+
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 37 |
+
β Intelligent Query Router (Groq LLM) β
|
| 38 |
+
β Analyzes query β Determines optimal source β
|
| 39 |
+
ββββββββββββββββ¬βββββββββββββββββββββββββββββ¬ββββββββββββββββββ
|
| 40 |
+
β β
|
| 41 |
+
βΌ βΌ
|
| 42 |
+
ββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββ
|
| 43 |
+
β Vector Store Retrieval β β Wikipedia External Search β
|
| 44 |
+
β (Astra DB + Cassandra) β β (LangChain Wikipedia Tool) β
|
| 45 |
+
β - AI/ML Content β β - General Knowledge β
|
| 46 |
+
β - Technical Docs β β - Current Events β
|
| 47 |
+
ββββββββββββββββ¬ββββββββββββ ββββββββββββββββ¬ββββββββββββββββ
|
| 48 |
+
β β
|
| 49 |
+
ββββββββββββββββ¬ββββββββββββββββ
|
| 50 |
+
βΌ
|
| 51 |
+
βββββββββββββββββββββββ
|
| 52 |
+
β LangGraph Workflowβ
|
| 53 |
+
β State Management β
|
| 54 |
+
β Result Aggregationβ
|
| 55 |
+
ββββββββββββ¬βββββββββββ
|
| 56 |
+
βΌ
|
| 57 |
+
βββββββββββββββββββββββ
|
| 58 |
+
β Formatted Response β
|
| 59 |
+
β + Analytics β
|
| 60 |
+
βββββββββββββββββββββββ
|
| 61 |
+
```
|
| 62 |
+
|
| 63 |
+
---
|
| 64 |
+
|
| 65 |
+
## β¨ Key Features
|
| 66 |
+
|
| 67 |
+
### π― Intelligent Capabilities
|
| 68 |
+
|
| 69 |
+
| Feature | Description | Technology |
|
| 70 |
+
|---------|-------------|------------|
|
| 71 |
+
| **Adaptive Routing** | Context-aware query routing to optimal data sources | Groq LLM + Pydantic |
|
| 72 |
+
| **Semantic Search** | Deep semantic understanding with transformer embeddings | HuggingFace Embeddings |
|
| 73 |
+
| **Multi-Source Fusion** | Seamless integration of proprietary and public knowledge | LangGraph |
|
| 74 |
+
| **Real-time Analytics** | Query performance monitoring and routing statistics | Streamlit |
|
| 75 |
+
| **Scalable Storage** | Distributed vector database with auto-scaling | DataStax Astra DB |
|
| 76 |
+
|
| 77 |
+
### π§ Technical Highlights
|
| 78 |
+
|
| 79 |
+
- **ποΈ Production-Ready Architecture**: Modular design with separation of concerns
|
| 80 |
+
- **π Security-First**: Environment variable management, no hardcoded credentials
|
| 81 |
+
- **π Observable**: Built-in analytics dashboard and query history
|
| 82 |
+
- **π Performance Optimized**: Caching, efficient document chunking, parallel processing
|
| 83 |
+
- **π¨ Professional UI**: Modern, responsive interface with custom CSS styling
|
| 84 |
+
- **π Scalable**: Handles growing document collections without performance degradation
|
| 85 |
+
|
| 86 |
+
---
|
| 87 |
+
|
| 88 |
+
## π Quick Start
|
| 89 |
+
|
| 90 |
+
### Prerequisites
|
| 91 |
+
|
| 92 |
+
- Python 3.9 or higher
|
| 93 |
+
- DataStax Astra DB account ([Sign up free](https://astra.datastax.com))
|
| 94 |
+
- Groq API key ([Get API key](https://console.groq.com))
|
| 95 |
+
|
| 96 |
+
### Installation
|
| 97 |
+
|
| 98 |
+
1. **Clone the repository:**
|
| 99 |
+
```bash
|
| 100 |
+
git clone https://github.com/yourusername/IMSKOS.git
|
| 101 |
+
cd IMSKOS
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
2. **Create virtual environment:**
|
| 105 |
+
```bash
|
| 106 |
+
python -m venv venv
|
| 107 |
+
|
| 108 |
+
# Windows
|
| 109 |
+
venv\Scripts\activate
|
| 110 |
+
|
| 111 |
+
# Linux/Mac
|
| 112 |
+
source venv/bin/activate
|
| 113 |
+
```
|
| 114 |
+
|
| 115 |
+
3. **Install dependencies:**
|
| 116 |
+
```bash
|
| 117 |
+
pip install -r requirements.txt
|
| 118 |
+
```
|
| 119 |
+
|
| 120 |
+
4. **Configure environment variables:**
|
| 121 |
+
```bash
|
| 122 |
+
# Copy example file
|
| 123 |
+
cp .env.example .env
|
| 124 |
+
|
| 125 |
+
# Edit .env with your credentials
|
| 126 |
+
# ASTRA_DB_APPLICATION_TOKEN=your_token_here
|
| 127 |
+
# ASTRA_DB_ID=your_database_id_here
|
| 128 |
+
# GROQ_API_KEY=your_groq_api_key_here
|
| 129 |
+
```
|
| 130 |
+
|
| 131 |
+
5. **Run the application:**
|
| 132 |
+
```bash
|
| 133 |
+
streamlit run app.py
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
6. **Access the application:**
|
| 137 |
+
Open your browser and navigate to `http://localhost:8501`
|
| 138 |
+
|
| 139 |
+
---
|
| 140 |
+
|
| 141 |
+
## π Usage Guide
|
| 142 |
+
|
| 143 |
+
### Step 1: Index Your Knowledge Base
|
| 144 |
+
|
| 145 |
+
1. Navigate to the **"Knowledge Base Indexing"** tab
|
| 146 |
+
2. Add URLs of documents you want to index (default includes AI/ML research papers)
|
| 147 |
+
3. Click **"Index Documents"** to process and store in Astra DB
|
| 148 |
+
4. Wait for the indexing process to complete (progress shown in real-time)
|
| 149 |
+
|
| 150 |
+
### Step 2: Execute Intelligent Queries
|
| 151 |
+
|
| 152 |
+
1. Switch to the **"Intelligent Query"** tab
|
| 153 |
+
2. Enter your question in the text input
|
| 154 |
+
3. Click **"Execute Query"**
|
| 155 |
+
4. The system will:
|
| 156 |
+
- Analyze your query
|
| 157 |
+
- Route to optimal data source (Vector Store or Wikipedia)
|
| 158 |
+
- Retrieve relevant information
|
| 159 |
+
- Display results with metadata
|
| 160 |
+
|
| 161 |
+
### Step 3: Monitor Performance
|
| 162 |
+
|
| 163 |
+
1. Visit the **"Analytics"** tab to see:
|
| 164 |
+
- Total queries executed
|
| 165 |
+
- Routing distribution (Vector Store vs Wikipedia)
|
| 166 |
+
- Average execution time
|
| 167 |
+
- Complete query history
|
| 168 |
+
|
| 169 |
+
---
|
| 170 |
+
|
| 171 |
+
## π Example Queries
|
| 172 |
+
|
| 173 |
+
### Vector Store Queries (Routed to Astra DB)
|
| 174 |
+
```
|
| 175 |
+
β
"What are the types of agent memory?"
|
| 176 |
+
β
"Explain chain of thought prompting techniques"
|
| 177 |
+
β
"How do adversarial attacks work on large language models?"
|
| 178 |
+
β
"What is ReAct prompting?"
|
| 179 |
+
```
|
| 180 |
+
|
| 181 |
+
### Wikipedia Queries (Routed to External Search)
|
| 182 |
+
```
|
| 183 |
+
β
"Who is Elon Musk?"
|
| 184 |
+
β
"What is quantum computing?"
|
| 185 |
+
β
"Tell me about the Marvel Avengers"
|
| 186 |
+
β
"History of artificial intelligence"
|
| 187 |
+
```
|
| 188 |
+
|
| 189 |
+
---
|
| 190 |
+
|
| 191 |
+
## π’ Production Deployment
|
| 192 |
+
|
| 193 |
+
### Deploying to Streamlit Cloud
|
| 194 |
+
|
| 195 |
+
1. **Push to GitHub:**
|
| 196 |
+
```bash
|
| 197 |
+
git init
|
| 198 |
+
git add .
|
| 199 |
+
git commit -m "Initial commit: IMSKOS production deployment"
|
| 200 |
+
git branch -M main
|
| 201 |
+
git remote add origin https://github.com/yourusername/IMSKOS.git
|
| 202 |
+
git push -u origin main
|
| 203 |
+
```
|
| 204 |
+
|
| 205 |
+
2. **Configure Streamlit Cloud:**
|
| 206 |
+
- Go to [share.streamlit.io](https://share.streamlit.io)
|
| 207 |
+
- Click "New app"
|
| 208 |
+
- Select your repository
|
| 209 |
+
- Set main file: `app.py`
|
| 210 |
+
- Add secrets in "Advanced settings":
|
| 211 |
+
```toml
|
| 212 |
+
ASTRA_DB_APPLICATION_TOKEN = "your_token"
|
| 213 |
+
ASTRA_DB_ID = "your_database_id"
|
| 214 |
+
GROQ_API_KEY = "your_groq_key"
|
| 215 |
+
```
|
| 216 |
+
|
| 217 |
+
3. **Deploy!**
|
| 218 |
+
|
| 219 |
+
### Alternative Deployment Options
|
| 220 |
+
|
| 221 |
+
#### Docker Deployment
|
| 222 |
+
```dockerfile
|
| 223 |
+
# Dockerfile
|
| 224 |
+
FROM python:3.9-slim
|
| 225 |
+
|
| 226 |
+
WORKDIR /app
|
| 227 |
+
COPY requirements.txt .
|
| 228 |
+
RUN pip install -r requirements.txt
|
| 229 |
+
|
| 230 |
+
COPY . .
|
| 231 |
+
|
| 232 |
+
EXPOSE 8501
|
| 233 |
+
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
|
| 234 |
+
```
|
| 235 |
+
|
| 236 |
+
```bash
|
| 237 |
+
# Build and run
|
| 238 |
+
docker build -t imskos .
|
| 239 |
+
docker run -p 8501:8501 --env-file .env imskos
|
| 240 |
+
```
|
| 241 |
+
|
| 242 |
+
#### AWS/GCP/Azure Deployment
|
| 243 |
+
See detailed deployment guides in the `/docs` folder (coming soon).
|
| 244 |
+
|
| 245 |
+
---
|
| 246 |
+
|
| 247 |
+
## π§ Configuration
|
| 248 |
+
|
| 249 |
+
### Environment Variables
|
| 250 |
+
|
| 251 |
+
| Variable | Description | Required | Default |
|
| 252 |
+
|----------|-------------|----------|---------|
|
| 253 |
+
| `ASTRA_DB_APPLICATION_TOKEN` | DataStax Astra DB token | Yes | - |
|
| 254 |
+
| `ASTRA_DB_ID` | Astra DB instance ID | Yes | - |
|
| 255 |
+
| `GROQ_API_KEY` | Groq API authentication key | Yes | - |
|
| 256 |
+
|
| 257 |
+
### Customization Options
|
| 258 |
+
|
| 259 |
+
**Modify document chunking:**
|
| 260 |
+
```python
|
| 261 |
+
# In app.py - KnowledgeBaseManager.load_and_process_documents()
|
| 262 |
+
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
|
| 263 |
+
chunk_size=500, # Adjust chunk size
|
| 264 |
+
chunk_overlap=50 # Adjust overlap
|
| 265 |
+
)
|
| 266 |
+
```
|
| 267 |
+
|
| 268 |
+
**Change embedding model:**
|
| 269 |
+
```python
|
| 270 |
+
# In app.py - KnowledgeBaseManager.setup_embeddings()
|
| 271 |
+
self.embeddings = HuggingFaceEmbeddings(
|
| 272 |
+
model_name="all-MiniLM-L6-v2" # Try: "all-mpnet-base-v2" for higher quality
|
| 273 |
+
)
|
| 274 |
+
```
|
| 275 |
+
|
| 276 |
+
**Adjust LLM parameters:**
|
| 277 |
+
```python
|
| 278 |
+
# In app.py - IntelligentRouter.initialize()
|
| 279 |
+
self.llm = ChatGroq(
|
| 280 |
+
model_name="llama-3.1-8b-instant", # Try other Groq models
|
| 281 |
+
temperature=0 # Increase for more creative responses
|
| 282 |
+
)
|
| 283 |
+
```
|
| 284 |
+
|
| 285 |
+
---
|
| 286 |
+
|
| 287 |
+
## π Performance Benchmarks
|
| 288 |
+
|
| 289 |
+
| Metric | Value | Notes |
|
| 290 |
+
|--------|-------|-------|
|
| 291 |
+
| **Query Latency** | < 2s | Average end-to-end response time |
|
| 292 |
+
| **Embedding Generation** | ~100ms | Per document chunk |
|
| 293 |
+
| **Vector Search** | < 500ms | Top-K retrieval from Astra DB |
|
| 294 |
+
| **LLM Routing** | < 300ms | Groq inference time |
|
| 295 |
+
| **Concurrent Users** | 50+ | Tested on Streamlit Cloud |
|
| 296 |
+
|
| 297 |
+
---
|
| 298 |
+
|
| 299 |
+
## π οΈ Technology Stack
|
| 300 |
+
|
| 301 |
+
### Core Framework
|
| 302 |
+
- **[Streamlit](https://streamlit.io/)** - Interactive web application framework
|
| 303 |
+
- **[LangChain](https://langchain.com/)** - LLM application framework
|
| 304 |
+
- **[LangGraph](https://github.com/langchain-ai/langgraph)** - Stateful workflow orchestration
|
| 305 |
+
|
| 306 |
+
### AI/ML Components
|
| 307 |
+
- **[Groq](https://groq.com/)** - High-performance LLM inference
|
| 308 |
+
- **[HuggingFace Transformers](https://huggingface.co/)** - Sentence embeddings
|
| 309 |
+
- **[DataStax Astra DB](https://astra.datastax.com)** - Vector database
|
| 310 |
+
|
| 311 |
+
### Supporting Libraries
|
| 312 |
+
- **Pydantic** - Data validation and settings management
|
| 313 |
+
- **BeautifulSoup4** - Web scraping and HTML parsing
|
| 314 |
+
- **TikToken** - Token counting and text splitting
|
| 315 |
+
- **Wikipedia API** - External knowledge retrieval
|
| 316 |
+
|
| 317 |
+
---
|
| 318 |
+
|
| 319 |
+
## π Roadmap
|
| 320 |
+
|
| 321 |
+
### Version 1.1 (Planned)
|
| 322 |
+
- [ ] Multi-modal support (images, PDFs)
|
| 323 |
+
- [ ] Advanced RAG techniques (HyDE, Multi-Query)
|
| 324 |
+
- [ ] Custom document upload via UI
|
| 325 |
+
- [ ] Export results to PDF/Markdown
|
| 326 |
+
- [ ] User authentication & session management
|
| 327 |
+
|
| 328 |
+
### Version 2.0 (Future)
|
| 329 |
+
- [ ] Multi-language support
|
| 330 |
+
- [ ] Graph RAG integration
|
| 331 |
+
- [ ] Real-time collaborative features
|
| 332 |
+
- [ ] API endpoints for programmatic access
|
| 333 |
+
- [ ] Advanced analytics dashboard
|
| 334 |
+
|
| 335 |
+
---
|
| 336 |
+
|
| 337 |
+
## π€ Contributing
|
| 338 |
+
|
| 339 |
+
Contributions are welcome! Please follow these steps:
|
| 340 |
+
|
| 341 |
+
1. Fork the repository
|
| 342 |
+
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
|
| 343 |
+
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
|
| 344 |
+
4. Push to the branch (`git push origin feature/AmazingFeature`)
|
| 345 |
+
5. Open a Pull Request
|
| 346 |
+
|
| 347 |
+
---
|
| 348 |
+
|
| 349 |
+
## π License
|
| 350 |
+
|
| 351 |
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
| 352 |
+
|
| 353 |
+
---
|
| 354 |
+
|
| 355 |
+
## π Acknowledgments
|
| 356 |
+
|
| 357 |
+
- LangChain team for the amazing framework
|
| 358 |
+
- DataStax for Astra DB and Cassandra support
|
| 359 |
+
- Groq for lightning-fast LLM inference
|
| 360 |
+
- HuggingFace for open-source embeddings
|
| 361 |
+
- Streamlit for the intuitive app framework
|
| 362 |
+
|
| 363 |
+
---
|
| 364 |
+
|
| 365 |
+
## π Contact & Support
|
| 366 |
+
|
| 367 |
+
- **GitHub Issues**: [Report bugs or request features](https://github.com/yourusername/IMSKOS/issues)
|
| 368 |
+
- **Email**: your.email@example.com
|
| 369 |
+
- **LinkedIn**: [Your Profile](https://linkedin.com/in/yourprofile)
|
| 370 |
+
|
| 371 |
+
---
|
| 372 |
+
|
| 373 |
+
## π Star History
|
| 374 |
+
|
| 375 |
+
If you find this project useful, please consider giving it a β!
|
| 376 |
+
|
| 377 |
+
---
|
| 378 |
+
|
| 379 |
+
<div align="center">
|
| 380 |
+
|
| 381 |
+
**Built with β€οΈ using LangGraph, Astra DB, and Groq**
|
| 382 |
+
|
| 383 |
+
*Elevating Information Retrieval to Intelligence*
|
| 384 |
+
|
| 385 |
+
</div>
|
app.py
ADDED
|
@@ -0,0 +1,698 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
π€ Intelligent Multi-Source Knowledge Orchestration System (IMSKOS)
|
| 3 |
+
================================================================
|
| 4 |
+
Advanced Agentic RAG Framework with Dynamic Routing & Distributed Vector Storage
|
| 5 |
+
|
| 6 |
+
An enterprise-grade, production-ready intelligent query routing system that leverages:
|
| 7 |
+
- LangGraph for stateful workflow orchestration
|
| 8 |
+
- DataStax Astra DB for distributed vector storage
|
| 9 |
+
- Groq LLM for high-performance inference
|
| 10 |
+
- Adaptive routing between proprietary knowledge base and Wikipedia
|
| 11 |
+
- Real-time semantic search with HuggingFace embeddings
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
import streamlit as st
|
| 15 |
+
import os
|
| 16 |
+
from typing import List, Dict, Any
|
| 17 |
+
import cassio
|
| 18 |
+
from langchain_text_splitters import RecursiveCharacterTextSplitter
|
| 19 |
+
from langchain_community.document_loaders import WebBaseLoader
|
| 20 |
+
from langchain_community.vectorstores import Cassandra
|
| 21 |
+
from langchain_huggingface import HuggingFaceEmbeddings
|
| 22 |
+
from langchain_community.utilities import WikipediaAPIWrapper
|
| 23 |
+
from langchain_community.tools import WikipediaQueryRun
|
| 24 |
+
from langchain_core.prompts import ChatPromptTemplate
|
| 25 |
+
from langchain_groq import ChatGroq
|
| 26 |
+
from langchain_core.documents import Document
|
| 27 |
+
from langgraph.graph import END, StateGraph, START
|
| 28 |
+
from typing_extensions import TypedDict
|
| 29 |
+
from pydantic import BaseModel, Field
|
| 30 |
+
from typing import Literal
|
| 31 |
+
import time
|
| 32 |
+
import json
|
| 33 |
+
from datetime import datetime
|
| 34 |
+
|
| 35 |
+
# Page Configuration
|
| 36 |
+
st.set_page_config(
|
| 37 |
+
page_title="IMSKOS - Intelligent Knowledge Orchestration",
|
| 38 |
+
page_icon="π§ ",
|
| 39 |
+
layout="wide",
|
| 40 |
+
initial_sidebar_state="expanded"
|
| 41 |
+
)
|
| 42 |
+
|
| 43 |
+
# Custom CSS for modern UI
|
| 44 |
+
st.markdown("""
|
| 45 |
+
<style>
|
| 46 |
+
.main-header {
|
| 47 |
+
font-size: 3rem;
|
| 48 |
+
font-weight: bold;
|
| 49 |
+
background: linear-gradient(120deg, #667eea 0%, #764ba2 100%);
|
| 50 |
+
-webkit-background-clip: text;
|
| 51 |
+
-webkit-text-fill-color: transparent;
|
| 52 |
+
text-align: center;
|
| 53 |
+
padding: 1rem 0;
|
| 54 |
+
}
|
| 55 |
+
.sub-header {
|
| 56 |
+
text-align: center;
|
| 57 |
+
color: #666;
|
| 58 |
+
font-size: 1.2rem;
|
| 59 |
+
margin-bottom: 2rem;
|
| 60 |
+
}
|
| 61 |
+
.metric-card {
|
| 62 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 63 |
+
padding: 1.5rem;
|
| 64 |
+
border-radius: 10px;
|
| 65 |
+
color: white;
|
| 66 |
+
box-shadow: 0 4px 6px rgba(0,0,0,0.1);
|
| 67 |
+
}
|
| 68 |
+
.info-box {
|
| 69 |
+
background-color: #f0f2f6;
|
| 70 |
+
padding: 1.5rem;
|
| 71 |
+
border-radius: 10px;
|
| 72 |
+
border-left: 5px solid #667eea;
|
| 73 |
+
margin: 1rem 0;
|
| 74 |
+
}
|
| 75 |
+
.stButton>button {
|
| 76 |
+
background: linear-gradient(120deg, #667eea 0%, #764ba2 100%);
|
| 77 |
+
color: white;
|
| 78 |
+
font-weight: bold;
|
| 79 |
+
border-radius: 10px;
|
| 80 |
+
padding: 0.5rem 2rem;
|
| 81 |
+
border: none;
|
| 82 |
+
box-shadow: 0 4px 6px rgba(0,0,0,0.1);
|
| 83 |
+
}
|
| 84 |
+
.success-box {
|
| 85 |
+
background-color: #d4edda;
|
| 86 |
+
border-left: 5px solid #28a745;
|
| 87 |
+
padding: 1rem;
|
| 88 |
+
border-radius: 5px;
|
| 89 |
+
margin: 1rem 0;
|
| 90 |
+
}
|
| 91 |
+
.route-indicator {
|
| 92 |
+
display: inline-block;
|
| 93 |
+
padding: 0.5rem 1rem;
|
| 94 |
+
border-radius: 20px;
|
| 95 |
+
font-weight: bold;
|
| 96 |
+
margin: 0.5rem 0;
|
| 97 |
+
}
|
| 98 |
+
.route-vector {
|
| 99 |
+
background-color: #e3f2fd;
|
| 100 |
+
color: #1565c0;
|
| 101 |
+
}
|
| 102 |
+
.route-wiki {
|
| 103 |
+
background-color: #fff3e0;
|
| 104 |
+
color: #e65100;
|
| 105 |
+
}
|
| 106 |
+
</style>
|
| 107 |
+
""", unsafe_allow_html=True)
|
| 108 |
+
|
| 109 |
+
# ==================== Configuration & Initialization ====================
|
| 110 |
+
|
| 111 |
+
class Config:
|
| 112 |
+
"""Centralized configuration management"""
|
| 113 |
+
|
| 114 |
+
@staticmethod
|
| 115 |
+
def load_env_variables():
|
| 116 |
+
"""Load and validate environment variables"""
|
| 117 |
+
required_vars = {
|
| 118 |
+
"ASTRA_DB_APPLICATION_TOKEN": os.getenv("ASTRA_DB_APPLICATION_TOKEN"),
|
| 119 |
+
"ASTRA_DB_ID": os.getenv("ASTRA_DB_ID"),
|
| 120 |
+
"GROQ_API_KEY": os.getenv("GROQ_API_KEY")
|
| 121 |
+
}
|
| 122 |
+
|
| 123 |
+
missing_vars = [key for key, value in required_vars.items() if not value]
|
| 124 |
+
|
| 125 |
+
if missing_vars:
|
| 126 |
+
st.error(f"β οΈ Missing environment variables: {', '.join(missing_vars)}")
|
| 127 |
+
st.info("Please set these in your .env file or Streamlit secrets")
|
| 128 |
+
st.stop()
|
| 129 |
+
|
| 130 |
+
return required_vars
|
| 131 |
+
|
| 132 |
+
@staticmethod
|
| 133 |
+
def get_default_urls():
|
| 134 |
+
"""Default knowledge base URLs"""
|
| 135 |
+
return [
|
| 136 |
+
"https://lilianweng.github.io/posts/2023-06-23-agent/",
|
| 137 |
+
"https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
|
| 138 |
+
"https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
|
| 139 |
+
]
|
| 140 |
+
|
| 141 |
+
# ==================== State Management Classes ====================
|
| 142 |
+
|
| 143 |
+
class RouteQuery(BaseModel):
|
| 144 |
+
"""Pydantic model for query routing decisions"""
|
| 145 |
+
datasource: Literal["vectorstore", "wiki_search"] = Field(
|
| 146 |
+
...,
|
| 147 |
+
description="Route query to wikipedia or vectorstore based on content",
|
| 148 |
+
)
|
| 149 |
+
|
| 150 |
+
class GraphState(TypedDict):
|
| 151 |
+
"""LangGraph state schema"""
|
| 152 |
+
question: str
|
| 153 |
+
generation: str
|
| 154 |
+
documents: List[str]
|
| 155 |
+
|
| 156 |
+
# ==================== Core System Classes ====================
|
| 157 |
+
|
class KnowledgeBaseManager:
    """Manages document ingestion and vector storage"""

    def __init__(self, astra_token: str, astra_db_id: str):
        self.astra_token = astra_token
        self.astra_db_id = astra_db_id
        self.embeddings = None
        self.vector_store = None

    def initialize_cassandra(self):
        """Initialize Cassandra connection"""
        cassio.init(token=self.astra_token, database_id=self.astra_db_id)

    def setup_embeddings(self):
        """Initialize HuggingFace embeddings"""
        self.embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

    def load_and_process_documents(self, urls: List[str], progress_callback=None):
        """Load documents from URLs and split them into chunks"""
        if progress_callback:
            progress_callback("Loading documents from URLs...")

        docs = []
        for i, url in enumerate(urls):
            try:
                loader = WebBaseLoader(url)
                docs.extend(loader.load())
                if progress_callback:
                    progress_callback(f"Loaded {i + 1}/{len(urls)} documents")
            except Exception as e:
                st.warning(f"Failed to load {url}: {str(e)}")

        if progress_callback:
            progress_callback("Splitting documents into chunks...")

        text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
            chunk_size=500, chunk_overlap=50
        )
        doc_splits = text_splitter.split_documents(docs)

        return doc_splits

    def create_vector_store(self):
        """Initialize Astra DB vector store"""
        self.vector_store = Cassandra(
            embedding=self.embeddings,
            table_name="intelligent_knowledge_base",
            session=None,
            keyspace=None
        )
        return self.vector_store

    def add_documents(self, documents: List[Document], progress_callback=None):
        """Add document chunks to the vector store"""
        if progress_callback:
            progress_callback("Indexing documents in Astra DB...")

        self.vector_store.add_documents(documents)

        if progress_callback:
            progress_callback(f"Successfully indexed {len(documents)} document chunks")

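The splitter above advances a fixed-size window with a 50-token overlap so adjacent chunks share context. A minimal, dependency-free sketch of that behavior (whitespace tokens stand in for tiktoken tokens; `split_with_overlap` is an illustrative helper, not part of the app):

```python
# Sketch of fixed-size chunking with overlap, analogous to the
# chunk_size=500 / chunk_overlap=50 splitter used by KnowledgeBaseManager.
# The window advances by (chunk_size - chunk_overlap) tokens each step.
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list:
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

text = " ".join(f"w{i}" for i in range(1200))
chunks = split_with_overlap(text, chunk_size=500, chunk_overlap=50)
# 1200 tokens yield three chunks; consecutive chunks share their last/first 50 tokens.
```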
class IntelligentRouter:
    """LLM-powered query router"""

    def __init__(self, groq_api_key: str):
        self.groq_api_key = groq_api_key
        self.llm = None
        self.question_router = None

    def initialize(self):
        """Set up LLM and routing chain"""
        self.llm = ChatGroq(
            groq_api_key=self.groq_api_key,
            model_name="llama-3.1-8b-instant",
            temperature=0
        )

        structured_llm = self.llm.with_structured_output(RouteQuery)

        system_prompt = """You are an expert at routing user questions to the most relevant data source.

The vectorstore contains specialized documents about:
- AI Agents and their architectures
- Prompt Engineering techniques and best practices
- Adversarial attacks on Large Language Models
- Machine Learning security concepts

Route to 'vectorstore' for questions about these topics.
Route to 'wiki_search' for general knowledge, current events, people, places, or topics outside the vectorstore domain.

Be precise in your routing decisions."""

        route_prompt = ChatPromptTemplate.from_messages([
            ("system", system_prompt),
            ("human", "{question}"),
        ])

        self.question_router = route_prompt | structured_llm

    def route(self, question: str) -> str:
        """Route question to appropriate data source"""
        result = self.question_router.invoke({"question": question})
        return result.datasource

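The router's contract is simple: map a question string to one of two datasource labels. A hypothetical keyword-based fallback (not the app's LLM router) illustrates that same contract and could serve as a cheap stand-in when the LLM is unavailable:

```python
# Hypothetical keyword-based fallback router. It satisfies the same
# contract as IntelligentRouter.route: question string in, one of
# "vectorstore" / "wiki_search" out. Purely illustrative.
VECTORSTORE_TOPICS = ("agent", "prompt", "adversarial", "llm", "attack", "memory")

def route_fallback(question: str) -> str:
    q = question.lower()
    if any(topic in q for topic in VECTORSTORE_TOPICS):
        return "vectorstore"
    return "wiki_search"
```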
class AdaptiveRAGWorkflow:
    """LangGraph-based adaptive retrieval workflow"""

    def __init__(self, vector_store, question_router):
        self.vector_store = vector_store
        self.question_router = question_router
        self.retriever = vector_store.as_retriever(search_kwargs={"k": 4})
        self.wiki = self._setup_wikipedia()
        self.workflow = None
        self.app = None

    def _setup_wikipedia(self):
        """Initialize Wikipedia search tool"""
        api_wrapper = WikipediaAPIWrapper(
            top_k_results=1,
            doc_content_chars_max=500
        )
        return WikipediaQueryRun(api_wrapper=api_wrapper)

    def retrieve(self, state: Dict) -> Dict:
        """Retrieve from vector store"""
        question = state["question"]
        documents = self.retriever.invoke(question)
        return {"documents": documents, "question": question}

    def wiki_search(self, state: Dict) -> Dict:
        """Search Wikipedia"""
        question = state["question"]
        docs = self.wiki.invoke({"query": question})
        wiki_results = Document(page_content=docs)
        return {"documents": wiki_results, "question": question}

    def route_question(self, state: Dict) -> str:
        """Route based on question type"""
        question = state["question"]
        source = self.question_router.route(question)

        if source == "wiki_search":
            return "wiki_search"
        else:
            return "vectorstore"

    def build_graph(self):
        """Construct LangGraph workflow"""
        workflow = StateGraph(GraphState)

        # Add nodes
        workflow.add_node("wiki_search", self.wiki_search)
        workflow.add_node("retrieve", self.retrieve)

        # Add conditional edges
        workflow.add_conditional_edges(
            START,
            self.route_question,
            {
                "wiki_search": "wiki_search",
                "vectorstore": "retrieve",
            },
        )

        workflow.add_edge("retrieve", END)
        workflow.add_edge("wiki_search", END)

        self.app = workflow.compile()

    def execute(self, question: str) -> Dict[str, Any]:
        """Execute workflow and return results"""
        inputs = {"question": question}

        result = {
            "route": None,
            "documents": [],
            "execution_time": 0
        }

        start_time = time.time()

        for output in self.app.stream(inputs):
            for key, value in output.items():
                result["route"] = key
                result["documents"] = value.get("documents", [])

        result["execution_time"] = time.time() - start_time

        return result

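The graph above has one decision point: the router selects which of two nodes handles the state, and both nodes lead to END. A minimal stand-in for that control flow (plain functions and dicts, not the langgraph API) makes the wiring explicit:

```python
# Illustrative stand-in for the LangGraph wiring: a router picks the entry
# node, that node transforms the state dict, and execution ends.
# Not the langgraph API -- just the control flow it encodes.
def run_adaptive_flow(state: dict, router, nodes: dict) -> dict:
    datasource = router(state)  # "wiki_search" or "vectorstore"
    node_name = {"wiki_search": "wiki_search", "vectorstore": "retrieve"}[datasource]
    return {"route": node_name, **nodes[node_name](state)}

nodes = {
    "retrieve": lambda s: {"documents": ["kb hit"], "question": s["question"]},
    "wiki_search": lambda s: {"documents": ["wiki hit"], "question": s["question"]},
}
toy_router = lambda s: "vectorstore" if "agent" in s["question"].lower() else "wiki_search"

result = run_adaptive_flow({"question": "Explain agent memory"}, toy_router, nodes)
# result["route"] == "retrieve"; result["documents"] == ["kb hit"]
```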
# ==================== Streamlit UI ====================

def render_header():
    """Render application header"""
    st.markdown('<h1 class="main-header">IMSKOS</h1>', unsafe_allow_html=True)
    st.markdown(
        '<p class="sub-header">Intelligent Multi-Source Knowledge Orchestration System</p>',
        unsafe_allow_html=True
    )
    st.markdown("---")

def render_sidebar():
    """Render sidebar with configuration and info"""
    with st.sidebar:
        st.image("https://img.icons8.com/fluency/96/000000/artificial-intelligence.png", width=100)
        st.title("System Configuration")

        st.markdown("### Core Technologies")
        st.markdown("""
        - **LangGraph**: Stateful workflow orchestration
        - **Astra DB**: Distributed vector storage
        - **Groq**: High-performance LLM inference
        - **HuggingFace**: Semantic embeddings
        """)

        st.markdown("---")
        st.markdown("### System Capabilities")
        st.markdown("""
        - Adaptive query routing
        - Real-time semantic search
        - Multi-source knowledge fusion
        - Scalable vector operations
        - Enterprise-grade architecture
        """)

        st.markdown("---")
        st.markdown("### Use Cases")
        st.markdown("""
        - AI/ML Research Assistance
        - Technical Documentation Q&A
        - Competitive Intelligence
        - Knowledge Base Management
        """)

        return st.button("Reset System", use_container_width=True)

def initialize_system():
    """Initialize all system components"""
    if 'initialized' not in st.session_state:
        with st.spinner("Initializing Intelligent Knowledge Orchestration System..."):
            try:
                # Load configuration
                config = Config.load_env_variables()

                # Initialize Knowledge Base Manager
                kb_manager = KnowledgeBaseManager(
                    config["ASTRA_DB_APPLICATION_TOKEN"],
                    config["ASTRA_DB_ID"]
                )
                kb_manager.initialize_cassandra()
                kb_manager.setup_embeddings()

                # Initialize Router
                router = IntelligentRouter(config["GROQ_API_KEY"])
                router.initialize()

                # Store in session state
                st.session_state.kb_manager = kb_manager
                st.session_state.router = router
                st.session_state.initialized = True
                st.session_state.documents_indexed = False

                st.success("System initialized successfully!")

            except Exception as e:
                st.error(f"Initialization failed: {str(e)}")
                st.stop()

def render_indexing_tab():
    """Render document indexing interface"""
    st.header("Knowledge Base Indexing")

    st.markdown("""
    <div class="info-box">
    <strong>About the Knowledge Base:</strong><br>
    Index domain-specific documents to create a proprietary knowledge base.
    The system uses advanced chunking strategies and distributed vector storage
    for optimal retrieval performance.
    </div>
    """, unsafe_allow_html=True)

    # URL input
    st.subheader("Document Sources")
    default_urls = Config.get_default_urls()

    urls_text = st.text_area(
        "Enter URLs (one per line):",
        value="\n".join(default_urls),
        height=150
    )

    urls = [url.strip() for url in urls_text.split("\n") if url.strip()]

    col1, col2 = st.columns(2)
    with col1:
        st.metric("URLs Configured", len(urls))
    with col2:
        st.metric("Chunk Size", "500 tokens")

    if st.button("Index Documents", type="primary", use_container_width=True):
        if not urls:
            st.warning("Please provide at least one URL")
            return

        progress_bar = st.progress(0)
        status_text = st.empty()

        def update_progress(message):
            status_text.info(message)

        try:
            # Load and process documents
            kb_manager = st.session_state.kb_manager
            doc_splits = kb_manager.load_and_process_documents(urls, update_progress)
            progress_bar.progress(50)

            # Create vector store
            if not kb_manager.vector_store:
                kb_manager.create_vector_store()

            # Add documents
            kb_manager.add_documents(doc_splits, update_progress)
            progress_bar.progress(100)

            # Store in session state
            st.session_state.documents_indexed = True
            st.session_state.num_documents = len(doc_splits)
            st.session_state.index_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

            st.markdown("""
            <div class="success-box">
            <strong>Indexing Complete!</strong><br>
            Documents have been successfully processed and stored in the Astra DB vector database.
            </div>
            """, unsafe_allow_html=True)

            col1, col2, col3 = st.columns(3)
            with col1:
                st.metric("Total Chunks", len(doc_splits))
            with col2:
                st.metric("Vector Dimensions", 384)
            with col3:
                st.metric("Status", "Ready")

        except Exception as e:
            st.error(f"Indexing failed: {str(e)}")
            progress_bar.empty()

def render_query_tab():
    """Render intelligent query interface"""
    st.header("Intelligent Query Interface")

    if not st.session_state.get('documents_indexed', False):
        st.warning("Please index documents first in the 'Knowledge Base Indexing' tab")
        return

    st.markdown("""
    <div class="info-box">
    <strong>How It Works:</strong><br>
    The system automatically routes your query to the optimal data source:
    <ul>
    <li><strong>Vector Store:</strong> For AI/ML, prompt engineering, and security topics</li>
    <li><strong>Wikipedia:</strong> For general knowledge and current information</li>
    </ul>
    </div>
    """, unsafe_allow_html=True)

    # Query examples
    with st.expander("Example Queries"):
        col1, col2 = st.columns(2)
        with col1:
            st.markdown("**Vector Store Queries:**")
            st.code("What are the types of agent memory?")
            st.code("Explain chain of thought prompting")
            st.code("How do adversarial attacks work on LLMs?")
        with col2:
            st.markdown("**Wikipedia Queries:**")
            st.code("Who is Elon Musk?")
            st.code("What is quantum computing?")
            st.code("Tell me about the Avengers")

    # Query input
    query = st.text_input(
        "Enter your question:",
        placeholder="e.g., What is an AI agent?",
        key="query_input"
    )

    col1, col2, col3 = st.columns([2, 1, 1])
    with col1:
        search_button = st.button("Execute Query", type="primary", use_container_width=True)
    with col2:
        advanced_mode = st.checkbox("Advanced Mode")

    if search_button and query:
        with st.spinner("Processing your query..."):
            try:
                # Build workflow if it does not exist yet
                if 'rag_workflow' not in st.session_state:
                    kb_manager = st.session_state.kb_manager
                    router = st.session_state.router

                    rag_workflow = AdaptiveRAGWorkflow(
                        kb_manager.vector_store,
                        router
                    )
                    rag_workflow.build_graph()
                    st.session_state.rag_workflow = rag_workflow

                # Execute query
                workflow = st.session_state.rag_workflow
                result = workflow.execute(query)

                # Display results
                st.markdown("---")
                st.subheader("Query Results")

                # Routing information
                route = result["route"]
                route_class = "route-vector" if route == "retrieve" else "route-wiki"
                route_name = "Vector Store" if route == "retrieve" else "Wikipedia"

                col1, col2, col3 = st.columns(3)
                with col1:
                    st.markdown(
                        f'<div class="route-indicator {route_class}">'
                        f'Route: {route_name}</div>',
                        unsafe_allow_html=True
                    )
                with col2:
                    st.metric("Execution Time", f"{result['execution_time']:.2f}s")
                with col3:
                    st.metric("Documents", len(result['documents']) if isinstance(result['documents'], list) else 1)

                # Display documents
                st.markdown("### Retrieved Information")

                documents = result['documents']
                if isinstance(documents, list):
                    for i, doc in enumerate(documents[:5], 1):
                        with st.expander(f"Document {i}", expanded=(i == 1)):
                            st.markdown(doc.page_content)

                            if advanced_mode and hasattr(doc, 'metadata'):
                                st.markdown("**Metadata:**")
                                st.json(doc.metadata)
                else:
                    st.markdown(documents.page_content)

                # Store query history
                if 'query_history' not in st.session_state:
                    st.session_state.query_history = []

                st.session_state.query_history.append({
                    "query": query,
                    "route": route_name,
                    "timestamp": datetime.now().strftime("%H:%M:%S"),
                    "execution_time": result['execution_time']
                })

            except Exception as e:
                st.error(f"Query execution failed: {str(e)}")

def render_analytics_tab():
    """Render system analytics and monitoring"""
    st.header("System Analytics")

    if 'query_history' not in st.session_state or not st.session_state.query_history:
        st.info("No queries executed yet. Run some queries to see analytics!")
        return

    history = st.session_state.query_history

    # Metrics
    col1, col2, col3, col4 = st.columns(4)
    with col1:
        st.metric("Total Queries", len(history))
    with col2:
        vector_count = sum(1 for h in history if h['route'] == 'Vector Store')
        st.metric("Vector Store", vector_count)
    with col3:
        wiki_count = sum(1 for h in history if h['route'] == 'Wikipedia')
        st.metric("Wikipedia", wiki_count)
    with col4:
        avg_time = sum(h['execution_time'] for h in history) / len(history)
        st.metric("Avg Time", f"{avg_time:.2f}s")

    # Query history table
    st.subheader("Query History")
    import pandas as pd
    df = pd.DataFrame(history)
    st.dataframe(df, use_container_width=True)

    # System info
    if st.session_state.get('documents_indexed'):
        st.subheader("Knowledge Base Status")
        col1, col2 = st.columns(2)
        with col1:
            st.metric("Document Chunks", st.session_state.get('num_documents', 0))
        with col2:
            st.metric("Last Indexed", st.session_state.get('index_timestamp', 'N/A'))

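The per-route tallies in the analytics tab can also be computed in a single pass with `collections.Counter`; a sketch over the same history-record shape (toy data, not the app's real history):

```python
# One-pass route tally with Counter, over the same record shape
# render_analytics_tab iterates with generator expressions.
from collections import Counter

history = [
    {"route": "Vector Store", "execution_time": 0.8},
    {"route": "Wikipedia", "execution_time": 1.2},
    {"route": "Vector Store", "execution_time": 1.0},
]
route_counts = Counter(h["route"] for h in history)
avg_time = sum(h["execution_time"] for h in history) / len(history)
```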
def main():
    """Main application entry point"""
    render_header()

    # Sidebar
    reset_clicked = render_sidebar()
    if reset_clicked:
        for key in list(st.session_state.keys()):
            del st.session_state[key]
        st.rerun()

    # Initialize system
    initialize_system()

    # Main tabs
    tabs = st.tabs(["Knowledge Base Indexing", "Intelligent Query", "Analytics"])

    with tabs[0]:
        render_indexing_tab()

    with tabs[1]:
        render_query_tab()

    with tabs[2]:
        render_analytics_tab()

    # Footer
    st.markdown("---")
    st.markdown("""
    <div style="text-align: center; color: #666; padding: 2rem 0;">
    <p><strong>IMSKOS v1.0</strong> | Built with LangGraph, Astra DB, and Groq</p>
    <p>Enterprise-Grade Intelligent Knowledge Orchestration</p>
    </div>
    """, unsafe_allow_html=True)

if __name__ == "__main__":
    main()
requirements.txt
ADDED
|
@@ -0,0 +1,34 @@
# ==================== Core Framework ====================
streamlit==1.31.0
python-dotenv==1.0.0

# ==================== LangChain Ecosystem ====================
langchain==0.1.16
langchain-community==0.0.38
langchain-core==0.1.46
langchain-groq==0.1.3
langchain-huggingface==0.0.1
langgraph==0.0.43
langchainhub==0.1.15

# ==================== Vector Database & Embeddings ====================
cassio==0.1.4
sentence-transformers==2.5.1

# ==================== Document Processing ====================
tiktoken==0.6.0
beautifulsoup4==4.12.3
lxml==5.1.0

# ==================== External APIs & Tools ====================
wikipedia==1.4.0
arxiv==2.1.0

# ==================== Data & Utilities ====================
pandas==2.2.1
pydantic==2.6.4
typing-extensions==4.10.0

# ==================== Optional: Performance & Monitoring ====================
# psutil==5.9.8
# prometheus-client==0.20.0