---
title: IMSKOS
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.53.1
app_file: app.py
pinned: false
---

## Project Overview

**IMSKOS** is an intelligent information-retrieval system built around:

- **Adaptive Query Routing**: LLM-powered decision engine that dynamically routes queries to the optimal data source
- **Distributed Vector Storage**: Scalable DataStax Astra DB for production-grade vector operations
- **High-Performance Inference**: Groq's low-latency LLM API for sub-second responses
- **Stateful Workflows**: LangGraph for complex, multi-step retrieval orchestration
- **Modern UI/UX**: Professional Streamlit interface with real-time analytics

---

## System Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                   User Query Interface                      │
│                     (Streamlit App)                         │
└─────────────────────────────┬───────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────┐
│            Intelligent Query Router (Groq LLM)              │
│        Analyzes query → Determines optimal source           │
└─────────────┬─────────────────────────────┬─────────────────┘
              ▼                             ▼
┌──────────────────────────┐  ┌──────────────────────────────┐
│  Vector Store Retrieval  │  │  Wikipedia External Search   │
│  (Astra DB + Cassandra)  │  │  (LangChain Wikipedia Tool)  │
│  - AI/ML Content         │  │  - General Knowledge         │
│  - Technical Docs        │  │  - Current Events            │
└─────────────┬────────────┘  └──────────────┬───────────────┘
              └───────────────┬──────────────┘
                              ▼
                  ┌───────────────────────┐
                  │  LangGraph Workflow   │
                  │  State Management     │
                  │  Result Aggregation   │
                  └───────────┬───────────┘
                              ▼
                  ┌───────────────────────┐
                  │  Formatted Response   │
                  │  + Analytics          │
                  └───────────────────────┘
```

---

## Key Features

### Intelligent Capabilities

| Feature | Description | Technology |
|---------|-------------|------------|
| **Adaptive Routing** | Context-aware query routing to the optimal data source | Groq LLM + Pydantic |
| **Semantic Search** | Deep semantic understanding with transformer embeddings | HuggingFace Embeddings |
| **Multi-Source Fusion** | Seamless integration of proprietary and public knowledge | LangGraph |
| **Real-time Analytics** | Query performance monitoring and routing statistics | Streamlit |
| **Scalable Storage** | Distributed vector database with auto-scaling | DataStax Astra DB |

### Technical Highlights

- **Production-Ready Architecture**: Modular design with separation of concerns
- **Security-First**: Environment-variable management, no hardcoded credentials
- **Observable**: Built-in analytics dashboard and query history
- **Performance-Optimized**: Caching, efficient document chunking, parallel processing
- **Professional UI**: Modern, responsive interface with custom CSS styling
- **Scalable**: Handles growing document collections without performance degradation

---

## Quick Start

### Prerequisites

- Python 3.9 or higher
- DataStax Astra DB account ([Sign up free](https://astra.datastax.com))
- Groq API key ([Get API key](https://console.groq.com))

### Installation

1. **Clone the repository:**

   ```bash
   git clone https://github.com/KUNALSHAWW/IMSKOS-Intelligent-Multi-Source-Knowledge-Orchestration-System-.git
   cd IMSKOS
   ```

2. **Create a virtual environment:**

   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Linux/Mac
   source venv/bin/activate
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure environment variables:**

   ```bash
   # Copy the example file
   cp .env.example .env

   # Edit .env with your credentials
   # ASTRA_DB_APPLICATION_TOKEN=your_token_here
   # ASTRA_DB_ID=your_database_id_here
   # GROQ_API_KEY=your_groq_api_key_here
   ```

5. **Run the application:**

   ```bash
   streamlit run app.py
   ```

6. **Access the application:** open your browser and navigate to `http://localhost:8501`

---

## Usage Guide

### Step 1: Index Your Knowledge Base

1. Navigate to the **"Knowledge Base Indexing"** tab
2. Add URLs of the documents you want to index (the default list includes AI/ML research papers)
3. Click **"Index Documents"** to process and store them in Astra DB
4. Wait for the indexing process to complete (progress is shown in real time)

### Step 2: Execute Intelligent Queries

1. Switch to the **"Intelligent Query"** tab
2. Enter your question in the text input
3. Click **"Execute Query"**
4. The system will:
   - Analyze your query
   - Route it to the optimal data source (Vector Store or Wikipedia)
   - Retrieve relevant information
   - Display results with metadata

### Step 3: Monitor Performance

1. Visit the **"Analytics"** tab to see:
   - Total queries executed
   - Routing distribution (Vector Store vs. Wikipedia)
   - Average execution time
   - Complete query history

---

## Example Queries

### Vector Store Queries (Routed to Astra DB)

```
✅ "What are the types of agent memory?"
✅ "Explain chain of thought prompting techniques"
✅ "How do adversarial attacks work on large language models?"
✅ "What is ReAct prompting?"
```

### Wikipedia Queries (Routed to External Search)

```
✅ "Who is Elon Musk?"
✅ "What is quantum computing?"
✅ "Tell me about the Marvel Avengers"
✅ "History of artificial intelligence"
```

---

## Production Deployment

### Deploying to Streamlit Cloud

1. **Push to GitHub:**

   ```bash
   git init
   git add .
   git commit -m "Initial commit: IMSKOS production deployment"
   git branch -M main
   git remote add origin https://github.com/yourusername/IMSKOS.git
   git push -u origin main
   ```

2. **Configure Streamlit Cloud:**
   - Go to [share.streamlit.io](https://share.streamlit.io)
   - Click "New app"
   - Select your repository
   - Set the main file to `app.py`
   - Add secrets under "Advanced settings":

     ```toml
     ASTRA_DB_APPLICATION_TOKEN = "your_token"
     ASTRA_DB_ID = "your_database_id"
     GROQ_API_KEY = "your_groq_key"
     ```

3. **Deploy!**

### Alternative Deployment Options

#### Docker Deployment

```dockerfile
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
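# The two lines below are an optional hardening step, not part of the
# original Dockerfile: they run Streamlit as an unprivileged user instead
# of root. Drop them if your setup needs root privileges at runtime.
RUN useradd --create-home appuser
USER appuser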
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

```bash
# Build and run
docker build -t imskos .
docker run -p 8501:8501 --env-file .env imskos
```

#### AWS/GCP/Azure Deployment

See detailed deployment guides in the `/docs` folder (coming soon).

---

## Configuration

### Environment Variables

| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `ASTRA_DB_APPLICATION_TOKEN` | DataStax Astra DB token | Yes | - |
| `ASTRA_DB_ID` | Astra DB instance ID | Yes | - |
| `GROQ_API_KEY` | Groq API authentication key | Yes | - |

### Customization Options

**Modify document chunking:**

```python
# In app.py - KnowledgeBaseManager.load_and_process_documents()
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500,   # Adjust chunk size
    chunk_overlap=50  # Adjust overlap
)
```

**Change the embedding model:**

```python
# In app.py - KnowledgeBaseManager.setup_embeddings()
self.embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"  # Try "all-mpnet-base-v2" for higher quality
)
```

**Adjust LLM parameters:**

```python
# In app.py - IntelligentRouter.initialize()
self.llm = ChatGroq(
    model_name="llama-3.1-8b-instant",  # Try other Groq models
    temperature=0  # Increase for more creative responses
)
```

---

## Performance Benchmarks

| Metric | Value | Notes |
|--------|-------|-------|
| **Query Latency** | < 2 s | Average end-to-end response time |
| **Embedding Generation** | ~100 ms | Per document chunk |
| **Vector Search** | < 500 ms | Top-K retrieval from Astra DB |
| **LLM Routing** | < 300 ms | Groq inference time |
| **Concurrent Users** | 50+ | Tested on Streamlit Cloud |

---

## Technology Stack

### Core Framework

- **[Streamlit](https://streamlit.io/)** - Interactive web application framework
- **[LangChain](https://langchain.com/)** - LLM application framework
- **[LangGraph](https://github.com/langchain-ai/langgraph)** - Stateful workflow orchestration
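The routing behaviour this stack implements can be sketched without any of the heavy dependencies. The snippet below is a hypothetical stand-in for the Groq-backed router: `route_query`, `RouteDecision`, and `VECTORSTORE_TOPICS` are illustrative names, and the keyword heuristic merely imitates the structured `vectorstore` / `wikipedia` decision the real LLM router makes.

```python
from dataclasses import dataclass
from typing import Literal

# Topics the (hypothetical) vector store was indexed with; the real router
# lets the LLM make this call instead of matching keywords.
VECTORSTORE_TOPICS = ("agent", "prompt", "llm", "adversarial", "react", "memory")

@dataclass
class RouteDecision:
    datasource: Literal["vectorstore", "wikipedia"]
    reason: str

def route_query(question: str) -> RouteDecision:
    """Decide which backend should answer `question`.

    A keyword heuristic stands in for the Groq LLM call used by IMSKOS;
    anything that doesn't look like indexed AI/ML content falls back to
    Wikipedia, mirroring the routing shown in the Example Queries section.
    """
    q = question.lower()
    if any(topic in q for topic in VECTORSTORE_TOPICS):
        return RouteDecision("vectorstore", "matches indexed AI/ML topics")
    return RouteDecision("wikipedia", "general-knowledge question")

print(route_query("What are the types of agent memory?").datasource)  # vectorstore
print(route_query("Who is Elon Musk?").datasource)                    # wikipedia
```

In the actual system this decision is produced by prompting the LLM for a Pydantic-validated structured output, so the routing rule adapts to query phrasing instead of a fixed keyword list.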
### AI/ML Components

- **[Groq](https://groq.com/)** - High-performance LLM inference
- **[HuggingFace Transformers](https://huggingface.co/)** - Sentence embeddings
- **[DataStax Astra DB](https://astra.datastax.com)** - Vector database

### Supporting Libraries

- **Pydantic** - Data validation and settings management
- **BeautifulSoup4** - Web scraping and HTML parsing
- **TikToken** - Token counting and text splitting
- **Wikipedia API** - External knowledge retrieval

---

## Roadmap

### Version 1.1 (Planned)

- [ ] Multi-modal support (images, PDFs)
- [ ] Advanced RAG techniques (HyDE, Multi-Query)
- [ ] Custom document upload via the UI
- [ ] Export results to PDF/Markdown
- [ ] User authentication & session management

### Version 2.0 (Future)

- [ ] Multi-language support
- [ ] Graph RAG integration
- [ ] Real-time collaborative features
- [ ] API endpoints for programmatic access
- [ ] Advanced analytics dashboard

---

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

---

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## Acknowledgments

- LangChain team for the amazing framework
- DataStax for Astra DB and Cassandra support
- Groq for lightning-fast LLM inference
- HuggingFace for open-source embeddings
- Streamlit for the intuitive app framework

---

## Contact & Support

- **GitHub Issues**: [Report bugs or request features](https://github.com/KUNALSHAWW/IMSKOS/issues)
- **Email**: kunalshawkol17@gmail.com
- **LinkedIn**: [Profile](https://www.linkedin.com/in/kunal-kumar-shaw-443999205/)

---

## Star History

If you find this project useful, please consider giving it a ⭐!

---