# Knowledge Base Browser

A Gradio custom component for retrieval-augmented generation (RAG) applications. It provides an interface for searching a document collection with semantic (embedding-based), keyword, or hybrid search.

## Features

- **Semantic Search**: Meaning-based search using OpenAI embeddings
- **Keyword Search**: Traditional text matching for precise queries
- **Hybrid Search**: Combines the semantic and keyword approaches
- **Source Type Filtering**: Filter by PDF, web pages, academic papers, or code repositories
- **Citation Tracking**: Built-in citation management and export
- **Agent Integration**: Returns structured results usable by both human users and AI agents
- **LlamaIndex Integration**: Uses LlamaIndex with a FAISS vector store for efficient retrieval
- **Responsive UI**: Accessible interface with expandable result cards

## Installation

```bash
pip install gradio_kb_browser
```

For a development installation:

```bash
git clone <repository-url>
cd kb_browser
pip install -e .
```

## Quick Start

### Basic Usage

```python
import gradio as gr
from kb_browser import KnowledgeBrowser

# Create the component
kb_browser = KnowledgeBrowser(
    index_path="./documents",  # Path to your document directory
    search_type="semantic",    # Default search type
    max_results=10,            # Maximum results to return
)

# Use it in a Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("# Document Search")
    query = gr.Textbox(label="Search Query")
    search_btn = gr.Button("Search")
    results = gr.JSON(label="Results")

    def search_documents(query_text):
        return kb_browser.search(query_text)

    search_btn.click(
        fn=search_documents,
        inputs=query,
        outputs=results,
    )

demo.launch()
```

### Agent Integration

```python
from kb_browser import KnowledgeBrowser

# Initialize the component for agent use
kb_browser = KnowledgeBrowser()


# The agent searches and receives structured results
def agent_research(question):
    results = kb_browser.search(
        query=question,
        search_type="semantic",
        max_results=5,
    )

    # Collect citation data for the agent's response
    citations = []
    for doc in results["results"]:
        citations.append({
            "title": doc["title"],
            "source": doc["source"],
            "relevance": doc["relevance_score"],
            "snippet": doc["snippet"],
        })
    return citations
```
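
The citation dictionaries returned by `agent_research` can be turned into a numbered reference list for the agent's final answer. The component's own citation export format isn't described here, so the sketch below is an assumption: `format_citations` is a hypothetical helper (not part of `kb_browser`) that expects the `title`, `source`, and `relevance` keys built above, and the example data is illustrative.

```python
def format_citations(citations: list) -> str:
    """Render citation dicts as a numbered list, most relevant first.

    Hypothetical helper; expects the title/source/relevance keys built by agent_research.
    """
    # Sort so the most relevant sources get the lowest numbers
    ranked = sorted(citations, key=lambda c: c["relevance"], reverse=True)
    lines = [
        f"[{i}] {c['title']} ({c['source']}), relevance {c['relevance']:.2f}"
        for i, c in enumerate(ranked, start=1)
    ]
    return "\n".join(lines)


# Illustrative data in the shape agent_research returns
example = [
    {"title": "FAISS Overview", "source": "faiss.pdf", "relevance": 0.71, "snippet": "..."},
    {"title": "Intro to RAG", "source": "rag.md", "relevance": 0.92, "snippet": "..."},
]
print(format_citations(example))
# [1] Intro to RAG (rag.md), relevance 0.92
# [2] FAISS Overview (faiss.pdf), relevance 0.71
```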

## Configuration

### Environment Variables

Set your OpenAI API key for semantic search:

```bash
export OPENAI_API_KEY="your-api-key-here"
```
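
If the key may be missing (for example on a machine used only for UI work), one option is to pick the search type at startup and fall back to keyword search. This is a sketch under an assumption the README doesn't state explicitly: that hybrid search, like semantic search, calls the embeddings API and therefore also needs the key. `choose_search_type` is a hypothetical helper.

```python
import os


def choose_search_type(preferred: str = "semantic") -> str:
    """Return `preferred`, or "keyword" if it needs embeddings and no OpenAI key is set.

    Hypothetical helper. Assumes both "semantic" and "hybrid" call the
    OpenAI embeddings API and therefore need OPENAI_API_KEY.
    """
    needs_embeddings = preferred in ("semantic", "hybrid")
    if needs_embeddings and not os.environ.get("OPENAI_API_KEY"):
        return "keyword"
    return preferred


print(f"Using search_type={choose_search_type()!r}")
```

The result could then be passed to the component as `KnowledgeBrowser(search_type=choose_search_type())`.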

### Component Parameters

- `query`: Initial search query string
- `results`: Pre-loaded search results
- `index_path`: Path to the document directory (default: `"./data"`)
- `search_type`: Search method, one of `"semantic"`, `"keyword"`, or `"hybrid"`
- `max_results`: Maximum number of results to return
- `label`: Label shown for the component in the UI
- `visible`: Whether the component is visible
- `elem_classes`: CSS classes applied for custom styling
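
The parameter list implies two simple constraints: `search_type` must be one of the three documented modes, and `max_results` should be a positive count. Whether the component enforces these itself isn't stated, so the hypothetical `validate_search_params` below only sketches one way to check them before calling `search()`.

```python
VALID_SEARCH_TYPES = ("semantic", "keyword", "hybrid")


def validate_search_params(search_type: str, max_results: int) -> None:
    """Raise ValueError for an unknown search type or a non-positive result count.

    Hypothetical helper; not part of kb_browser.
    """
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(
            f"search_type must be one of {VALID_SEARCH_TYPES}, got {search_type!r}"
        )
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 1:
        raise ValueError(f"max_results must be a positive integer, got {max_results!r}")


validate_search_params("hybrid", 5)  # valid: returns None without raising
```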

## Document Formats

The component supports the following document formats:

- **PDF Files**: Automatically parsed and indexed
- **Text Files**: Plain text documents
- **Markdown**: Documentation and notes
- **JSON**: Structured data documents
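
The component's loader isn't documented here. As a rough illustration of partitioning a directory into these four formats before indexing, the sketch below walks `index_path` recursively and groups files by extension. The extension mapping and `discover_documents` are assumptions. In a LlamaIndex setup, `SimpleDirectoryReader` offers comparable filtering through its `required_exts` argument.

```python
from pathlib import Path
from typing import Dict, List

# Assumed mapping from file extension to the four formats listed above
SUPPORTED_EXTENSIONS = {
    ".pdf": "pdf",
    ".txt": "text",
    ".md": "markdown",
    ".json": "json",
}


def discover_documents(index_path: str) -> Dict[str, List[Path]]:
    """Recursively group files under index_path by supported format; others are skipped.

    Hypothetical helper; the component's real loader may differ.
    """
    grouped: Dict[str, List[Path]] = {fmt: [] for fmt in SUPPORTED_EXTENSIONS.values()}
    for path in sorted(Path(index_path).rglob("*")):
        # Match extensions case-insensitively, so "report.PDF" counts as a PDF
        fmt = SUPPORTED_EXTENSIONS.get(path.suffix.lower())
        if fmt is not None and path.is_file():
            grouped[fmt].append(path)
    return grouped


if __name__ == "__main__":
    for fmt, paths in discover_documents("./data").items():
        print(f"{fmt}: {len(paths)} file(s)")
```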

## Search Types

### Semantic Search

Uses OpenAI embeddings to match on meaning and context rather than exact wording. Best for:

- Conceptual queries
- Finding related topics
- Cross-domain searches
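
Conceptually, semantic search embeds the query and each document as vectors and ranks documents by how closely those vectors point in the same direction. The component delegates this to a FAISS index via LlamaIndex; the brute-force sketch below only illustrates the scoring idea, using cosine similarity on toy 3-dimensional vectors in place of real OpenAI embeddings (which have far more dimensions). The document ids and vectors are made up.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_embedding(
    query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings
docs = {"vector-db": [0.9, 0.1, 0.0], "cooking": [0.0, 0.2, 0.9], "rag": [0.8, 0.3, 0.1]}
print([doc_id for doc_id, _ in rank_by_embedding([1.0, 0.2, 0.0], docs, k=2)])
# ['vector-db', 'rag']
```

With vectors normalized to unit length, cosine similarity equals the inner product, which is why inner-product FAISS indexes are a common choice for this kind of search.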

### Keyword Search

Traditional text matching. Best for:

- Exact phrase searches
- Technical terms
- Specific names or identifiers
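
The keyword scoring used by the component isn't specified. Production keyword retrievers commonly use BM25; the simplified sketch below instead scores a document by the fraction of query terms it contains plus a bonus for an exact phrase match, which shows why this mode suits identifiers like `max_results` that embeddings may blur.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into word tokens (\\w keeps underscores, so identifiers stay whole)."""
    return re.findall(r"\w+", text.lower())


def keyword_score(query: str, document: str) -> float:
    """Fraction of distinct query terms found in the document, +1.0 if the exact phrase appears.

    Simplified illustration, not BM25 and not the component's actual scorer.
    """
    q_terms = set(tokenize(query))
    if not q_terms:
        return 0.0
    d_terms = set(tokenize(document))
    overlap = len(q_terms & d_terms) / len(q_terms)
    phrase_bonus = 1.0 if query.lower() in document.lower() else 0.0
    return overlap + phrase_bonus


print(keyword_score("max_results", "Set max_results to limit output"))  # 2.0
```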

### Hybrid Search

Combines both approaches, which helps when a query mixes broad concepts with exact terms.
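
How hybrid mode merges the two result lists isn't documented here. One common technique is reciprocal rank fusion (RRF), which needs only each list's ordering, not scores on a comparable scale. The sketch below assumes that approach; the document ids are illustrative.

```python
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Merge several ranked lists of document ids into one, best first.

    Each document earns 1 / (k + rank) from every list it appears in (rank starts at 1).
    k=60 is the constant commonly used for RRF; larger k flattens the rank differences.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


semantic = ["rag-intro", "faiss-guide", "embeddings-101"]
keyword = ["faiss-guide", "api-reference", "rag-intro"]
print([doc for doc, _ in reciprocal_rank_fusion([semantic, keyword])])
# ['faiss-guide', 'rag-intro', 'api-reference', 'embeddings-101']
```

Documents that both methods rank well, such as `faiss-guide` and `rag-intro`, rise above documents found by only one method.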

## API Reference

### KnowledgeBrowser Class

#### Methods

- `search(query, search_type, max_results)`: Runs a search and returns a dict whose `"results"` list holds the matching documents
- `preprocess(payload)`: Converts the frontend payload into the value passed to your Python function
- `postprocess(value)`: Converts a Python return value into the payload sent to the frontend
- `api_info()`: Returns the JSON schema describing the component's API

#### Events

- `submit`: Triggered when a search is performed
- `select`: Triggered when a document is selected
- `change`: Triggered when the component's value changes
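
Every example in this README reads `results["results"]` and then the keys `title`, `source`, `relevance_score`, `snippet`, and optionally `url` on each hit. That shape is inferred from the examples rather than from `api_info()`, so the sketch below is an assumption: a `TypedDict` documenting it, plus a hypothetical `check_response` helper that could be useful in tests or before rendering results.

```python
from typing import List, TypedDict


class _SearchHitRequired(TypedDict):
    title: str
    source: str
    relevance_score: float  # formatted with :.0% in the examples, so assumed to lie in [0, 1]
    snippet: str


class SearchHit(_SearchHitRequired, total=False):
    url: str  # optional: the support example falls back to "#"


REQUIRED_HIT_KEYS = ("title", "source", "relevance_score", "snippet")


def check_response(response: dict) -> List[str]:
    """Return problems found in a search() response; empty means every hit has the required keys.

    Hypothetical helper; checks key presence and score range, not value types.
    """
    hits = response.get("results")
    if not isinstance(hits, list):
        return ["'results' must be a list"]
    problems = []
    for i, hit in enumerate(hits):
        missing = [key for key in REQUIRED_HIT_KEYS if key not in hit]
        if missing:
            problems.append(f"result {i} is missing {missing}")
        score = hit.get("relevance_score")
        if isinstance(score, (int, float)) and not 0.0 <= score <= 1.0:
            problems.append(f"result {i} has relevance_score {score} outside [0, 1]")
    return problems
```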

## Example Applications

### Research Assistant

```python
import html

import gradio as gr
from kb_browser import KnowledgeBrowser


def create_research_app():
    kb_browser = KnowledgeBrowser(index_path="./research_papers")

    with gr.Blocks() as app:
        gr.Markdown("# Research Assistant")
        question = gr.Textbox(label="Research Question")
        search_btn = gr.Button("Search Literature")
        results_display = gr.HTML()
        citations = gr.State([])  # Reserved for citation tracking; not used below

        def research_query(question_text):
            results = kb_browser.search(question_text, max_results=5)
            # Escape document text so titles or snippets cannot inject markup
            markup = "<div class='research-results'>"
            for doc in results["results"]:
                markup += f"""
                <div class='paper'>
                    <h3>{html.escape(doc['title'])}</h3>
                    <p><strong>Source:</strong> {html.escape(doc['source'])}</p>
                    <p><strong>Relevance:</strong> {doc['relevance_score']:.0%}</p>
                    <p>{html.escape(doc['snippet'])}</p>
                </div>
                """
            markup += "</div>"
            return markup

        search_btn.click(research_query, question, results_display)

    return app


if __name__ == "__main__":
    create_research_app().launch()
```

### Customer Support

```python
import html

import gradio as gr
from kb_browser import KnowledgeBrowser


def create_support_app():
    kb_browser = KnowledgeBrowser(index_path="./support_docs")

    with gr.Blocks() as app:
        gr.Markdown("# Customer Support Assistant")
        issue = gr.Textbox(label="Describe your issue")
        help_btn = gr.Button("Find Solutions")
        solutions = gr.HTML()

        def find_solutions(issue_text):
            results = kb_browser.search(issue_text, search_type="hybrid")
            markup = "<div class='solutions'>"
            # Show only the top three matches
            for doc in results["results"][:3]:
                url = html.escape(doc.get("url", "#"))  # Fall back to "#" when a doc has no URL
                markup += f"""
                <div class='solution'>
                    <h4>{html.escape(doc['title'])}</h4>
                    <p>{html.escape(doc['snippet'])}</p>
                    <a href="{url}" target="_blank">View Full Article</a>
                </div>
                """
            markup += "</div>"
            return markup

        help_btn.click(find_solutions, issue, solutions)

    return app


if __name__ == "__main__":
    create_support_app().launch()
```

## Development

### Running Tests

```bash
pip install pytest
pytest test_kb_browser.py -v
```

### Building the Component

```bash
pip install build
python -m build
```

### Publishing

```bash
gradio cc publish kb_browser --name "KnowledgeBaseBrowser"
```

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Submit a pull request

## License

MIT License. See the LICENSE file for details.

## Support

For issues and questions:

- GitHub Issues: [Create an issue](https://github.com/gradio-app/gradio/issues)
- Documentation: [Gradio Docs](https://gradio.app/docs)
- Community: [Gradio Discord](https://discord.gg/gradio)