---
title: AI Digital Library Assistant
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.38.0"
app_file: app.py
pinned: false
license: mit
tags:
- mcp-in-action-track-consumer
- mcp-in-action-track-creative
- building-mcp-track-consumer
- building-mcp-track-creative
- MCP-1st-Birthday
---

Demo Link: https://youtu.be/09Lls0zJ-QE

Social media post: https://x.com/nihald2000/status/1995198714156286290?s=20

The **AI Digital Library Assistant** is a knowledge management tool built for the **MCP 1st Birthday Hackathon**. It turns a static document collection into an interactive library that you can search, query, and listen to.

Unlike traditional RAG (Retrieval-Augmented Generation) apps, this project uses the **Model Context Protocol (MCP)** to expose a modular set of tools (Ingestion, Search, and Podcast Generation) that work together so you can consume information in the way that suits *you* best.

```mermaid
graph TD
    User((👤 User))

    subgraph "Frontend (Gradio)"
        UI[Web Interface]
        PodcastUI[Podcast Studio]
    end

    subgraph "MCP Server Layer"
        MCPServer[Content Organizer MCP Server]

        subgraph "MCP Tools"
            IngestTool[📥 Ingestion Tool]
            SearchTool[Search Tool]
            GenTool[✨ Generative Tool]
            PodTool[🎧 Podcast Tool]
        end
    end

    subgraph "Service Layer"
        VecStore[(Vector Store)]
        DocStore[(Document Store)]
        LLM["LLM Service (OpenAI / Nebius AI)"]
        ElevenLabs[ElevenLabs API]
        LlamaIndex[LlamaIndex Agent]
    end

    User <--> UI
    UI <--> MCPServer

    MCPServer --> IngestTool
    MCPServer --> SearchTool
    MCPServer --> GenTool
    MCPServer --> PodTool

    IngestTool --> VecStore
    IngestTool --> DocStore
    SearchTool --> VecStore
    GenTool --> LLM
    PodTool --> LlamaIndex
    PodTool --> ElevenLabs
    PodTool --> LLM
```

 |
|
|
|
|
|
|
|
|
## Quick Start

See [QUICKSTART.md](QUICKSTART.md) for detailed local setup instructions.

1. **Clone & Install**:
   ```bash
   git clone https://huggingface.co/spaces/Nihal2000/AiDigitalLibraryAssistant
   cd AiDigitalLibraryAssistant
   pip install -r requirements.txt
   ```
2. **Configure**: Add your `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` to `.env`.
3. **Run**: `python app.py`

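Before running the app, it can help to confirm that both keys from step 2 are actually set. The following is a minimal sketch, assuming `.env` uses plain `KEY=value` lines; `parse_env` and `missing_keys` are hypothetical helpers written for this illustration and are not part of the project.

```python
from pathlib import Path

# Keys named in the Configure step above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing keys:", missing_keys(env) or "none")
```

Running this from the project directory lists any key that still needs a value.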
## 💡 How It Works

### 1. The MCP Core
At the heart of the application is the `AiDigitalLibraryAssistant` MCP server. It exposes atomic capabilities (tools) that the frontend consumes, so the same tools that power this UI can also be connected to Claude Desktop or any other MCP client. For example, an MCP client such as Claude Desktop can connect to the hosted server through `mcp-remote` with the following configuration:

```json
{
  "mcpServers": {
    "ai-library": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
      ]
    }
  }
}
```

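If your MCP client already has other servers configured, the entry above needs to be merged in rather than pasted over the whole file. The sketch below shows one way to do that in memory; `add_mcp_server` is a hypothetical helper, and the `other-server` entry is invented purely to represent a pre-existing configuration.

```python
import json

HOSTED_URL = (
    "https://mcp-1st-birthday-ai-digital-library-assistant.hf.space"
    "/gradio_api/mcp/sse"
)


def add_mcp_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of `config` with a remote MCP server entry added.

    The entry mirrors the JSON shown above: `npx -y mcp-remote <url>`.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": "npx", "args": ["-y", "mcp-remote", url]}
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "uvx", "args": ["other"]}}}
updated = add_mcp_server(existing, "ai-library", HOSTED_URL)
print(json.dumps(updated, indent=2))
```

The helper copies the input instead of mutating it, so the original configuration stays available if the merged result needs to be discarded.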
### 2. 🎧 Podcast Studio (Star Feature)
Turn your reading list into a playlist! The **Podcast Studio** is the flagship feature: it turns any selection of documents into a multi-speaker audio podcast.
- **Intelligent Scripting**: Uses **LlamaIndex** and **OpenAI/Nebius AI** to analyze your documents and generate a natural, conversational script.
- **Multi-Speaker Synthesis**: Uses **ElevenLabs** to voice the script with a distinct, realistic voice for each host.
- **Customizable**: Choose your style (Educational, Casual, Teaching) and duration.

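To make the hand-off between scripting and multi-speaker synthesis concrete, here is a rough sketch of how a generated script could be split into per-host turns before each turn is sent to a voice. The `Host 1:` / `Host 2:` line format, the `Turn` class, and the voice identifiers are all assumptions made for this illustration; the project's actual script format and ElevenLabs calls may differ.

```python
from dataclasses import dataclass

SPEAKERS = {"HOST 1", "HOST 2"}  # assumed speaker labels


@dataclass
class Turn:
    speaker: str
    text: str


def parse_script(script: str) -> list[Turn]:
    """Split a 'Host N: text' script into turns, one per speaker change."""
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        label = speaker.strip().upper()
        if sep and label in SPEAKERS:
            turns.append(Turn(label, text.strip()))
        elif turns:
            # A line without a speaker label continues the previous turn.
            turns[-1].text += " " + line
    return turns


def assign_voices(turns: list[Turn], voices: dict[str, str]) -> list[tuple[str, str]]:
    """Pair each turn's text with the voice chosen for its speaker."""
    return [(voices[turn.speaker], turn.text) for turn in turns]


script = "Host 1: Welcome back.\nHost 2: Today we cover MCP.\nIt is a protocol.\n"
segments = assign_voices(
    parse_script(script),
    {"HOST 1": "voice_a", "HOST 2": "voice_b"},  # placeholder voice IDs
)
```

Each `(voice, text)` pair can then be synthesized separately and the audio clips concatenated in order.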
## ✨ Features

### Document Management

- **Multi-format Support**: PDF, DOCX, TXT, and image files (PNG, JPG, JPEG)
- **Intelligent OCR**: Automatic text extraction from images and scanned documents
- **Semantic Chunking**: Documents are automatically split into meaningful segments for better retrieval
- **Metadata Tracking**: Document metadata including file size, type, creation date, and custom tags
- **Vector Embeddings**: All documents are indexed with dense vector embeddings for semantic search

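As an illustration of the chunking step, the sketch below groups sentences into chunks of bounded size and carries one sentence of overlap between neighbours. It is a simple sentence-based stand-in, not the project's semantic chunker; `chunk_text`, `max_chars`, and `overlap_sentences` are hypothetical names chosen for this example.

```python
import re


def chunk_text(text: str, max_chars: int = 500, overlap_sentences: int = 1) -> list[str]:
    """Group sentences into chunks of roughly `max_chars` characters.

    The last `overlap_sentences` sentences of a chunk are repeated at the
    start of the next one, so a chunk may slightly exceed `max_chars`.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
chunks = chunk_text(sample, max_chars=35, overlap_sentences=1)
```

The overlap keeps a sentence that straddles a topic boundary retrievable from either side, at the cost of some duplicated text in the index.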
### Advanced Search

- **Semantic Search**: Find documents by meaning, not just keywords
- **Configurable Results**: Adjust the number of results (1-20) based on your needs
- **Relevance Scoring**: Each result includes a confidence score
- **Source Attribution**: Direct links to source documents with highlighted excerpts

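The ranking behind semantic search can be sketched as cosine similarity between a query embedding and each stored embedding, with the result count clamped to the 1-20 range mentioned above. This is a toy in-memory version under stated assumptions: the vectors are hand-written stand-ins for real embeddings, and `search` is a hypothetical function, not the project's vector store API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec: list[float], index: dict[str, list[float]], top_k: int = 5):
    """Return up to `top_k` (doc_id, score) pairs, best match first."""
    top_k = max(1, min(20, top_k))  # keep within the 1-20 range
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy two-dimensional "embeddings" for three documents.
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = search([1.0, 0.0], index, top_k=2)
```

The score returned with each hit plays the role of the relevance score shown next to search results.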
### 🎨 Content Studio
Transform your documents with eight AI tools:

- **Summarize**: Generate concise, detailed, bullet-point, or executive summaries
- **Generate Outline**: Create structured outlines from topics or documents (3-10 sections)
- **Explain Concept**: Get explanations tailored to different audiences (general, technical, beginner, expert)
- **Paraphrase**: Rewrite text in various styles (formal, casual, academic, simple, technical)
- **Categorize**: Automatically classify content into user-defined categories
- **Key Insights**: Extract the most important points from any document
- **Generate Questions**: Create comprehension, analysis, application, creative, or factual questions
- **Extract Key Info**: Pull out structured information (entities, dates, facts) in JSON format

### 🏷️ Smart Tagging

- **AI-Generated Tags**: Automatically generate 3-15 relevant tags for any document
- **Persistent Storage**: Tags are saved directly to document metadata
- **Batch Processing**: Tag multiple documents or custom text snippets

### RAG-Powered Q&A

- **Context-Aware Answers**: Ask questions and get answers grounded in your documents
- **Source Citations**: Every answer includes relevant source excerpts
- **Confidence Scoring**: Each answer reports how reliable it is
- **Multi-Document Synthesis**: Answers can draw from multiple documents simultaneously

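Grounding and citation usually come down to how retrieved passages are placed in the prompt. The sketch below numbers each passage so the model can cite it; the prompt wording, the `doc_id`/`text` passage fields, and `build_rag_prompt` itself are assumptions for illustration rather than the project's actual prompt.

```python
def build_rag_prompt(question: str, passages: list[dict]) -> str:
    """Assemble a prompt that asks for an answer citing numbered sources."""
    sources = "\n".join(
        f"[{number}] ({passage['doc_id']}) {passage['text']}"
        for number, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "What does MCP stand for?",
    [
        {"doc_id": "doc-1", "text": "MCP is the Model Context Protocol."},
        {"doc_id": "doc-2", "text": "MCP clients connect to MCP servers."},
    ],
)
```

Because passages from different documents share one numbered list, the same mechanism supports answers that draw on several documents at once.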
### Podcast Studio
Convert documents into engaging audio conversations:

- **AI Voice Generation**: Realistic voices powered by ElevenLabs
- **Two-Host Format**: Dynamic dialogue between two AI personalities
- **Multiple Styles**: Conversational, educational, technical, or casual
- **Custom Duration**: 5-30 minute podcasts
- **Voice Selection**: Choose from 7+ professional AI voices
- **Full Transcripts**: Complete text transcripts for every generated podcast
- **Podcast Library**: Browse, play, and manage all generated podcasts

### Dashboard & Analytics

- **Real-time Stats**: Track total documents, vector chunks, and storage usage
- **Recent Activity**: View recently added documents at a glance
- **System Health**: Monitor vector store, LLM service, and voice service status

## Data Flow

- **Document Ingestion**: Files → OCR → Text Extraction → Chunking → Embedding Generation → Vector Store
- **Semantic Search**: Query → Embedding → Vector Search → Relevance Ranking → Results
- **RAG Q&A**: Question → Search → Context Retrieval → LLM Generation → Answer + Sources
- **Podcast Generation**: Documents → Content Analysis → Script Generation → Voice Synthesis → Audio File

## Basic Workflow

### 1. Upload Documents
Navigate to the "Upload Documents" tab:

- Click "Select a document" or drag and drop files.
- Supported formats: PDF, DOCX, TXT, PNG, JPG, JPEG.
- Click "Process & Add to Library".
- Wait for processing to complete (OCR runs automatically for images).
- Note the Document ID from the output.

### 2. Search Your Library
Go to "Search Documents":

- Enter a natural language query (e.g., "What are the key findings about climate change?").
- Adjust the "Number of Results" slider (1-20).
- Click "Search".
- Review the results with their relevance scores and source excerpts.

### 3. Ask Questions
Navigate to "Ask Questions":

- Type your question about the uploaded documents.
- Click "Get Answer".
- Read the AI-generated answer with its source citations.
- Check the confidence level and review the source documents.

### 4. Generate Content
Open "Content Studio":

- Select a document from the dropdown OR paste custom text.
- Choose a task from the dropdown (Summarize, Outline, Explain, Paraphrase, etc.).
- Configure task-specific options in "Advanced Options".
- Click "Run Task".
- Copy or download the generated content.

### 5. Create Podcasts
Visit "🎧 Podcast Studio":

- Select 1-5 documents using the checkboxes.
- Choose a Style (conversational, educational, technical, casual).
- Set the Duration (5-30 minutes).
- Select voices for Host 1 and Host 2.
- Click "Generate Podcast".
- Listen to the generated audio and read the transcript.
- Browse past podcasts in the Podcast Library.

### 6. Generate Tags
Go to "🏷️ Generate Tags":

- Select a document OR paste custom text.
- Adjust the "Number of Tags" slider (3-15).
- Click "🏷️ Generate Tags".

## Hackathon Tracks

We are submitting to:
- **Building MCP**: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
- **MCP in Action (Consumer/Creative)**: For the Podcast interface that makes personal knowledge management accessible and fun.

## License

MIT License. Built with ❤️ for the AI community.

## Acknowledgements & Sponsors

This project was built for the **MCP 1st Birthday Hackathon** and uses technology from:

- **[OpenAI](https://openai.com)**: Providing the foundational intelligence for our document analysis and content generation.
- **[Nebius AI](https://nebius.com)**: Powering our high-performance inference needs.
- **[LlamaIndex](https://www.llamaindex.ai)**: The backbone of our data orchestration, enabling RAG and agentic workflows for the Podcast Studio.
- **[ElevenLabs](https://elevenlabs.io)**: Bringing our podcasts to life with realistic text-to-speech.
- **[Hugging Face](https://huggingface.co)**: Hosting our application on **Spaces** and providing the **Gradio** framework for our responsive UI.
- **[Anthropic](https://anthropic.com)**: For pioneering the **Model Context Protocol (MCP)** that makes this modular architecture possible.

## Connect to Claude

Want to use these tools directly inside Claude Desktop?
See the [Client Setup Guide](CLIENT_SETUP.md) to connect this MCP server to your local Claude instance.