---
title: AI Digital Library Assistant
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.38.0"
app_file: app.py
pinned: false
license: mit
tags:
- mcp-in-action-track-consumer
- mcp-in-action-track-creative
- building-mcp-track-consumer
- building-mcp-track-creative
- MCP-1st-Birthday
---
- **Demo video**: https://youtu.be/09Lls0zJ-QE
- **Social media post**: https://x.com/nihald2000/status/1995198714156286290?s=20
The **AI Digital Library Assistant** is a next-generation knowledge management tool built for the **MCP 1st Birthday Hackathon**. It transforms your static document collection into an interactive, living library.
Unlike traditional RAG (Retrieval Augmented Generation) apps, this project leverages the **Model Context Protocol (MCP)** to create a modular ecosystem of toolsβ€”Ingestion, Search, and Podcast Generationβ€”that work harmoniously to help you consume information in the way that suits *you* best.
```mermaid
graph TD
    User((👤 User))
    subgraph "Frontend (Gradio)"
        UI[Web Interface]
        PodcastUI[Podcast Studio]
    end
    subgraph "MCP Server Layer"
        MCPServer[Content Organizer MCP Server]
        subgraph "MCP Tools"
            IngestTool[📥 Ingestion Tool]
            SearchTool[🔍 Search Tool]
            GenTool[✨ Generative Tool]
            PodTool[🎧 Podcast Tool]
        end
    end
    subgraph "Service Layer"
        VecStore[(Vector Store)]
        DocStore[(Document Store)]
        LLM["LLM Service (OpenAI / Nebius AI)"]
        ElevenLabs[ElevenLabs API]
        LlamaIndex[LlamaIndex Agent]
    end
User <--> UI
UI <--> MCPServer
MCPServer --> IngestTool
MCPServer --> SearchTool
MCPServer --> GenTool
MCPServer --> PodTool
IngestTool --> VecStore
IngestTool --> DocStore
SearchTool --> VecStore
GenTool --> LLM
PodTool --> LlamaIndex
PodTool --> ElevenLabs
PodTool --> LLM
```
![AI LIB](https://cdn-uploads.huggingface.co/production/uploads/66f1712d906c08084995f808/TSJexR45eNpUjHhbHDOag.png)
## 🚀 Quick Start
Check out [QUICKSTART.md](QUICKSTART.md) for detailed local setup instructions.
1. **Clone & Install**:
```bash
git clone https://huggingface.co/spaces/Nihal2000/AiDigitalLibraryAssistant
cd AiDigitalLibraryAssistant
pip install -r requirements.txt
```
2. **Configure**: Add your `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` to `.env`.
3. **Run**: `python app.py`
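For reference, the `.env` file from step 2 looks like this (placeholder values shown; substitute your own keys):

```
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
```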
## 💡 How It Works
### 1. The MCP Core
At the heart of the application is the `AiDigitalLibraryAssistant` MCP server. It exposes atomic capabilities (tools) that the frontend consumes, which means the same tools powering this UI can be connected to Claude Desktop or any other MCP client. For example, a remote client can reach the hosted server with this configuration:
```json
{
"mcpServers": {
"ai-library": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"https://mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
]
}
}
}
```
### 2. 🎧 Podcast Studio (Star Feature)
Turn your reading list into a playlist! The **Podcast Studio** is a flagship feature that transforms any selection of documents into an engaging, multi-speaker audio podcast.
- **Intelligent Scripting**: Uses **LlamaIndex** and **OpenAI/Nebius AI** to analyze your documents and generate a natural, conversational script.
- **Multi-Speaker Synthesis**: Leverages **ElevenLabs** to bring the script to life with distinct, realistic voices for each host.
- **Customizable**: Choose your style (conversational, educational, technical, or casual) and duration.
## ✨ Features
### 📚 Document Management
- **Multi-format support**: PDF, DOCX, TXT, and image files (PNG, JPG, JPEG)
- **Intelligent OCR**: automatic text extraction from images and scanned documents
- **Semantic chunking**: documents are automatically split into meaningful segments for better retrieval
- **Metadata tracking**: comprehensive document metadata including file size, type, creation date, and custom tags
- **Vector embeddings**: all documents are indexed with dense vector embeddings for semantic search
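The chunking step above can be sketched as a fixed-size splitter with overlap. This is a toy stand-in, not the app's actual splitter (which may break on sentence or semantic boundaries, e.g. via LlamaIndex):

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-window chunks.

    Toy stand-in for semantic chunking: real splitters break on
    sentence/semantic boundaries rather than fixed word counts.
    """
    words = text.split()
    chunks, start = [], 0
    step = max_words - overlap  # each advance leaves `overlap` shared words
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        start += step
    return chunks
```

The overlap means adjacent chunks share context, so a passage that straddles a chunk boundary is still retrievable as a whole.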
## πŸ” Advanced Search
Semantic Search: Find documents by meaning, not just keywords
Configurable Results: Adjust the number of results (1-20) based on your needs
Relevance Scoring: Each result includes a confidence score
Source Attribution: Direct links to source documents with highlighted excerpts
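Under the hood, semantic search boils down to ranking stored vectors by similarity to the query embedding. A minimal pure-Python sketch (the real app uses a vector store and a learned embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec: list[float], index: dict, top_k: int = 5) -> list:
    """Rank every (doc_id -> vector) entry by similarity to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()),
        reverse=True,
    )
    return scored[:top_k]
```

The similarity value doubles as the relevance score attached to each result.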
### 🎨 Content Studio
Transform your documents with 8 powerful AI tools:
- **Summarize**: generate concise, detailed, bullet-point, or executive summaries
- **Generate Outline**: create structured outlines from topics or documents (3-10 sections)
- **Explain Concept**: get explanations tailored to different audiences (general, technical, beginner, expert)
- **Paraphrase**: rewrite text in various styles (formal, casual, academic, simple, technical)
- **Categorize**: automatically classify content into user-defined categories
- **Key Insights**: extract the most important points from any document
- **Generate Questions**: create comprehension, analysis, application, creative, or factual questions
- **Extract Key Info**: pull out structured information (entities, dates, facts) in JSON format
### 🏷️ Smart Tagging
- **AI-generated tags**: automatically generate 3-15 relevant tags for any document
- **Persistent storage**: tags are saved directly to document metadata
- **Batch processing**: tag multiple documents or custom text snippets
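As a rough illustration of the tagger's output shape, here is a naive frequency-based stand-in. The actual feature uses an LLM, so the function name, stopword list, and heuristics below are illustrative only:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real tagger needs none of this.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def naive_tags(text: str, n: int = 5) -> list[str]:
    """Toy keyword tagger standing in for the LLM-based tag generator."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [word for word, _ in counts.most_common(n)]
```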
### ❓ RAG-Powered Q&A
- **Context-aware answers**: ask questions and get answers grounded in your documents
- **Source citations**: every answer includes relevant source excerpts
- **Confidence scoring**: transparency about answer reliability
- **Multi-document synthesis**: answers can draw from multiple documents simultaneously
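Conceptually, the Q&A step stitches retrieved excerpts into a grounded prompt before calling the LLM. A sketch of that assembly (the function and prompt format are illustrative, not the app's exact prompt):

```python
def build_rag_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (doc_id, excerpt) search results."""
    context = "\n\n".join(
        f"[Source: {doc_id}]\n{excerpt}" for doc_id, excerpt in retrieved
    )
    return (
        "Answer the question using ONLY the context below, and cite the\n"
        "source IDs you relied on. Say so if the context is insufficient.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Because the excerpts carry their document IDs into the prompt, the model can cite sources, and the UI can link each citation back to the original document.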
### 🎙️ Podcast Studio
Convert documents into engaging audio conversations:
- **AI voice generation**: ultra-realistic voices powered by ElevenLabs
- **Two-host format**: dynamic dialogue between two AI personalities
- **Multiple styles**: conversational, educational, technical, or casual
- **Custom duration**: 5-30 minute podcasts
- **Voice selection**: choose from 7+ professional AI voices
- **Full transcripts**: complete text transcripts for every generated podcast
- **Podcast library**: browse, play, and manage all generated podcasts
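Between scripting and synthesis sits a small parsing step: the generated script must be split into per-speaker segments so each host's lines go to the right voice. A sketch assuming a simple `HOST1:` / `HOST2:` line format (the app's actual script format may differ):

```python
def script_to_segments(script: str) -> list[tuple[str, str]]:
    """Parse 'SPEAKER: line' text into (speaker, line) pairs.

    Lines without a 'SPEAKER:' prefix (e.g. stage directions) are skipped.
    """
    segments = []
    for line in script.splitlines():
        if ":" in line:
            speaker, text = line.split(":", 1)
            segments.append((speaker.strip(), text.strip()))
    return segments
```

Each segment can then be sent to the text-to-speech API with the voice mapped to its speaker, and the segments concatenated into the final audio file.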
### 📊 Dashboard & Analytics
- **Real-time stats**: track total documents, vector chunks, and storage usage
- **Recent activity**: view recently added documents at a glance
- **System health**: monitor vector store, LLM service, and voice service status
## Data Flow
- **Document ingestion**: Files → OCR → Text Extraction → Chunking → Embedding Generation → Vector Store
- **Semantic search**: Query → Embedding → Vector Search → Relevance Ranking → Results
- **RAG Q&A**: Question → Search → Context Retrieval → LLM Generation → Answer + Sources
- **Podcast generation**: Documents → Content Analysis → Script Generation → Voice Synthesis → Audio File
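The ingestion flow can be sketched end to end with a deterministic toy embedder (hash-based, standing in for a real embedding model) feeding a dict-backed vector store. All names here are illustrative:

```python
import hashlib

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real embedding model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]

def ingest(doc_id: str, chunks: list[str], vector_store: dict) -> int:
    """Embed each chunk and index it under '<doc_id>#<chunk_index>'."""
    for i, chunk in enumerate(chunks):
        vector_store[f"{doc_id}#{i}"] = toy_embed(chunk)
    return len(chunks)
```

Keying entries by `doc_id#chunk_index` is what lets search results point back to the exact excerpt inside the source document.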
### Basic Workflow
1. **Upload Documents**: open the "📄 Upload Documents" tab.
   - Click "Select a document" or drag and drop files (supported formats: PDF, DOCX, TXT, PNG, JPG, JPEG).
   - Click "🚀 Process & Add to Library".
   - Wait for processing to complete (OCR runs automatically for images).
   - Note the Document ID from the output.
2. **Search Your Library**: go to "🔍 Search Documents".
   - Enter a natural language query (e.g., "What are the key findings about climate change?").
   - Adjust the "Number of Results" slider (1-20).
   - Click "🔍 Search".
   - Review results with relevance scores and source excerpts.
3. **Ask Questions**: navigate to "❓ Ask Questions".
   - Type your question about uploaded documents.
   - Click "❓ Get Answer".
   - Receive an AI-generated answer with source citations.
   - Check the confidence level and review the source documents.
4. **Generate Content**: open "📝 Content Studio".
   - Select a document from the dropdown or paste custom text.
   - Choose a task from the dropdown: Summarize, Outline, Explain, Paraphrase, etc.
   - Configure task-specific options under "⚙️ Advanced Options".
   - Click "🚀 Run Task".
   - Copy or download the generated content.
5. **Create Podcasts**: visit "🎧 Podcast Studio".
   - Select 1-5 documents using the checkboxes.
   - Choose a style (conversational, educational, technical, casual).
   - Set the duration (5-30 minutes).
   - Select voices for Host 1 and Host 2.
   - Click "🎙️ Generate Podcast".
   - Listen to the generated audio and read the transcript.
   - Browse past podcasts in the Podcast Library.
6. **Generate Tags**: go to "🏷️ Generate Tags".
   - Select a document or paste custom text.
   - Adjust the "Number of Tags" slider (3-15).
   - Click "🏷️ Generate Tags".
## πŸ† Hackathon Tracks
We are submitting to:
- **Building MCP**: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
- **MCP in Action (Consumer/Creative)**: For the innovative Podcast interface that makes personal knowledge management accessible and fun.
## 📜 License
MIT License. Built with ❤️ for the AI community.
## 🙏 Acknowledgements & Sponsors
This project was built for the **MCP 1st Birthday Hackathon** and proudly leverages technology from:
- **[OpenAI](https://openai.com)**: Providing the foundational intelligence for our document analysis and content generation.
- **[Nebius AI](https://nebius.com)**: Powering our high-performance inference needs.
- **[LlamaIndex](https://www.llamaindex.ai)**: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
- **[ElevenLabs](https://elevenlabs.io)**: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
- **[Hugging Face](https://huggingface.co)**: Hosting our application on **Spaces** and providing the **Gradio** framework for our beautiful, responsive UI.
- **[Anthropic](https://anthropic.com)**: For pioneering the **Model Context Protocol (MCP)** that makes this modular architecture possible.
## 🔌 Connect to Claude
Want to use these tools directly inside Claude Desktop?
Check out our [Client Setup Guide](CLIENT_SETUP.md) to connect this MCP server to your local Claude instance!