---
title: AI Digital Library Assistant
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.38.0"
app_file: app.py
pinned: false
license: mit
tags:
- mcp-in-action-track-consumer
- mcp-in-action-track-creative
- building-mcp-track-consumer
- building-mcp-track-creative
- MCP-1st-Birthday
---
- **Demo video**: https://youtu.be/09Lls0zJ-QE
- **Social media post**: https://x.com/nihald2000/status/1995198714156286290?s=20
The **AI Digital Library Assistant** is a next-generation knowledge management tool built for the **MCP 1st Birthday Hackathon**. It transforms your static document collection into an interactive, living library.
Unlike traditional RAG (Retrieval Augmented Generation) apps, this project leverages the **Model Context Protocol (MCP)** to create a modular ecosystem of toolsβ€”Ingestion, Search, and Podcast Generationβ€”that work harmoniously to help you consume information in the way that suits *you* best.
```mermaid
graph TD
    User((👤 User))
    subgraph "Frontend (Gradio)"
        UI[Web Interface]
        PodcastUI[Podcast Studio]
    end
    subgraph "MCP Server Layer"
        MCPServer[Content Organizer MCP Server]
        subgraph "MCP Tools"
            IngestTool[📥 Ingestion Tool]
            SearchTool[🔍 Search Tool]
            GenTool[✨ Generative Tool]
            PodTool[🎧 Podcast Tool]
        end
    end
    subgraph "Service Layer"
        VecStore[(Vector Store)]
        DocStore[(Document Store)]
        LLM["LLM Service (OpenAI / Nebius AI)"]
        ElevenLabs[ElevenLabs API]
        LlamaIndex[LlamaIndex Agent]
    end
User <--> UI
UI <--> MCPServer
MCPServer --> IngestTool
MCPServer --> SearchTool
MCPServer --> GenTool
MCPServer --> PodTool
IngestTool --> VecStore
IngestTool --> DocStore
SearchTool --> VecStore
GenTool --> LLM
PodTool --> LlamaIndex
PodTool --> ElevenLabs
PodTool --> LLM
```
![AI LIB](https://cdn-uploads.huggingface.co/production/uploads/66f1712d906c08084995f808/TSJexR45eNpUjHhbHDOag.png)
## 🚀 Quick Start
Check out [QUICKSTART.md](QUICKSTART.md) for detailed local setup instructions.
1. **Clone & Install**:
```bash
git clone https://huggingface.co/spaces/Nihal2000/AiDigitalLibraryAssistant
cd AiDigitalLibraryAssistant
pip install -r requirements.txt
```
2. **Configure**: Add your `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` to `.env`.
3. **Run**: `python app.py`
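For reference, the `.env` file from step 2 looks like this (placeholder values shown; substitute your own keys):

```
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
```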
## 💡 How It Works
### 1. The MCP Core
At the heart of the application is the `AiDigitalLibraryAssistant` MCP server. It exposes atomic capabilities (tools) that the frontend consumes, which means the same tools powering this UI can be connected to Claude Desktop or any other MCP client. For example, a remote client can reach the hosted server with this configuration:
```json
{
"mcpServers": {
"ai-library": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"https://mcp-1st-birthday-ai-digital-library-assistant.hf.space/gradio_api/mcp/sse"
]
}
}
}
```
### 2. 🎧 Podcast Studio (Star Feature)
Turn your reading list into a playlist! The **Podcast Studio** is a flagship feature that transforms any selection of documents into an engaging, multi-speaker audio podcast.
- **Intelligent Scripting**: Uses **LlamaIndex** and **OpenAI/Nebius AI** to analyze your documents and generate a natural, conversational script.
- **Multi-Speaker Synthesis**: Leverages **ElevenLabs** to bring the script to life with distinct, realistic voices for each host.
- **Customizable**: Choose your style (conversational, educational, technical, or casual) and duration.
## ✨ Features
### 📚 Document Management
- **Multi-format support**: PDF, DOCX, TXT, and image files (PNG, JPG, JPEG)
- **Intelligent OCR**: automatic text extraction from images and scanned documents
- **Semantic chunking**: documents are automatically split into meaningful segments for better retrieval
- **Metadata tracking**: comprehensive document metadata including file size, type, creation date, and custom tags
- **Vector embeddings**: all documents are indexed with dense vector embeddings for semantic search
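The chunking step above can be sketched as a fixed-size splitter with overlap. This is a toy stand-in, not the app's actual splitter (which may break on sentence or semantic boundaries, e.g. via LlamaIndex):

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word-window chunks.

    Toy stand-in for semantic chunking: real splitters break on
    sentence/semantic boundaries rather than fixed word counts.
    """
    words = text.split()
    chunks, start = [], 0
    step = max_words - overlap  # each advance leaves `overlap` shared words
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        start += step
    return chunks
```

The overlap means adjacent chunks share context, so a passage that straddles a chunk boundary is still retrievable as a whole.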
## πŸ” Advanced Search
Semantic Search: Find documents by meaning, not just keywords
Configurable Results: Adjust the number of results (1-20) based on your needs
Relevance Scoring: Each result includes a confidence score
Source Attribution: Direct links to source documents with highlighted excerpts
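Under the hood, semantic search boils down to ranking stored vectors by similarity to the query embedding. A minimal pure-Python sketch (the real app uses a vector store and a learned embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec: list[float], index: dict, top_k: int = 5) -> list:
    """Rank every (doc_id -> vector) entry by similarity to the query."""
    scored = sorted(
        ((cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in index.items()),
        reverse=True,
    )
    return scored[:top_k]
```

The similarity value doubles as the relevance score attached to each result.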
### 🎨 Content Studio
Transform your documents with 8 powerful AI tools:
- **Summarize**: generate concise, detailed, bullet-point, or executive summaries
- **Generate Outline**: create structured outlines from topics or documents (3-10 sections)
- **Explain Concept**: get explanations tailored to different audiences (general, technical, beginner, expert)
- **Paraphrase**: rewrite text in various styles (formal, casual, academic, simple, technical)
- **Categorize**: automatically classify content into user-defined categories
- **Key Insights**: extract the most important points from any document
- **Generate Questions**: create comprehension, analysis, application, creative, or factual questions
- **Extract Key Info**: pull out structured information (entities, dates, facts) in JSON format
### 🏷️ Smart Tagging
- **AI-generated tags**: automatically generate 3-15 relevant tags for any document
- **Persistent storage**: tags are saved directly to document metadata
- **Batch processing**: tag multiple documents or custom text snippets
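As a rough illustration of the tagger's output shape, here is a naive frequency-based stand-in. The actual feature uses an LLM, so the function name, stopword list, and heuristics below are illustrative only:

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real tagger needs none of this.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def naive_tags(text: str, n: int = 5) -> list[str]:
    """Toy keyword tagger standing in for the LLM-based tag generator."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return [word for word, _ in counts.most_common(n)]
```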
### ❓ RAG-Powered Q&A
- **Context-aware answers**: ask questions and get answers grounded in your documents
- **Source citations**: every answer includes relevant source excerpts
- **Confidence scoring**: transparency about answer reliability
- **Multi-document synthesis**: answers can draw from multiple documents simultaneously
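Conceptually, the Q&A step stitches retrieved excerpts into a grounded prompt before calling the LLM. A sketch of that assembly (the function and prompt format are illustrative, not the app's exact prompt):

```python
def build_rag_prompt(question: str, retrieved: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from (doc_id, excerpt) search results."""
    context = "\n\n".join(
        f"[Source: {doc_id}]\n{excerpt}" for doc_id, excerpt in retrieved
    )
    return (
        "Answer the question using ONLY the context below, and cite the\n"
        "source IDs you relied on. Say so if the context is insufficient.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Because the excerpts carry their document IDs into the prompt, the model can cite sources, and the UI can link each citation back to the original document.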
### 🎙️ Podcast Studio
Convert documents into engaging audio conversations:
- **AI voice generation**: ultra-realistic voices powered by ElevenLabs
- **Two-host format**: dynamic dialogue between two AI personalities
- **Multiple styles**: conversational, educational, technical, or casual
- **Custom duration**: 5-30 minute podcasts
- **Voice selection**: choose from 7+ professional AI voices
- **Full transcripts**: complete text transcripts for every generated podcast
- **Podcast library**: browse, play, and manage all generated podcasts
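Between scripting and synthesis sits a small parsing step: the generated script must be split into per-speaker segments so each host's lines go to the right voice. A sketch assuming a simple `HOST1:` / `HOST2:` line format (the app's actual script format may differ):

```python
def script_to_segments(script: str) -> list[tuple[str, str]]:
    """Parse 'SPEAKER: line' text into (speaker, line) pairs.

    Lines without a 'SPEAKER:' prefix (e.g. stage directions) are skipped.
    """
    segments = []
    for line in script.splitlines():
        if ":" in line:
            speaker, text = line.split(":", 1)
            segments.append((speaker.strip(), text.strip()))
    return segments
```

Each segment can then be sent to the text-to-speech API with the voice mapped to its speaker, and the segments concatenated into the final audio file.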
### 📊 Dashboard & Analytics
- **Real-time stats**: track total documents, vector chunks, and storage usage
- **Recent activity**: view recently added documents at a glance
- **System health**: monitor vector store, LLM service, and voice service status
## Data Flow
- **Document ingestion**: Files → OCR → Text Extraction → Chunking → Embedding Generation → Vector Store
- **Semantic search**: Query → Embedding → Vector Search → Relevance Ranking → Results
- **RAG Q&A**: Question → Search → Context Retrieval → LLM Generation → Answer + Sources
- **Podcast generation**: Documents → Content Analysis → Script Generation → Voice Synthesis → Audio File
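The ingestion flow can be sketched end to end with a deterministic toy embedder (hash-based, standing in for a real embedding model) feeding a dict-backed vector store. All names here are illustrative:

```python
import hashlib

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic stand-in for a real embedding model."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255 for byte in digest[:dim]]

def ingest(doc_id: str, chunks: list[str], vector_store: dict) -> int:
    """Embed each chunk and index it under '<doc_id>#<chunk_index>'."""
    for i, chunk in enumerate(chunks):
        vector_store[f"{doc_id}#{i}"] = toy_embed(chunk)
    return len(chunks)
```

Keying entries by `doc_id#chunk_index` is what lets search results point back to the exact excerpt inside the source document.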
### Basic Workflow
1. **Upload Documents**: open the "📄 Upload Documents" tab.
   - Click "Select a document" or drag and drop files (supported formats: PDF, DOCX, TXT, PNG, JPG, JPEG).
   - Click "🚀 Process & Add to Library".
   - Wait for processing to complete (OCR runs automatically for images).
   - Note the Document ID from the output.
2. **Search Your Library**: go to "🔍 Search Documents".
   - Enter a natural language query (e.g., "What are the key findings about climate change?").
   - Adjust the "Number of Results" slider (1-20).
   - Click "🔍 Search".
   - Review results with relevance scores and source excerpts.
3. **Ask Questions**: navigate to "❓ Ask Questions".
   - Type your question about uploaded documents.
   - Click "❓ Get Answer".
   - Receive an AI-generated answer with source citations.
   - Check the confidence level and review the source documents.
4. **Generate Content**: open "📝 Content Studio".
   - Select a document from the dropdown or paste custom text.
   - Choose a task from the dropdown: Summarize, Outline, Explain, Paraphrase, etc.
   - Configure task-specific options under "⚙️ Advanced Options".
   - Click "🚀 Run Task".
   - Copy or download the generated content.
5. **Create Podcasts**: visit "🎧 Podcast Studio".
   - Select 1-5 documents using the checkboxes.
   - Choose a style (conversational, educational, technical, casual).
   - Set the duration (5-30 minutes).
   - Select voices for Host 1 and Host 2.
   - Click "🎙️ Generate Podcast".
   - Listen to the generated audio and read the transcript.
   - Browse past podcasts in the Podcast Library.
6. **Generate Tags**: go to "🏷️ Generate Tags".
   - Select a document or paste custom text.
   - Adjust the "Number of Tags" slider (3-15).
   - Click "🏷️ Generate Tags".
## πŸ† Hackathon Tracks
We are submitting to:
- **Building MCP**: For our custom `AiDigitalLibraryAssistant` MCP server implementation.
- **MCP in Action (Consumer/Creative)**: For the innovative Podcast interface that makes personal knowledge management accessible and fun.
## 📜 License
MIT License. Built with ❤️ for the AI community.
## 🙏 Acknowledgements & Sponsors
This project was built for the **MCP 1st Birthday Hackathon** and proudly leverages technology from:
- **[OpenAI](https://openai.com)**: Providing the foundational intelligence for our document analysis and content generation.
- **[Nebius AI](https://nebius.com)**: Powering our high-performance inference needs.
- **[LlamaIndex](https://www.llamaindex.ai)**: The backbone of our data orchestration, enabling sophisticated RAG and agentic workflows for the Podcast Studio.
- **[ElevenLabs](https://elevenlabs.io)**: Bringing our podcasts to life with industry-leading, hyper-realistic text-to-speech.
- **[Hugging Face](https://huggingface.co)**: Hosting our application on **Spaces** and providing the **Gradio** framework for our beautiful, responsive UI.
- **[Anthropic](https://anthropic.com)**: For pioneering the **Model Context Protocol (MCP)** that makes this modular architecture possible.
## 🔌 Connect to Claude
Want to use these tools directly inside Claude Desktop?
Check out our [Client Setup Guide](CLIENT_SETUP.md) to connect this MCP server to your local Claude instance!