Spaces: ChiragKaushikCK/Chat_with_PDF

ChiragKaushikCK committed on Dec 10, 2025

Commit b34c81b (verified) · 1 parent: 854bfaa

Update README.md

README.md +57 -102

@@ -1,142 +1,97 @@
 ---
-title: Advanced PDF Chat Assistant
-emoji: 📚
-colorFrom: purple
-colorTo: blue
 sdk: streamlit
-sdk_version: "1.31.0"
 app_file: app.py
 pinned: false
 ---
-# 📚 Advanced PDF Chat Assistant
-An intelligent PDF chat application with multiple AI models and interactive preprocessing.
 ## ✨ Features
-### 🤖 **Three Modes of Operation**
-1. **Standard Chat** - Vector-based retrieval with Google FLAN-T5
-2. **Advanced Chat** - Powered by Google Gemini 2.5 Flash (requires API key)
-3. **Document Summary** - BART-powered intelligent summaries
 ### 🎯 **Key Capabilities**
-- 📤 **Easy PDF Upload** - Drag and drop interface
-- ⚙️ **Interactive Preprocessing** - Customize chunk size, overlap, and text cleaning
-- 💬 **Conversational AI** - Natural language conversations with your documents
-- 📊 **Document Statistics** - Real-time processing metrics
-- 🎨 **Beautiful UI** - Modern, gradient-based design with animations
-- ⚡ **Quick Actions** - Pre-defined questions for instant insights
-- 📥 **Export Summaries** - Download generated summaries
 ## 🚀 How to Use
-### 1. Upload Your PDF
-- Click "Browse files" or drag and drop your PDF in the sidebar
-### 2. Configure Processing (Optional)
-- Expand "Advanced Preprocessing" to customize:
-  - Chunk Size (500-2000 characters)
-  - Chunk Overlap (0-500 characters)
-  - Text cleaning options
-### 3. Select Mode
-- **Standard Chat**: Best for general Q&A with embedded knowledge
-- **Advanced Chat**: Requires Gemini API key, best for complex reasoning
-- **Document Summary**: Generate concise summaries
-### 4. Process PDF
-- Click "🚀 Process PDF" button
-- Wait for processing to complete
-### 5. Start Chatting!
-- Type your questions in the chat box
-- Use Quick Actions for common queries
-- View chat history in real-time
-## 🔑 Getting a Gemini API Key
-For Advanced Chat mode:
-1. Visit [Google AI Studio](https://makersuite.google.com/app/apikey)
-2. Sign in with your Google account
-3. Create a new API key
-4. Paste it in the sidebar
 ## 🛠️ Technical Stack
-- **Frontend**: Streamlit
-- **PDF Processing**: PyPDF2
-- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2
-- **Vector Store**: FAISS
-- **LLM (Standard)**: Google FLAN-T5-base
-- **Summarization**: facebook/bart-large-cnn
-- **LLM (Advanced)**: Google Gemini 2.5 Flash
-- **Framework**: LangChain
 ## 📦 Installation (Local)
-```bash
-git clone <your-repo>
-cd <your-repo>
-pip install -r requirements.txt
-streamlit run app.py
-```
-## 🌟 Features Breakdown
-### Standard Chat Mode
-- Uses vector embeddings for semantic search
-- FLAN-T5 model generates responses
-- Maintains conversation history
-- Fast and efficient
-### Advanced Chat Mode (Gemini)
-- Direct PDF processing with Gemini
-- Superior understanding of document context
-- Advanced reasoning capabilities
-- Requires API key
-### Document Summary Mode
-- BART model trained on CNN/DailyMail
-- Customizable summary length
-- Extract key points efficiently
-- Downloadable summaries
-## 💡 Tips
-1. **For best results**: Use clear, specific questions
-2. **Long documents**: Standard mode chunks documents for better processing
-3. **Complex queries**: Use Advanced (Gemini) mode for deep reasoning
-4. **Quick insights**: Try the Quick Action buttons
-## 📝 Example Questions
-- "What is this document about?"
-- "Summarize the main findings"
-- "What are the key recommendations?"
-- "Explain the methodology used"
-- "List all important dates mentioned"
-## ⚠️ Limitations
-- Maximum PDF size: Depends on Hugging Face Space limits
-- Gemini API: Requires valid API key and has usage limits
-- Processing time: Varies based on document size
-## 🤝 Contributing
-Contributions are welcome! Please feel free to submit issues or pull requests.
-## 📄 License
 MIT License
-## 🔗 Links
-- [Hugging Face Spaces](https://huggingface.co/spaces)
-- [LangChain Documentation](https://python.langchain.com/)
-- [Google Gemini API](https://ai.google.dev/)
----
-Made with ❤️ using Streamlit and Hugging Face

 ---
+title: DocTalk - Chat With PDF
+emoji: 📗💬
+colorFrom: indigo
+colorTo: pink
 sdk: streamlit
+sdk_version: "1.35.0"
 app_file: app.py
 pinned: false
 ---
+# 📗💬 DocTalk - Chat With PDF
+An intelligent, completely free-to-run PDF chat application powered by Google's Gemma-2-2b-it model. Optimized for CPU usage on Hugging Face Spaces.
 ## ✨ Features
+### 🤖 **Core Engine**
+* **Model:** Google Gemma-2-2B-IT (Instruction Tuned)
+* **Architecture:** Runs entirely locally on CPU (no GPU required)
+* **Performance:** Uses FAISS for fast, in-memory vector retrieval
 ### 🎯 **Key Capabilities**
+* ⚡ **CPU Optimized** - Runs smoothly on Hugging Face Free Tier
+* 📤 **Easy Upload** - Simple sidebar PDF upload
+* 🧠 **Smart Context** - Uses `all-MiniLM-L6-v2` for precise semantic search
+* 💬 **Memory** - Maintains chat history within the session
+* 🔒 **Secure** - Handles Hugging Face tokens via environment secrets
 ## 🚀 How to Use
+### 1. Set Up Authentication
+* This app requires a **Hugging Face Access Token** (Read permissions) to download the Gemma model.
+* **For Users:** Enter your token in the app sidebar if prompted (or set it in Space secrets).
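A minimal sketch of the token lookup order described above (sidebar input first, then a Space secret exposed as an environment variable). The function name `resolve_hf_token` and the variable name `HF_TOKEN` are assumptions for illustration; the README does not say which names `app.py` actually uses.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(
    sidebar_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    env_var: str = "HF_TOKEN",  # assumed secret name, not confirmed by the README
) -> Optional[str]:
    """Return the first non-empty token: sidebar input, then the environment."""
    for candidate in (sidebar_value, env.get(env_var)):
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # caller should prompt the user for a token


if __name__ == "__main__":
    token = resolve_hf_token()
    print("token found" if token else "no token configured")
```

Keeping the lookup in one small function means the token never needs to be hard-coded in the repository, and the same code path works locally and on a Space.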
+### 2. Upload Your PDF
+* Navigate to the sidebar
+* Click "Browse files" to upload your PDF document
+* Click **"🚀 Process Document"**
+### 3. Start Chatting!
+* Wait for the "✅ Ready to chat!" notification
+* Type your question in the chat input at the bottom
+* Receive concise, context-aware answers from Gemma-2
 ## 🛠️ Technical Stack
+* **Frontend**: Streamlit
+* **LLM**: google/gemma-2-2b-it
+* **Embeddings**: sentence-transformers/all-MiniLM-L6-v2
+* **Vector Store**: FAISS (Facebook AI Similarity Search)
+* **PDF Processing**: PyPDFLoader
+* **Orchestration**: LangChain
 ## 📦 Installation (Local)
+To run this app on your own machine, start from the Space repository:
+https://huggingface.co/spaces/ChiragKaushikCK/Chat_with_PDF
+## 🌟 Features Breakdown
+### FAISS Vector Search
+* Replaces heavy database lookups with lightweight, in-memory similarity search.
+* Helps keep responses grounded in your uploaded document by retrieving the most relevant passages.
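To illustrate the in-memory similarity search mentioned above, here is a rough sketch that ranks chunk vectors by cosine similarity to a query vector. It is a brute-force stand-in written in plain Python, not FAISS itself, and the three-dimensional toy vectors are made up for the example (`all-MiniLM-L6-v2` produces 384-dimensional embeddings).

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunk_vectors: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (chunk index, score) for the k chunks most similar to the query."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(chunk_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings standing in for three PDF chunks.
    chunks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    for index, score in top_k([1.0, 0.0, 0.0], chunks):
        print(index, round(score, 3))
```

The retrieved chunks would then be placed in the prompt so the model answers from the document text rather than from memory alone.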
+### Pre-loaded Models
+* The embedding models are cached with `@st.cache_resource`, so the app stays responsive after the initial cold start.
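Streamlit reruns the script on every interaction, which is why loading a model once and reusing it matters. The sketch below shows the same load-once idea using `functools.lru_cache` from the standard library as a stand-in for `@st.cache_resource`, so it runs without Streamlit installed. The loader `load_embedder` and the dictionary it returns are hypothetical placeholders, not the app's real code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_embedder(model_name: str) -> dict:
    """Pretend to load an expensive model; runs once per distinct model_name."""
    # Placeholder object standing in for a real embedding model.
    return {"name": model_name}


if __name__ == "__main__":
    first = load_embedder("sentence-transformers/all-MiniLM-L6-v2")
    second = load_embedder("sentence-transformers/all-MiniLM-L6-v2")
    print(first is second)  # the cached object is reused
    print(load_embedder.cache_info().misses)  # number of real loads
```

In a Streamlit app the decorator would be `@st.cache_resource` instead, which keeps the loaded object alive across reruns and sessions.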
+### Gemma-2-2B-IT
+* A lightweight open model from Google.
+* Instruction-tuned for better Q&A performance compared to base models.
+* Small enough (~2.6B params) to fit in standard RAM.
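As a back-of-the-envelope check on the RAM claim, the weights-only footprint is the parameter count times the bytes per parameter. The sketch below works this out for two common dtypes; which dtype the app actually loads is an assumption, and the figures exclude activations, the KV cache, the FAISS index, and Python overhead.

```python
def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of model weights in GiB (weights only)."""
    return n_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n_params = 2.6e9  # approximate parameter count quoted in the README
    for label, width in (("float32", 4), ("bfloat16", 2)):
        print(f"{label}: {weights_gib(n_params, width):.1f} GiB")
    # float32: 9.7 GiB
    # bfloat16: 4.8 GiB
```

Real usage will be higher than these numbers, so the headroom on a given machine depends on the dtype and on how large the uploaded PDF is.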
+## ⚠️ Limitations
+* **Speed:** Since this runs on CPU, generating long answers may take a few seconds.
+* **Memory:** Designed for standard PDFs. Extremely large files (500+ pages) might hit RAM limits on free tiers.
+* **Session:** Chat history is cleared if the page is refreshed.
+## 🤝 Contributing
+Contributions are welcome! Please feel free to submit issues or pull requests to improve the UI or add new features.
+## 📄 License
 MIT License
+## 🔗 Links
+* Google Gemma Models
+* LangChain Documentation
+* Streamlit
+<div align="center"> Made with ❤️ using Streamlit and the Gemma model, by Tannu Yadav </div>