---
title: DocTalk - Chat With PDF
emoji: 📄💬
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: "1.35.0"
app_file: app.py
pinned: false
---

# 📄💬 DocTalk - Chat With PDF

An intelligent, completely free-to-run PDF chat application powered by Google's Gemma-2-2b-it model. Optimized for CPU usage on Hugging Face Spaces.

## ✨ Features

### 🤖 **Core Engine**
* **Model:** Google Gemma-2-2B-IT (Instruction Tuned)
* **Architecture:** Runs entirely locally on CPU (no GPU required)
* **Performance:** Optimized with FAISS for fast in-memory vector retrieval

### 🎯 **Key Capabilities**
* ⚡ **CPU Optimized** - Runs smoothly on the Hugging Face free tier
* 📤 **Easy Upload** - Simple sidebar PDF upload
* 🧠 **Smart Context** - Uses `all-MiniLM-L6-v2` for precise semantic search
* 💬 **Memory** - Maintains chat history within the session
* 🔒 **Secure** - Handles Hugging Face tokens via environment secrets

## 🚀 How to Use

### 1. Set Up Authentication
* This app requires a **Hugging Face Access Token** (read access) to download the Gemma model.
* **For Users:** Enter your token in the app sidebar if prompted (or set it in Space secrets).
### 2. Upload Your PDF
* Navigate to the sidebar
* Click "Browse files" to upload your PDF document
* Click **"🚀 Process Document"**

### 3. Start Chatting!
* Wait for the "✅ Ready to chat!" notification
* Type your question in the chat input at the bottom
* Receive concise, context-aware answers from Gemma-2
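
Chat history is kept only for the current Streamlit session. A minimal sketch of the `st.session_state` pattern behind this, with the session state stood in by a plain dict so the snippet runs anywhere (names are illustrative, not taken from app.py):

```python
# Stand-in for st.session_state; in the real app Streamlit provides this object.
session_state: dict = {}

def add_message(state: dict, role: str, content: str) -> None:
    """Append one chat turn, creating the history list on first use."""
    state.setdefault("messages", []).append({"role": role, "content": content})

add_message(session_state, "user", "What is this PDF about?")
add_message(session_state, "assistant", "It summarizes the uploaded document.")
```

Because the dict lives only in memory for the session, refreshing the page starts the history from scratch (see Limitations below).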
## 🛠️ Technical Stack

* **Frontend**: Streamlit
* **LLM**: google/gemma-2-2b-it
* **Embeddings**: sentence-transformers/all-MiniLM-L6-v2
* **Vector Store**: FAISS (Facebook AI Similarity Search)
* **PDF Processing**: PyPDFLoader
* **Orchestration**: LangChain
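
Between PDF loading and embedding, the document text is split into overlapping chunks so that context survives chunk boundaries. A dependency-free sketch of such a splitter (the chunk sizes are illustrative assumptions; the app presumably uses a LangChain text splitter for this step):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Each chunk starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```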
## 📦 Installation (Local)

To run this app on your own machine, clone the Space repository:

https://huggingface.co/spaces/ChiragKaushikCK/Chat_with_PDF
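
A typical local setup might look like the following (a sketch: it assumes the Space repo ships a `requirements.txt`, and that the token variable is named `HF_TOKEN`):

```shell
# Clone the Space repository (Hugging Face Spaces are git repos)
git clone https://huggingface.co/spaces/ChiragKaushikCK/Chat_with_PDF
cd Chat_with_PDF

# Install dependencies
pip install -r requirements.txt

# Provide your Hugging Face access token (variable name assumed; see app.py)
export HF_TOKEN=<your-token>

# Launch the Streamlit app
streamlit run app.py
```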
## 📊 Features Breakdown

### FAISS Vector Search
Replaces heavy database lookups with lightweight, in-memory similarity search. Ensures responses are strictly grounded in your uploaded document.
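
Conceptually, this in-memory search ranks chunk embeddings by cosine similarity to the question embedding. A minimal NumPy sketch of the idea (illustrative only; the app itself relies on FAISS through LangChain):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most cosine-similar document vectors."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Highest-scoring chunks first.
    return np.argsort(scores)[::-1][:k]
```

Only the retrieved top-k chunks are passed to the model, which is what keeps answers grounded in the uploaded document.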
### Pre-loaded Models
The embedding models are cached (`@st.cache_resource`) to ensure the app feels snappy after the initial cold start.

### Gemma-2-2B-IT
Google's latest lightweight open model. Instruction-tuned for better Q&A performance compared to base models. Small enough (~2.6B params) to fit in standard RAM.

## ⚠️ Limitations

* **Speed:** Since this runs on CPU, generating long answers may take a few seconds.
* **Memory:** Designed for standard PDFs. Extremely large files (500+ pages) might hit RAM limits on free tiers.
* **Session:** Chat history is cleared if the page is refreshed.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests to improve the UI or add new features.

## 📄 License

MIT License

## 🔗 Links

* Google Gemma Models
* LangChain Documentation
* Streamlit

<div align="center">Made with ❤️ using Streamlit and the Gemma model, by Tannu Yadav</div>