---
title: DocTalk - Chat With PDF
emoji: 📄💬
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: "1.35.0"
app_file: app.py
pinned: false
---

# 📄💬 DocTalk - Chat With PDF

An intelligent, completely free-to-run PDF chat application powered by Google's Gemma-2-2b-it model. Optimized for CPU usage on Hugging Face Spaces.

## ✨ Features

### 🤖 **Core Engine**
* **Model:** Google Gemma-2-2B-IT (Instruction Tuned)
* **Architecture:** Runs entirely locally on CPU (no GPU required)
* **Performance:** Optimized with FAISS for fast in-memory vector retrieval

### 🎯 **Key Capabilities**
* ⚡ **CPU Optimized** - Runs smoothly on the Hugging Face free tier
* 📤 **Easy Upload** - Simple sidebar PDF upload
* 🧠 **Smart Context** - Uses `all-MiniLM-L6-v2` for precise semantic search
* 💬 **Memory** - Maintains chat history within the session
* 🔒 **Secure** - Handles Hugging Face tokens via environment secrets

## 🚀 How to Use

### 1. Set Up Authentication
* This app requires a **Hugging Face Access Token** (read access) to download the Gemma model.
* **For Users:** Enter your token in the app sidebar if prompted (or set it in Space secrets).
### 2. Upload Your PDF
* Navigate to the sidebar
* Click "Browse files" to upload your PDF document
* Click **"🚀 Process Document"**

### 3. Start Chatting!
* Wait for the "✅ Ready to chat!" notification
* Type your question in the chat input at the bottom
* Receive concise, context-aware answers from Gemma-2
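
Chat history is kept only for the current Streamlit session. A minimal sketch of the `st.session_state` pattern behind this, with the session state stood in by a plain dict so the snippet runs anywhere (names are illustrative, not taken from app.py):

```python
# Stand-in for st.session_state; in the real app Streamlit provides this object.
session_state: dict = {}

def add_message(state: dict, role: str, content: str) -> None:
    """Append one chat turn, creating the history list on first use."""
    state.setdefault("messages", []).append({"role": role, "content": content})

add_message(session_state, "user", "What is this PDF about?")
add_message(session_state, "assistant", "It summarizes the uploaded document.")
```

Because the dict lives only in memory for the session, refreshing the page starts the history from scratch (see Limitations below).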
## 🛠️ Technical Stack

* **Frontend**: Streamlit
* **LLM**: google/gemma-2-2b-it
* **Embeddings**: sentence-transformers/all-MiniLM-L6-v2
* **Vector Store**: FAISS (Facebook AI Similarity Search)
* **PDF Processing**: PyPDFLoader
* **Orchestration**: LangChain
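
Between PDF loading and embedding, the document text is split into overlapping chunks so that context survives chunk boundaries. A dependency-free sketch of such a splitter (the chunk sizes are illustrative assumptions; the app presumably uses a LangChain text splitter for this step):

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    # Each chunk starts `step` characters after the previous one,
    # so consecutive chunks share `overlap` characters.
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```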
## 📦 Installation (Local)

To run this app on your own machine, clone the Space repository:

https://huggingface.co/spaces/ChiragKaushikCK/Chat_with_PDF
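
A typical local setup might look like the following (a sketch: it assumes the Space repo ships a `requirements.txt`, and that the token variable is named `HF_TOKEN`):

```shell
# Clone the Space repository (Hugging Face Spaces are git repos)
git clone https://huggingface.co/spaces/ChiragKaushikCK/Chat_with_PDF
cd Chat_with_PDF

# Install dependencies
pip install -r requirements.txt

# Provide your Hugging Face access token (variable name assumed; see app.py)
export HF_TOKEN=<your-token>

# Launch the Streamlit app
streamlit run app.py
```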
## 📊 Features Breakdown

### FAISS Vector Search
Replaces heavy database lookups with lightweight, in-memory similarity search. Ensures responses are strictly grounded in your uploaded document.
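
Conceptually, this in-memory search ranks chunk embeddings by cosine similarity to the question embedding. A minimal NumPy sketch of the idea (illustrative only; the app itself relies on FAISS through LangChain):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k most cosine-similar document vectors."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    # Highest-scoring chunks first.
    return np.argsort(scores)[::-1][:k]
```

Only the retrieved top-k chunks are passed to the model, which is what keeps answers grounded in the uploaded document.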
### Pre-loaded Models
The embedding models are cached (`@st.cache_resource`) to ensure the app feels snappy after the initial cold start.

### Gemma-2-2B-IT
Google's latest lightweight open model. Instruction-tuned for better Q&A performance compared to base models. Small enough (~2.6B params) to fit in standard RAM.

## ⚠️ Limitations

* **Speed:** Since this runs on CPU, generating long answers may take a few seconds.
* **Memory:** Designed for standard PDFs. Extremely large files (500+ pages) might hit RAM limits on free tiers.
* **Session:** Chat history is cleared if the page is refreshed.

## 🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests to improve the UI or add new features.

## 📄 License

MIT License

## 🔗 Links

* Google Gemma Models
* LangChain Documentation
* Streamlit

<div align="center">Made with ❤️ using Streamlit and the Gemma model, by Tannu Yadav</div>