Tannuyadav committed on
Commit
651b4cf
·
verified ·
1 Parent(s): c15510c

Update README.md

Files changed (1)
  1. README.md +91 -14
README.md CHANGED
@@ -1,20 +1,97 @@
  ---
- title: DocTalk-Chat With PDF
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
  pinned: false
- short_description: ' DocTalk-Chat_With_PDF '
- license: mit
  ---
 
- # Welcome to Streamlit!
 
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
 
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
  ---
+ title: DocTalk - Chat With PDF
+ emoji: 📗💬
+ colorFrom: indigo
+ colorTo: pink
+ sdk: streamlit
+ sdk_version: "1.35.0"
+ app_file: app.py
  pinned: false
  ---
 
+ # 📗💬 DocTalk - Chat With PDF
 
+ An intelligent, completely free-to-run PDF chat application powered by Google's Gemma-2-2b-it model. Optimized for CPU usage on Hugging Face Spaces.
 
+ ## ✨ Features
+
+ ### 🤖 **Core Engine**
+ * **Model:** Google Gemma-2-2B-IT (Instruction Tuned)
+ * **Architecture:** Runs entirely locally on CPU (no GPU required)
+ * **Performance:** Uses FAISS for fast, in-memory vector retrieval
+
+ ### 🎯 **Key Capabilities**
+ * ⚡ **CPU Optimized** - Runs smoothly on Hugging Face Free Tier
+ * 📤 **Easy Upload** - Simple sidebar PDF upload
+ * 🧠 **Smart Context** - Uses `all-MiniLM-L6-v2` for precise semantic search
+ * 💬 **Memory** - Maintains chat history within the session
+ * 🔒 **Secure** - Handles Hugging Face tokens via environment secrets
+
+ ## 🚀 How to Use
+
+ ### 1. Set Up Authentication
+ * This app requires a **Hugging Face Access Token** (Read permissions) to download the Gemma model.
+ * **For Users:** Enter your token in the app sidebar if prompted (or set it in Space secrets).
+
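The token lookup described in step 1 could work roughly as sketched here. The environment variable name `HF_TOKEN` and the helper `resolve_hf_token` are assumptions for illustration, not taken from the app's source. The sketch prefers a value set as a Space secret and falls back to whatever was typed into the sidebar.

```python
import os
from typing import Optional


def resolve_hf_token(sidebar_value: str = "", env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token from the environment if set, else the sidebar value.

    `env_var` and this helper are hypothetical names used only for illustration.
    """
    # Space secrets are exposed to the running app as environment variables.
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    # Fall back to the value typed by the user; an empty string means "no token".
    sidebar_value = sidebar_value.strip()
    return sidebar_value or None
```

Checking the secret first means a token configured by the Space owner is never overridden by user input; the opposite order would be an equally reasonable design choice.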
+ ### 2. Upload Your PDF
+ * Navigate to the sidebar
+ * Click "Browse files" to upload your PDF document
+ * Click **"🚀 Process Document"**
+
+ ### 3. Start Chatting!
+ * Wait for the "✅ Ready to chat!" notification
+ * Type your question in the chat input at the bottom
+ * Receive concise, context-aware answers from Gemma-2
+
+ ## 🛠️ Technical Stack
+
+ * **Frontend**: Streamlit
+ * **LLM**: google/gemma-2-2b-it
+ * **Embeddings**: sentence-transformers/all-MiniLM-L6-v2
+ * **Vector Store**: FAISS (Facebook AI Similarity Search)
+ * **PDF Processing**: PyPDFLoader
+ * **Orchestration**: LangChain
+
+ ## 📦 Installation (Local)
+
+ To run this app on your own machine:
+
+ https://huggingface.co/spaces/ChiragKaushikCK/Chat_with_PDF
+
+ ## 🌟 Features Breakdown
+
+ ### FAISS Vector Search
+ * Replaces heavy database lookups with lightweight, in-memory similarity search.
+ * Retrieves the most relevant passages so that answers stay grounded in your uploaded document.
+
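To make "in-memory similarity search" concrete, here is a minimal pure-Python sketch. It is a brute-force cosine-similarity scan, not FAISS itself (FAISS provides optimized exact and approximate indexes over L2 or inner-product distance), and the toy vectors, chunk labels, and function names are all made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Score every stored vector against the query and keep the k best."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors have 384 dimensions.
chunks = ["refund policy", "shipping times", "contact details"]
vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]
query = [0.8, 0.2, 0.0]

best = top_k(query, vectors, k=2)
retrieved = [chunks[i] for i, _ in best]
print(retrieved)
```

The retrieved chunks are what would be passed to the language model as context. A linear scan like this is fine for a single PDF's worth of chunks; a dedicated index matters as the number of vectors grows.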
+ ### Pre-loaded Models
+ * The embedding models are cached (`@st.cache_resource`) to ensure the app feels snappy after the initial cold start.
+
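The effect of caching a loaded model can be sketched without Streamlit. The example below uses `functools.lru_cache` from the standard library as a stand-in for `@st.cache_resource`; the loader function, the counter, and the dictionary "model" are placeholders for illustration, not the app's code.

```python
import functools
import time

load_count = 0  # counts how many times the slow load actually runs


@functools.lru_cache(maxsize=None)
def load_embedding_model(name: str) -> dict:
    """Pretend to load a model; the sleep stands in for a slow download or disk read."""
    global load_count
    load_count += 1
    time.sleep(0.01)
    return {"name": name}  # placeholder object instead of a real model


# The first call pays the loading cost; the second returns the cached object.
first = load_embedding_model("sentence-transformers/all-MiniLM-L6-v2")
second = load_embedding_model("sentence-transformers/all-MiniLM-L6-v2")
print(load_count, first is second)
```

As with `@st.cache_resource`, the cached object is shared rather than copied, so it should be treated as read-only by callers.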
+ ### Gemma-2-2B-IT
+ * A lightweight open model from Google's Gemma 2 family.
+ * Instruction-tuned for better Q&A performance compared to base models.
+ * Small enough (~2.6B params) to fit in standard RAM.
+
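A back-of-the-envelope check of the "fits in RAM" claim: the weights need roughly the parameter count times the bytes per parameter. The sketch below uses the ~2.6B figure quoted above. Which precision the app actually loads is not stated, so the dtypes listed are assumptions, and the estimate covers weights only (no activations or generation cache).

```python
PARAMS = 2.6e9  # approximate parameter count quoted above

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_footprint_gb(params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for dtype, size in BYTES_PER_PARAM.items():
    print(f"{dtype}: ~{weight_footprint_gb(PARAMS, size):.1f} GB")
```

This gives roughly 10.4 GB at float32, 5.2 GB at bfloat16, and 2.6 GB at int8, which is why the chosen precision matters on a memory-limited free tier.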
+ ## ⚠️ Limitations
+
+ * **Speed:** Since this runs on CPU, generating long answers may take a few seconds.
+ * **Memory:** Designed for standard PDFs. Extremely large files (500+ pages) might hit RAM limits on free tiers.
+ * **Session:** Chat history is cleared if the page is refreshed.
+
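The session limitation follows from where the history lives. The sketch below models session-scoped chat history with a plain dictionary standing in for Streamlit's per-session state; the `append_message` helper, the `messages` key, and the sample messages are hypothetical. A page refresh corresponds to starting over with a new, empty dictionary.

```python
from typing import Dict, List, MutableMapping

Message = Dict[str, str]


def append_message(state: MutableMapping[str, List[Message]], role: str, content: str) -> None:
    """Append one chat turn to the history stored in the session state."""
    state.setdefault("messages", []).append({"role": role, "content": content})


# One browser session: history accumulates across turns.
session: Dict[str, List[Message]] = {}
append_message(session, "user", "What does section 2 cover?")
append_message(session, "assistant", "Section 2 describes the upload flow.")

# A refresh starts a new session with empty state, so earlier turns are gone.
fresh_session: Dict[str, List[Message]] = {}
print(len(session["messages"]), "messages" in fresh_session)
```

Keeping history across refreshes would require storing it outside the session, for example in a file or database keyed by user.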
+ ## 🤝 Contributing
+ Contributions are welcome! Please feel free to submit issues or pull requests to improve the UI or add new features.
+
+ ## 📄 License
+ MIT License
+
+ ## 🔗 Links
+ * Google Gemma Models
+ * LangChain Documentation
+ * Streamlit
+
+ <div align="center"> Made with ❤️ using Streamlit and Gemma, by Tannu Yadav </div>