Psytamaa committed · Commit 7d89ecf · verified · 1 Parent(s): 0176fe0

Create README.MD

Files changed (1)
  1. README.MD +358 -0
README.MD ADDED
@@ -0,0 +1,358 @@
---
title: SAP Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.28.0
app_file: app.py
pinned: false
---

# 🧩 SAP Intelligent Assistant

A free, open-source **RAG (Retrieval-Augmented Generation)** system for answering SAP-related questions using cloud LLMs and vector databases.

**Key Features:**
- ✅ 100% Free & Open Source (with paid options)
- ✅ Multi-source SAP data (Community, GitHub, StackOverflow, blogs)
- ✅ **Production-ready**: Supabase + pgvector for vector search
- ✅ HuggingFace Inference API for embeddings & generation
- ✅ Automatic ingestion via GitHub Actions
- ✅ Beautiful Streamlit UI
- ✅ Multi-user cloud hosting on HuggingFace Spaces
- ✅ Conversation history & source tracking

---

## 🚀 Architecture

```
Documents → GitHub → GitHub Actions → Supabase (pgvector)
↓
ingest.py
(embeddings)
↓
Users → HF Spaces
↓
Streamlit App
(HF Inference API)
↓
Vector Search (Supabase RPC)
↓
Answer Generation
```
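
The vector-search step in the diagram can be pictured as a nearest-neighbour lookup over stored chunk embeddings. The sketch below is not the project's Supabase RPC (which this README does not show); it is a plain-Python illustration of cosine-similarity ranking. The names `cosine_similarity` and `top_k`, the `(text, embedding)` row layout, and the toy 3-dimensional vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=5):
    """Return the k (score, text) pairs most similar to query_vec, best first.

    rows is a list of (chunk_text, embedding) pairs (hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
rows = [
    ("HANA backup", [0.9, 0.1, 0.0]),
    ("Fiori theming", [0.0, 0.2, 0.9]),
    ("Basis transport", [0.6, 0.6, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], rows, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In a pgvector setup the same ranking would typically run inside the database, so only the top matches travel back to the app.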

---

## 🌐 Deploy to HuggingFace Spaces

**Share your chatbot with your entire team - for FREE!**

### Quick Start (Production Setup)

👉 **[SUPABASE_SETUP.md](./SUPABASE_SETUP.md)** ← Start here for cloud deployment

### Alternative: Local Setup (Offline)

Or follow: **[QUICKSTART_HF_SPACES.md](./QUICKSTART_HF_SPACES.md)**

**What you get:**
- ✅ Production database (Supabase pgvector)
- ✅ Automatic ingestion (GitHub Actions)
- ✅ Multi-user access (5+ concurrent)
- ✅ Zero cost (free tier)
- ✅ Auto-scaling infrastructure

---

### Option 1: Local (Offline) Setup with Ollama

**1. Install Ollama**
```bash
# Download from https://ollama.ai
# Then start the server
ollama serve
```

**2. Pull an LLM model**
```bash
# Fast option (3B)
ollama pull neural-chat

# Or balanced (7B)
ollama pull mistral

# Or best quality (8x7B)
ollama pull dolphin-mixtral
```

**3. Set up the SAP Assistant**
```bash
# Clone the project, then enter its directory
cd sap-chatboot

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment file
cp .env.example .env

# Build dataset from web
python tools/build_dataset.py

# Build vector index
python tools/embeddings.py

# Run the app
streamlit run app.py
```

Open http://localhost:8501 in your browser!

### Option 2: Cloud Setup (Replicate Free Tier)

**1. Get API Token**
- Sign up free at https://replicate.com
- Get your API token

**2. Setup**
```bash
cd sap-chatboot
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

export REPLICATE_API_TOKEN="your_token_here"
python tools/build_dataset.py
python tools/embeddings.py

export LLM_PROVIDER=replicate
export LLM_MODEL=meta/llama-2-7b-chat
streamlit run app.py
```

### Option 3: HuggingFace Free Tier

**1. Get API Token**
- Create account at https://huggingface.co
- Get token from https://huggingface.co/settings/tokens

**2. Setup**
```bash
cd sap-chatboot
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

export HF_API_TOKEN="your_token_here"
python tools/build_dataset.py
python tools/embeddings.py

export LLM_PROVIDER=huggingface
export LLM_MODEL="mistralai/Mistral-7B-Instruct-v0.1"
streamlit run app.py
```

## 📊 Architecture

```
Web Scraper (build_dataset.py)
    ├── SAP Community
    ├── GitHub Repos
    ├── Dev.to
    └── Tech Blogs
         ↓
SAP Dataset (sap_dataset.json)
         ↓
RAG Pipeline (embeddings.py)
    ├── Chunk Management
    ├── Embeddings (Sentence Transformers)
    └── FAISS Vector Index
         ↓
Vector Index (rag_index.faiss)
         ↓
LLM Agent (agent.py)
    ├── Ollama (Local)
    ├── Replicate (Free)
    └── HuggingFace (Free)
         ↓
Streamlit UI (app.py)
    ├── Chat Interface
    └── Source Attribution
```
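
The "LLM Agent" box above fans out to three interchangeable backends. One common way to structure that is a small registry keyed by provider name. This is a minimal sketch under that assumption, not the code in `agent.py`: `PROVIDERS`, `generate`, and the stub backends are hypothetical, and the stubs only label the prompt instead of calling Ollama, Replicate, or HuggingFace.

```python
from typing import Callable, Dict


def _make_stub(name: str) -> Callable[[str, str], str]:
    """Return a fake backend that only labels the prompt (no network calls)."""
    def backend(prompt: str, model: str) -> str:
        return f"[{name}:{model}] {prompt}"
    return backend


# Provider names match the LLM_PROVIDER values used in this README.
PROVIDERS: Dict[str, Callable[[str, str], str]] = {
    "ollama": _make_stub("ollama"),
    "replicate": _make_stub("replicate"),
    "huggingface": _make_stub("huggingface"),
}


def generate(prompt: str, provider: str = "ollama", model: str = "mistral") -> str:
    """Dispatch the prompt to the selected backend, failing loudly on typos."""
    try:
        backend = PROVIDERS[provider]
    except KeyError:
        raise ValueError(
            f"Unknown LLM provider {provider!r}; expected one of {sorted(PROVIDERS)}"
        ) from None
    return backend(prompt, model)


print(generate("What is SAP Basis?"))
```

Keeping the lookup in one place means a new provider only needs one extra dictionary entry.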

## 📁 Project Structure

```
sap-chatboot/
├── app.py                  # Main Streamlit UI
├── config.py               # Configuration & prompts
├── requirements.txt        # Python dependencies
├── .env.example            # Environment template
├── README.md               # This file
│
├── tools/
│   ├── build_dataset.py    # Web scraper for SAP data
│   ├── embeddings.py       # RAG pipeline & vector store
│   └── agent.py            # LLM agent with multiple providers
│
└── data/
    ├── sap_dataset.json    # Scraped SAP knowledge base
    ├── rag_index.faiss     # Vector index
    └── rag_metadata.pkl    # Chunk metadata
```

## 🔧 Configuration

Create a `.env` file (copy it from `.env.example`):

```env
# LLM Provider: ollama, replicate, or huggingface
LLM_PROVIDER=ollama
LLM_MODEL=mistral

# API Tokens (if using cloud providers)
REPLICATE_API_TOKEN=your_token
HF_API_TOKEN=your_token

# Embeddings model
EMBEDDINGS_MODEL=all-MiniLM-L6-v2

# RAG settings
RAG_TOP_K=5
RAG_CHUNK_SIZE=512
RAG_CHUNK_OVERLAP=100
```
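
For reference, here is one way these variables could be read in Python. It is a sketch, not the contents of `config.py`: the `Settings` class and `load_settings` function are hypothetical names, the defaults simply mirror the example values above, and the code reads variables that are already exported (it does not parse the `.env` file itself).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_provider: str
    llm_model: str
    embeddings_model: str
    rag_top_k: int
    rag_chunk_size: int
    rag_chunk_overlap: int


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    settings = Settings(
        llm_provider=env.get("LLM_PROVIDER", "ollama"),
        llm_model=env.get("LLM_MODEL", "mistral"),
        embeddings_model=env.get("EMBEDDINGS_MODEL", "all-MiniLM-L6-v2"),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
        rag_chunk_size=int(env.get("RAG_CHUNK_SIZE", "512")),
        rag_chunk_overlap=int(env.get("RAG_CHUNK_OVERLAP", "100")),
    )
    # An overlap as large as the chunk would never advance through the text.
    if settings.rag_chunk_overlap >= settings.rag_chunk_size:
        raise ValueError("RAG_CHUNK_OVERLAP must be smaller than RAG_CHUNK_SIZE")
    return settings


# Passing a plain dict keeps the example independent of the real environment.
settings = load_settings({"LLM_PROVIDER": "huggingface", "RAG_TOP_K": "3"})
print(settings)
```

Converting the numeric values once at load time surfaces a malformed setting at startup rather than in the middle of a query.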

## 📚 Available LLMs

### Ollama (Local - Free)
| Model | Size | Speed | Quality |
|-------|------|-------|---------|
| Neural Chat | 3B | ⚡⚡⚡ | Good |
| Mistral | 7B | ⚡⚡ | Excellent |
| Dolphin Mixtral | 8x7B | ⚡ | Best |

### Replicate (Free Tier)
- Llama 2 7B
- Mistral 7B
- And more open models

### HuggingFace (Free Tier)
- Any HuggingFace text-generation model

## 🔍 How It Works

1. **Data Collection** (`build_dataset.py`)
   - Scrapes SAP Community, StackOverflow, GitHub, dev.to, Medium, SAP Developers tutorials
   - Saves structured JSON

2. **Embeddings & Indexing** (`embeddings.py`)
   - Splits documents into chunks
   - Generates embeddings (Sentence Transformers)
   - Builds FAISS vector index

3. **Query & Answer** (`agent.py`)
   - User asks question
   - RAG retrieves relevant documents
   - LLM generates answer with context
   - Sources attributed
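
The chunking in step 2 can be sketched as a sliding window. This is an illustration, not the logic in `embeddings.py`: the function name `split_into_chunks` is hypothetical, and it assumes `RAG_CHUNK_SIZE` and `RAG_CHUNK_OVERLAP` are measured in characters (the project may count tokens or words instead).

```python
def split_into_chunks(text, chunk_size=512, overlap=100):
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# A 1000-character placeholder document with the default settings.
document = "".join(chr(97 + i % 26) for i in range(1000))
chunks = split_into_chunks(document)
print([len(chunk) for chunk in chunks])
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.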

## 💡 Supported Topics

- ✅ SAP Basis Administration
- ✅ SAP ABAP Development
- ✅ SAP HANA
- ✅ SAP Fiori & UI5
- ✅ SAP Security & Authorization
- ✅ SAP Configuration
- ✅ SAP Performance Tuning
- ✅ And more!

## 🚀 Deployment

### Deploy on Streamlit Cloud (Free)

1. Push code to GitHub
2. Go to https://share.streamlit.io/
3. Select your repository
4. Add environment secrets
5. Deploy!

### Deploy on Your Server

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py --server.port 8501
```

## 🛠️ Advanced Usage

### Programmatic Access

```python
from tools.embeddings import load_rag_index
from tools.agent import SAPAgent, SAGAAssistant

# Load the prebuilt vector index and pick an LLM backend
rag = load_rag_index()
agent = SAPAgent(llm_provider="ollama", model="mistral")
assistant = SAGAAssistant(rag_pipeline=rag, llm_agent=agent)

# Ask a question; the response carries the answer and its sources
response = assistant.answer("How do I back up an SAP database?")
print(response['answer'])
print(response['sources'])
```
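
To make the shape of that response concrete, the sketch below shows one plausible way an answer and its sources could be assembled from retrieved chunks. It is an assumption-laden illustration, not the behaviour of `SAGAAssistant`: `build_prompt`, `answer`, the chunk dictionaries with `text` and `source` keys, the example URLs, and the stub LLM are all hypothetical.

```python
def build_prompt(question, chunks):
    """Number each retrieved chunk so the answer can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] {chunk['text']}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, chunks, llm):
    """Run the LLM callable on the prompt and attach de-duplicated sources."""
    prompt = build_prompt(question, chunks)
    # dict.fromkeys keeps first-seen order while dropping repeated URLs.
    sources = list(dict.fromkeys(chunk["source"] for chunk in chunks))
    return {"answer": llm(prompt), "sources": sources}


# Placeholder chunks and a stub LLM that returns a fixed string.
chunks = [
    {"text": "Placeholder chunk one.", "source": "https://example.com/a"},
    {"text": "Placeholder chunk two.", "source": "https://example.com/b"},
    {"text": "Placeholder chunk three.", "source": "https://example.com/a"},
]
response = answer("How do I back up an SAP database?", chunks, llm=lambda prompt: "stub answer")
print(response["answer"])
print(response["sources"])
```

Passing the LLM in as a callable keeps the example runnable without any provider installed.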

## ⚠️ Important Notes

- **First Run**: Building the dataset takes 5-10 minutes
- **Storage**: The dataset takes roughly 100-500 MB, depending on sources
- **Internet**: In local mode, only needed for setup (installing dependencies, pulling models, and the initial scraping)
- **Local Mode**: Works 100% offline with Ollama once setup is done
- **Rate Limits**: The web scraper is respectful of the sites it crawls

## 📊 Performance Tips

| Goal | Setting |
|------|---------|
| **Fastest** | neural-chat + MiniLM |
| **Best Quality** | dolphin-mixtral + mpnet |
| **Memory Efficient** | MiniLM + small model |
| **Cloud Friendly** | Replicate or HuggingFace |

## ❓ FAQ

**Q: Is this really free?**
A: Yes! All components are free and open-source.

**Q: Can I use it offline?**
A: Yes! Use Ollama for completely offline operation.

**Q: How accurate is it?**
A: Every answer lists the sources it was retrieved from, so you can verify it.

**Q: Can I add custom data?**
A: Yes! Edit `build_dataset.py` to add sources.

**Q: What about privacy?**
A: In local mode, everything stays on your machine.

## 🔗 Resources

- **Ollama**: https://ollama.ai
- **Replicate**: https://replicate.com
- **HuggingFace**: https://huggingface.co
- **SAP Community**: https://community.sap.com

---

**Made with ❤️ for the SAP Community**

**Star ⭐ if you find this useful!**