github-actions[bot] committed on
Commit a308534 · 1 Parent(s): 7d89ecf

Deploy from GitHub Actions 2025-12-11_02:27:23

Files changed (2)
  1. README.MD +0 -358
  2. README.md +125 -0
README.MD DELETED
@@ -1,358 +0,0 @@
- ---
- title: SAP Chatbot
- emoji: 🤖
- colorFrom: blue
- colorTo: purple
- sdk: streamlit
- sdk_version: 1.28.0
- app_file: app.py
- pinned: false
- ---
-
- # 🧩 SAP Intelligent Assistant
-
- A free, open-source **RAG (Retrieval-Augmented Generation)** system for answering SAP-related questions using cloud LLMs and vector databases.
-
- **Key Features:**
- - ✅ 100% Free & Open Source (with paid options)
- - ✅ Multi-source SAP data (Community, GitHub, StackOverflow, blogs)
- - ✅ **Production-ready**: Supabase + pgvector for vector search
- - ✅ HuggingFace Inference API for embeddings & generation
- - ✅ Automatic ingestion via GitHub Actions
- - ✅ Beautiful Streamlit UI
- - ✅ Multi-user cloud hosting on HuggingFace Spaces
- - ✅ Conversation history & source tracking
-
- ---
-
- ## 🚀 Architecture
-
- ```
- Documents → GitHub → GitHub Actions → Supabase (pgvector)
- ↓
- ingest.py
- (embeddings)
- ↓
- Users → HF Spaces
- ↓
- Streamlit App
- (HF Inference API)
- ↓
- Vector Search (Supabase RPC)
- ↓
- Answer Generation
- ```
-
- ---
-
- ## 🌐 Deploy to HuggingFace Spaces
-
- **Share your chatbot with your entire team - for FREE!**
-
- ### Quick Start (Production Setup)
-
- 👉 **[SUPABASE_SETUP.md](./SUPABASE_SETUP.md)** ← Start here for cloud deployment
-
- ### Alternative: Local Setup (Offline)
-
- Or follow: **[QUICKSTART_HF_SPACES.md](./QUICKSTART_HF_SPACES.md)**
-
- **What you get:**
- - ✅ Production database (Supabase pgvector)
- - ✅ Automatic ingestion (GitHub Actions)
- - ✅ Multi-user access (5+ concurrent)
- - ✅ Zero cost (free tier)
- - ✅ Auto-scaling infrastructure
-
- ---
-
- ### Option 1: Local (Offline) Setup with Ollama
-
- **1. Install Ollama**
- ```bash
- # Download from https://ollama.ai
- # Then start the server
- ollama serve
- ```
-
- **2. Pull an LLM model**
- ```bash
- # Fast option (3B)
- ollama pull neural-chat
-
- # Or balanced (7B)
- ollama pull mistral
-
- # Or best quality (8x7B)
- ollama pull dolphin-mixtral
- ```
-
- **3. Setup SAP Assistant**
- ```bash
- # Clone/setup the project
- cd /Users/akshay/sap-chatboot
-
- # Create virtual environment
- python -m venv .venv
- source .venv/bin/activate  # On Windows: .venv\Scripts\activate
-
- # Install dependencies
- pip install -r requirements.txt
-
- # Copy environment file
- cp .env.example .env
-
- # Build dataset from web
- python tools/build_dataset.py
-
- # Build vector index
- python tools/embeddings.py
-
- # Run the app
- streamlit run app.py
- ```
-
- Open http://localhost:8501 in your browser!
-
- ### Option 2: Cloud Setup (Replicate Free Tier)
-
- **1. Get API Token**
- - Sign up free at https://replicate.com
- - Get your API token
-
- **2. Setup**
- ```bash
- cd sap-chatboot
- python -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
-
- export REPLICATE_API_TOKEN="your_token_here"
- python tools/build_dataset.py
- python tools/embeddings.py
-
- export LLM_PROVIDER=replicate
- export LLM_MODEL=meta/llama-2-7b-chat
- streamlit run app.py
- ```
-
- ### Option 3: HuggingFace Free Tier
-
- **1. Get API Token**
- - Create account at https://huggingface.co
- - Get token from https://huggingface.co/settings/tokens
-
- **2. Setup**
- ```bash
- cd sap-chatboot
- python -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
-
- export HF_API_TOKEN="your_token_here"
- python tools/build_dataset.py
- python tools/embeddings.py
-
- export LLM_PROVIDER=huggingface
- export LLM_MODEL="mistralai/Mistral-7B-Instruct-v0.1"
- streamlit run app.py
- ```
-
- ## 📊 Architecture
-
- ```
- Web Scraper (build_dataset.py)
- ├── SAP Community
- ├── GitHub Repos
- ├── Dev.to
- └── Tech Blogs
- ↓
- SAP Dataset (sap_dataset.json)
- ↓
- RAG Pipeline (embeddings.py)
- ├── Chunk Management
- ├── Embeddings (Sentence Transformers)
- └── FAISS Vector Index
- ↓
- Vector Index (rag_index.faiss)
- ↓
- LLM Agent (agent.py)
- ├── Ollama (Local)
- ├── Replicate (Free)
- └── HuggingFace (Free)
- ↓
- Streamlit UI (app.py)
- ├── Chat Interface
- └── Source Attribution
- ```
-
- ## 📁 Project Structure
-
- ```
- sap-chatboot/
- ├── app.py                 # Main Streamlit UI
- ├── config.py              # Configuration & prompts
- ├── requirements.txt       # Python dependencies
- ├── .env.example           # Environment template
- ├── README.md              # This file
- │
- ├── tools/
- │   ├── build_dataset.py   # Web scraper for SAP data
- │   ├── embeddings.py      # RAG pipeline & vector store
- │   └── agent.py           # LLM agent with multiple providers
- │
- └── data/
-     ├── sap_dataset.json   # Scraped SAP knowledge base
-     ├── rag_index.faiss    # Vector index
-     └── rag_metadata.pkl   # Chunk metadata
- ```
-
- ## 🔧 Configuration
-
- Create `.env` file (copy from `.env.example`):
-
- ```env
- # LLM Provider: ollama, replicate, or huggingface
- LLM_PROVIDER=ollama
- LLM_MODEL=mistral
-
- # API Tokens (if using cloud providers)
- REPLICATE_API_TOKEN=your_token
- HF_API_TOKEN=your_token
-
- # Embeddings model
- EMBEDDINGS_MODEL=all-MiniLM-L6-v2
-
- # RAG settings
- RAG_TOP_K=5
- RAG_CHUNK_SIZE=512
- RAG_CHUNK_OVERLAP=100
- ```
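To make the `RAG_CHUNK_SIZE` and `RAG_CHUNK_OVERLAP` settings concrete, here is a minimal sketch of overlapping chunking. It is not taken from `tools/embeddings.py`: the function name `chunk_text` is hypothetical, and it assumes the two settings are measured in characters, which the README does not state (they could equally be tokens).

```python
import os

# Defaults mirror the RAG settings in the .env example; the variable names come
# from the README, but treating the values as character counts is an assumption.
CHUNK_SIZE = int(os.getenv("RAG_CHUNK_SIZE", "512"))
CHUNK_OVERLAP = int(os.getenv("RAG_CHUNK_OVERLAP", "100"))


def chunk_text(text: str, size: int = CHUNK_SIZE, overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into windows of `size` characters; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With `size=512` and `overlap=100`, a 1000-character document yields three chunks of 512, 512, and 176 characters. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.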
-
- ## 📚 Available LLMs
-
- ### Ollama (Local - Free)
- | Model | Size | Speed | Quality |
- |-------|------|-------|---------|
- | Neural Chat | 3B | ⚡⚡⚡ | Good |
- | Mistral | 7B | ⚡⚡ | Excellent |
- | Dolphin Mixtral | 8x7B | ⚡ | Best |
-
- ### Replicate (Free Tier)
- - Llama 2 7B
- - Mistral 7B
- - And more open models
-
- ### HuggingFace (Free Tier)
- - Any HuggingFace text-generation model
-
- ## 🔍 How It Works
-
- 1. **Data Collection** (`build_dataset.py`)
-    - Scrapes SAP Community, StackOverflow, GitHub, dev.to, Medium, SAP Developers tutorials
-    - Saves structured JSON
-
- 2. **Embeddings & Indexing** (`embeddings.py`)
-    - Splits documents into chunks
-    - Generates embeddings (Sentence Transformers)
-    - Builds FAISS vector index
-
- 3. **Query & Answer** (`agent.py`)
-    - User asks question
-    - RAG retrieves relevant documents
-    - LLM generates answer with context
-    - Sources attributed
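The retrieval step in the list above can be sketched as a brute-force cosine-similarity search. This is only an illustration of what a vector index computes, not the FAISS API and not the code in `tools/embeddings.py`; the names `cosine` and `top_k` and the `(text, source, vector)` tuple layout are assumptions made for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunks: list[tuple], k: int = 5) -> list[tuple]:
    """Return the k best (score, text, source) triples for the query vector.

    `chunks` is a list of (text, source, vector) tuples (hypothetical layout).
    """
    scored = [(cosine(query_vec, vec), text, source) for text, source, vec in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)  # highest similarity first
    return scored[:k]
```

The default `k=5` matches `RAG_TOP_K=5` from the configuration section. A real index avoids scoring every chunk, but the ranking it approximates is the same.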
-
- ## 💡 Supported Topics
-
- ✅ SAP Basis Administration
- ✅ SAP ABAP Development
- ✅ SAP HANA
- ✅ SAP Fiori & UI5
- ✅ SAP Security & Authorization
- ✅ SAP Configuration
- ✅ SAP Performance Tuning
- ✅ And more!
-
- ## 🚀 Deployment
-
- ### Deploy on Streamlit Cloud (Free)
-
- 1. Push code to GitHub
- 2. Go to https://share.streamlit.io/
- 3. Select your repository
- 4. Add environment secrets
- 5. Deploy!
-
- ### Deploy on Your Server
-
- ```bash
- python -m venv .venv
- source .venv/bin/activate
- pip install -r requirements.txt
- streamlit run app.py --server.port 8501
- ```
-
- ## 🛠️ Advanced Usage
-
- ### Programmatic Access
-
- ```python
- from tools.embeddings import load_rag_index
- from tools.agent import SAPAgent, SAGAAssistant
-
- rag = load_rag_index()
- agent = SAPAgent(llm_provider="ollama", model="mistral")
- assistant = SAGAAssistant(rag_pipeline=rag, llm_agent=agent)
-
- response = assistant.answer("How to backup SAP database?")
- print(response['answer'])
- print(response['sources'])
- ```
-
- ## ⚠️ Important Notes
-
- - **First Run**: Building dataset takes 5-10 minutes
- - **Storage**: Dataset ~100MB-500MB depending on sources
- - **Internet**: Only needed for initial scraping
- - **Local Mode**: Works 100% offline with Ollama
- - **Rate Limits**: Web scraper is respectful
-
- ## 📊 Performance Tips
-
- | Goal | Setting |
- |------|---------|
- | **Fastest** | neural-chat + MiniLM |
- | **Best Quality** | dolphin-mixtral + mpnet |
- | **Memory Efficient** | MiniLM + small model |
- | **Cloud Friendly** | Replicate or HuggingFace |
-
- ## ❓ FAQ
-
- **Q: Is this really free?**
- A: Yes! All components are free and open-source.
-
- **Q: Can I use offline?**
- A: Yes! Use Ollama for completely offline operation.
-
- **Q: How accurate?**
- A: RAG provides sources so you can verify.
-
- **Q: Can I add custom data?**
- A: Yes! Edit `build_dataset.py` to add sources.
-
- **Q: Privacy?**
- A: Local mode: All on your machine.
-
- ## 🔗 Resources
-
- - **Ollama**: https://ollama.ai
- - **Replicate**: https://replicate.com
- - **HuggingFace**: https://huggingface.co
- - **SAP Community**: https://community.sap.com
-
- ---
-
- **Made with ❤️ for the SAP Community**
-
- **Star ⭐ if you find this useful!**
README.md ADDED
@@ -0,0 +1,125 @@
+ ---
+ title: SAP Chatbot
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: 1.28.0
+ app_file: app.py
+ pinned: false
+ ---
+
+ # 🧩 SAP Intelligent Assistant
+
+ A free, open-source **RAG (Retrieval-Augmented Generation)** system for answering SAP-related questions using cloud LLMs and vector databases.
+
+ ## ✨ Key Features
+
+ - ✅ 100% Free & Open Source
+ - ✅ Multi-source SAP data (Community, GitHub, StackOverflow, Dev.to, Medium)
+ - ✅ Production-ready: Supabase + pgvector vector database
+ - ✅ HuggingFace Inference API for fast responses
+ - ✅ Automatic data ingestion via GitHub Actions
+ - ✅ Beautiful Streamlit UI
+ - ✅ Multi-user cloud hosting
+ - ✅ Conversation history with source attribution
+
+ ## 🚀 How It Works
+
+ ```
+ 1. Data Collection → 2. Embeddings → 3. Vector Search → 4. Answer Generation
+    (SAP sources)        (sentence-      (Supabase          (HF Inference
+                         transformers)   pgvector)          API)
+ ```
+
+ **Supported Topics:**
+ - SAP Basis Administration
+ - SAP ABAP Development
+ - SAP HANA
+ - SAP Fiori & UI5
+ - SAP Security & Authorization
+ - SAP BTP (Business Technology Platform)
+ - SAP Integration Suite
+ - SAP Performance Tuning
+ - And more!
+
+ ## 🔧 Setup
+
+ ### 1. Local Development (with Ollama)
+
+ ```bash
+ # Clone repo
+ git clone https://github.com/Akshay-S-PY/sap-chatboot
+ cd sap-chatboot
+
+ # Create virtual environment
+ python -m venv .venv
+ source .venv/bin/activate
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Build dataset
+ python tools/build_dataset.py
+
+ # Run locally
+ streamlit run app.py
+ ```
+
+ ### 2. Production (Supabase + HF Spaces)
+
+ See [SUPABASE_SETUP.md](./SUPABASE_SETUP.md) for step-by-step cloud deployment.
+
+ ## 📊 Architecture
+
+ ```
+ GitHub Repository (sap-chatboot)
+ ↓
+ GitHub Actions Workflows:
+ 1. build_dataset.yml → Dataset + Upload to HF Hub
+ 2. ingest.yml → Ingest to Supabase
+ 3. deploy_spaces.yml → Deploy to HF Spaces
+ ↓
+ Supabase Database (pgvector + RLS)
+ ↓
+ Streamlit App (HF Spaces)
+ ↓
+ User Query → Vector Search → LLM Response + Sources
+ ```
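The last stage of the diagram, turning vector-search hits into an LLM response with sources, could look roughly like the sketch below. The README does not show the actual prompt or the shape of a search hit, so `build_prompt`, the `text` and `url` keys, and the prompt wording are all hypothetical.

```python
def build_prompt(question: str, hits: list[dict]) -> tuple[str, list[str]]:
    """Assemble a grounded prompt and a de-duplicated source list.

    Each hit is assumed to be a dict with "text" and "url" keys (hypothetical schema).
    """
    context_lines = []
    sources = []
    for i, hit in enumerate(hits, start=1):
        # Number each passage so the model can cite it as [1], [2], ...
        context_lines.append(f"[{i}] {hit['text']}")
        if hit["url"] not in sources:
            sources.append(hit["url"])
    prompt = (
        "Answer the SAP question using only the context below. "
        "Cite passages by their bracketed number.\n\n"
        "Context:\n" + "\n".join(context_lines) + "\n\n"
        f"Question: {question}\nAnswer:"
    )
    return prompt, sources
```

The returned prompt would be sent to whichever LLM endpoint the app is configured for, and the source list displayed next to the answer so readers can verify it.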
+
+ ## 📚 Tech Stack
+
+ | Component | Technology | Cost |
+ |-----------|-----------|------|
+ | Vector Database | Supabase (pgvector) | Free |
+ | Embeddings | sentence-transformers | Free |
+ | LLM API | HuggingFace Inference | Free |
+ | App Hosting | HF Spaces | Free |
+ | Data Pipeline | GitHub Actions | Free |
+
+ ## 💡 Use Cases
+
+ - **Quick SAP Questions**: Get instant answers about SAP config, ABAP, Basis
+ - **Learning**: Understand SAP concepts with cited sources
+ - **Team Knowledge Base**: Share with your entire team
+ - **Integration**: Use programmatically via Python API
+
+ ## 🔗 Resources
+
+ - 📖 [GitHub Repository](https://github.com/Akshay-S-PY/sap-chatboot)
+ - 🔗 [Supabase](https://supabase.com)
+ - 🤗 [HuggingFace](https://huggingface.co)
+ - 💬 [SAP Community](https://community.sap.com)
+
+ ## ⚠️ Important Notes
+
+ - First run builds dataset (~5-10 min)
+ - Works 100% offline with Ollama
+ - All data sources are publicly available and respectfully scraped
+ - No personal data is stored
+
+ ---
+
+ **Made with ❤️ for the SAP Community**
+
+ Have questions? Check the [documentation](./SUPABASE_SETUP.md) or create an [issue](https://github.com/Akshay-S-PY/sap-chatboot/issues).