Getting Started with SAP Intelligent Assistant
This guide will help you get the SAP Chatbot running in less than 30 minutes.
Prerequisites
- Python 3.8+ - Check with: python3 --version
- Internet Connection - For initial setup and data collection
- ~2GB Storage - For dataset and models
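The prerequisites above can also be checked from Python before running setup. This is a minimal sketch, not part of the project: the function name check_prerequisites is hypothetical, and its default thresholds are assumptions taken from the list above (Python 3.8+, roughly 2 GB free).

```python
import shutil
import sys


def check_prerequisites(path=".", min_python=(3, 8), min_free_gb=2.0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        needed = ".".join(map(str, min_python))
        problems.append(f"Python {needed}+ required, found {found}")

    # shutil.disk_usage reports bytes for the filesystem containing `path`.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        problems.append(
            f"Need ~{min_free_gb:g} GB free, only {free_gb:.1f} GB available"
        )

    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("All prerequisites met" if not issues else "\n".join(issues))
```

Internet connectivity is not checked here; the first pip install or ollama pull will surface that quickly enough.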
Step 1: Clone & Initial Setup (5 minutes)
# Navigate to your workspace
cd /Users/akshay/sap-chatboot
# Run setup script (handles everything)
bash setup.sh
# Or manual setup:
# 1. Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# 2. Install dependencies
pip install -r requirements.txt
# 3. Copy environment file
cp .env.example .env
Step 2: Choose Your LLM Option
Option A: Ollama (Recommended for Offline)
Best for: Local development, offline usage, privacy
# 1. Install Ollama from https://ollama.ai
# 2. Start Ollama server (in a separate terminal)
ollama serve
# 3. Pull a model (in another terminal)
# Pick one:
ollama pull neural-chat # Fast (3B)
ollama pull mistral # Balanced (7B)
ollama pull dolphin-mixtral # Best quality (8x7B)
# 4. Update .env
LLM_PROVIDER=ollama
LLM_MODEL=mistral
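Before moving on, it can help to confirm that the Ollama server is reachable and that the model has been pulled. The sketch below assumes Ollama's default address (http://localhost:11434) and its /api/tags endpoint, which lists locally available models. The function name list_ollama_models is hypothetical and not part of this project.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None
    models = payload.get("models", []) if isinstance(payload, dict) else []
    return [m.get("name", "") for m in models]


if __name__ == "__main__":
    names = list_ollama_models()
    if names is None:
        print("Ollama not running; start it with: ollama serve")
    elif not any(name.startswith("mistral") for name in names):
        print("Model missing; run: ollama pull mistral")
    else:
        print("Ollama is ready:", ", ".join(names))
```

Returning None for "server down" and an empty list for "server up, no models" keeps the two failure cases distinguishable.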
Option B: Replicate (Easiest Cloud Option)
Best for: Cloud deployment, zero local setup
# 1. Sign up free at https://replicate.com
# 2. Get your API token
# 3. Set environment variable
export REPLICATE_API_TOKEN="your_token_here"
# 4. Update .env
LLM_PROVIDER=replicate
LLM_MODEL=meta/llama-2-7b-chat
Option C: HuggingFace (Most Flexibility)
Best for: Testing different models easily
# 1. Sign up at https://huggingface.co
# 2. Get token from https://huggingface.co/settings/tokens
# 3. Set environment variable
export HF_API_TOKEN="your_token_here"
# 4. Update .env
LLM_PROVIDER=huggingface
LLM_MODEL="mistralai/Mistral-7B-Instruct-v0.1"
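Whichever option you choose, the same two settings (LLM_PROVIDER and LLM_MODEL) must be present, and the cloud providers additionally need their token. The following sketch checks that combination. It is an illustration only: parse_env and config_problems are hypothetical helpers, the project may load .env differently (for example with a dotenv library), and this simple parser does not handle trailing comments on a value line.

```python
import os

# Token variable each provider needs, taken from the options above.
# Ollama runs locally and needs no token.
REQUIRED_TOKEN = {
    "ollama": None,
    "replicate": "REPLICATE_API_TOKEN",
    "huggingface": "HF_API_TOKEN",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def config_problems(settings, environ=None):
    """Return a list of problems with the chosen provider settings."""
    environ = os.environ if environ is None else environ
    provider = settings.get("LLM_PROVIDER", "")
    if provider not in REQUIRED_TOKEN:
        return [f"Unknown LLM_PROVIDER: {provider!r}"]
    problems = []
    if not settings.get("LLM_MODEL"):
        problems.append("LLM_MODEL is not set")
    token = REQUIRED_TOKEN[provider]
    # Accept the token from either the shell environment or the .env file.
    if token and not (environ.get(token) or settings.get(token)):
        problems.append(f"{token} not set")
    return problems


if __name__ == "__main__":
    sample = "LLM_PROVIDER=replicate\nLLM_MODEL=meta/llama-2-7b-chat\n"
    print(config_problems(parse_env(sample)) or "Configuration looks complete")
```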
Step 3: Build the Knowledge Base (10 minutes)
# Activate virtual environment (if not already)
source .venv/bin/activate
# Build SAP dataset from web sources
# This scrapes: SAP Community, GitHub, Dev.to, etc.
python tools/build_dataset.py
# This creates: data/sap_dataset.json
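To confirm the scrape produced usable data, you can inspect the resulting file. The dataset's schema is not documented in this guide, so the sketch below deliberately assumes nothing beyond valid JSON: it reports the top-level type, the number of entries, and, if the file happens to be a list of objects, the field names of the first one. The helper name summarize_dataset is hypothetical.

```python
import json
from pathlib import Path


def summarize_dataset(path="data/sap_dataset.json"):
    """Return basic facts about the dataset file without assuming its schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    entries = len(data) if isinstance(data, (list, dict)) else 1
    summary = {"type": type(data).__name__, "entries": entries}
    # If the file is a list of records, show the fields of the first one.
    if isinstance(data, list) and data and isinstance(data[0], dict):
        summary["fields"] = sorted(data[0])
    return summary


if __name__ == "__main__":
    try:
        print(summarize_dataset())
    except FileNotFoundError:
        print("Dataset missing; run: python tools/build_dataset.py")
```

An entry count of zero is a likely cause of the empty responses mentioned under Common Issues & Solutions.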
Step 4: Build the Vector Index (5 minutes)
# Create embeddings and FAISS vector index
python tools/embeddings.py
# This creates:
# - data/rag_index.faiss
# - data/rag_metadata.pkl
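After Steps 3 and 4, three files should exist under data/. A quick way to verify that before launching the app is sketched below. The file names come from this guide, while the helper missing_artifacts is hypothetical and checks only that each file exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path

# File names come from Steps 3 and 4 of this guide.
EXPECTED_ARTIFACTS = ("sap_dataset.json", "rag_index.faiss", "rag_metadata.pkl")


def missing_artifacts(data_dir="data"):
    """Return expected files that are absent or empty in data_dir."""
    base = Path(data_dir)
    return [
        name
        for name in EXPECTED_ARTIFACTS
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing or empty:", ", ".join(missing))
        print("Rebuild with tools/build_dataset.py, then tools/embeddings.py")
    else:
        print("Knowledge base and vector index are in place")
```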
Step 5: Run the App (2 minutes)
# Option 1: Quick start (automatic)
python quick_start.py
# Option 2: Manual
streamlit run app.py
# The app opens at: http://localhost:8501
Troubleshooting
"Ollama not running"
# In a separate terminal:
ollama serve
"REPLICATE_API_TOKEN not set"
export REPLICATE_API_TOKEN="your_token"
# Or add to .env file
"No such file: sap_dataset.json"
# Rebuild dataset
python tools/build_dataset.py
python tools/embeddings.py
"Memory error"
# Use lighter embeddings model in config.py:
EMBEDDINGS_MODEL = "all-MiniLM-L6-v2" # Already default (light)
# Or use faster LLM:
ollama pull neural-chat # 3B instead of 7B
"Very slow responses"
# For faster responses, use:
LLM_MODEL=neural-chat # 3B is 2-3x faster
# Or use cloud provider:
# Replicate or HuggingFace (both require an API token)
Quick Test
Once running, try these questions:
"How do I monitor background jobs in SAP?"
- Tests: Data retrieval, LLM quality
"What is SAP Basis?"
- Tests: General knowledge
"How to debug ABAP programs?"
- Tests: Developer knowledge
Next Steps
After First Run
Customize the dataset:
- Edit tools/build_dataset.py
- Add your own SAP documentation URLs
Deploy to cloud:
- Push to GitHub
- Deploy on Streamlit Cloud
- See README.md for details
Fine-tune performance:
- Adjust RAG_TOP_K in config.py
- Change embeddings model
- Optimize chunk size
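Chunk size controls how much text each retrieved passage carries, so it trades retrieval precision against context. This guide does not show how the project splits documents, so the following is only an illustrative sketch of one common approach: fixed-size character chunks with a small overlap. The function chunk_text and its default values are assumptions, not the project's settings.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

Smaller chunks tend to give more focused matches but less surrounding context per hit, which interacts with RAG_TOP_K: fewer, larger chunks and more, smaller chunks can fill a similar amount of prompt space.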
Development
# Run in development mode
streamlit run app.py --logger.level=debug
# Check logs
tail -f logs/app.log
Architecture Summary
Your Question
↓
Vector Search (FAISS)
↓
Top 5 Similar Chunks
↓
LLM (Ollama/Replicate/HF)
↓
Answer + Sources
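The flow above can be illustrated with a small, dependency-free sketch. It is not the project's code: a brute-force cosine similarity search stands in for FAISS, hand-written toy vectors stand in for real embeddings, and the function names (cosine, top_k, build_prompt) and prompt wording are hypothetical. The final LLM call is omitted; the sketch stops at the prompt that would be sent.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """Return the k chunks most similar to query_vec.

    `chunks` is a list of (text, source, vector) tuples.
    """
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits):
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    context = "\n\n".join(f"[{i}] {text}" for i, (text, _, _) in enumerate(hits, 1))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real embeddings.
    chunks = [
        ("Use transaction SM37 to monitor background jobs.", "doc-a", [0.9, 0.1, 0.0]),
        ("Enter /h in the command field to start the ABAP debugger.", "doc-b", [0.0, 0.2, 0.9]),
    ]
    hits = top_k([1.0, 0.0, 0.1], chunks, k=1)
    print(build_prompt("How do I monitor background jobs in SAP?", hits))
    print("Sources:", [source for _, source, _ in hits])
```

Keeping the source alongside each chunk is what makes the "Answer + Sources" step possible: the same hits that feed the prompt also supply the citations.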
Configuration Tips
| Use Case | Setting |
|---|---|
| Fastest | neural-chat + all-MiniLM-L6-v2 |
| Best Quality | mistral + all-mpnet-base-v2 |
| Offline | Ollama + any model |
| Cloud | Replicate + Mistral |
| Low Memory | Keep the defaults (all-MiniLM-L6-v2 embeddings) |
Common Issues & Solutions
| Problem | Solution |
|---|---|
| Slow on first run | Expected: building the dataset takes 5-10 min |
| Timeout errors | Increase timeout in tools/build_dataset.py |
| Empty responses | Check if dataset was built successfully |
| Memory errors | Use smaller model or embeddings |
| API errors | Check token and internet connection |
Getting Help
- Check README.md - Comprehensive documentation
- FAQ Section - Common questions answered
- GitHub Issues - Report bugs
- Configuration - See config.py for all options
What's Next?
- Your system is ready!
- Start asking SAP questions
- Deploy when comfortable
- Read README.md for advanced usage
Happy learning!
For more details, see README.md