
πŸš€ Getting Started with SAP Intelligent Assistant

This guide will help you get the SAP Chatbot running in less than 30 minutes.

Prerequisites

  • Python 3.8+ - Check with: python3 --version
  • Internet Connection - For initial setup and data collection
  • ~2GB Storage - For dataset and models
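To verify the Python requirement programmatically, here is a minimal check. The 3.8 floor comes from this guide; the helper name is illustrative, not part of the project:

```python
import sys

def meets_minimum(version=None, minimum=(3, 8)):
    """Return True when the interpreter version satisfies the minimum."""
    version = version or sys.version_info
    return tuple(version[:2]) >= minimum

if __name__ == "__main__":
    print("Python OK" if meets_minimum() else "Upgrade to Python 3.8+")
```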

Step 1: Clone & Initial Setup (5 minutes)

```bash
# Navigate to your workspace
cd /Users/akshay/sap-chatbot

# Run setup script (handles everything)
bash setup.sh

# Or manual setup:
# 1. Create virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 2. Install dependencies
pip install -r requirements.txt

# 3. Copy environment file
cp .env.example .env
```

Step 2: Choose Your LLM Option

Option A: Ollama (Recommended for Offline)

Best for: Local development, offline usage, privacy

```bash
# 1. Install Ollama from https://ollama.ai

# 2. Start Ollama server (in a separate terminal)
ollama serve

# 3. Pull a model (in another terminal)
# Pick one:
ollama pull neural-chat      # Fast (3B)
ollama pull mistral          # Balanced (7B)
ollama pull dolphin-mixtral  # Best quality (8x7B)

# 4. Update .env
LLM_PROVIDER=ollama
LLM_MODEL=mistral
```

Option B: Replicate (Easiest Cloud Option)

Best for: Cloud deployment, zero local setup

```bash
# 1. Sign up free at https://replicate.com
# 2. Get your API token

# 3. Set environment variable
export REPLICATE_API_TOKEN="your_token_here"

# 4. Update .env
LLM_PROVIDER=replicate
LLM_MODEL=meta/llama-2-7b-chat
```

Option C: HuggingFace (Most Flexibility)

Best for: Testing different models easily

```bash
# 1. Sign up at https://huggingface.co
# 2. Get token from https://huggingface.co/settings/tokens

# 3. Set environment variable
export HF_API_TOKEN="your_token_here"

# 4. Update .env
LLM_PROVIDER=huggingface
LLM_MODEL="mistralai/Mistral-7B-Instruct-v0.1"
```

Step 3: Build the Knowledge Base (10 minutes)

```bash
# Activate virtual environment (if not already)
source .venv/bin/activate

# Build SAP dataset from web sources
# This scrapes: SAP Community, GitHub, Dev.to, etc.
python tools/build_dataset.py

# This creates: data/sap_dataset.json
```

Step 4: Build the Vector Index (5 minutes)

```bash
# Create embeddings and FAISS vector index
python tools/embeddings.py

# This creates:
# - data/rag_index.faiss
# - data/rag_metadata.pkl
```
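The real index is built with FAISS over embedding vectors; conceptually, a query is answered by ranking stored chunks by cosine similarity to the query vector. A dependency-free toy version of that ranking, just to show the idea:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunk_vecs, k=5):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

FAISS does the same ranking, but over millions of vectors with approximate-nearest-neighbor speedups.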

Step 5: Run the App (2 minutes)

```bash
# Option 1: Quick start (automatic)
python quick_start.py

# Option 2: Manual
streamlit run app.py

# The app opens at: http://localhost:8501
```

Troubleshooting

"Ollama not running"

```bash
# In a separate terminal:
ollama serve
```

"REPLICATE_API_TOKEN not set"

```bash
export REPLICATE_API_TOKEN="your_token"
# Or add to .env file
```

"No such file: sap_dataset.json"

```bash
# Rebuild dataset
python tools/build_dataset.py
python tools/embeddings.py
```

"Memory error"

```bash
# Use a lighter embeddings model in config.py:
EMBEDDINGS_MODEL = "all-MiniLM-L6-v2"  # Already the default (light)

# Or use a faster LLM:
ollama pull neural-chat  # 3B instead of 7B
```

"Very slow responses"

```bash
# For faster responses, use:
LLM_MODEL=neural-chat  # 3B is 2-3x faster

# Or use a cloud provider:
# Replicate or HuggingFace (API token required)
```

Quick Test

Once running, try these questions:

  1. "How do I monitor background jobs in SAP?"

    • Tests: Data retrieval, LLM quality
  2. "What is SAP Basis?"

    • Tests: General knowledge
  3. "How to debug ABAP programs?"

    • Tests: Developer knowledge

Next Steps

After First Run

  1. Customize the dataset:

    • Edit tools/build_dataset.py
    • Add your own SAP documentation URLs
  2. Deploy to cloud:

    • Push to GitHub
    • Deploy on Streamlit Cloud
    • See README.md for details
  3. Fine-tune performance:

    • Adjust RAG_TOP_K in config.py
    • Change embeddings model
    • Optimize chunk size
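"Optimize chunk size" refers to how source documents are split before embedding. A minimal character-window chunker with overlap, as one way to experiment (the sizes and the function are illustrative; the project's actual splitter may differ):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping character windows."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```

Smaller chunks give more precise retrieval hits; larger chunks give the LLM more context per hit.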

Development

```bash
# Run in development mode
streamlit run app.py --logger.level=debug

# Check logs
tail -f logs/app.log
```

Architecture Summary

```
Your Question
    ↓
Vector Search (FAISS)
    ↓
Top 5 Similar Chunks
    ↓
LLM (Ollama/Replicate/HF)
    ↓
Answer + Sources
```
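The last hop above hands the retrieved chunks to the LLM as context. A sketch of that prompt assembly (the wording and citation style are assumptions, not the app's exact template):

```python
def build_prompt(question, chunks):
    """Assemble a RAG prompt: numbered context chunks, then the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, 1))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```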

Configuration Tips

| Use Case | Setting |
|---|---|
| Fastest | `neural-chat` + `all-MiniLM-L6-v2` |
| Best quality | `mistral` + `all-mpnet-base-v2` |
| Offline | Ollama + any model |
| Cloud | Replicate + Mistral |
| Low memory | Keep current settings |

Common Issues & Solutions

| Problem | Solution |
|---|---|
| Slow on first run | Building the dataset is normal; it takes 5-10 min |
| Timeout errors | Increase the timeout in `tools/build_dataset.py` |
| Empty responses | Check that the dataset was built successfully |
| Memory errors | Use a smaller model or lighter embeddings |
| API errors | Check your token and internet connection |

Getting Help

  1. Check README.md - Comprehensive documentation
  2. FAQ Section - Common questions answered
  3. GitHub Issues - Report bugs
  4. Configuration - See config.py for all options

What's Next?

  • βœ… Your system is ready!
  • πŸ“š Start asking SAP questions
  • πŸš€ Deploy when comfortable
  • πŸ“– Read README.md for advanced usage

Happy learning! 🧩

For more details, see README.md