
🔧 Troubleshooting Guide

Common Issues & Solutions

1. Setup Issues

"ModuleNotFoundError: No module named 'streamlit'"

Problem: Dependencies not installed
Solution:

source .venv/bin/activate
pip install -r requirements.txt

"python3: command not found"

Problem: Python not installed or not in PATH
Solution:

# Install Python 3.8+
# macOS: brew install python3
# Ubuntu/Debian: sudo apt install python3
# Windows: Download from python.org

# Verify:
python3 --version

"virtualenv not found"

Problem: The venv module is missing
Solution:

# Install it:
# macOS: venv ships with Homebrew Python (brew install python3)
# Ubuntu/Debian: sudo apt install python3-venv
# Then recreate the venv:
python3 -m venv .venv

2. Dataset Building Issues

"No article URLs found"

Problem: Website structure changed or connection failed
Solution:

# Check internet connection
ping community.sap.com

# Rebuild and watch the output for errors
python tools/build_dataset.py

# Check if data directory exists
ls -la data/

"Connection timeout"

Problem: Website taking too long to respond
Solution:

# Modify timeout in tools/build_dataset.py:
# Change: timeout=10
# To: timeout=30

# Or add a delay between requests:
import time
time.sleep(5)
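
A longer timeout and a pause between requests can be combined into one retry helper. The following is a minimal sketch, not the code in tools/build_dataset.py: `fetch_with_retry` and its parameters are hypothetical names, and `fetch` stands for whatever function performs a single request.

```python
import time


def fetch_with_retry(fetch, retries=3, delay=5.0, sleep=time.sleep,
                     retry_on=(TimeoutError, ConnectionError)):
    """Call fetch() until it succeeds or the attempts run out.

    fetch    -- hypothetical zero-argument callable that performs one request
    retries  -- total number of attempts (at least 1)
    delay    -- seconds to wait after the first failure; doubled each time
    sleep    -- injectable so the helper can be exercised without waiting
    retry_on -- exception types that should trigger another attempt
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on as exc:
            last_error = exc
            if attempt < retries - 1:
                sleep(delay * (2 ** attempt))  # delay, 2*delay, 4*delay, ...
    raise last_error
```

If the script uses the requests library, its timeout errors are not subclasses of the built-in `TimeoutError`, so pass `retry_on=(requests.exceptions.RequestException,)` instead.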

"Permission denied" error

Problem: Can't write to data directory
Solution:

# Fix permissions
mkdir -p data
chmod 755 data/

# Or run with sudo (not recommended: it usually bypasses the venv
# and leaves root-owned files in data/)
sudo python tools/build_dataset.py

3. Embeddings/Index Issues

"ModuleNotFoundError: No module named 'faiss'"

Problem: FAISS not installed correctly
Solution:

pip uninstall faiss-cpu
pip install faiss-cpu --no-cache-dir

# Or use GPU version if available:
# pip install faiss-gpu

"CUDA error" / "GPU not found"

Problem: GPU version installed but no GPU available
Solution:

# Use CPU version instead
pip uninstall faiss-gpu
pip install faiss-cpu

"MemoryError during embeddings"

Problem: System ran out of memory
Solution:

# In tools/embeddings.py, reduce batch size:
# Change: batch_size=32
# To: batch_size=8 or 4

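# Or use a smaller embedding model. Note that all-MiniLM-L12-v2 has
# more layers than all-MiniLM-L6-v2 and needs more memory, not less.
# Change: model_name="all-MiniLM-L6-v2"
# To: model_name="sentence-transformers/paraphrase-MiniLM-L3-v2"
# Rebuild the index after switching models.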
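
Lowering `batch_size` helps because fewer texts are held in memory for encoding at any one time. A minimal sketch of the batching idea follows; `iter_batches` is a hypothetical helper, not a function from tools/embeddings.py.

```python
def iter_batches(items, batch_size=8):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"document {i}" for i in range(20)]
sizes = [len(batch) for batch in iter_batches(texts, batch_size=8)]
print(sizes)  # [8, 8, 4]
```

Each batch would be encoded and appended to the index before the next one is loaded, so peak memory scales with the batch size rather than with the whole dataset.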

"Index not found" error

Problem: RAG index not built
Solution:

# Rebuild the index
python tools/embeddings.py

# Verify files exist
ls -la data/rag_index.faiss
ls -la data/rag_metadata.pkl
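
The same check can be done from Python, which is handy before the app tries to load the index. This is a sketch that assumes the two file names shown above; `missing_index_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the commands above; adjust if your config differs.
REQUIRED_FILES = ("rag_index.faiss", "rag_metadata.pkl")


def missing_index_files(data_dir="data"):
    """Return the names of required index files that are absent or empty."""
    base = Path(data_dir)
    missing = []
    for name in REQUIRED_FILES:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gone = missing_index_files()
    print("Index OK" if not gone else f"Missing: {', '.join(gone)}")
```

An empty file is reported as missing because an interrupted build can leave a zero-byte file behind.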

4. LLM Provider Issues

Ollama

"ConnectionRefusedError: [Errno 111] Connection refused"

# Ollama server not running
# Start it in a new terminal:
ollama serve

# Or use nohup to background it:
nohup ollama serve &
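
To confirm from Python that the server is reachable before sending a prompt, a small probe is enough. This sketch assumes the default address used in the system check script later in this guide; `ollama_running` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def ollama_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if an HTTP server answers with 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

A `False` result corresponds to the "Connection refused" error above: start `ollama serve` and try again.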

"Model not found"

# Pull the model first:
ollama pull mistral
# Or
ollama pull neural-chat
ollama pull dolphin-mixtral

# List available models:
ollama list

"Out of memory"

# Use a smaller model. neural-chat and mistral are both 7B models;
# orca-mini defaults to a 3B variant:
ollama pull orca-mini

# Then select it in config.py:
DEFAULT_MODEL = "orca-mini"

Replicate

"REPLICATE_API_TOKEN not set"

# Set token in terminal:
export REPLICATE_API_TOKEN="your_token_here"

# Or add to .env:
REPLICATE_API_TOKEN=your_token_here

# Verify:
echo $REPLICATE_API_TOKEN

"401 Unauthorized"

# Token is invalid or expired
# 1. Get new token from https://replicate.com/account
# 2. Update environment variable
# 3. Try again

"Rate limit exceeded"

# Wait a bit, then try again
# Or use Ollama/HuggingFace instead

HuggingFace

"HF_API_TOKEN not set"

# Set token:
export HF_API_TOKEN="your_token_here"

# Or add to .env:
HF_API_TOKEN=your_token_here

# Verify:
echo $HF_API_TOKEN
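
`echo` only shows what the current shell sees. To check what the Python process sees, a short helper like the one below can be run inside the app's environment. It is a sketch: the variable names come from this guide, while `token_status` and the provider keys are hypothetical.

```python
import os

# Variable names as used in this guide; the provider keys are illustrative.
REQUIRED_TOKENS = {
    "replicate": "REPLICATE_API_TOKEN",
    "huggingface": "HF_API_TOKEN",
}


def token_status(provider, environ=os.environ):
    """Describe whether the token for a provider is visible to Python."""
    name = REQUIRED_TOKENS.get(provider)
    if name is None:
        return f"{provider}: no token listed"
    value = environ.get(name, "").strip()
    if not value:
        return f"{name} is not set"
    # Report only the length so the secret never ends up in logs.
    return f"{name} is set ({len(value)} characters)"


if __name__ == "__main__":
    for provider in REQUIRED_TOKENS:
        print(token_status(provider))
```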

"Model not found" on HuggingFace

# Verify model ID exists:
# Go to https://huggingface.co/models
# Find a text-generation model
# Example: mistralai/Mistral-7B-Instruct-v0.1

# Update config:
LLM_MODEL="mistralai/Mistral-7B-Instruct-v0.1"

5. Streamlit Issues

"streamlit: command not found"

Problem: Streamlit not installed
Solution:

source .venv/bin/activate
# Quote the requirement so the shell does not treat ">" as a redirect:
pip install "streamlit>=1.28.0"

Port 8501 already in use

Problem: Another app using port 8501
Solution:

# Use different port:
streamlit run app.py --server.port 8502

# Or stop the process using 8501:
lsof -i :8501  # See what's using it
kill <PID>     # Stop it (use kill -9 only if it refuses to exit)
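
Where `lsof` is unavailable (for example on Windows), the port can be probed from Python instead. This is a sketch using only the standard library; `port_in_use` and `first_free_port` are hypothetical helper names.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=8501, attempts=20):
    """Return the first port at or above start that nothing is listening on."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print(f"streamlit run app.py --server.port {first_free_port()}")
```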

"Cache resource initialization failed"

Problem: Session state issue
Solution:

# Clear Streamlit's cache:
streamlit cache clear

# Restart the app:
streamlit run app.py

App not responding / frozen

Problem: Long-running operation blocking UI
Solution:

# Wait for current operation to complete
# Or restart:
# 1. Press Ctrl+C
# 2. Run: streamlit run app.py again

6. Runtime Issues

"Empty search results"

Problem: No relevant documents found
Solution:

# 1. Verify dataset exists:
ls -la data/sap_dataset.json

# 2. Verify index exists:
ls -la data/rag_index.faiss

# 3. Try a different query:
# "SAP Basis administration" instead of "help"

# 4. Rebuild dataset:
python tools/build_dataset.py
python tools/embeddings.py

"Very slow responses"

Problem: LLM taking too long
Solution:

# Use a smaller, faster model in config.py (neural-chat is itself 7B):
DEFAULT_MODEL = "orca-mini"  # 3B variant by default

# Or use cloud provider (usually faster):
LLM_PROVIDER = "replicate"

"Inaccurate or irrelevant answers"

Problem: The retriever is not finding good sources, or the LLM is not strong enough
Solution:

# 1. Improve RAG:
# In config.py, increase sources:
RAG_TOP_K = 10  # From 5

# 2. Use better embeddings:
EMBEDDINGS_MODEL = "all-mpnet-base-v2"  # Better quality

# 3. Use better LLM:
DEFAULT_MODEL = "mistral"  # From neural-chat

# 4. Rebuild index:
python tools/embeddings.py

"API rate limit exceeded"

Problem: Using cloud provider too frequently
Solution:

# 1. Wait a bit
# 2. Use Ollama (no rate limits)
# 3. Or try different cloud provider

7. Configuration Issues

"Settings not taking effect"

Problem: Configuration changes not applied
Solution:

# 1. Make sure you edited the right file:
cat .env

# 2. Restart the app:
# Ctrl+C and run again

# 3. Clear the cache:
streamlit cache clear
streamlit run app.py

"Environment variables not loading"

Problem: .env file not being read
Solution:

# Verify in app.py or config.py:
# from dotenv import load_dotenv
# load_dotenv()  # Must be called

# Or set manually:
export VAR_NAME="value"
streamlit run app.py
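
To see which keys from `.env` never reached the process, the file can be compared against the live environment. The parser below is a deliberately simplified stand-in for diagnostics, not python-dotenv's parser; `read_env_file` and `unloaded_keys` are hypothetical names.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines; comments and blank lines are skipped."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def unloaded_keys(path=".env", environ=os.environ):
    """Return keys defined in the .env file but absent from the environment."""
    return sorted(key for key in read_env_file(path) if key not in environ)
```

Calling `unloaded_keys()` after the app's startup code has run should return an empty list; any names it returns were defined in `.env` but never loaded.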

8. Performance Issues

"High CPU usage"

Problem: Embeddings or search consuming CPU
Solution:

# Use batch processing in embeddings.py:
# Already optimized with batch_size=32

# Or use pre-built index (don't rebuild often)

"High memory usage"

Problem: Large dataset or model in memory
Solution:

# Use lighter model in config.py:
EMBEDDINGS_MODEL = "all-MiniLM-L6-v2"

# Reduce chunk size:
RAG_CHUNK_SIZE = 256  # From 512

# Use a 3B Ollama model (neural-chat is 7B; orca-mini defaults to 3B):
ollama pull orca-mini

"Slow search"

Problem: FAISS search taking too long
Solution:

# Should be fast already, but:

# 1. Reduce results:
RAG_TOP_K = 3  # From 5

# 2. Check if index is corrupted:
# Rebuild it:
python tools/embeddings.py

9. Deployment Issues

Streamlit Cloud deployment fails

Problem: Missing secrets or dependencies
Solution:

# 1. Add secrets in Streamlit Cloud:
# Settings → Secrets
# LLM_PROVIDER=replicate
# REPLICATE_API_TOKEN=xxx

# 2. Make sure requirements.txt is in repo
# 3. Commit data files or download on deploy

# 4. Check build logs:
# Deploy → Manage app → Logs

Docker container issues

Problem: Can't build or run Docker image
Solution:

# Create Dockerfile (if not exists)
# Build: docker build -t sap-chatbot .
# Run: docker run -p 8501:8501 sap-chatbot

# Or provide Docker guide

10. Data Issues

"Dataset is outdated"

Problem: Knowledge base needs refresh
Solution:

# Rebuild dataset:
rm data/sap_dataset.json
python tools/build_dataset.py
python tools/embeddings.py

# Takes 10-15 minutes but gets latest content

"Too much data (slow startup)"

Problem: Large dataset causing slow startup
Solution:

# Limit dataset in build_dataset.py:
# Change: for repo in repos (all repos)
# To: for repo in repos[:10] (first 10 only)

# Or reduce sources scraped

"Data format error"

Problem: JSON file corrupted
Solution:

# Verify JSON:
python -c "import json; json.load(open('data/sap_dataset.json'))"

# If error, rebuild:
rm data/sap_dataset.json
python tools/build_dataset.py
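
Before deleting the file, it can help to see where the JSON breaks, since a truncated download and a single bad record look very different. A minimal sketch, assuming the dataset path used above; `check_json` is a hypothetical helper.

```python
import json


def check_json(path="data/sap_dataset.json"):
    """Return None if the file parses, else a short description of the error."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except FileNotFoundError:
        return f"{path}: file not found"
    except json.JSONDecodeError as exc:
        # lineno and colno point at the first character the parser rejected.
        return f"{path}: line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None


if __name__ == "__main__":
    print(check_json() or "JSON is valid")
```

An error reported on the last line of the file usually means the write was interrupted, in which case rebuilding is the right fix.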

Quick Diagnosis

System Check Script

#!/bin/bash
echo "SAP Chatbot System Check"
echo "========================"
echo ""

echo "1. Python:"
python3 --version

echo ""
echo "2. Virtual Environment:"
if [ -d ".venv" ]; then
    echo "✅ Exists"
else
    echo "❌ Missing"
fi

echo ""
echo "3. Dependencies:"
pip list | grep -E "streamlit|transformers|faiss|ollama"

echo ""
echo "4. Dataset:"
ls -lh data/sap_dataset.json 2>/dev/null || echo "❌ Not found"

echo ""
echo "5. Index:"
ls -lh data/rag_index.faiss 2>/dev/null || echo "❌ Not found"

echo ""
echo "6. .env file:"
[ -f ".env" ] && echo "✅ Exists" || echo "❌ Missing"

echo ""
echo "7. Ollama:"
curl -s http://localhost:11434/ > /dev/null && echo "✅ Running" || echo "❌ Not running"

echo ""
echo "Check complete!"

Save as check_system.sh and run:

bash check_system.sh

Getting Help

  1. Check this guide - Most issues documented
  2. Read GETTING_STARTED.md - Step-by-step setup
  3. Check README.md - Architecture & concepts
  4. Check config.py - All configuration options
  5. Look at code - Well-commented Python files
  6. Open GitHub issue - Report bugs with details

Debug Mode

Enable debug logging:

# In app.py or any module:
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
logger.debug("Debug message here")

Then run:

streamlit run app.py --logger.level=debug

Still stuck? Check the GitHub issues or create a new one with:

  • Python version
  • OS (Windows/Mac/Linux)
  • Error message (full traceback)
  • Steps to reproduce
  • What you've already tried

Good luck! 🚀