CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

AISVIZ-BOT is a RAG-based conversational chatbot built with Gradio that answers questions about AISdb (Automatic Identification System Database) documentation. The assistant is named "Stormy" and helps users learn about AIS data processing, machine learning research, and maritime vessel tracking.

Running the Application

python app.py

Required environment variables (in .env), depending on which provider is selected:

  • HF_TOKEN - HuggingFace inference token (default provider)
  • GOOGLE_API_KEY - Google Generative AI API key
  • OPENAI_API_KEY - OpenAI API key
  • ANTHROPIC_API_KEY - Anthropic API key
  • USER_AGENT=myagent
  • GRPC_VERBOSITY=ERROR
  • GLOG_minloglevel=2

set_envs() is a no-op: keys are read from the environment at LLM construction time, or supplied via the Model Settings panel in the UI. Never re-introduce a getpass prompt there; it will hang in Docker / HF Spaces.

Architecture

Data Flow

  1. Initialization: Scrapes 50+ AISdb documentation URLs → chunks documents → creates embeddings → stores in ChromaDB
  2. Chat Request: User input → session lookup in LFU cache → contextualize with chat history → retrieve from vector store (k = MAX_SIZE) → generate response via RAG chain → stream in 8-char chunks
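
The final streaming step can be sketched as a plain generator. The function name here is illustrative (not the repo's actual helper), and the 10 ms inter-chunk delay is omitted so the sketch stays synchronous:

```python
def stream_in_chunks(text: str, chunk_size: int = 8):
    """Yield successive fixed-size slices of a response string,
    mirroring the 8-char streaming described above."""
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

# An 18-char response yields slices of 8, 8, and 2 chars.
chunks = list(stream_in_chunks("Stormy says hello!"))
```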

Key Components

  • app.py: Main Gradio interface with ocean/maritime themed UI, async echo() that offloads respond() via asyncio.to_thread, streams in 8-char chunks (10 ms delay), example questions, model settings panel, and collapsible help section
  • configs/config.py: URLs to scrape, LLM settings, embedding model config, system prompt, MODEL_REGISTRY and PROVIDER_ENV_KEYS for the multi-provider switcher
  • llm_setup/llm_setup.py: Conversational RAG chain setup with LangChain; manages session-based chat history. create_llm() is a factory over Google Gemini / OpenAI / Anthropic / HuggingFace. The HuggingFace branch must pass provider="hf-inference" and task="conversational" to HuggingFaceEndpoint; otherwise huggingface_hub raises StopIteration, which surfaces as RuntimeError: generator raised StopIteration
  • services/scraper.py: Web scraping service that preserves per-document source URL metadata
  • stores/chroma.py: ChromaDB vector store with HuggingFace embeddings (BAAI/bge-base-en-v1.5), skips re-ingestion if already populated
  • processing/documents.py: Document loading with RecursiveCharacterTextSplitter using configurable chunk size/overlap and structure-aware separators
  • processing/texts.py: Text cleaning that preserves document structure (newlines, paragraphs) while removing control characters
  • caching/lfu.py: LFU cache for session-based chat histories (capacity: 50 sessions). Exposes get / put / delete; never replace llm_svc.store with a plain dict, because the rest of the code calls these methods.
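
The LFU cache contract described above (get / put / delete with capacity-bounded eviction of the least-frequently-used session) can be sketched roughly as follows. The method names match the description, but the internals are an assumption, not the repo's actual implementation:

```python
class LFUCache:
    """Minimal LFU cache sketch: evicts the least-frequently-used
    key when capacity is reached (ties broken by insertion order)."""

    def __init__(self, capacity: int = 50):
        self.capacity = capacity
        self.data = {}   # key -> value
        self.freq = {}   # key -> access count

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            # Evict the least-frequently-used key to make room.
            victim = min(self.freq, key=self.freq.get)
            self.delete(victim)
        self.data[key] = value
        self.freq[key] = self.freq.get(key, 0) + 1

    def delete(self, key):
        self.data.pop(key, None)
        self.freq.pop(key, None)
```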

Tech Stack

  • LLM: Pluggable; default is HuggingFace (meta-llama/Llama-3.1-8B-Instruct via hf-inference); also supports Google Gemini, OpenAI, Anthropic
  • Embeddings: HuggingFace BAAI/bge-base-en-v1.5 (CPU)
  • RAG Framework: LangChain
  • Vector Store: ChromaDB
  • UI: Gradio 5.x
  • Deployment: HuggingFace Spaces

Configuration Values (in configs/config.py)

  • Chunk size: 768 chars with 100 char overlap
  • Chunk separators: "\n\n", "\n", ". ", " " (space), "" (empty string); structure-aware fallback from paragraphs down to characters
  • Max retrieved documents: 100
  • LFU cache capacity: 50 sessions
  • ChromaDB deduplication: skips ingestion on restart if data exists
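
The chunk-size and overlap settings above can be exercised with a simplified splitter. This is a toy stand-in for LangChain's RecursiveCharacterTextSplitter: it applies only the fixed-window and overlap parameters and omits the separator hierarchy for brevity:

```python
def split_with_overlap(text: str, chunk_size: int = 768, overlap: int = 100):
    """Toy fixed-window splitter: successive windows of chunk_size chars,
    each sharing `overlap` chars with the previous window."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1500 chars with the configured 768/100 settings -> 3 chunks,
# each adjacent pair sharing a 100-char overlap.
chunks = split_with_overlap("x" * 1500)
```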