metadata
title: Locus RAG Bot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false

Locus RAG Bot

A retrieval-augmented generation (RAG) question-answering bot for LOCUS 2026 and past events. It grounds its answers in content crawled from locus.com.np and related Medium articles, retrieved with locally generated embeddings.

Features

  • Data Ingestion: Crawls locus.com.np and related Medium articles.
  • Data Cleaning: Removes boilerplate (navbars, footers) for better RAG performance.
  • Local Embeddings: Uses Sentence Transformers (all-MiniLM-L6-v2) for free, local embedding generation.
  • Mistral AI: Uses Mistral AI's large model for high-quality answer generation.
  • Vector Store: Powered by Pinecone for fast and scalable retrieval.
  • FastAPI: Clean and efficient API for querying the bot.
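The features above fit together as a retrieve-then-generate loop. The sketch below illustrates that flow using only the Python standard library. It makes several simplifying assumptions: a toy word-count vector stands in for the all-MiniLM-L6-v2 embeddings, an in-memory list stands in for Pinecone, and the final prompt is only a plausible shape for what a generation model would receive. All function names are hypothetical, and this is not the code in app/engine.py.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question (stand-in for a vector store query)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


chunks = [
    "LOCUS is an annual national technological festival.",
    "The website footer lists social media links.",
    "LOCUS is organized by students of Pulchowk Campus.",
]
top = retrieve("Who organizes LOCUS?", chunks, k=2)
prompt = build_prompt("Who organizes LOCUS?", top)
print(prompt)
```

A real embedding model also matches paraphrases that share no words with the question, which a word-count vector cannot do.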

Project Structure

locus-rag-bot/
├── app/                    # FastAPI backend + RAG logic
│   ├── engine.py           # Core RAG engine (LangChain + Pinecone + Mistral)
│   └── main.py             # FastAPI application
├── data/                   # Raw & cleaned markdown data
├── scripts/                # Utility scripts
│   ├── ingest.py           # Crawls website content
│   ├── clean_data.py       # Cleans markdown boilerplate
│   └── index_data.py       # Indexes data to Pinecone
├── .env.example            # Template for environment variables
├── Dockerfile              # For containerized deployment
└── requirements.txt        # Project dependencies

Setup

1. Local Environment

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

2. Configuration

Copy .env.example to .env and fill in your keys:

cp .env.example .env

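Required keys: API credentials for the hosted services the bot calls (Pinecone and Mistral AI). The exact variable names are listed in .env.example.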
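Since .env.example is the template, one way to spot a key that was left blank is to compare the two files. The following is a minimal sketch of such a check. It is not part of the repository, and the helper names parse_env and missing_keys are hypothetical. It assumes simple KEY=VALUE lines with no multi-line values.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Keys named in the template that are absent or empty in the real file."""
    env = parse_env(env_text)
    return [key for key in parse_env(example_text) if not env.get(key)]


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    # Only run the check when both files are present in the working directory.
    if example.exists() and env.exists():
        print("Missing or empty:", missing_keys(example.read_text(), env.read_text()))
```

An empty list means every key in the template has a value in .env.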

Data Pipeline

1. Ingest Data

Crawls the website and saves content to data/.

python3 scripts/ingest.py
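To illustrate what crawling a single site involves, here is a standard-library sketch of one step: pulling the links out of a fetched page and keeping only those on the same host. It is a hypothetical illustration, not the logic of scripts/ingest.py, which may rely on a dedicated crawling library. The names LinkCollector and same_site_links are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(html: str, page_url: str) -> list[str]:
    """Absolute, de-duplicated http(s) links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    seen, links = set(), []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so each page is visited once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A crawler would call this on each fetched page and queue the returned URLs that it has not yet visited.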

2. Clean Data

Removes boilerplate from the crawled markdown.

python3 scripts/clean_data.py
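One common way to detect navbars and footers is to note that they repeat on nearly every page, while real content does not. The sketch below applies that heuristic. It is an assumption about how such cleaning could work, not a description of scripts/clean_data.py, and the function name and min_share threshold are hypothetical.

```python
import math
from collections import Counter


def strip_repeated_lines(pages: dict[str, str], min_share: float = 0.8) -> dict[str, str]:
    """Drop non-empty lines that occur on at least min_share of the pages."""
    counts = Counter()
    for text in pages.values():
        # Count each distinct line once per page.
        counts.update({line.strip() for line in text.splitlines() if line.strip()})
    # Require at least two pages so a single page is never emptied.
    threshold = max(2, math.ceil(min_share * len(pages)))
    boilerplate = {line for line, n in counts.items() if n >= threshold}
    return {
        name: "\n".join(
            line for line in text.splitlines() if line.strip() not in boilerplate
        ).strip()
        for name, text in pages.items()
    }


pages = {
    "a.md": "Home | About | Contact\nLOCUS 2026 details\nFooter links",
    "b.md": "Home | About | Contact\nPast events\nFooter links",
    "c.md": "Home | About | Contact\nSponsors\nFooter links",
}
cleaned = strip_repeated_lines(pages)
```

Removing these lines before indexing keeps repeated navigation text from crowding out real content in retrieval results.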

3. Index Data

Generates embeddings and uploads them to Pinecone.

python3 scripts/index_data.py
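Indexing usually means splitting each document into chunks, embedding every chunk, and uploading records made of an id, a vector, and metadata. The sketch below shows that shape with the embedding function passed in as a parameter, so it runs without any model installed. The chunk sizes, the id scheme, and the names chunk_text and build_records are assumptions rather than what scripts/index_data.py does. With all-MiniLM-L6-v2 the vectors would have 384 dimensions, so the Pinecone index would need to be created with that dimension.

```python
import hashlib
from typing import Callable


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    windows = [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
    return [w for w in windows if w.strip()]


def build_records(
    source: str, text: str, embed: Callable[[str], list[float]]
) -> list[dict]:
    """Turn one document into id/values/metadata records for a vector store upsert."""
    records = []
    for n, chunk in enumerate(chunk_text(text)):
        # Deterministic id, so re-indexing the same file overwrites old records.
        record_id = hashlib.sha1(f"{source}:{n}".encode()).hexdigest()[:16]
        records.append({
            "id": record_id,
            "values": embed(chunk),
            "metadata": {"source": source, "text": chunk},
        })
    return records
```

Storing the chunk text in the metadata lets the query side return the retrieved passages without a second lookup.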

Running the API

Locally

uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload

The API docs will be available at http://localhost:7860/docs.

Using Docker

# Build the image
docker build -t locus-rag-bot .

# Run the container
docker run -p 7860:7860 --env-file .env locus-rag-bot

API Usage

Health Check

GET /health

Query the Bot

POST /query

curl -X POST "http://localhost:7860/query" \
     -H "Content-Type: application/json" \
     -d '{"question": "What is LOCUS?"}'

Response:

{
  "answer": "LOCUS is an annual national technological festival organized by students of Pulchowk Campus...",
  "context": ["..."]
}
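The same request can be made from Python. The following is a minimal standard-library sketch mirroring the curl call above. The helper names build_query_request and ask are hypothetical, and the response is assumed to be the JSON object with answer and context fields shown in the example.

```python
import json
import urllib.request


def build_query_request(
    question: str, base_url: str = "http://localhost:7860"
) -> urllib.request.Request:
    """Build the POST /query request shown in the curl example."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:7860", timeout: float = 30.0) -> dict:
    """Send the question and return the decoded JSON response."""
    request = build_query_request(question, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, ask("What is LOCUS?")["answer"] would hold the answer text, and the "context" field would hold the retrieved passages.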