---
title: Locus RAG Bot
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# Locus RAG Bot
A RAG-based question-answering bot designed to provide information about LOCUS 2026 and past events. It uses web-crawled data, Medium articles, and local embeddings to answer user queries accurately.
## Features
- **Data Ingestion**: Crawls `locus.com.np` and related Medium articles.
- **Data Cleaning**: Removes boilerplate (navbars, footers) for better RAG performance.
- **Local Embeddings**: Uses Sentence Transformers (`all-MiniLM-L6-v2`) for free, local embedding generation.
- **Mistral AI**: Uses Mistral AI's large model for high-quality answer generation.
- **Vector Store**: Powered by Pinecone for fast and scalable retrieval.
- **FastAPI**: Clean and efficient API for querying the bot.
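At query time, these pieces combine into the usual RAG flow: embed the question, retrieve the nearest stored chunks, and hand them to the LLM as context. A toy sketch of the retrieval step — hand-made 3-dimensional vectors stand in for the 384-dimensional MiniLM embeddings, and a brute-force scan stands in for Pinecone:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "index" of (embedding, chunk) pairs; real embeddings come from all-MiniLM-L6-v2
index = [
    ([1.0, 0.0, 0.1], "LOCUS is a national technological festival."),
    ([0.0, 1.0, 0.1], "Pulchowk Campus is in Lalitpur, Nepal."),
]

def retrieve(query_vec, k=1):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

print(retrieve([0.9, 0.1, 0.0]))  # → ['LOCUS is a national technological festival.']
```

The retrieved chunks are then injected into the LLM prompt; Pinecone performs this same nearest-neighbor search at scale.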
## Project Structure
```
locus-rag-bot/
├── app/                  # FastAPI backend + RAG logic
│   ├── engine.py         # Core RAG engine (LangChain + Pinecone + Mistral)
│   └── main.py           # FastAPI application
├── data/                 # Raw & cleaned markdown data
├── scripts/              # Utility scripts
│   ├── ingest.py         # Crawls website content
│   ├── clean_data.py     # Cleans markdown boilerplate
│   └── index_data.py     # Indexes data to Pinecone
├── .env.example          # Template for environment variables
├── Dockerfile            # For containerized deployment
└── requirements.txt      # Project dependencies
```
## Setup

### 1. Local Environment

```bash
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```
### 2. Configuration

Copy `.env.example` to `.env` and fill in your keys:

```bash
cp .env.example .env
```

Required keys:

- `MISTRAL_API_KEY`: Get from the Mistral AI Console.
- `PINECONE_API_KEY`: Get from the Pinecone Dashboard.
- `PINECONE_INDEX_NAME`: Your Pinecone index name (dimension: 384).
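A filled-in `.env` might look like the following — the values here are placeholders, and the index name is an assumption (it must match an existing 384-dimension Pinecone index):

```env
MISTRAL_API_KEY=your-mistral-api-key
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX_NAME=your-384-dim-index-name
```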
## Data Pipeline

### 1. Ingest Data

Crawls the website and saves content to `data/`.

```bash
python3 scripts/ingest.py
```
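One detail the ingest step has to settle is how a crawled URL maps to a file under `data/`. A minimal sketch of such a mapping — `url_to_filename` is a hypothetical helper, not necessarily what `scripts/ingest.py` does:

```python
import re
from pathlib import Path

def url_to_filename(url: str, out_dir: str = "data") -> Path:
    """Map a crawled URL to a markdown file path under data/ (illustrative only)."""
    # Strip the scheme and any trailing slash:
    # "https://locus.com.np/events/" -> "locus.com.np/events"
    slug = re.sub(r"^https?://", "", url).rstrip("/")
    # Replace path separators and unsafe characters with hyphens
    slug = re.sub(r"[^a-zA-Z0-9._-]+", "-", slug)
    return Path(out_dir) / f"{slug}.md"

print(url_to_filename("https://locus.com.np/events/"))  # → data/locus.com.np-events.md
```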
### 2. Clean Data

Removes boilerplate from the crawled markdown.

```bash
python3 scripts/clean_data.py
```
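One common way to do this kind of cleaning is to drop lines matching known boilerplate patterns. This is a minimal sketch under that assumption — the patterns and logic in `scripts/clean_data.py` may differ:

```python
import re

# Patterns for lines that are navigation/footer noise rather than content (illustrative)
BOILERPLATE = [
    re.compile(r"^\s*\[?(Home|About|Events|Contact)\]?(\(.*\))?\s*$", re.I),
    re.compile(r"copyright|all rights reserved", re.I),
]

def clean_markdown(text: str) -> str:
    """Remove boilerplate lines from crawled markdown."""
    kept = [
        line for line in text.splitlines()
        if not any(p.search(line) for p in BOILERPLATE)
    ]
    # Collapse runs of blank lines left behind by removed boilerplate
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()

raw = "Home\n\n# LOCUS 2026\n\nAn annual tech festival.\n\nCopyright 2026 LOCUS."
print(clean_markdown(raw))
```

Stripping this noise matters because navbars and footers repeat on every page; left in, they dominate retrieval and crowd out real content.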
### 3. Index Data

Generates embeddings and uploads them to Pinecone.

```bash
python3 scripts/index_data.py
```
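Before embedding, documents are typically split into overlapping chunks so each vector covers one retrievable unit. A minimal sketch of that splitting step — the window and overlap sizes are assumptions, not the project's actual settings:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1200-character document yields three windows: [0:500], [400:900], [800:1200]
chunks = chunk_text("x" * 1200, size=500, overlap=100)
print(len(chunks))  # → 3
```

Each chunk would then be embedded with `all-MiniLM-L6-v2` (producing 384-dimension vectors, matching the index) and upserted to Pinecone alongside its source text.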
## Running the API

### Locally

```bash
uvicorn app.main:app --host 0.0.0.0 --port 7860 --reload
```

The API docs will be available at http://localhost:7860/docs.
### Using Docker

```bash
# Build the image
docker build -t locus-rag-bot .

# Run the container
docker run -p 7860:7860 --env-file .env locus-rag-bot
```
## API Usage

### Health Check

```
GET /health
```

### Query the Bot

```
POST /query
```

```bash
curl -X POST "http://localhost:7860/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is LOCUS?"}'
```

Response:

```json
{
  "answer": "LOCUS is an annual national technological festival organized by students of Pulchowk Campus...",
  "context": ["..."]
}
```