---
title: AgentCSQuery
emoji: π
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---
# AskCSQuery - RAG-Powered Academic Assistant
A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB that answers questions about competence standards and other academic topics based on your uploaded documents.
## Features
- Document-based Q&A: Ask questions and get answers based on your uploaded documents
- Intelligent Retrieval: Uses ChromaDB for efficient document storage and retrieval
- Multiple LLM Support: Primary support for Ollama (Mistral) with HuggingFace fallback
- Modern Web Interface: Built with Gradio for easy interaction
- Source Citation: Shows which documents were used to generate answers
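The retrieval-with-citation flow described above can be sketched in plain Python. This is a toy bag-of-words ranking for illustration only, not the project's actual ChromaDB-backed code; the document names and sample texts are made up:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: dict, k: int = 2) -> list:
    """Return the names of the k documents most similar to the query;
    these names are what a RAG app surfaces as source citations."""
    q = Counter(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda name: cosine(q, Counter(docs[name].lower().split())),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical corpus standing in for the data/ folder contents
docs = {
    "equality_act.txt": "the equality act requires reasonable adjustments in education",
    "recipes.txt": "how to bake sourdough bread at home",
}
print(retrieve("what are reasonable adjustments in education", docs, k=1))
# → ['equality_act.txt']
```

A real deployment replaces the bag-of-words vectors with sentence-transformer embeddings and the linear scan with ChromaDB's similarity search, but the shape of the step is the same.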
## Quick Start

### Prerequisites
- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as the `ACCESS_TOKEN` environment variable)
### Installation

1. Clone or navigate to the project directory.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Set up environment variables:
   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```
4. Run the setup script:
   ```bash
   python setup.py
   ```
   This will:
   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use
5. Start the chatbot:
   ```bash
   python app.py
   ```
### Manual Setup (Alternative)

If you prefer manual setup:

1. Populate the database:
   ```bash
   python populate_db.py --reset
   ```
2. Start the application:
   ```bash
   python app.py
   ```
## Project Structure

```
AskCSQuery/
├── app.py              # Main Gradio application
├── populate_db.py      # Database population script
├── setup.py            # Automated setup script
├── requirements.txt    # Python dependencies
├── config.json         # Configuration file (currently empty)
├── README.md           # This file
├── data/               # Document storage directory
│   ├── *.pdf           # PDF documents
│   ├── *.docx          # Word documents
│   ├── *.html          # HTML documents
│   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```
## Usage
- Start the application and open the provided URL in your browser
- Ask questions related to the content in your documents
- Review answers with source citations
- Adjust parameters using the sidebar controls:
  - **System message**: Customize the AI's behavior
  - **Max new tokens**: Control response length
  - **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
  - **Top-p**: Control diversity of responses
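Temperature and top-p are standard sampling controls rather than anything specific to this app. A minimal sketch of how they are typically applied to a token distribution (illustrative only; the toy logits below are made up):

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax.
    Lower temperature sharpens the distribution (more focused output)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize; sampling is restricted to this set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = apply_temperature([2.0, 1.0, 0.1], temperature=0.5)
print(top_p_filter(probs, p=0.9))
```

With low temperature the first token dominates, and top-p then drops the unlikely tail entirely, which is why low values of both settings give focused, repeatable answers.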
### Example Questions
- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"
## Configuration

### Adding New Documents
1. Place new documents in the `data/` directory
2. Run the population script:
   ```bash
   python populate_db.py
   ```
### Supported Document Formats
- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders
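Format dispatch usually boils down to choosing a loader by file extension. A sketch of that idea, assuming a hypothetical extension-to-loader mapping (the real project relies on LangChain's document loaders, whose exact class names may differ):

```python
from pathlib import Path

# Assumed mapping for illustration; verify against the loaders the
# project actually imports before relying on these names.
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".html": "BSHTMLLoader",
    ".txt": "TextLoader",
}

def pick_loader(path: str) -> str:
    """Choose a loader name for a file based on its extension."""
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported format: {ext}")
    return LOADERS[ext]

print(pick_loader("data/equality_act.pdf"))  # → PyPDFLoader
```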
### Using Different Models

#### Ollama (Recommended)
1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
   ```bash
   ollama pull mistral
   ```
#### HuggingFace (Fallback)
The system automatically falls back to HuggingFace's API if Ollama is unavailable. Make sure your `ACCESS_TOKEN` environment variable is set.
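One common way to implement such a fallback is a quick health probe against Ollama's local HTTP endpoint. A minimal sketch (not the app's actual code; the default URL is Ollama's standard port, and the timeout is an assumption):

```python
import urllib.request
import urllib.error

def choose_backend(ollama_url: str = "http://127.0.0.1:11434",
                   timeout: float = 1.0) -> str:
    """Return 'ollama' if a local Ollama server responds, else 'huggingface'."""
    try:
        urllib.request.urlopen(ollama_url, timeout=timeout)
        return "ollama"
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: fall back to the hosted API.
        return "huggingface"

# Probing a port with no server behind it selects the fallback.
print(choose_backend("http://127.0.0.1:9", timeout=0.5))  # → huggingface
```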
## Troubleshooting

### Common Issues
**"No module named 'langchain'"**
- Run `pip install -r requirements.txt`

**"Ollama connection failed"**
- Install Ollama and pull the Mistral model
- Or rely on the HuggingFace fallback

**"No relevant documents found"**
- Check that documents are in the `data/` directory
- Run `python populate_db.py --reset` to rebuild the database

**"Access token error"**
- Set your HuggingFace token: `$env:ACCESS_TOKEN="your_token_here"` (PowerShell) or `export ACCESS_TOKEN="your_token_here"` (Linux/Mac)
### Database Reset

To completely reset the database:

```bash
python populate_db.py --reset
```
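Conceptually, a reset just removes the persisted vector store so the next population run starts clean. A sketch of that step (illustrative; `populate_db.py --reset` handles this itself, and the directory name matches the project layout):

```python
import shutil
from pathlib import Path

def reset_db(db_dir: str = "chroma_db") -> None:
    """Delete the ChromaDB directory, if present, so the next
    population run rebuilds the index from scratch."""
    path = Path(db_dir)
    if path.exists():
        shutil.rmtree(path)
```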
## Technical Details

- **Embeddings**: `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000-character chunks and 200-character overlap
- **LLM Integration**: Ollama (Mistral) as primary, HuggingFace API as fallback
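The chunk-size and overlap numbers above can be pictured with a simplified sliding-window splitter. This is a sketch of the idea only; the project's `RecursiveCharacterTextSplitter` additionally tries to break at separators such as paragraph and sentence boundaries:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Sliding-window split: consecutive chunks share `overlap`
    characters so context is not cut off at chunk boundaries."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

text = "".join(str(i % 10) for i in range(2000))
chunks = split_text(text)
print(len(chunks), [len(c) for c in chunks])  # → 3 [1000, 1000, 400]
```

The 200-character overlap is the standard trade-off: it duplicates some storage but keeps a sentence that straddles a boundary fully visible in at least one chunk.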
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
## License
This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.
**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information from original sources.