---
title: AgentCSQuery
emoji: 🎓
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---

πŸŽ“ AskCSQuery - RAG-Powered Academic Assistant

A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB that answers academic and competence-standard questions based on your uploaded documents.
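Each query follows the standard RAG pattern: retrieve the most relevant chunks from ChromaDB, stuff them into the prompt, and pass the prompt to the LLM. A minimal sketch of the prompt-assembly step (the function and prompt wording here are illustrative, not the exact code in `app.py`):

```python
# Illustrative prompt assembly for a stuffing-style RAG chain.
# app.py builds this via LangChain; the exact template may differ.

def build_prompt(question: str, retrieved_chunks: list) -> str:
    """Join retrieved document chunks into a single grounded prompt."""
    context = "\n\n---\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Competence standards are academic requirements that must be met.",
    "The Equality Act 2010 applies to higher education providers.",
]
prompt = build_prompt("What is a competence standard?", chunks)
```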

πŸ“‹ Features

  • Document-based Q&A: Ask questions and get answers based on your uploaded documents
  • Intelligent Retrieval: Uses ChromaDB for efficient document storage and retrieval
  • Multiple LLM Support: Primary support for Ollama (Mistral) with HuggingFace fallback
  • Modern Web Interface: Built with Gradio for easy interaction
  • Source Citation: Shows which documents were used to generate answers

πŸš€ Quick Start

Prerequisites

  • Python 3.8+
  • (Optional) Ollama with Mistral model for better performance
  • HuggingFace API token (set as ACCESS_TOKEN environment variable)
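The token name matches this README's setup instructions; a small sketch of how the app is expected to read it (assumed here, not verified against `app.py`):

```python
import os

def get_hf_token():
    """Return the HuggingFace token, or None if the fallback is unavailable.

    ACCESS_TOKEN is the variable name used in this README's setup steps.
    """
    return os.environ.get("ACCESS_TOKEN")
```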

### Installation

1. Clone or navigate to the project directory.

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up environment variables:

   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```

4. Run the setup script:

   ```bash
   python setup.py
   ```

   This will:

   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use

5. Start the chatbot:

   ```bash
   python app.py
   ```

### Manual Setup (Alternative)

If you prefer manual setup:

1. Populate the database:

   ```bash
   python populate_db.py --reset
   ```

2. Start the application:

   ```bash
   python app.py
   ```

πŸ“ Project Structure

AskCSQuery/
β”œβ”€β”€ app.py                 # Main Gradio application
β”œβ”€β”€ populate_db.py         # Database population script
β”œβ”€β”€ setup.py              # Automated setup script
β”œβ”€β”€ requirements.txt       # Python dependencies
β”œβ”€β”€ config.json           # Configuration file (currently empty)
β”œβ”€β”€ README.md             # This file
β”œβ”€β”€ data/                 # Document storage directory
β”‚   β”œβ”€β”€ *.pdf            # PDF documents
β”‚   β”œβ”€β”€ *.docx           # Word documents
β”‚   β”œβ”€β”€ *.html           # HTML documents
β”‚   └── ...              # Other supported formats
└── chroma_db/           # ChromaDB storage (created automatically)

πŸ’‘ Usage

  1. Start the application and open the provided URL in your browser
  2. Ask questions related to the content in your documents
  3. Review answers with source citations
  4. Adjust parameters using the sidebar controls:
    • System message: Customize the AI's behavior
    • Max new tokens: Control response length
    • Temperature: Adjust creativity (0.1 = focused, 1.0 = creative)
    • Top-p: Control diversity of responses
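To make the Top-p control concrete, here is a small sketch of nucleus (top-p) filtering: keep the smallest set of highest-probability tokens whose cumulative probability reaches `p`, then renormalize. This illustrates the parameter's effect only; it is not the sampling code used by the underlying model servers.

```python
def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the smallest top-probability token set reaching cumulative mass p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    norm = sum(prob for _, prob in kept)  # renormalize the survivors
    return {token: prob / norm for token, prob in kept}

filtered = top_p_filter({"a": 0.5, "b": 0.3, "c": 0.2}, p=0.7)  # keeps "a" and "b"
```

A lower `p` keeps fewer candidate tokens, which is why responses become less diverse.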

### Example Questions

- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"

πŸ”§ Configuration

Adding New Documents

  1. Place new documents in the data/ directory
  2. Run the population script:
    python populate_db.py
    

### Supported Document Formats

- PDF files (`.pdf`)
- Word documents (`.docx`)
- HTML files (`.html`)
- Text files (`.txt`)
- And more via LangChain's document loaders
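A typical way to dispatch files to loaders is by extension. The mapping below is a hypothetical sketch (the loader names are common LangChain community loaders, but `populate_db.py` may use a `DirectoryLoader` or different classes):

```python
from pathlib import Path

# Hypothetical extension-to-loader mapping; names are illustrative.
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".html": "BSHTMLLoader",
    ".txt": "TextLoader",
}

def loader_for(path: str) -> str:
    """Pick a loader name for a document based on its file extension."""
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported document format: {ext}")
    return LOADERS[ext]
```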

### Using Different Models

#### Ollama (Recommended)

1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:

   ```bash
   ollama pull mistral
   ```

#### HuggingFace (Fallback)

The system automatically falls back to the HuggingFace API if Ollama is unavailable. Make sure the `ACCESS_TOKEN` environment variable is set.
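The fallback logic reduces to a simple primary/secondary pattern, sketched here with plain callables standing in for the Ollama and HuggingFace clients (the actual wiring in `app.py` may differ):

```python
def generate_with_fallback(prompt, primary, fallback):
    """Try the primary LLM; on any error, use the fallback instead."""
    try:
        return primary(prompt)
    except Exception:
        # e.g. Ollama not installed or its local server unreachable
        return fallback(prompt)
```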

πŸ› οΈ Troubleshooting

Common Issues

  1. "No module named 'langchain'"

    • Run: pip install -r requirements.txt
  2. "Ollama connection failed"

    • Install Ollama and pull the mistral model
    • Or rely on HuggingFace fallback
  3. "No relevant documents found"

    • Check if documents are in the data/ directory
    • Run python populate_db.py --reset to rebuild the database
  4. "Access token error"

    • Set your HuggingFace token: $env:ACCESS_TOKEN="your_token_here"

### Database Reset

To completely reset the database:

```bash
python populate_db.py --reset
```

πŸ“ Technical Details

  • Embeddings: Uses sentence-transformers/all-mpnet-base-v2 for document embeddings
  • Vector Store: ChromaDB for efficient similarity search
  • Text Splitting: Recursive character splitter with 1000 char chunks, 200 overlap
  • LLM Integration: Primary Ollama (Mistral), fallback HuggingFace API
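The chunking numbers above can be sketched with a plain fixed-size splitter; LangChain's `RecursiveCharacterTextSplitter` additionally prefers to break at natural boundaries (paragraphs, then sentences), so real chunk edges will differ:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list:
    """Split text into size-char chunks, each interior chunk sharing
    `overlap` chars with its predecessor."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap preserves context across chunk boundaries so a sentence split in two is still retrievable from at least one chunk.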

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

πŸ“„ License

This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.


**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information against the original sources.