---
title: AgentCSQuery
emoji: 🎓
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---

# 🎓 AskCSQuery - RAG-Powered Academic Assistant

A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB for answering competence-standard and academic questions based on uploaded documents.

## 📋 Features

- **Document-based Q&A**: Ask questions and get answers grounded in your uploaded documents
- **Intelligent Retrieval**: Uses ChromaDB for efficient document storage and similarity search
- **Multiple LLM Support**: Primary support for Ollama (Mistral) with a HuggingFace API fallback
- **Modern Web Interface**: Built with Gradio for easy interaction
- **Source Citation**: Shows which documents were used to generate each answer

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as the `ACCESS_TOKEN` environment variable)

### Installation

1. **Clone or navigate to the project directory**

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**:
   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```

4. **Run the setup script**:
   ```bash
   python setup.py
   ```
   This will:
   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use

5. **Start the chatbot**:
   ```bash
   python app.py
   ```

### Manual Setup (Alternative)

If you prefer manual setup:

1. **Populate the database**:
   ```bash
   python populate_db.py --reset
   ```

2. **Start the application**:
   ```bash
   python app.py
   ```

## 📁 Project Structure

```
AskCSQuery/
├── app.py              # Main Gradio application
├── populate_db.py      # Database population script
├── setup.py            # Automated setup script
├── requirements.txt    # Python dependencies
├── config.json         # Configuration file (currently empty)
├── README.md           # This file
├── data/               # Document storage directory
│   ├── *.pdf           # PDF documents
│   ├── *.docx          # Word documents
│   ├── *.html          # HTML documents
│   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```

## 💡 Usage

1. **Start the application** and open the provided URL in your browser
2. **Ask questions** related to the content of your documents
3. **Review answers** with source citations
4. **Adjust parameters** using the sidebar controls:
   - **System message**: Customize the AI's behavior
   - **Max new tokens**: Control response length
   - **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
   - **Top-p**: Control diversity of responses

### Example Questions

- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"

## 🔧 Configuration

### Adding New Documents

1. Place new documents in the `data/` directory
2. Run the population script:
   ```bash
   python populate_db.py
   ```

### Supported Document Formats

- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders

### Using Different Models

#### Ollama (Recommended)

1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
   ```bash
   ollama pull mistral
   ```

#### HuggingFace (Fallback)

The system automatically falls back to HuggingFace's API if Ollama is unavailable. Make sure your `ACCESS_TOKEN` environment variable is set.

## 🛠️ Troubleshooting

### Common Issues

1. **"No module named 'langchain'"**
   - Run: `pip install -r requirements.txt`

2. **"Ollama connection failed"**
   - Install Ollama and pull the Mistral model
   - Or rely on the HuggingFace fallback

3. **"No relevant documents found"**
   - Check that documents are in the `data/` directory
   - Run `python populate_db.py --reset` to rebuild the database

4. **"Access token error"**
   - Set your HuggingFace token, e.g. in PowerShell: `$env:ACCESS_TOKEN="your_token_here"`

### Database Reset

To completely reset the database:
```bash
python populate_db.py --reset
```

## 📝 Technical Details

- **Embeddings**: Uses `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000-character chunks and 200-character overlap
- **LLM Integration**: Primary Ollama (Mistral), fallback HuggingFace API

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.

---

**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information from original sources.
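The chunking strategy listed under Technical Details (1000-character chunks with 200-character overlap) can be sketched in plain Python. This is an illustrative stand-in, not the project's actual code: the real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers paragraph and sentence boundaries over hard character cuts.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where neighbours share `overlap` characters.

    Simplified stand-in for LangChain's RecursiveCharacterTextSplitter:
    it cuts at exact character offsets rather than natural boundaries.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap  # advance 800 chars per chunk with the defaults
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks

# A 2500-character document yields three chunks (1000, 1000, 900 chars);
# each chunk's last 200 characters repeat at the start of the next chunk,
# so no sentence near a cut point is lost from every chunk's context.
chunks = split_text("a" * 2500)
```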
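The Ollama-first, HuggingFace-fallback behaviour described above can be sketched as a simple try/except dispatch. The callables and names here are hypothetical placeholders for the real client calls in `app.py`, shown only to illustrate the control flow:

```python
def generate_answer(prompt, ollama_generate, hf_generate):
    """Try the local Ollama model first; fall back to the HuggingFace API.

    `ollama_generate` and `hf_generate` are hypothetical callables standing
    in for the real client calls; any exception from the Ollama path
    (e.g. a connection error when Ollama is not running) triggers fallback.
    Returns the answer text and which backend produced it.
    """
    try:
        return ollama_generate(prompt), "ollama"
    except Exception:
        return hf_generate(prompt), "huggingface"
```

With this shape, a machine without Ollama installed still answers queries, which is why the Troubleshooting section suggests relying on the fallback when "Ollama connection failed" appears.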