---
title: AgentCSQuery
emoji: πŸŽ“
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---
# πŸŽ“ AskCSQuery - RAG-Powered Academic Assistant
A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB for answering competence standard and academic questions based on uploaded documents.
## πŸ“‹ Features
- **Document-based Q&A**: Ask questions and get answers based on your uploaded documents
- **Intelligent Retrieval**: Uses ChromaDB for efficient document storage and retrieval
- **Multiple LLM Support**: Primary support for Ollama (Mistral) with HuggingFace fallback
- **Modern Web Interface**: Built with Gradio for easy interaction
- **Source Citation**: Shows which documents were used to generate answers
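The features above describe a standard retrieve-then-generate loop. A minimal sketch of that flow (function and variable names here are illustrative, not the actual `app.py` API):

```python
# Minimal sketch of the RAG answer flow, assuming a retriever that
# returns (text, source) pairs. All names are illustrative only.

def build_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt."""
    context = "\n\n".join(f"[{source}]\n{text}" for text, source in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question, retriever, llm):
    """Retrieve relevant chunks, prompt the LLM, and collect source citations."""
    chunks = retriever(question)
    response = llm(build_prompt(question, chunks))
    sources = sorted({source for _, source in chunks})
    return response, sources
```

Passing the retriever and LLM in as callables keeps the sketch independent of whether Ollama or the HuggingFace API is serving the model.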
## πŸš€ Quick Start
### Prerequisites
- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as `ACCESS_TOKEN` environment variable)
### Installation
1. **Clone or navigate to the project directory**
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Set up environment variables**:
```bash
# Windows PowerShell
$env:ACCESS_TOKEN="your_huggingface_token_here"
# Linux/Mac
export ACCESS_TOKEN="your_huggingface_token_here"
```
4. **Run the setup script**:
```bash
python setup.py
```
This will:
- Install all required packages
- Populate the ChromaDB database with documents from the `data/` folder
- Prepare the system for use
5. **Start the chatbot**:
```bash
python app.py
```
### Manual Setup (Alternative)
If you prefer manual setup:
1. **Populate the database**:
```bash
python populate_db.py --reset
```
2. **Start the application**:
```bash
python app.py
```
## πŸ“ Project Structure
```
AskCSQuery/
β”œβ”€β”€ app.py              # Main Gradio application
β”œβ”€β”€ populate_db.py      # Database population script
β”œβ”€β”€ setup.py            # Automated setup script
β”œβ”€β”€ requirements.txt    # Python dependencies
β”œβ”€β”€ config.json         # Configuration file (currently empty)
β”œβ”€β”€ README.md           # This file
β”œβ”€β”€ data/               # Document storage directory
β”‚   β”œβ”€β”€ *.pdf           # PDF documents
β”‚   β”œβ”€β”€ *.docx          # Word documents
β”‚   β”œβ”€β”€ *.html          # HTML documents
β”‚   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```
## πŸ’‘ Usage
1. **Start the application** and open the provided URL in your browser
2. **Ask questions** related to the content in your documents
3. **Review answers** with source citations
4. **Adjust parameters** using the sidebar controls:
- **System message**: Customize the AI's behavior
- **Max new tokens**: Control response length
- **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
- **Top-p**: Control diversity of responses
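These controls typically map onto the generation parameters passed to the model. A hedged sketch of that mapping (the exact keyword names depend on the backend, and this helper is illustrative, not part of `app.py`):

```python
# Illustrative mapping of sidebar controls to generation kwargs.
# Exact parameter names vary between Ollama and the HuggingFace API.
def generation_params(max_new_tokens=512, temperature=0.7, top_p=0.95):
    """Clamp user-supplied values into safe ranges before calling the LLM."""
    return {
        "max_new_tokens": int(max_new_tokens),
        "temperature": min(max(temperature, 0.1), 1.0),  # 0.1 = focused, 1.0 = creative
        "top_p": min(max(top_p, 0.0), 1.0),
    }
```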
### Example Questions
- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"
## πŸ”§ Configuration
### Adding New Documents
1. Place new documents in the `data/` directory
2. Run the population script:
```bash
python populate_db.py
```
### Supported Document Formats
- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders
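Internally, a loader is usually chosen per file extension. A simplified sketch of that dispatch (the loader class names below are common LangChain community loaders, but `populate_db.py` may use a `DirectoryLoader` or different loaders entirely):

```python
from pathlib import Path

# Map file extensions to the LangChain loader that would typically
# handle them. The actual loaders used by populate_db.py may differ.
LOADER_BY_SUFFIX = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".html": "UnstructuredHTMLLoader",
    ".txt": "TextLoader",
}

def pick_loader(path):
    """Return the loader name for a document, or None if unsupported."""
    return LOADER_BY_SUFFIX.get(Path(path).suffix.lower())
```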
### Using Different Models
#### Ollama (Recommended)
1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
```bash
ollama pull mistral
```
#### HuggingFace (Fallback)
The system automatically falls back to HuggingFace's API if Ollama is unavailable. Make sure to set your `ACCESS_TOKEN` environment variable.
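The fallback behaviour can be sketched like this (a hypothetical helper for illustration; the real selection logic in `app.py` may differ):

```python
import os

def pick_backend(ollama_available, env=os.environ):
    """Choose the LLM backend: Ollama if reachable, else HuggingFace.

    Raises RuntimeError if Ollama is down and no ACCESS_TOKEN is set,
    since the HuggingFace fallback cannot work without a token.
    """
    if ollama_available:
        return "ollama"
    if not env.get("ACCESS_TOKEN"):
        raise RuntimeError("Ollama unreachable and ACCESS_TOKEN is not set")
    return "huggingface"
```

Failing fast with a clear message here is preferable to letting the HuggingFace API call fail later with an opaque authentication error.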
## πŸ› οΈ Troubleshooting
### Common Issues
1. **"No module named 'langchain'"**
- Run: `pip install -r requirements.txt`
2. **"Ollama connection failed"**
- Install Ollama and pull the mistral model
- Or rely on HuggingFace fallback
3. **"No relevant documents found"**
- Check if documents are in the `data/` directory
- Run `python populate_db.py --reset` to rebuild the database
4. **"Access token error"**
- Set your HuggingFace token: `$env:ACCESS_TOKEN="your_token_here"`
### Database Reset
To completely reset the database:
```bash
python populate_db.py --reset
```
## πŸ“ Technical Details
- **Embeddings**: Uses `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000 char chunks, 200 overlap
- **LLM Integration**: Primary Ollama (Mistral), fallback HuggingFace API
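The 1000-character / 200-overlap chunking can be illustrated with a plain sliding-window splitter. The project uses LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on separators; this stdlib-only sketch just shows the size/overlap arithmetic:

```python
def split_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size chunks where each chunk repeats the
    last `overlap` characters of the previous one. This mimics the
    recursive splitter's behaviour when no separator boundaries exist."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

The overlap means a sentence falling on a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.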
## 🀝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request
## πŸ“„ License
This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.
---
**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information from original sources.