---
title: AgentCSQuery
emoji: 🎓
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---

# 🎓 AskCSQuery - RAG-Powered Academic Assistant

A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB for answering competence-standard and academic questions based on uploaded documents.

## 📋 Features

- **Document-based Q&A**: Ask questions and get answers grounded in your uploaded documents
- **Intelligent Retrieval**: Uses ChromaDB for efficient document storage and similarity search
- **Multiple LLM Support**: Primary support for Ollama (Mistral) with a HuggingFace API fallback
- **Modern Web Interface**: Built with Gradio for easy interaction
- **Source Citation**: Shows which documents were used to generate each answer

## 🚀 Quick Start

### Prerequisites

- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as the `ACCESS_TOKEN` environment variable)

### Installation

1. **Clone or navigate to the project directory**

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set up environment variables**:
   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```

4. **Run the setup script**:
   ```bash
   python setup.py
   ```
   This will:
   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use

5. **Start the chatbot**:
   ```bash
   python app.py
   ```

### Manual Setup (Alternative)

If you prefer manual setup:

1. **Populate the database**:
   ```bash
   python populate_db.py --reset
   ```

2. **Start the application**:
   ```bash
   python app.py
   ```

## 📁 Project Structure

```
AskCSQuery/
├── app.py              # Main Gradio application
├── populate_db.py      # Database population script
├── setup.py            # Automated setup script
├── requirements.txt    # Python dependencies
├── config.json         # Configuration file (currently empty)
├── README.md           # This file
├── data/               # Document storage directory
│   ├── *.pdf           # PDF documents
│   ├── *.docx          # Word documents
│   ├── *.html          # HTML documents
│   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```

## 💡 Usage

1. **Start the application** and open the provided URL in your browser
2. **Ask questions** related to the content of your documents
3. **Review answers** with source citations
4. **Adjust parameters** using the sidebar controls:
   - **System message**: Customize the AI's behavior
   - **Max new tokens**: Control response length
   - **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
   - **Top-p**: Control diversity of responses

### Example Questions

- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"

## 🔧 Configuration

### Adding New Documents

1. Place new documents in the `data/` directory
2. Run the population script:
   ```bash
   python populate_db.py
   ```

### Supported Document Formats

- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders

### Using Different Models

#### Ollama (Recommended)

1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
   ```bash
   ollama pull mistral
   ```

#### HuggingFace (Fallback)

The system automatically falls back to HuggingFace's API if Ollama is unavailable. Make sure your `ACCESS_TOKEN` environment variable is set.

## 🛠️ Troubleshooting

### Common Issues

1. **"No module named 'langchain'"**
   - Run: `pip install -r requirements.txt`

2. **"Ollama connection failed"**
   - Install Ollama and pull the Mistral model
   - Or rely on the HuggingFace fallback

3. **"No relevant documents found"**
   - Check that documents are in the `data/` directory
   - Run `python populate_db.py --reset` to rebuild the database

4. **"Access token error"**
   - Set your HuggingFace token, e.g. in PowerShell: `$env:ACCESS_TOKEN="your_token_here"`

### Database Reset

To completely reset the database:
```bash
python populate_db.py --reset
```

## 📝 Technical Details

- **Embeddings**: Uses `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000-character chunks and 200-character overlap
- **LLM Integration**: Primary Ollama (Mistral), fallback HuggingFace API

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.

---

**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information from original sources.
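The chunking strategy listed under Technical Details (1000-character chunks with 200-character overlap) can be sketched in plain Python. This is an illustrative stand-in, not the project's actual code: the real pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally prefers paragraph and sentence boundaries over hard character cuts.

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where neighbours share `overlap` characters.

    Simplified stand-in for LangChain's RecursiveCharacterTextSplitter:
    it cuts at exact character offsets rather than natural boundaries.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap  # advance 800 chars per chunk with the defaults
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the final chunk already reaches the end of the text
    return chunks

# A 2500-character document yields three chunks (1000, 1000, 900 chars);
# each chunk's last 200 characters repeat at the start of the next chunk,
# so no sentence near a cut point is lost from every chunk's context.
chunks = split_text("a" * 2500)
```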
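The Ollama-first, HuggingFace-fallback behaviour described above can be sketched as a simple try/except dispatch. The callables and names here are hypothetical placeholders for the real client calls in `app.py`, shown only to illustrate the control flow:

```python
def generate_answer(prompt, ollama_generate, hf_generate):
    """Try the local Ollama model first; fall back to the HuggingFace API.

    `ollama_generate` and `hf_generate` are hypothetical callables standing
    in for the real client calls; any exception from the Ollama path
    (e.g. a connection error when Ollama is not running) triggers fallback.
    Returns the answer text and which backend produced it.
    """
    try:
        return ollama_generate(prompt), "ollama"
    except Exception:
        return hf_generate(prompt), "huggingface"
```

With this shape, a machine without Ollama installed still answers queries, which is why the Troubleshooting section suggests relying on the fallback when "Ollama connection failed" appears.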