---
title: AgentCSQuery
emoji: π
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---
# AskCSQuery - RAG-Powered Academic Assistant

A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB that answers competence-standard and academic questions based on uploaded documents.

## Features

- **Document-based Q&A**: Ask questions and get answers grounded in your uploaded documents
- **Intelligent Retrieval**: Uses ChromaDB for efficient document storage and retrieval
- **Multiple LLM Support**: Primary support for Ollama (Mistral) with a HuggingFace fallback
- **Modern Web Interface**: Built with Gradio for easy interaction
- **Source Citation**: Shows which documents were used to generate answers
## Quick Start

### Prerequisites

- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as the `ACCESS_TOKEN` environment variable)
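You can sanity-check these prerequisites before installing. The snippet below is just an illustrative helper (only the `ACCESS_TOKEN` variable name comes from this README):

```python
import os
import sys

def check_prerequisites():
    """Report whether the Python version and the HF token look usable."""
    ok_python = sys.version_info >= (3, 8)
    has_token = bool(os.environ.get("ACCESS_TOKEN"))
    return ok_python, has_token

ok_python, has_token = check_prerequisites()
print(f"Python 3.8+: {ok_python}, ACCESS_TOKEN set: {has_token}")
```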
### Installation

1. **Clone or navigate to the project directory**
2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
3. **Set up environment variables**:
   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```
4. **Run the setup script**:
   ```bash
   python setup.py
   ```
   This will:
   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use
5. **Start the chatbot**:
   ```bash
   python app.py
   ```
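The setup step above likely boils down to two commands run in order. As a rough sketch (the actual `setup.py` is not shown in this README, so treat the command list as an assumption):

```python
import sys

def setup_commands(reset_db: bool = False):
    """Return the commands such a setup script would run, in order:
    install dependencies, then populate ChromaDB from data/."""
    populate = [sys.executable, "populate_db.py"]
    if reset_db:
        populate.append("--reset")
    return [
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        populate,
    ]

for cmd in setup_commands(reset_db=True):
    print(" ".join(cmd))
```

Each command could then be executed with `subprocess.run(cmd, check=True)`.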
### Manual Setup (Alternative)

If you prefer to set things up manually:

1. **Populate the database**:
   ```bash
   python populate_db.py --reset
   ```
2. **Start the application**:
   ```bash
   python app.py
   ```
## Project Structure

```
AskCSQuery/
├── app.py              # Main Gradio application
├── populate_db.py      # Database population script
├── setup.py            # Automated setup script
├── requirements.txt    # Python dependencies
├── config.json         # Configuration file (currently empty)
├── README.md           # This file
├── data/               # Document storage directory
│   ├── *.pdf           # PDF documents
│   ├── *.docx          # Word documents
│   ├── *.html          # HTML documents
│   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```
## Usage

1. **Start the application** and open the provided URL in your browser
2. **Ask questions** related to the content in your documents
3. **Review answers** with source citations
4. **Adjust parameters** using the sidebar controls:
   - **System message**: Customize the AI's behavior
   - **Max new tokens**: Control response length
   - **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
   - **Top-p**: Control diversity of responses

### Example Questions

- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"
## Configuration

### Adding New Documents

1. Place new documents in the `data/` directory
2. Run the population script:
   ```bash
   python populate_db.py
   ```

### Supported Document Formats

- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders
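One plausible way a population script routes these formats to LangChain loaders is a simple extension-to-loader dispatch. The loader class names below are real LangChain community loaders, but whether `populate_db.py` uses exactly this mapping is an assumption:

```python
from pathlib import Path

# Hypothetical mapping; the actual populate_db.py may choose differently.
LOADER_BY_EXTENSION = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".html": "UnstructuredHTMLLoader",
    ".txt": "TextLoader",
}

def loader_name_for(path: str) -> str:
    """Return the loader class name for a document, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    try:
        return LOADER_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported document format: {ext}")

print(loader_name_for("data/equality_act.pdf"))  # -> PyPDFLoader
```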
### Using Different Models

#### Ollama (Recommended)

1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
   ```bash
   ollama pull mistral
   ```

#### HuggingFace (Fallback)

The system automatically falls back to the HuggingFace API if Ollama is unavailable. Make sure to set your `ACCESS_TOKEN` environment variable.
## Troubleshooting

### Common Issues

1. **"No module named 'langchain'"**
   - Run: `pip install -r requirements.txt`
2. **"Ollama connection failed"**
   - Install Ollama and pull the Mistral model
   - Or rely on the HuggingFace fallback
3. **"No relevant documents found"**
   - Check that documents are in the `data/` directory
   - Run `python populate_db.py --reset` to rebuild the database
4. **"Access token error"**
   - Set your HuggingFace token, e.g. in PowerShell: `$env:ACCESS_TOKEN="your_token_here"`

### Database Reset

To completely reset the database:

```bash
python populate_db.py --reset
```
## Technical Details

- **Embeddings**: `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000-character chunks and 200-character overlap
- **LLM Integration**: Ollama (Mistral) as primary, HuggingFace API as fallback
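The chunk-size/overlap scheme above can be illustrated without LangChain. The real code presumably uses `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)`; this pure-Python version shows the same sliding-window idea (it ignores the recursive separator logic):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into chunk_size-char windows; each window re-reads the
    last `overlap` chars of the previous one so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2500)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [1000, 1000, 900]
```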
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## License

This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.

---

**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information against original sources.