---
title: AgentCSQuery
emoji: π
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---
# AskCSQuery - RAG-Powered Academic Assistant

A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB that answers competence-standard and academic questions based on uploaded documents.

## Features

- **Document-based Q&A**: Ask questions and get answers grounded in your uploaded documents
- **Intelligent Retrieval**: Uses ChromaDB for efficient document storage and retrieval
- **Multiple LLM Support**: Primary support for Ollama (Mistral) with a HuggingFace fallback
- **Modern Web Interface**: Built with Gradio for easy interaction
- **Source Citation**: Shows which documents were used to generate answers
## Quick Start

### Prerequisites

- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as the `ACCESS_TOKEN` environment variable)
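You can sanity-check these prerequisites before installing. The snippet below is just an illustrative helper (only the `ACCESS_TOKEN` variable name comes from this README):

```python
import os
import sys

def check_prerequisites():
    """Report whether the Python version and the HF token look usable."""
    ok_python = sys.version_info >= (3, 8)
    has_token = bool(os.environ.get("ACCESS_TOKEN"))
    return ok_python, has_token

ok_python, has_token = check_prerequisites()
print(f"Python 3.8+: {ok_python}, ACCESS_TOKEN set: {has_token}")
```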
### Installation

1. **Clone or navigate to the project directory**
2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
3. **Set up environment variables**:
   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```
4. **Run the setup script**:
   ```bash
   python setup.py
   ```
   This will:
   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use
5. **Start the chatbot**:
   ```bash
   python app.py
   ```
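The setup step above likely boils down to two commands run in order. As a rough sketch (the actual `setup.py` is not shown in this README, so treat the command list as an assumption):

```python
import sys

def setup_commands(reset_db: bool = False):
    """Return the commands such a setup script would run, in order:
    install dependencies, then populate ChromaDB from data/."""
    populate = [sys.executable, "populate_db.py"]
    if reset_db:
        populate.append("--reset")
    return [
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
        populate,
    ]

for cmd in setup_commands(reset_db=True):
    print(" ".join(cmd))
```

Each command could then be executed with `subprocess.run(cmd, check=True)`.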
### Manual Setup (Alternative)

If you prefer to set things up manually:

1. **Populate the database**:
   ```bash
   python populate_db.py --reset
   ```
2. **Start the application**:
   ```bash
   python app.py
   ```
## Project Structure

```
AskCSQuery/
├── app.py              # Main Gradio application
├── populate_db.py      # Database population script
├── setup.py            # Automated setup script
├── requirements.txt    # Python dependencies
├── config.json         # Configuration file (currently empty)
├── README.md           # This file
├── data/               # Document storage directory
│   ├── *.pdf           # PDF documents
│   ├── *.docx          # Word documents
│   ├── *.html          # HTML documents
│   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```
## Usage

1. **Start the application** and open the provided URL in your browser
2. **Ask questions** related to the content in your documents
3. **Review answers** with source citations
4. **Adjust parameters** using the sidebar controls:
   - **System message**: Customize the AI's behavior
   - **Max new tokens**: Control response length
   - **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
   - **Top-p**: Control diversity of responses

### Example Questions

- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"
## Configuration

### Adding New Documents

1. Place new documents in the `data/` directory
2. Run the population script:
   ```bash
   python populate_db.py
   ```

### Supported Document Formats

- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders
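One plausible way a population script routes these formats to LangChain loaders is a simple extension-to-loader dispatch. The loader class names below are real LangChain community loaders, but whether `populate_db.py` uses exactly this mapping is an assumption:

```python
from pathlib import Path

# Hypothetical mapping; the actual populate_db.py may choose differently.
LOADER_BY_EXTENSION = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".html": "UnstructuredHTMLLoader",
    ".txt": "TextLoader",
}

def loader_name_for(path: str) -> str:
    """Return the loader class name for a document, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    try:
        return LOADER_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"Unsupported document format: {ext}")

print(loader_name_for("data/equality_act.pdf"))  # -> PyPDFLoader
```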
### Using Different Models

#### Ollama (Recommended)

1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
   ```bash
   ollama pull mistral
   ```

#### HuggingFace (Fallback)

The system automatically falls back to the HuggingFace API if Ollama is unavailable. Make sure to set your `ACCESS_TOKEN` environment variable.
## Troubleshooting

### Common Issues

1. **"No module named 'langchain'"**
   - Run: `pip install -r requirements.txt`
2. **"Ollama connection failed"**
   - Install Ollama and pull the Mistral model
   - Or rely on the HuggingFace fallback
3. **"No relevant documents found"**
   - Check that documents are in the `data/` directory
   - Run `python populate_db.py --reset` to rebuild the database
4. **"Access token error"**
   - Set your HuggingFace token, e.g. in PowerShell: `$env:ACCESS_TOKEN="your_token_here"`

### Database Reset

To completely reset the database:

```bash
python populate_db.py --reset
```
## Technical Details

- **Embeddings**: `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000-character chunks and 200-character overlap
- **LLM Integration**: Ollama (Mistral) as primary, HuggingFace API as fallback
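The chunk-size/overlap scheme above can be illustrated without LangChain. The real code presumably uses `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)`; this pure-Python version shows the same sliding-window idea (it ignores the recursive separator logic):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into chunk_size-char windows; each window re-reads the
    last `overlap` chars of the previous one so context spans boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 2500)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [1000, 1000, 900]
```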
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## License

This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.

---

**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information against original sources.