---
title: AgentCSQuery
emoji: π
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.0.1
app_file: app.py
pinned: false
license: mit
short_description: RAG-powered chatbot for competence standards
---
# AskCSQuery - RAG-Powered Academic Assistant
A Retrieval-Augmented Generation (RAG) chatbot built with Gradio, LangChain, and ChromaDB that answers questions about competence standards and other academic topics based on your uploaded documents.
## Features
- Document-based Q&A: Ask questions and get answers based on your uploaded documents
- Intelligent Retrieval: Uses ChromaDB for efficient document storage and retrieval
- Multiple LLM Support: Primary support for Ollama (Mistral) with HuggingFace fallback
- Modern Web Interface: Built with Gradio for easy interaction
- Source Citation: Shows which documents were used to generate answers
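The retrieval-with-citation flow described above can be sketched in plain Python. This is a toy bag-of-words ranking for illustration only, not the project's actual ChromaDB-backed code; the document names and sample texts are made up:

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: dict, k: int = 2) -> list:
    """Return the names of the k documents most similar to the query;
    these names are what a RAG app surfaces as source citations."""
    q = Counter(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda name: cosine(q, Counter(docs[name].lower().split())),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical corpus standing in for the data/ folder contents
docs = {
    "equality_act.txt": "the equality act requires reasonable adjustments in education",
    "recipes.txt": "how to bake sourdough bread at home",
}
print(retrieve("what are reasonable adjustments in education", docs, k=1))
# → ['equality_act.txt']
```

A real deployment replaces the bag-of-words vectors with sentence-transformer embeddings and the linear scan with ChromaDB's similarity search, but the shape of the step is the same.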
## Quick Start

### Prerequisites
- Python 3.8+
- (Optional) Ollama with the Mistral model for better performance
- HuggingFace API token (set as the `ACCESS_TOKEN` environment variable)
### Installation

1. Clone or navigate to the project directory.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Set up environment variables:
   ```bash
   # Windows PowerShell
   $env:ACCESS_TOKEN="your_huggingface_token_here"

   # Linux/Mac
   export ACCESS_TOKEN="your_huggingface_token_here"
   ```
4. Run the setup script:
   ```bash
   python setup.py
   ```
   This will:
   - Install all required packages
   - Populate the ChromaDB database with documents from the `data/` folder
   - Prepare the system for use
5. Start the chatbot:
   ```bash
   python app.py
   ```
### Manual Setup (Alternative)

If you prefer manual setup:

1. Populate the database:
   ```bash
   python populate_db.py --reset
   ```
2. Start the application:
   ```bash
   python app.py
   ```
## Project Structure

```
AskCSQuery/
├── app.py              # Main Gradio application
├── populate_db.py      # Database population script
├── setup.py            # Automated setup script
├── requirements.txt    # Python dependencies
├── config.json         # Configuration file (currently empty)
├── README.md           # This file
├── data/               # Document storage directory
│   ├── *.pdf           # PDF documents
│   ├── *.docx          # Word documents
│   ├── *.html          # HTML documents
│   └── ...             # Other supported formats
└── chroma_db/          # ChromaDB storage (created automatically)
```
## Usage
- Start the application and open the provided URL in your browser
- Ask questions related to the content in your documents
- Review answers with source citations
- Adjust parameters using the sidebar controls:
  - **System message**: Customize the AI's behavior
  - **Max new tokens**: Control response length
  - **Temperature**: Adjust creativity (0.1 = focused, 1.0 = creative)
  - **Top-p**: Control diversity of responses
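Temperature and top-p are standard sampling controls rather than anything specific to this app. A minimal sketch of how they are typically applied to a token distribution (illustrative only; the toy logits below are made up):

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax.
    Lower temperature sharpens the distribution (more focused output)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize; sampling is restricted to this set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = apply_temperature([2.0, 1.0, 0.1], temperature=0.5)
print(top_p_filter(probs, p=0.9))
```

With low temperature the first token dominates, and top-p then drops the unlikely tail entirely, which is why low values of both settings give focused, repeatable answers.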
### Example Questions
- "What are reasonable adjustments for students with disabilities?"
- "What does the Equality Act say about education?"
- "Tell me about competence standards in higher education."
- "What are the implications of the University of Bristol case?"
## Configuration

### Adding New Documents
1. Place new documents in the `data/` directory
2. Run the population script:
   ```bash
   python populate_db.py
   ```
### Supported Document Formats
- PDF files (.pdf)
- Word documents (.docx)
- HTML files (.html)
- Text files (.txt)
- And more via LangChain's document loaders
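Format dispatch usually boils down to choosing a loader by file extension. A sketch of that idea, assuming a hypothetical extension-to-loader mapping (the real project relies on LangChain's document loaders, whose exact class names may differ):

```python
from pathlib import Path

# Assumed mapping for illustration; verify against the loaders the
# project actually imports before relying on these names.
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".docx": "Docx2txtLoader",
    ".html": "BSHTMLLoader",
    ".txt": "TextLoader",
}

def pick_loader(path: str) -> str:
    """Choose a loader name for a file based on its extension."""
    ext = Path(path).suffix.lower()
    if ext not in LOADERS:
        raise ValueError(f"Unsupported format: {ext}")
    return LOADERS[ext]

print(pick_loader("data/equality_act.pdf"))  # → PyPDFLoader
```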
### Using Different Models

#### Ollama (Recommended)
1. Install Ollama from [ollama.ai](https://ollama.ai)
2. Pull the Mistral model:
   ```bash
   ollama pull mistral
   ```
#### HuggingFace (Fallback)
The system automatically falls back to HuggingFace's API if Ollama is unavailable. Make sure your `ACCESS_TOKEN` environment variable is set.
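One common way to implement such a fallback is a quick health probe against Ollama's local HTTP endpoint. A minimal sketch (not the app's actual code; the default URL is Ollama's standard port, and the timeout is an assumption):

```python
import urllib.request
import urllib.error

def choose_backend(ollama_url: str = "http://127.0.0.1:11434",
                   timeout: float = 1.0) -> str:
    """Return 'ollama' if a local Ollama server responds, else 'huggingface'."""
    try:
        urllib.request.urlopen(ollama_url, timeout=timeout)
        return "ollama"
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: fall back to the hosted API.
        return "huggingface"

# Probing a port with no server behind it selects the fallback.
print(choose_backend("http://127.0.0.1:9", timeout=0.5))  # → huggingface
```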
## Troubleshooting

### Common Issues
**"No module named 'langchain'"**
- Run `pip install -r requirements.txt`

**"Ollama connection failed"**
- Install Ollama and pull the Mistral model
- Or rely on the HuggingFace fallback

**"No relevant documents found"**
- Check that documents are in the `data/` directory
- Run `python populate_db.py --reset` to rebuild the database

**"Access token error"**
- Set your HuggingFace token: `$env:ACCESS_TOKEN="your_token_here"` (PowerShell) or `export ACCESS_TOKEN="your_token_here"` (Linux/Mac)
### Database Reset

To completely reset the database:

```bash
python populate_db.py --reset
```
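Conceptually, a reset just removes the persisted vector store so the next population run starts clean. A sketch of that step (illustrative; `populate_db.py --reset` handles this itself, and the directory name matches the project layout):

```python
import shutil
from pathlib import Path

def reset_db(db_dir: str = "chroma_db") -> None:
    """Delete the ChromaDB directory, if present, so the next
    population run rebuilds the index from scratch."""
    path = Path(db_dir)
    if path.exists():
        shutil.rmtree(path)
```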
## Technical Details

- **Embeddings**: `sentence-transformers/all-mpnet-base-v2` for document embeddings
- **Vector Store**: ChromaDB for efficient similarity search
- **Text Splitting**: Recursive character splitter with 1000-character chunks and 200-character overlap
- **LLM Integration**: Ollama (Mistral) as primary, HuggingFace API as fallback
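The chunk-size and overlap numbers above can be pictured with a simplified sliding-window splitter. This is a sketch of the idea only; the project's `RecursiveCharacterTextSplitter` additionally tries to break at separators such as paragraph and sentence boundaries:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list:
    """Sliding-window split: consecutive chunks share `overlap`
    characters so context is not cut off at chunk boundaries."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

text = "".join(str(i % 10) for i in range(2000))
chunks = split_text(text)
print(len(chunks), [len(c) for c in chunks])  # → 3 [1000, 1000, 400]
```

The 200-character overlap is the standard trade-off: it duplicates some storage but keeps a sentence that straddles a boundary fully visible in at least one chunk.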
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
## License
This project is for academic/educational purposes. Please ensure you have appropriate licenses for any documents you upload.
**Note**: This chatbot is designed for academic research and educational purposes. Always verify important information from original sources.