Daniel Ferreira

Update README to remove introductory description

7ccfa59 13 days ago

metadata

title: CompifAI
emoji: 💻
colorFrom: gray
colorTo: green
sdk: docker
pinned: false
license: apache-2.0

Competence Standards RAG Chatbot

📋 Table of Contents

About the Project
Features
Installation
Configuration
Starting the Application
Database Population
Data Management
Testing
License

🎯 About the Project

This retrieval-augmented generation (RAG) chatbot answers questions about competence standards in higher education. Its responses are grounded in a curated knowledge base of documents.

For each query, the system retrieves the most relevant passages from the document corpus and uses them as context when generating a response.

✨ Features

Interactive Chat Interface: Built with Chainlit for an intuitive user experience
Vector Search: Powered by Milvus for efficient similarity search
Advanced Embeddings: Uses the Qwen3-Embedding-8B model served through Nebius AI
Document Processing: Supports multiple formats (PDF, DOCX, HTML)
Authentication: Integrated with Chainlit's authentication system
Persistent Storage: Database integration with PostgreSQL
Containerized Deployment: Docker Compose setup for easy deployment
Testing Suite: Comprehensive testing with RAGAS evaluation

🚀 Installation

Clone the repository

```
git clone https://github.com/daniel-was-taken/prod-rag-chat.git
cd prod-rag-chat
```

Create a virtual environment

```
python -m venv venv

# On Windows
.\venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
```

Install dependencies
```
pip install -r requirements.txt
```

⚙️ Configuration

Create environment file

```
cp .env.example .env  # If available, or create manually
```

Configure environment variables

Create a .env file in the root directory with the following variables:

```
# Nebius AI Configuration
NEBIUS_API_KEY=your_nebius_api_key_here
OPENAI_API_KEY=your_openai_api_key_here  # Fallback

# Milvus Configuration
MILVUS_URI=http://localhost:19530

# Chainlit Configuration
CHAINLIT_AUTH_SECRET=your_auth_secret_here

# Database Configuration (if using PostgreSQL)
DATABASE_URL=postgresql://user:password@localhost:5432/dbname
```
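
The variables above could be validated at startup so that a missing key fails early rather than at the first API call. The following is a minimal sketch, not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical names, and treating `NEBIUS_API_KEY`, `MILVUS_URI`, and `CHAINLIT_AUTH_SECRET` as required (with the other two optional) is an assumption based on the comments in the example file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the .env file."""
    nebius_api_key: str
    milvus_uri: str
    chainlit_auth_secret: str
    openai_api_key: Optional[str] = None  # fallback key, assumed optional
    database_url: Optional[str] = None    # only needed with PostgreSQL


# Assumption: these three are mandatory; the README does not say so explicitly.
REQUIRED = ("NEBIUS_API_KEY", "MILVUS_URI", "CHAINLIT_AUTH_SECRET")


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        nebius_api_key=env["NEBIUS_API_KEY"],
        milvus_uri=env["MILVUS_URI"],
        chainlit_auth_secret=env["CHAINLIT_AUTH_SECRET"],
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        database_url=env.get("DATABASE_URL") or None,
    )
```

Calling `load_settings()` with no argument reads from the process environment, so it works whether the values come from a .env loader or from the shell.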

🎬 Starting the Application

Method 1: Using Docker Compose (Recommended)

Start the infrastructure services
```
docker-compose up -d
```
This will start:

- Milvus vector database
- etcd (for Milvus coordination)
- MinIO (for Milvus storage)

Wait for services to be ready

Start the Chainlit application
```
chainlit run app.py -w
```
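
The "wait for services to be ready" step in Method 1 can be scripted instead of guessed. Below is a minimal sketch that polls a TCP port until it accepts connections; `wait_for_port` is a hypothetical helper, and the host and port are taken from the `MILVUS_URI` example value. An open port only shows that the container is listening, so Milvus may still need a few more seconds before it serves requests.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while the port is closed
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example (assumes the default MILVUS_URI of http://localhost:19530):
#   if not wait_for_port("localhost", 19530):
#       raise SystemExit("Timed out waiting for Milvus")
```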

Method 2: Manual Setup

Install and configure Milvus standalone: follow the Milvus installation guide
Start the application
```
chainlit run app.py -w
```

The application will be available at http://localhost:8000

📊 Database Population

The vector database is populated automatically on first startup. You can also manage the data manually:

Automatic Population

On startup, the application checks whether the Milvus collection exists and contains data. If it does not, the application runs the population script.
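
The startup check described here might look roughly like the sketch below. This is an illustration, not the application's actual code: `needs_population` is a hypothetical name, and the `client` argument is assumed to behave like `pymilvus.MilvusClient`, which provides `has_collection()` and `get_collection_stats()`. The collection name comes from the Database Configuration section.

```python
def needs_population(client, collection_name: str = "my_rag_collection") -> bool:
    """Return True if the collection is missing or holds no rows.

    `client` is assumed to expose has_collection() and get_collection_stats()
    in the way pymilvus.MilvusClient does.
    """
    if not client.has_collection(collection_name):
        return True
    stats = client.get_collection_stats(collection_name)
    # MilvusClient reports the entity count under the "row_count" key
    return int(stats.get("row_count", 0)) == 0


# Example (requires a running Milvus instance and the pymilvus package):
#   from pymilvus import MilvusClient
#   client = MilvusClient(uri="http://localhost:19530")
#   if needs_population(client):
#       ...  # run the population script
```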

Manual Population

To manually populate or repopulate the database:

```
python populate_db.py
```

Adding New Documents

Add documents to the data directory

```
# Place your documents in the data/ folder
cp your_new_document.pdf data/
cp your_new_document.docx data/
```

Supported file formats:
- PDF files (.pdf)
- Microsoft Word documents (.docx)
- HTML files (.html)
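
Before repopulating, it can help to confirm which files in data/ will actually be picked up. The sketch below filters by the three extensions listed above; `find_documents` is a hypothetical helper, and the recursive search into subfolders and the case-insensitive extension match are assumptions, since the README does not say how the population script walks the directory.

```python
from pathlib import Path

# Extensions taken from the supported formats list
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".html"}


def find_documents(data_dir="data"):
    """Return supported files under data_dir, sorted by path."""
    root = Path(data_dir)
    return sorted(
        path
        for path in root.rglob("*")  # assumption: subfolders are included
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


# Example:
#   for path in find_documents("data"):
#       print(path)
```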

Repopulate the database

```
# Delete existing collection
python delete_collection.py

# Repopulate with new documents
python populate_db.py
```

Database Configuration

The population script uses the following configuration:

Embedding Model: Qwen/Qwen3-Embedding-8B (4096 dimensions)
Chunk Size: 1500 characters maximum
Combine Threshold: 200 characters minimum
Batch Size: 5 documents per batch
Collection Name: my_rag_collection
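
To make the batch size concrete, the sketch below groups documents into batches of five before they would be embedded and inserted. The constants mirror the values listed above; the constant names and the `batched` helper are hypothetical and are not taken from populate_db.py.

```python
from itertools import islice

# Values copied from the configuration list; the names are illustrative
EMBEDDING_MODEL = "Qwen/Qwen3-Embedding-8B"
EMBEDDING_DIM = 4096
MAX_CHUNK_CHARS = 1500
COMBINE_UNDER_CHARS = 200
BATCH_SIZE = 5
COLLECTION_NAME = "my_rag_collection"


def batched(items, size=BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Example: 12 documents are processed as batches of 5, 5, and 2
#   for batch in batched(documents):
#       ...  # embed and insert this batch
```

Small batches keep each embedding request and Milvus insert modest in size, at the cost of more round trips.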

Document Processing Pipeline

Loading: Documents are loaded using UnstructuredLoader
Cleaning: Text is cleaned and normalized
Chunking: Documents are split into manageable chunks
Embedding: Chunks are converted to vector embeddings
Storage: Embeddings are stored in Milvus with metadata
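
The cleaning and chunking steps can be illustrated with a simplified, dependency-free sketch. It is not the project's implementation (which loads documents with UnstructuredLoader); `clean` and `chunk_sections` are hypothetical functions that apply the two limits from the configuration: a section is merged into the previous chunk while that chunk is shorter than 200 characters, and no chunk exceeds 1500 characters. Oversized sections are cut at fixed character offsets here, which a real pipeline would likely replace with sentence-aware splitting.

```python
import re


def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_sections(sections, max_chars=1500, combine_under=200):
    """Turn raw text sections into chunks of at most max_chars characters."""
    chunks = []
    for section in map(clean, sections):
        if not section:
            continue
        # Merge into the previous chunk if it is still short and the result fits
        if (
            chunks
            and len(chunks[-1]) < combine_under
            and len(chunks[-1]) + 1 + len(section) <= max_chars
        ):
            chunks[-1] = chunks[-1] + " " + section
            continue
        # Otherwise start new chunks, hard-splitting oversized sections
        for start in range(0, len(section), max_chars):
            chunks.append(section[start:start + max_chars])
    return chunks
```

Each resulting chunk would then be embedded and stored in Milvus together with its metadata.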

🗃️ Data Management

Deleting the Collection

```
python delete_collection.py
```

Updating Documents

To update the document corpus:

Add/remove documents in the data/ directory
Delete the existing collection: python delete_collection.py
Restart the application (it will automatically repopulate)

🧪 Testing

The project includes unit tests, a RAGAS evaluation, and scripts for testing individual components:

Running Unit Tests


```
# Run specific test files
python -m unittest tests/test_chainlit.py -v
```

RAGAS Evaluation

Evaluate the RAG system performance:

```
# Run RAGAS evaluation
python tests/test_ragas.py

# Or use the Jupyter notebook
jupyter notebook tests/test_ragas.ipynb
```

Manual Testing

Test individual components:

```
# Test vector search functionality
python tests/test_vector_search.py
```

📄 License

This project is licensed under the MIT license.