---
title: Rag with Binary Quantization
emoji: πŸ“œ
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
![CD to HF Space](https://github.com/serverdaun/rag-w-binary-quant/actions/workflows/cd-hf.yml/badge.svg)
[![View on Hugging Face Spaces](https://img.shields.io/badge/Hugging%20Face-Spaces-blue?logo=huggingface)](https://huggingface.co/spaces/serverdaun/rag-w-binary-quant)
# RAG with Binary Quantization
A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.
## πŸš€ Features
- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for intelligent question answering
- **Batch Processing**: Efficient document processing with configurable batch sizes
## πŸ—οΈ Architecture
```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Documents    │───▶│  BGE Embeddings  │───▶│  Binary Vectors │
└─────────────────┘    └──────────────────┘    └────────┬────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌────────▼────────┐
│   User Query    │───▶│ Query Embedding  │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └────────┬────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌────────▼────────┐
│   LLM Answer    │◀───│  Context Fusion  │◀───│ Retrieved Docs  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```
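In code, the query path above condenses to a few steps. Below is an illustrative sketch (not the project's actual `src/rag_pipeline.py`); `search_milvus` is a hypothetical stand-in for the vector-store call:
```python
# Illustrative query path: embed -> binarize -> Hamming search -> LLM answer.
# Assumes sentence-transformers, numpy, and the OpenAI SDK are installed;
# `search_milvus` is a hypothetical stand-in for the project's vector store.
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
llm = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str, search_milvus) -> str:
    vec = model.encode(question)                      # float32 query embedding
    packed = np.packbits((vec > 0).astype(np.uint8))  # binarize and pack bits
    docs = search_milvus(packed.tobytes(), top_k=5)   # Hamming-distance search
    context = "\n\n".join(docs)                       # fuse retrieved chunks
    resp = llm.chat.completions.create(
        model="gpt-4.1",
        temperature=0.2,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```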
## πŸ› οΈ Installation
1. **Clone the repository**:
```bash
git clone <repository-url>
cd rag-w-binary-quant
```
2. **Install dependencies** (requires [uv](https://github.com/astral-sh/uv)):
```bash
uv sync
```
3. **Set up environment variables**:
Create a `.env` file with your OpenAI API key (a quick check that it loads is shown below):
```env
OPENAI_API_KEY=your_openai_api_key_here
```
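If the app complains about a missing key, here is a quick sanity check (assuming the project reads `.env` via `python-dotenv`, which is common but not confirmed here):
```python
# Verify the .env file is picked up before launching the app
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is missing or empty"
print("OpenAI key loaded.")
```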
## πŸš€ Usage
### Starting the Application
Run the Gradio web interface:
```bash
uv run app.py
```
The application will be available at `http://localhost:7860`.
### Using the Interface
1. **Upload Documents**:
- Go to the "Upload & Index" tab
- Upload your documents (supports multiple file formats)
- Click "Update Index" to process and index the documents
2. **Chat with Documents**:
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get intelligent answers based on the document content
## πŸ”§ Configuration
Key configuration parameters in `src/config.py` (sketched below):
- `EMBEDDING_MODEL_NAME`: BAAI/bge-large-en-v1.5
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
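These map to plain module-level constants; an illustrative sketch of `src/config.py` (the actual file may define more):
```python
# src/config.py -- illustrative sketch of the parameters listed above
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # 1024-dim embeddings
COLLECTION_NAME = "fast_rag"                     # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"    # Milvus Lite local file
MODEL_NAME = "gpt-4.1"                           # OpenAI chat model
TEMPERATURE = 0.2                                # low temperature for grounded answers
```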
## πŸ“Š Performance Benefits
- **Memory Efficiency**: Binary vectors use 32x less memory than float32 embeddings (1 bit vs. 32 bits per dimension; see the arithmetic below)
- **Fast Search**: Hamming distance computation is highly optimized
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
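The memory figure is simple arithmetic for the 1024-dimensional BGE embeddings used here:
```python
# float32 vs. packed binary storage per 1024-dim vector
dim = 1024
float32_bytes = dim * 4   # 32 bits per dimension -> 4096 bytes
binary_bytes = dim // 8   # 1 bit per dimension   -> 128 bytes
print(float32_bytes // binary_bytes)  # 32
```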
## πŸ›οΈ Project Structure
```
rag-w-binary-quant/
β”œβ”€β”€ app.py # Gradio web interface
β”œβ”€β”€ main.py # Main application entry point
β”œβ”€β”€ src/
β”‚ β”œβ”€β”€ config.py # Configuration settings
β”‚ β”œβ”€β”€ data_loader.py # Document loading utilities
β”‚ β”œβ”€β”€ embedding_generator.py # Binary embedding generation
β”‚ β”œβ”€β”€ vector_store.py # Milvus vector database operations
β”‚ └── rag_pipeline.py # RAG question answering pipeline
β”œβ”€β”€ documents/ # Uploaded document storage
└── README.md
```
## πŸ” Technical Details
### Binary Quantization Process
1. **Float32 Embeddings**: Generate embeddings with the BGE model
2. **Binary Conversion**: Threshold the floats at zero (values > 0 → 1, otherwise 0)
3. **Packing**: Pack the binary vectors into bytes for efficient storage
4. **Hamming Distance**: Use Hamming distance for similarity search (see the sketch below)
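A minimal NumPy sketch of steps 2–4 (illustrative; the project's `src/embedding_generator.py` may differ in details):
```python
import numpy as np

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold float32 embeddings at zero and pack the bits into bytes.

    (n, 1024) float32 in -> (n, 128) uint8 out, i.e. 32x smaller.
    """
    bits = (embeddings > 0).astype(np.uint8)  # values > 0 -> 1, otherwise 0
    return np.packbits(bits, axis=-1)         # pack 8 bits into each byte

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between packed vectors: XOR, then count set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())
```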
### Vector Search
- **Index Type**: BIN_FLAT (exact search for binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents (a minimal sketch follows)
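A minimal pymilvus `MilvusClient` sketch of this setup (assumed usage, not the project's actual `src/vector_store.py`; field names are illustrative):
```python
import numpy as np
from pymilvus import DataType, MilvusClient

client = MilvusClient("milvus_binary_quantized.db")  # Milvus Lite local file

# Schema with a 1024-bit binary vector field
schema = MilvusClient.create_schema(auto_id=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.BINARY_VECTOR, dim=1024)
schema.add_field("text", DataType.VARCHAR, max_length=65535)

# BIN_FLAT performs exact (brute-force) search over binary vectors
index_params = client.prepare_index_params()
index_params.add_index(field_name="embedding", index_type="BIN_FLAT",
                       metric_type="HAMMING")
client.create_collection("fast_rag", schema=schema, index_params=index_params)

# Search with a packed 128-byte query; lower Hamming distance = more similar
query = np.packbits(np.random.rand(1024) > 0.5).tobytes()
hits = client.search(collection_name="fast_rag", data=[query], limit=5,
                     search_params={"metric_type": "HAMMING"},
                     output_fields=["text"])
```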
## πŸ™ Acknowledgments
- [BAAI](https://github.com/FlagOpen/FlagEmbedding) for the BGE embedding model
- [Milvus](https://milvus.io/) for the vector database
- [Gradio](https://gradio.app/) for the web interface
- [OpenAI](https://openai.com/) for the language model