Spaces:

serverdaun
/

rag-w-binary-quant

Sleeping

rag-w-binary-quant / README.md

Add badges for continuous deployment and Hugging Face Spaces in README.md to enhance visibility and accessibility of project status.

fc565cd 4 months ago

preview code

raw

history blame contribute delete

5.74 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

metadata

title: Rag with Binary Quantization
emoji: 📜
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance

RAG with Binary Quantization

A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.

🚀 Features

Binary Quantization: Converts high-dimensional embeddings to binary vectors for memory efficiency
Milvus Vector Database: Uses Milvus for scalable vector storage and similarity search
Gradio Web Interface: User-friendly web UI for document upload and chat
BGE Embeddings: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
OpenAI Integration: Uses GPT-4.1 for intelligent question answering
Batch Processing: Efficient document processing with configurable batch sizes

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Documents     │───▶│  BGE Embeddings  │───▶│ Binary Vectors  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  Query Embedding │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                         │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Retrieved Docs │◀───│  Context Fusion  │◀───│  LLM Answer     │
└─────────────────┘    └──────────────────┘    └─────────────────┘

🛠️ Installation

Clone the repository:

git clone <repository-url>
cd rag-w-binary-quant

Install dependencies:
```
uv sync
```
Set up environment variables: Create a .env file with your OpenAI API key:
```
OPENAI_API_KEY=your_openai_api_key_here
```

🚀 Usage

Starting the Application

Run the Gradio web interface:

uv run app.py

The application will be available at http://localhost:7860

Using the Interface

Upload Documents:
- Go to the "Upload & Index" tab
- Upload your documents (supports multiple file formats)
- Click "Update Index" to process and index the documents
Chat with Documents:
- Switch to the "Chat" tab
- Ask questions about your uploaded documents
- Get intelligent answers based on the document content

🔧 Configuration

Key configuration parameters in src/config.py:

EMBEDDING_MODEL_NAME: BAAI/bge-large-en-v1.5
COLLECTION_NAME: "fast_rag"
MILVUS_DB_PATH: "milvus_binary_quantized.db"
MODEL_NAME: "gpt-4.1"
TEMPERATURE: 0.2

📊 Performance Benefits

Memory Efficiency: Binary vectors use 8x less memory than float32 embeddings
Fast Search: Hamming distance computation is highly optimized
Scalable: Milvus provides enterprise-grade vector database capabilities
Accurate: BGE embeddings provide high-quality semantic representations

🏛️ Project Structure

rag-w-binary-quant/
├── app.py                 # Gradio web interface
├── main.py               # Main application entry point
├── src/
│   ├── config.py         # Configuration settings
│   ├── data_loader.py    # Document loading utilities
│   ├── embedding_generator.py  # Binary embedding generation
│   ├── vector_store.py   # Milvus vector database operations
│   └── rag_pipeline.py   # RAG question answering pipeline
├── documents/            # Uploaded document storage
└── README.md

🔍 Technical Details

Binary Quantization Process

Float32 Embeddings: Generate embeddings using BGE model
Binary Conversion: Convert to binary using threshold (positive values → 1, negative → 0)
Packing: Pack binary vectors into bytes for efficient storage
Hamming Distance: Use Hamming distance for similarity search

Vector Search

Index Type: BIN_FLAT (exact search for binary vectors)
Metric: Hamming distance
Retrieval: Top-k most similar documents

🙏 Acknowledgments

BAAI for the BGE embedding model
Milvus for the vector database
Gradio for the web interface
OpenAI for the language model