---
title: Rag with Binary Quantization
emoji: π
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
# RAG with Binary Quantization
A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval capabilities.
## Features

- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Leverages BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for intelligent question answering
- **Batch Processing**: Efficient document processing with configurable batch sizes
## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│    Documents    │────▶│  BGE Embeddings  │────▶│  Binary Vectors │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                          │
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│ Query Embedding  │────▶│  Milvus Search  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                          │
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│ Retrieved Docs  │◀────│  Context Fusion  │◀────│   LLM Answer    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
```
## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd rag-w-binary-quant
   ```

2. Install dependencies:

   ```bash
   uv sync
   ```

3. Set up environment variables: create a `.env` file with your OpenAI API key:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```
## Usage

### Starting the Application

Run the Gradio web interface:

```bash
uv run app.py
```

The application will be available at http://localhost:7860.
### Using the Interface

1. **Upload Documents**:
   - Go to the "Upload & Index" tab
   - Upload your documents (supports multiple file formats)
   - Click "Update Index" to process and index the documents

2. **Chat with Documents**:
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get intelligent answers based on the document content
## Configuration

Key configuration parameters in `src/config.py`:

- `EMBEDDING_MODEL_NAME`: `"BAAI/bge-large-en-v1.5"`
- `COLLECTION_NAME`: `"fast_rag"`
- `MILVUS_DB_PATH`: `"milvus_binary_quantized.db"`
- `MODEL_NAME`: `"gpt-4.1"`
- `TEMPERATURE`: `0.2`
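Based only on the values listed above, `src/config.py` presumably defines these as plain module-level constants (a sketch; the actual file may organize them differently):

```python
# Sketch of src/config.py, reconstructed from the parameter list above.
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # BGE embedding model
COLLECTION_NAME = "fast_rag"                     # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"    # local Milvus database file
MODEL_NAME = "gpt-4.1"                           # OpenAI chat model
TEMPERATURE = 0.2                                # low temperature for grounded answers
```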
## Performance Benefits

- **Memory Efficiency**: Binary vectors use 32x less memory than float32 embeddings (1 bit vs. 32 bits per dimension)
- **Fast Search**: Hamming distance reduces to an XOR plus a popcount, which is highly optimized on modern CPUs
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
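To sanity-check the memory figure: bge-large-en-v1.5 produces 1024-dimensional embeddings, so one float32 vector occupies 4096 bytes, while the bit-packed binary version occupies 128 bytes:

```python
import numpy as np

dim = 1024  # bge-large-en-v1.5 embedding dimension
float_vec = np.zeros(dim, dtype=np.float32)               # 4 bytes per dimension
packed_vec = np.packbits(np.zeros(dim, dtype=np.uint8))   # 1 bit per dimension

print(float_vec.nbytes, packed_vec.nbytes)    # 4096 128
print(float_vec.nbytes // packed_vec.nbytes)  # 32
```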
## Project Structure

```
rag-w-binary-quant/
├── app.py                     # Gradio web interface
├── main.py                    # Main application entry point
├── src/
│   ├── config.py              # Configuration settings
│   ├── data_loader.py         # Document loading utilities
│   ├── embedding_generator.py # Binary embedding generation
│   ├── vector_store.py        # Milvus vector database operations
│   └── rag_pipeline.py        # RAG question answering pipeline
├── documents/                 # Uploaded document storage
└── README.md
```
## Technical Details

### Binary Quantization Process

1. **Float32 Embeddings**: Generate embeddings with the BGE model
2. **Binary Conversion**: Threshold each dimension (positive values → 1, non-positive → 0)
3. **Packing**: Pack the binary vector into bytes (8 dimensions per byte) for compact storage
4. **Hamming Distance**: Compare packed vectors with Hamming distance during search
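The four steps above can be sketched in a few lines of NumPy (illustrative only; the helper names below are not taken from the project's `embedding_generator.py`):

```python
import numpy as np

def binarize(embedding: np.ndarray) -> np.ndarray:
    """Threshold at zero (positive -> 1, else 0), then pack 8 bits per byte."""
    return np.packbits((embedding > 0).astype(np.uint8))

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between packed binary vectors: XOR, then count set bits."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(0)
emb_a = rng.standard_normal(1024).astype(np.float32)  # stand-ins for BGE embeddings
emb_b = rng.standard_normal(1024).astype(np.float32)

bits_a, bits_b = binarize(emb_a), binarize(emb_b)
print(bits_a.nbytes)            # 128 bytes instead of 4096
print(hamming(bits_a, bits_a))  # 0 for identical vectors
print(hamming(bits_a, bits_b))  # roughly 512 for unrelated random vectors
```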
### Vector Search

- **Index Type**: BIN_FLAT (exact, brute-force search over binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents
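With BIN_FLAT the search is exact: every stored vector is compared against the query and the k smallest Hamming distances win. A brute-force NumPy stand-in for what the Milvus index computes (toy data, not the project's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)
# 100 documents as packed 1024-bit binary vectors (shape: 100 x 128 bytes)
corpus = np.packbits((rng.standard_normal((100, 1024)) > 0).astype(np.uint8), axis=1)
query = corpus[42].copy()  # reuse doc 42 as the query, so it must rank first

# Hamming distance to every document: XOR against the query, count set bits per row
dists = np.unpackbits(np.bitwise_xor(corpus, query), axis=1).sum(axis=1)
top_k = np.argsort(dists)[:5]  # indices of the 5 nearest documents

print(top_k[0], dists[42])  # 42 0
```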