---
title: Rag with Binary Quantization
emoji: π
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: RAG with Binary Quantization for enhanced performance
---
# RAG with Binary Quantization

A high-performance Retrieval-Augmented Generation (RAG) system that uses binary quantization for efficient vector storage and similarity search. This project implements a document Q&A system with optimized memory usage and fast retrieval.

## Features

- **Binary Quantization**: Converts high-dimensional embeddings to binary vectors for memory efficiency
- **Milvus Vector Database**: Uses Milvus for scalable vector storage and similarity search
- **Gradio Web Interface**: User-friendly web UI for document upload and chat
- **BGE Embeddings**: Uses BAAI/bge-large-en-v1.5 for high-quality text embeddings
- **OpenAI Integration**: Uses GPT-4.1 for question answering
- **Batch Processing**: Processes documents in batches with a configurable batch size
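
As an illustration of the batch processing feature, here is a minimal sketch of splitting text chunks into fixed-size batches before embedding them. The helper name `iter_batches` and the sample data are hypothetical and are not taken from this repository; the project's own batching code may look different.

```python
from typing import Iterator, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Five made-up chunks split into batches of two.
chunks = [f"chunk-{i}" for i in range(5)]
print([len(batch) for batch in iter_batches(chunks, batch_size=2)])  # [2, 2, 1]
```

A larger batch size usually means fewer calls to the embedding model at the cost of more memory per call.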

## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Documents     │───▶│  BGE Embeddings  │───▶│  Binary Vectors │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   User Query    │───▶│  Query Embedding │───▶│  Milvus Search  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                                        │
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Retrieved Docs │◀───│  Context Fusion  │◀───│   LLM Answer    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd rag-w-binary-quant
   ```
2. **Install dependencies**:
   ```bash
   uv sync
   ```
3. **Set up environment variables**:
   Create a `.env` file with your OpenAI API key:
   ```env
   OPENAI_API_KEY=your_openai_api_key_here
   ```
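
To check that the key is visible to Python before starting the app, a small sketch like the following can help. It reads only the process environment; how this project loads the `.env` file is not shown in this README, so the helper name `get_openai_api_key` and the error message are assumptions made for illustration.

```python
import os


def get_openai_api_key() -> str:
    """Return OPENAI_API_KEY from the environment or fail with a clear message."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Assumption: the key is either exported in the shell or loaded from .env
        # by the application before this function is called.
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it")
    return key
```

Failing early with an explicit message is usually easier to debug than an authentication error raised later during a chat request.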

## Usage

### Starting the Application

Run the Gradio web interface:

```bash
uv run app.py
```

The application will be available at `http://localhost:7860`.

### Using the Interface

1. **Upload Documents**:
   - Go to the "Upload & Index" tab
   - Upload your documents (supports multiple file formats)
   - Click "Update Index" to process and index the documents
2. **Chat with Documents**:
   - Switch to the "Chat" tab
   - Ask questions about your uploaded documents
   - Get answers grounded in the document content

## Configuration

Key configuration parameters in `src/config.py`:

- `EMBEDDING_MODEL_NAME`: BAAI/bge-large-en-v1.5
- `COLLECTION_NAME`: "fast_rag"
- `MILVUS_DB_PATH`: "milvus_binary_quantized.db"
- `MODEL_NAME`: "gpt-4.1"
- `TEMPERATURE`: 0.2
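
Written as Python constants, the parameters above might look like the sketch below. Only the five names and values come from this README; the layout of the real `src/config.py` and any other settings it holds are not shown here.

```python
# Hypothetical sketch of src/config.py; only these five values appear in the README.
EMBEDDING_MODEL_NAME = "BAAI/bge-large-en-v1.5"  # Embedding model identifier
COLLECTION_NAME = "fast_rag"                      # Milvus collection name
MILVUS_DB_PATH = "milvus_binary_quantized.db"     # Local Milvus database file
MODEL_NAME = "gpt-4.1"                            # OpenAI chat model
TEMPERATURE = 0.2                                 # Low value for more deterministic answers
```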

## Performance Benefits

- **Memory Efficiency**: Binary vectors store 1 bit per dimension instead of 32, so they use 32x less memory than float32 embeddings (128 bytes instead of 4,096 bytes for a 1024-dimensional BGE vector)
- **Fast Search**: Hamming distance reduces to XOR and bit-count operations, which are cheap to compute
- **Scalable**: Milvus provides enterprise-grade vector database capabilities
- **Accurate**: BGE embeddings provide high-quality semantic representations
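
The memory saving can be checked with quick arithmetic. The sketch below assumes the 1024-dimensional output of bge-large-en-v1.5 and counts only raw vector bytes, ignoring any index or metadata overhead in Milvus.

```python
DIM = 1024  # Output dimension of BAAI/bge-large-en-v1.5

float32_bytes = DIM * 4   # 4 bytes per float32 dimension
binary_bytes = DIM // 8   # 1 bit per dimension, 8 dimensions per byte
ratio = float32_bytes // binary_bytes

print(float32_bytes, binary_bytes, ratio)  # 4096 128 32
```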

## Project Structure

```
rag-w-binary-quant/
├── app.py                      # Gradio web interface
├── main.py                     # Main application entry point
├── src/
│   ├── config.py               # Configuration settings
│   ├── data_loader.py          # Document loading utilities
│   ├── embedding_generator.py  # Binary embedding generation
│   ├── vector_store.py         # Milvus vector database operations
│   └── rag_pipeline.py         # RAG question answering pipeline
├── documents/                  # Uploaded document storage
└── README.md
```

## Technical Details

### Binary Quantization Process

1. **Float32 Embeddings**: Generate embeddings using the BGE model
2. **Binary Conversion**: Convert to binary using a zero threshold (positive values → 1, negative → 0)
3. **Packing**: Pack binary vectors into bytes for efficient storage
4. **Hamming Distance**: Use Hamming distance for similarity search
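
The four steps above can be sketched with NumPy as follows. This is a minimal illustration, not the code in `src/embedding_generator.py`: the function names are hypothetical, random vectors stand in for real BGE embeddings, and a value of exactly zero is mapped to 0, which is an assumption because the steps above do not say how zero is handled.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Threshold at zero: positive values become 1, everything else 0."""
    return (embeddings > 0).astype(np.uint8)


def pack(bits: np.ndarray) -> np.ndarray:
    """Pack 8 bits into each byte along the last axis."""
    return np.packbits(bits, axis=-1)


def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Count the differing bits between two packed vectors."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


# Random float32 vectors stand in for real 1024-dimensional BGE embeddings.
rng = np.random.default_rng(0)
floats = rng.standard_normal((2, 1024)).astype(np.float32)

packed = pack(binarize(floats))
print(packed.shape, packed.dtype)  # (2, 128) uint8
print(hamming(packed[0], packed[1]))
```

Each 1024-dimensional vector ends up as 128 bytes, which is the form a binary vector field expects.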

### Vector Search

- **Index Type**: BIN_FLAT (exact search for binary vectors)
- **Metric**: Hamming distance
- **Retrieval**: Top-k most similar documents
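
Conceptually, an exact search over binary vectors compares the query against every stored vector and keeps the k smallest Hamming distances. The NumPy sketch below illustrates that idea on three one-byte vectors; it is not the Milvus implementation and does not use the Milvus API, and the function name `hamming_top_k` is hypothetical.

```python
import numpy as np


def hamming_top_k(query: np.ndarray, corpus: np.ndarray, k: int) -> list[tuple[int, int]]:
    """Return (row index, Hamming distance) for the k closest packed vectors."""
    xor = np.bitwise_xor(corpus, query)                 # Differing bits, shape (n, n_bytes)
    distances = np.unpackbits(xor, axis=1).sum(axis=1)  # Bit count per row
    order = np.argsort(distances, kind="stable")[:k]    # Smallest distances first
    return [(int(i), int(distances[i])) for i in order]


# Three packed one-byte vectors and a query that differs from row 0 by one bit.
corpus = np.array([[0b11110000], [0b11111111], [0b00001111]], dtype=np.uint8)
query = np.array([0b11110001], dtype=np.uint8)

print(hamming_top_k(query, corpus, k=2))  # [(0, 1), (1, 3)]
```

Because every stored vector is scanned, results are exact, and the cost grows linearly with the collection size.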

## Acknowledgments

- [BAAI](https://github.com/FlagOpen/FlagEmbedding) for the BGE embedding model
- [Milvus](https://milvus.io/) for the vector database
- [Gradio](https://gradio.app/) for the web interface
- [OpenAI](https://openai.com/) for the language model