---
license: apache-2.0
language:
- en
tags:
- rag
- retrieval
- semantic-search
- faiss
- bm25
- cross-encoder
- sentence-transformers
- hybrid-search
- dense-retrieval
- ai
- search
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---
# ApexRetriever

A lightweight hybrid retrieval system for fast semantic search and retrieval-augmented generation (RAG) pipelines. It combines BM25 keyword search, dense embedding search, and cross-encoder reranking.

Built for:
- semantic search
- lightweight RAG
- AI assistants
- retrieval systems
- local document QA

---

# Architecture

## Stage ①: BM25 Sparse Retrieval
Keyword-based retrieval for fast lexical matching: documents are scored by the query terms they contain, with rare terms weighted more heavily and scores normalized for document length.

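To make the BM25 stage concrete, here is a minimal, self-contained sketch of Okapi BM25 scoring in pure Python. It is not taken from `pipeline.py`: the regex tokenizer, the helper names `tokenize` and `bm25_scores`, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # a term absent from this document contributes nothing
            # Non-negative IDF variant (the one Lucene uses): rarer terms weigh more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes for document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "Python was created by Guido van Rossum.",
    "Paris is the capital of France.",
]
scores = bm25_scores("Who created Python?", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
print(docs[best])  # Python was created by Guido van Rossum.
```

The `rank-bm25` package from the install list offers a ready-made version of this scoring, `BM25Okapi(tokenized_corpus).get_scores(tokenized_query)`, although its IDF formula differs slightly from the variant used here.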
## Stage ②: Dense Semantic Search
Documents and queries are embedded with:

- `BAAI/bge-small-en-v1.5`

The document embeddings are stored in a FAISS vector index, so matches are found by meaning rather than by exact wording.

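The dense stage ranks documents by vector similarity instead of shared words. The sketch below uses tiny hand-written 3-dimensional vectors as hypothetical stand-ins for real embeddings, so it runs without downloading a model (`BAAI/bge-small-en-v1.5` itself produces 384-dimensional embeddings). It L2-normalizes the vectors and ranks by inner product, which equals cosine similarity and gives the same ranking as an exact FAISS `IndexFlatIP` search over normalized vectors. Whether `pipeline.py` uses that index type is an assumption.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    # Scale to unit length so that inner product equals cosine similarity.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dense_search(query_vec: list[float], doc_vecs: list[list[float]], top_k: int) -> list[tuple[int, float]]:
    """Exact (brute-force) top-k search by inner product over normalized vectors."""
    q = l2_normalize(query_vec)
    scored = []
    for doc_id, vec in enumerate(doc_vecs):
        d = l2_normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    # Highest similarity first, as in FAISS search results.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for real embeddings.
doc_vecs = [
    [0.9, 0.1, 0.0],  # "Python was created by Guido van Rossum."
    [0.0, 0.2, 0.9],  # "Paris is the capital of France."
]
query_vec = [0.8, 0.3, 0.1]  # "Who created Python?"

print(dense_search(query_vec, doc_vecs, top_k=1))  # the Python document (id 0) ranks first
```

With the real libraries, the equivalent steps would roughly be `SentenceTransformer("BAAI/bge-small-en-v1.5").encode(texts, normalize_embeddings=True)` to embed, then `faiss.IndexFlatIP(384)` with `index.add(...)` and `index.search(query_embeddings, k)` to store and query the vectors.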
## Stage ③: CrossEncoder Reranking
The final candidates are reranked by a neural cross-encoder:

- `cross-encoder/ms-marco-MiniLM-L-6-v2`

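A cross-encoder reads the query and a candidate together and outputs one relevance score per pair, which is more accurate but slower than comparing precomputed embeddings, so it is applied only to the short candidate list. The sketch below shows the reranking step itself; `toy_pair_score` is a hypothetical word-overlap stand-in for the model so the example runs offline. With `sentence-transformers`, the real scores would come from `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict([(query, doc) for doc in candidates])`. Passing the scorer in as a function keeps the ranking logic independent of the model.

```python
import re
from typing import Callable


def toy_pair_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a cross-encoder: fraction of query words found in the doc.
    q = set(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", doc.lower()))
    return len(q & d) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    top_k: int,
) -> list[tuple[str, float]]:
    """Score each (query, candidate) pair jointly and keep the best top_k."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


candidates = [
    "Paris is the capital of France.",
    "Python was created by Guido van Rossum.",
]
for doc, score in rerank("Who created Python?", candidates, toy_pair_score, top_k=2):
    print(f"{score:.2f}  {doc}")  # Python document first (0.67), Paris second (0.00)
```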
---

# Features

- Hybrid retrieval (sparse BM25 + dense embeddings)
- Fast indexing
- Dense semantic search
- Neural reranking
- Lightweight deployment
- GPU acceleration
- FAISS support
- Easy integration

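The README does not state how the BM25 and dense candidate lists are merged before reranking. Reciprocal rank fusion (RRF) is one common way to do it and is shown here purely as an assumption, not as what `pipeline.py` does: each document earns `1 / (k + rank)` from every list it appears in, with `k = 60` as the conventional constant. Because RRF uses only ranks, BM25 scores and cosine similarities never need to be put on the same scale. The document ids below are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[tuple[int, float]]:
    """Merge several ranked lists of document ids into one fused ranking."""
    fused: dict[int, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top result of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate lists (document ids, best first).
bm25_ranking = [3, 1, 7]
dense_ranking = [1, 4, 3]

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, dense_ranking]):
    print(doc_id, round(score, 4))  # doc 1 comes first: both lists rank it highly
```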
---

# Repository Structure

```text
ApexRetriever/
│
├── bi_encoder/
├── reranker/
├── pipeline.py
└── README.md
```

---

# Installation

```bash
pip install -U \
  sentence-transformers \
  transformers \
  faiss-cpu \
  rank-bm25 \
  torch
```

---

# Quick Start

Run the following from the repository root (the directory containing `pipeline.py`):

```python
from pipeline import ApexRetriever

# Load the retriever from the repository root
retriever = ApexRetriever(model_dir=".")

# Example documents
docs = [
    "Python was created by Guido van Rossum.",
    "Paris is the capital of France.",
]

# Build the search indexes over the documents
retriever.index_documents(docs)

# Retrieve the most relevant documents for a query
results = retriever.retrieve(
    "Who created Python?",
    top_k=3
)

print(results)
```

---

# Use Cases

* RAG systems
* Semantic search
* AI chatbots
* Knowledge retrieval
* Local search engines
* Memory systems

---

# Performance

Recommended environment:

* CUDA-capable GPU (speeds up embedding and reranking)
* 8 GB+ RAM
* Python 3.10+

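As a rough guide to how memory grows with corpus size, the arithmetic below estimates the footprint of a flat (uncompressed) FAISS index of float32 vectors. It assumes 384-dimensional embeddings, the output size of `BAAI/bge-small-en-v1.5`, and counts the raw vectors only; the models, the BM25 index, and the document texts need memory on top of this. At one million documents the vectors alone take 1,536,000,000 bytes (about 1.5 GB).

```python
BYTES_PER_FLOAT32 = 4
EMBEDDING_DIM = 384  # output dimension of BAAI/bge-small-en-v1.5


def flat_index_bytes(num_docs: int, dim: int = EMBEDDING_DIM) -> int:
    """Raw storage for num_docs float32 vectors in a flat (exact) index."""
    return num_docs * dim * BYTES_PER_FLOAT32


for n in (10_000, 1_000_000):
    print(f"{n:>9,} docs -> {flat_index_bytes(n) / 1024**2:,.1f} MiB")
```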
---

# Components

| Component     | Implementation                       |
| ------------- | ------------------------------------ |
| Dense Encoder | BAAI/bge-small-en-v1.5               |
| Reranker      | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Vector Engine | FAISS                                |
| Sparse Search | BM25                                 |

---

# License

Apache 2.0

---
> QuantaSparkLabs