Selennnn committed · Commit 07449b9 · verified · 1 Parent(s): 0413d37

Update README.md

Files changed (1)
  1. README.md +150 -6
README.md CHANGED
@@ -1,10 +1,154 @@
  ---
  license: apache-2.0
- base_model: Qwen/Qwen2.5-0.5B-Instruct
  tags:
- - text-retrieval
- - perplexity-style
- - qwen
  ---
- # Rank-Embed-Qwen-0.6B
- Fine-tuned for Perplexity-style search retrieval.
  ---
+ language:
+ - en
  license: apache-2.0
+ library_name: transformers
  tags:
+ - feature-extraction
+ - sentence-similarity
+ - search
+ - retrieval
+ - ranking
+ - embeddings
+ - semantic-search
+ - bi-encoder
+ - qwen
+ - pytorch
+ model_size: 0.6B
+ base_model: Qwen/Qwen2.5-0.5B-Instruct
+ pipeline_tag: feature-extraction
  ---
+
+ # Rank-Embed-0.6B
+
+ Rank-Embed-0.6B is a specialized **bi-encoder** model designed for semantic search and dense retrieval. Instead of relying only on keyword overlap, it maps queries and documents into a shared vector space so they can be compared based on meaning, context, and intent.
+
+ Built on top of [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), the model is optimized for retrieval-first workloads such as semantic search, ranking, retrieval-augmented generation, clustering, and duplicate detection. It is compact enough for efficient deployment while retaining the language understanding needed for more complex search tasks.
+
+ ## Model Summary
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | Bi-encoder / two-tower embedding model |
+ | Base model | `Qwen/Qwen2.5-0.5B-Instruct` |
+ | Parameters | ~0.6B |
+ | Backbone hidden size | 896 |
+ | Embedding dimension | 768 |
+ | Pooling | Mean pooling |
+ | Projection head | `nn.Linear(896, 768)` |
+ | Similarity | Cosine similarity over L2-normalized vectors |
+ | Framework | PyTorch / Transformers |
+ | License | Apache 2.0 |
+
+ ## Key Capabilities
+
+ - Dense embedding generation for queries, passages, and documents
+ - Semantic search based on meaning rather than exact keyword matching
+ - Efficient cosine-similarity retrieval with normalized embeddings
+ - Strong support for complex and intent-heavy search queries
+ - Practical deployment footprint for production retrieval systems
+
+ ## What This Model Is
+
+ Rank-Embed-0.6B is designed to transform text into dense numerical vectors, or embeddings, that capture semantic meaning. In a traditional keyword-based system, retrieval depends on exact lexical overlap. In contrast, this model enables systems to compare text based on intent, topic, and contextual similarity.
+
+ As a compact retrieval model built on Qwen2.5-0.5B-Instruct, it provides an efficient balance between inference speed and semantic quality. This makes it a strong fit for production search systems that need to serve high-quality results without requiring unnecessarily large infrastructure.
+
+ Unlike a generative chatbot, Rank-Embed-0.6B is purpose-built for retrieval. Its role is not to generate responses, but to identify, compare, and surface the most relevant pieces of information from a corpus.
+
+ ## How It Works
+
+ ### 1. Bi-Encoder Architecture
+
+ The model uses a two-tower, or bi-encoder, design:
+
+ - **Query tower**: processes the user's search query
+ - **Document tower**: processes candidate documents or passages
+ - **Shared objective**: maps both into the same high-dimensional space so relevant pairs are positioned close together
+
+ In practice, if a document meaningfully answers a query, their embeddings should be near one another in the 768-dimensional representation space.
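The geometric intuition above can be illustrated with plain tensors, independent of any model weights: once vectors are L2-normalized, the dot product equals cosine similarity, so "near one another" in the embedding space translates directly into a higher score. (The vectors below are random stand-ins, not real model outputs.)

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for one query embedding and two document embeddings.
torch.manual_seed(0)
query = torch.randn(1, 768)
docs = torch.randn(2, 768)

# L2-normalize so that dot product == cosine similarity.
query = F.normalize(query, p=2, dim=-1)
docs = F.normalize(docs, p=2, dim=-1)

scores = query @ docs.T            # shape (1, 2), values in [-1, 1]
best = scores.argmax(dim=-1)       # index of the closer document

# Sanity check: matches PyTorch's built-in cosine similarity.
reference = F.cosine_similarity(query, docs)
assert torch.allclose(scores.squeeze(0), reference, atol=1e-6)
```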
+
+ ### 2. Core Components
+
+ - **Backbone**: the model uses Qwen2.5-0.5B-Instruct as its language backbone, providing strong prior understanding of natural language and complex instruction-like phrasing.
+ - **Pooling layer**: because the backbone produces token-level representations, mean pooling is used to aggregate them into a single sentence-level embedding.
+ - **Projection head**: a linear projection layer, `nn.Linear(896, 768)`, reduces the backbone hidden size to a 768-dimensional embedding size suitable for vector search systems.
+ - **Normalization**: final embeddings are L2-normalized so similarity can be computed efficiently with cosine similarity.
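The pooling, projection, and normalization steps can be sketched as a small module. This is an illustrative reconstruction from the shapes in the table above; the actual repository code may organize and name these pieces differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHead(nn.Module):
    """Mean pooling -> linear projection (896 -> 768) -> L2 normalization.

    Sketch of the head described above, not the repository's exact code.
    """

    def __init__(self, hidden_size: int = 896, embed_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(hidden_size, embed_dim)

    def forward(self, last_hidden_state: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # Zero out padding positions, then average the remaining token vectors.
        mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
        pooled = (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        # Project down to the retrieval dimension and unit-normalize.
        return F.normalize(self.proj(pooled), p=2, dim=-1)

head = EmbeddingHead()
hidden = torch.randn(2, 16, 896)            # (batch, seq_len, hidden_size)
mask = torch.ones(2, 16, dtype=torch.long)  # no padding in this toy batch
emb = head(hidden, mask)                    # (2, 768), unit-norm rows
```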
+
+ ## What It Can Do
+
+ - **Semantic search**: retrieves relevant content even when the query and document use different wording.
+ - **Complex search**: handles nuanced, intent-rich queries where the best result depends on meaning rather than exact phrasing.
+ - **Retrieval-augmented generation**: serves as the retrieval layer in RAG systems by surfacing relevant context for downstream language models.
+ - **Clustering and organization**: groups documents, tickets, or records by semantic similarity.
+ - **Duplicate detection**: identifies differently worded inputs that express the same underlying meaning.
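As one concrete use, duplicate detection reduces to thresholding pairwise cosine similarity over the normalized embeddings (such as the output of the `embed` helper in Quick Start). The snippet below uses hand-built 3-dimensional toy vectors so it runs standalone, and the 0.9 threshold is an illustrative choice, not a tuned recommendation.

```python
import torch
import torch.nn.functional as F

def find_duplicates(embeddings: torch.Tensor, threshold: float = 0.9):
    """Return index pairs whose cosine similarity exceeds `threshold`.

    `embeddings` is assumed to be L2-normalized, so the matrix product
    below directly yields cosine similarities.
    """
    sims = embeddings @ embeddings.T
    n = embeddings.size(0)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sims[i, j] >= threshold
    ]

# Toy example: two near-identical vectors and one unrelated vector.
a = F.normalize(torch.tensor([[1.0, 0.0, 0.0]]), dim=-1)
b = F.normalize(torch.tensor([[0.99, 0.01, 0.0]]), dim=-1)
c = F.normalize(torch.tensor([[0.0, 1.0, 0.0]]), dim=-1)
vectors = torch.cat([a, b, c])

pairs = find_duplicates(vectors, threshold=0.9)  # -> [(0, 1)]
```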
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### Basic Usage
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_id = "GorankLabs/Rank-Embed-0.6B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModel.from_pretrained(
+     model_id,
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+ )
+ model.eval()
+
+ def mean_pool(last_hidden_state, attention_mask):
+     mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
+     return (last_hidden_state * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
+
+ def embed(texts):
+     encoded = tokenizer(
+         texts,
+         padding=True,
+         truncation=True,
+         return_tensors="pt",
+     )
+     with torch.no_grad():
+         outputs = model(**encoded)
+     embeddings = mean_pool(outputs.last_hidden_state, encoded["attention_mask"])
+     return torch.nn.functional.normalize(embeddings, p=2, dim=-1)
+
+ queries = ["How do I fix a leaky faucet?"]
+ documents = [
+     "Steps to repair a leaking kitchen faucet at home.",
+     "How to replace brake pads on a bicycle.",
+ ]
+
+ query_embeddings = embed(queries)
+ document_embeddings = embed(documents)
+
+ scores = query_embeddings @ document_embeddings.T
+ print(scores.tolist())
+ ```
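For corpora larger than a couple of documents, the score matrix from Basic Usage feeds directly into top-k selection with `torch.topk`. The snippet uses random normalized stand-ins in place of real `embed(...)` outputs so it runs without downloading the model; substitute your actual embeddings in practice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-ins for embed(queries) and embed(documents) from Basic Usage.
query_embeddings = F.normalize(torch.randn(1, 768), dim=-1)
document_embeddings = F.normalize(torch.randn(100, 768), dim=-1)

# Cosine scores for one query against 100 documents, then keep the best 5.
scores = query_embeddings @ document_embeddings.T   # (1, 100)
top_scores, top_indices = scores.topk(k=5, dim=-1)  # sorted best-first

ranked = top_indices[0].tolist()  # document indices, most relevant first
```

For large corpora, the same normalized vectors can be loaded into an approximate-nearest-neighbor index (e.g. a vector database) using inner-product search, since normalization makes inner product equivalent to cosine similarity.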
+
+ ## Architecture Notes
+
+ The model is designed around a retrieval-oriented embedding pipeline:
+
+ - token-level representations are produced by the Qwen backbone
+ - mean pooling converts them into a single sentence representation
+ - a learned projection maps the representation into a 768-dimensional embedding space
+ - L2 normalization makes the final vectors directly usable for cosine-similarity retrieval
+
+ This design keeps the model simple, efficient, and well aligned with modern vector database workflows.
+
+ ## License
+
+ This model is released under the **Apache License 2.0**.
+
+ The base model weights are derived from [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). Where applicable, use of this repository must also comply with the Qwen license terms.