Selennnn committed · Commit 07449b9 · verified · 1 Parent(s): 0413d37

Update README.md

Files changed (1)
  1. README.md +150 -6
README.md CHANGED
@@ -1,10 +1,154 @@
  ---
  license: apache-2.0
- base_model: Qwen/Qwen2.5-0.5B-Instruct
  tags:
- - text-retrieval
- - perplexity-style
- - qwen
  ---
- # Rank-Embed-Qwen-0.6B
- Fine-tuned for Perplexity-style search retrieval.
  ---
+ language:
+ - en
  license: apache-2.0
+ library_name: transformers
  tags:
+ - feature-extraction
+ - sentence-similarity
+ - search
+ - retrieval
+ - ranking
+ - embeddings
+ - semantic-search
+ - bi-encoder
+ - qwen
+ - pytorch
+ model_size: 0.6B
+ base_model: Qwen/Qwen2.5-0.5B-Instruct
+ pipeline_tag: feature-extraction
  ---
+
+ # Rank-Embed-0.6B
+
+ Rank-Embed-0.6B is a specialized **bi-encoder** model designed for semantic search and dense retrieval. Instead of relying only on keyword overlap, it maps queries and documents into a shared vector space so they can be compared based on meaning, context, and intent.
+
+ Built on top of [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct), the model is optimized for retrieval-first workloads such as semantic search, ranking, retrieval-augmented generation, clustering, and duplicate detection. It is compact enough for efficient deployment while retaining the language understanding needed for more complex search tasks.
+
+ ## Model Summary
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | Bi-encoder / two-tower embedding model |
+ | Base model | `Qwen/Qwen2.5-0.5B-Instruct` |
+ | Parameters | ~0.6B |
+ | Backbone hidden size | 896 |
+ | Embedding dimension | 768 |
+ | Pooling | Mean pooling |
+ | Projection head | `nn.Linear(896, 768)` |
+ | Similarity | Cosine similarity over L2-normalized vectors |
+ | Framework | PyTorch / Transformers |
+ | License | Apache 2.0 |
+
+ ## Key Capabilities
+
+ - Dense embedding generation for queries, passages, and documents
+ - Semantic search based on meaning rather than exact keyword matching
+ - Efficient cosine-similarity retrieval with normalized embeddings
+ - Strong support for complex and intent-heavy search queries
+ - Practical deployment footprint for production retrieval systems
+
+ ## What This Model Is
+
+ Rank-Embed-0.6B is designed to transform text into dense numerical vectors, or embeddings, that capture semantic meaning. In a traditional keyword-based system, retrieval depends on exact lexical overlap. In contrast, this model enables systems to compare text based on intent, topic, and contextual similarity.
+
+ As a compact retrieval model built on Qwen2.5-0.5B-Instruct, it provides an efficient balance between inference speed and semantic quality. This makes it a strong fit for production search systems that need to serve high-quality results without requiring unnecessarily large infrastructure.
+
+ Unlike a generative chatbot, Rank-Embed-0.6B is purpose-built for retrieval. Its role is not to generate responses, but to identify, compare, and surface the most relevant pieces of information from a corpus.
+
+ ## How It Works
+
+ ### 1. Bi-Encoder Architecture
+
+ The model uses a two-tower, or bi-encoder, design:
+
+ - **Query tower**: processes the user's search query
+ - **Document tower**: processes candidate documents or passages
+ - **Shared objective**: maps both into the same high-dimensional space so relevant pairs are positioned close together
+
+ In practice, if a document meaningfully answers a query, their embeddings should be near one another in the 768-dimensional representation space.
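The geometric intuition above can be illustrated with plain tensors, independent of any model weights: once vectors are L2-normalized, the dot product equals cosine similarity, so "near one another" in the embedding space translates directly into a higher score. (The vectors below are random stand-ins, not real model outputs.)

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for one query embedding and two document embeddings.
torch.manual_seed(0)
query = torch.randn(1, 768)
docs = torch.randn(2, 768)

# L2-normalize so that dot product == cosine similarity.
query = F.normalize(query, p=2, dim=-1)
docs = F.normalize(docs, p=2, dim=-1)

scores = query @ docs.T            # shape (1, 2), values in [-1, 1]
best = scores.argmax(dim=-1)       # index of the closer document

# Sanity check: matches PyTorch's built-in cosine similarity.
reference = F.cosine_similarity(query, docs)
assert torch.allclose(scores.squeeze(0), reference, atol=1e-6)
```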
+
+ ### 2. Core Components
+
+ - **Backbone**: the model uses Qwen2.5-0.5B-Instruct as its language backbone, providing strong prior understanding of natural language and complex instruction-like phrasing.
+ - **Pooling layer**: because the backbone produces token-level representations, mean pooling is used to aggregate them into a single sentence-level embedding.
+ - **Projection head**: a linear projection layer, `nn.Linear(896, 768)`, reduces the backbone hidden size to a 768-dimensional embedding size suitable for vector search systems.
+ - **Normalization**: final embeddings are L2-normalized so similarity can be computed efficiently with cosine similarity.
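The pooling, projection, and normalization steps can be sketched as a small module. This is an illustrative reconstruction from the shapes in the table above; the actual repository code may organize and name these pieces differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingHead(nn.Module):
    """Mean pooling -> linear projection (896 -> 768) -> L2 normalization.

    Sketch of the head described above, not the repository's exact code.
    """

    def __init__(self, hidden_size: int = 896, embed_dim: int = 768):
        super().__init__()
        self.proj = nn.Linear(hidden_size, embed_dim)

    def forward(self, last_hidden_state: torch.Tensor,
                attention_mask: torch.Tensor) -> torch.Tensor:
        # Zero out padding positions, then average the remaining token vectors.
        mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
        pooled = (last_hidden_state * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        # Project down to the retrieval dimension and unit-normalize.
        return F.normalize(self.proj(pooled), p=2, dim=-1)

head = EmbeddingHead()
hidden = torch.randn(2, 16, 896)            # (batch, seq_len, hidden_size)
mask = torch.ones(2, 16, dtype=torch.long)  # no padding in this toy batch
emb = head(hidden, mask)                    # (2, 768), unit-norm rows
```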
+
+ ## What It Can Do
+
+ - **Semantic search**: retrieves relevant content even when the query and document use different wording.
+ - **Complex search**: handles nuanced, intent-rich queries where the best result depends on meaning rather than exact phrasing.
+ - **Retrieval-augmented generation**: serves as the retrieval layer in RAG systems by surfacing relevant context for downstream language models.
+ - **Clustering and organization**: groups documents, tickets, or records by semantic similarity.
+ - **Duplicate detection**: identifies differently worded inputs that express the same underlying meaning.
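As one concrete use, duplicate detection reduces to thresholding pairwise cosine similarity over the normalized embeddings (such as the output of the `embed` helper in Quick Start). The snippet below uses hand-built 3-dimensional toy vectors so it runs standalone, and the 0.9 threshold is an illustrative choice, not a tuned recommendation.

```python
import torch
import torch.nn.functional as F

def find_duplicates(embeddings: torch.Tensor, threshold: float = 0.9):
    """Return index pairs whose cosine similarity exceeds `threshold`.

    `embeddings` is assumed to be L2-normalized, so the matrix product
    below directly yields cosine similarities.
    """
    sims = embeddings @ embeddings.T
    n = embeddings.size(0)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if sims[i, j] >= threshold
    ]

# Toy example: two near-identical vectors and one unrelated vector.
a = F.normalize(torch.tensor([[1.0, 0.0, 0.0]]), dim=-1)
b = F.normalize(torch.tensor([[0.99, 0.01, 0.0]]), dim=-1)
c = F.normalize(torch.tensor([[0.0, 1.0, 0.0]]), dim=-1)
vectors = torch.cat([a, b, c])

pairs = find_duplicates(vectors, threshold=0.9)  # -> [(0, 1)]
```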
+
+ ## Quick Start
+
+ ### Installation
+
+ ```bash
+ pip install transformers torch
+ ```
+
+ ### Basic Usage
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_id = "GorankLabs/Rank-Embed-0.6B"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModel.from_pretrained(
+     model_id,
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+ )
+ model.eval()
+
+ def mean_pool(last_hidden_state, attention_mask):
+     mask = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
+     return (last_hidden_state * mask).sum(1) / torch.clamp(mask.sum(1), min=1e-9)
+
+ def embed(texts):
+     encoded = tokenizer(
+         texts,
+         padding=True,
+         truncation=True,
+         return_tensors="pt",
+     )
+     with torch.no_grad():
+         outputs = model(**encoded)
+     embeddings = mean_pool(outputs.last_hidden_state, encoded["attention_mask"])
+     return torch.nn.functional.normalize(embeddings, p=2, dim=-1)
+
+ queries = ["How do I fix a leaky faucet?"]
+ documents = [
+     "Steps to repair a leaking kitchen faucet at home.",
+     "How to replace brake pads on a bicycle.",
+ ]
+
+ query_embeddings = embed(queries)
+ document_embeddings = embed(documents)
+
+ scores = query_embeddings @ document_embeddings.T
+ print(scores.tolist())
+ ```
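For corpora larger than a couple of documents, the score matrix from Basic Usage feeds directly into top-k selection with `torch.topk`. The snippet uses random normalized stand-ins in place of real `embed(...)` outputs so it runs without downloading the model; substitute your actual embeddings in practice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-ins for embed(queries) and embed(documents) from Basic Usage.
query_embeddings = F.normalize(torch.randn(1, 768), dim=-1)
document_embeddings = F.normalize(torch.randn(100, 768), dim=-1)

# Cosine scores for one query against 100 documents, then keep the best 5.
scores = query_embeddings @ document_embeddings.T   # (1, 100)
top_scores, top_indices = scores.topk(k=5, dim=-1)  # sorted best-first

ranked = top_indices[0].tolist()  # document indices, most relevant first
```

For large corpora, the same normalized vectors can be loaded into an approximate-nearest-neighbor index (e.g. a vector database) using inner-product search, since normalization makes inner product equivalent to cosine similarity.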
+
+ ## Architecture Notes
+
+ The model is designed around a retrieval-oriented embedding pipeline:
+
+ - token-level representations are produced by the Qwen backbone
+ - mean pooling converts them into a single sentence representation
+ - a learned projection maps the representation into a 768-dimensional embedding space
+ - L2 normalization makes the final vectors directly usable for cosine-similarity retrieval
+
+ This design keeps the model simple, efficient, and well aligned with modern vector database workflows.
+
+ ## License
+
+ This model is released under the **Apache License 2.0**.
+
+ The base model weights are derived from [`Qwen/Qwen2.5-0.5B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). Where applicable, use of this repository must also comply with the Qwen license terms.