---
base_model:
- nomic-ai/CodeRankEmbed
base_model_relation: quantized
license: mit
library_name: candle
tags:
- code
- code-retrieval
- sentence-transformers
- nomic-bert
- f16
- candle
pipeline_tag: feature-extraction
---
# CodeRankEmbed-f16

An **f16** (half-precision) cast of
[`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed),
the 137M-parameter NomicBert bi-encoder for code retrieval, in `safetensors`,
for GPU inference (e.g. candle on Apple-Silicon Metal) at roughly half
the memory of the f32 base.

This repo is **weights only, identical architecture**: every tensor is
the base model cast f32 → f16, with tensor names/shapes unchanged. Use it
exactly like the base model (same `config.json`, `tokenizer.json`, CLS
pooling, and the required query instruction prefix).
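
To give a feel for what a pure f32 → f16 cast does to individual values, here is a minimal sketch using only the Python standard library. It is not the conversion script used for this repo (that cast was done with `candle`); the `to_f16` helper and the sample weights are hypothetical and exist only for illustration. Python floats are 64-bit, so the sketch rounds from f64 rather than f32, which does not change the bound shown.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # struct's "e" format is binary16: packing rounds, unpacking widens back.
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical weight values, invented for this example.
weights = [0.0123456789, -0.5, 1.0, 3.14159265, -0.000731]
casted = [to_f16(w) for w in weights]

# Inside the f16 normal range, round-to-nearest keeps the relative
# error at or below 2**-11 (about 4.9e-4).
max_rel_err = max(abs(c - w) / abs(w) for w, c in zip(weights, casted))

print(casted)
print(f"max relative error: {max_rel_err:.2e}")
```

Values that are exactly representable in f16 (such as `1.0` and `-0.5`) survive unchanged; the rest move by a small relative amount.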

## Why

The base repo ships f32 `safetensors` (~547 MB). On the Metal GPU the
f16 weights halve the weight working set and the memory bandwidth spent
on matmuls, with no measured change in retrieval quality, so it is the
form used by
[embedding-search](https://github.com/) on Apple Silicon.
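
The size claim can be checked with back-of-the-envelope arithmetic. The sketch below assumes a round 137,000,000 parameters (the card only says "137M", so the exact count is an assumption) and counts weight bytes only; measured peak RSS also includes activations and runtime overhead.

```python
# "137M" from the model description; the exact parameter count is an assumption.
PARAMS = 137_000_000

BYTES_PER_PARAM = {"f32": 4, "f16": 2}


def weight_megabytes(dtype: str, params: int = PARAMS) -> float:
    """Approximate size of the raw weights in decimal megabytes."""
    return params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in ("f32", "f16"):
    print(f"{dtype}: ~{weight_megabytes(dtype):.0f} MB of weights")
```

The f32 estimate lands close to the ~547 MB file size quoted above, and the f16 estimate is exactly half of it.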

## Validation (f16 vs f32, CodeSearchNet Python, N=300)

Same code/corpus, with dtype the only difference:

| dtype | peak RSS | MRR@10 | Recall@1 |
|-------|----------|--------|----------|
| f32 (base) | 1116 MB | 0.9573 | 0.9367 |
| **f16 (this)** | **570 MB** | **0.9573** | **0.9367** |

- `cosine(f16, f32)` per-document: **mean 0.999998, min 0.999996**
- top-1 retrieval agreement f16 vs f32: **1.0000**
- MRR@10 / Recall@1 deltas: **0.0000**

In this check, f16 is effectively a no-op for retrieval at about half the
RAM. (The absolute MRR is high because the eval uses a small 300-doc
distractor pool; it is an f16-vs-f32 *parity* check, not a full-CodeSearchNet
reproduction of the base model's published score.)
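
For readers who want to run a similar parity check, the metrics above can be computed along these lines. This is a sketch, not the evaluation script behind the table: the function names are hypothetical, and the rankings are toy data invented for the example (4 queries, document ids listed best-first).

```python
from typing import Sequence


def mrr_at_k(rankings: Sequence[Sequence[int]], gold: Sequence[int], k: int = 10) -> float:
    """Mean reciprocal rank of the gold document within the top-k results."""
    total = 0.0
    for ranked, target in zip(rankings, gold):
        top = list(ranked[:k])
        if target in top:
            total += 1.0 / (top.index(target) + 1)
    return total / len(gold)


def recall_at_1(rankings: Sequence[Sequence[int]], gold: Sequence[int]) -> float:
    """Fraction of queries whose top result is the gold document."""
    return sum(r[0] == g for r, g in zip(rankings, gold)) / len(gold)


def top1_agreement(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Fraction of queries where two runs return the same top result."""
    return sum(x[0] == y[0] for x, y in zip(a, b)) / len(a)


# Toy data, invented for illustration only.
gold = [0, 1, 2, 3]
rank_f32 = [[0, 2, 1], [1, 0, 3], [3, 2, 0], [3, 1, 2]]
rank_f16 = [[0, 1, 2], [1, 3, 0], [3, 2, 0], [3, 2, 1]]

mrr_delta = mrr_at_k(rank_f16, gold) - mrr_at_k(rank_f32, gold)
recall_delta = recall_at_1(rank_f16, gold) - recall_at_1(rank_f32, gold)
agreement = top1_agreement(rank_f16, rank_f32)

print(mrr_delta, recall_delta, agreement)
```

In the toy data the two runs order lower-ranked documents differently yet agree on every top-1 result, which is the kind of outcome a parity check is meant to detect.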

## Usage

The query **must** use the task instruction prefix (same as the base
model); code/documents get no prefix:

```
Represent this query for searching relevant code: <your query>
```

CLS-pool the last hidden state and L2-normalize; use cosine similarity for
ranking.
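
The post-processing steps above can be sketched in plain Python. The helper names are hypothetical, the tiny `[tokens][hidden]` arrays stand in for real model output, and the sketch assumes the CLS token sits at position 0 of the sequence. Running the model itself (with candle or another runtime) is out of scope here.

```python
import math
from typing import List, Sequence

QUERY_PREFIX = "Represent this query for searching relevant code: "


def build_query(text: str) -> str:
    """Prepend the required instruction prefix (queries only, never documents)."""
    return QUERY_PREFIX + text


def cls_pool(last_hidden_state: Sequence[Sequence[float]]) -> List[float]:
    """Take the first token's vector (assumed to be the CLS token)."""
    return list(last_hidden_state[0])


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices, best first. For unit vectors, cosine == dot product."""
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy hidden states, invented for illustration: 2 tokens x 3 dimensions each.
query_hidden = [[1.0, 0.0, 1.0], [9.0, 9.0, 9.0]]
doc_hiddens = [
    [[0.0, 1.0, 0.0], [5.0, 5.0, 5.0]],
    [[2.0, 0.0, 2.0], [0.0, 0.0, 0.0]],
    [[1.0, 1.0, 0.0], [1.0, 2.0, 3.0]],
]

query_text = build_query("parse a JSON file")  # this string is what gets tokenized
q = l2_normalize(cls_pool(query_hidden))
docs = [l2_normalize(cls_pool(h)) for h in doc_hiddens]
order = rank(q, docs)

print(query_text)
print(order)
```

Only the first token's vector contributes to each embedding, so the second token in the toy arrays has no effect on the ranking.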

## Provenance & license

Produced by a pure dtype cast (CPU, `candle`) of
`nomic-ai/CodeRankEmbed` `model.safetensors`; `config.json` and
`tokenizer.json` copied unchanged. Inherits the base model's **MIT**
license. Credit and citation belong to the original authors: see the
[base model card](https://huggingface.co/nomic-ai/CodeRankEmbed) and
the CoRNStack paper ([arXiv:2412.01007](https://arxiv.org/abs/2412.01007)).