---
base_model:
- nomic-ai/CodeRankEmbed
base_model_relation: quantized
license: mit
library_name: candle
tags:
- code
- code-retrieval
- sentence-transformers
- nomic-bert
- f16
- candle
pipeline_tag: feature-extraction
---

# CodeRankEmbed-f16

An **f16** (half-precision) cast of
[`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed),
the 137M-parameter NomicBert bi-encoder for code retrieval, stored as
`safetensors`. It targets GPU inference (e.g. candle on Apple-Silicon
Metal) at roughly half the memory of the f32 base.

This repo is **weights only, identical architecture**: every tensor is
the corresponding base-model tensor cast f32 → f16, with tensor names
and shapes unchanged. Use it exactly like the base model (same
`config.json`, `tokenizer.json`, CLS pooling, and the required query
instruction prefix).
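
To illustrate what an f32 → f16 cast does to individual values, here is a minimal sketch using only the Python standard library. It is not the candle code that produced this repo, and the weight values are made up for the example.

```python
import struct

# Hypothetical weight values, chosen only for illustration.
weights = [0.0123, -0.5, 1.0, 3.14159]
n = len(weights)

# Store the values as f32 (4 bytes each), as in the base checkpoint.
f32_bytes = struct.pack(f"<{n}f", *weights)
f32_values = struct.unpack(f"<{n}f", f32_bytes)

# Cast each f32 value to IEEE 754 half precision (2 bytes each).
f16_bytes = struct.pack(f"<{n}e", *f32_values)
f16_values = struct.unpack(f"<{n}e", f16_bytes)

print(len(f32_bytes), len(f16_bytes))  # 16 8

# Worst-case relative rounding error introduced by the cast.
max_rel_err = max(
    abs(a - b) / abs(a) for a, b in zip(f32_values, f16_values) if a != 0
)
print(max_rel_err)
```

Each value keeps its position and only its byte width changes, which is why tensor names and shapes carry over untouched.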

## Why

The base repo ships f32 `safetensors` (~547 MB). On a Metal GPU the
f16 weights halve the weight working set and the memory bandwidth spent
on matmuls, with no measurable change in retrieval quality in the parity
check below, so it is the form used by
[embedding-search](https://github.com/) on Apple Silicon.

## Validation (f16 vs f32, CodeSearchNet Python, N=300)

Same code and corpus; dtype is the only difference:

| dtype | peak RSS | MRR@10 | Recall@1 |
|-------|----------|--------|----------|
| f32 (base) | 1116 MB | 0.9573 | 0.9367 |
| **f16 (this)** | **570 MB** | **0.9573** | **0.9367** |

- `cosine(f16, f32)` per-document: **mean 0.999998, min 0.999996**
- top-1 retrieval agreement f16 vs f32: **1.0000**
- MRR@10 / Recall@1 deltas: **0.0000**
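
For reference, the metrics above can be computed along these lines. This is a sketch with toy 2-d vectors, not the evaluation script behind the table; all function names and data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_of_gold(query_vec, doc_vecs, gold_idx):
    """1-based rank of the gold document, sorted by descending cosine."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order.index(gold_idx) + 1

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank; ranks beyond k contribute 0."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

def recall_at_1(ranks):
    """Fraction of queries whose gold document is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)

# Toy embeddings (made up): three documents, three queries.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.2]]
gold = [0, 1, 2]

ranks = [rank_of_gold(q, docs, g) for q, g in zip(queries, gold)]
mrr = mrr_at_k(ranks)
recall = recall_at_1(ranks)
print(ranks, mrr, recall)
```

Running the same functions once on f32 embeddings and once on f16 embeddings, then differencing the results, gives the deltas reported above.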

On this check, f16 leaves retrieval results unchanged at about half the
RAM. (The absolute MRR is high because the eval uses a small 300-doc
distractor pool: it is an f16-vs-f32 *parity* check, not a
full-CodeSearchNet reproduction of the base model's published score.)

## Usage

The query **must** use the task instruction prefix (same as the base
model); code/documents get no prefix:

```
Represent this query for searching relevant code: <your query>
```

CLS-pool the last hidden state and L2-normalize the result; rank by
cosine similarity.
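
The post-processing steps can be sketched as follows. The model forward pass is omitted: `last_hidden_state` is assumed to be a list of per-token vectors with the CLS token first, and the toy values and helper names are hypothetical.

```python
import math

QUERY_PREFIX = "Represent this query for searching relevant code: "

def format_query(query):
    """Prepend the required instruction prefix (queries only, not code)."""
    return QUERY_PREFIX + query

def cls_pool(last_hidden_state):
    """CLS pooling: take the hidden vector of the first token."""
    return last_hidden_state[0]

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embed(last_hidden_state):
    """CLS-pool, then L2-normalize."""
    return l2_normalize(cls_pool(last_hidden_state))

def rank(query_emb, doc_embs):
    """Document indices by descending cosine similarity.

    For unit-length vectors, cosine similarity equals the dot product.
    """
    scores = [sum(q * d for q, d in zip(query_emb, e)) for e in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)

# Toy hidden states standing in for model output (made-up values).
query_emb = embed([[3.0, 4.0], [9.0, 9.0]])
doc_embs = [
    embed([[0.0, 2.0], [1.0, 1.0]]),
    embed([[5.0, 0.0], [1.0, 1.0]]),
    embed([[3.0, 4.0], [1.0, 1.0]]),
]

print(format_query("parse a JSON file"))
print(rank(query_emb, doc_embs))  # [2, 0, 1]
```

Normalizing once at embedding time means ranking reduces to plain dot products, which is convenient when document embeddings are precomputed.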

## Provenance & license

Produced by a pure dtype cast (CPU, `candle`) of
`nomic-ai/CodeRankEmbed` `model.safetensors`; `config.json` and
`tokenizer.json` copied unchanged. Inherits the base model's **MIT**
license. Credit and citation belong to the original authors; see the
[base model card](https://huggingface.co/nomic-ai/CodeRankEmbed) and
the CoRNStack paper ([arXiv:2412.01007](https://arxiv.org/abs/2412.01007)).