---
base_model:
- nomic-ai/CodeRankEmbed
base_model_relation: quantized
license: mit
library_name: candle
tags:
- code
- code-retrieval
- sentence-transformers
- nomic-bert
- f16
- candle
pipeline_tag: feature-extraction
---
# CodeRankEmbed-f16
An **f16** (half-precision) cast of
[`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed),
the 137M-parameter NomicBert bi-encoder for code retrieval, stored as
`safetensors` for GPU inference (e.g. candle on Apple-Silicon Metal) at
roughly half the memory of the f32 base.
This repo is **weights only, identical architecture**: every tensor is
the corresponding base-model tensor cast from f32 to f16, with tensor
names and shapes unchanged. Use it exactly like the base model (same
`config.json`, `tokenizer.json`, CLS pooling, and the required query
instruction prefix).
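As an illustration of what "pure dtype cast, names and shapes unchanged" means, here is a minimal standard-library sketch. It is not the conversion script used for this repo (that ran in `candle`); the `cast_state_dict` helper, the dict layout, and the tensor names and values are all hypothetical, and plain Python floats stand in for f32 values.

```python
import struct

def cast_f32_to_f16(values):
    """Round each value to the nearest IEEE 754 half-precision float.

    struct's "e" format is binary16; values beyond the f16 range
    (about +/-65504) raise OverflowError instead of becoming inf.
    """
    fmt = f"<{len(values)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))

def cast_state_dict(state):
    """Cast every tensor's data; names and shapes are carried over as-is."""
    return {
        name: {
            "shape": tensor["shape"],
            "dtype": "f16",
            "data": cast_f32_to_f16(tensor["data"]),
        }
        for name, tensor in state.items()
    }

# Hypothetical two-tensor "checkpoint" (names and values are made up).
f32_state = {
    "encoder.layer.0.weight": {"shape": (2, 2), "dtype": "f32",
                               "data": [0.1, -0.25, 1.5, 3.0]},
    "encoder.layer.0.bias": {"shape": (2,), "dtype": "f32",
                             "data": [0.0, 1e-3]},
}
f16_state = cast_state_dict(f32_state)

for name, tensor in f16_state.items():
    print(name, tensor["shape"], tensor["dtype"], tensor["data"])
```

Values that are exactly representable in half precision (such as `-0.25` or `1.5`) survive unchanged; others (such as `0.1`) move to the nearest f16 value, which is the only source of difference between the two checkpoints.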
## Why
The base repo ships f32 `safetensors` (~547 MB). On the Metal GPU the
f16 weights halve the working set and matmul bandwidth with no measured
change in retrieval quality (see Validation), so it is the form used by
[embedding-search](https://github.com/) on Apple Silicon.
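The "roughly half" claim follows from bytes per parameter. The back-of-envelope check below uses only the two figures quoted in this README (137M parameters, ~547 MB for the f32 file); it assumes decimal megabytes and ignores the `safetensors` header, so the f16 number is an estimate, not a measured file size.

```python
# Figures quoted in this README (both approximate).
F32_FILE_MB = 547       # size of the base f32 safetensors file
PARAMS_MILLIONS = 137   # parameter count

BYTES_PER_F32 = 4
BYTES_PER_F16 = 2

# Weight payload implied by the parameter count, in decimal MB.
f32_payload_mb = PARAMS_MILLIONS * BYTES_PER_F32
f16_payload_mb = PARAMS_MILLIONS * BYTES_PER_F16

print(f"f32 payload ~{f32_payload_mb} MB (README quotes ~{F32_FILE_MB} MB)")
print(f"f16 payload ~{f16_payload_mb} MB, "
      f"ratio {f16_payload_mb / f32_payload_mb:.2f}")
```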
## Validation (f16 vs f32, CodeSearchNet Python, N=300)
Same code and corpus; dtype is the only difference:
| dtype | peak RSS | MRR@10 | Recall@1 |
|-------|----------|--------|----------|
| f32 (base) | 1116 MB | 0.9573 | 0.9367 |
| **f16 (this)** | **570 MB** | **0.9573** | **0.9367** |
- `cosine(f16, f32)` per-document: **mean 0.999998, min 0.999996**
- top-1 retrieval agreement f16 vs f32: **1.0000**
- MRR@10 / Recall@1 deltas: **0.0000**
For retrieval, f16 is effectively a no-op at about half the RAM. (The
absolute MRR is high because the eval uses a small 300-doc distractor
pool: it is an f16-vs-f32 *parity* check, not a full-CodeSearchNet
reproduction of the base model's published score.)
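For readers who want to run a similar parity check, the metrics above can be computed roughly as follows. This is a sketch under stated assumptions, not the evaluation script behind the table: the helper names are hypothetical, the 2-dimensional embeddings are toy values, and the "f16" embeddings are simulated by rounding the toy values to half precision rather than by running a model.

```python
import math
import struct

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ranked_docs(query, docs):
    """Doc indices sorted by descending cosine similarity to the query."""
    return sorted(range(len(docs)),
                  key=lambda i: cosine(query, docs[i]), reverse=True)

def mrr_at_k(rankings, gold, k=10):
    """Mean reciprocal rank of the gold doc, counting only the top k."""
    total = 0.0
    for ranking, g in zip(rankings, gold):
        top = ranking[:k]
        if g in top:
            total += 1.0 / (top.index(g) + 1)
    return total / len(rankings)

def recall_at_1(rankings, gold):
    return sum(r[0] == g for r, g in zip(rankings, gold)) / len(rankings)

def top1_agreement(rankings_a, rankings_b):
    hits = sum(a[0] == b[0] for a, b in zip(rankings_a, rankings_b))
    return hits / len(rankings_a)

def to_f16(vec):
    """Simulate an f16 embedding by rounding each value to half precision."""
    fmt = f"<{len(vec)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *vec)))

# Toy data (made up): 3 docs, 3 queries, gold doc index per query.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
gold = [0, 1, 0]

rank_f32 = [ranked_docs(q, docs) for q in queries]
docs_f16 = [to_f16(d) for d in docs]
rank_f16 = [ranked_docs(to_f16(q), docs_f16) for q in queries]

mrr = mrr_at_k(rank_f32, gold)
recall = recall_at_1(rank_f32, gold)
agreement = top1_agreement(rank_f16, rank_f32)
min_cos = min(cosine(a, b) for a, b in zip(docs_f16, docs))
print(mrr, recall, agreement, min_cos)
```

In a real check, `docs` and `queries` would be the embeddings produced by the f32 and f16 checkpoints on the same inputs, and `gold` would come from the dataset's query-to-code pairing.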
## Usage
The query **must** use the task instruction prefix (same as the base
model); code/documents get no prefix:
```
Represent this query for searching relevant code: <your query>
```
CLS-pool the last hidden state and L2-normalize the result; rank
documents by cosine similarity to the query.
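The steps above can be sketched without any model code. The snippet below assumes the encoder has already produced a last hidden state as a list of per-token vectors with the `[CLS]` token at index 0 (the usual BERT-style layout, assumed here rather than taken from this README); the function names and the toy hidden states are hypothetical.

```python
import math

# Instruction prefix from this README; applied to queries only.
QUERY_PREFIX = "Represent this query for searching relevant code: "

def format_query(text):
    return QUERY_PREFIX + text

def cls_pool(last_hidden_state):
    """Take the vector of the first token ([CLS], assumed at index 0)."""
    return last_hidden_state[0]

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embed(last_hidden_state):
    return l2_normalize(cls_pool(last_hidden_state))

def rank(query_emb, doc_embs):
    """Doc indices by descending score; for unit vectors dot == cosine."""
    scores = [sum(q * d for q, d in zip(query_emb, doc)) for doc in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)

# Toy hidden states (made up): [CLS] vector first, then one other token.
query_hidden = [[3.0, 4.0], [9.0, 9.0]]
doc_hidden = [
    [[4.0, 3.0], [0.0, 0.0]],
    [[0.0, 5.0], [1.0, 1.0]],
    [[6.0, 8.0], [2.0, 2.0]],
]

prompt = format_query("binary search over a sorted list")
query_emb = embed(query_hidden)
doc_embs = [embed(h) for h in doc_hidden]
order = rank(query_emb, doc_embs)
print(prompt)
print(order)
```

Because both sides are unit-length after normalization, a plain dot product gives the cosine score, which keeps the ranking step cheap when documents are pre-embedded.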
## Provenance & license
Produced by a pure dtype cast (CPU, `candle`) of
`nomic-ai/CodeRankEmbed` `model.safetensors`; `config.json` and
`tokenizer.json` copied unchanged. Inherits the base model's **MIT**
license. Credit and citation belong to the original authors; see the
[base model card](https://huggingface.co/nomic-ai/CodeRankEmbed) and
the CoRNStack paper ([arXiv:2412.01007](https://arxiv.org/abs/2412.01007)).