---
base_model:
- nomic-ai/CodeRankEmbed
base_model_relation: quantized
license: mit
library_name: candle
tags:
- code
- code-retrieval
- sentence-transformers
- nomic-bert
- f16
- candle
pipeline_tag: feature-extraction
---

# CodeRankEmbed-f16

An **f16** (half-precision) cast of [`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed), the 137M NomicBert bi-encoder for code retrieval, in `safetensors`, for GPU inference (e.g. candle on Apple-Silicon Metal) at roughly half the memory of the f32 base.

This repo is **weights only, identical architecture**: every tensor is the base model cast f32 → f16, tensor names/shapes unchanged. Use it exactly like the base model (same `config.json`, `tokenizer.json`, CLS pooling, and the required query instruction prefix).

## Why

The base repo ships f32 `safetensors` (~547 MB). On the Metal GPU the f16 weights halve the working set and matmul bandwidth with no change to retrieval quality, so it is the form used by [embedding-search](https://github.com/) on Apple Silicon.

## Validation (f16 vs f32, CodeSearchNet Python, N=300)

Same code/corpus, dtype the only difference:

| dtype | peak RSS | MRR@10 | Recall@1 |
|-------|----------|--------|----------|
| f32 (base) | 1116 MB | 0.9573 | 0.9367 |
| **f16 (this)** | **570 MB** | **0.9573** | **0.9367** |

- `cosine(f16, f32)` per-document: **mean 0.999998, min 0.999996**
- top-1 retrieval agreement f16 vs f32: **1.0000**
- MRR@10 / Recall@1 deltas: **0.0000**

f16 is numerically a no-op for retrieval at about half the RAM. (The absolute MRR is high because the eval uses a small 300-doc distractor pool; it is an f16-vs-f32 *parity* check, not a full-CodeSearchNet reproduction of the base model's published score.)

## Usage

The query **must** use the task instruction prefix (same as the base model); code/documents get no prefix:

```
Represent this query for searching relevant code:
```

CLS-pool the last hidden state and L2-normalize; cosine similarity for ranking.

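
The post-processing described in the Usage section can be sketched in plain Python. This is a minimal illustration, not the candle code path: the helper names (`prepare_query`, `cls_pool`, `l2_normalize`, `embed`, `rank`) are hypothetical, and the hidden states are toy values standing in for real model output of shape `[tokens][hidden_dim]`.

```python
import math

# Task instruction prefix from the Usage section; applied to queries only.
QUERY_PREFIX = "Represent this query for searching relevant code: "


def prepare_query(text: str) -> str:
    """Prepend the instruction prefix to a query (code/documents get none)."""
    return QUERY_PREFIX + text


def cls_pool(last_hidden_state: list[list[float]]) -> list[float]:
    """CLS pooling: keep only the hidden vector of the first token."""
    return list(last_hidden_state[0])


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def embed(last_hidden_state: list[list[float]]) -> list[float]:
    """CLS-pool, then L2-normalize."""
    return l2_normalize(cls_pool(last_hidden_state))


def rank(query_emb: list[float], doc_embs: list[list[float]]) -> list[int]:
    """Return document indices by descending cosine similarity.

    For unit-length vectors the dot product equals the cosine similarity.
    """
    scores = [sum(q * d for q, d in zip(query_emb, doc)) for doc in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: scores[i], reverse=True)


# Toy hidden states (2 tokens, hidden_dim 3); NOT real model output.
query_hidden = [[3.0, 4.0, 0.0], [9.0, 9.0, 9.0]]
doc_hiddens = [
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
    [[6.0, 8.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [5.0, 5.0, 5.0]],
]

query_text = prepare_query("parse a json file")  # text that would be tokenized
query_emb = embed(query_hidden)
doc_embs = [embed(h) for h in doc_hiddens]
ranking = rank(query_emb, doc_embs)
```

Only the first token's vector influences the embedding, so the second row of each toy hidden state is ignored. In a real pipeline the hidden states would come from running the tokenized text through the model.
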
## Provenance & license

Produced by a pure dtype cast (CPU, `candle`) of `nomic-ai/CodeRankEmbed` `model.safetensors`; `config.json` and `tokenizer.json` copied unchanged. Inherits the base model's **MIT** license. Credit and citation belong to the original authors; see the [base model card](https://huggingface.co/nomic-ai/CodeRankEmbed) and the CoRNStack paper ([arXiv:2412.01007](https://arxiv.org/abs/2412.01007)).
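
To illustrate what a pure f32 → f16 dtype cast does to individual values, here is a small sketch using only Python's standard `struct` module. It is not the candle procedure used to produce this repo; the weight values are hypothetical, and the cosine here compares a toy weight vector before and after the cast rather than model embeddings.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_f16(x: float) -> float:
    """Round a value to the nearest IEEE 754 half-precision (f16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical weight values; NOT taken from the real checkpoint.
weights_f32 = [to_f32(v) for v in (0.1234567, -0.0456789, 0.75, 1.5, -0.0031415)]
weights_f16 = [to_f16(v) for v in weights_f32]

# Rounding error introduced by the cast, and similarity of the two vectors.
max_abs_err = max(abs(a - b) for a, b in zip(weights_f32, weights_f16))
similarity = cosine(weights_f32, weights_f16)

# Storage per element: 4 bytes as f32, 2 bytes as f16.
bytes_f32 = struct.calcsize("<f") * len(weights_f32)
bytes_f16 = struct.calcsize("<e") * len(weights_f16)
```

Values that are exactly representable in half precision (such as `0.75` and `1.5`) survive the cast unchanged; the others are rounded to the nearest f16 value, and storage per element drops from 4 bytes to 2.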