sensiarion/CodeRankEmbed-f16

	---
	base_model:
	- nomic-ai/CodeRankEmbed
	base_model_relation: quantized
	license: mit
	library_name: candle
	tags:
	- code
	- code-retrieval
	- sentence-transformers
	- nomic-bert
	- f16
	- candle
	pipeline_tag: feature-extraction
	---

	# CodeRankEmbed-f16

	An f16 (half-precision) cast of
	[`nomic-ai/CodeRankEmbed`](https://huggingface.co/nomic-ai/CodeRankEmbed),
	the 137M-parameter NomicBert bi-encoder for code retrieval, stored as
	`safetensors` for GPU inference (e.g. candle on Apple-Silicon Metal) at
	roughly half the memory of the f32 base.

	Only the weights differ from the base repo; the architecture is
	identical. Every tensor is the base model's tensor cast from f32 to
	f16, with tensor names and shapes unchanged. Use it exactly like the
	base model (same `config.json`, `tokenizer.json`, CLS pooling, and the
	required query instruction prefix).
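As a rough illustration of what "cast f32 → f16, names and shapes unchanged" means, here is a minimal stdlib-only sketch. It is not the conversion code used for this repo (that was done with `candle`); the tensor names, shapes, and values below are hypothetical stand-ins, and `struct`'s `e` format is used only because it rounds to IEEE 754 half precision without extra dependencies.

```python
import struct

def cast_f32_to_f16(values):
    """Round each value to IEEE 754 half precision; return (raw bytes, rounded values)."""
    fmt = f"<{len(values)}e"              # 'e' = 2-byte half-precision float
    packed = struct.pack(fmt, *values)
    return packed, list(struct.unpack(fmt, packed))

# Hypothetical stand-in for a checkpoint: tensor name -> (shape, flat values).
checkpoint_f32 = {
    "embeddings.weight": ((2, 3), [0.1, -0.25, 1.5, 3.0, -0.001, 0.75]),
    "encoder.bias": ((3,), [0.0, 0.5, -2.0]),
}

# Cast every tensor; the name and shape are carried over untouched.
checkpoint_f16 = {}
for name, (shape, values) in checkpoint_f32.items():
    raw, rounded = cast_f32_to_f16(values)
    checkpoint_f16[name] = (shape, rounded, len(raw))

for name, (shape, rounded, nbytes) in checkpoint_f16.items():
    f32_bytes = 4 * len(rounded)          # f32 would need 4 bytes per element
    print(name, shape, f"{f32_bytes} B -> {nbytes} B")
```

Values that f16 cannot represent exactly (such as 0.1) are rounded to the nearest half-precision value; that rounding is the only change the cast introduces.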

	## Why

	The base repo ships f32 `safetensors` (~547 MB). On a Metal GPU the
	f16 weights halve the weight working set and the memory traffic for
	matmuls, with no measured change in retrieval quality (see Validation
	below), so this is the form used by
	[embedding-search](https://github.com/) on Apple Silicon.
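The size figures follow from simple arithmetic on the parameter count. The sketch below assumes the rounded "137M" figure from the description, so the result is approximate; it counts weight bytes only, not activations or runtime overhead.

```python
PARAMS = 137_000_000                      # rounded "137M" from the description
BYTES_PER_PARAM = {"f32": 4, "f16": 2}

def weight_megabytes(dtype, params=PARAMS):
    """Approximate size of the weights alone, in MB (10^6 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e6

for dtype in ("f32", "f16"):
    print(f"{dtype}: ~{weight_megabytes(dtype):.0f} MB of weights")
```

The rounded count gives about 548 MB for f32, in line with the ~547 MB file size quoted above, and about 274 MB for f16. Peak RSS for a running process is higher than either number because it covers the whole process, not just the weights.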

	## Validation (f16 vs f32, CodeSearchNet Python, N=300)

	Same code and corpus; dtype is the only difference:

	| dtype | peak RSS | MRR@10 | Recall@1 |
	|-------|----------|--------|----------|
	| f32 (base) | 1116 MB | 0.9573 | 0.9367 |
	| f16 (this) | 570 MB | 0.9573 | 0.9367 |

	- `cosine(f16, f32)` per-document: mean 0.999998, min 0.999996
	- top-1 retrieval agreement f16 vs f32: 1.0000
	- MRR@10 / Recall@1 deltas: 0.0000
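The parity numbers above (per-document cosine and top-1 agreement) can be computed along these lines. This is a sketch, not the evaluation script behind the table: the embeddings are toy vectors, and rounding the embeddings to half precision stands in for actually running the f16 model, which is an assumption made to keep the example self-contained.

```python
import math
import struct

def to_f16(vec):
    """Round a vector to IEEE 754 half precision (stdlib stand-in for an f16 run)."""
    fmt = f"<{len(vec)}e"
    return list(struct.unpack(fmt, struct.pack(fmt, *vec)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1(query, docs):
    """Index of the document with the highest cosine similarity to the query."""
    return max(range(len(docs)), key=lambda i: cosine(query, docs[i]))

# Toy "f32" embeddings (hypothetical values).
docs_f32 = [[0.9, 0.1, 0.03], [0.1, 0.8, 0.2], [0.02, 0.2, 0.9]]
queries_f32 = [[1.0, 0.05, 0.1], [0.05, 1.0, 0.1]]

docs_f16 = [to_f16(d) for d in docs_f32]
queries_f16 = [to_f16(q) for q in queries_f32]

# cosine(f16, f32) for each document, then the mean and minimum.
per_doc = [cosine(a, b) for a, b in zip(docs_f16, docs_f32)]
mean_cos, min_cos = sum(per_doc) / len(per_doc), min(per_doc)

# Fraction of queries whose top-1 document is the same under both dtypes.
matches = [top1(q16, docs_f16) == top1(q32, docs_f32)
           for q16, q32 in zip(queries_f16, queries_f32)]
agreement = sum(matches) / len(matches)

print(f"mean {mean_cos:.6f}, min {min_cos:.6f}, top-1 agreement {agreement:.4f}")
```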

	In this check, f16 left every retrieval metric unchanged at about half
	the RAM. (The absolute MRR is high because the eval uses a small
	300-doc distractor pool. It is an f16-vs-f32 parity check, not a
	full-CodeSearchNet reproduction of the base model's published score.)
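For reference, MRR@10 and Recall@1 as used in the table can be sketched as follows. The score matrix and gold labels are hypothetical; each query has exactly one relevant document, which is an assumption about the setup rather than something stated above.

```python
def rank_of_gold(scores, gold):
    """1-based rank of the gold document when sorting by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(gold) + 1

def mrr_at_k(ranks, k=10):
    """Mean reciprocal rank, counting a query as 0 if the gold doc is below rank k."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

def recall_at_1(ranks):
    """Fraction of queries whose gold document is ranked first."""
    return sum(r == 1 for r in ranks) / len(ranks)

# Hypothetical similarity scores: one row per query, one column per document.
scores = [
    [0.91, 0.40, 0.35, 0.20],
    [0.30, 0.55, 0.62, 0.10],   # gold doc (index 1) is outscored by doc 2
    [0.15, 0.22, 0.80, 0.31],
    [0.05, 0.33, 0.28, 0.77],
]
gold = [0, 1, 2, 3]             # index of the relevant document for each query

ranks = [rank_of_gold(s, g) for s, g in zip(scores, gold)]
print("ranks:", ranks)
print("MRR@10:", mrr_at_k(ranks), "Recall@1:", recall_at_1(ranks))
```

With fewer distractors in the pool, there are fewer documents that can outrank the gold one, which is why a small pool tends to produce high absolute scores.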

	## Usage

	The query must use the task instruction prefix (same as the base
	model); code/documents get no prefix:

	```
	Represent this query for searching relevant code: <your query>
	```
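A small helper can keep the prefix handling in one place. The function names here are hypothetical; only the prefix string comes from this card.

```python
QUERY_PREFIX = "Represent this query for searching relevant code: "

def format_query(text):
    """Prepend the task instruction prefix to a natural-language query."""
    return QUERY_PREFIX + text.strip()

def format_document(code):
    """Code and documents are embedded as-is, with no prefix."""
    return code

print(format_query("parse a JSON file"))
print(format_document("def load(path):\n    return json.load(open(path))"))
```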

	CLS-pool the last hidden state (take the first token's vector),
	L2-normalize it, and rank documents by cosine similarity.
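The pooling and ranking steps can be sketched in plain Python as below. The hidden states are toy values standing in for model output (a list of per-token vectors); a real pipeline would take them from the encoder, which is not shown here.

```python
import math

def cls_pool(last_hidden_state):
    """CLS pooling: take the vector of the first token."""
    return last_hidden_state[0]

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def embed(last_hidden_state):
    return l2_normalize(cls_pool(last_hidden_state))

def rank(query_vec, doc_vecs):
    """Document indices, best first. For unit vectors, dot product equals cosine."""
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy hidden states, shape [tokens][hidden]; only the first token is used.
query_hidden = [[3.0, 4.0], [9.0, 9.0]]
doc_hidden = [
    [[0.0, 2.0], [5.0, 5.0]],
    [[6.0, 8.0], [1.0, 0.0]],
    [[4.0, 0.0], [0.0, 7.0]],
]

query_vec = embed(query_hidden)
doc_vecs = [embed(h) for h in doc_hidden]
order = rank(query_vec, doc_vecs)
print("ranking (best first):", order)
```

Normalizing once at embedding time means ranking needs only dot products, so document vectors can be precomputed and stored in normalized form.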

	## Provenance & license

	Produced by a pure dtype cast (CPU, `candle`) of
	`nomic-ai/CodeRankEmbed` `model.safetensors`; `config.json` and
	`tokenizer.json` copied unchanged. Inherits the base model's MIT
	license. Credit and citation belong to the original authors — see the
	[base model card](https://huggingface.co/nomic-ai/CodeRankEmbed) and
	the CoRNStack paper ([arXiv:2412.01007](https://arxiv.org/abs/2412.01007)).