UniVec: openai-text-embedding-3-large -> gemini-text-embedding-004

Converts embedding vectors from openai-text-embedding-3-large (3072d) to gemini-text-embedding-004 (768d) without re-embedding the source text.

This is a vector conversion model. It takes pre-computed embeddings as input and writes out equivalent embeddings in the target model's vector space, preserving retrieval order. It does not process text. For an end-to-end text -> target-vector pipeline, pair it with the source model or use the hosted API at https://univec.ai.

What is vector conversion?

A corpus embedded with a particular model is bound to that model's vector space: queries must be encoded by the same model for nearest-neighbour search to remain meaningful. Migrating to a different embedder (whether driven by deprecation, an upgrade or a provider change) normally requires re-embedding every document. The cost scales with corpus size and recurs each time the underlying model changes.

A conversion model takes pre-computed source-space vectors and outputs target-space vectors. The training objective is retrieval-order preservation: top-K nearest neighbours in the converted space should align with top-K in the target space despite differences in dimensionality, distance distribution and noise structure.
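The retrieval-order objective can be made concrete by comparing neighbour lists. The sketch below is an illustration only, not UniVec's training loss. It assumes a hypothetical helper `topk_overlap` and row-aligned, unit-normalized arrays (row i of both arrays embeds the same text). It measures how many of each item's top-K neighbours in the converted space also appear in its top-K in the true target space.

```python
import numpy as np

def topk_overlap(converted: np.ndarray, target: np.ndarray, k: int = 10) -> float:
    """Mean fraction of shared top-k neighbours between two spaces.

    Hypothetical helper. Row i of `converted` and row i of `target` embed the
    same text. Both are assumed unit-normalized, so dot products are cosines.
    """
    sims_conv = converted @ converted.T   # neighbour scores in the converted space
    sims_tgt = target @ target.T          # neighbour scores in the true target space
    # Exclude each item from its own neighbour list.
    np.fill_diagonal(sims_conv, -np.inf)
    np.fill_diagonal(sims_tgt, -np.inf)
    top_conv = np.argsort(-sims_conv, axis=1)[:, :k]
    top_tgt = np.argsort(-sims_tgt, axis=1)[:, :k]
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(top_conv, top_tgt)]
    return float(np.mean(overlaps))
```

A value of 1.0 means every item keeps exactly the same top-K neighbours after conversion. The absolute distances can still differ, which is why this kind of score and mean cosine are reported separately.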

When to use this

A corpus is already embedded with openai-text-embedding-3-large.
The target use case calls for gemini-text-embedding-004 (better quality, model deprecation, multi-vendor strategy or cost).
Re-embedding is impractical at scale: cost, time, rate limits or loss of access to the original provider.

For embedding new text from scratch, this isn't the right tool; embed directly with gemini-text-embedding-004 instead.

Evaluation

Metrics measured on a held-out eval split, comparing converted vectors against ground-truth gemini-text-embedding-004 embeddings of the same texts.

Metric	Value	Description
MRR	1.0000	Mean reciprocal rank against the target-space corpus
P@1	1.0000	Top-1 retrieval precision
P@5	1.0000	Top-5 retrieval precision
P@10	1.0000	Top-10 retrieval precision
Cosine (mean)	0.8986	Mean cosine similarity to the ground-truth target embedding
Cosine (median)	0.9030	Median cosine similarity
Cosine (std)	0.0347	Standard deviation across the eval set
Kendall tau	0.6560	Rank correlation of pairwise similarities

MRR and P@K measure retrieval quality, which is what downstream search and RAG depend on. Cosine values report how close each converted vector sits to its ground-truth target embedding in the target space.
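The sketch below shows one plausible way to compute these numbers. It is an assumption about the method, not a copy of evaluate.py. Each sampled anchor queries the full ground-truth target corpus with its converted vector, and the correct answer is the same text's target vector. P@5 and P@10 are reported as 1.0000, so P@K presumably means the fraction of queries whose true target lands in the top K. A strict precision over K slots with a single relevant item could not exceed 1/K. The function name `retrieval_metrics` and its parameters are hypothetical.

```python
import numpy as np

def retrieval_metrics(converted, target, ks=(1, 5, 10), num_anchors=1024, seed=0):
    """MRR, P@K and cosine statistics for row-aligned, unit-normalized arrays.

    Hypothetical helper: query with converted[i]; the correct hit is target[i].
    """
    n = converted.shape[0]
    rng = np.random.default_rng(seed)
    anchors = rng.choice(n, size=min(num_anchors, n), replace=False)
    sims = converted[anchors] @ target.T                       # (anchors, n) cosines
    correct = sims[np.arange(len(anchors)), anchors][:, None]  # score of the true match
    # 1-based rank of the true match: 1 + number of corpus items scoring higher.
    ranks = 1 + np.sum(sims > correct, axis=1)
    metrics = {"MRR": float(np.mean(1.0 / ranks))}
    for k in ks:
        metrics[f"P@{k}"] = float(np.mean(ranks <= k))
    cos = np.sum(converted * target, axis=1)                   # per-pair cosine, all rows
    metrics.update({
        "cosine_mean": float(np.mean(cos)),
        "cosine_median": float(np.median(cos)),
        "cosine_std": float(np.std(cos)),
    })
    return metrics
```

Under this reading, a perfect MRR can coexist with a mean cosine of about 0.90. Each converted vector only has to be closer to its own target than to any other text's target. It does not have to coincide with that target.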

Training data

Field	Value
Training pairs	609,584
Held-out eval pairs	67,732

Inputs and outputs are unit-normalized float32 2D arrays: (batch, 3072) in, (batch, 768) out. The ONNX file is s2t.direct.inn.openai-text-embedding-3-large.gemini-text-embedding-004.onnx.
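Stored embeddings often come back as lists, as single vectors, or in float64. The sketch below coerces them into the input contract above before they reach the model. The helper name `prepare_inputs` is hypothetical and is not shipped with this repo.

```python
import numpy as np

SOURCE_DIM = 3072  # openai-text-embedding-3-large

def prepare_inputs(vectors, eps=1e-12):
    """Return a float32 (batch, 3072) array with unit-norm rows (hypothetical helper)."""
    arr = np.asarray(vectors, dtype=np.float32)
    if arr.ndim == 1:                      # a single vector -> batch of one
        arr = arr[None, :]
    if arr.ndim != 2 or arr.shape[1] != SOURCE_DIM:
        raise ValueError(f"expected shape (batch, {SOURCE_DIM}), got {arr.shape}")
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    if np.any(norms < eps):                # a zero vector has no direction to keep
        raise ValueError("zero-length vector cannot be normalized")
    return arr / norms
```

Rejecting zero vectors outright is a design choice. Dividing by a near-zero norm would silently produce NaNs or huge values, and those would then be fed to the converter.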

Quick start

Install dependencies

Via the uv package manager:

# 1. Install uv (one-time, skip if already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh           # Linux / macOS
# brew install uv                                          # macOS via Homebrew
# powershell -c "irm https://astral.sh/uv/install.ps1 | iex"   # Windows

# 2. Create an isolated venv for this model
uv venv
source .venv/bin/activate                                  # Linux / macOS
# .venv\Scripts\activate                                   # Windows

# 3. Install the CPU inference dependencies pinned in requirements.txt
uv pip install -r requirements.txt

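For NVIDIA GPU inference, swap the CPU runtime for onnxruntime-gpu. Don't install both; they conflict. If step 3 already installed onnxruntime, remove it first:

uv pip uninstall onnxruntime
uv pip install onnxruntime-gpu numpy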

GPU inference needs a working CUDA + cuDNN runtime on the host. The ONNX Runtime CUDA compatibility matrix lists which versions go together.
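One way to pick the provider list used in the inference example is to prefer CUDA when the installed runtime reports it and keep CPU as a fallback. The sketch below is a minimal version of that idea. `pick_providers` is a hypothetical helper and `MODEL_PATH` is a placeholder. `onnxruntime.get_available_providers()` and `InferenceSession.get_providers()` are existing ONNX Runtime calls.

```python
def pick_providers(available, prefer_gpu=True):
    """Order ONNX Runtime execution providers, keeping CPU as a fallback.

    `available` is the list returned by onnxruntime.get_available_providers().
    """
    providers = []
    if prefer_gpu and "CUDAExecutionProvider" in available:
        providers.append("CUDAExecutionProvider")
    providers.append("CPUExecutionProvider")
    return providers

# Usage (requires onnxruntime or onnxruntime-gpu; MODEL_PATH is a placeholder):
#   import onnxruntime as ort
#   session = ort.InferenceSession(MODEL_PATH, providers=pick_providers(ort.get_available_providers()))
#   print(session.get_providers())  # providers actually loaded for this session
```

A CUDA provider can show up as available and still fail to load if CUDA or cuDNN versions don't match. Checking `session.get_providers()` after the session is created is a quick way to confirm that the GPU is really in use.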

Plain pip:

pip install -r requirements.txt

Run inference

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "s2t.direct.inn.openai-text-embedding-3-large.gemini-text-embedding-004.onnx",
    providers=["CPUExecutionProvider"],  # or ["CUDAExecutionProvider", "CPUExecutionProvider"]
)
input_name = session.get_inputs()[0].name

# Stand-in for real openai-text-embedding-3-large embeddings, shape (N, 3072).
# Random vectors are used here only so the snippet runs as-is.
embeddings = np.random.randn(8, 3072).astype(np.float32)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

converted = session.run(None, {input_name: embeddings})[0]
converted /= np.linalg.norm(converted, axis=1, keepdims=True)

# converted has shape (N, 768) in gemini-text-embedding-004 space.
print(converted.shape)

For batching, GPU execution and .npy / .jsonl file I/O, use the companion script univec_inference.py published alongside this model. The inference dependencies are pinned in this repo's requirements.txt.
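If you are not using the companion script, batched conversion of a large .npy file can be sketched as below. This is not univec_inference.py. The function `convert_npy_in_batches` and its `run_fn` argument are hypothetical. `run_fn` stands in for the ONNX call so the converter logic can be shown without a model file. `np.load(..., mmap_mode="r")` memory-maps the input so the full 3072-d array is never loaded at once.

```python
import numpy as np

def convert_npy_in_batches(in_path, out_path, run_fn, batch_size=1024):
    """Stream a (N, 3072) .npy file through `run_fn` and save (N, 768) output.

    `run_fn` maps a float32 (batch, 3072) array to (batch, 768); with ONNX
    Runtime it would be `lambda x: session.run(None, {input_name: x})[0]`.
    """
    src = np.load(in_path, mmap_mode="r")          # memory-mapped, read-only
    outputs = []
    for start in range(0, src.shape[0], batch_size):
        batch = np.asarray(src[start:start + batch_size], dtype=np.float32)
        batch = batch / np.linalg.norm(batch, axis=1, keepdims=True)  # unit inputs
        out = run_fn(batch)
        outputs.append(out / np.linalg.norm(out, axis=1, keepdims=True))
    result = np.concatenate(outputs, axis=0)       # outputs are 4x smaller than inputs
    np.save(out_path, result)
    return result
```

The converted vectors are kept in memory here. At 768 dimensions they take a quarter of the space of the inputs, but a very large corpus might still call for writing each batch to disk as it is produced.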

Reproduce or verify metrics

A self-contained evaluate.py is included in this repo. It runs the converter against a paired evaluation dataset and reports the same metrics as the table above (cosine, MRR, P@K, Kendall tau). Use it to verify the published numbers or to measure quality on a different corpus.

The expected dataset is JSONL, one record per line, each holding both a source-space and a target-space embedding of the same text:

{"embeddings": {"openai-text-embedding-3-large": [/* 3072-d vector */], "gemini-text-embedding-004": [/* 768-d vector */]}}
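If the paired vectors are already available as two row-aligned arrays, a file in this format could be written as shown below. The helper `write_eval_jsonl` is a hypothetical illustration. The record layout and model keys follow the example line above.

```python
import json
import numpy as np

def write_eval_jsonl(path, source_vecs, target_vecs,
                     source="openai-text-embedding-3-large",
                     target="gemini-text-embedding-004"):
    """Write one JSON record per text, pairing its source- and target-space vectors."""
    if len(source_vecs) != len(target_vecs):
        raise ValueError("source and target arrays must have the same number of rows")
    with open(path, "w", encoding="utf-8") as f:
        for s, t in zip(source_vecs, target_vecs):
            record = {"embeddings": {
                source: np.asarray(s, dtype=float).tolist(),   # 3072-d list
                target: np.asarray(t, dtype=float).tolist(),   # 768-d list
            }}
            f.write(json.dumps(record) + "\n")
```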

Then run:

python evaluate.py \
  --model s2t.direct.inn.openai-text-embedding-3-large.gemini-text-embedding-004.onnx \
  --dataset eval.jsonl \
  --source openai-text-embedding-3-large \
  --target gemini-text-embedding-004 \
  --output metrics.json

Useful flags:

Flag	Default	Purpose
`--max-samples N`	all	cap the number of pairs evaluated (handy for very large eval files)
`--device {auto,cuda,cpu}`	auto	pick the ONNX execution provider
`--batch-size N`	1024	inference batch size
`--num-anchors N`	1024	number of query anchors used for MRR / P@K
`--kendall-subset N`	2048	sample size for Kendall tau pairwise rank correlation
`--seed N`	0	deterministic sampling seed
`--output FILE.json`	none	write metrics to JSON for downstream comparison

The script prints a summary table and writes the same numbers to JSON if --output is set. Without scipy installed, Kendall tau is skipped and the other metrics are still reported.
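For illustration, Kendall tau over pairwise similarities might be computed roughly as follows. The exact sampling in evaluate.py is not described here. This sketch assumes that `--kendall-subset` samples items, not pairs, and compares every pair among them. `pairwise_kendall_tau` is a hypothetical name. `scipy.stats.kendalltau` is the real SciPy function, and the sketch returns None without SciPy, as the script does.

```python
import numpy as np

def pairwise_kendall_tau(converted, target, subset=2048, seed=0):
    """Kendall tau between pairwise cosine similarities in the two spaces.

    Assumes row-aligned, unit-normalized arrays. Returns None if scipy is missing.
    """
    try:
        from scipy.stats import kendalltau
    except ImportError:
        return None
    rng = np.random.default_rng(seed)
    n = converted.shape[0]
    idx = rng.choice(n, size=min(subset, n), replace=False)
    c, t = converted[idx], target[idx]
    iu = np.triu_indices(len(idx), k=1)            # each unordered pair once
    tau, _ = kendalltau((c @ c.T)[iu], (t @ t.T)[iu])
    return float(tau)
```

Subsampling matters because the number of pairs grows quadratically. 2048 items already give about 2.1 million pairs.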

Limitations

One-way mapping. The reverse direction (gemini-text-embedding-004 -> openai-text-embedding-3-large) requires the corresponding reverse model.
Quality on out-of-distribution data (specialised jargon, languages outside the training mix) can drift away from the eval numbers above. Spot-check on real data before migrating production traffic.
Inputs are assumed to be unit-normalized. Stored embeddings that aren't normalized should be normalized first.

Production use

This release is one of several public conversion pairs published under Apache 2.0. The full UniVec catalog covers around 100 source/target pairs and includes bridge conversions (routing through an intermediary model when no direct pair is trained). Managed inference, batch processing and additional pairs live at https://univec.ai.
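How the hosted bridge conversions are implemented is not documented here. Conceptually, a bridge chains two direct converters through an intermediary space. The sketch below composes arbitrary converter callables and re-normalizes after every hop, because each converter expects unit-normalized input. `bridge` is a hypothetical helper. Errors from each hop are likely to compound, so a bridged pair should be evaluated separately rather than assumed to match a direct one.

```python
import numpy as np

def _unit(x):
    """Scale each row to unit L2 norm."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def bridge(*steps):
    """Compose converters A->B, B->C, ... into a single A->C converter.

    Each step is a callable mapping a (batch, d_in) array to (batch, d_out).
    """
    def convert(x):
        out = _unit(np.asarray(x, dtype=np.float32))
        for step in steps:
            out = _unit(step(out))    # re-normalize before the next hop
        return out
    return convert
```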

License

Apache 2.0. The weights are free to self-host, redistribute and use commercially.

Citation

@misc{univec2026,
  author = {UniVec},
  title  = {UniVec: Embedding interoperability for retrieval tasks},
  year   = {2026},
  url    = {https://univec.ai}
}
