Read the disclaimer below before using this model.


harrier-oss-v1-0.6b -- ONNX for Teradata BYOM

This repository hosts an ONNX-converted version of the upstream model microsoft/harrier-oss-v1-0.6b, packaged for the Teradata Vantage mldb.ONNXEmbeddings BYOM function. It is not the original PyTorch model -- only the inference graph and tokenizer needed for in-database embedding generation.

What's different from upstream:

  • Format: ONNX (opset 14, IR version 8 -- BYOM 6+ compatible), produced from the upstream weights with architecture-aware post-processing baked in.
  • Precision: dynamic int8 quantization. See the variants table below for what is shipped for this model.
  • Pooling and post-processing: last_token pooling is applied inside the graph, so the sentence_embedding output tensor is already the pooled embedding. The model expects a query-time instruction prefix (see "Instruction prefix" below).
  • Verification: every variant's cosine fidelity vs. the upstream PyTorch reference is recorded on a fixed FLORES-200 sample. Numbers may not generalize to your data.
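
As an illustration of the last_token pooling rule named above, here is a minimal pure-Python sketch. It is not the code inside the converted graph; the function name and the toy data are hypothetical, and it assumes pooling means "take the hidden state at the last position the attention mask marks as a real token".

```python
from typing import List, Sequence


def last_token_pool(hidden_states: Sequence[Sequence[float]],
                    attention_mask: Sequence[int]) -> List[float]:
    """Return the hidden state at the last position whose mask value is 1.

    Scanning for the last attended index (rather than using sum(mask) - 1)
    keeps the sketch valid for both left- and right-padded inputs.
    """
    last = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(hidden_states[last])


# Toy sequence: 4 positions, 3-dimensional hidden states, last position is padding.
hidden = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [0.0, 0.0, 0.0],  # padding
]
mask = [1, 1, 1, 0]
pooled = last_token_pool(hidden, mask)  # hidden state of the third position
```

With this ONNX artifact you do not need to do any of this yourself; the sketch only shows what the pooled vector corresponds to.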

Model details

Upstream repo                   microsoft/harrier-oss-v1-0.6b
Architecture                    Qwen3Model (decoder)
Parameters                      596,049,920
Output dimensions               1024
Pooling                         last_token
Instruction prefix              yes
Max input tokens (advertised)   32768
Languages                       93 (af, am, ar, +90 more)
License                         MIT
ONNX opset                      14
ONNX IR version                 8 (BYOM 6+ compatible)

Full language list (93):
  • af
  • am
  • ar
  • as
  • az
  • be
  • bg
  • bn
  • br
  • bs
  • ca
  • cs
  • cy
  • da
  • de
  • el
  • en
  • eo
  • es
  • et
  • eu
  • fa
  • fi
  • fr
  • fy
  • ga
  • gd
  • gl
  • gu
  • ha
  • he
  • hi
  • hr
  • hu
  • hy
  • id
  • is
  • it
  • ja
  • jv
  • ka
  • kk
  • km
  • kn
  • ko
  • ku
  • ky
  • la
  • lo
  • lt
  • lv
  • mg
  • mk
  • ml
  • mn
  • mr
  • ms
  • my
  • ne
  • nl
  • no
  • om
  • or
  • pa
  • pl
  • ps
  • pt
  • ro
  • ru
  • sa
  • sd
  • si
  • sk
  • sl
  • so
  • sq
  • sr
  • su
  • sv
  • sw
  • ta
  • te
  • th
  • tl
  • tr
  • ug
  • uk
  • ur
  • uz
  • vi
  • xh
  • yi
  • zh

Instruction prefix

This model was trained to expect a short natural-language instruction prepended to each query at encode time. Document side stays unprefixed. The ONNX graph itself is prefix-agnostic -- the prefix is plain text that flows through the tokenizer. Downstream BYOM SQL is responsible for prepending it (typically with a CTE that concatenates the instruction with each input row).

The upstream model card configures these named prompts (snapshot at publish time -- see the upstream model card for the canonical list and any updates):

  • web_search_query -- web-search query retrieval
  • sts_query -- semantic textual similarity (STS)
  • bitext_query -- bitext mining / cross-lingual retrieval

You can also pass an ad-hoc instruction in the same Instruct: ... \nQuery: ... shape; for example, Instruct: Retrieve semantically similar text\nQuery: <your text>. The canonical and most up-to-date list lives in the upstream model card's config_sentence_transformers.json -- see microsoft/harrier-oss-v1-0.6b.
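
A minimal Python sketch of the prompt shape described above, for cases where you build query strings client-side before loading them into the input table. The helper name build_query is hypothetical; the only thing taken from this card is the Instruct: ...\nQuery: ... layout.

```python
def build_query(instruction: str, query: str) -> str:
    """Format a query in the 'Instruct: ...\\nQuery: ...' shape.

    Only queries get the prefix; documents are embedded as-is.
    """
    return f"Instruct: {instruction}\nQuery: {query}"


# Ad-hoc instruction taken from the example in this card.
instruction = "Retrieve semantically similar text"
queries = ["how do I reset my password", "shipping times to Canada"]
prefixed_queries = [build_query(instruction, q) for q in queries]

# Document side: no prefix is added.
documents = ["Passwords can be reset from the account settings page."]
```

The named prompts (web_search_query, sts_query, bitext_query) each map to an instruction string in the upstream config_sentence_transformers.json; substitute that string for the ad-hoc instruction if you want the upstream wording.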

Quantization variants

This repository ships the following variants. Quality numbers are measured against the upstream PyTorch reference on a fixed FLORES-200 sample. The Size column is the on-disk size of the ONNX weight file in megabytes (MB, 10^6 bytes).

Variant    Size (MB)   p50 cosine   R@1
ffn_skip   1391.8      0.997669     0.958

How to read the quality columns:

  • p50 cosine is the median cosine similarity between this variant's embeddings and the fp32 ONNX reference, computed over a fixed evaluation set. Higher means closer to the unquantized model; 1.0 is identical.
  • R@1 is top-1 retrieval consistency: if you use this variant as a search index, R@1 is the fraction of queries that get the same nearest neighbor as the fp32 reference would. Higher is better.
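
To make the two metrics concrete, here is a rough pure-Python sketch of how they could be computed on your own data. The card does not spell out the exact evaluation protocol, so the function names, the query/corpus split, and the toy vectors below are all assumptions for illustration.

```python
import math
from statistics import median


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def p50_cosine(variant, reference):
    """Median cosine between paired variant and reference embeddings."""
    return median(cosine(v, r) for v, r in zip(variant, reference))


def nearest(query, corpus):
    """Index of the corpus vector most similar to the query."""
    return max(range(len(corpus)), key=lambda i: cosine(query, corpus[i]))


def recall_at_1(var_queries, var_corpus, ref_queries, ref_corpus):
    """Fraction of queries whose nearest neighbor matches the reference's."""
    hits = sum(
        nearest(vq, var_corpus) == nearest(rq, ref_corpus)
        for vq, rq in zip(var_queries, ref_queries)
    )
    return hits / len(ref_queries)


# Toy 2-dimensional embeddings standing in for reference and quantized outputs.
ref_corpus = [[1.0, 0.0], [0.0, 1.0]]
var_corpus = [[0.9, 0.1], [0.1, 0.9]]
ref_queries = [[1.0, 0.1], [0.2, 1.0]]
var_queries = [[0.95, 0.15], [0.25, 0.9]]

fidelity = p50_cosine(var_corpus, ref_corpus)
consistency = recall_at_1(var_queries, var_corpus, ref_queries, ref_corpus)
```

Running the same two functions on a sample of your own texts is a reasonable way to check whether the published numbers carry over to your workload.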

Notes:

  • ffn_skip: dynamic int8 with the feed-forward (FFN) MatMul layers kept in fp32, while attention and projection MatMuls stay quantized. The FFN layers are where most of the quantization error in transformer blocks concentrates; leaving them in fp32 recovers most of the quality loss in exchange for a larger file. At 1391.8 MB the artifact is roughly 1.7x smaller than an fp32 export, which would be about 2384 MB assuming 4 bytes for each of the 596,049,920 parameters. It is larger than a per_channel int8 variant, which is not listed in the table above. Choose this variant when retrieval quality is the priority and the per_channel drift on your workload is unacceptable.
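
Because the Size column uses decimal megabytes (10^6 bytes), a quick way to sanity-check a download is to compute the same figure locally. This is a small sketch; the helper name size_mb is hypothetical, and the local path assumes the file was downloaded into the current directory as in the quickstart.

```python
import os


def size_mb(path: str) -> float:
    """On-disk size in decimal megabytes (1 MB = 10**6 bytes)."""
    return os.path.getsize(path) / 1_000_000


# Assumed local path of the downloaded variant; adjust to your setup.
onnx_path = "onnx/model-ffn_skip.onnx"
if os.path.exists(onnx_path):
    print(f"{onnx_path}: {size_mb(onnx_path):.1f} MB")
```

A value far from the one in the table usually points to an incomplete download rather than a different variant.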

Quickstart: using this model with Teradata BYOM

Requires Teradata Vantage with BYOM 6+ (mldb.ONNXEmbeddings).

import getpass
import teradataml as tdml
from huggingface_hub import hf_hub_download

repo_id   = "Teradata/harrier-oss-v1-0.6b"
model_id  = "harrier-oss-v1-0.6b"        # arbitrary, used as the BYOM model_id
onnx_file = "onnx/model-ffn_skip.onnx"

# 1. Download the ONNX + tokenizer for the chosen variant.
hf_hub_download(repo_id=repo_id, filename=onnx_file,       local_dir="./")
hf_hub_download(repo_id=repo_id, filename="tokenizer.json", local_dir="./")

# 2. Connect to Vantage.
tdml.create_context(
    host=input("host: "),
    username=input("user: "),
    password=getpass.getpass("password: "),
)

# 3. Load model + tokenizer into BYOM tables (one-time per model_id).
tdml.save_byom(model_id=model_id, model_file=onnx_file,
               table_name="embeddings_models")
tdml.save_byom(model_id=model_id, model_file="tokenizer.json",
               table_name="embeddings_tokenizers")

Then call mldb.ONNXEmbeddings against an input table whose txt column carries the strings to embed:

SELECT *
FROM mldb.ONNXEmbeddings(
    ON (SELECT id, txt FROM your_input_table) AS InputTable
    ON (SELECT model_id, model FROM embeddings_models
         WHERE model_id = 'harrier-oss-v1-0.6b') AS ModelTable DIMENSION
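    -- save_byom stores the uploaded file in a column named model, so alias it to tokenizer here.
    ON (SELECT model_id, model AS tokenizer FROM embeddings_tokenizers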
         WHERE model_id = 'harrier-oss-v1-0.6b') AS TokenizerTable DIMENSION
    USING
        Accumulate('id')
        ModelOutputTensor('sentence_embedding')
        OutputFormat('FLOAT32(1024)')
        OverwriteCachedModel('*')
) AS t
ORDER BY id;

The last_token pooling rule is applied inside the converted ONNX graph, so the output tensor named above already contains the pooled, post-processed embedding vector. Because this is an instruction-prefix model, prepend the recommended instruction text to each query txt before calling ONNXEmbeddings and leave document rows unprefixed; the prefix is plain text that passes through the tokenizer like any other input.
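
One possible way to do the prefixing in-database is to swap the InputTable subquery for one that concatenates the instruction onto txt. The Python sketch below only builds that subquery as a string; the helper names are hypothetical, the table and column names are the placeholders from the quickstart, and it assumes Teradata's hexadecimal character literal '0A'XC for the newline between the instruction and the query.

```python
def sql_literal(text: str) -> str:
    """Quote a Python string as a SQL literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def prefixed_input_subquery(instruction: str,
                            table: str = "your_input_table",
                            id_col: str = "id",
                            txt_col: str = "txt") -> str:
    """Build a SELECT that prepends 'Instruct: ...<newline>Query: ' to each row.

    '0A'XC is assumed to be a line-feed character literal in Teradata SQL.
    """
    prefix = sql_literal(f"Instruct: {instruction}")
    return (
        f"SELECT {id_col}, "
        f"{prefix} || '0A'XC || 'Query: ' || {txt_col} AS {txt_col} "
        f"FROM {table}"
    )


# Use the result in place of the InputTable subquery for query-side rows only.
query_side_sql = prefixed_input_subquery("Retrieve semantically similar text")
```

Keeping the output column named txt means the rest of the ONNXEmbeddings call stays unchanged. Run the unmodified subquery for the document side.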

Original model attribution

The original weights and training methodology belong to Microsoft. Please cite their work, not this repository, in academic contexts. The canonical upstream model card is at microsoft/harrier-oss-v1-0.6b; refer to it for benchmarks, training details, intended use, and citation information.

Reporting issues

For ONNX-conversion or BYOM-compatibility issues specific to this Teradata-converted artifact, please open a Discussion on this model's Hugging Face page. Questions about the underlying model quality, training, or intended use should go to the upstream maintainer's model card.


DISCLAIMER: The content herein ("Content") is provided "AS IS" and is not covered by any Teradata Operations, Inc. and its affiliates ("Teradata") agreements. Its listing here does not constitute certification or endorsement by Teradata.

To the extent any of the Content contains or is related to any artificial intelligence ("AI") or other language learning models ("Models") that interoperate with the products and services of Teradata, by accessing, bringing, deploying or using such Models, you acknowledge and agree that you are solely responsible for ensuring compliance with all applicable laws, regulations, and restrictions governing the use, deployment, and distribution of AI technologies. This includes, but is not limited to, AI Diffusion Rules, European Union AI Act, AI-related laws and regulations, privacy laws, export controls, and financial or sector-specific regulations.

While Teradata may provide support, guidance, or assistance in the deployment or implementation of Models to interoperate with Teradata's products and/or services, you remain fully responsible for ensuring that your Models, data, and applications comply with all relevant legal and regulatory obligations. Our assistance does not constitute legal or regulatory approval, and Teradata disclaims any liability arising from non-compliance with applicable laws.

You must determine the suitability of the Models for any purpose. Given the probabilistic nature of machine learning and modeling, the use of the Models may in some situations result in incorrect output that does not accurately reflect the action generated. You should evaluate the accuracy of any output as appropriate for your use case, including by using human review of the output.
