# HCAE-21M-v1.1-Instruct: Technical Specification
HCAE-21M-v1.1-Instruct is an instruction-tuned variant of the Hybrid Convolutional-Attention Encoder (HCAE). It is specifically engineered for asymmetric retrieval tasks and domain-intensive semantic analysis (e.g., scientific and medical corpora). By leveraging a symmetric 4+4 hybrid architecture, it maintains the efficiency required for edge deployment while achieving competitive performance on complex MTEB benchmarks.
## Technical Abstract
Transitioning from v1.0, the Instruct variant in v1.1 utilizes multi-stage fine-tuning on NLI and specialized domain datasets (SciFact, Med-Tech). Structural refinements include the adoption of LayerScale (gating) and SwiGLU activation functions, which collectively improve the model's ability to delineate complex semantic boundaries in zero-shot retrieval scenarios.
- Architecture: Symmetric 4+4 configuration (Depthwise Separable Convolutions / Multi-head Self-Attention).
- Optimization: Multi-stage fine-tuning using Instruct-NLI, SciFact, and specialized technical datasets.
- Parameters: 21.1M
- Dimensions: 384
- Instruction Support: Full support for `query:` and `passage:` instruction prefixes.
## Benchmark Results (MTEB v2)
| Task | Metric | Value |
|---|---|---|
| STSBenchmark | Spearman Correlation | 0.656 |
| SciFact | NDCG@10 | 0.413 |
| SciFact | Recall@10 | 0.523 |
## Usage

### Retrieval Tasks
For optimal performance in retrieval tasks, it is recommended to use the following prefixes:
- Query: `query: [Your Question]`
- Corpus: `passage: [Content Paragraph]`
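A small helper can keep the prefix convention consistent across a codebase (a hypothetical utility, not part of the model's API):

```python
def with_prefix(texts, kind):
    """Prepend the retrieval instruction prefix expected by the model.

    kind is "query" for search queries or "passage" for corpus documents.
    """
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]

print(with_prefix(["What is HCAE?"], "query"))  # ['query: What is HCAE?']
```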
### Implementation

```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "HeavensHackDev/HCAE-21M-v1.1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

queries = ["query: What are the primary applications of HCAE?"]
passages = ["passage: HCAE is effectively used in semantic retrieval and information extraction."]

inputs_q = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
inputs_p = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    query_embeddings = model(**inputs_q)
    passage_embeddings = model(**inputs_p)
```
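If the forward pass returns token-level hidden states rather than pooled sentence vectors, mean pooling over the attention mask followed by cosine scoring is a common way to rank passages. The sketch below assumes that setup; the custom code loaded via `trust_remote_code` may already pool internally, in which case only the cosine step is needed:

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over the real tokens.
    mask = attention_mask.unsqueeze(-1).float()
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

def cosine_scores(query_emb, passage_emb):
    # L2-normalize, then take dot products: rows = queries, cols = passages.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    return q @ p.T

# Tiny synthetic check (batch=2, seq=3, dim=4) standing in for model output.
hidden = torch.randn(2, 3, 4)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 4])
```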
### ONNX Inference

The model is also available in ONNX format for efficient edge deployment.

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx")

# Note: always include instruction prefixes when tokenizing your text, e.g.
# inputs = tokenizer(["query: your text"], ...)
inputs = {
    "input_ids": np.random.randint(0, 30522, (1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
outputs = session.run(None, inputs)
```
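Once embeddings come back from the ONNX session, scoring queries against passages needs only NumPy. A minimal sketch, assuming the session's first output is a batch of pooled 384-dimensional vectors:

```python
import numpy as np

def np_cosine_scores(q, p):
    # L2-normalize along the feature axis, then compute all pairwise dot
    # products: rows correspond to queries, columns to passages.
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    pn = p / np.linalg.norm(p, axis=-1, keepdims=True)
    return qn @ pn.T

# Stand-in arrays with the model's 384-dim embedding size.
q = np.random.randn(1, 384).astype(np.float32)
p = np.random.randn(4, 384).astype(np.float32)
scores = np_cosine_scores(q, p)
print(scores.shape)  # (1, 4)
```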
## License
This model is licensed under the Apache License 2.0.