HCAE-21M-v1.1-Instruct: Technical Specification

HCAE-21M-v1.1-Instruct is an instruction-tuned variant of the Hybrid Convolutional-Attention Encoder (HCAE). It is specifically engineered for asymmetric retrieval tasks and domain-intensive semantic analysis (e.g., scientific and medical corpora). By leveraging a symmetric 4+4 hybrid architecture, it maintains the efficiency required for edge deployment while achieving competitive performance on complex MTEB benchmarks.

Technical Abstract

Transitioning from v1.0, the Instruct variant in v1.1 utilizes multi-stage fine-tuning on NLI and specialized domain datasets (SciFact, Med-Tech). Structural refinements include the adoption of LayerScale (gating) and SwiGLU activation functions, which collectively improve the model's ability to delineate complex semantic boundaries in zero-shot retrieval scenarios.

  • Architecture: Symmetric 4+4 configuration (Depthwise Separable Convolutions / Multi-head Self-Attention).
  • Optimization: Multi-stage fine-tuning using Instruct-NLI, SciFact, and specialized technical datasets.
  • Parameters: 21.1M
  • Dimensions: 384
  • Instruction Support: Full support for query: and passage: instruction prefixes.
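The LayerScale gating and SwiGLU activation mentioned above can be sketched in PyTorch. This is an illustrative sketch, not the model's actual implementation; the feed-forward hidden size (1024) and the LayerScale init value are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: a SiLU-gated linear unit (sizes assumed)."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden)
        self.up = nn.Linear(dim, hidden)
        self.down = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate branch passes through SiLU, then multiplies the "up" branch.
        return self.down(F.silu(self.gate(x)) * self.up(x))

class LayerScaleResidual(nn.Module):
    """Residual connection whose branch is scaled by a learnable
    per-channel vector (LayerScale); init value is an assumption."""
    def __init__(self, dim: int, init: float = 1e-5):
        super().__init__()
        self.scale = nn.Parameter(init * torch.ones(dim))

    def forward(self, x: torch.Tensor, branch: torch.Tensor) -> torch.Tensor:
        return x + self.scale * branch

# (batch, tokens, dim) — dim matches the model's 384-dim embeddings.
x = torch.randn(2, 16, 384)
ffn = SwiGLU(384, 1024)
res = LayerScaleResidual(384)
y = res(x, ffn(x))
print(y.shape)  # torch.Size([2, 16, 384])
```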

Benchmark Results (MTEB v2)

Task           Metric                 Value
STSBenchmark   Spearman Correlation   0.656
SciFact        NDCG@10                0.413
SciFact        Recall@10              0.523

Usage

Retrieval Tasks

For optimal performance in retrieval tasks, it is recommended to use the following prefixes:

  • Query: query: [Your Question]
  • Corpus: passage: [Content Paragraph]
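The prefixes can be applied with small helper functions. These helpers (format_query, format_passage) are illustrative names, not part of any library:

```python
def format_query(text: str) -> str:
    """Prepend the instruction prefix the model expects for queries."""
    return f"query: {text}"

def format_passage(text: str) -> str:
    """Prepend the instruction prefix the model expects for corpus passages."""
    return f"passage: {text}"

print(format_query("What are the primary applications of HCAE?"))
# query: What are the primary applications of HCAE?
```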

Implementation

from transformers import AutoModel, AutoTokenizer
import torch

model_name = "HeavensHackDev/HCAE-21M-v1.1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

queries = ["query: What are the primary applications of HCAE?"]
passages = ["passage: HCAE is effectively used in semantic retrieval and information extraction."]

inputs_q = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
inputs_p = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # The forward pass returns token-level hidden states; mean-pool them
    # over the attention mask to obtain sentence embeddings. (If the custom
    # model class already returns pooled embeddings, skip the pooling.)
    out_q = model(**inputs_q).last_hidden_state
    out_p = model(**inputs_p).last_hidden_state
    mask_q = inputs_q["attention_mask"].unsqueeze(-1).float()
    mask_p = inputs_p["attention_mask"].unsqueeze(-1).float()
    query_embeddings = (out_q * mask_q).sum(1) / mask_q.sum(1)
    passage_embeddings = (out_p * mask_p).sum(1) / mask_p.sum(1)
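Once query and passage embeddings are in hand, retrieval reduces to similarity scoring. A minimal sketch using cosine similarity, with dummy 384-dimensional vectors standing in for the real pooled outputs:

```python
import torch
import torch.nn.functional as F

def cosine_scores(q: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between every query and every passage embedding."""
    q = F.normalize(q, p=2, dim=-1)
    p = F.normalize(p, p=2, dim=-1)
    return q @ p.T  # (num_queries, num_passages)

# Dummy embeddings standing in for the pooled model outputs.
query_emb = torch.randn(2, 384)
passage_emb = torch.randn(5, 384)
scores = cosine_scores(query_emb, passage_emb)
print(scores.shape)        # torch.Size([2, 5])
best = scores.argmax(dim=-1)  # index of the best-matching passage per query
```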

ONNX Inference

The model is also available in ONNX format for efficient edge deployment.

import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx")

# Note: Always include instruction prefixes in your text processing
# inputs = tokenizer(["query: your text"], ...)
inputs = {
    "input_ids": np.random.randint(0, 30522, (1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64)
}

outputs = session.run(None, inputs)
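If the exported graph emits token-level hidden states rather than pooled embeddings, they still need mean pooling on the NumPy side. A minimal sketch over dummy arrays (assuming outputs[0] has shape (batch, seq, 384); verify against the actual exported graph):

```python
import numpy as np

def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions."""
    mask = mask[..., None].astype(hidden.dtype)   # (batch, seq, 1)
    summed = (hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Dummy arrays standing in for outputs[0] and the attention mask.
hidden = np.random.randn(1, 128, 384).astype(np.float32)
mask = np.ones((1, 128), dtype=np.int64)
emb = mean_pool(hidden, mask)
print(emb.shape)  # (1, 384)
```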

License

This model is licensed under the Apache License 2.0.
