# HCAE-21M-v1.1-Instruct: Technical Specification
HCAE-21M-v1.1-Instruct is an instruction-tuned variant of the Hybrid Convolutional-Attention Encoder (HCAE). It is specifically engineered for asymmetric retrieval tasks and domain-intensive semantic analysis (e.g., scientific and medical corpora). By leveraging a symmetric 4+4 hybrid architecture, it maintains the efficiency required for edge deployment while achieving competitive performance on complex MTEB benchmarks.
## Technical Abstract
Transitioning from v1.0, the Instruct variant in v1.1 utilizes multi-stage fine-tuning on NLI and specialized domain datasets (SciFact, Med-Tech). Structural refinements include the adoption of LayerScale (gating) and SwiGLU activation functions, which collectively improve the model's ability to delineate complex semantic boundaries in zero-shot retrieval scenarios.
- Architecture: Symmetric 4+4 configuration (Depthwise Separable Convolutions / Multi-head Self-Attention).
- Optimization: Multi-stage fine-tuning using Instruct-NLI, SciFact, and specialized technical datasets.
- Parameters: 21.1M
- Dimensions: 384
- Instruction Support: Full support for `query:` and `passage:` instruction prefixes.
## Benchmark Results (MTEB v2)
| Task | Metric | Value |
|---|---|---|
| STSBenchmark | Spearman Correlation | 0.656 |
| SciFact | NDCG@10 | 0.413 |
| SciFact | Recall@10 | 0.523 |
## Usage

### Retrieval Tasks
For optimal performance in retrieval tasks, it is recommended to use the following prefixes:
- Query: `query: [Your Question]`
- Corpus: `passage: [Content Paragraph]`
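A small helper can keep the prefix convention consistent across a codebase (a hypothetical utility, not part of the model's API):

```python
def with_prefix(texts, kind):
    """Prepend the retrieval instruction prefix expected by the model.

    kind is "query" for search queries or "passage" for corpus documents.
    """
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]

print(with_prefix(["What is HCAE?"], "query"))  # ['query: What is HCAE?']
```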
### Implementation

```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "HeavensHackDev/HCAE-21M-v1.1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

queries = ["query: What are the primary applications of HCAE?"]
passages = ["passage: HCAE is effectively used in semantic retrieval and information extraction."]

inputs_q = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
inputs_p = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    query_embeddings = model(**inputs_q)
    passage_embeddings = model(**inputs_p)
```
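If the forward pass returns token-level hidden states rather than pooled sentence vectors, mean pooling over the attention mask followed by cosine scoring is a common way to rank passages. The sketch below assumes that setup; the custom code loaded via `trust_remote_code` may already pool internally, in which case only the cosine step is needed:

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Zero out padding positions, then average over the real tokens.
    mask = attention_mask.unsqueeze(-1).float()
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

def cosine_scores(query_emb, passage_emb):
    # L2-normalize, then take dot products: rows = queries, cols = passages.
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    return q @ p.T

# Tiny synthetic check (batch=2, seq=3, dim=4) standing in for model output.
hidden = torch.randn(2, 3, 4)
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])
emb = mean_pool(hidden, mask)
print(emb.shape)  # torch.Size([2, 4])
```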
### ONNX Inference

The model is also available in ONNX format for efficient edge deployment.

```python
import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("model.onnx")

# Note: always include instruction prefixes when tokenizing your text, e.g.
# inputs = tokenizer(["query: your text"], ...)
inputs = {
    "input_ids": np.random.randint(0, 30522, (1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
outputs = session.run(None, inputs)
```
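Once embeddings come back from the ONNX session, scoring queries against passages needs only NumPy. A minimal sketch, assuming the session's first output is a batch of pooled 384-dimensional vectors:

```python
import numpy as np

def np_cosine_scores(q, p):
    # L2-normalize along the feature axis, then compute all pairwise dot
    # products: rows correspond to queries, columns to passages.
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    pn = p / np.linalg.norm(p, axis=-1, keepdims=True)
    return qn @ pn.T

# Stand-in arrays with the model's 384-dim embedding size.
q = np.random.randn(1, 384).astype(np.float32)
p = np.random.randn(4, 384).astype(np.float32)
scores = np_cosine_scores(q, p)
print(scores.shape)  # (1, 4)
```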
## License
This model is licensed under the Apache License 2.0.