HCAE-21M-v1.1-Instruct: Technical Specification
HCAE-21M-v1.1-Instruct is an instruction-tuned variant of the Hybrid Convolutional-Attention Encoder (HCAE). It is specifically engineered for asymmetric retrieval tasks and domain-intensive semantic analysis (e.g., scientific and medical corpora). By leveraging a symmetric 4+4 hybrid architecture, it maintains the efficiency required for edge deployment while achieving competitive performance on complex MTEB benchmarks.
Technical Abstract
Relative to v1.0, the v1.1 Instruct variant uses multi-stage fine-tuning on NLI and specialized domain datasets (SciFact, Med-Tech). Structural refinements include LayerScale (gating) and SwiGLU activation functions, which together improve the model's ability to delineate semantic boundaries in zero-shot retrieval scenarios.
- Architecture: Symmetric 4+4 configuration (Depthwise Separable Convolutions / Multi-head Self-Attention).
- Optimization: Multi-stage fine-tuning using Instruct-NLI, SciFact, and specialized technical datasets.
- Parameters: 21.1M
- Dimensions: 384
- Instruction Support: Full support for `query:` and `passage:` instruction prefixes.
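To make the two structural terms above concrete, here is a minimal sketch of a SwiGLU feed-forward layer and a LayerScale residual as they are commonly defined in the literature. It is not taken from the HCAE source code: the function names, the weight layout, and the absence of bias terms are all assumptions made for illustration.

```python
import math

def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def swiglu_ffn(x, w_gate, w_up, w_down):
    """Generic SwiGLU feed-forward: w_down @ (silu(w_gate @ x) * (w_up @ x)).

    Hypothetical layout, no biases; HCAE's real layer may differ.
    """
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)

def layerscale_residual(x, branch_out, gamma):
    """LayerScale: add the branch output scaled per channel by a learned gamma."""
    return [xi + g * b for xi, g, b in zip(x, gamma, branch_out)]

# Toy example with identity weights on a 2-dimensional input.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 2.0]
branch = swiglu_ffn(x, identity, identity, identity)
y = layerscale_residual(x, branch, gamma=[0.1, 0.1])
```

With `gamma` set to zero the residual returns the input unchanged, which is why LayerScale is often described as a learnable gate on each branch.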
Benchmark Results (MTEB v2)
| Task | Metric | Value |
|---|---|---|
| STSBenchmark | Spearman Correlation | 0.656 |
| SciFact | NDCG@10 | 0.413 |
| SciFact | Recall@10 | 0.523 |
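For readers unfamiliar with the retrieval metrics in the table, the sketch below shows one common way to compute Recall@k and NDCG@k (linear gain, log2 discount) for a single query. It is an illustration only: the function names and the toy document IDs are hypothetical, and the exact MTEB evaluation code may differ in details such as tie handling and averaging over queries.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of all relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k with linear gain; `relevance` maps doc id -> graded relevance."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

# Toy query: two relevant documents, only one of them retrieved (at rank 2).
ranked = ["d3", "d1", "d7"]
relevance = {"d1": 1, "d2": 1}
recall = recall_at_k(ranked, set(relevance), k=10)
ndcg = ndcg_at_k(ranked, relevance, k=10)
```

Benchmark scores such as those in the table are typically the mean of these per-query values over the whole test set.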
Usage
Retrieval Tasks
For best results in retrieval tasks, prepend the following prefixes to your inputs:
- Query: `query: [Your Question]`
- Corpus: `passage: [Content Paragraph]`
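A small helper can apply these prefixes consistently before tokenization. The sketch below is a convenience wrapper only; `add_prefix` and the two constants are hypothetical names, not part of the model's API.

```python
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "

def add_prefix(texts, prefix):
    """Prepend `prefix` to each text unless it is already present."""
    prefixed = []
    for text in texts:
        text = text.strip()
        prefixed.append(text if text.startswith(prefix) else prefix + text)
    return prefixed

queries = add_prefix(["What are the primary applications of HCAE?"], QUERY_PREFIX)
passages = add_prefix(
    ["HCAE is effectively used in semantic retrieval and information extraction."],
    PASSAGE_PREFIX,
)
```

Because the helper skips texts that already carry the prefix, it is safe to call more than once on the same list.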
Implementation
from transformers import AutoModel, AutoTokenizer
import torch
model_name = "HeavensHackDev/HCAE-21M-v1.1-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
queries = ["query: What are the primary applications of HCAE?"]
passages = ["passage: HCAE is effectively used in semantic retrieval and information extraction."]
inputs_q = tokenizer(queries, padding=True, truncation=True, return_tensors="pt")
inputs_p = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    query_embeddings = model(**inputs_q)
    passage_embeddings = model(**inputs_p)
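Once each query and passage is represented by a single vector, retrieval reduces to ranking passages by cosine similarity. The sketch below assumes you already have one plain list of floats per text; how to obtain that vector from the model output (for example, whether pooling is needed) is not specified in this card, so that step is left out. The function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(passage_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 2-dimensional vectors standing in for real 384-dimensional embeddings.
ranking = rank_passages([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
```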
ONNX Inference
The model is also available in ONNX format for efficient edge deployment.
import onnxruntime as ort
import numpy as np
session = ort.InferenceSession("model.onnx")
# Note: Always include instruction prefixes in your text processing
# inputs = tokenizer(["query: your text"], ...)
# Placeholder input: random token IDs stand in for real tokenizer output
inputs = {
    "input_ids": np.random.randint(0, 30522, (1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
}
outputs = session.run(None, inputs)
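If the ONNX graph returns token-level hidden states rather than a pooled sentence vector, a pooling step is needed before comparing embeddings. The sketch below shows masked mean pooling followed by L2 normalization in NumPy. Both the output layout `(batch, seq_len, dim)` and the choice of mean pooling are assumptions, since this card does not document the ONNX output or the model's pooling strategy; inspect `session.get_outputs()` before relying on it.

```python
import numpy as np

def masked_mean_pool(token_embeddings, attention_mask):
    """Average token vectors over real (non-padding) positions.

    token_embeddings: (batch, seq_len, dim); attention_mask: (batch, seq_len).
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def l2_normalize(vectors):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)

# Toy batch: two real tokens and one padding token that must be ignored.
tokens = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = np.array([[1, 1, 0]], dtype=np.int64)
pooled = masked_mean_pool(tokens, mask)
unit = l2_normalize(pooled)
```

Under the stated assumption, the same functions could be applied as `l2_normalize(masked_mean_pool(outputs[0], inputs["attention_mask"]))`.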
License
This model is licensed under the Apache License 2.0.