HCAE-21M-v1.1-Base: Technical Specification

HCAE-21M-v1.1-Base is a specialized text embedding model utilizing a Hybrid Convolutional-Attention Encoder architecture. This iteration optimizes the trade-off between local contextual feature extraction and global dependency modeling through a symmetric block configuration. By integrating Depthwise Separable Convolutions with Multi-head Self-Attention, HCAE achieves high representational fidelity at a compact scale of 21 million parameters.

Technical Abstract

The HCAE series is engineered to address the parameter inefficiency of standard Transformers at small scales. Version 1.1-Base provides the foundation for general-purpose semantic similarity, leveraging refinements in normalization and non-linear mapping (LayerScale and SwiGLU) to improve convergence and downstream task performance.

  • Architecture: Symmetric 4+4 configuration (4 Depthwise Separable Convolution layers / 4 Multi-head Self-Attention layers).
  • Optimization: Integration of LayerScale for training stability and SwiGLU activation for improved representational mapping.
  • Parameters: 21.1M
  • Dimensions: 384
  • Max Sequence Length: 512 tokens
  • Input Format: Standard text input
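To illustrate why depthwise separable convolutions keep the parameter count low, the sketch below compares parameter counts for a standard 1D convolution against its depthwise separable factorization at the model's 384-dimensional width. The kernel size of 3 is an illustrative assumption, not a documented value, and biases are omitted:

```python
def conv1d_params(c_in, c_out, k):
    # Standard 1D convolution: one k-wide filter per (input, output) channel pair.
    return c_in * c_out * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise: one k-wide filter per input channel.
    # Pointwise: a 1x1 convolution that mixes channels.
    return c_in * k + c_in * c_out

d, k = 384, 3  # hidden width from the spec; kernel size is an assumption
standard = conv1d_params(d, d, k)
separable = depthwise_separable_params(d, d, k)
print(standard, separable, round(standard / separable, 2))  # 442368 148608 2.98
```

At this width the factorization uses roughly a third of the parameters of a standard convolution, which is where much of the 21M-parameter budget is saved.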

Benchmark Results (MTEB v2)

Evaluation results on the Massive Text Embedding Benchmark:

| Task         | Metric               | Value |
|--------------|----------------------|-------|
| STSBenchmark | Spearman Correlation | 0.644 |
| SciFact      | NDCG@10              | 0.383 |
| SciFact      | Recall@10            | 0.485 |
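The STS metric above measures how well the model's similarity scores rank sentence pairs in the same order as human judgments. A minimal sketch of the computation with toy scores (illustrative values only, not benchmark data):

```python
from scipy.stats import spearmanr

# Toy example: model cosine similarities vs. human similarity ratings.
model_scores = [0.91, 0.35, 0.70, 0.12]
human_scores = [4.8, 1.5, 3.9, 0.7]

# Spearman correlation compares rankings, so it is insensitive to the
# different scales of the two score lists.
rho, _ = spearmanr(model_scores, human_scores)
print(round(rho, 3))  # perfectly rank-aligned toy data gives 1.0
```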

Usage

Loading via Transformers

from transformers import AutoModel, AutoTokenizer
import torch

model_name = "HeavensHackDev/HCAE-21M-v1.1-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)

sentences = ["HCAE-Base provides robust text embeddings.", "The model uses a hybrid architecture."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The model's custom code defines the exact output format; if it returns
# per-token hidden states, mean pooling over the attention mask yields
# fixed-size sentence embeddings.
token_embeddings = outputs.last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
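Sentence embeddings are typically compared with cosine similarity. A self-contained sketch of the comparison step in plain NumPy, using toy vectors in place of real model output:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the L2-normalized vectors.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

u = np.array([0.1, 0.3, -0.2])
v = np.array([0.2, 0.6, -0.4])  # v = 2u, so the vectors are parallel
print(cosine_similarity(u, v))  # parallel vectors -> 1.0
```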

ONNX Inference

The model is also available in ONNX format for efficient edge deployment and cross-platform compatibility.

import onnxruntime as ort
import numpy as np

# Load the session (ensure model.onnx and model.onnx.data are in the same directory)
session = ort.InferenceSession("model.onnx")

# Inputs should be numpy arrays (int64)
inputs = {
    "input_ids": np.random.randint(0, 30522, (1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64)
}

outputs = session.run(None, inputs)
embeddings = outputs[0]
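If the ONNX graph emits per-token hidden states rather than pooled vectors (an assumption; inspect `session.get_outputs()` to confirm), a masked mean-pooling step in NumPy reduces them to fixed-size sentence embeddings:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    # token_embeddings: (batch, seq, dim); attention_mask: (batch, seq).
    # Padding positions are zeroed out, then the sum is divided by the
    # number of real tokens.
    mask = attention_mask[:, :, None].astype(np.float32)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts

# Toy tensors standing in for ONNX output: a (1, 4, 3) hidden state
# with the last two positions masked as padding.
hidden = np.arange(12, dtype=np.float32).reshape(1, 4, 3)
mask = np.array([[1, 1, 0, 0]], dtype=np.int64)
pooled = mean_pool(hidden, mask)
print(pooled)  # [[1.5 2.5 3.5]] -- the mean of the two unmasked rows
```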

Architecture Technical Details

The HCAE architecture utilizes 1D Depthwise Separable Convolutions to capture local context efficiently, followed by Self-Attention blocks for global dependency modeling. The model incorporates LayerScale and SwiGLU activation functions for improved training stability and representational capacity.
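The two refinements named above can be sketched compactly. SwiGLU gates one linear projection with the SiLU (swish) of another, and LayerScale applies a learnable per-channel scale to each residual branch. The NumPy sketch below is illustrative only; projection sizes and the scale value are assumptions, not the model's actual configuration:

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu(x, W, V):
    # SwiGLU(x) = SiLU(xW) * (xV): a gated feed-forward unit.
    return silu(x @ W) * (x @ V)

def layerscale_residual(x, fx, gamma):
    # LayerScale: scale the residual branch output by a learnable gamma
    # (initialized small) before adding it back to the stream.
    return x + gamma * fx

rng = np.random.default_rng(0)
d, d_ff = 8, 16  # illustrative widths, not the model's real dimensions
x = rng.standard_normal((2, d))
W, V = rng.standard_normal((d, d_ff)), rng.standard_normal((d, d_ff))
print(swiglu(x, W, V).shape)  # (2, 16)
```

Initializing gamma near zero makes each block behave almost like an identity mapping early in training, which is the stability benefit the specification attributes to LayerScale.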

License

This model is licensed under the Apache License 2.0.
