---
license: apache-2.0
library_name: pytorch
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - mteb
  - hcae
---

# HCAE-21M (Hybrid Convolutional-Attention Encoder)

HCAE-21M is a mid-scale (21 million parameter) text embedding model that combines Depthwise Separable Convolutions with Self-Attention layers. It delivers solid performance on Semantic Textual Similarity and Retrieval tasks while remaining memory-efficient.

## Architecture Description

- **Size:** ~21M parameters (`d_model=384`)
- **Lower layers:** 5 layers of Depthwise Separable Conv1d + FFN
- **Upper layers:** 3 layers of Multi-head Self-Attention
- **Pooling strategy:** global mean pooling over token representations
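The layout above can be sketched in PyTorch. This is a minimal illustration of the hybrid stack (depthwise-separable conv blocks feeding attention blocks, then mean pooling), not the released weights or exact code: kernel size, head count, FFN width, and the omission of token embedding/positional encoding are all assumptions.

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Lower-layer block: depthwise Conv1d + pointwise projection + FFN.
    Kernel size 5 and 4x FFN expansion are assumed, not from the card."""

    def __init__(self, d_model: int = 384, kernel_size: int = 5):
        super().__init__()
        # groups=d_model makes the convolution depthwise (one filter per channel)
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, 1)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); Conv1d wants (batch, channels, seq)
        h = self.pointwise(self.depthwise(x.transpose(1, 2))).transpose(1, 2)
        x = self.norm1(x + h)
        return self.norm2(x + self.ffn(x))


class AttnBlock(nn.Module):
    """Upper-layer block: multi-head self-attention with residual + norm.
    num_heads=6 is an assumption (must divide d_model=384)."""

    def __init__(self, d_model: int = 384, num_heads: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, _ = self.attn(x, x, x, need_weights=False)
        return self.norm(x + a)


class HCAESketch(nn.Module):
    """5 conv blocks -> 3 attention blocks -> global mean pooling.
    Takes pre-embedded tokens; tokenizer/embedding layers are omitted."""

    def __init__(self, d_model: int = 384):
        super().__init__()
        self.lower = nn.ModuleList(ConvBlock(d_model) for _ in range(5))
        self.upper = nn.ModuleList(AttnBlock(d_model) for _ in range(3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.lower:
            x = blk(x)
        for blk in self.upper:
            x = blk(x)
        return x.mean(dim=1)  # global mean pooling -> (batch, d_model)
```

A batch of shape `(batch, seq_len, 384)` comes out as a single `(batch, 384)` sentence embedding per input.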

## Benchmark Comparison (MTEB)

The table below compares the two model revisions:

| Model Revision | STSBenchmark (Spearman) | SciFact (Recall@10) | Description |
|---|---|---|---|
| HCAE-21M-Base | 0.507 | 0.324 | Baseline configuration trained extensively on MS MARCO. |
| HCAE-21M-Instruct | 0.591 | 0.393 | Multi-stage instruction tuning on ArXiv, STS-B, and SQuAD. |

## Usage (Instruction Format)

For best retrieval performance, prepend the instruction prefix to the query text:

`Instruction: Retrieve the exact document that answers the following question. Query: [Your Query]`
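A minimal sketch of applying this template before encoding. The prompt string is taken from the card above; the model id `HeavensHackDev/HCAE-21M-Instruct` and the `sentence-transformers` loading path (shown commented out) are assumptions, not confirmed by the card.

```python
# Retrieval instruction prefix, verbatim from the model card.
INSTRUCTION = ("Instruction: Retrieve the exact document "
               "that answers the following question.")


def format_query(query: str) -> str:
    """Prepend the retrieval instruction to a raw query string."""
    return f"{INSTRUCTION} Query: {query}"


# Assumed loading path via sentence-transformers (model id is a guess):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("HeavensHackDev/HCAE-21M-Instruct")
# query_emb = model.encode(format_query("What causes aurora borealis?"))
# doc_embs = model.encode(["Auroras are caused by solar wind...", ...])

print(format_query("What causes aurora borealis?"))
```

Documents are typically encoded without the prefix; only queries get the instruction.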