---
license: apache-2.0
library_name: pytorch
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- mteb
- hcae
---

# HCAE-21M (Hybrid Convolutional-Attention Encoder)
**HCAE-21M** is a mid-scale (~21 million parameter) text embedding model that combines depthwise separable convolutions with self-attention layers. It achieves strong performance on semantic textual similarity and retrieval tasks while remaining highly memory-efficient.

<img src="https://cdn-uploads.huggingface.co/production/uploads/680c9127408ea47e6c1dd6e8/0KKsVpqg2Id01nxh8zRjO.png" width="400">

## Architecture Description
- **Size:** ~21M parameters (d_model=384)
- **Lower layers:** 5 layers of depthwise separable Conv1d + FFN.
- **Upper layers:** 3 layers of multi-head self-attention.
- **Pooling strategy:** global mean pooling.
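
The layer layout above can be sketched in PyTorch as follows. Layer counts and `d_model` come from this card; the FFN width, kernel size, head count, and vocabulary size are assumptions for illustration, not the released configuration:

```python
# Minimal sketch of the HCAE layout: 5 depthwise-separable conv blocks,
# then 3 self-attention layers, then global mean pooling.
# FFN_DIM, KERNEL, N_HEADS, and vocab_size are assumed values.
import torch
import torch.nn as nn

D_MODEL, N_HEADS, FFN_DIM, KERNEL = 384, 6, 1536, 7


class ConvBlock(nn.Module):
    """Depthwise separable Conv1d followed by a feed-forward sublayer."""

    def __init__(self):
        super().__init__()
        self.depthwise = nn.Conv1d(D_MODEL, D_MODEL, KERNEL,
                                   padding=KERNEL // 2, groups=D_MODEL)
        self.pointwise = nn.Conv1d(D_MODEL, D_MODEL, 1)
        self.ffn = nn.Sequential(nn.Linear(D_MODEL, FFN_DIM), nn.GELU(),
                                 nn.Linear(FFN_DIM, D_MODEL))
        self.norm1 = nn.LayerNorm(D_MODEL)
        self.norm2 = nn.LayerNorm(D_MODEL)

    def forward(self, x):  # x: (batch, seq, d_model)
        # Conv1d expects (batch, channels, seq), so transpose around it.
        y = self.pointwise(self.depthwise(x.transpose(1, 2))).transpose(1, 2)
        x = self.norm1(x + y)                  # residual + norm
        return self.norm2(x + self.ffn(x))


class HCAE(nn.Module):
    def __init__(self, vocab_size=30522):      # vocab size assumed
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_MODEL)
        self.conv_layers = nn.ModuleList(ConvBlock() for _ in range(5))
        attn_layer = nn.TransformerEncoderLayer(D_MODEL, N_HEADS, FFN_DIM,
                                                batch_first=True)
        self.attn_layers = nn.TransformerEncoder(attn_layer, num_layers=3)

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for block in self.conv_layers:         # local features first
            x = block(x)
        x = self.attn_layers(x)                # global mixing on top
        return x.mean(dim=1)                   # global mean pooling


emb = HCAE()(torch.randint(0, 30522, (2, 16)))
print(emb.shape)  # torch.Size([2, 384])
```

Running convolutions below attention is the usual motivation for this hybrid: cheap local feature extraction before the quadratic-cost attention layers.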

## Benchmark Comparison (MTEB)

The table below compares the two model revisions:

| Model Revision | STSBenchmark (Spearman) | SciFact (Recall@10) | Description |
|---|---|---|---|
| **HCAE-21M-Base** | `0.507` | `0.324` | Baseline configuration trained extensively on the MS MARCO dataset. |
| **HCAE-21M-Instruct** | `0.591` | `0.393` | Multi-stage tuning on ArXiv, STS-B, and SQuAD instruction data. |


<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/680c9127408ea47e6c1dd6e8/VuJ6ayS--Ot8i-715fT3a.png" width="800" style="border-radius: 10px; box-shadow: 0 4px 20px rgba(0,0,0,0.3);">
</p>

## Utilization Guidelines (Instruction Format)
For optimal retrieval performance, prepend the instruction prefix to the query text:

`Instruction: Retrieve the exact document that answers the following question. Query: [Your Query]`
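
A small helper for applying the prefix, with a hedged usage sketch. The repo id `your-org/HCAE-21M` is a placeholder, not the confirmed checkpoint name:

```python
# Apply the instruction prefix from this card to retrieval queries.
INSTRUCTION = ("Instruction: Retrieve the exact document that answers "
               "the following question. Query: ")


def format_query(query: str) -> str:
    """Prepend the retrieval instruction expected by HCAE-21M-Instruct."""
    return INSTRUCTION + query


print(format_query("What is a depthwise separable convolution?"))

# With sentence-transformers installed, embedding then looks like this
# (repo id is a placeholder; documents are typically encoded without
# the instruction prefix):
#
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("your-org/HCAE-21M")
#   q_emb = model.encode(format_query("your question"))
#   d_emb = model.encode(corpus_documents)
```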