Fix: relative imports for Hub loading, corrected model cards with accurate numbers and usage examples
- README.md +89 -146
- __init__.py +0 -0
- embeddings.py +1 -1
- ogma_model.py +10 -10
- pooling.py +1 -1
- variants/transformer.py +2 -2
README.md
CHANGED
|
@@ -744,183 +744,126 @@ model-index:
|
|
| 744 |
|
| 745 |
# ogma-micro
|
| 746 |
|
| 747 |
-
**2.
|
| 748 |
|
| 749 |
-
2-layer transformer, 128 hidden dim,
|
| 750 |
|
| 751 |
## Highlights
|
| 752 |
|
| 753 |
-
- **
|
| 754 |
-
- **
|
| 755 |
-
- **
|
| 756 |
-
- **
|
| 757 |
-
- **
|
| 758 |
-
- **Pure PyTorch** — no external transformer library dependencies
|
| 759 |
|
| 760 |
-
##
|
| 761 |
-
|
| 762 |
-
| Component | Details |
|
| 763 |
-
|-----------|---------|
|
| 764 |
-
| Parameters | 2.3M |
|
| 765 |
-
| Layers | 2 |
|
| 766 |
-
| Hidden dim (d_model) | 128 |
|
| 767 |
-
| Embedding dim (d_embed) | 64 |
|
| 768 |
-
| Output dim (d_output) | 128 |
|
| 769 |
-
| Attention heads | 2 |
|
| 770 |
-
| Max sequence length | 1024 |
|
| 771 |
-
| Matryoshka dims | [32, 64, 128] |
|
| 772 |
-
| Pooling | Mean (mask-aware) |
|
| 773 |
-
| Position encoding | RoPE |
|
| 774 |
-
| FFN | SwiGLU |
|
| 775 |
-
| Normalization | Pre-LayerNorm |
|
| 776 |
-
| Tokenizer | SentencePiece Unigram (30K vocab) |
|
| 777 |
-
| Training | Knowledge distillation from teacher model |
|
| 778 |
-
|
| 779 |
-
## MTEB Results
|
| 780 |
-
|
| 781 |
-
### Category-Level Scores
|
| 782 |
-
|
| 783 |
-
| Category | ogma-micro | Potion-32M | Potion-8M | vs Potion-32M |
|
| 784 |
-
|----------|------------|------------|-----------|---------------|
|
| 785 |
-
| Classification | **59.49** | 66.01 | 64.46 | -6.52 |
|
| 786 |
-
| Clustering | **36.88** | 39.24 | 36.88 | -2.36 |
|
| 787 |
-
| PairClassification | **78.62** | 78.17 | 76.62 | +0.45 |
|
| 788 |
-
| Reranking | **49.74** | 50.92 | 49.73 | -1.18 |
|
| 789 |
-
| Retrieval | **33.09** | 32.21 | 30.43 | +0.88 |
|
| 790 |
-
| STS | **75.63** | 73.86 | 72.93 | +1.77 |
|
| 791 |
-
| Summarization | **31.77** | 29.77 | 29.26 | +2.00 |
|
| 792 |
-
| **Overall** | **49.77** | 51.22 | 49.58 | **-1.45** |
|
| 793 |
-
|
| 794 |
-
> **Potion scores are locally reproduced** using the same evaluation pipeline and hardware for fair head-to-head comparison. These are not self-reported numbers from the Potion model card.
|
| 795 |
-
|
| 796 |
-
## Usage
|
| 797 |
-
|
| 798 |
-
### Quick Start
|
| 799 |
|
| 800 |
```python
|
| 801 |
import torch
|
| 802 |
-
|
| 803 |
-
|
| 804 |
|
| 805 |
-
# Load model
|
| 806 |
from ogma_model import OgmaModel
|
| 807 |
-
from config import OgmaConfig
|
| 808 |
from tokenizer import OgmaTokenizer
|
| 809 |
|
| 810 |
-
# Load
|
| 811 |
-
|
| 812 |
model.eval()
|
| 813 |
|
| 814 |
-
# Load tokenizer
|
| 815 |
-
|
| 816 |
-
tokenizer = OgmaTokenizer("path/to/tokenizer.model")
|
| 817 |
|
| 818 |
# Encode text
|
| 819 |
-
|
| 820 |
-
|
| 821 |
-
|
| 822 |
-
|
| 823 |
-
attention_mask = torch.tensor(encoded["attention_mask"])
|
| 824 |
-
|
| 825 |
-
# Use task tokens for asymmetric encoding
|
| 826 |
-
from config import TaskToken
|
| 827 |
|
| 828 |
with torch.no_grad():
|
| 829 |
-
|
| 830 |
-
embeddings = model.encode(token_ids, attention_mask, task=TaskToken.SYM)
|
| 831 |
-
|
| 832 |
-
# For retrieval — encode queries and documents separately
|
| 833 |
-
query_embs = model.encode(token_ids[:1], attention_mask[:1], task=TaskToken.QRY)
|
| 834 |
-
doc_embs = model.encode(token_ids[1:], attention_mask[1:], task=TaskToken.DOC)
|
| 835 |
|
| 836 |
-
|
| 837 |
```
|
| 838 |
|
| 839 |
-
##
|
| 840 |
|
| 841 |
```python
|
| 842 |
-
|
| 843 |
-
|
| 844 |
-
|
| 845 |
-
# Reduce to any Matryoshka dimension: [32, 64, 128]
|
| 846 |
-
dim = 64
|
| 847 |
-
reduced_embs = torch.nn.functional.normalize(full_embs[:, :dim], p=2, dim=-1)
|
| 848 |
-
# These reduced embeddings are trained to be effective at lower dims
|
| 849 |
-
```
|
| 850 |
|
| 851 |
-
|
| 852 |
|
| 853 |
-
|
| 854 |
-
|
| 855 |
-
|
| 856 |
-
|
| 857 |
-
|
| 858 |
-
from config import OgmaConfig
|
| 859 |
-
|
| 860 |
-
# Load config
|
| 861 |
-
with open("path/to/ogma-micro/config.json") as f:
|
| 862 |
-
import json
|
| 863 |
-
config_dict = json.load(f)
|
| 864 |
-
|
| 865 |
-
config = OgmaConfig.from_dict(config_dict)
|
| 866 |
-
model = OgmaModel(config)
|
| 867 |
|
| 868 |
-
|
| 869 |
-
|
| 870 |
-
model.load_state_dict(state_dict)
|
| 871 |
-
model.eval()
|
| 872 |
```
|
| 873 |
|
| 874 |
-
##
|
| 875 |
-
|
| 876 |
-
Ogma uses task-specific prefix tokens for asymmetric encoding:
|
| 877 |
-
|
| 878 |
-
| Token | ID | Use Case |
|
| 879 |
-
|-------|-----|----------|
|
| 880 |
-
| `[QRY]` | 4 | Query encoding for retrieval |
|
| 881 |
-
| `[DOC]` | 5 | Document/passage encoding for retrieval |
|
| 882 |
-
| `[SYM]` | 6 | Symmetric tasks (STS, classification, clustering) |
|
| 883 |
-
|
| 884 |
-
For retrieval tasks, encode queries with `[QRY]` and documents with `[DOC]`. For all other tasks, use `[SYM]`.
|
| 885 |
-
|
| 886 |
-
## Training
|
| 887 |
-
|
| 888 |
-
Ogma is trained via **knowledge distillation** from a larger teacher embedding model. The training pipeline:
|
| 889 |
|
| 890 |
-
|
| 891 |
-
|
| 892 |
-
|
| 893 |
-
|
| 894 |
-
|
| 895 |
-
## Files
|
| 896 |
-
|
| 897 |
-
| File | Description |
|
| 898 |
-
|------|-------------|
|
| 899 |
-
| `model.safetensors` | Model weights (safetensors format) |
|
| 900 |
-
| `model.pt` | Model weights (PyTorch format) |
|
| 901 |
-
| `config.json` | Model configuration |
|
| 902 |
-
| `config.yaml` | Original training config |
|
| 903 |
-
| `tokenizer.json` | HuggingFace tokenizer |
|
| 904 |
-
| `tokenizer_config.json` | Tokenizer configuration |
|
| 905 |
-
| `token_embeds_128d.npy` | Pre-computed token embeddings (30K × 128, float16) |
|
| 906 |
-
| `ogma_model.py` | OgmaModel class |
|
| 907 |
-
| `config.py` | OgmaConfig dataclass |
|
| 908 |
-
| `embeddings.py` | Token embedding + RoPE |
|
| 909 |
-
| `pooling.py` | Pooling strategies |
|
| 910 |
-
| `variants/transformer.py` | Transformer encoder variant |
|
| 911 |
-
| `tokenizer.py` | OgmaTokenizer wrapper |
|
| 912 |
-
| `results/` | MTEB result JSONs |
|
| 913 |
|
| 914 |
-
##
|
| 915 |
|
| 916 |
-
|
| 917 |
-
|
| 918 |
-
|
| 919 |
-
|
| 920 |
-
|
| 921 |
-
|
| 922 |
-
|
| 923 |
-
|
| 924 |
|
| 925 |
## License
|
| 926 |
|
|
|
|
| 744 |
|
| 745 |
# ogma-micro
|
| 746 |
|
| 747 |
+
**2.32M parameter text embedding model** by [Axiotic AI](https://axiotic.ai), achieving **49.77 average** on MTEB English (54/54 tasks).
|
| 748 |
|
| 749 |
+
2-layer transformer, 128 hidden dim, mean pooling. The smallest model in the Ogma family, intended for extreme edge deployment.
|
| 750 |
|
| 751 |
## Highlights
|
| 752 |
|
| 753 |
+
- **49.77 MTEB average** — comparable to Potion-8M (49.58) at 3.4x fewer parameters
|
| 754 |
+
- **Matryoshka embeddings** — dimensions [32, 64, 128] for flexible storage/compute tradeoffs
|
| 755 |
+
- **Asymmetric encoding** — dedicated `[QRY]`, `[DOC]`, `[SYM]` task tokens
|
| 756 |
+
- **1024 token context** — handles longer passages than typical small models
|
| 757 |
+
- **HuggingFace Hub** — load directly, no local package installation needed
|
| 758 |
|
| 759 |
+
## Quick Start
|
| 760 |
|
| 761 |
```python
|
| 762 |
import torch
|
| 763 |
+
from huggingface_hub import snapshot_download
|
| 764 |
+
import sys, yaml
|
| 765 |
+
|
| 766 |
+
# Download model from HuggingFace
|
| 767 |
+
model_path = snapshot_download("axiotic/ogma-micro")
|
| 768 |
+
sys.path.insert(0, model_path)
|
| 769 |
|
| 770 |
from ogma_model import OgmaModel
|
| 771 |
+
from config import OgmaConfig, TaskToken
|
| 772 |
from tokenizer import OgmaTokenizer
|
| 773 |
|
| 774 |
+
# Load model
|
| 775 |
+
with open(f"{model_path}/config.yaml") as f:
|
| 776 |
+
cfg = yaml.safe_load(f)
|
| 777 |
+
config = OgmaConfig.from_dict(cfg)
|
| 778 |
+
model = OgmaModel(config)
|
| 779 |
+
state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
|
| 780 |
+
model.load_state_dict(state)
|
| 781 |
model.eval()
|
| 782 |
|
| 783 |
+
# Load tokenizer
|
| 784 |
+
tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")
|
| 785 |
|
| 786 |
# Encode text
|
| 787 |
+
sentences = ["The quick brown fox", "A fast auburn canine"]
|
| 788 |
+
enc = tokenizer.batch_encode(sentences, max_length=1024)
|
| 789 |
+
ids = torch.tensor(enc["input_ids"])
|
| 790 |
+
mask = torch.tensor(enc["attention_mask"])
|
| 791 |
|
| 792 |
with torch.no_grad():
|
| 793 |
+
embs = model.encode(ids, mask, task=TaskToken.SYM)
|
| 794 |
|
| 795 |
+
# Cosine similarity
|
| 796 |
+
sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
|
| 797 |
+
print(f"Similarity: {sim.item():.4f}")
|
| 798 |
+
print(f"Shape: {embs.shape}") # (2, 128)
|
| 799 |
```
|
| 800 |
|
| 801 |
+
## Retrieval (Asymmetric Encoding)
|
| 802 |
|
| 803 |
```python
|
| 804 |
+
queries = ["What is machine learning?"]
|
| 805 |
+
documents = ["ML is a subset of AI...", "The weather is sunny today"]
|
| 806 |
|
| 807 |
+
q_enc = tokenizer.batch_encode(queries, max_length=1024)
|
| 808 |
+
d_enc = tokenizer.batch_encode(documents, max_length=1024)
|
| 809 |
|
| 810 |
+
with torch.no_grad():
|
| 811 |
+
q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
|
| 812 |
+
torch.tensor(q_enc["attention_mask"]), task=TaskToken.QRY)
|
| 813 |
+
d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
|
| 814 |
+
torch.tensor(d_enc["attention_mask"]), task=TaskToken.DOC)
|
| 815 |
|
| 816 |
+
scores = q_embs @ d_embs.T
|
| 817 |
+
print(f"Relevance scores: {scores}")
|
| 818 |
```
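
The relevance scores from the retrieval example can be turned into a ranking with a few lines of plain Python. This is only a sketch: `rank_documents` is a hypothetical helper, and the score values are made up for illustration, not produced by the model. With the PyTorch tensor from the example above, one row could be converted with `scores[0].tolist()`. Note that a dot product equals cosine similarity only for unit-norm vectors; whether `model.encode` normalizes its output is not stated in this diff.

```python
def rank_documents(scores, documents, top_k=None):
    """Return (document, score) pairs sorted by descending score."""
    if len(scores) != len(documents):
        raise ValueError("scores and documents must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


documents = ["ML is a subset of AI...", "The weather is sunny today"]

# Illustrative values only (NOT real model output); one score per document.
example_scores = [0.71, 0.08]

ranking = rank_documents(example_scores, documents)
best_doc, best_score = ranking[0]
print(f"Top hit: {best_doc!r} (score={best_score:.2f})")
```

Sorting on the host is fine for a handful of documents; for large corpora a vector index would normally take over this step.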
|
| 819 |
|
| 820 |
+
## Matryoshka Dimensionality Reduction
|
| 821 |
|
| 822 |
+
```python
|
| 823 |
+
full = model.encode(ids, mask, task=TaskToken.SYM) # (128d)
|
| 824 |
+
small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1) # (32d)
|
| 825 |
+
```
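
The truncate-then-renormalize step can also be sketched without PyTorch, which makes it easier to see what happens to a single vector. The helper names (`l2_normalize`, `truncate_matryoshka`) are hypothetical, and the 8-dimensional vectors are stand-ins for the 128-dimensional embeddings; none of the numbers come from the model.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def truncate_matryoshka(vec, dim):
    """Keep the first `dim` components, then renormalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    return l2_normalize(vec[:dim])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for two full-size embeddings (illustrative values only).
full_a = l2_normalize([0.5, -1.0, 2.0, 0.25, 1.5, -0.75, 0.1, 0.9])
full_b = l2_normalize([0.4, -0.8, 1.7, 0.50, 1.2, -0.60, 0.3, 0.7])

small_a = truncate_matryoshka(full_a, 4)
small_b = truncate_matryoshka(full_b, 4)

print(f"full-dim similarity:  {dot(full_a, full_b):.4f}")
print(f"truncated similarity: {dot(small_a, small_b):.4f}")
```

Renormalizing after truncation matters because the leading slice of a unit vector is generally shorter than unit length, so dot products on raw slices would no longer be cosine similarities.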
|
| 826 |
|
| 827 |
+
## Architecture
|
| 828 |
|
| 829 |
+
| Component | Details |
|
| 830 |
+
|-----------|---------|
|
| 831 |
+
| Parameters | 2.32M |
|
| 832 |
+
| Layers | 2 |
|
| 833 |
+
| Hidden dim | 128 |
|
| 834 |
+
| Output dim | 128 |
|
| 835 |
+
| Heads | 2 |
|
| 836 |
+
| Max seq len | 1024 |
|
| 837 |
+
| Matryoshka | [32, 64, 128] |
|
| 838 |
+
| Pooling | Mean |
|
| 839 |
+
| Positional | RoPE |
|
| 840 |
+
| FFN | SwiGLU |
|
| 841 |
+
| Tokenizer | SentencePiece Unigram (30K) |
|
| 842 |
+
|
| 843 |
+
## MTEB Results (54/54 tasks)
|
| 844 |
+
|
| 845 |
+
| Category | ogma-micro | Potion-32M | Potion-8M | vs P-32M |
|
| 846 |
+
|----------|------------|------------|-----------|----------|
|
| 847 |
+
| Classification | 59.5 | 66.0 | 64.5 | -6.5 |
|
| 848 |
+
| Clustering | 36.9 | 39.2 | 36.9 | -2.3 |
|
| 849 |
+
| PairClassification | 78.6 | 78.2 | 76.6 | +0.4 |
|
| 850 |
+
| Reranking | 49.7 | 50.9 | 49.7 | -1.2 |
|
| 851 |
+
| Retrieval | 33.1 | 32.2 | 30.4 | +0.9 |
|
| 852 |
+
| STS | 75.6 | 73.9 | 72.9 | +1.7 |
|
| 853 |
+
| Summarization | 31.8 | 29.8 | 29.3 | +2.0 |
|
| 854 |
+
| **Overall** | **49.77** | **51.22** | **49.58** | **-1.45** |
|
| 855 |
+
|
| 856 |
+
> Potion scores are locally reproduced using the same eval pipeline for fair comparison.
|
| 857 |
+
|
| 858 |
+
## Ogma Model Family
|
| 859 |
+
|
| 860 |
+
| Model | Params | MTEB-54 | Best For |
|
| 861 |
+
|-------|--------|---------|----------|
|
| 862 |
+
| [ogma-large](https://huggingface.co/axiotic/ogma-large) | 32.37M | 57.38 | Maximum quality |
|
| 863 |
+
| [ogma-base](https://huggingface.co/axiotic/ogma-base) | 13.32M | 56.54 | General purpose |
|
| 864 |
+
| [ogma-small](https://huggingface.co/axiotic/ogma-small) | 8.60M | 55.79 | Best sub-10M |
|
| 865 |
+
| [ogma-mini](https://huggingface.co/axiotic/ogma-mini) | 3.51M | 51.42 | Edge deployment |
|
| 866 |
+
| [ogma-micro](https://huggingface.co/axiotic/ogma-micro) | 2.32M | 49.77 | Extreme edge |
|
| 867 |
|
| 868 |
## License
|
| 869 |
|
__init__.py
ADDED
|
File without changes
|
embeddings.py
CHANGED
|
@@ -5,7 +5,7 @@ from __future__ import annotations
|
|
| 5 |
import torch
|
| 6 |
import torch.nn as nn
|
| 7 |
|
| 8 |
-
from
|
| 9 |
|
| 10 |
__all__ = ["TokenEmbedding", "RotaryPositionalEncoding"]
|
| 11 |
|
|
|
|
| 5 |
import torch
|
| 6 |
import torch.nn as nn
|
| 7 |
|
| 8 |
+
from .config import OgmaConfig
|
| 9 |
|
| 10 |
__all__ = ["TokenEmbedding", "RotaryPositionalEncoding"]
|
| 11 |
|
ogma_model.py
CHANGED
|
@@ -6,16 +6,16 @@ import torch
|
|
| 6 |
import torch.nn as nn
|
| 7 |
import torch.nn.functional as F
|
| 8 |
|
| 9 |
-
from
|
| 10 |
-
from
|
| 11 |
-
from
|
| 12 |
-
from
|
| 13 |
-
from
|
| 14 |
-
from
|
| 15 |
-
from
|
| 16 |
-
from
|
| 17 |
-
from
|
| 18 |
-
from
|
| 19 |
|
| 20 |
__all__ = ["OgmaModel"]
|
| 21 |
|
|
|
|
| 6 |
import torch.nn as nn
|
| 7 |
import torch.nn.functional as F
|
| 8 |
|
| 9 |
+
from .config import OgmaConfig, TaskToken, VariantType
|
| 10 |
+
from .embeddings import TokenEmbedding
|
| 11 |
+
from .pooling import create_pooling
|
| 12 |
+
from .variants.conv import ConvVariant
|
| 13 |
+
from .variants.deep_narrow import DeepNarrowVariant
|
| 14 |
+
from .variants.linear_attention import LinearAttentionVariant
|
| 15 |
+
from .variants.mlp_mixer import MLPMixerVariant
|
| 16 |
+
from .variants.transformer import TransformerVariant
|
| 17 |
+
from .variants.transformer_resa import TransformerReSAVariant
|
| 18 |
+
from .variants.gla import GLAVariant
|
| 19 |
|
| 20 |
__all__ = ["OgmaModel"]
|
| 21 |
|
pooling.py
CHANGED
|
@@ -6,7 +6,7 @@ import torch
|
|
| 6 |
import torch.nn as nn
|
| 7 |
import torch.nn.functional as F
|
| 8 |
|
| 9 |
-
from
|
| 10 |
|
| 11 |
__all__ = [
|
| 12 |
"create_pooling",
|
|
|
|
| 6 |
import torch.nn as nn
|
| 7 |
import torch.nn.functional as F
|
| 8 |
|
| 9 |
+
from .config import OgmaConfig, PoolingType
|
| 10 |
|
| 11 |
__all__ = [
|
| 12 |
"create_pooling",
|
variants/transformer.py
CHANGED
|
@@ -8,8 +8,8 @@ import torch
|
|
| 8 |
import torch.nn as nn
|
| 9 |
import torch.nn.functional as F
|
| 10 |
|
| 11 |
-
from
|
| 12 |
-
from
|
| 13 |
|
| 14 |
__all__ = ["TransformerVariant"]
|
| 15 |
|
|
|
|
| 8 |
import torch.nn as nn
|
| 9 |
import torch.nn.functional as F
|
| 10 |
|
| 11 |
+
from ..config import OgmaConfig
|
| 12 |
+
from ..embeddings import RotaryPositionalEncoding, apply_rope
|
| 13 |
|
| 14 |
__all__ = ["TransformerVariant"]
|
| 15 |
|
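
The import changes above switch the modules to relative imports (`from .config import ...`), which Python resolves only when a module is loaded as part of a package. The sketch below demonstrates that behavior with a throwaway package; `demo_pkg`, `demo_model`, and `HIDDEN_DIM` are hypothetical names created in a temporary directory, not files from this repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> Path:
    """Create a tiny package whose module uses a relative import."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "config.py").write_text("HIDDEN_DIM = 128\n")
    (pkg / "demo_model.py").write_text(
        "from .config import HIDDEN_DIM\n\n"
        "def hidden_dim():\n"
        "    return HIDDEN_DIM\n"
    )
    return pkg


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg_dir = build_demo_package(root)
    importlib.invalidate_caches()

    # Case 1: the parent directory is on sys.path, so the module is loaded
    # as demo_pkg.demo_model and the relative import resolves.
    sys.path.insert(0, str(root))
    try:
        module = importlib.import_module("demo_pkg.demo_model")
        package_result = module.hidden_dim()
    finally:
        sys.path.remove(str(root))

    # Case 2: the package directory itself is on sys.path, so the same file
    # is loaded as a top-level module with no parent package.
    sys.path.insert(0, str(pkg_dir))
    try:
        importlib.import_module("demo_model")
        top_level_error = None
    except ImportError as exc:
        top_level_error = str(exc)
    finally:
        sys.path.remove(str(pkg_dir))
        sys.modules.pop("demo_model", None)

print("package import ->", package_result)
print("top-level import ->", top_level_error)
```

The empty `__init__.py` added in this commit plays the same role as the one in the demo: it marks the directory as a regular package so the dotted imports have a parent to resolve against.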