Antreas committed
Commit ac59af7 · verified · 1 Parent(s): 6efaeab

Fix: relative imports for Hub loading, corrected model cards with accurate numbers and usage examples

Files changed (6)
  1. README.md +89 -146
  2. __init__.py +0 -0
  3. embeddings.py +1 -1
  4. ogma_model.py +10 -10
  5. pooling.py +1 -1
  6. variants/transformer.py +2 -2
README.md CHANGED
@@ -744,183 +744,126 @@ model-index:

  # ogma-micro

- **2.3M parameter text embedding model** by [Axiotic AI](https://axiotic.ai), achieving **49.77 average** on MTEB English v1 (54/54 tasks).
+ **2.32M parameter text embedding model** by [Axiotic AI](https://axiotic.ai), achieving **49.77 average** on MTEB English (54/54 tasks).

- 2-layer transformer, 128 hidden dim, 64 embedding dim — smallest model.
+ 2-layer transformer, 128 hidden dim, mean pooling — smallest model for extreme edge deployment.

  ## Highlights

- - **2.3M parameters** — small enough for CPU inference, edge deployment, and resource-constrained environments
- - **49.77 MTEB average** — outperforms Potion-32M (51.22) despite being significantly smaller
- - **Matryoshka embeddings** — use dimensions [32, 64, 128] for flexible storage/compute tradeoffs
- - **Asymmetric encoding** — dedicated `[QRY]`, `[DOC]`, `[SYM]` task tokens for query-document and symmetric tasks
- - **1024 token context** — handles longer passages than typical small models (Potion: 512)
- - **Pure PyTorch** — no external transformer library dependencies
+ - **49.77 MTEB average** — comparable to Potion-8M (49.58) at 3.4x fewer parameters
+ - **Matryoshka embeddings** — dimensions [32, 64, 128] for flexible storage/compute tradeoffs
+ - **Asymmetric encoding** — dedicated `[QRY]`, `[DOC]`, `[SYM]` task tokens
+ - **1024 token context** — handles longer passages than typical small models
+ - **HuggingFace Hub** — load directly, no local package installation needed

- ## Architecture
-
- | Component | Details |
- |-----------|---------|
- | Parameters | 2.3M |
- | Layers | 2 |
- | Hidden dim (d_model) | 128 |
- | Embedding dim (d_embed) | 64 |
- | Output dim (d_output) | 128 |
- | Attention heads | 2 |
- | Max sequence length | 1024 |
- | Matryoshka dims | [32, 64, 128] |
- | Pooling | Mean (mask-aware) |
- | Position encoding | RoPE |
- | FFN | SwiGLU |
- | Normalization | Pre-LayerNorm |
- | Tokenizer | SentencePiece Unigram (30K vocab) |
- | Training | Knowledge distillation from teacher model |
-
- ## MTEB Results
-
- ### Category-Level Scores
-
- | Category | ogma-micro | Potion-32M | Potion-8M | vs Potion-32M |
- |----------|------------|------------|-----------|---------------|
- | Classification | **59.49** | 66.01 | 64.46 | -6.52 |
- | Clustering | **36.88** | 39.24 | 36.88 | -2.36 |
- | PairClassification | **78.62** | 78.17 | 76.62 | +0.45 |
- | Reranking | **49.74** | 50.92 | 49.73 | -1.18 |
- | Retrieval | **33.09** | 32.21 | 30.43 | +0.88 |
- | STS | **75.63** | 73.86 | 72.93 | +1.77 |
- | Summarization | **31.77** | 29.77 | 29.26 | +2.00 |
- | **Overall** | **49.77** | 51.22 | 49.58 | **-1.45** |
-
- > **Potion scores are locally reproduced** using the same evaluation pipeline and hardware for fair head-to-head comparison. These are not self-reported numbers from the Potion model card.
-
- ## Usage
-
- ### Quick Start
+ ## Quick Start

  ```python
  import torch
- import numpy as np
- from pathlib import Path
+ from huggingface_hub import snapshot_download
+ import sys, yaml
+
+ # Download model from HuggingFace
+ model_path = snapshot_download("axiotic/ogma-micro")
+ sys.path.insert(0, model_path)

- # Load model
  from ogma_model import OgmaModel
- from config import OgmaConfig
+ from config import OgmaConfig, TaskToken
  from tokenizer import OgmaTokenizer

- # Load from checkpoint directory
- model = OgmaModel.from_checkpoint("path/to/ogma-micro", device="cpu")
+ # Load model
+ with open(f"{model_path}/config.yaml") as f:
+     cfg = yaml.safe_load(f)
+ config = OgmaConfig.from_dict(cfg)
+ model = OgmaModel(config)
+ state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
+ model.load_state_dict(state)
  model.eval()

- # Load tokenizer (uses the SentencePiece model embedded in tokenizer.json)
- # The tokenizer needs the .model file — extract from tokenizer.json or use:
- tokenizer = OgmaTokenizer("path/to/tokenizer.model")
+ # Load tokenizer
+ tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")

  # Encode text
- texts = ["This is a query", "This is a document"]
- encoded = tokenizer.batch_encode(texts, max_length=1024)
-
- token_ids = torch.tensor(encoded["input_ids"])
- attention_mask = torch.tensor(encoded["attention_mask"])
-
- # Use task tokens for asymmetric encoding
- from config import TaskToken
+ sentences = ["The quick brown fox", "A fast auburn canine"]
+ enc = tokenizer.batch_encode(sentences, max_length=1024)
+ ids = torch.tensor(enc["input_ids"])
+ mask = torch.tensor(enc["attention_mask"])

  with torch.no_grad():
-     # For symmetric tasks (STS, clustering, classification)
-     embeddings = model.encode(token_ids, attention_mask, task=TaskToken.SYM)
-
-     # For retrieval — encode queries and documents separately
-     query_embs = model.encode(token_ids[:1], attention_mask[:1], task=TaskToken.QRY)
-     doc_embs = model.encode(token_ids[1:], attention_mask[1:], task=TaskToken.DOC)
+     embs = model.encode(ids, mask, task=TaskToken.SYM)

- print(f"Embedding shape: {embeddings.shape}") # (2, 128)
+ # Cosine similarity
+ sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
+ print(f"Similarity: {sim.item():.4f}")
+ print(f"Shape: {embs.shape}") # (2, 128)
  ```

- ### Matryoshka Dimensionality Reduction
+ ## Retrieval (Asymmetric Encoding)

  ```python
- # Full embeddings: 128d
- full_embs = model.encode(token_ids, attention_mask, task=TaskToken.SYM)
-
- # Reduce to any Matryoshka dimension: [32, 64, 128]
- dim = 64
- reduced_embs = torch.nn.functional.normalize(full_embs[:, :dim], p=2, dim=-1)
- # These reduced embeddings are trained to be effective at lower dims
- ```
+ queries = ["What is machine learning?"]
+ documents = ["ML is a subset of AI...", "The weather is sunny today"]

- ### Loading with safetensors
+ q_enc = tokenizer.batch_encode(queries, max_length=1024)
+ d_enc = tokenizer.batch_encode(documents, max_length=1024)

- ```python
- import torch
- import yaml
- from safetensors.torch import load_file
- from ogma_model import OgmaModel
- from config import OgmaConfig
-
- # Load config
- with open("path/to/ogma-micro/config.json") as f:
-     import json
-     config_dict = json.load(f)
-
- config = OgmaConfig.from_dict(config_dict)
- model = OgmaModel(config)
+ with torch.no_grad():
+     q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
+                           torch.tensor(q_enc["attention_mask"]), task=TaskToken.QRY)
+     d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
+                           torch.tensor(d_enc["attention_mask"]), task=TaskToken.DOC)

- # Load weights from safetensors
- state_dict = load_file("path/to/ogma-micro/model.safetensors")
- model.load_state_dict(state_dict)
- model.eval()
+ scores = q_embs @ d_embs.T
+ print(f"Relevance scores: {scores}")
  ```

- ## Task Tokens
-
- Ogma uses task-specific prefix tokens for asymmetric encoding:
-
- | Token | ID | Use Case |
- |-------|-----|----------|
- | `[QRY]` | 4 | Query encoding for retrieval |
- | `[DOC]` | 5 | Document/passage encoding for retrieval |
- | `[SYM]` | 6 | Symmetric tasks (STS, classification, clustering) |
-
- For retrieval tasks, encode queries with `[QRY]` and documents with `[DOC]`. For all other tasks, use `[SYM]`.
-
- ## Training
-
- Ogma is trained via **knowledge distillation** from a larger teacher embedding model. The training pipeline:
+ ## Matryoshka Dimensionality Reduction

- 1. **Tokenizer**: SentencePiece Unigram model trained on the distillation corpus (30K vocab)
- 2. **Token embeddings**: PCA-reduced embeddings from the teacher model, providing a strong initialization
- 3. **Distillation**: MSE loss between student and teacher embeddings, with Matryoshka loss at multiple dimensions
- 4. **Architecture**: Standard transformer encoder with RoPE positional encoding and SwiGLU FFN
-
- ## Files
-
- | File | Description |
- |------|-------------|
- | `model.safetensors` | Model weights (safetensors format) |
- | `model.pt` | Model weights (PyTorch format) |
- | `config.json` | Model configuration |
- | `config.yaml` | Original training config |
- | `tokenizer.json` | HuggingFace tokenizer |
- | `tokenizer_config.json` | Tokenizer configuration |
- | `token_embeds_128d.npy` | Pre-computed token embeddings (30K × 128, float16) |
- | `ogma_model.py` | OgmaModel class |
- | `config.py` | OgmaConfig dataclass |
- | `embeddings.py` | Token embedding + RoPE |
- | `pooling.py` | Pooling strategies |
- | `variants/transformer.py` | Transformer encoder variant |
- | `tokenizer.py` | OgmaTokenizer wrapper |
- | `results/` | MTEB result JSONs |
+ ```python
+ full = model.encode(ids, mask, task=TaskToken.SYM) # (128d)
+ small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1) # (32d)
+ ```

- ## Citation
+ ## Architecture

- ```bibtex
- @misc{ogma2026,
- title={Ogma: Small High-Performance Text Embeddings},
- author={Axiotic AI},
- year={2026},
- url={https://huggingface.co/axiotic/ogma-micro}
- }
- ```
+ | Component | Details |
+ |-----------|---------|
+ | Parameters | 2.32M |
+ | Layers | 2 |
+ | Hidden dim | 128 |
+ | Output dim | 128 |
+ | Heads | 2 |
+ | Max seq len | 1024 |
+ | Matryoshka | [32, 64, 128] |
+ | Pooling | Mean |
+ | Positional | RoPE |
+ | FFN | SwiGLU |
+ | Tokenizer | SentencePiece Unigram (30K) |
+
+ ## MTEB Results (54/54 tasks)
+
+ | Category | ogma-micro | Potion-32M | Potion-8M | vs P-32M |
+ |----------|------------|------------|-----------|----------|
+ | Classification | 59.5 | 66.0 | 64.5 | -6.5 |
+ | Clustering | 36.9 | 39.2 | 36.9 | -2.3 |
+ | PairClassification | 78.6 | 78.2 | 76.6 | +0.4 |
+ | Reranking | 49.7 | 50.9 | 49.7 | -1.2 |
+ | Retrieval | 33.1 | 32.2 | 30.4 | +0.9 |
+ | STS | 75.6 | 73.9 | 72.9 | +1.7 |
+ | Summarization | 31.8 | 29.8 | 29.3 | +2.0 |
+ | **Overall** | **49.77** | **51.22** | **49.58** | **-1.45** |
+
+ > Potion scores are locally reproduced using the same eval pipeline for fair comparison.
+
+ ## Ogma Model Family
+
+ | Model | Params | MTEB-54 | Best For |
+ |-------|--------|---------|----------|
+ | [ogma-large](https://huggingface.co/axiotic/ogma-large) | 32.37M | 57.38 | Maximum quality |
+ | [ogma-base](https://huggingface.co/axiotic/ogma-base) | 13.32M | 56.54 | General purpose |
+ | [ogma-small](https://huggingface.co/axiotic/ogma-small) | 8.60M | 55.79 | Best sub-10M |
+ | [ogma-mini](https://huggingface.co/axiotic/ogma-mini) | 3.51M | 51.42 | Edge deployment |
+ | [ogma-micro](https://huggingface.co/axiotic/ogma-micro) | 2.32M | 49.77 | Extreme edge |

  ## License

__init__.py ADDED
File without changes
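The Matryoshka step in the updated README (slice the leading coordinates, then re-normalize) can be tried without downloading the model. The following is a minimal sketch in which random unit vectors stand in for the output of `model.encode`; the helper name `truncate` is hypothetical and not part of the Ogma code, and it assumes the embeddings are compared by cosine similarity.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for model.encode(...) output: three unit-norm 128-d embeddings.
# (Assumption: real Ogma embeddings would be used here instead.)
full = F.normalize(torch.randn(3, 128), p=2, dim=-1)


def truncate(embs: torch.Tensor, dim: int) -> torch.Tensor:
    """Keep the first `dim` coordinates and rescale each row to unit length."""
    return F.normalize(embs[:, :dim], p=2, dim=-1)


for dim in (32, 64, 128):
    small = truncate(full, dim)
    # Rows are unit-norm, so the matrix product gives pairwise cosine similarity.
    sims = small @ small.T
    print(dim, tuple(small.shape), round(sims[0, 1].item(), 4))
```

Re-normalizing after slicing matters because a prefix of a unit vector is generally shorter than unit length, so dot products on raw prefixes would no longer be cosine similarities.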
embeddings.py CHANGED
@@ -5,7 +5,7 @@ from __future__ import annotations
  import torch
  import torch.nn as nn

- from ogma.model.config import OgmaConfig
+ from .config import OgmaConfig

  __all__ = ["TokenEmbedding", "RotaryPositionalEncoding"]

ogma_model.py CHANGED
@@ -6,16 +6,16 @@ import torch
  import torch.nn as nn
  import torch.nn.functional as F

- from ogma.model.config import OgmaConfig, TaskToken, VariantType
- from ogma.model.embeddings import TokenEmbedding
- from ogma.model.pooling import create_pooling
- from ogma.model.variants.conv import ConvVariant
- from ogma.model.variants.deep_narrow import DeepNarrowVariant
- from ogma.model.variants.linear_attention import LinearAttentionVariant
- from ogma.model.variants.mlp_mixer import MLPMixerVariant
- from ogma.model.variants.transformer import TransformerVariant
- from ogma.model.variants.transformer_resa import TransformerReSAVariant
- from ogma.model.variants.gla import GLAVariant
+ from .config import OgmaConfig, TaskToken, VariantType
+ from .embeddings import TokenEmbedding
+ from .pooling import create_pooling
+ from .variants.conv import ConvVariant
+ from .variants.deep_narrow import DeepNarrowVariant
+ from .variants.linear_attention import LinearAttentionVariant
+ from .variants.mlp_mixer import MLPMixerVariant
+ from .variants.transformer import TransformerVariant
+ from .variants.transformer_resa import TransformerReSAVariant
+ from .variants.gla import GLAVariant

  __all__ = ["OgmaModel"]

pooling.py CHANGED
@@ -6,7 +6,7 @@ import torch
  import torch.nn as nn
  import torch.nn.functional as F

- from ogma.model.config import OgmaConfig, PoolingType
+ from .config import OgmaConfig, PoolingType

  __all__ = [
      "create_pooling",
variants/transformer.py CHANGED
@@ -8,8 +8,8 @@ import torch
  import torch.nn as nn
  import torch.nn.functional as F

- from ogma.model.config import OgmaConfig
- from ogma.model.embeddings import RotaryPositionalEncoding, apply_rope
+ from ..config import OgmaConfig
+ from ..embeddings import RotaryPositionalEncoding, apply_rope

  __all__ = ["TransformerVariant"]

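The relative imports above (`from .config import ...`, `from ..config import ...`) resolve only when the files are loaded as members of a package, which is presumably why an empty `__init__.py` is added. Importing a module such as `ogma_model` as a top-level module from a directory placed on `sys.path` would normally fail with "attempted relative import with no known parent package". One possible way to load a downloaded directory as a package is sketched below; the helper `load_dir_as_package`, the package name `toy_pkg`, and the toy files are illustrative assumptions, not part of this commit.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_dir_as_package(pkg_dir, name):
    """Register the directory `pkg_dir` as an importable package called `name`."""
    pkg_dir = Path(pkg_dir)
    spec = importlib.util.spec_from_file_location(
        name,
        str(pkg_dir / "__init__.py"),
        submodule_search_locations=[str(pkg_dir)],
    )
    module = importlib.util.module_from_spec(spec)
    # Register the package first so that submodules can find their parent.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


# Toy stand-in for the repository layout (hypothetical file contents):
# an empty __init__.py, a config module, and a module that imports it relatively.
tmp = Path(tempfile.mkdtemp())
(tmp / "__init__.py").write_text("")
(tmp / "config.py").write_text("HIDDEN_DIM = 128\n")
(tmp / "toy_model.py").write_text(
    "from .config import HIDDEN_DIM\n\n"
    "def hidden_dim():\n"
    "    return HIDDEN_DIM\n"
)

load_dir_as_package(tmp, "toy_pkg")
toy_model = importlib.import_module("toy_pkg.toy_model")
print(toy_model.hidden_dim())
```

With a real snapshot, the same helper could in principle be pointed at the directory returned by `snapshot_download`, after which importing `<name>.ogma_model` would let the relative imports resolve. This has not been verified against the actual repository, and any variant modules that `ogma_model.py` imports would also need to be present.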