Fix: relative imports for Hub loading, corrected model cards with accurate numbers and usage examples

Files changed (6)

README.md +89 -146
__init__.py +0 -0
embeddings.py +1 -1
ogma_model.py +10 -10
pooling.py +1 -1
variants/transformer.py +2 -2

README.md CHANGED

@@ -744,183 +744,126 @@ model-index:
 # ogma-micro
-**2.3M parameter text embedding model** by [Axiotic AI](https://axiotic.ai), achieving **49.77 average** on MTEB English v1 (54/54 tasks).
-2-layer transformer, 128 hidden dim, 64 embedding dim — smallest model.
 ## Highlights
-- **2.3M parameters** — small enough for CPU inference, edge deployment, and resource-constrained environments
-- **49.77 MTEB average** — outperforms Potion-32M (51.22) despite being significantly smaller
-- **Matryoshka embeddings** — use dimensions [32, 64, 128] for flexible storage/compute tradeoffs
-- **Asymmetric encoding** — dedicated `[QRY]`, `[DOC]`, `[SYM]` task tokens for query-document and symmetric tasks
-- **1024 token context** — handles longer passages than typical small models (Potion: 512)
-- **Pure PyTorch** — no external transformer library dependencies
-## Architecture
-| Component | Details |
-|-----------|---------|
-| Parameters | 2.3M |
-| Layers | 2 |
-| Hidden dim (d_model) | 128 |
-| Embedding dim (d_embed) | 64 |
-| Output dim (d_output) | 128 |
-| Attention heads | 2 |
-| Max sequence length | 1024 |
-| Matryoshka dims | [32, 64, 128] |
-| Pooling | Mean (mask-aware) |
-| Position encoding | RoPE |
-| FFN | SwiGLU |
-| Normalization | Pre-LayerNorm |
-| Tokenizer | SentencePiece Unigram (30K vocab) |
-| Training | Knowledge distillation from teacher model |
-## MTEB Results
-### Category-Level Scores
-| Category | ogma-micro | Potion-32M | Potion-8M | vs Potion-32M |
-|----------|------------|------------|-----------|---------------|
-| Classification | **59.49** | 66.01 | 64.46 | -6.52 |
-| Clustering | **36.88** | 39.24 | 36.88 | -2.36 |
-| PairClassification | **78.62** | 78.17 | 76.62 | +0.45 |
-| Reranking | **49.74** | 50.92 | 49.73 | -1.18 |
-| Retrieval | **33.09** | 32.21 | 30.43 | +0.88 |
-| STS | **75.63** | 73.86 | 72.93 | +1.77 |
-| Summarization | **31.77** | 29.77 | 29.26 | +2.00 |
-| **Overall** | **49.77** | 51.22 | 49.58 | **-1.45** |
-> **Potion scores are locally reproduced** using the same evaluation pipeline and hardware for fair head-to-head comparison. These are not self-reported numbers from the Potion model card.
-## Usage
-### Quick Start
 ```python
 import torch
-import numpy as np
-from pathlib import Path
-# Load model
 from ogma_model import OgmaModel
-from config import OgmaConfig
 from tokenizer import OgmaTokenizer
-# Load from checkpoint directory
-model = OgmaModel.from_checkpoint("path/to/ogma-micro", device="cpu")
 model.eval()
-# Load tokenizer (uses the SentencePiece model embedded in tokenizer.json)
-# The tokenizer needs the .model file — extract from tokenizer.json or use:
-tokenizer = OgmaTokenizer("path/to/tokenizer.model")
 # Encode text
-texts = ["This is a query", "This is a document"]
-encoded = tokenizer.batch_encode(texts, max_length=1024)
-token_ids = torch.tensor(encoded["input_ids"])
-attention_mask = torch.tensor(encoded["attention_mask"])
-# Use task tokens for asymmetric encoding
-from config import TaskToken
 with torch.no_grad():
-    # For symmetric tasks (STS, clustering, classification)
-    embeddings = model.encode(token_ids, attention_mask, task=TaskToken.SYM)
-    # For retrieval — encode queries and documents separately
-    query_embs = model.encode(token_ids[:1], attention_mask[:1], task=TaskToken.QRY)
-    doc_embs = model.encode(token_ids[1:], attention_mask[1:], task=TaskToken.DOC)
-print(f"Embedding shape: {embeddings.shape}")  # (2, 128)
 ```
-### Matryoshka Dimensionality Reduction
 ```python
-# Full embeddings: 128d
-full_embs = model.encode(token_ids, attention_mask, task=TaskToken.SYM)
-# Reduce to any Matryoshka dimension: [32, 64, 128]
-dim = 64
-reduced_embs = torch.nn.functional.normalize(full_embs[:, :dim], p=2, dim=-1)
-# These reduced embeddings are trained to be effective at lower dims
-```
-### Loading with safetensors
-```python
-import torch
-import yaml
-from safetensors.torch import load_file
-from ogma_model import OgmaModel
-from config import OgmaConfig
-# Load config
-with open("path/to/ogma-micro/config.json") as f:
-    import json
-    config_dict = json.load(f)
-config = OgmaConfig.from_dict(config_dict)
-model = OgmaModel(config)
-# Load weights from safetensors
-state_dict = load_file("path/to/ogma-micro/model.safetensors")
-model.load_state_dict(state_dict)
-model.eval()
 ```
-## Task Tokens
-Ogma uses task-specific prefix tokens for asymmetric encoding:
-| Token | ID | Use Case |
-|-------|-----|----------|
-| `[QRY]` | 4 | Query encoding for retrieval |
-| `[DOC]` | 5 | Document/passage encoding for retrieval |
-| `[SYM]` | 6 | Symmetric tasks (STS, classification, clustering) |
-For retrieval tasks, encode queries with `[QRY]` and documents with `[DOC]`. For all other tasks, use `[SYM]`.
-## Training
-Ogma is trained via **knowledge distillation** from a larger teacher embedding model. The training pipeline:
-1. **Tokenizer**: SentencePiece Unigram model trained on the distillation corpus (30K vocab)
-2. **Token embeddings**: PCA-reduced embeddings from the teacher model, providing a strong initialization
-3. **Distillation**: MSE loss between student and teacher embeddings, with Matryoshka loss at multiple dimensions
-4. **Architecture**: Standard transformer encoder with RoPE positional encoding and SwiGLU FFN
-## Files
-| File | Description |
-|------|-------------|
-| `model.safetensors` | Model weights (safetensors format) |
-| `model.pt` | Model weights (PyTorch format) |
-| `config.json` | Model configuration |
-| `config.yaml` | Original training config |
-| `tokenizer.json` | HuggingFace tokenizer |
-| `tokenizer_config.json` | Tokenizer configuration |
-| `token_embeds_128d.npy` | Pre-computed token embeddings (30K × 128, float16) |
-| `ogma_model.py` | OgmaModel class |
-| `config.py` | OgmaConfig dataclass |
-| `embeddings.py` | Token embedding + RoPE |
-| `pooling.py` | Pooling strategies |
-| `variants/transformer.py` | Transformer encoder variant |
-| `tokenizer.py` | OgmaTokenizer wrapper |
-| `results/` | MTEB result JSONs |
-## Citation
-```bibtex
-@misc{ogma2026,
-  title={Ogma: Small High-Performance Text Embeddings},
-  author={Axiotic AI},
-  year={2026},
-  url={https://huggingface.co/axiotic/ogma-micro}
-}
-```
 ## License

 # ogma-micro
+**2.32M parameter text embedding model** by [Axiotic AI](https://axiotic.ai), achieving **49.77 average** on MTEB English (54/54 tasks).
+2-layer transformer, 128 hidden dim, mean pooling — smallest model for extreme edge deployment.
 ## Highlights
+- **49.77 MTEB average** — comparable to Potion-8M (49.58) at 3.4x fewer parameters
+- **Matryoshka embeddings** — dimensions [32, 64, 128] for flexible storage/compute tradeoffs
+- **Asymmetric encoding** — dedicated `[QRY]`, `[DOC]`, `[SYM]` task tokens
+- **1024 token context** — handles longer passages than typical small models
+- **HuggingFace Hub** — load directly, no local package installation needed
+## Quick Start
 ```python
 import torch
+from huggingface_hub import snapshot_download
+import sys, yaml
+# Download model from HuggingFace
+model_path = snapshot_download("axiotic/ogma-micro")
+sys.path.insert(0, model_path)
 from ogma_model import OgmaModel
+from config import OgmaConfig, TaskToken
 from tokenizer import OgmaTokenizer
+# Load model
+with open(f"{model_path}/config.yaml") as f:
+    cfg = yaml.safe_load(f)
+config = OgmaConfig.from_dict(cfg)
+model = OgmaModel(config)
+state = torch.load(f"{model_path}/model.pt", map_location="cpu", weights_only=True)
+model.load_state_dict(state)
 model.eval()
+# Load tokenizer
+tokenizer = OgmaTokenizer(f"{model_path}/tokenizer.json")
 # Encode text
+sentences = ["The quick brown fox", "A fast auburn canine"]
+enc = tokenizer.batch_encode(sentences, max_length=1024)
+ids = torch.tensor(enc["input_ids"])
+mask = torch.tensor(enc["attention_mask"])
 with torch.no_grad():
+    embs = model.encode(ids, mask, task=TaskToken.SYM)
+# Cosine similarity
+sim = torch.nn.functional.cosine_similarity(embs[0], embs[1], dim=0)
+print(f"Similarity: {sim.item():.4f}")
+print(f"Shape: {embs.shape}")  # (2, 128)
 ```
+## Retrieval (Asymmetric Encoding)
 ```python
+queries = ["What is machine learning?"]
+documents = ["ML is a subset of AI...", "The weather is sunny today"]
+q_enc = tokenizer.batch_encode(queries, max_length=1024)
+d_enc = tokenizer.batch_encode(documents, max_length=1024)
+with torch.no_grad():
+    q_embs = model.encode(torch.tensor(q_enc["input_ids"]),
+                           torch.tensor(q_enc["attention_mask"]), task=TaskToken.QRY)
+    d_embs = model.encode(torch.tensor(d_enc["input_ids"]),
+                           torch.tensor(d_enc["attention_mask"]), task=TaskToken.DOC)
+scores = q_embs @ d_embs.T
+print(f"Relevance scores: {scores}")
 ```
+## Matryoshka Dimensionality Reduction
+```python
+full = model.encode(ids, mask, task=TaskToken.SYM)       # (128d)
+small = torch.nn.functional.normalize(full[:, :32], p=2, dim=-1)  # (32d)
+```
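A dependency-free sketch of the truncate-then-renormalize step used in the Matryoshka snippet above. It assumes, as the README states, that the leading dimensions are trained to be usable on their own. The helper names `truncate_and_normalize` and `cosine` and the toy 4-d vectors are illustrative only and are not part of the Ogma code.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero slice: nothing to rescale
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


# Toy 4-d "embeddings" standing in for the model's 128-d outputs (assumption).
emb_a = [3.0, 4.0, 12.0, 0.0]
emb_b = [4.0, 3.0, 0.0, 5.0]

# Slice to the first 2 dimensions, then renormalize before comparing.
small_a = truncate_and_normalize(emb_a, 2)
small_b = truncate_and_normalize(emb_b, 2)
sim_2d = cosine(small_a, small_b)
print(small_a, small_b, sim_2d)
```

Renormalizing after slicing matters because a slice of a unit vector is generally shorter than unit length, so a plain dot product on the raw slice would no longer be a cosine similarity.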
+## Architecture
+| Component | Details |
+|-----------|---------|
+| Parameters | 2.32M |
+| Layers | 2 |
+| Hidden dim | 128 |
+| Output dim | 128 |
+| Heads | 2 |
+| Max seq len | 1024 |
+| Matryoshka | [32, 64, 128] |
+| Pooling | Mean |
+| Positional | RoPE |
+| FFN | SwiGLU |
+| Tokenizer | SentencePiece Unigram (30K) |
+## MTEB Results (54/54 tasks)
+| Category | ogma-micro | Potion-32M | Potion-8M | vs P-32M |
+|----------|------------|------------|-----------|----------|
+| Classification | 59.5 | 66.0 | 64.5 | -6.5 |
+| Clustering | 36.9 | 39.2 | 36.9 | -2.3 |
+| PairClassification | 78.6 | 78.2 | 76.6 | +0.4 |
+| Reranking | 49.7 | 50.9 | 49.7 | -1.2 |
+| Retrieval | 33.1 | 32.2 | 30.4 | +0.9 |
+| STS | 75.6 | 73.9 | 72.9 | +1.7 |
+| Summarization | 31.8 | 29.8 | 29.3 | +2.0 |
+| **Overall** | **49.77** | **51.22** | **49.58** | **-1.45** |
+> Potion scores are locally reproduced using the same eval pipeline for fair comparison.
+## Ogma Model Family
+| Model | Params | MTEB-54 | Best For |
+|-------|--------|---------|----------|
+| [ogma-large](https://huggingface.co/axiotic/ogma-large) | 32.37M | 57.38 | Maximum quality |
+| [ogma-base](https://huggingface.co/axiotic/ogma-base) | 13.32M | 56.54 | General purpose |
+| [ogma-small](https://huggingface.co/axiotic/ogma-small) | 8.60M | 55.79 | Best sub-10M |
+| [ogma-mini](https://huggingface.co/axiotic/ogma-mini) | 3.51M | 51.42 | Edge deployment |
+| [ogma-micro](https://huggingface.co/axiotic/ogma-micro) | 2.32M | 49.77 | Extreme edge |
 ## License

__init__.py ADDED

File without changes

embeddings.py CHANGED

@@ -5,7 +5,7 @@ from __future__ import annotations
 import torch
 import torch.nn as nn
-from ogma.model.config import OgmaConfig
 __all__ = ["TokenEmbedding", "RotaryPositionalEncoding"]

 import torch
 import torch.nn as nn
+from .config import OgmaConfig
 __all__ = ["TokenEmbedding", "RotaryPositionalEncoding"]

ogma_model.py CHANGED

@@ -6,16 +6,16 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
-from ogma.model.config import OgmaConfig, TaskToken, VariantType
-from ogma.model.embeddings import TokenEmbedding
-from ogma.model.pooling import create_pooling
-from ogma.model.variants.conv import ConvVariant
-from ogma.model.variants.deep_narrow import DeepNarrowVariant
-from ogma.model.variants.linear_attention import LinearAttentionVariant
-from ogma.model.variants.mlp_mixer import MLPMixerVariant
-from ogma.model.variants.transformer import TransformerVariant
-from ogma.model.variants.transformer_resa import TransformerReSAVariant
-from ogma.model.variants.gla import GLAVariant
 __all__ = ["OgmaModel"]

 import torch.nn as nn
 import torch.nn.functional as F
+from .config import OgmaConfig, TaskToken, VariantType
+from .embeddings import TokenEmbedding
+from .pooling import create_pooling
+from .variants.conv import ConvVariant
+from .variants.deep_narrow import DeepNarrowVariant
+from .variants.linear_attention import LinearAttentionVariant
+from .variants.mlp_mixer import MLPMixerVariant
+from .variants.transformer import TransformerVariant
+from .variants.transformer_resa import TransformerReSAVariant
+from .variants.gla import GLAVariant
 __all__ = ["OgmaModel"]

pooling.py CHANGED

@@ -6,7 +6,7 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
-from ogma.model.config import OgmaConfig, PoolingType
 __all__ = [
     "create_pooling",

 import torch.nn as nn
 import torch.nn.functional as F
+from .config import OgmaConfig, PoolingType
 __all__ = [
     "create_pooling",

variants/transformer.py CHANGED

@@ -8,8 +8,8 @@ import torch
 import torch.nn as nn
 import torch.nn.functional as F
-from ogma.model.config import OgmaConfig
-from ogma.model.embeddings import RotaryPositionalEncoding, apply_rope
 __all__ = ["TransformerVariant"]

 import torch.nn as nn
 import torch.nn.functional as F
+from ..config import OgmaConfig
+from ..embeddings import RotaryPositionalEncoding, apply_rope
 __all__ = ["TransformerVariant"]
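
Review note: the switch to relative imports (`from .config import ...`) means these modules can only be imported as part of a package, which is presumably why the empty `__init__.py` is added. The sketch below illustrates that behaviour on a throwaway package built in a temp directory. The names `demo_pkg`, `demo_model`, and `HIDDEN_DIM` are hypothetical and unrelated to the Ogma repo. It suggests the README Quick Start, which puts the snapshot directory itself on `sys.path` and runs `from ogma_model import OgmaModel`, may be worth re-testing against these imports.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package that mirrors the layout in this commit:
# an __init__.py plus a module that uses a relative import.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"  # hypothetical package name
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "config.py").write_text("HIDDEN_DIM = 128\n")
(pkg / "demo_model.py").write_text("from .config import HIDDEN_DIM\n")

# 1) Package directory itself on sys.path: demo_model is imported as a
#    top-level module, so the relative import has no parent package.
sys.path.insert(0, str(pkg))
importlib.invalidate_caches()
try:
    importlib.import_module("demo_model")
    top_level_ok = True
except ImportError:
    top_level_ok = False
finally:
    sys.path.remove(str(pkg))
    sys.modules.pop("demo_model", None)

# 2) Parent directory on sys.path: the module is imported through its
#    package, so `.config` resolves to demo_pkg.config.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
packaged = importlib.import_module("demo_pkg.demo_model")
sys.path.remove(str(root))

print(top_level_ok, packaged.HIDDEN_DIM)  # False 128
```

In the first case Python raises `ImportError` ("attempted relative import with no known parent package"). In the second the import succeeds because the module is loaded through its package.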