Cosine-Embed
Cosine-Embed is a PyTorch sentence embedding model trained to place similar texts close together in an embedding space. The model outputs L2-normalized vectors, so cosine similarity can be computed as a plain dot product.
What it produces
- Input: tokenized text (`input_ids`, `attention_mask`)
- Output: an embedding vector of size `hidden_dim` with L2 normalization
- Cosine similarity: `cos(a, b) = embedding(a) · embedding(b)`
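Because the output vectors are unit-length, the dot product and cosine similarity coincide. A quick self-contained check of this property (independent of the model, using random vectors of the default `hidden_dim=512`):

```python
import torch
import torch.nn.functional as F

# Two random vectors scaled to unit length, matching hidden_dim=512.
a = F.normalize(torch.randn(512), dim=-1)
b = F.normalize(torch.randn(512), dim=-1)

dot = (a * b).sum()
cos = F.cosine_similarity(a, b, dim=-1)
print(torch.allclose(dot, cos, atol=1e-6))  # True: dot product == cosine for unit vectors
```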
Model details
- Transformer blocks (custom implementation using RMSNorm, RoPE positional encoding, and SwiGLU feed-forward)
- Masked mean pooling over token embeddings
- Final L2 normalization
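A minimal sketch of the pooling and normalization steps described above, assuming the usual tensor shapes (`[batch, seq_len, hidden_dim]` token embeddings and a `[batch, seq_len]` mask); this is an illustration, not the repository's exact implementation:

```python
import torch
import torch.nn.functional as F

def masked_mean_pool(token_embeddings, attention_mask):
    # token_embeddings: [batch, seq_len, hidden_dim]; attention_mask: [batch, seq_len]
    mask = attention_mask.unsqueeze(-1).float()      # [batch, seq_len, 1]
    summed = (token_embeddings * mask).sum(dim=1)    # sum only over real (non-padding) tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per sequence
    pooled = summed / counts                         # masked mean pooling
    return F.normalize(pooled, p=2, dim=-1)          # final L2 normalization
```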
Default configuration
These parameters are used in `Notebooks/Training.ipynb`:

- `vocab_size`: 30522
- `seq_len`: 128
- `hidden_dim`: 512
- `n_heads`: 8
- `n_layer`: 3
- `ff_dim`: 2048
- `eps`: 1e-5
- `dropout`: 0.1
Training objective
The model is trained with triplet loss on cosine similarity:
`loss = max(0, sim(anchor, negative) - sim(anchor, positive) + margin)`
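A sketch of this loss in PyTorch, assuming the three embeddings are already L2-normalized (so dot products are cosine similarities); the `margin` default below is illustrative, not the notebook's setting:

```python
import torch

def triplet_cosine_loss(anchor, positive, negative, margin=0.2):
    # anchor/positive/negative: [batch, hidden_dim], unit-length rows.
    sim_pos = (anchor * positive).sum(dim=-1)  # sim(anchor, positive)
    sim_neg = (anchor * negative).sum(dim=-1)  # sim(anchor, negative)
    # Hinge: zero loss once the positive beats the negative by at least `margin`.
    return torch.clamp(sim_neg - sim_pos + margin, min=0.0).mean()
```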
Checkpoints
- `checkpoints/checkpoint.pt`: training checkpoint (model, optimizer, losses, and configs)
- `checkpoints/model.safetensors`: weights-only export for inference
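To see exactly what the training checkpoint stores without assuming its key names, load it and inspect the keys:

```python
import torch

# The checkpoint holds the model/optimizer state, losses, and configs; print the keys to confirm.
ckpt = torch.load("checkpoints/checkpoint.pt", map_location="cpu")
print(list(ckpt.keys()))
```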
Minimal inference
```python
import torch
from transformers import AutoTokenizer
from safetensors.torch import load_file

from Architecture import EmbeddingModel, ModelConfig

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the weights-only export.
state_dict = load_file("checkpoints/model.safetensors")

# Must match the training configuration.
cfg = ModelConfig(
    vocab_size=30522,
    seq_len=128,
    hidden_dim=512,
    n_heads=8,
    n_layer=3,
    eps=1e-5,
    ff_dim=2048,
    dropout=0.1,
)

model = EmbeddingModel(cfg).to(device)
model.load_state_dict(state_dict)
model.eval()

# Use the same tokenizer as in training.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def embed(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=128,
        return_tensors="pt",
    )
    enc = {k: v.to(device) for k, v in enc.items()}
    with torch.no_grad():
        return model(enc["input_ids"], enc["attention_mask"])  # L2-normalized embeddings

def cosine_similarity(a, b):
    ea = embed([a])[0]
    eb = embed([b])[0]
    return float((ea * eb).sum().item())
```
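Example usage (the sentences are illustrative; actual scores depend on the trained weights):

```python
print(cosine_similarity("A cat sits on the mat.", "A kitten rests on a rug."))
print(cosine_similarity("A cat sits on the mat.", "Quarterly revenue fell sharply."))
```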
Notes
- Use the same tokenizer (`bert-base-uncased`) and the same `max_length=128` (or keep `seq_len` and preprocessing consistent).