Model Card: Rocky-Embed

Model Description

rocky-embed is a lightweight, custom Transformer-based text embedding model. It was trained via knowledge distillation, using the precomputed teacher embeddings in the CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary dataset as the distillation target. The model maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

Architecture Highlights:

  • Custom Transformer Blocks: Uses RMSNorm for layer normalization and GELU activations.
  • Positional Embeddings: Implements Rotary Positional Embeddings (RoPE).
  • Attention: Uses QK Normalization with a learnable temperature parameter.
  • Parameters:
    • Dimensions: 768
    • Depth: 12 layers
    • Heads: 12
    • Projection Dimension: 1024 (matching the teacher model)
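The two less common choices above, RMSNorm and QK normalization, can be sketched in a few lines of PyTorch. This is an illustrative sketch, not the model's actual implementation (the real layer shapes, temperature parameterization, and RoPE application live in the custom RockyForEmbeddings code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """RMS layer norm: rescales by the root-mean-square of the features,
    with a learnable gain but no mean-centering (unlike LayerNorm)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

def qk_norm_attention(q, k, v, temperature):
    """QK normalization: L2-normalize queries and keys, then scale the
    dot products by a learnable temperature instead of 1/sqrt(head_dim)."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    scores = (q @ k.transpose(-2, -1)) * temperature  # cosine logits
    return scores.softmax(dim=-1) @ v
```

Because the normalized dot products are bounded in [-1, 1], the learnable temperature controls how sharp the attention distribution can become.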

Training Details

  • Dataset: Trained on English Wikipedia snippets.
  • Objective: Direct Mean Squared Error (MSE) distillation from the normalized embeddings of the teacher model.
  • Optimizer: AdamW with linear learning rate decay and warmup.

Evaluation Results (STSb)

  • Spearman Correlation: 0.5453
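For context, the STSb metric is the Spearman rank correlation between the model's cosine similarities on sentence pairs and the human gold scores. A sketch of the computation, with placeholder values standing in for the real model outputs and annotations:

```python
from scipy.stats import spearmanr

# Placeholder values: in a real evaluation, `model_sims` would be cosine
# similarities produced by the model for STSb sentence pairs, and
# `gold_scores` the corresponding human similarity annotations (0-5).
model_sims = [0.9, 0.2, 0.6, 0.4]
gold_scores = [4.8, 1.0, 3.5, 2.2]

rho, _ = spearmanr(model_sims, gold_scores)
print(f"Spearman correlation: {rho:.4f}")
```

Spearman correlation depends only on rank order, so it is insensitive to any monotonic rescaling of the similarity scores.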

How to Use

You can load this model directly from the Hugging Face Hub using the transformers library. Since this model uses a custom architecture (RockyForEmbeddings), you must pass trust_remote_code=True when loading it.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# 1. Load the tokenizer and model
model_id = "pranavupadhyaya52/rocky-embed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Important: Set trust_remote_code=True to use the custom Rocky architecture
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

model.eval()

# 2. Prepare your input texts
queries = [
    "What is the capital of France?",
    "Paris is the capital of France.",
    "A completely unrelated sentence about dogs."
]

# 3. Tokenize
inputs = tokenizer(
    queries,
    padding="max_length",
    truncation=True,
    max_length=64,
    return_tensors="pt"
)

# 4. Generate Embeddings
with torch.no_grad():
    # The model outputs the normalized pooled embeddings directly
    embeddings = model(inputs["input_ids"], inputs["attention_mask"])

print("Embeddings shape:", embeddings.shape)

# 5. Compute cosine similarities
query_emb = embeddings[0].unsqueeze(0)
option_embs = embeddings[1:]
similarities = F.cosine_similarity(query_emb, option_embs)

print(f"\nSimilarity with '{queries[1]}': {similarities[0]:.4f}")
print(f"Similarity with '{queries[2]}': {similarities[1]:.4f}")
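Since the model returns L2-normalized embeddings, ranking a corpus against a query reduces to a single matrix-vector product. A hypothetical retrieval helper building on the embeddings produced above:

```python
import torch

def top_k(query_emb: torch.Tensor, corpus_embs: torch.Tensor, k: int = 2):
    """Return indices and cosine similarities of the k nearest corpus
    embeddings. Assumes all embeddings are already L2-normalized, so the
    dot product equals cosine similarity."""
    sims = corpus_embs @ query_emb  # shape (N,)
    vals, idx = sims.topk(k)
    return idx.tolist(), vals.tolist()
```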
Model Details

  • Size: 90.9M parameters
  • Tensor type: F32 (Safetensors)