Update README.md

---
license: mit
datasets:
- cnmoro/LexicalTriplets
language:
- en
- pt
pipeline_tag: feature-extraction
library_name: sentence-transformers
---

This is a model trained on [cnmoro/LexicalTriplets](https://huggingface.co/datasets/cnmoro/LexicalTriplets) to produce lexical embeddings (not semantic ones!).

It can be used to compute lexical similarity between words or phrases: strings that share surface form score high, regardless of meaning.

Concept:

- "Some text" will be similar to "Sm txt"
- "King" will *not* be similar to "Queen" or "Royalty"
- "Dog" will *not* be similar to "Animal"
- "Doge" will be similar to "Dog"
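
These expectations can be checked directly. Below is a minimal sketch that scores each pair; it assumes the loading pattern from the usage snippet further down (the custom model returns one embedding per input), and the exact scores will vary by checkpoint.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "cnmoro/LexicalEmbed-Base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

pairs = [
    ("Some text", "Sm txt"),  # lexically close: expect a high score
    ("King", "Queen"),        # semantically related only: expect a low score
    ("Dog", "Animal"),        # semantically related only: expect a low score
    ("Doge", "Dog"),          # lexically close: expect a high score
]

for a, b in pairs:
    inputs = tokenizer([a, b], padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        emb = model(**inputs)  # one embedding per input, as in the usage snippet below
    score = torch.nn.functional.cosine_similarity(emb[0], emb[1], dim=0)
    print(f"{a!r} vs {b!r}: {score.item():.4f}")
```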

The model will be trained for 2 epochs in total; the checkpoint published here is from the first epoch.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "cnmoro/LexicalEmbed-Base"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True is needed because the model uses custom code from the repo
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
model.eval()

texts = ["hello world", "hel wor"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # the model returns one embedding per input text
    embeddings = model(**inputs)

# high score expected: "hel wor" is a truncated form of "hello world"
cosine_sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine Similarity: {cosine_sim.item()}")  # 0.8960
```
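
Since the front matter lists `library_name: sentence-transformers`, loading through that library may also work. This is a hedged sketch under the assumption that the repo ships a sentence-transformers configuration; the Transformers snippet above is the documented path.

```python
# Assumption: the repo is loadable as a SentenceTransformer model
# (suggested by library_name in the front matter, not shown in the card itself).
import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cnmoro/LexicalEmbed-Base", trust_remote_code=True)

embeddings = model.encode(["hello world", "hel wor"], convert_to_tensor=True)
score = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
print(f"Cosine Similarity: {score.item():.4f}")
```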