cnmoro committed on
Commit bf810b4 (verified) · 1 Parent(s): cff9a03

Update README.md

Files changed (1)
  1. README.md +42 -3
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ datasets:
+ - cnmoro/LexicalTriplets
+ language:
+ - en
+ - pt
+ pipeline_tag: feature-extraction
+ library_name: sentence-transformers
+ ---
+
+ This model was trained on [cnmoro/LexicalTriplets](https://huggingface.co/datasets/cnmoro/LexicalTriplets) to produce lexical embeddings (not semantic ones!).
+
+ It can be used to compute lexical similarity between words or phrases.
+
+ Concept:
+ - "Some text" will be similar to "Sm txt"
+ - "King" will *not* be similar to "Queen" or "Royalty"
+ - "Dog" will *not* be similar to "Animal"
+ - "Doge" will be similar to "Dog"
+
+ Training is planned for 2 epochs. The model currently published here is from the first epoch.
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ model_name = "cnmoro/LexicalEmbed-Base"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModel.from_pretrained(model_name, trust_remote_code=True)  # runs the repo's custom model code
+ model.eval()
+
+ texts = ["hello world", "hel wor"]
+ inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
+
+ with torch.no_grad():  # inference only, no gradients needed
+     embeddings = model(**inputs)
+
+ cosine_sim = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0)
+ print(f"Cosine Similarity: {cosine_sim.item()}") # 0.8960
+ ```
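
Reviewer note (not part of the commit): the lexical-versus-semantic distinction in the "Concept" list can be illustrated without the model at all. The sketch below uses Jaccard overlap of character bigrams as a crude stand-in for "lexical similarity". This is an assumption made purely for illustration and is not how the model computes its embeddings; the helper names `char_ngrams` and `lexical_similarity` are hypothetical.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def lexical_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of character n-grams (illustrative stand-in, not the model)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Pairs taken from the README's "Concept" list.
pairs = [("Dog", "Doge"), ("Dog", "Animal"), ("King", "Queen"), ("Some text", "Sm txt")]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {lexical_similarity(a, b):.2f}")
```

Pairs that share surface characters ("Dog"/"Doge", "Some text"/"Sm txt") get a non-zero score, while pairs related only by meaning ("Dog"/"Animal", "King"/"Queen") share no bigrams and score zero. A trained embedding model would be expected to be far more tolerant of abbreviations than this raw overlap measure.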