alemol committed · Commit 4d8beca · verified · 1 Parent(s): df4ec2a

Update README.md

Files changed (1)
  1. README.md +78 -1
README.md CHANGED
@@ -9,4 +9,81 @@ tags:
  - yucatec maya
  - low-resource languages
  - indigenous languages
- ---
+ ---
+
+ # maya2vec
+
+ maya2vec is a word embedding model for Yucatec Maya.
+
+ ![word embeddings in Maya](peek.png)
+
+
+ The maya2vec embeddings have 512 dimensions and were trained with the Skip-gram with Negative Sampling (SGNS) algorithm on data from La Jornada Maya (under a collaboration agreement), CENTROGEO - SEDECULTA phrases (referenced in Agreement SEDECULTA-DASJ-149-04-2024), and the [T'aantsil corpus project](https://taantsil.com.mx/info).
+
+
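
As a rough illustration of the SGNS objective mentioned above, the sketch below computes the loss for a single (center, context) pair with a few negative samples, using only the Python standard library. The function names and toy vectors are hypothetical and are not taken from the maya2vec training code.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sgns_pair_loss(center, context, negatives):
    """Negative log-likelihood for one positive pair and its negative samples.

    loss = -log sigmoid(context . center)
           - sum over k of log sigmoid(-negative_k . center)
    """
    loss = -math.log(sigmoid(dot(context, center)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(neg, center)))
    return loss


# Toy 3-dimensional vectors (maya2vec itself uses 512 dimensions).
center = [0.5, -0.2, 0.1]
context = [0.4, -0.1, 0.3]
negatives = [[-0.3, 0.2, 0.0], [0.1, 0.4, -0.5]]

print(f"SGNS loss for one pair: {sgns_pair_loss(center, context, negatives):.4f}")
```

Training lowers this loss by pulling the context vector toward the center vector and pushing the sampled negatives away from it.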
+ ## Dependencies
+
+ Install gensim version 4.0 or later.
+
+ ```
+ $ pip install "gensim>=4.0"
+ ```
+
+
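
One possible way to confirm that the installed gensim meets the 4.0 requirement is sketched below. It relies only on the standard library (`importlib.metadata`); the helper names `major_minor` and `gensim_is_recent_enough` are illustrative and do not belong to gensim or to this repository.

```python
import re
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a version string such as '4.3.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def gensim_is_recent_enough(minimum=(4, 0)):
    """Return True if gensim is installed at or above the minimum version."""
    try:
        installed = metadata.version("gensim")
    except metadata.PackageNotFoundError:
        return False
    return major_minor(installed) >= minimum


print("gensim >= 4.0 installed:", gensim_is_recent_enough())
```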
+ ## Usage
+
+ See usage.py
+
+ ```
+ from gensim.models import Word2Vec
+
+ # Path to the pretrained maya2vec model file
+ model_path = './model_512_60_5_-0.25_0.7308_3.35E-05'
+
+ # Load the global model
+ model = Word2Vec.load(model_path)
+
+ # Try out cosine similarity (dog, standing)
+ sim = model.wv.similarity("peek'", "waalak'")
+ print(f"""Similarity between "peek'" and "waalak'": {sim:.4f}""")
+
+ # Similarity between "peek'" and "waalak'": 0.9583
+ ```
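
For readers unfamiliar with what `similarity` returns: gensim's `KeyedVectors.similarity` is the cosine of the angle between the two word vectors. A minimal pure-Python sketch of that computation follows; the function name and the toy vectors are illustrative and are not part of gensim or maya2vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


# Toy vectors standing in for two word embeddings.
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

With the model loaded, the same function could be applied to `model.wv["peek'"]` and `model.wv["waalak'"]`; the result should agree with `model.wv.similarity` up to floating-point error.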
+
+ Please cite the paper: https:publishing_soon/paper
+
+ ```
+ @article{maya2vec,
+   title={Generating Culturally and Linguistically Adapted Word Similarity Benchmarks for Indigenous Languages},
+   author={Molina-Villegas, Alejandro and Suro, Joel and Fernandez-Sabido, Silvia and Reyes-Magaña, Jorge},
+   journal={Inteligencia Artificial},
+   volume={29},
+   pages={103971},
+   year={2025},
+   publisher={IBERAMIA}
+ }
+ ```
+
+
+ ## License
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.