---
license: mit
language:
- yua
metrics:
- spearman rho
tags:
- maya
- yucatec maya
- embeddings
- low-resource languages
- indigenous languages
---

# maya2vec

maya2vec is a word embedding model for Yucatec Maya.

![word embeddings in Maya](peek.png)


maya2vec embeddings have 512 dimensions and were trained with the Skip-gram with Negative Sampling (SGNS) algorithm on data from La Jornada Maya (under a collaboration agreement), CENTROGEO - SEDECULTA phrases (referenced in Agreement SEDECULTA-DASJ-149-04-2024), and the [T'aantsil corpus project](https://taantsil.com.mx/info).
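For readers unfamiliar with SGNS, the sketch below shows how skip-gram training examples are typically formed: each word is paired with its neighbours inside a context window, and each pair is accompanied by a few randomly drawn "negative" words. This is only an illustration, not the training code used for maya2vec. The tokens are placeholders rather than corpus data, the window size and number of negatives are arbitrary, and negatives are drawn uniformly for simplicity (word2vec implementations usually draw them from a smoothed unigram distribution).

```python
import random


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs for every token within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def sample_negatives(vocab, exclude, k, rng):
    """Draw k words uniformly from vocab, skipping the words in `exclude`."""
    candidates = [w for w in vocab if w not in exclude]
    return [rng.choice(candidates) for _ in range(k)]


# Placeholder tokens (hypothetical), not real corpus data.
sentence = ["w1", "w2", "w3", "w4", "w5"]
vocab = sorted(set(sentence))
rng = random.Random(0)  # fixed seed so the sketch is reproducible

pairs = list(skipgram_pairs(sentence, window=1))

# Each training example: a positive (center, context) pair plus 2 negatives.
examples = [
    (center, context, sample_negatives(vocab, {center, context}, k=2, rng=rng))
    for center, context in pairs
]

for center, context, negatives in examples:
    print(center, context, negatives)
```

During training, the model learns vectors that score the positive pair high and the negative words low.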


## Dependencies

Install gensim version 4.0 or greater.

```
$ pip install "gensim>=4.0"
```
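To confirm that the installed gensim meets the version requirement, a small check along these lines may help. It relies only on the standard library; the helper name `installed_major_version` is hypothetical and introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_major_version(package):
    """Return the major version of an installed package, or None if it is missing."""
    try:
        return int(version(package).split(".")[0])
    except PackageNotFoundError:
        return None


major = installed_major_version("gensim")
if major is None:
    print("gensim is not installed")
elif major < 4:
    print(f"gensim {major}.x found; version 4.0 or greater is required")
else:
    print(f"gensim {major}.x found")
```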


## Usage

See `usage.py`:

```
from gensim.models import Word2Vec

# Path to the trained model file
maya2vec = './model_512_60_5_-0.25_0.7308_3.35E-05'

# Load the full model
model = Word2Vec.load(maya2vec)

# Cosine similarity between peek' (dog) and waalak' (standing)
sim = model.wv.similarity("peek'", "waalak'")
print(f"Similarity between 'peek'' and 'waalak'': {sim:.4f}")

# Similarity between 'peek'' and 'waalak'': 0.9583
```
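The similarity score in the usage example is the cosine similarity between two word vectors. As a minimal sketch of what that computation involves, the function below calculates it by hand for small made-up vectors. The vectors `a`, `b`, and `c` are illustrative only and are not taken from the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors (real maya2vec vectors have 512 dimensions).
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a
c = [-3.0, 0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))  # close to 1: same direction
print(cosine_similarity(a, c))  # close to 0: orthogonal
```

Values near 1 mean the two vectors point in nearly the same direction, and values near 0 mean they are unrelated in the embedding space.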

If you use maya2vec, please cite the paper: https://journal.iberamia.org/index.php/intartif/article/view/2119

```
Molina-Villegas, A., et al. (2025). Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya. Inteligencia Artificial, 28(76), 283–300. https://doi.org/10.4114/intartif.vol28iss76pp283-300

@article{maya2vec,
  title={Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya},
  author={Molina-Villegas, Alejandro and Suro-Villalobos, Joel and Reyes-Magaña, Jorge and Fernandez-Sabido, Silvia},
  journal={Inteligencia Artificial},
  volume={28},
  number={76},
  pages={283--300},
  year={2025},
  publisher={IBERAMIA},
  doi={10.4114/intartif.vol28iss76pp283-300}
}
```


## License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.