---
license: mit
language:
- yua
metrics:
- spearman rho
tags:
- maya
- yucatec maya
- embeddings
- low-resource languages
- indigenous languages
---
# maya2vec
maya2vec is a word embedding model for Yucatec Maya.
![word embeddings in Maya](peek.png)
maya2vec embeddings have 512 dimensions and were trained with the Skip-gram with Negative Sampling (SGNS) algorithm on data from La Jornada Maya (under a collaboration agreement), CENTROGEO - SEDECULTA phrases (referenced in Agreement SEDECULTA-DASJ-149-04-2024), and the [T'aantsil corpus project](https://taantsil.com.mx/info).
## Dependencies
Install gensim version 4.0 or later.
```
pip install "gensim>=4.0"
```
## Usage
See `usage.py`. A minimal example:
```
from gensim.models import Word2Vec
# Path to the downloaded model file
maya2vec = './model_512_60_5_-0.25_0.7308_3.35E-05'
# Load the trained model
model = Word2Vec.load(maya2vec)
# Cosine similarity between "peek'" (dog) and "waalak'" (standing)
sim = model.wv.similarity("peek'", "waalak'")
print(f'''similarity("peek'", "waalak'") = {sim:.4f}''')
# Reported result: similarity("peek'", "waalak'") = 0.9583
```
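gensim's `similarity` method returns the cosine similarity of the two word vectors. As an illustration of what that number means, here is a small pure-Python sketch of the computation; the helper name `cosine_similarity` and the 3-dimensional toy vectors are assumptions for the example, not real maya2vec embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (not taken from the model).
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 4.0, 4.0]   # same direction as vec_a
vec_c = [2.0, -1.0, 0.0]  # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))  # 1.0: vectors point the same way
print(cosine_similarity(vec_a, vec_c))  # 0.0: vectors are orthogonal
```

Scores close to 1 therefore indicate that two words have nearly parallel embeddings, which is how the 0.9583 value above should be read.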
If you use maya2vec, please cite the paper: https://journal.iberamia.org/index.php/intartif/article/view/2119
```
Molina-Villegas, A., et al. (2025). Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya. Inteligencia Artificial, 28(76), 283–300. https://doi.org/10.4114/intartif.vol28iss76pp283-300
@article{maya2vec,
title={Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya},
author={Molina-Villegas, Alejandro and Suro-Villalobos, Joel and Reyes-Magaña, Jorge and Fernandez-Sabido, Silvia},
journal={Inteligencia Artificial},
volume={28},
number={76},
pages={283--300},
year={2025},
publisher={IBERAMIA},
DOI={10.4114/intartif.vol28iss76pp283-300}
}
```
## License
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.