---
license: mit
language:
- yua
metrics:
- spearman rho
tags:
- maya
- yucatec maya
- embeddings
- low-resource languages
- indigenous languages
---

# maya2vec

maya2vec is a word embedding model for Yucatec Maya.

The embeddings have 512 dimensions and were trained with the Skip-gram with Negative Sampling (SGNS) algorithm on data from La Jornada Maya (under a collaboration agreement), CENTROGEO - SEDECULTA phrases (referenced in Agreement SEDECULTA-DASJ-149-04-2024), and the [T'aantsil corpus project](https://taantsil.com.mx/info).

## Dependencies

Install gensim version 4.0 or later.

```
$ pip install "gensim>=4.0"
```

## Usage

See `usage.py` for the full script. The snippet below loads the model and computes the similarity between two words.

```
from gensim.models import Word2Vec

# Path to the downloaded maya2vec model file
maya2vec = './model_512_60_5_-0.25_0.7308_3.35E-05'

# Load the model
model = Word2Vec.load(maya2vec)

# Try out cosine similarity (dog, standing)
sim = model.wv.similarity("peek'", "waalak'")
print("Similarity between 'peek'' and 'waalak'': {:.4f}".format(sim))

# Expected output:
# Similarity between 'peek'' and 'waalak'': 0.9583
```
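
The value returned by `model.wv.similarity` is the cosine similarity between the two word vectors. As a rough illustration of what that number means, here is a dependency-free sketch of the computation. The helper name `cosine_similarity` and the 3-dimensional toy vectors are made up for this example; real maya2vec vectors have 512 dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (hypothetical values, not taken from the model).
u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]   # points in the same direction as u
w = [2.0, -1.0, 0.0]  # orthogonal to u

print(cosine_similarity(u, v))  # 1.0
print(cosine_similarity(u, w))  # 0.0
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions, and negative values indicate opposing directions.
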
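
The model card lists Spearman's rho as its metric, which is the usual way to score embeddings against a word similarity benchmark: the model's similarity for each word pair is rank-correlated with human judgments. The sketch below shows that calculation in plain Python. The function names are hypothetical, and both score lists are invented numbers, not benchmark data or maya2vec results; in practice `model_scores` would come from `model.wv.similarity(w1, w2)` for each benchmark pair, and `scipy.stats.spearmanr` could replace the hand-written helper.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented scores for five hypothetical word pairs (illustration only).
human_scores = [9.0, 7.5, 3.0, 1.0, 5.0]       # e.g. annotator ratings
model_scores = [0.91, 0.60, 0.35, 0.10, 0.72]  # e.g. cosine similarities

rho = spearman_rho(human_scores, model_scores)
print(round(rho, 4))  # 0.9
```

Because only ranks matter, the human scale and the cosine scale do not need to match; a rho of 1.0 would mean the model orders every pair exactly as the annotators do.
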
If you use maya2vec, please cite the paper: https://journal.iberamia.org/index.php/intartif/article/view/2119

```
Molina-Villegas, A., et al. (2025). Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya. Inteligencia Artificial, 28(76), 283–300. https://doi.org/10.4114/intartif.vol28iss76pp283-300

@article{maya2vec,
  title={Generating a Culturally and Linguistically Adapted Word Similarity Benchmark for Yucatec Maya},
  author={Molina-Villegas, Alejandro and Suro-Villalobos, Joel and Reyes-Magaña, Jorge and Fernandez-Sabido, Silvia},
  journal={Inteligencia Artificial},
  volume={28},
  number={76},
  pages={283--300},
  year={2025},
  publisher={IBERAMIA},
  doi={10.4114/intartif.vol28iss76pp283-300}
}
```

## License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.