---
license: cc-by-sa-4.0
datasets:
- procesaur/Vikipedija
- procesaur/Vikizvornik
- procesaur/ZNANJE
- jerteh/SrpELTeC
- procesaur/kisobran
language:
- sr
---

Word2Vec Sr

Обучаван над корпусом српског језика - 9.5 милијарди речи

Међу датотекама се налазе два модела (CBOW и SkipGram варијанте)

Trained on a Serbian-language corpus of 9.5 billion words

The files include two models (CBOW and SkipGram variants)

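```python
from gensim.models import Word2Vec

# Load the saved model from the local file "TeslaW2Vleme"
model = Word2Vec.load("TeslaW2Vleme")

# Word pairs to compare; each word is scored against "zavesa"
examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]

# Print the cosine similarity of each pair
for e in examples:
    print(model.wv.similarity(e[0], e[1]))
```

```
0.5193785
0.5763144
0.59982747
0.6022524
0.7117646
```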
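The scores printed by `model.wv.similarity` are cosine similarities between the two word vectors. The sketch below shows that computation in plain Python. The 3-dimensional vectors in `toy_vectors` are made up for illustration and are not taken from the model, and `cosine_similarity` is a hypothetical helper, not part of gensim.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors (assumption: NOT real embeddings from the model)
toy_vectors = {
    "zavesa": [1.0, 2.0, 0.0],
    "draperija": [1.0, 2.0, 1.0],
    "dim": [0.0, 1.0, 3.0],
}

# Score each word against "zavesa", as in the example above
for word in ("dim", "draperija"):
    score = cosine_similarity(toy_vectors[word], toy_vectors["zavesa"])
    print(word, round(score, 4))
```

A score close to 1 means the vectors point in nearly the same direction, and a score close to 0 means they are nearly orthogonal. This is why "draperija" scores higher against "zavesa" than "dim" does in the output above.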

Author

Mihailo Škorić

@procesaur

Computation

TESLA project

@te-sla

```bibtex
@inproceedings{stankovic-dict2vec,
  author = {Ranka Stanković and Jovana Rađenović and Mihailo Škorić and Marko Putniković},
  title = {Learning Word Embeddings using Lexical Resources and Corpora},
  booktitle = {15th International Conference on Information Society and Technology, ISIST 2025, Kopaonik},
  year = {2025},
  address = {Kopaonik, Belgrade},
  publisher = {SASA, Belgrade},
  url = {https://doi.org/10.5281/zenodo.15093900}
}
```

Истраживање је спроведено уз подршку Фонда за науку Републике Србије, #7276, Text Embeddings – Serbian Language Applications – TESLA

This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA