---
license: mit
---

# Bengali Word2Vec Model
This is a pre-trained word2vec model for the Bengali language.

This model is built for the [bnlp](https://github.com/sagorbrur/bnlp) package.

## Datasets
- [Wikipedia dump datasets](https://dumps.wikimedia.org/bnwiki/latest/)

## Training details
- Word2Vec embedding dimension = 100, min_count = 5, window = 5, epochs = 10

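To make the `min_count` and `window` settings concrete, here is a minimal pure-Python sketch of what such parameters usually control in word2vec training: tokens rarer than `min_count` are dropped from the vocabulary, and each remaining token is paired with neighbours at most `window` positions away. The function names and the toy corpus are hypothetical, and the sketch assumes the conventional meaning of these parameters; it is not the training code used for this model.

```python
from collections import Counter

def build_vocab(sentences, min_count):
    """Keep only tokens that occur at least min_count times in the corpus."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return {token for token, n in counts.items() if n >= min_count}

def context_pairs(sentence, vocab, window):
    """Yield (center, context) pairs at most `window` positions apart."""
    kept = [t for t in sentence if t in vocab]  # rare tokens are dropped first
    for i, center in enumerate(kept):
        lo = max(0, i - window)
        hi = min(len(kept), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, kept[j]

# Toy corpus (hypothetical data) with small settings so the effect is visible.
corpus = [["a", "b", "c", "a"], ["a", "b", "d"]]
vocab = build_vocab(corpus, min_count=2)
pairs = list(context_pairs(corpus[0], vocab, window=1))
print(sorted(vocab))
print(pairs)
```

With `min_count=5` and `window=5`, as listed above, the same idea applies at corpus scale: words seen fewer than five times get no vector, and context is limited to five tokens on each side.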
## Usage
- Install the toolkit: `pip install -U bnlp_toolkit`
- Generate a word vector using the pretrained model

```py
from bnlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
# Path to the downloaded model file
model_path = "bengali_word2vec.model"
word = 'গ্রাম'
# Look up the embedding vector for the word
vector = bwv.generate_word_vector(model_path, word)
print(vector.shape)
print(vector)
```

- Find the most similar words using the pretrained model

```py
from bnlp import BengaliWord2Vec

bwv = BengaliWord2Vec()
# Path to the downloaded model file
model_path = "bengali_word2vec.model"
word = 'গ্রাম'
# Retrieve the 10 words closest to the query word
similar = bwv.most_similar(model_path, word, topn=10)
print(similar)
```
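
For intuition about what a "most similar" lookup does, the sketch below ranks words by cosine similarity between their vectors, which is the usual approach for word2vec embeddings. It assumes cosine similarity is the metric behind `most_similar`; the helper functions and the 3-dimensional toy vectors are hypothetical and do not reflect bnlp's internals or this model's actual values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(vectors, word, topn=10):
    """Rank every other word in `vectors` by cosine similarity to `word`."""
    query = vectors[word]
    scores = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical 3-dimensional vectors; the real model uses 100 dimensions.
toy_vectors = {
    "গ্রাম": [1.0, 0.0, 0.0],
    "শহর": [0.8, 0.6, 0.0],
    "নদী": [0.0, 1.0, 0.0],
}
ranked = most_similar(toy_vectors, "গ্রাম", topn=2)
print(ranked)
```

The result has the same general shape as a list of `(word, score)` pairs, with higher scores meaning the vectors point in more similar directions.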