Create README.md
README.md
---
language:
- fa
pipeline_tag: feature-extraction
---
This is the original fastText embedding model for Persian, taken from the [fastText crawl vectors](https://fasttext.cc/docs/en/crawl-vectors.html#models), loaded and converted with Gensim, and exported to a Hezar-compatible format.
For more information, see the [fastText documentation](https://fasttext.cc/docs/en/support.html).

To use this model in Hezar, install the package and load the embedding:
```bash
pip install hezar
```
```python
from hezar import Embedding

fasttext = Embedding.load("hezarai/fasttext-fa-300")
# Get embedding vector
vector = fasttext("هزار")
# Find the word that doesn't match with the rest
doesnt_match = fasttext.doesnt_match(["خانه", "اتاق", "ماشین"])
# Find the top-n most similar words to the given word
most_similar = fasttext.most_similar("هزار", top_n=5)
# Find the cosine similarity value between two words
similarity = fasttext.similarity("مهندس", "دکتر")
```
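
For readers unfamiliar with the similarity score in the last step, the sketch below shows how cosine similarity between two vectors is typically computed. It is a plain-Python illustration and not Hezar's implementation: the `cosine_similarity` helper and the toy 3-dimensional vectors are made up for this example, and it assumes the score follows the standard cosine definition.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors for illustration only (hypothetical values, not real embeddings)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # points in the same direction as a
c = [3.0, 0.0, -1.0]  # orthogonal to a, since 1*3 + 2*0 + 3*(-1) = 0

print(cosine_similarity(a, b))  # close to 1.0: same direction
print(cosine_similarity(a, c))  # 0.0: no directional overlap
```

Because the score depends only on direction and not on vector length, two words can be rated as highly similar even when their vectors differ in magnitude.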