Hailay committed on
Commit 014222a · verified · 1 Parent(s): cb2baa9

Update README.md

Files changed (1)
  1. README.md +19 -7
README.md CHANGED
@@ -1,11 +1,21 @@
-
+ ---
+ datasets:
+ - Hailay/TigQA
+ language:
+ - ti
+ ---
+ datasets:
+ - Hailay/TigQA
+ language:
+ - ti
+ ---
  # Geez Word2Vec Model

  This repository contains a Word2Vec model trained on the TIGQA dataset using a custom tokenizer with SpaCy.

  ## Model Description

- The Word2Vec model in this repository has been trained to generate word embeddings for Geez script Tigrinya text . The model captures semantic relationships between words in the Geez language based on their context in the TIGQA dataset.
+ The Word2Vec model in this repository has been trained to generate word embeddings for Geez script Tigrinya text. The model captures semantic relationships between words in the Geez language based on their context in the TIGQA dataset.

  ## Usage

@@ -24,18 +34,20 @@ from gensim.models import Word2Vec
  # Load the trained Word2Vec model
  model = Word2Vec.load("Geez_word2vec_skipgram.model")

- # Get vector for a word
+ # Get a vector for a word
  word_vector = model.wv['ሰብ']
  print(f"Vector for 'ሰብ': {word_vector}")

- # Find most similar words
+ # Find the most similar words
  similar_words = model.wv.most_similar('ሰብ')
  print(f"Words similar to 'ሰብ': {similar_words}")

  Dataset Source
- The TIGQA dataset used for training this model contains text data in the Geez script Tigrinya. It is a publicly available dataset widely used for research and development of NLP models for the Tigrinya language.
+
+ The TIGQA dataset for training this model contains text data in the Geez script of the Tigrinya language.
+ It is a publicly available dataset widely used for research and development of NLP models for the Tigrinya language.

- For more information about the TIGQA dataset, visit this link.
+ For more information about the TIGQA dataset, visit this link. https://zenodo.org/records/11423987

  License
- This Word2Vec model and its associated files are released under the MIT License.
+ This Word2Vec model and its associated files are released under the MIT License.
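
The README's usage snippet calls `model.wv.most_similar`, which ranks vocabulary entries by cosine similarity to the query word's vector. The sketch below illustrates that ranking in plain Python, without gensim or the trained model. The three-dimensional vectors and the names `toy_word_a` and `toy_word_b` are hypothetical placeholders chosen for illustration, not values from `Geez_word2vec_skipgram.model`, and the helper functions are not part of the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real vectors would come from model.wv.
toy_vectors = {
    "ሰብ": [0.9, 0.1, 0.0],
    "toy_word_a": [0.8, 0.2, 0.1],
    "toy_word_b": [0.0, 0.1, 0.9],
}


def most_similar(word, vectors, topn=2):
    """Rank every other word by cosine similarity to `word`, best first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


ranked = most_similar("ሰብ", toy_vectors)
print(ranked)
```

With the real model, a lookup for a word that is missing from the vocabulary raises `KeyError`, so checking `word in model.wv` before querying is a reasonable precaution.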