Commit eb75d8d by baobuiquang · verified · 1 parent: fb47cf1

Update README.md

  1. README.md +30 -0
README.md CHANGED
@@ -14,3 +14,33 @@ language:
  | `vi` | Diacritics | | [`VI_DIACRITICS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_DIACRITICS.txt) |
  | `vi` | Stopwords | [source](https://github.com/stopwords/vietnamese-stopwords) | [`VI_STOPWORDS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_STOPWORDS.txt) |
  | `en` | Stopwords | nltk | [`EN_STOPWORDS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/EN_STOPWORDS.txt) |
+
+ ## Short-term Usage
+ ```python
+ import requests
+ punctuation = requests.get("https://huggingface.co/onelevelstudio/NLPT/raw/main/PUNCTUATION.txt").text.splitlines()
+ ```
+
+ ## Long-term Usage
+ ```python
+ from huggingface_hub import hf_hub_download as HF_Download
+ import json
+
+ with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="PUNCTUATION.txt"), mode="r", encoding="utf-8") as f:
+     DATASET_punctuation = set(f.read().splitlines())
+
+ with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_DIACRITICS.txt"), mode="r", encoding="utf-8") as f:
+     DATASET_diacritics_vi = f.read().splitlines()
+
+ with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_VOCAB.txt"), mode="r", encoding="utf-8") as f:
+     DATASET_vocab_vi = f.read().splitlines()
+
+ with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_STOPWORDS.txt"), mode="r", encoding="utf-8") as f:
+     DATASET_stopwords_vi = f.read().splitlines()
+
+ with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="EN_STOPWORDS.txt"), mode="r", encoding="utf-8") as f:
+     DATASET_stopwords_en = f.read().splitlines()
+
+ with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_SYNONYMS.json"), mode="r", encoding="utf-8") as f:
+     DATASET_synonyms_vi = json.load(f)
+ ```
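
Reviewer note: the long-term snippet repeats the same open/read pattern six times. A possible way to fold it into one helper is sketched below. The names `load_resource`, `hub_fetch`, and `REPO_ID` are hypothetical and not part of this commit; only `hf_hub_download` with its `repo_id` and `filename` arguments comes from the README. The sketch assumes `.txt` files are line-oriented and `.json` files hold a single JSON document.

```python
import json
from pathlib import Path
from typing import Callable, Union

REPO_ID = "onelevelstudio/NLPT"  # repository used throughout the README


def hub_fetch(filename: str) -> str:
    """Download `filename` from the Hub (or reuse the local cache) and return its local path."""
    # Imported lazily so the parsing logic can be exercised without the package installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename)


def load_resource(filename: str, fetch: Callable[[str], str] = hub_fetch) -> Union[list, dict]:
    """Hypothetical helper: list of lines for text files, parsed JSON for .json files."""
    path = Path(fetch(filename))
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    return text.splitlines()
```

With this in place, each dataset becomes a one-liner, for example `DATASET_stopwords_vi = load_resource("VI_STOPWORDS.txt")` or `DATASET_punctuation = set(load_resource("PUNCTUATION.txt"))`. Passing a different `fetch` callable lets the parsing be checked against local files without network access.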