onelevelstudio/NLPT


baobuiquang committed on Jul 1, 2025

Commit eb75d8d · verified · 1 parent: fb47cf1

Update README.md

Files changed (1)

README.md +30 -0

README.md CHANGED

@@ -14,3 +14,33 @@ language:
 | `vi`     | Diacritics  |                                                             | [`VI_DIACRITICS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_DIACRITICS.txt) |
 | `vi`     | Stopwords   | [source](https://github.com/stopwords/vietnamese-stopwords) | [`VI_STOPWORDS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_STOPWORDS.txt)   |
 | `en`     | Stopwords   | nltk                                                        | [`EN_STOPWORDS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/EN_STOPWORDS.txt)   |
+## Short-term Usage
+```python
+import requests
+punctuation = requests.get("https://huggingface.co/onelevelstudio/NLPT/raw/main/PUNCTUATION.txt").text.splitlines()
+```
+## Long-term Usage
+```python
+from huggingface_hub import hf_hub_download as HF_Download
+import json
+with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="PUNCTUATION.txt"), mode="r", encoding="utf-8") as f:
+    DATASET_punctuation = set(f.read().splitlines())
+with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_DIACRITICS.txt"), mode="r", encoding="utf-8") as f:
+    DATASET_diacritics_vi = f.read().splitlines()
+with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_VOCAB.txt"), mode="r", encoding="utf-8") as f:
+    DATASET_vocab_vi = f.read().splitlines()
+with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_STOPWORDS.txt"), mode="r", encoding="utf-8") as f:
+    DATASET_stopwords_vi = f.read().splitlines()
+with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="EN_STOPWORDS.txt"), mode="r", encoding="utf-8") as f:
+    DATASET_stopwords_en = f.read().splitlines()
+with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_SYNONYMS.json"), mode="r", encoding="utf-8") as f:
+    DATASET_synonyms_vi = json.load(f)
+```
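The README snippets above only load the resources. Either loading route yields plain lists or sets of strings. As a rough illustration of how those might be combined, the sketch below blanks out punctuation and then filters stopwords, matching multi-syllable entries (e.g. a phrase like "bởi vì") as whole phrases. The helper names `strip_punctuation`, `remove_stopwords` and `normalize` are hypothetical and not part of NLPT. The sketch also assumes lowercase whitespace tokenization and stopword entries whose syllables are separated by spaces rather than underscores, which may not match the actual files.

```python
from collections.abc import Iterable


def strip_punctuation(text: str, punctuation: Iterable[str]) -> str:
    # Replace longer marks first so a multi-character entry (if the file has any)
    # is removed whole instead of character by character.
    for mark in sorted(punctuation, key=len, reverse=True):
        if mark:  # skip empty lines left over from the file
            text = text.replace(mark, " ")
    return text


def remove_stopwords(tokens: list[str], stopwords: Iterable[str]) -> list[str]:
    # Assumption: multi-syllable stopwords use spaces between syllables,
    # so each entry becomes a tuple of lowercase tokens.
    phrases = {tuple(s.lower().split()) for s in stopwords if s.strip()}
    max_len = max((len(p) for p in phrases), default=0)
    kept: list[str] = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest stopword phrase starting at i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in phrases:
                i += n  # drop the whole matched phrase
                break
        else:
            # No stopword phrase starts here, so keep the token.
            kept.append(tokens[i])
            i += 1
    return kept


def normalize(text: str, punctuation: Iterable[str], stopwords: Iterable[str]) -> list[str]:
    # Lowercase, blank out punctuation, split on whitespace, then filter stopwords.
    cleaned = strip_punctuation(text.lower(), punctuation)
    return remove_stopwords(cleaned.split(), stopwords)


if __name__ == "__main__":
    # Tiny inline stand-ins (not the real file contents); in practice pass
    # DATASET_punctuation and DATASET_stopwords_vi from the "Long-term Usage" snippet.
    sample_punct = {",", "."}
    sample_stop = {"là", "của", "bởi vì"}
    print(normalize("Tiếng Việt là ngôn ngữ của tôi, bởi vì tôi yêu nó.", sample_punct, sample_stop))
    # → ['tiếng', 'việt', 'ngôn', 'ngữ', 'tôi', 'tôi', 'yêu', 'nó']
```

Greedy longest matching removes a multi-syllable stopword as a unit even when one of its syllables is not itself a stopword. A dedicated Vietnamese word segmenter would likely be more robust than whitespace splitting.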