---
language:
- vi
- en
---

# NLPT

| Language | Dataset | Source | Download |
|----------|---------|--------|----------|
| `all` | Punctuation | | [`PUNCTUATION.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/PUNCTUATION.txt) |
| `vi` | Synonyms | [source](https://tudiendongnghia.com) | [`VI_SYNONYMS.json`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_SYNONYMS.json) |
| `vi` | Vocab | [source](https://github.com/duyet/vietnamese-wordlist) | [`VI_VOCAB.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_VOCAB.txt) |
| `vi` | Diacritics | | [`VI_DIACRITICS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_DIACRITICS.txt) |
| `vi` | Stopwords | [source](https://github.com/stopwords/vietnamese-stopwords) | [`VI_STOPWORDS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/VI_STOPWORDS.txt) |
| `en` | Stopwords | nltk | [`EN_STOPWORDS.txt`](https://huggingface.co/onelevelstudio/NLPT/raw/main/EN_STOPWORDS.txt) |
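
All download links in the table above share one base URL, so a raw link can be built from a file name. The sketch below is only an illustration: `BASE_URL`, `DATASET_FILES`, and `raw_url` are hypothetical names that are not part of this repository, and the file list is copied from the Download column.

```python
# Hypothetical helper, not part of the repository: builds raw-download URLs
# for the files listed in the table.
BASE_URL = "https://huggingface.co/onelevelstudio/NLPT/raw/main"

# File names copied from the Download column of the table.
DATASET_FILES = [
    "PUNCTUATION.txt",
    "VI_SYNONYMS.json",
    "VI_VOCAB.txt",
    "VI_DIACRITICS.txt",
    "VI_STOPWORDS.txt",
    "EN_STOPWORDS.txt",
]


def raw_url(filename):
    """Return the raw-download URL for one of the listed files."""
    if filename not in DATASET_FILES:
        raise ValueError(f"unknown dataset file: {filename}")
    return f"{BASE_URL}/{filename}"


print(raw_url("VI_STOPWORDS.txt"))
```

Rejecting unknown names catches typos before any request is made.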

## Short-term Usage
```python
import requests

# Fetches the file over HTTP on every run (nothing is cached locally).
url = "https://huggingface.co/onelevelstudio/NLPT/raw/main/PUNCTUATION.txt"
punctuation = requests.get(url).text.splitlines()
```

## Long-term Usage
```python
from huggingface_hub import hf_hub_download as HF_Download
import json

# hf_hub_download stores each file in the local Hugging Face cache and returns
# the cached path, so later runs reuse the downloaded copy.
with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="PUNCTUATION.txt"), mode="r", encoding="utf-8") as f:
    DATASET_punctuation = set(f.read().splitlines())

with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_DIACRITICS.txt"), mode="r", encoding="utf-8") as f:
    DATASET_diacritics_vi = f.read().splitlines()

with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_VOCAB.txt"), mode="r", encoding="utf-8") as f:
    DATASET_vocab_vi = f.read().splitlines()

with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_STOPWORDS.txt"), mode="r", encoding="utf-8") as f:
    DATASET_stopwords_vi = f.read().splitlines()

with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="EN_STOPWORDS.txt"), mode="r", encoding="utf-8") as f:
    DATASET_stopwords_en = f.read().splitlines()

# The synonyms file is JSON, so it is parsed instead of split into lines.
with open(HF_Download(repo_id="onelevelstudio/NLPT", filename="VI_SYNONYMS.json"), mode="r", encoding="utf-8") as f:
    DATASET_synonyms_vi = json.load(f)
```
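
Once loaded, the punctuation and stopword lists can be used to filter text. The following is a minimal sketch rather than a recommended pipeline: `clean_tokens` and the `demo_*` sets are hypothetical names introduced here, and the sketch assumes each line of `PUNCTUATION.txt` holds one punctuation mark. The demo sets are tiny stand-ins so the example runs offline.

```python
def clean_tokens(text, stopwords, punctuation):
    """Lowercase the text, strip punctuation marks, and drop stopwords."""
    # Replace each punctuation mark with a space so adjacent words stay apart.
    for mark in punctuation:
        if mark:  # skip empty entries, e.g. from a blank line in the file
            text = text.replace(mark, " ")
    # Plain whitespace tokenization: only single-word stopwords are matched.
    return [tok for tok in text.lower().split() if tok not in stopwords]


# Stand-in data; in practice pass DATASET_punctuation and
# set(DATASET_stopwords_en) from the loading code above.
demo_punctuation = {",", ".", "!"}
demo_stopwords = {"the", "is", "a"}

tokens = clean_tokens("The cat is here, a dog is there!", demo_stopwords, demo_punctuation)
print(tokens)  # ['cat', 'here', 'dog', 'there']
```

Converting a stopword list to a `set` first keeps each membership check constant-time. Whitespace splitting will not match multi-word stopword entries, which the Vietnamese list may contain, so that case would need a phrase-aware tokenizer.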