AnnyNguyen committed on Jan 6

Commit 4e34b1c (verified) · 1 Parent(s): 3b185c6

Update README.md

Files changed (1)

README.md +30 -32

README.md CHANGED

@@ -7,54 +7,52 @@ sdk: static
 pinned: false
 ---
-# 📦 ViSoLex Toolkit — Vietnamese Text Normalization & Processing
-**ViSoLex** is a powerful toolkit for **Vietnamese text normalization and processing**, designed for **NLP** environments and easy to install via **PyPI**.
-Resources (datasets, models) are stored and managed directly on the [Hugging Face Hub](https://huggingface.co/visolex).
 ---
-## 🚀 Key Features
-### 1. 🔧 **Basic Normalizer**: Basic text normalization
-* **Case folding**: convert the entire text to lowercase/uppercase/capitalize.
-* **Tone normalization**: normalize Vietnamese tone marks.
-* **Basic preprocessing**: remove extra whitespace and special characters, and format sentences.
-### 2. 😀 **Emoji Handler**: Emoji processing
-* **Detect emojis**: detect emojis in text.
-* **Split emoji text**: separate emojis from the sentence.
-* **Remove emojis**: remove all emojis.
-### 3. 📊 **Resource Management**: Data management
-* `list_datasets()`: List available datasets.
-* `load_dataset()`: Load a dataset from Hugging Face.
-* `get_dataset_info()`: View detailed dataset information.
-### 4. 🧠 **Task Models**: Task-specific models
-* **SpamReviewDetection**: Spam detection.
-* **HateSpeechDetection**: Hate speech detection.
-* **EmotionRecognition**: Emotion recognition.
-* **AspectSentimentAnalysis**: Sentiment analysis per aspect.
-### 5. 🧪 **Advanced Usage**: Combination & customization
-* Build **multi-step pipelines** for normalization and analysis.
-* Customize each processing step as needed.
-### 6. ✏ **Lexical Normalization**: Social media text normalization
-* `detect_nsw()`: Detect non-standard words.
-* `normalize_sentence()`: Normalize sentences containing non-standard words.
 ---
-## 📥 Installation
 ```bash
-pip install visolex
 ```

 pinned: false
 ---
+# 📦 ViSoNorm Toolkit — Vietnamese Text Normalization & Processing
+**ViSoNorm** is a specialized toolkit for **Vietnamese text normalization and processing**, optimized for **NLP** environments and easily installable via **PyPI**. Resources (datasets, models) are stored and managed directly on **Hugging Face Hub** and **GitHub Releases**.
 ---
+## 🚀 Key Features
+### 1. 🔧 **BasicNormalizer** — Basic Text Normalization
+* **Case folding**: convert entire text to lowercase/uppercase/capitalize.
+* **Tone normalization**: normalize Vietnamese tone marks.
+* **Basic preprocessing**: remove extra whitespace, special characters, sentence formatting.
+### 2. 😀 **EmojiHandler** — Emoji Processing
+* **Detect emojis**: detect emojis in text.
+* **Split emoji text**: separate emojis from sentences.
+* **Remove emojis**: remove all emojis.
+### 3. ✏️ **Lexical Normalization** — Social Media Text Normalization
+* **ViSoLexNormalizer**: Normalize text using deep learning models from HuggingFace.
+* **NswDetector**: Detect non-standard words (NSW).
+* **detect_nsw()**: Utility function to detect NSW.
+* **normalize_sentence()**: Utility function to normalize sentences.
+### 4. 📊 **Resource Management** — Dataset Management
+* `list_datasets()` — List available datasets.
+* `load_dataset()` — Load dataset from GitHub Releases.
+* `get_dataset_info()` — View detailed dataset information.
+### 5. 🧠 **Task Models** — Task Processing Models
+* **SpamReviewDetection** — Spam detection.
+* **HateSpeechDetection** — Hate speech detection.
+* **HateSpeechSpanDetection** — Hate speech span detection.
+* **EmotionRecognition** — Emotion recognition.
+* **AspectSentimentAnalysis** — Aspect-based sentiment analysis.
 ---
+## 📥 Installation
+### Install from PyPI (Recommended)
 ```bash
+pip install visonorm
 ```
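
To make the basic normalization and emoji features listed in the diff more concrete, here is a minimal standard-library sketch of what case folding, whitespace cleanup, Unicode composition of tone marks, and emoji detection/removal can look like. It does not use the `visonorm` package and does not reflect its actual API: the function names `basic_normalize`, `detect_emojis`, and `remove_emojis` and the emoji code point ranges are assumptions made for illustration. NFC composition only unifies how accented characters are encoded; it does not resolve tone-placement conventions, which a real tone normalizer would also need to handle.

```python
import re
import unicodedata

# Rough emoji code point ranges (assumption for illustration, not exhaustive).
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)


def basic_normalize(text: str, case: str = "lower") -> str:
    """Hypothetical helper: compose Unicode, collapse whitespace, fold case."""
    # NFC merges a base letter and its combining marks into one code point,
    # so visually identical Vietnamese strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    if case == "lower":
        return text.lower()
    if case == "upper":
        return text.upper()
    if case == "capitalize":
        return text.capitalize()
    return text  # unknown mode: leave the case unchanged


def detect_emojis(text: str) -> list[str]:
    """Hypothetical helper: return emoji characters in order of appearance."""
    return EMOJI_RE.findall(text)


def remove_emojis(text: str) -> str:
    """Hypothetical helper: drop emoji, then tidy the leftover whitespace."""
    return re.sub(r"\s+", " ", EMOJI_RE.sub("", text)).strip()


if __name__ == "__main__":
    print(basic_normalize("  Tie\u0302\u0301ng   VIỆT  "))
    print(detect_emojis("ngon quá 😋👍"))
    print(remove_emojis("ngon quá 😋 nha"))
```

A regex over fixed ranges keeps the sketch dependency-free, at the cost of missing newer or multi-code-point emoji sequences.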