AnnyNguyen committed
Commit 7b2e4c4 · verified · 1 Parent(s): aec1181

Upload README.md with huggingface_hub

Files changed (1):
  README.md (+49 −49)
README.md CHANGED
@@ -1,16 +1,14 @@
 ---
-language: vi
 tags:
-- hate-speech-detection
 - vietnamese
-- transformer
-license: apache-2.0
 datasets:
 - visolex/ViHOS
-metrics:
-- precision
-- recall
-- f1
 model-index:
 - name: visobert-hsd-span
   results:
@@ -18,64 +16,66 @@ model-index:
       type: token-classification
       name: Hate Speech Span Detection
     dataset:
-      name: ViHOS
-      type: custom
     metrics:
-    - name: Precision
-      type: precision
-      value: <INSERT_PRECISION>
-    - name: Recall
-      type: recall
-      value: <INSERT_RECALL>
-    - name: F1 Score
-      type: f1
-      value: <INSERT_F1>
-base_model:
-- uitnlp/visobert
-pipeline_tag: token-classification
 ---
 
-# ViSoBERT-HSD-Span
-
-This model is fine-tuned from [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert) on the **visolex/ViHOS** dataset for span-level hate/offensive detection in Vietnamese comments.
 
 ## Model Details
 
-* **Base Model**: [`uitnlp/visobert`](https://huggingface.co/uitnlp/visobert)
-* **Dataset**: [visolex/ViHOS](https://huggingface.co/datasets/visolex/ViHOS)
-* **Fine-tuning**: HuggingFace Transformers
 
 ### Hyperparameters
 
-* Batch size: `16`
-* Learning rate: `5e-5`
-* Epochs: `100`
-* Max sequence length: `128`
-* Early stopping: `5`
 
 ## Usage
 
 ```python
 from transformers import AutoTokenizer, AutoModelForTokenClassification
 
-tokenizer = AutoTokenizer.from_pretrained("visolex/visobert-hsd-span")
-model = AutoModelForTokenClassification.from_pretrained("visolex/visobert-hsd-span")
-
-text = "Nói cái lol . t thấy thô tục vl"
-inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
 with torch.no_grad():
-    outputs = model(**inputs)
-    logits = outputs.logits  # [batch, seq_len, num_labels]
-# For binary: use sigmoid, for multi-class: use softmax+argmax
-probs = torch.sigmoid(logits)
-preds = (probs > 0.5).long().squeeze().tolist()  # [seq_len]
-tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
-
-span_labels = [p[0] for p in preds]
-
-# Take tokens with span label = 1, dropping <s> and </s> if desired
-span_tokens = [token for token, label in zip(tokens, span_labels) if label == 1 and token not in ['<s>', '</s>']]
-
-print("Span tokens:", span_tokens)
-print("Span text:", tokenizer.convert_tokens_to_string(span_tokens))
-```
 
 ---
+license: apache-2.0
+base_model: visobert
 tags:
 - vietnamese
+- hate-speech
+- span-detection
+- token-classification
+- nlp
 datasets:
 - visolex/ViHOS
 model-index:
 - name: visobert-hsd-span
   results:
       type: token-classification
       name: Hate Speech Span Detection
     dataset:
+      name: visolex/ViHOS
+      type: visolex/ViHOS
     metrics:
+    - type: f1
+      value: N/A
+    - type: precision
+      value: N/A
+    - type: recall
+      value: N/A
+    - type: exact_match
+      value: 0.1230
 ---
 
+# visobert-hsd-span: Hate Speech Span Detection (Vietnamese)
 
+This model is a fine-tuned version of [visobert](https://huggingface.co/visobert) for Vietnamese **Hate Speech Span Detection**.
 
 ## Model Details
 
+- Base Model: `visobert`
+- Description: Vietnamese Hate Speech Span Detection
+- Framework: HuggingFace Transformers
+- Task: Hate Speech Span Detection (token/char-level spans)
 
 ### Hyperparameters
 
+- Max sequence length: `64`
+- Learning rate: `5e-6`
+- Batch size: `32`
+- Epochs: `100`
+- Early stopping patience: `5`
+
+## Results
+
+- F1: `N/A`
+- Precision: `N/A`
+- Recall: `N/A`
+- Exact Match: `0.1230`
 
 ## Usage
 
 ```python
 from transformers import AutoTokenizer, AutoModelForTokenClassification
+import torch
 
+model_name = "visobert-hsd-span"
+tok = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForTokenClassification.from_pretrained(model_name)
+text = "Ví dụ câu tiếng Việt nội dung thù ghét ..."
+enc = tok(text, return_tensors="pt", truncation=True, max_length=256, is_split_into_words=False)
 with torch.no_grad():
+    logits = model(**enc).logits
+pred_ids = logits.argmax(-1)[0].tolist()
+# TODO: convert pred_ids -> spans according to your label scheme (BIO/BILOU/char-offset)
+```
 
+## License
 
+Apache-2.0
 
+## Acknowledgments
+
+- Base model: [visobert](https://huggingface.co/visobert)
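
---

The updated usage snippet in this commit ends with a TODO for converting `pred_ids` into spans. A minimal sketch of that step, assuming the simple binary scheme from the earlier README (label `1` marks tokens inside a hateful span) and character offsets obtained by tokenizing with `return_offsets_mapping=True`; `merge_token_spans` is a hypothetical helper, not part of the model repo:

```python
def merge_token_spans(offsets, labels, positive_label=1):
    """Merge per-token character offsets whose predicted label equals
    `positive_label` into contiguous (start, end) character spans.

    `offsets` is the tokenizer's offset_mapping; (0, 0) entries
    (special tokens such as <s>/</s>) carry no surface text and are skipped.
    """
    spans, current = [], None
    for (start, end), label in zip(offsets, labels):
        if start == end:  # special token, skip
            continue
        if label == positive_label:
            # extend the currently open span, or open a new one
            current = (current[0], end) if current is not None else (start, end)
        elif current is not None:
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return spans


# Example: the tokens covering chars 4-7 and 8-11 are labeled 1,
# so they merge into a single character span (4, 11).
offsets = [(0, 0), (0, 3), (4, 7), (8, 11), (12, 15), (0, 0)]
labels = [0, 0, 1, 1, 0, 0]
print(merge_token_spans(offsets, labels))  # [(4, 11)]
```

In practice `offsets` would come from `tok(text, return_offsets_mapping=True)["offset_mapping"]` and `labels` from `logits.argmax(-1)`; whether label `1` actually marks in-span tokens depends on the model's `id2label` config, which this commit does not show.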