Improve model card: Add library_name, correct paper link, add code, model, and dataset links
This PR updates the model card for `tkhangg0910/viconbert-base` to enhance its discoverability and usefulness on the Hugging Face Hub.
Specifically, it includes the following improvements:
- Adds the `library_name: transformers` tag to the metadata, which enables the automated "How to use" widget on the model page. This is supported by the `transformers` imports in the `Example usage` code snippet and the `auto_map` entry in `config.json`.
- Corrects the paper link at the top of the model card to point to the official Hugging Face paper page: [ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations](https://huggingface.co/papers/2511.12249). The previous link incorrectly pointed to the model repository itself.
- Adds explicit links to the GitHub repository, the model itself, and the associated dataset at the top of the model card for improved navigation, aligning with the structure found in the original GitHub README.
- Updates the `ViConBERT models` table to include the `Backbone` column, providing more detailed information about the base models used, consistent with the original GitHub repository.
- Populates the empty `Citation` section with a BibTeX entry for the paper and adds an `Acknowledgement` section crediting PhoBERT as the backbone model.
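The card's metadata tags the model as `pipeline_tag: feature-extraction`, meaning its downstream contract is "token hidden states in, one fixed-size vector out." As a rough, library-agnostic illustration of that contract, here is a minimal masked mean-pooling sketch in NumPy; the pooling choice and the 768-dimensional hidden size are assumptions for the demo, not taken from the ViConBERT paper or its usage snippet.

```python
import numpy as np

# Illustrative sketch only: masked mean pooling over token embeddings,
# a common way a feature-extraction model's hidden states are reduced
# to one vector per sentence. Dimensions are assumed for the demo.
def mean_pool(hidden_states: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    # hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)     # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid divide-by-zero on empty rows
    return summed / counts

hidden = np.random.randn(2, 8, 768)  # stand-in for BERT output: 2 sentences, 8 tokens
mask = np.ones((2, 8))               # no padding in this toy batch
print(mean_pool(hidden, mask).shape)  # (2, 768)
```

Masking matters because padded positions would otherwise dilute the pooled vector; the model's actual `Example usage` snippet may pool differently.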
````diff
@@ -1,19 +1,21 @@
 ---
-license: apache-2.0
-language:
-- vi
 base_model:
 - vinai/phobert-large
+language:
+- vi
+license: apache-2.0
 pipeline_tag: feature-extraction
 tags:
 - bert
 - wsd
 - vietnamese
 - semantic_similarity
+library_name: transformers
 ---
+
 # ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations
 
-[Paper](https://huggingface.co/tkhangg0910/viconbert-base)
+[Paper](https://huggingface.co/papers/2511.12249) | [Code](https://github.com/tkhangg0910/ViConBERT) | [Model](https://huggingface.co/tkhangg0910/viconbert-base) | [Dataset](https://huggingface.co/datasets/tkhangg0910/ViConWSD)
 
 This repository is official implementation of the paper: ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations
 
@@ -42,10 +44,10 @@ pip3 install -r requirements.txt
 ### ViConBERT models <a name="models2"></a>
 
 
-Model | #params | Arch. | Max length | Training data
----|---|---|---|---
-[`tkhangg0910/viconbert-base`](https://huggingface.co/tkhangg0910/viconbert-base) | 135M | base | 256 | [ViConWSD](https://huggingface.co/datasets/tkhangg0910/ViConWSD)
-[`tkhangg0910/viconbert-large`](https://huggingface.co/tkhangg0910/viconbert-large) | 370M | large | 256 | [ViConWSD](https://huggingface.co/datasets/tkhangg0910/ViConWSD)
+Model | #params | Arch. | Max length | Backbone | Training data
+---|---|---|---|---|---
+[`tkhangg0910/viconbert-base`](https://huggingface.co/tkhangg0910/viconbert-base) | 135M | base | 256 | [PhoBERT-base](https://huggingface.co/vinai/phobert-base) | [ViConWSD](https://huggingface.co/datasets/tkhangg0910/ViConWSD)
+[`tkhangg0910/viconbert-large`](https://huggingface.co/tkhangg0910/viconbert-large) | 370M | large | 256 | [PhoBERT-large](https://huggingface.co/vinai/phobert-large) | [ViConWSD](https://huggingface.co/datasets/tkhangg0910/ViConWSD)
 
 
 ### Example usage <a name="usage2"></a>
@@ -127,3 +129,17 @@ print(f"Similarity between 2: {target_2} and 3:{target_3}: {sim_2:.4f}")
 <em>Contextual separation of "Khoan", "chạy", and zero-shot ability for unseen words</em>
 </p>
 
+## Citation
+If you find ViConBERT useful for your research and applications, please cite using this BibTeX:
+
+```bibtex
+@article{tkhangg09102025viconbert,
+  title={ViConBERT: Context-Gloss Aligned Vietnamese Word Embedding for Polysemous and Sense-Aware Representations},
+  author={Tkhangg0910 and {others}},
+  journal={arXiv preprint arXiv:2511.12249},
+  year={2025}
+}
+```
+
+## Acknowledgement
+[PhoBERT](https://github.com/VinAIResearch/PhoBERT): ViConBERT used PhoBERT as backbone model.
````