---
license: apache-2.0
language:
- hu
base_model:
- SZTAKI-HLT/hubert-base-cc
- FacebookAI/xlm-roberta-base
---

# 🧠 Static Word Embeddings for Hungarian (huBERT & XLM-RoBERTa)

This repository contains static word embedding models extracted from the following BERT-style encoder models:

- [`SZTAKI-HLT/hubert-base-cc`](https://huggingface.co/SZTAKI-HLT/hubert-base-cc)
- [`FacebookAI/xlm-roberta-base`](https://huggingface.co/FacebookAI/xlm-roberta-base)

## 📦 Available Embedding Variants

Each model is provided in three static embedding variants:

- **Decontextualized**: Token embeddings extracted without any surrounding context.
- **Aggregate**: Static embeddings computed by averaging a word's contextual representations across the different contexts in which it appears.
- **X2Static**: Static embeddings trained with the **X2Static** method, which learns static representations from a contextual model.

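To make the difference between the first two variants concrete, here is a minimal sketch in plain Python. It is not the extraction code used for these models: `toy_encode` is a hypothetical stand-in for a contextual encoder such as huBERT, the helper names are made up for illustration, and subword pooling is left out. X2Static involves a training procedure and is not sketched here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Hypothetical encoder signature: tokenized sentence in, one vector per token out.
Encoder = Callable[[Sequence[str]], List[Vector]]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def decontextualized(word: str, encode: Encoder) -> Vector:
    """Encode the word on its own, with no surrounding context."""
    return encode([word])[0]


def aggregate(corpus: Sequence[Sequence[str]], encode: Encoder) -> Dict[str, Vector]:
    """Average each word's contextual vectors over all of its occurrences."""
    occurrences: Dict[str, List[Vector]] = defaultdict(list)
    for sentence in corpus:
        for token, vector in zip(sentence, encode(sentence)):
            occurrences[token].append(vector)
    return {word: mean_vector(vecs) for word, vecs in occurrences.items()}


def toy_encode(sentence: Sequence[str]) -> List[Vector]:
    """Toy stand-in (assumption): vectors vary with sentence length and position."""
    n = len(sentence)
    return [[float(len(tok)), float(n), float(i)] for i, tok in enumerate(sentence)]


corpus = [["a", "kutya", "ugat"], ["kutya", "fut"]]
static = aggregate(corpus, toy_encode)
print(static["kutya"])                        # [5.0, 2.5, 0.5]
print(decontextualized("kutya", toy_encode))  # [5.0, 1.0, 0.0]
```

The aggregate vector for `kutya` is the mean of its two in-context vectors, while the decontextualized vector comes from encoding the word in isolation.
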
## 🧪 Use Case

These embeddings were developed and evaluated as part of the paper: **_A Comparative Analysis of Static Word Embeddings for Hungarian_** by *Máté Gedeon*. They can be used for intrinsic tasks (e.g., word analogies) and extrinsic tasks (e.g., POS tagging, NER) in Hungarian NLP applications.

The paper can be found here: https://arxiv.org/abs/2505.07809

The corresponding GitHub repository: https://github.com/gedeonmate/hungarian_static_embeddings

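As an illustration of the word-analogy task, the sketch below answers "férfi is to nő as király is to ?" with vector arithmetic and cosine similarity. The `toy` table holds hand-made 3-dimensional vectors, not values from these models, and the function names are hypothetical. In practice you would load one of the embedding files from this repository instead; that step is omitted because it depends on the file format.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """Return the word closest to b - a + c, excluding the three query words."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hand-made toy vectors (assumption), chosen so the analogy is easy to follow.
toy = {
    "férfi": [1.0, 0.0, 0.0],
    "nő": [1.0, 1.0, 0.0],
    "király": [1.0, 0.0, 1.0],
    "királynő": [1.0, 1.0, 1.0],
    "asztal": [0.0, 0.0, -1.0],
}

print(analogy(toy, "férfi", "nő", "király"))  # királynő
```
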
## 🙏 Citation

If you use these models, code, or any part of the accompanying materials in your research, please cite:

```bibtex
@article{Gedeon_2025,
  title={A Comparative Analysis of Static Word Embeddings for Hungarian},
  volume={17},
  ISSN={2061-2079},
  url={http://dx.doi.org/10.36244/ICJ.2025.2.4},
  DOI={10.36244/icj.2025.2.4},
  number={2},
  journal={Infocommunications Journal},
  publisher={Infocommunications Journal},
  author={Gedeon, Máté},
  year={2025},
  pages={28–34}
}
```