	---
	license: apache-2.0
	language:
	- hu
	base_model:
	- SZTAKI-HLT/hubert-base-cc
	- FacebookAI/xlm-roberta-base
	---

	# 🧠 Static Word Embeddings for Hungarian (huBERT & XLM-RoBERTa)

	This repository contains static word embedding models extracted from the following contextual Transformer encoder models:

	- [`SZTAKI-HLT/hubert-base-cc`](https://huggingface.co/SZTAKI-HLT/hubert-base-cc)
	- [`FacebookAI/xlm-roberta-base`](https://huggingface.co/FacebookAI/xlm-roberta-base)

	## 📦 Available Embedding Variants

	Each model is provided in three static embedding variants:

	- Decontextualized: Embeddings obtained by feeding each word to the model on its own, without any surrounding context.
	- Aggregate: Static embeddings computed by averaging a word's contextual representations across the different contexts in which it appears.
	- X2Static: Static embeddings trained with the X2Static method, which is designed to learn static representations from a contextual model.
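
To make the Aggregate variant concrete, here is a minimal sketch of the averaging step. It is not the code used to build these models: the function name `aggregate_embeddings` and the toy vectors are assumptions made for illustration. In practice the per-occurrence vectors would come from a contextual model, not from hand-written numbers.

```python
from collections import defaultdict


def aggregate_embeddings(occurrences):
    """Average the contextual vectors observed for each word.

    occurrences: iterable of (word, vector) pairs, one pair per
    occurrence of the word in a corpus. Vectors are equal-length
    lists of floats.
    """
    sums = {}
    counts = defaultdict(int)
    for word, vec in occurrences:
        if word not in sums:
            sums[word] = [0.0] * len(vec)
        for i, x in enumerate(vec):
            sums[word][i] += x
        counts[word] += 1
    # Divide each accumulated sum by the number of occurrences.
    return {w: [s / counts[w] for s in total] for w, total in sums.items()}


# Toy 2-dimensional "contextual" vectors (made-up numbers, not model output).
toy_occurrences = [
    ("alma", [1.0, 0.0]),
    ("alma", [0.0, 1.0]),
    ("körte", [2.0, 2.0]),
]
static = aggregate_embeddings(toy_occurrences)
```

A Decontextualized vector, by contrast, would correspond to a single pass over the word in isolation, so no averaging over contexts is involved.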

	## 🧪 Use Case

	These embeddings were developed and evaluated for the paper _A Comparative Analysis of Static Word Embeddings for Hungarian_ by Máté Gedeon. They can be used for intrinsic tasks (e.g., word analogies) and extrinsic tasks (e.g., POS tagging, NER) in Hungarian NLP applications.

	The paper can be found here: https://arxiv.org/abs/2505.07809

	The corresponding GitHub repository: https://github.com/gedeonmate/hungarian_static_embeddings
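
As an illustration of the word analogy task mentioned above, the following sketch solves "a is to b as c is to ?" with vector offsets and cosine similarity. The helper names (`cosine`, `solve_analogy`) and the 3-dimensional toy vectors are hypothetical. The sketch does not load the files in this repository and may differ from the evaluation procedure used in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb, a, b, c):
    """Return the word closest to emb[b] - emb[a] + emb[c], excluding a, b, c."""
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Toy embeddings (made-up numbers chosen so the offsets line up).
toy = {
    "férfi": [1.0, 0.0, 0.0],
    "nő": [1.0, 1.0, 0.0],
    "király": [1.0, 0.0, 1.0],
    "királynő": [1.0, 1.0, 1.0],
    "alma": [0.0, 0.2, 0.1],
}
answer = solve_analogy(toy, "férfi", "nő", "király")
```

With real embeddings, `emb` would be a mapping from vocabulary words to their static vectors. A vectorized nearest-neighbour search would be preferable for a full vocabulary.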

	## 🙏 Citation

	If you use these models, code, or any part of the accompanying materials in your research, please cite:

	```bibtex
	@article{Gedeon_2025,
	title={A Comparative Analysis of Static Word Embeddings for Hungarian},
	volume={17},
	ISSN={2061-2079},
	url={http://dx.doi.org/10.36244/ICJ.2025.2.4},
	DOI={10.36244/icj.2025.2.4},
	number={2},
	journal={Infocommunications Journal},
	publisher={Infocommunications Journal},
	author={Gedeon, Máté},
	year={2025},
	pages={28–34}
	}
	```