gedeonmate/static_hungarian_bert

gedeonmate committed on Apr 18, 2025

Commit 469280b (verified) · 1 Parent(s): 10a643e

Update README.md

Files changed (1)

README.md +29 -3

@@ -1,3 +1,29 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- hu
+base_model:
+- SZTAKI-HLT/hubert-base-cc
+- FacebookAI/xlm-roberta-base
+---
+# 🧠 Static Word Embeddings for Hungarian (huBERT & XLM-RoBERTa)
+This repository contains static word embedding models extracted from the following BERT-based models:
+- [`SZTAKI-HLT/hubert-base-cc`](https://huggingface.co/SZTAKI-HLT/hubert-base-cc)
+- [`FacebookAI/xlm-roberta-base`](https://huggingface.co/FacebookAI/xlm-roberta-base)
+## 📦 Available Embedding Variants
+Each model is provided in three static embedding variants:
+- **Decontextualized**: Embeddings extracted by passing each word through the model on its own, without any surrounding context.
+- **Aggregate**: Static embeddings computed by averaging a word's contextual token representations across the different contexts in which it appears.
+- **X2Static**: Static embeddings learned with the **X2Static** method, which trains static word vectors by distilling knowledge from a contextual model.
+## 🧪 Use Case
+These embeddings were developed and evaluated in the paper **_A Comparative Analysis of Static Word Embeddings for Hungarian_** by *Máté Gedeon*.
+They can be used for intrinsic evaluation (e.g., word analogies) and for downstream tasks such as POS tagging and NER in Hungarian NLP applications.
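The Aggregate variant described in the README can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the author's extraction code: it assumes the contextual vectors have already been produced by huBERT or XLM-RoBERTa (which layer is used is not specified here), it assumes a word split into several subword tokens is collapsed by mean pooling, and the function names are hypothetical. Under the same assumptions, the Decontextualized variant would correspond to applying the subword pooling to a single forward pass over the word on its own.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def word_vector_in_context(subword_vectors: Sequence[Vector]) -> Vector:
    """Collapse one occurrence of a word into a single vector.

    Assumption: mean pooling over the word's subword tokens. Applied to a
    forward pass over the bare word, this would give a decontextualized vector.
    """
    return mean_pool(subword_vectors)


def aggregate_embedding(occurrences: Sequence[Sequence[Vector]]) -> Vector:
    """Aggregate variant: average the per-occurrence vectors of a word
    over all the contexts it was observed in."""
    return mean_pool([word_vector_in_context(occ) for occ in occurrences])


# Toy 2-dimensional "contextual" vectors, made up for illustration only.
occurrences = [
    [[1.0, 0.0], [3.0, 2.0]],  # occurrence 1: two subword tokens -> [2.0, 1.0]
    [[0.0, 1.0]],              # occurrence 2: a single token     -> [0.0, 1.0]
]
print(aggregate_embedding(occurrences))  # -> [1.0, 1.0]
```

Pooling within each occurrence first means every context contributes exactly one vector to the average, regardless of how many subword pieces the tokenizer produces for the word.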
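For the word-analogy evaluation mentioned in the README, a common approach is 3CosAdd: answer "a is to b as c is to ?" with the vocabulary word whose vector is closest, by cosine similarity, to b - a + c. The sketch below assumes the embeddings have already been loaded into a Python dict mapping words to vectors (the repository's file format is not described in this README, so loading is omitted). The small Hungarian vocabulary and its 3-dimensional vectors are made-up toy values, not taken from the released models, and the paper's exact evaluation protocol may differ.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> Optional[str]:
    """3CosAdd: return the word closest to b - a + c, excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    best_word, best_score = None, -math.inf
    for word, vec in emb.items():
        if word in (a, b, c):
            continue  # query words are conventionally excluded from answers
        score = cosine(vec, target)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical toy embeddings, for illustration only.
emb = {
    "férfi": [1.0, 0.0, 0.0],     # man
    "nő": [0.0, 1.0, 0.0],        # woman
    "király": [1.0, 0.0, 1.0],    # king
    "királynő": [0.0, 1.0, 1.0],  # queen
    "alma": [0.5, 0.5, -1.0],     # apple
}
print(solve_analogy(emb, "férfi", "király", "nő"))  # -> királynő
```

With real embeddings the same loop runs over the whole vocabulary, so in practice the vectors are usually normalized once and stored in a matrix for faster lookup.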