StyleECU-es

StyleECU-es is a style embedding model for Spanish, obtained by fine-tuning mStyleDistance on SynthSTEL-ES, a purpose-built Spanish contrastive dataset of 51,400 triplets covering 71 stylistic dimensions.

Model Description

StyleECU-es specializes the mStyleDistance embedding space toward stylistic phenomena most relevant to Spanish, including dialectal variation (voseo/tuteo), expressive morphology, syntactic complexity, and digital style.

Training

  • Base model: StyleDistance/mstyledistance
  • Training objective: TripletLoss (contrastive learning)
  • Dataset: style-anon/SynthSTEL-ES
  • Training size: 51,400 triplets
  • Epochs: 2
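The triplet objective listed above can be illustrated with a minimal sketch. It assumes each triplet consists of an anchor, a positive (same style as the anchor), and a negative (different style). The Euclidean distance and the margin value used here are illustrative assumptions, not settings reported in this card.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # illustrative value, not the card's setting
) -> float:
    """Loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity pulls same-style texts together and pushes different-style texts apart in the embedding space.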

Usage

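from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("style-anon/StyleECU-es")

# encode() takes a list of texts and returns one style embedding per text
embeddings = model.encode(["Your text here"])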
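A common way to use style embeddings is to compare two texts by cosine similarity. The sketch below is one possible approach, not a documented recipe for this model. The helper names `cosine_similarity`, `style_similarity`, and `run_with_model` are hypothetical, and the two Spanish sentences are made-up examples of voseo and tuteo.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def style_similarity(
    encode: Callable[[list], Sequence[Sequence[float]]],
    text_a: str,
    text_b: str,
) -> float:
    """Embed both texts with `encode` and compare the two vectors."""
    vec_a, vec_b = encode([text_a, text_b])
    return cosine_similarity(vec_a, vec_b)


def run_with_model() -> None:
    """Hypothetical demo: downloads the model, so it is not called here."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("style-anon/StyleECU-es")
    score = style_similarity(
        model.encode,
        "¿Vos tenés tiempo mañana?",  # voseo
        "¿Tú tienes tiempo mañana?",  # tuteo
    )
    print(score)
```

Passing `encode` as an argument keeps the comparison logic independent of the model, so it can be checked with a stub encoder before loading the real weights.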

Evaluation

Evaluated on PAN author profiling tasks (Spanish):

Task | Base (mStyleDistance) | StyleECU-es | Δ
PAN 2018 – Gender prediction | baseline | +3 pp | +3 pp
PAN 2021 – Hate speech spreaders | 0.70 | 0.81 | +11 pp

Authors

Citation

If you use this model, please cite:

Paper under review. Citation will be updated upon publication.

Model size: 0.3B params (Safetensors, tensor type F32)