
LudoBETO is a domain adaptation of a Spanish BERT language model.
It was adapted to the pathological gambling domain using a corpus extracted from a specialised forum. We used an LLM to automatically compile a lexical resource that guides the masking process during adaptation, so that the model pays more attention to words related to pathological gambling.
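One way to picture lexicon-guided masking is as weighted sampling of the positions to mask, where words found in the lexical resource get a higher weight. The sketch below is only an illustration under that assumption: the function names, the `mask_rate` and `lexicon_boost` values, and the lexicon entries are all hypothetical, and the actual procedure used for LudoBETO is the one described in the paper.

```python
import random

def choose_mask_positions(tokens, lexicon, mask_rate=0.15, lexicon_boost=3.0, seed=0):
    """Pick token positions to mask, favouring words found in the lexicon.

    mask_rate and lexicon_boost are illustrative assumptions, not paper values.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, round(len(tokens) * mask_rate))
    # Lexicon words get a larger sampling weight than all other words.
    weights = [lexicon_boost if t.lower() in lexicon else 1.0 for t in tokens]
    positions = list(range(len(tokens)))
    chosen = []
    # Weighted sampling without replacement.
    for _ in range(min(n_to_mask, len(tokens))):
        pick = rng.choices(range(len(positions)), weights=weights, k=1)[0]
        chosen.append(positions.pop(pick))
        weights.pop(pick)
    return sorted(chosen)

def apply_mask(tokens, positions, mask_token="[MASK]"):
    """Replace the tokens at the given positions with the mask token."""
    masked = set(positions)
    return [mask_token if i in masked else t for i, t in enumerate(tokens)]

tokens = "ayer perdí todo mi sueldo en las tragaperras".split()
lexicon = {"tragaperras", "apuestas", "casino"}  # hypothetical lexicon entries
positions = choose_mask_positions(tokens, lexicon)
print(" ".join(apply_mask(tokens, positions)))
```

With a boost above 1, domain words are masked more often than they would be under uniform random masking, so the model has to predict them more frequently during adaptation.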

We trained the model for four epochs with a batch size of 8, the Adam optimizer with a learning rate of 2e-5, and cross-entropy as the loss function, on an NVIDIA GeForce RTX 4070 GPU (12 GB).
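For masked language modelling, cross-entropy is normally computed only at the masked positions. The following is a minimal pure-Python sketch of that computation, assuming the common convention (used by PyTorch and `transformers`) of labelling unmasked positions with `-100` so they are skipped. The function name and the toy inputs are hypothetical and this is not the training code used for LudoBETO.

```python
import math

def masked_cross_entropy(logits, labels, ignore_index=-100):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: list of per-position score lists (one score per vocabulary id)
    labels: list of target ids, with ignore_index at positions that were not masked
    """
    losses = []
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # unmasked position: contributes nothing to the loss
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_norm - scores[target])
    return sum(losses) / len(losses) if losses else 0.0

# Toy example with a 4-word vocabulary; only the second position was masked.
logits = [[9.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
labels = [-100, 2]
print(masked_cross_entropy(logits, labels))
```

Because the scores at the masked position are uniform over four vocabulary entries, the loss in this toy case is ln 4.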

Usage

from transformers import pipeline

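# Load the fill-mask pipeline with the LudoBETO checkpoint
pipe = pipeline("fill-mask", model="citiusLTL/ludoBETO")

# Each prediction is a dict with "score", "token", "token_str" and "sequence"
predictions = pipe("Las [MASK] son adictivas.")

print(predictions)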
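The fill-mask pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str` and `sequence`. A small helper such as the one below can make that list easier to read. The helper name is hypothetical, and the sample list uses placeholder tokens, ids and scores purely to show the shape of the data; it is not real LudoBETO output.

```python
def top_candidates(predictions, k=3):
    """Return (token_str, score) pairs for the k highest-scoring predictions."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 3)) for p in ranked[:k]]

# Placeholder data with the same structure as the pipeline output (values are made up).
example = [
    {"score": 0.5, "token": 1, "token_str": "token_a", "sequence": "Las token_a son adictivas."},
    {"score": 0.3, "token": 2, "token_str": "token_b", "sequence": "Las token_b son adictivas."},
    {"score": 0.1, "token": 3, "token_str": "token_c", "sequence": "Las token_c son adictivas."},
]

for token, score in top_candidates(example):
    print(f"{token}\t{score}")
```

In practice the list returned by `pipe(...)` would be passed in place of `example`.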

Load model directly

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("citiusLTL/ludoBETO")
model = AutoModelForMaskedLM.from_pretrained("citiusLTL/ludoBETO")

Paper

For more details, refer to the paper "Analyzing Gambling Addictions: A Spanish Corpus for Understanding Pathological Behavior".

@inproceedings{couto-etal-2025-analyzing,
    title = "Analyzing Gambling Addictions: A {S}panish Corpus for Understanding Pathological Behavior",
    author = "Couto, Manuel  and
      Fern{\'a}ndez-Pichel, Marcos  and
      Aragon, Mario Ezra  and
      Losada, David E.",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.955/",
    doi = "10.18653/v1/2025.findings-emnlp.955",
    pages = "17610--17619",
    ISBN = "979-8-89176-335-7",
    abstract = "This work fosters research on the interaction between natural language use and gambling disorders. We have built a new Spanish corpus for screening standardized gambling symptoms. We employ search methods to find on-topic sentences, top-k pooling to form the assessment pools of sentences, and thorough annotation guidelines. The labeling task is challenging, given the need to identify topic relevance and explicit evidence about the symptoms. Additionally, we explore using state-of-the-art LLMs for annotation and compare different sentence search models."
}