metadata
library_name: transformers
license: apache-2.0
base_model: EuroBERT/EuroBERT-210m
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: MulderFinders
    results: []
datasets:
  - MorcuendeA/ConspiraText-ES
language:
  - es

MulderFinders

The truth is out there... and this model is here to help you find it.

MulderFinders is a fine-tuned version of EuroBERT/EuroBERT-210m, trained on MorcuendeA/ConspiraText-ES, a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.

Trust no one... except maybe the F1 score.

Usage

You can use the model directly with the 🤗 Transformers library:

  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  import torch

  model_name = "MorcuendeA/MulderFinders"

  tokenizer = AutoTokenizer.from_pretrained(model_name)
  # trust_remote_code=True allows Transformers to run model code shipped with the repository
  model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)

  text = "las redes 5G nos ayudan a tener mejor internet"

  inputs = tokenizer(text, return_tensors="pt")
  with torch.no_grad():  # gradients are not needed for inference
      outputs = model(**inputs)
  logits = outputs.logits
  probs = torch.softmax(logits, dim=1)[0]  # class probabilities for the single input
  labels = model.config.id2label
  pred = torch.argmax(probs).item()
  print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")

  # Output:
  # Prediction: rational (0.9989)

The model achieves the following results on the evaluation set:

  • Loss: 0.0059
  • Accuracy: 0.9981
  • F1 Score: 0.9983
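
As a reminder of what these two metrics measure, here is a minimal sketch of accuracy and F1 for a binary classifier. The card does not say which class is treated as positive or how F1 was averaged, so the `positive` label name (`"conspiracy"`) and the toy predictions are assumptions for illustration only, not the actual evaluation data.

```python
def accuracy_and_f1(y_true, y_pred, positive="conspiracy"):
    """Return (accuracy, F1) for binary labels; F1 is computed for `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels; "conspiracy" is a hypothetical label name.
y_true = ["conspiracy", "rational", "conspiracy", "rational", "conspiracy"]
y_pred = ["conspiracy", "rational", "rational", "rational", "conspiracy"]

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```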

Model description

MulderFinders is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on EuroBERT/EuroBERT-210m, a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.

Intended uses & limitations

Intended uses:

  • Content moderation on social media or online forums.
  • Research and analysis of conspiratorial discourse in Spanish-language texts.
  • Assisting fact-checking workflows by flagging potentially conspiratorial statements.
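
For the moderation and fact-checking uses listed above, one possible pattern is to flag only texts that the model scores confidently as not `rational` and send those to a human reviewer. The sketch below is a hedged illustration: `flag_for_review`, the 0.9 threshold, and the example scores are hypothetical, and the only label name taken from this card is `rational`.

```python
def flag_for_review(scored_texts, threshold=0.9, safe_label="rational"):
    """Return (text, score) pairs whose probability of NOT being `safe_label`
    is at least `threshold`."""
    flagged = []
    for text, probs in scored_texts:
        p_not_safe = 1.0 - probs.get(safe_label, 0.0)
        if p_not_safe >= threshold:
            flagged.append((text, round(p_not_safe, 4)))
    return flagged


# Hypothetical per-text label probabilities, e.g. collected from the model's softmax output.
scored = [
    ("las redes 5G nos ayudan a tener mejor internet", {"rational": 0.9989}),
    ("texto de ejemplo A", {"rational": 0.03}),
    ("texto de ejemplo B", {"rational": 0.40}),
]

flagged = flag_for_review(scored)
print(flagged)
```

A high threshold trades recall for fewer false alarms; the right value depends on the application and should be tuned on real, non-synthetic data.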

Limitations:

  • May not handle sarcasm, irony, or ambiguous language reliably.
  • Performance may degrade on texts outside the original domain (i.e., texts that differ from the training dataset).
  • May reflect biases present in the training data.

Training and evaluation data

The model was fine-tuned using the ConspiraText-ES dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes. During fine-tuning, regularization was applied with attention_dropout and hidden_dropout both set to 0.2.
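
To illustrate what a dropout rate of 0.2 means, the sketch below applies standard inverted dropout to a list of values: each value is zeroed with probability 0.2 and the survivors are rescaled so the expected value is unchanged. This is a generic illustration, not the model's actual implementation; `inverted_dropout` is a hypothetical helper.

```python
import random


def inverted_dropout(values, p=0.2, rng=None):
    """Zero each value with probability p and scale survivors by 1 / (1 - p)."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


activations = [1.0] * 10_000
dropped = inverted_dropout(activations, p=0.2)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"zeroed: {zero_fraction:.3f}, mean after dropout: {mean_after:.3f}")
```

Dropout is only active during training; at inference time the layers pass values through unchanged.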

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 69
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 6
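
The sketch below shows how these values relate: the total train batch size is the per-step batch size times the gradient accumulation steps, and a linear scheduler decays the learning rate to zero over the run. Two things here are assumptions rather than facts from this card: the 66 optimizer steps per epoch (inferred from the training results, where step 20 corresponds to epoch 0.3030) and the absence of warmup, which the card does not mention.

```python
# Values copied from the hyperparameter list.
learning_rate = 2e-05
train_batch_size = 32
gradient_accumulation_steps = 2
num_epochs = 6

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: about 66 optimizer steps per epoch (20 / 0.3030 is roughly 66).
steps_per_epoch = 66
total_steps = steps_per_epoch * num_epochs


def linear_lr(step, base_lr=learning_rate, total=total_steps, warmup=0):
    """Linear warmup (if any), then linear decay to zero at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("total_train_batch_size:", total_train_batch_size)
for step in (0, 100, 198, 396):
    print(step, f"{linear_lr(step):.2e}")
```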

Training results

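| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1 Score |
|---------------|--------|------|-----------------|----------|----------|
| 0.2601        | 0.3030 | 20   | 0.0532          | 0.9848   | 0.9855   |
| 0.0771        | 0.6061 | 40   | 0.0197          | 0.9981   | 0.9982   |
| 0.0271        | 0.9091 | 60   | 0.0218          | 0.9981   | 0.9982   |
| 0.0189        | 1.2121 | 80   | 0.0182          | 0.9943   | 0.9945   |
| 0.0176        | 1.5152 | 100  | 0.0093          | 0.9962   | 0.9963   |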
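
A small sketch for reading the logged results: it picks the evaluation step with the lowest validation loss and the one with the highest F1 score. The rows are copied from the results above; the selection criteria are illustrative and not necessarily how the released checkpoint was chosen.

```python
# (training_loss, epoch, step, validation_loss, accuracy, f1)
rows = [
    (0.2601, 0.3030, 20, 0.0532, 0.9848, 0.9855),
    (0.0771, 0.6061, 40, 0.0197, 0.9981, 0.9982),
    (0.0271, 0.9091, 60, 0.0218, 0.9981, 0.9982),
    (0.0189, 1.2121, 80, 0.0182, 0.9943, 0.9945),
    (0.0176, 1.5152, 100, 0.0093, 0.9962, 0.9963),
]

best_by_loss = min(rows, key=lambda r: r[3])
best_by_f1 = max(rows, key=lambda r: r[5])  # max() keeps the first row among ties

print("lowest validation loss at step", best_by_loss[2])
print("highest F1 score at step", best_by_f1[2])
```

In these logs the two criteria disagree: the lowest validation loss is at step 100, while the highest F1 is first reached at step 40.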

Framework versions

  • Transformers 4.53.2
  • PyTorch 2.6.0+cu124
  • Datasets 2.14.4
  • Tokenizers 0.21.2