MorcuendeA

metadata

library_name: transformers
license: apache-2.0
base_model: EuroBERT/EuroBERT-210m
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: MulderFinders
    results: []
datasets:
  - MorcuendeA/ConspiraText-ES
language:
  - es

MulderFinders

The truth is out there... and this model is here to help you find it.

MulderFinders is a fine-tuned version of EuroBERT/EuroBERT-210m, trained on MorcuendeA/ConspiraText-ES, a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.

Trust no one... except maybe the F1 score.

Usage

You can use the model directly with the 🤗 Transformers library:

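  import torch
  from transformers import AutoTokenizer, AutoModelForSequenceClassification
  
  model_name = "MorcuendeA/MulderFinders"
  
  # trust_remote_code=True lets Transformers run model code hosted on the Hub
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
  model.eval()  # make sure dropout is disabled for inference
  
  # Spanish input: "5G networks help us have better internet"
  text = "las redes 5G nos ayudan a tener mejor internet"
  
  inputs = tokenizer(text, return_tensors="pt")
  with torch.no_grad():  # gradients are not needed at inference time
      logits = model(**inputs).logits
  
  probs = torch.softmax(logits, dim=-1)[0]  # class probabilities for the single input
  labels = model.config.id2label            # maps class index to label name
  pred = torch.argmax(probs).item()
  print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")
  
  # Output:
  # Prediction: rational (0.9989)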
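
The softmax step in the snippet above is what turns the model's raw logits into the probability that gets printed. As a minimal, dependency-free sketch of that step (the logit values below are invented for illustration and are not outputs of MulderFinders):

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a two-class model (illustrative values only).
example_logits = [3.4, -3.4]
probs = softmax(example_logits)

# The predicted class is the index with the highest probability.
predicted_index = max(range(len(probs)), key=probs.__getitem__)
print(predicted_index, [round(p, 4) for p in probs])
```

The larger the gap between the two logits, the closer the winning probability gets to 1, which is why confident predictions show scores such as 0.99.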

On the evaluation set, the model achieves the following results:

Loss: 0.0059
Accuracy: 0.9981
F1 Score: 0.9983
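
For readers less familiar with these metrics, the sketch below shows how accuracy and F1 follow from confusion-matrix counts. The counts are made up for illustration and are not the model's actual evaluation counts. The card does not say which class is treated as positive or how F1 is averaged, so the binary F1 with the conspiratorial class as positive is an assumption here.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and binary F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of flagged texts that really belong to the positive class.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of positive texts that were actually flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Illustrative counts only (not taken from the MulderFinders evaluation).
acc, f1 = accuracy_and_f1(tp=90, fp=5, fn=10, tn=95)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```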

Model description

MulderFinders is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on EuroBERT/EuroBERT-210m, a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.

Intended uses & limitations

Intended uses:

Content moderation on social media or online forums.
Research and analysis of conspiratorial discourse in Spanish-language texts.
Assisting fact-checking workflows by flagging potentially conspiratorial statements.

Limitations:

May not handle sarcasm, irony, or ambiguous language reliably.
Performance may degrade on texts outside the original domain (i.e., texts that differ from the training dataset).
May reflect biases present in the training data.
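
Given these limitations, a moderation or fact-checking pipeline might act automatically only on high-confidence predictions and send everything else to a human reviewer. The sketch below is one possible way to do that, not part of the model. The `triage` helper, the 0.9 threshold, and the label name `"conspiracy"` are all hypothetical; check `model.config.id2label` for the real label names.

```python
def triage(label_probs, flag_threshold=0.9, conspiracy_label="conspiracy"):
    """Return 'flag', 'review', or 'pass' for one classified text.

    label_probs maps label names to probabilities, e.g. built from
    model.config.id2label and the softmax output.
    The label name and threshold are illustrative assumptions.
    """
    p = label_probs.get(conspiracy_label, 0.0)
    if p >= flag_threshold:
        return "flag"      # confident conspiratorial prediction
    if p >= 1.0 - flag_threshold:
        return "review"    # uncertain band: route to a human
    return "pass"          # confident non-conspiratorial prediction


# Hypothetical probabilities for three texts.
print(triage({"conspiracy": 0.95, "rational": 0.05}))
print(triage({"conspiracy": 0.50, "rational": 0.50}))
print(triage({"conspiracy": 0.0011, "rational": 0.9989}))
```

Keeping a middle "review" band is a design choice that trades some automation for fewer wrong automatic decisions on sarcastic or ambiguous text.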

Training and evaluation data

The model was fine-tuned using the ConspiraText-ES dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes. During fine-tuning, regularization was applied with attention_dropout and hidden_dropout both set to 0.2.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 32
seed: 69
gradient_accumulation_steps: 2
total_train_batch_size: 64
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 6
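
Two of these values are derived from the others, which the sketch below makes explicit: the total train batch size is the per-step batch size times the gradient accumulation steps, and a linear scheduler decays the learning rate to zero over the run. The warmup length is not listed, so zero warmup steps is an assumption. The figure of 66 optimizer steps per epoch is also an assumption, inferred from the training results table (step 20 corresponds to epoch 0.3030).

```python
learning_rate = 2e-05
train_batch_size = 32
gradient_accumulation_steps = 2
num_epochs = 6

# Effective batch size per optimizer step (single device assumed).
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 64

# Assumption: 66 optimizer steps per epoch, inferred from the results table.
steps_per_epoch = 66
total_steps = steps_per_epoch * num_epochs


def linear_lr(step, total_steps, base_lr=learning_rate, warmup_steps=0):
    """Learning rate under a linear schedule with optional linear warmup."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, total_steps // 2, total_steps):
    print(step, linear_lr(step, total_steps))
```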

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1 Score
0.2601	0.3030	20	0.0532	0.9848	0.9855
0.0771	0.6061	40	0.0197	0.9981	0.9982
0.0271	0.9091	60	0.0218	0.9981	0.9982
0.0189	1.2121	80	0.0182	0.9943	0.9945
0.0176	1.5152	100	0.0093	0.9962	0.9963

Framework versions

Transformers 4.53.2
PyTorch 2.6.0+cu124
Datasets 2.14.4
Tokenizers 0.21.2