---
library_name: transformers
license: apache-2.0
base_model: EuroBERT/EuroBERT-210m
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: MulderFinders
  results: []
datasets:
- MorcuendeA/ConspiraText-ES
language:
- es
---

![MulderFinders Logo](./i_want_to_belive.png)

# MulderFinders

The truth is out there... and this model is here to help you find it.

**MulderFinders** is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), trained on [MorcuendeA/ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES), a dataset of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all. Trust no one... except maybe the F1 score.

## Usage

You can use the model directly with the 🤗 Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "MorcuendeA/MulderFinders"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# EuroBERT ships custom modeling code, hence trust_remote_code=True
model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)

text = "las redes 5G nos ayudan a tener mejor internet"
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so gradient tracking is not needed
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
probs = torch.softmax(logits, dim=1)[0]  # class probabilities for the single input
labels = model.config.id2label
pred = torch.argmax(probs).item()  # index of the most likely class

print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")
# Output:
# Prediction: rational (0.9989)
```

The model achieves the following results on the evaluation set:

- Loss: 0.0059
- Accuracy: 0.9981
- F1 Score: 0.9983

## Model description

**MulderFinders** is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), a transformer model pre-trained on multiple European languages.
MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.

## Intended uses & limitations

**Intended uses:**

- Content moderation on social media or online forums.
- Research and analysis of conspiratorial discourse in Spanish-language texts.
- Assisting fact-checking workflows by flagging potentially conspiratorial statements.

**Limitations:**

- May not handle sarcasm, irony, or ambiguous language reliably.
- Performance may degrade outside the original domain, that is, on texts unlike those in the training dataset.
- May reflect biases present in the training data.

## Training and evaluation data

The model was fine-tuned using the [ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES) dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes.

During fine-tuning, regularization was applied with **attention_dropout** and **hidden_dropout** both set to 0.2.

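To make the dropout setting above concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme PyTorch's `nn.Dropout` uses. It is an illustration only, not the training code: the function name `inverted_dropout`, the toy activations, and the fixed seed are assumptions made for this example.

```python
import random


def inverted_dropout(values, p=0.2, training=True, rng=None):
    """Zero each value with probability p and scale survivors by 1 / (1 - p).

    In evaluation mode (training=False) the input is returned unchanged,
    which is why dropout does not affect inference.
    """
    if not training or p == 0.0:
        return list(values)
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - p)
    return [v * scale if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)        # fixed seed, for reproducibility of the demo only
activations = [1.0] * 10_000  # toy activations, not real model states

dropped = inverted_dropout(activations, p=0.2, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print(f"fraction zeroed: {zero_fraction:.3f}")
print(f"mean after dropout: {mean_after:.3f}")
```

With `p=0.2`, roughly one in five activations is zeroed at each training step while the rescaling keeps the expected activation unchanged.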
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 69
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 6

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1 Score |
|:-------------:|:------:|:----:|:---------------:|:--------:|:--------:|
| 0.2601        | 0.3030 | 20   | 0.0532          | 0.9848   | 0.9855   |
| 0.0771        | 0.6061 | 40   | 0.0197          | 0.9981   | 0.9982   |
| 0.0271        | 0.9091 | 60   | 0.0218          | 0.9981   | 0.9982   |
| 0.0189        | 1.2121 | 80   | 0.0182          | 0.9943   | 0.9945   |
| 0.0176        | 1.5152 | 100  | 0.0093          | 0.9962   | 0.9963   |

### Framework versions

- Transformers 4.53.2
- Pytorch 2.6.0+cu124
- Datasets 2.14.4
- Tokenizers 0.21.2
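
The hyperparameters and results table above can be cross-checked with a little arithmetic. The sketch below derives the effective batch size, the number of optimizer steps per epoch implied by the table (step 20 at epoch 0.3030), and the shape of a linear learning-rate decay. It rests on assumptions not stated in this card: a single training device, zero warmup steps, and a run that completes all 6 epochs. The names `num_devices`, `warmup_steps`, and `linear_lr` are illustrative.

```python
learning_rate = 2e-5
train_batch_size = 32
gradient_accumulation_steps = 2
num_epochs = 6
num_devices = 1  # assumption: the card does not state the device count

# Effective batch size per optimizer step
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices  # 64

# The results table logs step 20 at epoch 0.3030, so one epoch is about 66 steps
steps_per_epoch = round(20 / 0.3030)         # 66
total_steps = steps_per_epoch * num_epochs   # 396, if all 6 epochs are run


def linear_lr(step, base_lr=learning_rate, total=total_steps, warmup_steps=0):
    """Linear warmup (none by default, an assumption) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup_steps))


print(f"effective batch size: {total_train_batch_size}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, 100, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {linear_lr(step):.2e}")
```

Under these assumptions, the learning rate starts at 2e-05, reaches half that value at the midpoint of training, and decays to zero at the final step.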