	---
	library_name: transformers
	license: apache-2.0
	base_model: EuroBERT/EuroBERT-210m
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: MulderFinders
	  results: []
	datasets:
	- MorcuendeA/ConspiraText-ES
	language:
	- es
	---

	![MulderFinders Logo](./i_want_to_belive.png)


	# MulderFinders

	The truth is out there... and this model is here to help you find it.

	MulderFinders is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), trained on [MorcuendeA/ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES), a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.

	Trust no one... except maybe the F1 score.


	## Usage

	You can use the model directly with the 🤗 Transformers library:


	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name = "MorcuendeA/MulderFinders"

	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)

	text = "las redes 5G nos ayudan a tener mejor internet"

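	inputs = tokenizer(text, return_tensors="pt")
	# Inference only, so skip gradient tracking
	with torch.no_grad():
	    outputs = model(**inputs)
	logits = outputs.logits
	# Softmax over the class dimension; [0] selects the single input in the batch
	probs = torch.softmax(logits, dim=1)[0]
	labels = model.config.id2label
	pred = torch.argmax(probs).item()
	print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")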

	# Output:
	# Prediction: rational (0.9989)
	```

	The model achieves the following results on the evaluation set:
	- Loss: 0.0059
	- Accuracy: 0.9981
	- F1 Score: 0.9983
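
For readers less familiar with these metrics, the sketch below shows how accuracy and F1 are computed from binary confusion counts. The counts are invented purely for illustration and are not this model's evaluation results; treating the conspiratorial class as the positive class is also an assumption.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts (NOT taken from the model's evaluation set)
tp, fp, fn, tn = 90, 5, 10, 95

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.4f} f1={f1:.4f}")
```

Because F1 ignores true negatives, it can diverge from accuracy when the two classes are imbalanced, which is why the card reports both.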

	## Model description

	MulderFinders is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.

	## Intended uses & limitations

	Intended uses:

	- Content moderation on social media or online forums.
	- Research and analysis of conspiratorial discourse in Spanish-language texts.
	- Assisting fact-checking workflows by flagging potentially conspiratorial statements.

	Limitations:

	- May not handle sarcasm, irony, or ambiguous language reliably.
	- Performance may degrade on texts that differ from the training data (for example, other domains, registers, or writing styles).
	- May reflect biases present in the training data.
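
For the moderation and fact-checking uses listed above, one cautious pattern is to flag only high-confidence predictions and leave the rest to human reviewers. The following is a minimal sketch, not part of the model's API: the helper `flag_for_review`, the threshold of 0.9, and the label name `"conspiracy"` are all hypothetical (the card only shows the `rational` label; check `model.config.id2label` for the real names).

```python
from typing import Dict, List, Tuple


def flag_for_review(
    scored_texts: List[Tuple[str, Dict[str, float]]],
    threshold: float = 0.9,               # hypothetical cut-off, tune on your own data
    positive_label: str = "conspiracy",   # assumed label name, verify via id2label
) -> List[str]:
    """Return the texts whose positive-class probability meets the threshold."""
    return [
        text
        for text, probs in scored_texts
        if probs.get(positive_label, 0.0) >= threshold
    ]


# Made-up probabilities standing in for the softmax output of the model
scored = [
    ("texto A", {"rational": 0.9989, "conspiracy": 0.0011}),
    ("texto B", {"rational": 0.03, "conspiracy": 0.97}),
    ("texto C", {"rational": 0.45, "conspiracy": 0.55}),
]

flagged = flag_for_review(scored, threshold=0.9)
print(flagged)
```

A higher threshold trades recall for fewer false flags, which may be preferable given the limitations around sarcasm and out-of-domain text.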

	## Training and evaluation data

	The model was fine-tuned using the [ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES) dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes.
	During fine-tuning, dropout regularization was applied, with `attention_dropout` and `hidden_dropout` both set to 0.2.

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 32
	- eval_batch_size: 32
	- seed: 69
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 64
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- num_epochs: 6
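
As a worked check of how these values relate, the sketch below reproduces the effective batch size and illustrates a linear learning-rate decay. It assumes a single training device and no warmup (neither is stated in this card), and the 100-step run length is a hypothetical number chosen only to keep the arithmetic easy to follow.

```python
train_batch_size = 32
gradient_accumulation_steps = 2
num_devices = 1  # assumption: the card does not state the device count

# Examples consumed per optimizer update
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 64, matching the value listed above


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical 100-step run, for illustration only
for step in (0, 50, 100):
    print(step, linear_lr(step, total_steps=100))
```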

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss | Accuracy | F1 Score |
	|:-------------:|:------:|:----:|:---------------:|:--------:|:--------:|
	| 0.2601        | 0.3030 | 20   | 0.0532          | 0.9848   | 0.9855   |
	| 0.0771        | 0.6061 | 40   | 0.0197          | 0.9981   | 0.9982   |
	| 0.0271        | 0.9091 | 60   | 0.0218          | 0.9981   | 0.9982   |
	| 0.0189        | 1.2121 | 80   | 0.0182          | 0.9943   | 0.9945   |
	| 0.0176        | 1.5152 | 100  | 0.0093          | 0.9962   | 0.9963   |
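
The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch. The sketch below derives that figure from the logged rows and, combined with the total train batch size of 64 listed under the hyperparameters, a rough number of training examples seen per epoch. The result is an approximation inferred from the table, not a figure stated by the authors, and it ignores a possibly partial final batch.

```python
# (epoch, step) pairs copied from the training results table
logged = [(0.3030, 20), (0.6061, 40), (0.9091, 60), (1.2121, 80), (1.5152, 100)]

# Each row gives an independent estimate of steps per epoch
estimates = [step / epoch for epoch, step in logged]
steps_per_epoch = round(sum(estimates) / len(estimates))

total_train_batch_size = 64  # from the hyperparameter list
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

print(steps_per_epoch, approx_examples_per_epoch)
```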


	### Framework versions

	- Transformers 4.53.2
	- PyTorch 2.6.0+cu124
	- Datasets 2.14.4
	- Tokenizers 0.21.2