MorcuendeA committed
Commit 27750f3 · verified · 1 Parent(s): f59270a

MulderFinders

Files changed (3)
  1. README.md +9 -62
  2. config.json +3 -5
  3. training_args.bin +1 -1
README.md CHANGED
@@ -9,54 +9,14 @@ metrics:
  model-index:
  - name: MulderFinders
  results: []
- datasets:
- - MorcuendeA/ConspiraText-ES
- language:
- - es
  ---
 
- ![MulderFinders Logo](./i_want_to_belive.png)
-
-
- # MulderFinders
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
 
  # MulderFinders
 
- The truth is out there... and this model is here to help you find it.
-
- **MulderFinders** is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), trained on [MorcuendeA/ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES), a dataset full of Spanish-language conspiratorial and non-conspiratorial text. Whether it's aliens, 5G towers, or secret societies, this model is ready to classify them all.
-
- Trust no one... except maybe the F1 score.
-
-
- ## Usage
-
- You can use the model directly with the 🤗 Transformers library:
-
-
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- model_name = "MorcuendeA/MulderFinders"
-
- tokenizer = AutoTokenizer.from_pretrained(model_name)
- model = AutoModelForSequenceClassification.from_pretrained(model_name, trust_remote_code=True)
-
- text = "las redes 5G nos ayudan a tener mejor internet"
-
- inputs = tokenizer(text, return_tensors="pt")
- outputs = model(**inputs)
- logits = outputs.logits
- probs = torch.softmax(logits, dim=1) [0]
- labels = model.config.id2label
- pred = torch.argmax(probs).item()
- print(f"Prediction: {labels[pred]} ({probs[pred].item():.4f})")
-
- # Output:
- # Prediction: rational (0.9989)
- ```
-
+ This model is a fine-tuned version of [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m) on an unknown dataset.
  It achieves the following results on the evaluation set:
  - Loss: 0.0004
  - Accuracy: 1.0
@@ -64,28 +24,15 @@ It achieves the following results on the evaluation set:
 
  ## Model description
 
- Model description
-
- **MulderFinders** is a Spanish-language text classification model fine-tuned to detect conspiracy-related content. It is based on [EuroBERT/EuroBERT-210m](https://huggingface.co/EuroBERT/EuroBERT-210m), a transformer model pre-trained on multiple European languages. MulderFinders performs binary classification, identifying whether a given piece of text expresses conspiratorial ideas or not.
+ More information needed
 
  ## Intended uses & limitations
 
- **Intended uses:**
-
- - Content moderation on social media or online forums.
- - Research and analysis of conspiratorial discourse in Spanish-language texts.
- - Assisting fact-checking workflows by flagging potentially conspiratorial statements.
-
- **Limitations:**
+ More information needed
 
- - May not handle sarcasm, irony, or ambiguous language reliably.
- - Performance outside the original domain (i.e., texts similar to the training dataset) may degrade.
- - May reflect biases present in the training data.
-
  ## Training and evaluation data
 
- The model was fine-tuned using the [ConspiraText-ES](https://huggingface.co/datasets/MorcuendeA/ConspiraText-ES) dataset, which contains Spanish-language examples labeled as conspiratorial or not. The dataset includes only synthetic text samples, covering various conspiracy-related themes.
- During fine-tuning, regularization was applied with **attention_dropout** and **hidden_dropout** both set to 0.1.
+ More information needed
 
  ## Training procedure
 
@@ -116,7 +63,7 @@ The following hyperparameters were used during training:
 
  ### Framework versions
 
- Transformers 4.53.2
+ - Transformers 4.54.0
  - Pytorch 2.6.0+cu124
- Datasets 2.14.4
- Tokenizers 0.21.2
+ - Datasets 4.0.0
+ - Tokenizers 0.21.2
config.json CHANGED
@@ -3,7 +3,7 @@
  "EuroBertForSequenceClassification"
  ],
  "attention_bias": false,
- "attention_dropout": 0.1,
+ "attention_dropout": 0.3,
  "auto_map": {
  "AutoConfig": "configuration_eurobert.EuroBertConfig",
  "AutoModel": "modeling_eurobert.EuroBertModel",
@@ -19,9 +19,7 @@
  "eos_token_id": 128001,
  "head_dim": 64,
  "hidden_act": "silu",
- "hidden_dropout": [
- 0.1
- ],
+ "hidden_dropout": 0.3,
  "hidden_size": 768,
  "id2label": {
  "0": "rational",
@@ -50,7 +48,7 @@
  "rope_theta": 250000,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
- "transformers_version": "4.53.2",
+ "transformers_version": "4.54.0",
  "use_cache": false,
  "vocab_size": 128256
  }
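
The config.json hunks change three values: both dropout rates move from 0.1 to 0.3, `hidden_dropout` also switches from a one-element list to a plain number, and `transformers_version` is bumped. A minimal sketch of how such a change could be checked programmatically, assuming only the keys visible in the diff (the `changed_keys` helper is hypothetical and not part of the repository):

```python
import json

# Only the keys visible in the config.json hunks; the rest of the file is omitted.
old_fragment = json.loads(
    '{"attention_dropout": 0.1, "hidden_dropout": [0.1], "transformers_version": "4.53.2"}'
)
new_fragment = json.loads(
    '{"attention_dropout": 0.3, "hidden_dropout": 0.3, "transformers_version": "4.54.0"}'
)


def changed_keys(old, new):
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


changes = changed_keys(old_fragment, new_fragment)
for key, (before, after) in changes.items():
    # Flag values whose JSON type changed, such as a list becoming a number.
    note = " (type changed)" if type(before) is not type(after) else ""
    print(f"{key}: {before!r} -> {after!r}{note}")
```

Comparing types as well as values surfaces the list-to-number change in `hidden_dropout`, which a purely numeric comparison of the dropout rates would not make obvious.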
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7088a99f6ff3bc21b9e375ebc00f0dcc15c369193b923cac470665b3ab015572
+ oid sha256:d209ad7a782f8bd52d93c64d8cfe3272215ced7a889639a474cfc3b0b88c0325
  size 5304
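
training_args.bin is stored as a Git LFS pointer, so the diff shows only the pointer text: the `oid` hash changed while the `size` stayed at 5304 bytes. A minimal sketch of reading the two pointers and comparing them, assuming the three-line pointer layout shown in the hunk (the `parse_lfs_pointer` helper is hypothetical):

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Pointer contents copied from the removed and added sides of the hunk.
old_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7088a99f6ff3bc21b9e375ebc00f0dcc15c369193b923cac470665b3ab015572
size 5304
"""
new_pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d209ad7a782f8bd52d93c64d8cfe3272215ced7a889639a474cfc3b0b88c0325
size 5304
"""

old = parse_lfs_pointer(old_pointer)
new = parse_lfs_pointer(new_pointer)

same_size = old["size"] == new["size"]
same_content = old["oid"] == new["oid"]
print(f"same size: {same_size}, same content hash: {same_content}")
```

An identical size with a different hash means the stored binary changed without changing length; the pointer alone does not say which serialized training arguments differ.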