---
license: mit
language:
- de
metrics:
- f1
base_model:
- FacebookAI/xlm-roberta-large
pipeline_tag: token-classification
library_name: transformers
tags:
- arxiv:2509.07459
---

# Model Card: AIxcellent Vibes' Model for Candy Speech Detection

## Model Details

- **Model Type:** Transformer-based encoder (XLM-RoBERTa-Large)
- **Developed by:** Christian Rene Thelen, Patrick Gustav Blaneck, Tobias Bornheim, Niklas Grieger, Stephan Bialonski (FH Aachen, RWTH Aachen, ORDIX AG, Utrecht University)
- **Paper:** [AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training](https://arxiv.org/abs/2509.07459v2)
- **Base Model:** [XLM-RoBERTa-Large](https://huggingface.co/FacebookAI/xlm-roberta-large) (Conneau et al., 2020)
- **Fine-tuning Objective:** Detection of *candy speech* (positive/supportive language) in German YouTube comments.

## Model Description

This model is a fine-tuned **XLM-RoBERTa-Large** adapted for the **GermEval 2025 Shared Task on Candy Speech Detection**.
It was trained to identify *candy speech* at both:

- **Binary level:** Classify whether a comment contains candy speech.
- **Span level:** Detect the exact spans and categories of candy speech within comments, using a BIO tagging scheme across **10 categories** (positive feedback, compliment, affection declaration, encouragement, gratitude, agreement, ambiguous, implicit, group membership, sympathy).
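As a minimal sketch of how the BIO scheme expands the 10 categories into a token-level label set: each category gets a `B-` (begin) and an `I-` (inside) tag, plus one shared `O` (outside) tag, giving 2 × 10 + 1 = 21 labels, which matches the label count listed under Training Procedure. The label strings and their ordering below are assumptions for illustration; the released checkpoint may name or order them differently.

```python
# Category names are taken from the list above. The exact label strings and
# their order in the released checkpoint are an assumption.
CATEGORIES = [
    "positive feedback", "compliment", "affection declaration",
    "encouragement", "gratitude", "agreement", "ambiguous",
    "implicit", "group membership", "sympathy",
]


def build_bio_labels(categories):
    """Return the BIO label list: one 'O' tag plus B-/I- tags per category."""
    labels = ["O"]
    for category in categories:
        labels.append(f"B-{category}")
        labels.append(f"I-{category}")
    return labels


LABELS = build_bio_labels(CATEGORIES)
LABEL2ID = {label: index for index, label in enumerate(LABELS)}
ID2LABEL = {index: label for label, index in LABEL2ID.items()}

print(len(LABELS))  # 2 * 10 + 1
```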

The span-level model also proved effective for binary detection by classifying a comment as candy speech if at least one positive span was detected.
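The rule above can be sketched as follows, assuming the model's per-token predictions have already been decoded into BIO tag strings. The helper names `tags_to_spans` and `contains_candy_speech` are hypothetical, and the handling of stray `I-` tags (ignored here) is a simplification rather than the authors' postprocessing.

```python
def tags_to_spans(tags):
    """Collect (start, end_exclusive, category) token spans from BIO tags."""
    spans = []
    start, category = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for index, tag in enumerate(list(tags) + ["O"]):
        continues_span = (
            tag.startswith("I-") and category is not None and tag[2:] == category
        )
        if continues_span:
            continue
        if category is not None:
            spans.append((start, index, category))
            start, category = None, None
        if tag.startswith("B-"):
            start, category = index, tag[2:]
        # A stray "I-" tag without a matching open span is ignored here.
    return spans


def contains_candy_speech(tags):
    """Binary decision: positive if at least one span was detected."""
    return len(tags_to_spans(tags)) > 0


example = ["O", "B-compliment", "I-compliment", "O", "B-gratitude"]
print(tags_to_spans(example))
print(contains_candy_speech(example))
```

Deriving the binary label from spans means a single model serves both subtasks, at the cost of the binary decision inheriting any span-level errors.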

## Intended Uses

- **Research:** Analysis of positive/supportive communication in German social media.
- **Applications:** Social media analytics, conversational AI safety (mitigating sycophancy), computational social science.
- **Not for:** Deployments without fairness/robustness testing on out-of-domain data.

## Performance

- **Dataset:** 46k German YouTube comments, annotated with candy speech spans.
- **Data Split:** 37,057 comments (train), 9,229 comments (test).
- **Shared Task Results:**

  - **Subtask 1 (binary detection):** Positive F1 = **0.891** (ranked 1st)
  - **Subtask 2 (span detection):** Strict F1 = **0.631** (ranked 1st)
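To make the span metric concrete, here is a sketch of a strict span-level F1 under the assumption that "strict" means a predicted span counts as correct only if its start, end, and category all match a gold span exactly, with counts pooled over all comments. The official shared-task scorer may differ in details such as averaging, so this is an illustration, not the evaluation script.

```python
def strict_span_f1(gold, pred):
    """Micro-averaged F1 over exact (start, end, category) span matches.

    gold, pred: one collection of (start, end, category) tuples per comment.
    """
    true_pos = false_pos = false_neg = 0
    for gold_spans, pred_spans in zip(gold, pred):
        gold_set, pred_set = set(gold_spans), set(pred_spans)
        true_pos += len(gold_set & pred_set)
        false_pos += len(pred_set - gold_set)
        false_neg += len(gold_set - pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


gold = [{(0, 5, "compliment"), (10, 20, "gratitude")}]
# The second predicted span is off by one character, so it does not count.
pred = [{(0, 5, "compliment"), (10, 19, "gratitude")}]
print(strict_span_f1(gold, pred))
```

Under exact matching, a boundary error of a single character produces both a false positive and a false negative, which is one reason strict span scores sit well below binary scores.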

## Training Procedure

- **Architecture:** XLM-RoBERTa-Large + linear classification layer (BIO tagging, 21 labels including “O”).
- **Optimizer:** AdamW
- **Learning Rate:** Peak 2e-5, with linear warmup (500 steps) followed by linear decay.
- **Epochs:** 20 (with early stopping).
- **Batch Size:** 32
- **Regularization:** Dropout (0.1), weight decay (0.01), gradient clipping (L2 norm 1.0).
- **Postprocessing:** BIO tag correction and subword alignment.
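The learning-rate schedule listed above can be sketched as a plain function of the optimizer step. The peak value and warmup length come from this card; `total_steps` is a hypothetical parameter (it depends on dataset size, batch size, and the epoch at which early stopping triggers), and decaying to exactly zero at the final step is an assumption.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=2e-5, warmup_steps=500):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / warmup_steps
    # Decay from peak_lr at the end of warmup down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=10_000 is an illustrative value, not the actual training length.
for step in (0, 250, 500, 5_250, 10_000):
    print(step, linear_warmup_decay_lr(step, total_steps=10_000))
```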

## Limitations

- **Domain Specificity:** Trained only on German YouTube comments; performance may degrade on other platforms, genres, or languages.
- **Overlapping Spans:** The BIO tagging scheme assigns a single label to each token, so the model cannot predict overlapping spans. Such spans were rare (<2%) in the training data.
- **Biases:** May reflect biases present in the dataset (e.g., demographic skews in YouTube communities).
- **Generalization:** Needs evaluation before deployment in real-world moderation systems.

## Ethical Considerations

- **Positive speech detection** is less studied than toxic speech detection, and automatic labeling of “supportiveness” may reinforce cultural biases about what counts as “positive.”
- Model predictions should be complemented with **human-in-the-loop moderation** to avoid misuse.

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{thelen-etal-2025-aixcellent,
    title = "{AI}xcellent Vibes at {G}erm{E}val 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training",
    author = "Thelen, Christian Rene  and
      Blaneck, Patrick Gustav  and
      Bornheim, Tobias  and
      Grieger, Niklas  and
      Bialonski, Stephan",
    editor = "Wartena, Christian  and
      Heid, Ulrich",
    booktitle = "Proceedings of the 21st Conference on Natural Language Processing (KONVENS 2025): Workshops",
    month = sep,
    year = "2025",
    address = "Hannover, Germany",
    publisher = "HsH Applied Academics",
    url = "https://aclanthology.org/2025.konvens-2.33/",
    pages = "398--403"
}
```

[AIxcellent Vibes at GermEval 2025 Shared Task on Candy Speech Detection: Improving Model Performance by Span-Level Training](https://aclanthology.org/2025.konvens-2.33/) (Thelen et al., KONVENS 2025)