---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
library_name: transformers
tags:
- finance
- cbdc
- central-bank
- financial-nlp
- economic-policy
- monetary-policy
- sentence-classification
- text-classification
- transformers
- bert
- discourse-analysis
- policy-analysis
- centralbank-bert
- bis-speeches

---


# CBDC-Discourse

`CBDC-Discourse` is a **BERT-based sentence classifier** fine-tuned to categorize central bank digital currency (CBDC) discourse into three conceptually distinct classes: **Feature, Risk-Benefit, and Process**.
This model enables structured analysis of CBDC-related policy and research texts by separating **design attributes**, **evaluative outcomes**, and **procedural activities**.

| Class            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Feature**      | A sentence that specifies a **concrete design element or operational mechanism** of CBDC. Examples include: wallet/card modality; programmability/smart contracts; privacy model; interoperability requirements; legal tender status; distribution via intermediaries; holding limits/caps; interest-bearing/remuneration (incl. negative rates); rulebook/scheme rules; settlement architecture (DLT/RPS/RTGS links).                                                                                                                                                                                 |
| **Risk-Benefit** | A sentence that asserts or implies **outcomes, effects, or trade-offs** (positive or negative) from a CBDC feature or its introduction, including policy/equilibrium impacts. Examples of benefits include: faster/cheaper/more transparent cross-border payments; financial inclusion; regional cooperation; competition/innovation; sovereignty/autonomy; efficiency/productivity gains. Examples of risks include: bank disintermediation; cyber/operational risk; crisis flight from deposits; privacy harms; monetary/fiscal dominance concerns; “too successful” crowd-out; legal/regulatory fragility. |
| **Process**      | A sentence about **research, consultations, pilots, governance, timeline, or agenda-setting**, without specifying a concrete feature or claiming effects/trade-offs. Examples include: public consultations; surveys/focus groups; task forces; phases (investigation/preparation/pilot); rulebook drafting as an activity (absent specifics); reports/citations; statements of interest/attention; open questions; goal/timeline setting (e.g., “medium-term goal”).                                                                                                                                  |


## Base Model

This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (\~66M tokens)** from **BIS central bank speeches (1996–2024)**.
CentralBank-BERT captures the vocabulary and context of **monetary policy, financial regulation, and central banking discourse**, which makes it a well-suited foundation for downstream CBDC-related text classification.

## Dataset

The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset is balanced across the three discourse classes, with a total of **2,886 sentences (962 per class)**.

## Intended Use

This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It enables researchers, analysts, and practitioners to distinguish whether a sentence describes **procedural aspects**, **design features**, or **evaluative outcomes** of central bank digital currencies.
Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in the fields of **finance, monetary economics, and economic policy**.

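As a minimal sketch of the thematic mapping described above, per-sentence predictions can be aggregated into class shares for a document. The helper name `discourse_shares` and the `min_score` cutoff are illustrative assumptions, not part of the model or its repository, and the example predictions are made up. The input mirrors the `label`/`score` dictionaries returned by a `transformers` text-classification pipeline.

```python
from collections import Counter

# Label names taken from the class table above.
LABELS = ("Feature", "Process", "Risk-Benefit")


def discourse_shares(predictions, min_score=0.0):
    """Return the share of sentences per class.

    `predictions` is a list of {"label": str, "score": float} dicts, one per
    sentence. Predictions below `min_score` (a hypothetical confidence
    cutoff) are ignored.
    """
    kept = [p["label"] for p in predictions if p["score"] >= min_score]
    counts = Counter(kept)
    total = len(kept)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}


# Illustrative, made-up predictions for a four-sentence document.
example = [
    {"label": "Process", "score": 0.99},
    {"label": "Feature", "score": 0.97},
    {"label": "Risk-Benefit", "score": 0.55},
    {"label": "Process", "score": 0.91},
]
shares = discourse_shares(example, min_score=0.6)
print(shares)
```
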
## Training Details

* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/Val/Test split: 80/10/10 stratified by label

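The 80/10/10 stratified split listed above can be sketched with the standard library alone. This is an illustration under assumptions: the function name, the seed, and the rounding rule are hypothetical, and the author's actual split procedure may differ (for example, it may rely on scikit-learn).

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split example indices into train/val/test, stratified by label.

    The seed and the rounding rule (floor for train and val, remainder
    goes to test) are assumptions made for this sketch.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = int(len(idxs) * fractions[0])
        n_val = int(len(idxs) * fractions[1])
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test


# Placeholder labels with the class sizes reported in the Dataset section.
labels = ["Feature"] * 962 + ["Process"] * 962 + ["Risk-Benefit"] * 962
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```
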
| Parameter                     | Value                       |
| ----------------------------- | --------------------------- |
| Base model                    | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs                        | 6                           |
| Train batch size (per device) | 8                           |
| Eval batch size (per device)  | 16                          |
| Gradient accumulation         | 2                           |
| Effective batch size          | 16                          |
| Learning rate                 | 2e-5                        |
| Weight decay                  | 0.01                        |
| Warmup ratio                  | 0.06                        |
| Scheduler                     | Cosine                      |
| Mixed precision (fp16)        | Enabled                     |

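For readers who want to reproduce the setup, the table can be written as keyword arguments using the standard `transformers.TrainingArguments` parameter names, for example `TrainingArguments(output_dir=..., **hparams)` with a placeholder `output_dir`. This is a sketch, not the author's training script: a single device is assumed, `total_steps` is a hypothetical value, and `cosine_lr` only approximates the shape of a cosine schedule with linear warmup rather than calling the library's scheduler. Note that `fp16=True` requires a CUDA device.

```python
import math

# Hyperparameters from the table, keyed by `transformers.TrainingArguments` names.
hparams = {
    "num_train_epochs": 6,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.06,
    "lr_scheduler_type": "cosine",
    "fp16": True,
}

# Effective batch size = per-device batch size x accumulation steps x devices.
num_devices = 1  # assumption: one GPU
effective_batch_size = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
    * num_devices
)


def cosine_lr(step, total_steps, base_lr, warmup_ratio):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; the real value depends on the training set size
print(effective_batch_size)
for step in (0, 30, 60, 500, 1000):
    lr = cosine_lr(step, total_steps, hparams["learning_rate"], hparams["warmup_ratio"])
    print(step, f"{lr:.2e}")
```
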
* Environment: Google Colab
* GPU: Tesla T4 (16 GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers

## Evaluation Results

| Split      | Accuracy  | Macro-F1  | Weighted-F1 | Class            | Precision | Recall | F1    |
| ---------- | --------- | --------- | ----------- | ---------------- | --------- | ------ | ----- |
| Validation | **0.851** | **0.839** | **0.852**   | –                | –         | –      | –     |
| Test       | **0.823** | **0.803** | **0.825**   | **Feature**      | 0.759     | 0.782  | 0.770 |
|            |           |           |             | **Process**      | 0.927     | 0.845  | 0.884 |
|            |           |           |             | **Risk-Benefit** | 0.700     | 0.817  | 0.754 |

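As a consistency check on the test-set figures above, each per-class F1 is the harmonic mean of precision and recall, and macro-F1 is the unweighted mean of the per-class F1 scores. The snippet below recomputes both from the reported precision and recall values; the variable and function names are illustrative. The recomputed values agree with the table to three decimals.

```python
# Test-set precision and recall per class, copied from the table above.
per_class = {
    "Feature": (0.759, 0.782),
    "Process": (0.927, 0.845),
    "Risk-Benefit": (0.700, 0.817),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1 = {name: f1_score(p, r) for name, (p, r) in per_class.items()}
macro_f1 = sum(f1.values()) / len(f1)

for name, value in f1.items():
    print(f"{name}: {value:.3f}")
print(f"Macro-F1: {macro_f1:.3f}")
```
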
---

## Other CBDC Models

This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.

| **Model**                       | **Purpose**                                                         | **Intended Use**                                                    | **Link**                                                               |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024).      | Base encoder for CBDC downstream tasks; fill-mask tasks.            | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT**        | Binary classifier: CBDC vs. Non-CBDC.                               | Flagging CBDC-related discourse in large corpora.                   | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT)               |
| **bilalzafar/CBDC-Stance**      | 3-class stance model (Pro, Wait-and-See, Anti).                     | Research on policy stances and discourse monitoring.                | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance)           |
| **bilalzafar/CBDC-Sentiment**   | 3-class sentiment model (Positive, Neutral, Negative).              | Tone analysis in central bank communications.                       | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment)     |
| **bilalzafar/CBDC-Type**        | Classifies Retail, Wholesale, General CBDC mentions.                | Distinguishing policy focus (retail vs wholesale).                  | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type)               |
| **bilalzafar/CBDC-Discourse**   | 3-class discourse classifier (Feature, Process, Risk-Benefit).      | Structured categorization of CBDC communications.                   | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse)     |
| **bilalzafar/CentralBank-NER**  | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER)   |


## Repository and Replication Package

All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:

🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**

---

## How to Use

```python
from transformers import pipeline

# Load pipeline
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Discourse")

# Example sentences
sentences = [
    "The central bank launched a pilot project for CBDC cross-border settlement.", # Process
    "Programmability in CBDC allows conditional payments.", # Feature
    "CBDC may increase risks of bank disintermediation." # Risk-Benefit
]

# Predict (by default the pipeline returns a one-element list with the top label)
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output
# The central bank launched a pilot project for CBDC cross-border settlement.
#  → Process (score=0.9989)
#
# Programmability in CBDC allows conditional payments.
#  → Feature (score=0.9991)
#
# CBDC may increase risks of bank disintermediation.
#  → Risk-Benefit (score=0.9986)
```
---

## Citation

If you use this model, please cite as:

**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**

```bibtex
@article{zafar2025centralbankbert,
  title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
  author={Zafar, Muhammad Bilal},
  year={2025},
  journal={SSRN Electronic Journal},
  url={https://papers.ssrn.com/abstract=5404456}
}
```