---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
library_name: transformers
tags:
- finance
- cbdc
- central-bank
- financial-nlp
- economic-policy
- monetary-policy
- sentence-classification
- text-classification
- transformers
- bert
- discourse-analysis
- policy-analysis
- centralbank-bert
- bis-speeches
---
# CBDC-Discourse
`CBDC-Discourse` is a **BERT-based sentence classifier** fine-tuned to categorize central bank digital currency (CBDC) discourse into three conceptually distinct classes: **Feature, Risk-Benefit, and Process**.
This model enables structured analysis of CBDC-related policy and research texts by separating **design attributes**, **evaluative outcomes**, and **procedural activities**.
| Class | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Feature** | A sentence that specifies a **concrete design element or operational mechanism** of CBDC. Examples include: wallet/card modality; programmability/smart contracts; privacy model; interoperability requirements; legal tender status; distribution via intermediaries; holding limits/caps; interest-bearing/remuneration (incl. negative rates); rulebook/scheme rules; settlement architecture (DLT/RPS/RTGS links). |
| **Risk-Benefit** | A sentence that asserts or implies **outcomes, effects, or trade-offs** (positive or negative) from a CBDC feature or its introduction, including policy/equilibrium impacts. Examples include: faster/cheaper/more transparent cross-border payments; financial inclusion; regional cooperation; competition/innovation; sovereignty/autonomy; efficiency/productivity gains. Also, negative concerns such as bank disintermediation; cyber/operational risk; crisis flight from deposits; privacy harms; monetary/fiscal dominance concerns; “too successful” crowd-out; legal/regulatory fragility. |
| **Process** | A sentence about **research, consultations, pilots, governance, timeline, or agenda-setting**, without specifying a concrete feature or claiming effects/trade-offs. Examples include: public consultations; surveys/focus groups; task forces; phases (investigation/preparation/pilot); rulebook drafting as an activity (absent specifics); reports/citations; statements of interest/attention; open questions; goal/timeline setting (e.g., “medium-term goal”). |
## Base Model
This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (\~66M tokens)** from **BIS central bank speeches (1996–2024)**.
CentralBank-BERT captures the vocabulary and context of **monetary policy, financial regulation, and central banking discourse**, which makes it a well-suited foundation for downstream CBDC-related text classification.
## Dataset
The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset is balanced across the three discourse classes, with a total of **2,886 sentences (962 per class)**.
## Intended Use
This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It enables researchers, analysts, and practitioners to distinguish whether a sentence describes **procedural aspects**, **design features**, or **evaluative outcomes** of central bank digital currencies.
Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in the fields of **finance, monetary economics, and economic policy**.
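As one illustration of thematic mapping, sentence-level predictions can be aggregated into class shares for a speech. The sketch below assumes predictions in the `{'label', 'score'}` form returned by the `text-classification` pipeline; the `class_shares` helper, the 0.5 confidence threshold, and the sample predictions are hypothetical and not part of the released model.

```python
from collections import Counter

LABELS = ("Feature", "Process", "Risk-Benefit")


def class_shares(predictions, min_score=0.5):
    """Return the share of each discourse class among confident predictions."""
    # Keep only predictions at or above the (hypothetical) confidence threshold
    kept = [p["label"] for p in predictions if p["score"] >= min_score]
    counts = Counter(kept)
    total = len(kept)
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}


# Hypothetical predictions for the sentences of one speech
speech_predictions = [
    {"label": "Process", "score": 0.99},
    {"label": "Process", "score": 0.95},
    {"label": "Feature", "score": 0.91},
    {"label": "Risk-Benefit", "score": 0.88},
    {"label": "Feature", "score": 0.41},  # below threshold, dropped
]

shares = class_shares(speech_predictions)
print(shares)
```

Dropping low-confidence sentences is a design choice rather than a requirement; setting `min_score=0.0` keeps every prediction.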
## Training Details
* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/Val/Test split: 80/10/10 stratified by label
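The stratified 80/10/10 split listed above can be sketched as follows. This is a minimal standard-library illustration, not the author's preprocessing code: the `stratified_split` helper, the seed, and the placeholder label list are assumptions, and the original split may have been produced with a different tool.

```python
import random
from collections import defaultdict


def stratified_split(labels, seed=42):
    """Split indices roughly 80/10/10 so each label keeps the same proportions."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = len(indices) * 8 // 10
        n_val = len(indices) // 10
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Placeholder labels matching the reported class balance (962 per class)
labels = ["Feature"] * 962 + ["Process"] * 962 + ["Risk-Benefit"] * 962
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 962 sentences per class, this integer rounding gives 769 training, 96 validation, and 97 test sentences per class; other rounding conventions shift these counts by one or two.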
| Parameter | Value |
| ----------------------------- | --------------------------- |
| Base model | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs | 6 |
| Train batch size (per device) | 8 |
| Eval batch size (per device) | 16 |
| Gradient accumulation | 2 |
| Effective batch size | 16 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.06 |
| Scheduler | Cosine |
| Mixed precision (fp16) | Enabled |
* Environment: Google Colab
* GPU: Tesla T4 (16GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers
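To make the batch-size and scheduler settings concrete, the sketch below derives the effective batch size and traces a learning-rate curve with linear warmup followed by cosine decay. It assumes a single GPU and the common warmup-plus-half-cosine shape; the model card does not report the total number of optimizer steps, so `TOTAL_STEPS` is hypothetical, and the exact scheduler implementation used in training may differ in rounding details.

```python
import math

BASE_LR = 2e-5          # learning rate from the table
WARMUP_RATIO = 0.06     # warmup ratio from the table
PER_DEVICE_BATCH = 8    # train batch size per device
GRAD_ACCUM = 2          # gradient accumulation steps


def effective_batch_size(num_devices=1):
    """Examples contributing to one optimizer update (assumes one GPU by default)."""
    return PER_DEVICE_BATCH * GRAD_ACCUM * num_devices


def lr_at(step, total_steps):
    """Linear warmup to BASE_LR, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not reported in the model card

print("effective batch size:", effective_batch_size())
for step in (0, 30, 60, 530, 1000):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")
```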
## Evaluation Results
| Split | Accuracy | Macro-F1 | Weighted-F1 | Class | Precision | Recall | F1 |
| ---------- | --------- | --------- | ----------- | ---------------- | --------- | ------ | ----- |
| Validation | **0.851** | **0.839** | **0.852** | – | – | – | – |
| Test | **0.823** | **0.803** | **0.825** | **Feature** | 0.759 | 0.782 | 0.770 |
| | | | | **Process** | 0.927 | 0.845 | 0.884 |
| | | | | **Risk-Benefit** | 0.700 | 0.817 | 0.754 |
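The per-class F1 and Macro-F1 figures on the test split follow directly from the reported precision and recall. The short check below recomputes them from the table values; the variable and function names are illustrative only.

```python
# Test-split precision and recall, copied from the table above
test_metrics = {
    "Feature": (0.759, 0.782),
    "Process": (0.927, 0.845),
    "Risk-Benefit": (0.700, 0.817),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1(p, r) for name, (p, r) in test_metrics.items()}

# Macro-F1 is the unweighted mean of the per-class F1 scores
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for name, score in per_class_f1.items():
    print(f"{name}: F1 = {score:.3f}")
print(f"Macro-F1 = {macro_f1:.3f}")
```

Rounded to three decimals, this reproduces the table: 0.770, 0.884, and 0.754 per class, and a Macro-F1 of 0.803.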
---
## Other CBDC Models
This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.
| **Model** | **Purpose** | **Intended Use** | **Link** |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT** | Binary classifier: CBDC vs. Non-CBDC. | Flagging CBDC-related discourse in large corpora. | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT) |
| **bilalzafar/CBDC-Stance** | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance) |
| **bilalzafar/CBDC-Sentiment** | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment) |
| **bilalzafar/CBDC-Type** | Classifies Retail, Wholesale, General CBDC mentions. | Distinguishing policy focus (retail vs wholesale). | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type) |
| **bilalzafar/CBDC-Discourse** | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse) |
| **bilalzafar/CentralBank-NER** | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER) |
## Repository and Replication Package
All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:
🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**
---
## How to Use
```python
from transformers import pipeline

# Load the text-classification pipeline with the fine-tuned model
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Discourse")

# Example sentences, with the expected class noted for each
sentences = [
    "The central bank launched a pilot project for CBDC cross-border settlement.",  # Process
    "Programmability in CBDC allows conditional payments.",  # Feature
    "CBDC may increase risks of bank disintermediation.",  # Risk-Benefit
]

# Predict: by default the pipeline returns only the top-scoring label
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output
# The central bank launched a pilot project for CBDC cross-border settlement.
#  → Process (score=0.9989)
# Programmability in CBDC allows conditional payments.
#  → Feature (score=0.9991)
# CBDC may increase risks of bank disintermediation.
#  → Risk-Benefit (score=0.9986)
```
---
## Citation
If you use this model, please cite as:
**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**
```bibtex
@article{zafar2025centralbankbert,
title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
author={Zafar, Muhammad Bilal},
year={2025},
journal={SSRN Electronic Journal},
url={https://papers.ssrn.com/abstract=5404456}
}
```