---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
library_name: transformers
tags:
- finance
- cbdc
- central-bank
- financial-nlp
- economic-policy
- monetary-policy
- sentence-classification
- text-classification
- transformers
- bert
- discourse-analysis
- policy-analysis
- centralbank-bert
- bis-speeches

---


# CBDC-Discourse

`CBDC-Discourse` is a **BERT-based sentence classifier** fine-tuned to categorize central bank digital currency (CBDC) discourse into three conceptually distinct classes: **Feature, Risk-Benefit, and Process**.
This model enables structured analysis of CBDC-related policy and research texts by separating **design attributes**, **evaluative outcomes**, and **procedural activities**.

| Class            | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **Feature**      | A sentence that specifies a **concrete design element or operational mechanism** of CBDC. Examples include: wallet/card modality; programmability/smart contracts; privacy model; interoperability requirements; legal tender status; distribution via intermediaries; holding limits/caps; interest-bearing/remuneration (incl. negative rates); rulebook/scheme rules; settlement architecture (DLT/RPS/RTGS links).                                                                                                                                                                                 |
| **Risk-Benefit** | A sentence that asserts or implies **outcomes, effects, or trade-offs** (positive or negative) of a CBDC feature or of its introduction, including policy/equilibrium impacts. Positive examples include: faster/cheaper/more transparent cross-border payments; financial inclusion; regional cooperation; competition/innovation; sovereignty/autonomy; efficiency/productivity gains. Negative examples include: bank disintermediation; cyber/operational risk; crisis flight from deposits; privacy harms; monetary/fiscal dominance concerns; “too successful” crowd-out; legal/regulatory fragility. |
| **Process**      | A sentence about **research, consultations, pilots, governance, timeline, or agenda-setting**, without specifying a concrete feature or claiming effects/trade-offs. Examples include: public consultations; surveys/focus groups; task forces; phases (investigation/preparation/pilot); rulebook drafting as an activity (absent specifics); reports/citations; statements of interest/attention; open questions; goal/timeline setting (e.g., “medium-term goal”).                                                                                                                                  |


## Base Model

This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (\~66M tokens)** from **BIS central bank speeches (1996–2024)**.
CentralBank-BERT is adapted to the vocabulary and context of **monetary policy, financial regulation, and central banking discourse**, which makes it a suitable foundation for downstream CBDC-related text classification.

## Dataset

The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset was balanced across the three discourse classes, with a total of **2,886 sentences (962 per class)**.

## Intended Use

This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It enables researchers, analysts, and practitioners to distinguish whether a sentence describes **procedural aspects**, **design features**, or **evaluative outcomes** of central bank digital currencies.
Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in the fields of **finance, monetary economics, and economic policy**.
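
One way to use sentence-level predictions for thematic mapping is to aggregate them per document, for example the share of Feature, Process, and Risk-Benefit sentences in each speech. The sketch below assumes predictions in the `{"label": ..., "score": ...}` format returned by the Transformers `text-classification` pipeline; the helper name `discourse_shares`, the optional `min_score` confidence filter, and the sample predictions are illustrative assumptions, not part of this model's release.

```python
from collections import Counter

# Hypothetical helper: aggregate sentence-level predictions for one document
# (e.g. one speech) into the share of each discourse class.
# Each prediction is assumed to be a dict like {"label": "Process", "score": 0.99}.
CLASSES = ("Feature", "Process", "Risk-Benefit")


def discourse_shares(predictions, min_score=0.0):
    """Return {class: share} over predictions whose score is >= min_score."""
    kept = [p["label"] for p in predictions if p["score"] >= min_score]
    counts = Counter(kept)
    total = len(kept)
    # Report every class, including classes with zero sentences
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


# Usage with made-up predictions for a four-sentence speech
preds = [
    {"label": "Process", "score": 0.98},
    {"label": "Feature", "score": 0.95},
    {"label": "Process", "score": 0.91},
    {"label": "Risk-Benefit", "score": 0.55},
]
print(discourse_shares(preds))  # {'Feature': 0.25, 'Process': 0.5, 'Risk-Benefit': 0.25}
print(discourse_shares(preds, min_score=0.9))  # drops the low-confidence sentence
```

Filtering on `min_score` trades coverage for precision; whether that is appropriate depends on the analysis.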

## Training Details

* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/Val/Test split: 80/10/10 stratified by label
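
An 80/10/10 split stratified by label keeps the class proportions identical across the train, validation, and test sets. The sketch below shows one way to do this with the standard library only; the function name `stratified_split`, the per-class rounding rule, and the seed are assumptions. The authors' actual split code may differ (for example, scikit-learn's `train_test_split` with `stratify=`), so exact set sizes may not match.

```python
import random
from collections import defaultdict


def stratified_split(examples, labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split examples into train/val/test while preserving label proportions.

    Hypothetical helper for illustration: per-class counts for train and val
    are rounded, and the remainder of each class goes to test.
    """
    by_label = defaultdict(list)
    for x, y in zip(examples, labels):
        by_label[y].append((x, y))

    rng = random.Random(seed)
    train, val, test = [], [], []
    for y in sorted(by_label):  # deterministic class order
        items = by_label[y][:]
        rng.shuffle(items)
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test


# Usage: a balanced toy dataset with 962 sentences per class, as described above
labels = ["Feature"] * 962 + ["Process"] * 962 + ["Risk-Benefit"] * 962
examples = [f"sentence {i}" for i in range(len(labels))]
train, val, test = stratified_split(examples, labels)
print(len(train), len(val), len(test))  # 2310 288 288
```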

| Parameter                     | Value                       |
| ----------------------------- | --------------------------- |
| Base model                    | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs                        | 6                           |
| Train batch size (per device) | 8                           |
| Eval batch size (per device)  | 16                          |
| Gradient accumulation         | 2                           |
| Effective batch size          | 16                          |
| Learning rate                 | 2e-5                        |
| Weight decay                  | 0.01                        |
| Warmup ratio                  | 0.06                        |
| Scheduler                     | Cosine                      |
| Mixed precision (fp16)        | Enabled                     |

* Environment: Google Colab
* GPU: Tesla T4 (16GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers
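
Taken together, the warmup ratio and cosine scheduler in the table produce a learning rate that rises linearly from 0 to the peak of 2e-5 over the first 6% of optimizer steps, then decays along a half cosine to 0. The pure-Python sketch below reproduces that shape using the cosine-with-warmup formula used in Transformers. It assumes a single GPU and about 2,309 training sentences (80% of 2,886), and it approximates the step count, so Trainer's exact step accounting may differ slightly.

```python
import math

# Values from the training table above
PEAK_LR = 2e-5
EPOCHS = 6
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
WARMUP_RATIO = 0.06

# Assumption: ~80% of 2,886 sentences used for training, single GPU
N_TRAIN = 2309

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM          # 8 x 2 = 16
steps_per_epoch = math.ceil(N_TRAIN / effective_batch)  # approximate optimizer steps
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step):
    """Linear warmup to PEAK_LR, then half-cosine decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 16 145 870 53
for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:4d}: lr = {lr_at(step):.2e}")
```

In a Transformers training script, these settings would presumably map to the `TrainingArguments` parameters `learning_rate`, `num_train_epochs`, `per_device_train_batch_size`, `gradient_accumulation_steps`, `warmup_ratio`, and `lr_scheduler_type="cosine"`.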

## Evaluation Results

| Split      | Accuracy  | Macro-F1  | Weighted-F1 |
| ---------- | --------- | --------- | ----------- |
| Validation | **0.851** | **0.839** | **0.852**   |
| Test       | **0.823** | **0.803** | **0.825**   |

Per-class results on the test split:

| Class            | Precision | Recall | F1    |
| ---------------- | --------- | ------ | ----- |
| **Feature**      | 0.759     | 0.782  | 0.770 |
| **Process**      | 0.927     | 0.845  | 0.884 |
| **Risk-Benefit** | 0.700     | 0.817  | 0.754 |
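
Per-class F1 is the harmonic mean of precision and recall, and macro-F1 is the unweighted mean of the per-class F1 scores. The short check below recomputes both from the test-set precision and recall reported above. Weighted-F1 would also need the per-class support counts, which are not listed here.

```python
# Test-set (precision, recall) per class, copied from the results above
per_class = {
    "Feature": (0.759, 0.782),
    "Process": (0.927, 0.845),
    "Risk-Benefit": (0.700, 0.817),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


f1_scores = {c: f1(p, r) for c, (p, r) in per_class.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for c, score in f1_scores.items():
    print(f"{c:<13} F1 = {score:.3f}")
print(f"Macro-F1      = {macro_f1:.3f}")  # 0.803, matching the reported value
```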

---

## Other CBDC Models

This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.

| **Model**                       | **Purpose**                                                         | **Intended Use**                                                    | **Link**                                                               |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024).      | Base encoder for CBDC downstream tasks; fill-mask tasks.            | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT**        | Binary classifier: CBDC vs. Non-CBDC.                               | Flagging CBDC-related discourse in large corpora.                   | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT)               |
| **bilalzafar/CBDC-Stance**      | 3-class stance model (Pro, Wait-and-See, Anti).                     | Research on policy stances and discourse monitoring.                | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance)           |
| **bilalzafar/CBDC-Sentiment**   | 3-class sentiment model (Positive, Neutral, Negative).              | Tone analysis in central bank communications.                       | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment)     |
| **bilalzafar/CBDC-Type**        | Classifies CBDC mentions as Retail, Wholesale, or General.          | Distinguishing policy focus (retail vs. wholesale).                 | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type)               |
| **bilalzafar/CBDC-Discourse**   | 3-class discourse classifier (Feature, Process, Risk-Benefit).      | Structured categorization of CBDC communications.                   | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse)     |
| **bilalzafar/CentralBank-NER**  | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER)   |


## Repository and Replication Package

All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:

🔗 **[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**

---

## How to Use

```python
from transformers import pipeline

# Load the text-classification pipeline with the fine-tuned model
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Discourse")

# Example sentences (expected class in the trailing comment)
sentences = [
    "The central bank launched a pilot project for CBDC cross-border settlement.",  # Process
    "Programmability in CBDC allows conditional payments.",  # Feature
    "CBDC may increase risks of bank disintermediation.",  # Risk-Benefit
]

# Predict: for a single string, the pipeline returns a list holding the top label and its score
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output:
# The central bank launched a pilot project for CBDC cross-border settlement.
#  → Process (score=0.9989)
#
# Programmability in CBDC allows conditional payments.
#  → Feature (score=0.9991)
#
# CBDC may increase risks of bank disintermediation.
#  → Risk-Benefit (score=0.9986)
```
---

## Citation

If you use this model, please cite as:

**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**

```bibtex
@article{zafar2025centralbankbert,
  title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
  author={Zafar, Muhammad Bilal},
  year={2025},
  journal={SSRN Electronic Journal},
  url={https://papers.ssrn.com/abstract=5404456}
}