---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
library_name: transformers
tags:
- finance
- cbdc
- central-bank
- financial-nlp
- economic-policy
- monetary-policy
- sentence-classification
- text-classification
- transformers
- bert
- discourse-analysis
- policy-analysis
- centralbank-bert
- bis-speeches
---

# CBDC-Discourse

`CBDC-Discourse` is a **BERT-based sentence classifier** fine-tuned to categorize central bank digital currency (CBDC) discourse into three conceptually distinct classes: **Feature, Risk-Benefit, and Process**.
This model enables structured analysis of CBDC-related policy and research texts by separating **design attributes**, **evaluative outcomes**, and **procedural activities**.

| Class | Description |
| ---------------- | ----------- |
| **Feature** | A sentence that specifies a **concrete design element or operational mechanism** of CBDC. Examples include: wallet/card modality; programmability/smart contracts; privacy model; interoperability requirements; legal tender status; distribution via intermediaries; holding limits/caps; interest-bearing/remuneration (incl. negative rates); rulebook/scheme rules; settlement architecture (DLT/RPS/RTGS links). |
| **Risk-Benefit** | A sentence that asserts or implies **outcomes, effects, or trade-offs** (positive or negative) from a CBDC feature or its introduction, including policy/equilibrium impacts. Examples include: faster/cheaper/more transparent cross-border payments; financial inclusion; regional cooperation; competition/innovation; sovereignty/autonomy; efficiency/productivity gains. Also, negative concerns such as bank disintermediation; cyber/operational risk; crisis flight from deposits; privacy harms; monetary/fiscal dominance concerns; “too successful” crowd-out; legal/regulatory fragility. |
| **Process** | A sentence about **research, consultations, pilots, governance, timeline, or agenda-setting**, without specifying a concrete feature or claiming effects/trade-offs. Examples include: public consultations; surveys/focus groups; task forces; phases (investigation/preparation/pilot); rulebook drafting as an activity (absent specifics); reports/citations; statements of interest/attention; open questions; goal/timeline setting (e.g., “medium-term goal”). |

## Base Model

This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (~66M tokens)** from **BIS central bank speeches (1996–2024)**.
CentralBank-BERT captures the vocabulary and context of **monetary policy, financial regulation, and central banking discourse**, making it a well-suited foundation for downstream CBDC-related text classification.

## Dataset

The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset is balanced across the three discourse classes, with **2,886 sentences in total (962 per class)**.

## Intended Use

This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It enables researchers, analysts, and practitioners to distinguish whether a sentence describes **procedural aspects**, **design features**, or **evaluative outcomes** of central bank digital currencies.
Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in the fields of **finance, monetary economics, and economic policy**.

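As one illustration of thematic mapping, sentence-level predictions can be aggregated into per-document class shares. The sketch below is a minimal example under stated assumptions: the helper `class_shares` and the list `speech_labels` are hypothetical, and the labels are hand-written stand-ins for classifier output rather than real predictions.

```python
from collections import Counter

# The three discourse classes defined in this model card.
CLASSES = ("Feature", "Risk-Benefit", "Process")


def class_shares(predicted_labels):
    """Return the fraction of sentences assigned to each discourse class."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


# Hypothetical predictions for the sentences of one speech (not real output).
speech_labels = ["Process", "Process", "Feature", "Risk-Benefit", "Process"]
shares = class_shares(speech_labels)
print(shares)
```

Comparing these shares across speeches or years gives a simple view of whether the discussion leans toward design, evaluation, or procedure.
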
## Training Details

* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/Val/Test split: 80/10/10 stratified by label

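The 80/10/10 stratified split can be illustrated with a minimal, dependency-free sketch. The function name `stratified_split`, the seed, and the rounding rule are assumptions made for illustration; the exact split procedure used for this model is not described here and may differ.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Return (train, val, test) index lists with per-label proportions preserved.

    Hypothetical helper: the seed and the rounding rule are illustrative choices.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_train = int(ratios[0] * len(indices))
        n_val = int(ratios[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to the test split
    return train, val, test


# Toy labels mirroring the balanced dataset: 962 sentences per class.
labels = ["Feature"] * 962 + ["Risk-Benefit"] * 962 + ["Process"] * 962
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each label keeps the class proportions the same in all three subsets, which is what "stratified by label" refers to.
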
| Parameter | Value |
| ----------------------------- | --------------------------- |
| Base model | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs | 6 |
| Train batch size (per device) | 8 |
| Eval batch size (per device) | 16 |
| Gradient accumulation | 2 |
| Effective batch size | 16 |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.06 |
| Scheduler | Cosine |
| Mixed precision (fp16) | Enabled |

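The relationship between these settings can be checked with a few lines of arithmetic. This is a rough sketch, not the training script: the single-device setup and the training-split size (about 80% of 2,886 sentences) are assumptions, and the exact number of optimizer updates depends on how the training loop rounds partial batches.

```python
import math

# Values taken from the hyperparameter table.
per_device_batch = 8
grad_accum_steps = 2
epochs = 6
warmup_ratio = 0.06

# Assumptions (not stated in the table): one GPU, and roughly 80% of the
# 2,886 annotated sentences in the training split.
num_devices = 1
n_train = round(0.8 * 2886)

# Effective batch size = per-device batch x accumulation steps x devices.
effective_batch = per_device_batch * grad_accum_steps * num_devices

# Approximate schedule length; real trainers may round partial batches differently.
batches_per_epoch = math.ceil(n_train / per_device_batch)
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
total_updates = updates_per_epoch * epochs
warmup_updates = math.ceil(warmup_ratio * total_updates)

print(f"effective batch size: {effective_batch}")
print(f"approx. optimizer updates: {total_updates}")
print(f"approx. warmup updates: {warmup_updates}")
```

Under these assumptions, the effective batch size of 16 in the table follows directly from a per-device batch of 8 with 2 accumulation steps.
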
* Environment: Google Colab
* GPU: Tesla T4 (16GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers

## Evaluation Results

| Split | Accuracy | Macro-F1 | Weighted-F1 | Class | Precision | Recall | F1 |
| ---------- | --------- | --------- | ----------- | ---------------- | --------- | ------ | ----- |
| Validation | **0.851** | **0.839** | **0.852** | – | – | – | – |
| Test | **0.823** | **0.803** | **0.825** | **Feature** | 0.759 | 0.782 | 0.770 |
| | | | | **Process** | 0.927 | 0.845 | 0.884 |
| | | | | **Risk-Benefit** | 0.700 | 0.817 | 0.754 |

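The per-class F1 scores and the test Macro-F1 can be recomputed from the precision and recall columns as a consistency check. The sketch below uses only the rounded values shown in the table; the helper name `f1_score` is an illustrative choice, not part of the original evaluation code.

```python
# Test-set precision and recall per class, copied from the results table.
test_metrics = {
    "Feature": (0.759, 0.782),
    "Process": (0.927, 0.845),
    "Risk-Benefit": (0.700, 0.817),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1_score(p, r) for name, (p, r) in test_metrics.items()}

# Macro-F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for name, value in per_class_f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"Macro-F1 = {macro_f1:.3f}")
```

The recomputed values agree with the F1 column and with the reported test Macro-F1 of 0.803 to three decimals.
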
---

## Other CBDC Models

This model is part of the **CentralBank-BERT / CBDC model family**, a suite of domain-adapted classifiers for analyzing central-bank communication.

| **Model** | **Purpose** | **Intended Use** | **Link** |
| ------------------------------- | ------------------------------------------------------------------- | ------------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **bilalzafar/CentralBank-BERT** | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| **bilalzafar/CBDC-BERT** | Binary classifier: CBDC vs. Non-CBDC. | Flagging CBDC-related discourse in large corpora. | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT) |
| **bilalzafar/CBDC-Stance** | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance) |
| **bilalzafar/CBDC-Sentiment** | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment) |
| **bilalzafar/CBDC-Type** | Classifies Retail, Wholesale, General CBDC mentions. | Distinguishing policy focus (retail vs wholesale). | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type) |
| **bilalzafar/CBDC-Discourse** | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse) |
| **bilalzafar/CentralBank-NER** | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER) |

## Repository and Replication Package

All **training pipelines, preprocessing scripts, evaluation notebooks, and result outputs** are available in the companion GitHub repository:

**[https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)**

---

## How to Use

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
classifier = pipeline("text-classification", model="bilalzafar/CBDC-Discourse")

# Example sentences, one per class
sentences = [
    "The central bank launched a pilot project for CBDC cross-border settlement.",  # Process
    "Programmability in CBDC allows conditional payments.",  # Feature
    "CBDC may increase risks of bank disintermediation.",  # Risk-Benefit
]

# Predict the top label for each sentence
for s in sentences:
    result = classifier(s)[0]
    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

# Example output
# The central bank launched a pilot project for CBDC cross-border settlement.
#  → Process (score=0.9989)
# Programmability in CBDC allows conditional payments.
#  → Feature (score=0.9991)
# CBDC may increase risks of bank disintermediation.
#  → Risk-Benefit (score=0.9986)
```

---

## Citation

If you use this model, please cite as:

**Zafar, M. B. (2025). *CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse*. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)**

```bibtex
@article{zafar2025centralbankbert,
  title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
  author={Zafar, Muhammad Bilal},
  year={2025},
  journal={SSRN Electronic Journal},
  url={https://papers.ssrn.com/abstract=5404456}
}
```