	---
	license: mit
	language:
	- en
	metrics:
	- f1
	- accuracy
	base_model:
	- bilalzafar/CentralBank-BERT
	pipeline_tag: text-classification
	tags:
	- CBDC
	- Central Bank Digital Currencies
	- Central Bank Digital Currency
	- Classification
	- Wholesale CBDC
	- Retail CBDC
	- Central Bank
	- Tone
	- Finance
	- NLP
	- Finance NLP
	- BERT
	- Transformers
	- Digital Currency
	library_name: transformers
	---

	# CBDC-Type-BERT: Classifying Retail vs Wholesale vs General CBDC Sentences
	A domain-specialized BERT classifier that assigns central-bank text about CBDCs to one of three categories:
	* Retail CBDC – statements about a general-purpose CBDC for the public (households, merchants, wallets, offline use, legal tender for everyday payments, holding limits, tiered remuneration, “digital euro/pound/rupee” for citizens, etc.).
	* Wholesale CBDC – statements about a financial-institution CBDC (RTGS/settlement, DLT platforms, PvP/DvP, tokenised assets/markets, interbank use, central-bank reserves on ledger, etc.).
	* General/Unspecified – CBDC mentions that don’t clearly indicate retail or wholesale scope, or discuss CBDCs at a conceptual/policy level without specifying the type.

	Training data: 1,417 manually annotated CBDC sentences from BIS central-bank speeches, covering Retail CBDC (543), Wholesale CBDC (329), and General/Unspecified (545). The data were split 80/10/10 (train/validation/test) with stratification.
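A stratified 80/10/10 split keeps the class proportions roughly equal across the three subsets. The sketch below illustrates the idea in plain Python using the class counts reported above. The function name `stratified_split`, the seed, and the placeholder label strings are assumptions for illustration; the source does not say which tool produced the actual split, so the exact subset sizes may differ.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, train_pct=80, val_pct=10, seed=42):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(indices) * train_pct // 100
        n_val = len(indices) * val_pct // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test split
    return train, val, test

# Class counts from the model card; the label strings are placeholders.
labels = ["Retail"] * 543 + ["Wholesale"] * 329 + ["General"] * 545
train, val, test = stratified_split(labels)
train_counts = Counter(labels[i] for i in train)
print(len(train), len(val), len(test))
print(dict(train_counts))
```

Splitting each class separately is what makes the split stratified: every subset ends up with about the same Retail/Wholesale/General mix as the full dataset.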

	Base model: [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a domain-adapted BERT trained on ~2M sentences (66M tokens) of central bank speeches (BIS, 1996–2024). It captures monetary-policy and payments vocabulary far better than generic BERT, which materially helps downstream CBDC classification.

	## Preprocessing, Class Weights & Training
	We performed light manual cleaning (trimming whitespace, normalizing quotes/dashes, de-duplication, dropping nulls) and tokenized the sentences with the WordPiece tokenizer of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) (max length 192). Because Wholesale had fewer examples, we applied inverse-frequency class weights in `CrossEntropyLoss` to balance learning (train-split weights ≈ General 0.866, Retail 0.870, Wholesale 1.436). The model was fine-tuned with AdamW (lr 2e-5, weight decay 0.01, warmup ratio 0.1), batch sizes of 8 (train) and 16 (eval), for 5 epochs with fp16 mixed precision. Early stopping monitored validation macro-F1 (patience = 2), and the best checkpoint was restored at the end. Training ran on a single Colab GPU.
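The reported class weights are consistent with the common "balanced" inverse-frequency formula, weight = N / (K × n_c), where N is the number of training samples, K the number of classes, and n_c the number of samples in class c. The sketch below reproduces the three values under that assumption. The train-split counts (436 / 434 / 263) are not stated in the source; they are inferred as roughly 80% of the per-class totals, and the function name is hypothetical.

```python
def inverse_frequency_weights(class_counts):
    """Weight each class by N / (K * n_c)."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        label: total / (num_classes * count)
        for label, count in class_counts.items()
    }

# Assumed train-split counts (about 80% of 545 / 543 / 329); not given in the source.
train_counts = {"General": 436, "Retail": 434, "Wholesale": 263}
weights = inverse_frequency_weights(train_counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
# General: 0.866
# Retail: 0.870
# Wholesale: 1.436
```

In PyTorch, weights like these would typically be passed as a tensor to `torch.nn.CrossEntropyLoss(weight=...)`, ordered to match the model's label ids.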

	## Performance & Evaluation
	On the held-out test set (10% of the data), the model achieved 88.7% accuracy, 0.898 macro-F1, and 0.887 weighted-F1. Per-class F1 was ≈ 0.86 for Retail, ≈ 0.97 for Wholesale, and ≈ 0.86 for General: the model is most reliable on Wholesale and performs evenly on Retail and General.
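To see how the macro and weighted averages relate to the per-class scores, the sketch below recomputes both from the rounded per-class F1 values. The per-class test-set supports (54 / 33 / 55) are assumptions derived from a stratified 10% split and are not reported in the source; because the inputs are rounded to two decimals, the results only approximate the reported 0.898 and 0.887.

```python
def macro_f1(f1_by_class):
    """Unweighted mean of per-class F1 scores."""
    return sum(f1_by_class.values()) / len(f1_by_class)

def weighted_f1(f1_by_class, support):
    """Mean of per-class F1 scores weighted by class support."""
    total = sum(support.values())
    return sum(f1_by_class[c] * support[c] for c in f1_by_class) / total

# Rounded per-class F1 from the model card.
f1 = {"Retail": 0.86, "Wholesale": 0.97, "General": 0.86}
# Assumed test-set supports (roughly 10% of 543 / 329 / 545); not given in the source.
support = {"Retail": 54, "Wholesale": 33, "General": 55}

macro = macro_f1(f1)
weighted = weighted_f1(f1, support)
print(f"macro-F1 ~ {macro:.3f}, weighted-F1 ~ {weighted:.3f}")
```

Macro-F1 comes out above weighted-F1 because the best-scoring class, Wholesale, is also the smallest, so it counts for less in the support-weighted average.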

	---

	## Other CBDC Models

	This model is part of the CentralBank-BERT / CBDC model family, a suite of domain-adapted classifiers for analyzing central-bank communication.

	| Model | Purpose | Intended Use | Link |
	| --- | --- | --- | --- |
	| bilalzafar/CentralBank-BERT | Domain-adaptive masked LM trained on BIS speeches (1996–2024). | Base encoder for CBDC downstream tasks; fill-mask tasks. | [CentralBank-BERT](https://huggingface.co/bilalzafar/CentralBank-BERT) |
	| bilalzafar/CBDC-BERT | Binary classifier: CBDC vs. Non-CBDC. | Flagging CBDC-related discourse in large corpora. | [CBDC-BERT](https://huggingface.co/bilalzafar/CBDC-BERT) |
	| bilalzafar/CBDC-Stance | 3-class stance model (Pro, Wait-and-See, Anti). | Research on policy stances and discourse monitoring. | [CBDC-Stance](https://huggingface.co/bilalzafar/CBDC-Stance) |
	| bilalzafar/CBDC-Sentiment | 3-class sentiment model (Positive, Neutral, Negative). | Tone analysis in central bank communications. | [CBDC-Sentiment](https://huggingface.co/bilalzafar/CBDC-Sentiment) |
	| bilalzafar/CBDC-Type | Classifies Retail, Wholesale, General CBDC mentions. | Distinguishing policy focus (retail vs wholesale). | [CBDC-Type](https://huggingface.co/bilalzafar/CBDC-Type) |
	| bilalzafar/CBDC-Discourse | 3-class discourse classifier (Feature, Process, Risk-Benefit). | Structured categorization of CBDC communications. | [CBDC-Discourse](https://huggingface.co/bilalzafar/CBDC-Discourse) |
	| bilalzafar/CentralBank-NER | Named Entity Recognition (NER) model for central banking discourse. | Identifying institutions, persons, and policy entities in speeches. | [CentralBank-NER](https://huggingface.co/bilalzafar/CentralBank-NER) |


	## Repository and Replication Package

	All training pipelines, preprocessing scripts, evaluation notebooks, and result outputs are available in the companion GitHub repository:

	🔗 [https://github.com/bilalezafar/CentralBank-BERT](https://github.com/bilalezafar/CentralBank-BERT)

	---

	## Usage

	```python
	from transformers import pipeline

	# Load the fine-tuned classifier from the Hugging Face Hub
	classifier = pipeline("text-classification", model="bilalzafar/CBDC-Type")

	# Example sentences, one per class
	sentences = [
	    "The digital euro will be available to citizens and merchants for daily payments.",  # Retail
	    "DLT-based interbank settlement with a central bank liability will lower PvP risk.",  # Wholesale
	    "Several central banks are assessing CBDCs to modernise payments and policy transmission.",  # General
	]

	# Predict: for a single string the pipeline returns a one-element list with the top label
	for s in sentences:
	    result = classifier(s)[0]
	    print(f"{s}\n → {result['label']} (score={result['score']:.4f})\n")

	# Example output
	# The digital euro will be available to citizens and merchants for daily payments.
	#  → Retail CBDC (score=0.9985)
	# DLT-based interbank settlement with a central bank liability will lower PvP risk.
	#  → Wholesale CBDC (score=0.9974)
	# Several central banks are assessing CBDCs to modernise payments and policy transmission.
	#  → General/Unspecified (score=0.9979)

	```
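When classifying many sentences at once, the pipeline can be called on a list and returns one `{'label', 'score'}` dict per sentence. The sketch below shows one possible way to summarize such output as label shares. The helper `label_shares` and its `min_score` threshold are hypothetical, and the `predictions` list is a stand-in for real pipeline output, with labels and scores copied from the example output above so that the snippet runs without downloading the model.

```python
from collections import Counter

def label_shares(predictions, min_score=0.0):
    """Share of each label among predictions scoring at least min_score."""
    kept = [p["label"] for p in predictions if p["score"] >= min_score]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}

# Stand-in for `classifier(sentences)`; values taken from the example output.
predictions = [
    {"label": "Retail CBDC", "score": 0.9985},
    {"label": "Wholesale CBDC", "score": 0.9974},
    {"label": "General/Unspecified", "score": 0.9979},
]

shares = label_shares(predictions, min_score=0.5)
for label, share in shares.items():
    print(f"{label}: {share:.2%}")
```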
	---

	## Citation

	If you use this model, please cite as:

	*Zafar, M. B. (2025). CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse. SSRN. [https://papers.ssrn.com/abstract=5404456](https://papers.ssrn.com/abstract=5404456)*

	```bibtex
	@article{zafar2025centralbankbert,
	title={CentralBank-BERT: Machine Learning Evidence on Central Bank Digital Currency Discourse},
	author={Zafar, Muhammad Bilal},
	year={2025},
	journal={SSRN Electronic Journal},
	url={https://papers.ssrn.com/abstract=5404456}
	}