---
license: mit
language:
- en
metrics:
- f1
- accuracy
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
tags:
- CBDC
- Central Bank Digital Currencies
- Central Bank Digital Currency
- Classification
- Wholesale CBDC
- Retail CBDC
- Central Bank
- Tone
- Finance
- NLP
- Finance NLP
- BERT
- Transformers
- Digital Currency
---
|
# CBDC-Type-BERT: Classifying Retail vs Wholesale vs General CBDC Sentences

**A domain-specialized BERT classifier that labels central-bank text about CBDCs into three categories:**

* **Retail CBDC** – statements about a **general-purpose** CBDC for the public (households, merchants, wallets, offline use, legal tender for everyday payments, holding limits, tiered remuneration, "digital euro/pound/rupee" for citizens, etc.).
* **Wholesale CBDC** – statements about a **financial-institution** CBDC (RTGS/settlement, DLT platforms, PvP/DvP, tokenised assets/markets, interbank use, central-bank reserves on ledger, etc.).
* **General/Unspecified** – CBDC mentions that **don't clearly indicate retail or wholesale** scope, or that discuss CBDCs at a conceptual/policy level without specifying the type.

**Training data:** 1,417 manually annotated CBDC sentences from BIS central-bank speeches — **Retail CBDC** (543), **Wholesale CBDC** (329), and **General/Unspecified** (545) — split **80/10/10** (train/validation/test) with stratification.
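
A stratified 80/10/10 split can be reproduced with two chained `train_test_split` calls; a minimal scikit-learn sketch (the synthetic sentences and `random_state` are placeholders, not the original pipeline):

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the 1,417 annotated sentences (labels only, for illustration).
labels = ["Retail"] * 543 + ["Wholesale"] * 329 + ["General"] * 545
sentences = [f"sentence {i}" for i in range(len(labels))]

# First carve out 20% for validation+test, stratified by label...
train_x, rest_x, train_y, rest_y = train_test_split(
    sentences, labels, test_size=0.2, stratify=labels, random_state=42
)
# ...then split that 20% evenly into validation and test (10% each overall).
val_x, test_x, val_y, test_y = train_test_split(
    rest_x, rest_y, test_size=0.5, stratify=rest_y, random_state=42
)
print(len(train_x), len(val_x), len(test_x))  # → 1133 142 142
```

Stratifying both splits keeps the Retail/Wholesale/General proportions intact in every partition.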

**Base model:** [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) – **CentralBank-BERT** is a domain-adapted BERT trained on ~2M sentences (66M tokens) of **central-bank speeches** (BIS, 1996–2024). It captures monetary-policy and payments vocabulary far better than generic BERT, which materially helps downstream CBDC classification.

---

## Preprocessing, Class Weights & Training

* Text cleaning: minimal.
* Tokenizer: CentralBank-BERT WordPiece (max length **192**).
* **Class imbalance:** with fewer **Wholesale** examples, **inverse-frequency class weights** were used in `CrossEntropyLoss` to balance learning: General ≈ 0.866, Retail ≈ 0.870, Wholesale ≈ 1.436 (computed from the train split).
* Optimizer: AdamW; **lr=2e-5**, **weight_decay=0.01**, **warmup_ratio=0.1**.
* Batch sizes: train **8**, eval **16**; epochs: **5**; **fp16** mixed precision.
* Early stopping on validation **macro-F1** (patience=2); the best checkpoint is loaded at the end.
* Hardware: single GPU (Colab).
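
The inverse-frequency weights above follow the standard `n_total / (n_classes * n_c)` formula; a minimal PyTorch sketch (the counts shown are the card's full-dataset figures, whose ratios the stratified train split preserves):

```python
import torch
from torch import nn

# Label counts from the card: General (545), Retail (543), Wholesale (329).
counts = torch.tensor([545.0, 543.0, 329.0])

# Inverse-frequency weights: n_total / (n_classes * count_c)
weights = counts.sum() / (len(counts) * counts)
# weights ≈ [0.866, 0.870, 1.436] for General, Retail, Wholesale

# Weighted loss so the rarer Wholesale class contributes proportionally more.
loss_fn = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(4, 3)           # batch of 4 sentences, 3 classes
labels = torch.tensor([0, 1, 2, 2])  # gold class indices
loss = loss_fn(logits, labels)
```

With the Hugging Face `Trainer`, one common pattern is to subclass it and apply this weighted loss inside an overridden `compute_loss`.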

---

## Performance & Evaluation

**Held-out test (10%):**

* **Accuracy:** **0.887**
* **F1 (macro):** **0.898**
* **F1 (weighted):** **0.887**

Class-wise F1 (test):

* **Retail:** ~0.86
* **Wholesale:** ~0.97
* **General:** ~0.86
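
These figures (and the validation macro-F1 used for early stopping) follow the standard scikit-learn definitions; a small sketch with hypothetical gold/predicted labels illustrating the averaging modes:

```python
from sklearn.metrics import accuracy_score, classification_report, f1_score

# Hypothetical gold and predicted labels for a handful of test sentences.
y_true = ["Retail", "Wholesale", "General", "Retail", "Wholesale"]
y_pred = ["Retail", "Wholesale", "General", "General", "Wholesale"]

accuracy = accuracy_score(y_true, y_pred)                   # share of exact matches
macro_f1 = f1_score(y_true, y_pred, average="macro")        # unweighted mean of per-class F1
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # support-weighted mean

# Per-class precision/recall/F1, as in the class-wise list above.
print(classification_report(y_true, y_pred, digits=3))
```

Macro-F1 treats the three classes equally, which is why it was chosen for model selection despite the smaller Wholesale class.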

---

## Usage

```python
from transformers import pipeline

model_id = "your-username/cbdc-type-bert"  # replace with your repo
clf = pipeline("text-classification", model=model_id, tokenizer=model_id,
               truncation=True, max_length=192)

texts = [
    "The digital euro will be available to citizens and merchants for daily payments.",         # Retail
    "DLT-based interbank settlement with a central bank liability will lower PvP risk.",        # Wholesale
    "Several central banks are assessing CBDCs to modernise payments and policy transmission."  # General
]

for t in texts:
    out = clf(t)[0]
    print(f"{out['label']:>20} {out['score']:.3f} | {t}")
```