---
license: mit
language:
- en
metrics:
- f1
- accuracy
base_model:
- bilalzafar/CentralBank-BERT
pipeline_tag: text-classification
tags:
- CBDC
- Central Bank Digital Currencies
- Central Bank Digital Currency
- Classification
- Wholesale CBDC
- Retail CBDC
- Central Bank
- Tone
- Finance
- NLP
- Finance NLP
- BERT
- Transformers
- Digital Currency
---
27
+
28
+ # CBDC-Type-BERT: Classifying Retail vs Wholesale vs General CBDC Sentences
29
+
30
+ **A domain-specialized BERT classifier that labels central-bank text about CBDCs into three categories:**
31
+
32
+ * **Retail CBDC** – statements about a **general-purpose** CBDC for the public (households, merchants, wallets, offline use, legal-tender for everyday payments, holding limits, tiered remuneration, “digital euro/pound/rupee” for citizens, etc.).
33
+ * **Wholesale CBDC** – statements about a **financial-institution** CBDC (RTGS/settlement, DLT platforms, PvP/DvP, tokenised assets/markets, interbank use, central-bank reserves on ledger, etc.).
34
+ * **General/Unspecified** – CBDC mentions that **don’t clearly indicate retail or wholesale** scope, or discuss CBDCs at a conceptual/policy level without specifying the type.
35
+
36
+
37
+ **Training data:** 1,417 manually annotated CBDC sentences from BIS central-bank speeches — **Retail CBDC** (543), **Wholesale CBDC** (329), and **General/Unspecified** (545) — split **80/10/10** (train/validation/test) with stratification.
38
+
39
+ **Base model:** [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) - **CentralBank-BERT** is a domain-adapted BERT trained on \~2M sentences (66M tokens) of **central bank speeches** (BIS, 1996–2024). It captures monetary-policy and payments vocabulary far better than generic BERT, which materially helps downstream CBDC classification.
40
+
---

## Preprocessing, Class Weights & Training

* Text cleaning: minimal.
* Tokenizer: CentralBank-BERT WordPiece (max length **192**).
* **Class imbalance:** with fewer **Wholesale** examples, **inverse-frequency class weights** were used in `CrossEntropyLoss` to balance learning:
  * General ≈ 0.866, Retail ≈ 0.870, Wholesale ≈ 1.436 (computed from the train split).
* Optimizer: AdamW; **lr=2e-5**, **weight_decay=0.01**, **warmup_ratio=0.1**.
* Batch sizes: train **8**, eval **16**; epochs: **5**; **fp16** mixed precision.
* Early stopping on validation **macro-F1** (patience=2); the best checkpoint is loaded at the end.
* Hardware: single GPU (Colab).

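The weights above can be reproduced from the label counts alone. A minimal sketch, assuming the standard inverse-frequency formula `N / (num_classes * n_c)` — the card does not state the exact formula, but this one matches the reported values (the counts used here are the full annotated totals, which stratification keeps proportional in the train split):

```python
# Inverse-frequency class weights: weight_c = N / (num_classes * n_c).
# Counts are the annotated sentence totals from this card; the formula is
# an assumption that approximately reproduces the reported weights.
counts = {"General": 545, "Retail": 543, "Wholesale": 329}

total = sum(counts.values())   # 1,417 sentences in all
num_classes = len(counts)      # 3 labels

weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label, w in weights.items():
    print(f"{label:>10}: {w:.3f}")
```

In training, a weight vector like this is typically passed as the `weight` tensor of `torch.nn.CrossEntropyLoss`.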
---

## Performance & Evaluation

**Held-out test (10%)**

* **Accuracy:** **0.887**
* **F1 (macro):** **0.898**
* **F1 (weighted):** **0.887**

Class-wise F1 (test):

* **Retail:** ~0.86
* **Wholesale:** ~0.97
* **General:** ~0.86

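The headline scores are consistent with the class-wise numbers: macro-F1 is the unweighted mean of the three per-class F1s, while weighted F1 scales each by its test support. A quick check, assuming test supports of roughly 54/33/55 (10% of each class under stratification; the exact counts are not stated on this card):

```python
# Macro vs. weighted F1 from the per-class scores above.
# Per-class F1s come from this card; the test supports are an assumption
# (10% stratified split of 543 / 329 / 545 sentences).
f1 = {"Retail": 0.86, "Wholesale": 0.97, "General": 0.86}
support = {"Retail": 54, "Wholesale": 33, "General": 55}

macro_f1 = sum(f1.values()) / len(f1)
weighted_f1 = sum(f1[c] * support[c] for c in f1) / sum(support.values())

print(f"macro F1:    {macro_f1:.3f}")    # near the reported 0.898
print(f"weighted F1: {weighted_f1:.3f}")  # near the reported 0.887
```

Wholesale scores highest despite being the smallest class, which is the expected effect of the inverse-frequency loss weights described above.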
---

## Usage

```python
from transformers import pipeline

model_id = "your-username/cbdc-type-bert"  # replace with your repo
clf = pipeline("text-classification", model=model_id, tokenizer=model_id,
               truncation=True, max_length=192)

texts = [
    "The digital euro will be available to citizens and merchants for daily payments.",        # Retail
    "DLT-based interbank settlement with a central bank liability will lower PvP risk.",       # Wholesale
    "Several central banks are assessing CBDCs to modernise payments and policy transmission."  # General
]

for t in texts:
    out = clf(t)[0]
    print(f"{out['label']:>20} {out['score']:.3f} | {t}")
```