bilalzafar committed on
Commit bcd81d7 · verified · 1 parent: ab90516

Update README.md

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -29,3 +29,61 @@ This model enables structured analysis of CBDC-related policy and research texts
This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (\~66M tokens)** from **BIS central bank speeches (1996–2024)**.
CentralBank-BERT provides deep contextual understanding of **monetary policy, financial regulation, and central banking discourse**, making it a well-suited foundation for downstream CBDC-related text classification.
 
## Dataset

The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset was balanced across three discourse classes (**Feature**, **Process**, and **Risk-Benefit**), with a total of **2,886 sentences (962 per class)**.

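With 962 sentences per class, an 80/10/10 split stratified by label (listed under Training Details) would keep the three classes equally represented in every subset. The sketch below shows one way to do this in plain Python. The per-class rounding rule, the shuffle seed, and the placeholder sentences are assumptions for illustration; the card does not say which routine produced the actual split (scikit-learn's `train_test_split` with `stratify=` is a common choice), so exact counts may differ by a sentence or two.

```python
import random
from collections import defaultdict


def stratified_split(examples, label_of, val_frac=0.1, test_frac=0.1, seed=42):
    """Split examples into train/val/test so each label keeps the same proportions.

    Assumption: each held-out split gets round(frac * n_label) items per label,
    and the training split receives the remainder.
    """
    by_label = defaultdict(list)
    for ex in examples:
        by_label[label_of(ex)].append(ex)

    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    train, val, test = [], [], []
    for label in sorted(by_label):  # sorted for a deterministic order
        items = by_label[label][:]
        rng.shuffle(items)
        n_val = round(val_frac * len(items))
        n_test = round(test_frac * len(items))
        val.extend(items[:n_val])
        test.extend(items[n_val:n_val + n_test])
        train.extend(items[n_val + n_test:])
    return train, val, test


# Placeholder data: 962 (sentence, label) pairs for each of the three classes.
classes = ["Feature", "Process", "Risk-Benefit"]
data = [(f"sentence {c} {i}", c) for c in classes for i in range(962)]

train, val, test = stratified_split(data, label_of=lambda ex: ex[1])
print(len(train), len(val), len(test))  # 2310 288 288
```

Under this rounding rule each class contributes 770 training, 96 validation, and 96 test sentences.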
## Intended Use

This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It enables researchers, analysts, and practitioners to distinguish whether a sentence describes **procedural aspects**, **design features**, or **evaluative outcomes** of central bank digital currencies.
Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in the fields of **finance, monetary economics, and economic policy**.

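As an illustration of the thematic-mapping use case, the sketch below turns sentence-level predictions into the share of each discourse class in one document. `classify` is a hypothetical stand-in for the fine-tuned model; in practice it could be a Hugging Face `pipeline("text-classification", ...)` loaded from this repository, which returns one `{"label": ..., "score": ...}` dict per input sentence. Here it is replaced by a keyword stub so the example runs offline, and the label names are assumed to match the per-class table (Feature, Process, Risk-Benefit).

```python
from collections import Counter
from typing import Callable, Dict, List


def discourse_profile(
    sentences: List[str],
    classify: Callable[[List[str]], List[Dict[str, object]]],
) -> Dict[str, float]:
    """Return the share of sentences assigned to each discourse class.

    `classify` is assumed to return one {"label": ..., "score": ...} dict per
    input sentence, in the same order as the inputs.
    """
    if not sentences:
        return {}
    predictions = classify(sentences)
    counts = Counter(str(p["label"]) for p in predictions)
    return {label: n / len(sentences) for label, n in sorted(counts.items())}


def keyword_stub(batch: List[str]) -> List[Dict[str, object]]:
    """Hypothetical offline stand-in for the real model, for demonstration only."""
    out = []
    for text in batch:
        lowered = text.lower()
        if "pilot" in lowered or "consultation" in lowered:
            label = "Process"
        elif "risk" in lowered or "benefit" in lowered:
            label = "Risk-Benefit"
        else:
            label = "Feature"
        out.append({"label": label, "score": 1.0})
    return out


# Invented example sentences standing in for one speech.
speech = [
    "The central bank launched a retail CBDC pilot last year.",
    "A digital euro could carry risks for bank funding.",
    "The design includes offline payments and holding limits.",
    "A public consultation on the project closed in March.",
]
print(discourse_profile(speech, keyword_stub))
# {'Feature': 0.25, 'Process': 0.5, 'Risk-Benefit': 0.25}
```

Keeping the model behind a plain callable makes it easy to swap the stub for the real classifier and to compare class shares across speeches, institutions, or years.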
## Training Details

* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/validation/test split: 80/10/10, stratified by label

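Dynamic padding means each batch is padded only to the length of its longest sequence, rather than every sequence being padded to the 256-token maximum; truncation to 256 tokens happens earlier, at tokenization time. The pure-Python sketch below mimics that behaviour on lists of token IDs. It is not the `DataCollatorWithPadding` implementation itself; the pad ID of 0 and the token IDs are assumptions (check the tokenizer's `pad_token_id`), and the naive slice ignores the fact that a real tokenizer keeps its special end-of-sequence token when truncating.

```python
from typing import Dict, List

MAX_LEN = 256  # maximum sequence length from the training details
PAD_ID = 0     # assumption: the [PAD] token maps to id 0


def truncate(ids: List[int], max_len: int = MAX_LEN) -> List[int]:
    """Simplified tokenization-time truncation: keep at most max_len tokens."""
    return ids[:max_len]


def pad_batch(batch: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Pad every sequence to the longest one in this batch (dynamic padding)."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Illustrative token-ID sequences; the third one exceeds the 256-token limit.
batch = [truncate(seq) for seq in ([11, 12, 13, 14], [21, 22, 23], list(range(300)))]

short = pad_batch(batch[:2])
print(short["input_ids"])       # [[11, 12, 13, 14], [21, 22, 23, 0]]
print(short["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
print(len(pad_batch(batch)["input_ids"][0]))  # 256
```

A batch of short sentences is padded to length 4 instead of 256, which saves memory and compute on the T4.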
| Parameter                     | Value                       |
| ----------------------------- | --------------------------- |
| Base model                    | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs                        | 6                           |
| Train batch size (per device) | 8                           |
| Eval batch size (per device)  | 16                          |
| Gradient accumulation         | 2                           |
| Effective batch size          | 16                          |
| Learning rate                 | 2e-5                        |
| Weight decay                  | 0.01                        |
| Warmup ratio                  | 0.06                        |
| Scheduler                     | Cosine                      |
| Mixed precision (fp16)        | Enabled                     |

* Environment: Google Colab
* GPU: Tesla T4 (16 GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers

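To make the table concrete: with a per-device batch of 8 and 2 gradient-accumulation steps, each optimizer update on the single T4 sees 8 × 2 = 16 examples. The sketch below computes the resulting number of optimizer steps and the learning-rate multiplier for linear warmup (ratio 0.06) followed by cosine decay, following the shape of `get_cosine_schedule_with_warmup` in Transformers. The training-set size of 2,309 sentences (about 80% of 2,886) and the ceiling-based step counting are assumptions; the Trainer's exact step count can differ slightly across library versions and split rounding.

```python
import math

# Values from the training table; N_TRAIN is an assumption (~80% of 2,886).
N_TRAIN = 2309
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
EPOCHS = 6
PEAK_LR = 2e-5
WARMUP_RATIO = 0.06


def schedule_lengths(n_train, batch, accum, epochs, warmup_ratio):
    """Return (total optimizer steps, warmup steps) for a single device."""
    batches_per_epoch = math.ceil(n_train / batch)
    updates_per_epoch = math.ceil(batches_per_epoch / accum)
    total = updates_per_epoch * epochs
    return total, math.ceil(total * warmup_ratio)


def lr_multiplier(step, warmup, total, num_cycles=0.5):
    """Linear warmup from 0 to 1, then cosine decay from 1 to 0 (half a cosine wave)."""
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))


total, warmup = schedule_lengths(N_TRAIN, PER_DEVICE_BATCH, GRAD_ACCUM, EPOCHS, WARMUP_RATIO)
print("effective batch:", PER_DEVICE_BATCH * GRAD_ACCUM)   # 16
print("optimizer steps:", total, "warmup steps:", warmup)  # 870 53
for step in (0, warmup, total):
    print(step, PEAK_LR * lr_multiplier(step, warmup, total))
```

The learning rate therefore ramps from 0 to 2e-5 over roughly the first 6% of updates and then decays smoothly back towards 0 by the end of epoch 6.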
## Evaluation

### Validation (10%)

* Accuracy: **0.851**
* Macro-F1: **0.839**
* Weighted-F1: **0.852**

### Test (10%)

* Accuracy: **0.823**
* Macro-F1: **0.803**
* Weighted-F1: **0.825**

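The three headline metrics differ in how correct predictions are aggregated: accuracy is the fraction of correct predictions, macro-F1 is the unweighted mean of per-class F1 (every class counts equally), and weighted-F1 weights each class's F1 by its number of true examples (support), so the two F1 variants coincide when all classes have equal support. A dependency-free sketch follows; these metrics are usually computed with scikit-learn's `accuracy_score` and `f1_score(average="macro")` / `f1_score(average="weighted")`, and the card does not state which implementation was used. The toy labels are invented.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 for every class that appears in y_true or y_pred."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


def summary(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, macro-F1 and support-weighted F1."""
    f1 = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "macro_f1": sum(f1.values()) / len(f1),
        "weighted_f1": sum(f1[c] * support[c] for c in f1) / n,
    }


# Invented toy predictions with unequal class support.
y_true = ["Feature", "Feature", "Process", "Process", "Process", "Risk-Benefit"]
y_pred = ["Feature", "Risk-Benefit", "Process", "Process", "Feature", "Risk-Benefit"]
print({k: round(v, 3) for k, v in summary(y_true, y_pred).items()})
# {'accuracy': 0.667, 'macro_f1': 0.656, 'weighted_f1': 0.678}
```

Here weighted-F1 exceeds macro-F1 because the best-scoring class (Process, F1 = 0.8) also has the largest support.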
#### Per-class performance (Test)

| Class        | Precision | Recall | F1    |
| ------------ | --------- | ------ | ----- |
| Feature      | 0.759     | 0.782  | 0.770 |
| Process      | 0.927     | 0.845  | 0.884 |
| Risk-Benefit | 0.700     | 0.817  | 0.754 |

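As a consistency check on the table, each F1 value is the harmonic mean of its precision and recall, F1 = 2PR / (P + R), and the unweighted mean of the three per-class F1 scores reproduces the reported test macro-F1 of 0.803. The snippet recomputes both from the table's numbers.

```python
# Precision and recall copied from the per-class test table above.
per_class = {
    "Feature": (0.759, 0.782),
    "Process": (0.927, 0.845),
    "Risk-Benefit": (0.700, 0.817),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


scores = {name: f1(p, r) for name, (p, r) in per_class.items()}
for name, score in scores.items():
    print(f"{name:<12} F1 = {score:.3f}")  # 0.770, 0.884, 0.754

macro_f1 = sum(scores.values()) / len(scores)
print(f"macro-F1 = {macro_f1:.3f}")  # 0.803
```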