This classifier is built on top of [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT), a **domain-adapted BERT model** pretrained on over **2 million sentences (\~66M tokens)** from **BIS central bank speeches (1996–2024)**.
CentralBank-BERT captures the vocabulary and context of **monetary policy, financial regulation, and central banking discourse**, which makes it a well-suited foundation for downstream CBDC-related text classification.

## Dataset

The model was fine-tuned on a **manually annotated dataset of CBDC-related sentences** extracted from **Bank for International Settlements (BIS) central bank speeches (1996–2024)**.
The dataset is balanced across three discourse classes (**Feature**, **Process**, and **Risk-Benefit**), with a total of **2,886 sentences (962 per class)**.
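
For reference, the 80/10/10 split stratified by label (see Training Details) can be sketched in plain Python as below. This is a minimal illustration, not the script used to train this model: the helper name `stratified_split`, the fixed seed, and the rounding rule are assumptions, so the exact set sizes in the original run may differ by a sentence or so per class.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, val_frac=0.1, seed=42):
    """Split example indices into train/val/test, class by class.

    Hypothetical helper: shuffles each class separately and cuts it by the
    given fractions, so every split keeps the original class proportions.
    Whatever remains after train and val goes to test.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split (assumed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_val = round(len(indices) * val_frac)
        train.extend(indices[:n_train])
        val.extend(indices[n_train:n_train + n_val])
        test.extend(indices[n_train + n_val:])  # remainder goes to test
    return train, val, test


# Balanced dataset from this card: 962 sentences in each of the three classes.
labels = ["Feature"] * 962 + ["Process"] * 962 + ["Risk-Benefit"] * 962
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))
```

With 962 sentences per class, this sketch yields 770/96/96 sentences per class (2,310/288/288 overall).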

## Intended Use

This model is designed for the **automatic classification of CBDC discourse** in policy, research, and financial communications. It enables researchers, analysts, and practitioners to determine whether a sentence describes the **procedural aspects** (Process), **design features** (Feature), or **evaluative outcomes** (Risk-Benefit) of central bank digital currencies.
Such categorization supports **policy analysis, thematic mapping of central bank communication, and structured NLP-based research** in **finance, monetary economics, and economic policy**.
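
The classifier outputs one logit per class, and turning these into a label is a softmax followed by an argmax. The sketch below is illustrative only: the `ID2LABEL` order is a hypothetical assumption (the real mapping is stored in the model's config as `id2label`), and in practice the logits would come from the fine-tuned model, for example through the Transformers `pipeline("text-classification", ...)`, which applies an equivalent step for you.

```python
import math

# Hypothetical label order for illustration only; read the real mapping
# from the model's config (`id2label`).
ID2LABEL = {0: "Feature", 1: "Process", 2: "Risk-Benefit"}


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


# Made-up logits for a single sentence.
label, score = classify([0.3, 2.1, -0.5])
print(label, round(score, 3))
```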

## Training Details

* Tokenization: WordPiece (CentralBank-BERT tokenizer)
* Maximum sequence length: 256 tokens
* Dynamic padding (`DataCollatorWithPadding`)
* Train/Val/Test split: 80/10/10, stratified by label
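
Dynamic padding pads each batch only to its longest sequence rather than always to the 256-token maximum. The pure-Python sketch below mimics that behavior in simplified form; it is not the `DataCollatorWithPadding` implementation, and it assumes that truncation to 256 tokens already happened in the tokenizer and that the pad token ID is 0 (check the tokenizer's `pad_token_id`).

```python
def pad_batch(batch, pad_id=0, max_length=256):
    """Pad token-ID lists to the longest sequence in this batch.

    Simplified stand-in for dynamic padding. Assumes truncation to
    `max_length` already happened at tokenization time, and that `pad_id`
    matches the tokenizer's pad_token_id (0 here is an assumption).
    """
    if any(len(ids) > max_length for ids in batch):
        raise ValueError("sequence longer than max_length; truncate at tokenization")
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two toy sequences of different lengths (token IDs are illustrative).
batch = pad_batch([[101, 2054, 102], [101, 2054, 2003, 1037, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

Because most CBDC sentences are far shorter than 256 tokens, padding per batch avoids spending computation on long runs of padding tokens.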

| Parameter                     | Value                       |
| ----------------------------- | --------------------------- |
| Base model                    | [`bilalzafar/CentralBank-BERT`](https://huggingface.co/bilalzafar/CentralBank-BERT) |
| Epochs                        | 6                           |
| Train batch size (per device) | 8                           |
| Eval batch size (per device)  | 16                          |
| Gradient accumulation steps   | 2                           |
| Effective batch size          | 16 (8 × 2)                  |
| Learning rate                 | 2e-5                        |
| Weight decay                  | 0.01                        |
| Warmup ratio                  | 0.06                        |
| LR scheduler                  | Cosine                      |
| Mixed precision (fp16)        | Enabled                     |
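
The hyperparameters above imply a short optimization schedule. The arithmetic below is a rough sketch: it assumes an 80% training split of the 2,886 sentences (about 2,309), a single GPU, and ceiling rounding when batches and accumulation steps do not divide evenly, so the exact step counts in the original run may differ slightly. The learning-rate function uses linear warmup followed by cosine decay, the same shape as Transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`.

```python
import math

# Values from the hyperparameter table.
EPOCHS = 6
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
BASE_LR = 2e-5
WARMUP_RATIO = 0.06

# Assumption: ~80% of the 2,886 sentences used for training, on one GPU.
n_train = round(0.8 * 2886)                       # 2309
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM   # 16
batches_per_epoch = math.ceil(n_train / PER_DEVICE_BATCH)
updates_per_epoch = math.ceil(batches_per_epoch / GRAD_ACCUM)
total_steps = updates_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def lr_at(step):
    """Linear warmup from 0 to BASE_LR, then cosine decay back to 0."""
    if step < warmup_steps:
        return BASE_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions training runs for 870 optimizer updates, of which the first 53 are warmup.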

* Environment: Google Colab
* GPU: Tesla T4 (16 GB)
* Framework: PyTorch 2.8.0 + Hugging Face Transformers

## Evaluation

### Validation (10%)

* Accuracy: **0.851**
* Macro-F1: **0.839**
* Weighted-F1: **0.852**
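
Macro-F1 averages the per-class F1 scores equally, while weighted-F1 weights each class by its support (number of true examples), so the two diverge when class counts differ. A minimal pure-Python sketch of the three reported metrics follows; it uses the same definitions as scikit-learn's `accuracy_score` and `f1_score` with `average="macro"` / `average="weighted"`, and is applied to hypothetical toy labels rather than this model's predictions.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy, macro-F1, weighted-F1 and per-class F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        per_class[c] = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro_f1 = sum(per_class.values()) / len(per_class)  # every class counts equally
    weighted_f1 = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return accuracy, macro_f1, weighted_f1, per_class


# Hypothetical toy predictions with unequal class counts.
y_true = ["Process", "Process", "Process", "Feature", "Risk-Benefit"]
y_pred = ["Process", "Process", "Feature", "Feature", "Risk-Benefit"]
acc, macro, weighted, per_class = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(macro, 3), round(weighted, 3))
```

In this toy case macro-F1 (about 0.822) and weighted-F1 (about 0.813) separate because Process has three true examples while the other classes have one each.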

### Test (10%)

* Accuracy: **0.823**
* Macro-F1: **0.803**
* Weighted-F1: **0.825**

#### Per-class performance (Test)

| Class        | Precision | Recall | F1    |
| ------------ | --------- | ------ | ----- |
| Feature      | 0.759     | 0.782  | 0.770 |
| Process      | 0.927     | 0.845  | 0.884 |
| Risk-Benefit | 0.700     | 0.817  | 0.754 |
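
As a consistency check, each F1 in the table should equal the harmonic mean of its precision and recall, and the test macro-F1 should equal the unweighted mean of the three per-class F1 scores. The snippet below recomputes both from the table's values:

```python
# (precision, recall, reported F1) per class, copied from the test table.
TABLE = {
    "Feature": (0.759, 0.782, 0.770),
    "Process": (0.927, 0.845, 0.884),
    "Risk-Benefit": (0.700, 0.817, 0.754),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in TABLE.items():
    print(f"{name}: recomputed F1 {f1(p, r):.3f}, reported {reported:.3f}")

# Macro-F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(reported for _, _, reported in TABLE.values()) / len(TABLE)
print(f"macro-F1: {macro_f1:.3f}")
```

Both checks agree with the reported figures to three decimals, including the test macro-F1 of 0.803.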