HC-BERT-IFIs

HC-BERT-IFIs is a BERT-base sentence-level classifier fine-tuned to identify human capital (HC) sentences in Islamic banking / Islamic financial institutions (IFIs) annual-report disclosures.

Labels

  • 0 = Non-HC
  • 1 = HC

Training setup

HC-BERT-IFIs was fine-tuned from bert-base-uncased for binary sentence classification (HC = 1, Non-HC = 0) using the Hugging Face Transformers Trainer on GPU with mixed precision (fp16). Sentences were tokenized with the BERT tokenizer using truncation and padding to a maximum sequence length of 128. Training used a learning rate of 2e-5, 3 epochs, weight decay = 0.01, and batch sizes of 16 (training) and 32 (evaluation), with evaluation and checkpoint saving performed each epoch. The best checkpoint was selected using validation F1 (load_best_model_at_end=True) and saved as the final model artifacts for inference. See the training pipeline and notebooks on GitHub.
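The setup above can be sketched as a Transformers TrainingArguments configuration. This is a minimal sketch, not the exact training script: the output directory and metric name are illustrative assumptions.

```python
from transformers import TrainingArguments

# Hyperparameters as described in the model card; output_dir and
# metric_for_best_model are illustrative assumptions.
training_args = TrainingArguments(
    output_dir="hc-bert-ifis",       # assumed path
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    evaluation_strategy="epoch",     # evaluate each epoch ("eval_strategy" in newer versions)
    save_strategy="epoch",           # checkpoint each epoch
    load_best_model_at_end=True,     # restore the best checkpoint after training
    metric_for_best_model="f1",      # select by validation F1
    fp16=True,                       # mixed precision on GPU
)
```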

Training/Evaluation dataset

Training and evaluation used a balanced sentence-level dataset of 98,298 observations, comprising 49,149 HC and 49,149 Non-HC sentences. The data were split into 78,638 training instances, 9,830 validation instances, and 9,830 test instances. Sentences were extracted from Islamic bank annual reports covering 86 banks across 21 countries over 2015–2023 (638 reports), then cleaned and balanced for supervised fine-tuning and held-out evaluation (details on data curation and dataset construction are available on GitHub).
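The reported sizes correspond to an approximately 80/10/10 split, which can be sanity-checked directly:

```python
# Sanity-check the dataset sizes reported above (~80/10/10 split).
total = 49_149 + 49_149          # balanced HC / Non-HC sentences
train, val, test = 78_638, 9_830, 9_830

assert train + val + test == total == 98_298
print(f"train: {train/total:.1%}, val: {val/total:.1%}, test: {test/total:.1%}")
# train: 80.0%, val: 10.0%, test: 10.0%
```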

Held-out test performance

On the held-out test set, HC-BERT-IFIs achieved 0.9706 accuracy and an F1-score of 0.9700, with precision = 0.9903 and recall = 0.9506. The corresponding confusion matrix was [[4869, 46], [243, 4672]], indicating very few false positives and a small number of false negatives.
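The reported metrics can be reproduced from the confusion matrix alone. A small check, assuming the usual [[TN, FP], [FN, TP]] layout with HC as the positive class:

```python
# Recompute the reported test metrics from the confusion matrix
# [[4869, 46], [243, 4672]], read as [[TN, FP], [FN, TP]].
tn, fp = 4869, 46
fn, tp = 243, 4672

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"acc={accuracy:.4f} p={precision:.4f} r={recall:.4f} f1={f1:.4f}")
# acc=0.9706 p=0.9903 r=0.9506 f1=0.9700
```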

A baseline HC classifier shared by Demers et al. (2024) was evaluated on the same test set for comparison. HC-BERT-IFIs achieved higher overall performance (F1 ≈ 0.9700 vs. 0.9087), and the two models agreed on 87.7% of sentences, supporting the stability of HC classification in the Islamic banking disclosure context.

Repository (GitHub)

Full pipeline notebooks, lexicon files, and reproducible experiments are available at GitHub: bilalzafar/Human-Capital-Islamic-banks


Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "bilalzafar/HC-BERT-IFIs"

tok = AutoTokenizer.from_pretrained(model_id)
mdl = AutoModelForSequenceClassification.from_pretrained(model_id)

# Tokenize a sample sentence and run a single forward pass
text = "The bank expanded staff training and development programs across branches."
inputs = tok(text, return_tensors="pt", truncation=True, padding=True, max_length=128)

with torch.no_grad():
    probs = torch.softmax(mdl(**inputs).logits, dim=-1).squeeze()

# Map the argmax class index to its label (1 = HC, 0 = Non-HC)
pred = int(torch.argmax(probs).item())
label = "HC" if pred == 1 else "Non-HC"
score = float(probs[pred].item())

print(f"Classification: {label} | Score: {score:.5f}")

Citation

If you use this repository, model, or replication materials in your research, please cite:

Zafar, M. B. (2026). Human capital disclosure in Islamic banks: A multi-method analysis using machine learning. Journal of Intellectual Capital. https://doi.org/10.1108/JIC-06-2025-0251

BibTeX

@article{zafar2026human,
  title   = {Human Capital Disclosure in Islamic Banks: A Multi-Method Analysis Using Machine Learning},
  author  = {Zafar, Muhammad Bilal},
  year    = {2026},
  journal = {Journal of Intellectual Capital},
  doi     = {10.1108/JIC-06-2025-0251}
}