---
license: mit
language:
- en
metrics:
- accuracy
- f1
base_model:
- climatebert/distilroberta-base-climate-detector
pipeline_tag: text-classification
tags:
- islamic finance
- islamic banks
- text classification
- climate
- binary classification
- NLP
- finance
---

# Islamic-FinClimateBERT: Fine-Tuned ClimateBERT for Islamic Finance Climate Discourse

A domain-adapted binary classifier fine-tuned to distinguish *climate-related* from *non-climate* sentences in Islamic finance corpora. The model is based on [`ClimateBERT`](https://huggingface.co/climatebert/distilroberta-base-climate-detector) and is specialized for detecting climate relevance in **Islamic financial narratives**.

## Model Summary

- **Base model**: [`ClimateBERT`](https://huggingface.co/climatebert/distilroberta-base-climate-detector)
- **Architecture**: distilled RoBERTa (DistilRoBERTa)
- **Task**: binary sentence classification
- **Domain**: Islamic finance and climate discourse
- **Labels**:
  - `0` → Not Climate-Relevant
  - `1` → Climate-Relevant
- **Language**: English (Islamic finance-specific vocabulary)
- **Training data size**: 1,132 manually annotated sentences

## Training Pipeline

- **Framework**: Hugging Face `transformers` + `datasets`
- **Tokenizer**: ClimateBERT tokenizer (BPE)
- **Training split**: stratified 80/20 (train/test)
- **Evaluation metrics**: accuracy, F1 (macro)
- **Optimizer**: AdamW with weight decay
- **Epochs**: 4
- **Batch size**: 16
- **Precision**: FP16 (mixed precision) enabled
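
The stratified 80/20 split listed above keeps the climate / non-climate label proportions identical in both partitions. A minimal stdlib sketch of that idea (`stratified_split` is a hypothetical helper for illustration; the actual pipeline used Hugging Face `datasets` tooling):

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx) so each label keeps the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_label[lab].append(idx)
    train_idx, test_idx = [], []
    for lab, idxs in by_label.items():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)  # per-label test quota
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return train_idx, test_idx

# Toy labels mirroring the card's scheme (0 = not climate-relevant, 1 = climate-relevant)
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

Because the split is per-label, the 60/40 class ratio of the toy data is preserved exactly in the 20% held-out set.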
|
|
|
|
|
### Evaluation

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 0.9868 |
| F1-score  | 0.9868 |
| Eval loss | 0.0553 |

---
## Evaluation & Domain Comparison

The **Islamic-FinClimateBERT** model was compared against the original [`ClimateBERT`](https://huggingface.co/climatebert/distilroberta-base-climate-detector) on **79,876** sentence-level samples extracted from 838 annual reports of 103 Islamic banks across 25 jurisdictions (2015–2024).

This comparative evaluation assesses how domain fine-tuning affects climate-relevance detection within Islamic finance discourse.

### Evaluation Summary

| Metric | Fine-Tuned | Original | Description |
|--------|-----------:|---------:|-------------|
| **Total sentences** | 79,876 | – | Sentences compared one-to-one |
| **Agreements** | 70,209 | – | Sentences on which both models agreed |
| **Disagreements** | 9,667 | – | Sentences with differing predictions |
| **Overall accuracy** | 0.88 | – | Agreement rate between the two models (70,209 / 79,876) |

### Classification Report (Fine-Tuned vs. Original)

The report below scores the fine-tuned model's predictions with the original model's labels as the reference:

| Label | Precision | Recall | F1-score | Support |
|:------|:---------:|:------:|:--------:|:-------:|
| **Climate** | 0.92 | 0.83 | 0.87 | 39,558 |
| **Non-Climate** | 0.85 | 0.93 | 0.89 | 40,318 |
| **Overall Accuracy** | – | – | **0.88** | 79,876 |
| **Macro Avg** | 0.88 | 0.88 | 0.88 | – |
### Confusion Matrix

|                        | **Fine-tuned = Climate** | **Fine-tuned = Non-Climate** |
|-----------------------:|-------------------------:|-----------------------------:|
| **Original = Climate** | 32,887 | 6,671 |
| **Original = Non-Climate** | 2,996 | 37,322 |

- The fine-tuned model shows **strong domain adaptation**, improving contextual sensitivity to Islamic finance climate narratives.
- It **labels fewer sentences as "climate-relevant"** than the base model (6,671 sentences flipped to non-climate vs. 2,996 flipped to climate), reflecting a **more conservative, context-aware** reading of climate-related terminology in Islamic finance reporting.
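
The agreement statistics above reduce to a pairwise comparison of the two models' label streams. A minimal stdlib sketch (the short label lists here are hypothetical placeholders for the 79,876 real predictions):

```python
from collections import Counter

def compare_predictions(orig, fine):
    """Return (agreement rate, 2x2 confusion counts keyed by (orig_label, fine_label))."""
    agree = sum(o == f for o, f in zip(orig, fine))
    cm = Counter(zip(orig, fine))  # e.g. (1, 0) = orig said climate, fine-tuned said non-climate
    return agree / len(orig), cm

# Hypothetical label streams (1 = climate-relevant, 0 = not climate-relevant)
orig = [1, 1, 0, 0, 1, 0]
fine = [1, 0, 0, 0, 1, 0]
rate, cm = compare_predictions(orig, fine)
print(round(rate, 2), cm[(1, 0)])  # 0.83 1
```

On the real data the same computation yields the 0.88 agreement rate and the confusion counts tabulated above.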
|
|
|
|
|
--- |
|
|
## GitHub Repository

The full project repository, including training notebooks, dataset scripts, and evaluation pipelines, is available at [https://github.com/bilalezafar/Islamic-FinClimateBERT](https://github.com/bilalezafar/Islamic-FinClimateBERT).

---
## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained("bilalzafar/Islamic-FinClimateBERT")
model = AutoModelForSequenceClassification.from_pretrained("bilalzafar/Islamic-FinClimateBERT")
model.eval()  # inference mode

def clf(text):
    """Classify a sentence as climate-relevant (Climate) or not (Not Climate)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # no gradients needed for inference
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    label = probs.argmax(dim=-1).item()
    score = probs.max().item()
    return [{"label": "Climate" if label == 1 else "Not Climate", "score": round(score, 4)}]

# Example usage
text = "The bank's green sukuk issuance aims to support renewable energy projects in the country."
print(clf(text)[0])

# Example output: {'label': 'Climate', 'score': 0.9995}
```
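
The softmax-then-argmax step inside `clf` can be sanity-checked without downloading the model. A stdlib sketch with hypothetical logits for the two labels:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits ordered as [Not Climate, Climate]
logits = [-2.1, 3.4]
probs = softmax(logits)
label = max(range(len(probs)), key=probs.__getitem__)  # argmax
print(label, round(max(probs), 4))  # 1 0.9959
```

The probabilities always sum to 1, and the predicted label index (`1` here, i.e. Climate) matches what `clf` would report for the same logits.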
|
|
|
|
|
--- |
|
|
## Citation

```bibtex
@article{zafar2026islamicfinclimatebert,
  title   = {Talk or Action? Unveiling the Nature and Depth of Climate Disclosures in Islamic Banks Using Machine Learning},
  author  = {Zafar, Muhammad Bilal},
  journal = {Borsa Istanbul Review},
  year    = {2026},
  doi     = {10.1016/j.bir.2026.100789}
}
```

Zafar, M. B. (2026). Talk or action? Unveiling the nature and depth of climate disclosures in Islamic banks using machine learning. *Borsa Istanbul Review*. https://doi.org/10.1016/j.bir.2026.100789