SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning
Abstract
The first Arabic financial NLP benchmark covering seven tasks with expert-verified data reveals significant gaps between linguistic fluency and financial reasoning capabilities in large language models.
English financial NLP has advanced rapidly through benchmarks targeting earnings analysis, market sentiment, tabular reasoning, and financial question answering, yet Arabic financial NLP remains virtually nonexistent, despite 422 million speakers, 4.9 trillion in Gulf sovereign wealth, and a 4-5 trillion Islamic finance industry requiring specialized Shari'ah compliance over instruments like sukuk, murabaha, and takaful. We introduce Sahm, the first Arabic financial benchmark spanning seven tasks: AAOIFI standards QA, fatwa-based QA/MCQ, accounting and business exams, financial sentiment analysis, extractive summarization, and event-cause reasoning, comprising 14,380 expert-verified instances from authentic regulatory, juristic, and corporate sources. Evaluating 20 LLMs, we find Arabic fluency does not imply financial reasoning: models achieving 91% on recognition tasks drop sharply on generation, and event-cause reasoning exposes the widest performance gap (1.89-9.84/10). We release the benchmark and dataset to support trustworthy Arabic financial assistants.
Get this paper in your agent:
hf papers read 2604.19098 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper