arxiv:2602.03709

No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding

Published on Feb 3

Submitted by Xingwei Tan on Feb 4


Authors:

Xingwei Tan ,

Abstract

ID-MoCQA is a multi-hop question answering dataset that assesses the cultural understanding of large language models through questions grounded in Indonesian traditions and built on diverse reasoning chains.

AI-generated summary

Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) benchmarks rely on single-hop questions, which may allow models to exploit shallow cues rather than demonstrate genuine cultural reasoning. In this work, we introduce ID-MoCQA, the first large-scale multi-hop QA dataset for assessing the cultural understanding of large language models (LLMs), grounded in Indonesian traditions and available in both English and Indonesian. We present a new framework that systematically transforms single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical). Our multi-stage validation pipeline, combining expert review and LLM-as-a-judge filtering, ensures high-quality question-answer pairs. Our evaluation across state-of-the-art models reveals substantial gaps in cultural reasoning, particularly in tasks requiring nuanced inference. ID-MoCQA provides a challenging and essential benchmark for advancing the cultural competency of LLMs.
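As a rough illustration of the idea in the abstract, the sketch below models a multi-hop question as a single-hop seed question plus a chain of typed clues, followed by a two-stage filter. It is a minimal sketch under stated assumptions, not the authors' framework or the dataset's schema: the class names, field names, `passes_validation`, and the order of the validation stages are all hypothetical, and only the three clue types named in the abstract appear, because the other three are not listed there. Placeholder strings stand in for real questions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

# Only the clue types named in the abstract; the remaining three are not given there.
NAMED_CLUE_TYPES = {"commonsense", "temporal", "geographical"}


@dataclass
class Hop:
    """One reasoning step (hypothetical structure)."""
    clue_type: str  # e.g. "temporal"
    clue: str       # indirect clue the model must resolve


@dataclass
class MultiHopQuestion:
    """A single-hop seed question expanded into a chain of hops (hypothetical)."""
    seed_question: str
    answer: str
    hops: List[Hop] = field(default_factory=list)

    def is_multi_hop(self) -> bool:
        # Treat a question as multi-hop when it chains at least two clues.
        return len(self.hops) >= 2

    def clue_types(self) -> Set[str]:
        return {hop.clue_type for hop in self.hops}


def passes_validation(
    question: MultiHopQuestion,
    judge: Callable[[MultiHopQuestion], bool],
    expert_approved: bool,
) -> bool:
    """Keep a question only if it is multi-hop, the judge accepts it,
    and an expert approved it. The stage order here is an assumption."""
    return question.is_multi_hop() and judge(question) and expert_approved


example = MultiHopQuestion(
    seed_question="<single-hop question about a tradition>",
    answer="<answer>",
    hops=[
        Hop("geographical", "<clue that identifies the region>"),
        Hop("temporal", "<clue that identifies the occasion>"),
    ],
)
```

In this sketch `judge` is any callable that returns a boolean, so a real LLM-as-a-judge call could be swapped in without changing the filter.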


Community

XingweiT (paper author, paper submitter), about 7 hours ago

To move beyond simple fact recall, researchers have introduced ID-MoCQA, the first large-scale multi-hop reasoning dataset focused on Indonesian culture.

The Problem: Most culturally focused QA benchmarks rely on "single-hop" questions, which models may answer from surface-level cues rather than genuine cultural understanding.
The Solution: ID-MoCQA uses a framework that turns single-hop cultural questions into multi-hop reasoning chains spanning six clue types (e.g., commonsense, temporal, geographical), in both English and Indonesian.
The Finding: Current LLMs struggle significantly with these complex cultural inferences, highlighting a major gap in their "cultural intelligence."

