DABStep submission validator rejects valid string columns under pandas StringDtype

#26
by JunhaoChen - opened

Hi DABStep team,

The submission endpoint appears to reject valid submissions with the error:

Columns with non-string data type: task_id, agent_answer

This happens even for a minimal one-line JSONL where both fields are explicit JSON strings:

{"task_id": "x", "agent_answer": "y"}

Root cause

I checked the public Space source. The validator currently does:

submission_df = pd.read_json(submission_file, lines=True, dtype=str)

non_string_columns = [
    col for col in submission_df.columns
    if submission_df[col].dtype != "object"
]

This check is brittle under newer pandas string dtype behavior.

With pandas==2.3.3 and string inference enabled,
pd.read_json(..., dtype=str) returns StringDtype, not object, even though the column contains valid strings.
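
The mismatch can be sketched without touching the Space or any environment variable. The snippet below is only an illustration: `non_string_columns_old` is a hypothetical name for a function that mirrors the quoted check, and the explicit `"string"` extension dtype is used as a stand-in for whatever string dtype inference produces on the Space.

```python
import pandas as pd


def non_string_columns_old(df: pd.DataFrame) -> list:
    """Hypothetical mirror of the quoted check: flag columns whose dtype is not object."""
    return [col for col in df.columns if df[col].dtype != "object"]


rows = {"task_id": ["x"], "agent_answer": ["y"]}

# Same string values, stored under two different dtypes.
as_object = pd.DataFrame(rows).astype(object)
as_string = pd.DataFrame(rows).astype("string")  # assumption: stand-in for the inferred string dtype

print(non_string_columns_old(as_object))  # []
print(non_string_columns_old(as_string))  # ['task_id', 'agent_answer']
```

Both frames hold identical strings; only the dtype label differs, and that label is all the comparison against `"object"` looks at.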


Reproduction

pandas 2.3.3, default:
  dtypes: object
  validator passes

pandas 2.3.3, PANDAS_FUTURE_INFER_STRING=1:
  dtypes: StringDtype
  validator fails with task_id, agent_answer

Environment (current Space)

datasets==4.8.1
gradio==6.10.0
pandas==2.3.3
pyarrow==24.0.0

This likely explains why submissions that worked before the Apr 27–28 Space updates now fail without any change in JSONL format.


Suggested fixes

Option 1: Use proper string dtype check

from pandas.api.types import is_string_dtype

non_string_columns = [
    col for col in submission_df.columns
    if not is_string_dtype(submission_df[col])
]
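
As a rough check of Option 1, the sketch below runs the `is_string_dtype` version against a frame that mixes dtypes. The function name `non_string_columns` and the extra `score` column are made up for the example and are not part of the DABStep submission format.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def non_string_columns(df: pd.DataFrame) -> list:
    """Flag columns that pandas does not consider string-typed."""
    return [col for col in df.columns if not is_string_dtype(df[col])]


frame = pd.DataFrame(
    {
        "task_id": pd.Series(["1", "2"], dtype="string"),    # extension string dtype
        "agent_answer": pd.Series(["a", "b"], dtype=object),  # object column holding strings
        "score": pd.Series([1, 2], dtype="int64"),            # hypothetical non-string column
    }
)

print(non_string_columns(frame))  # ['score']
```

Both string representations pass, while the integer column is still reported.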

Option 2: Cast before validation

submission_df = submission_df.astype(object)
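
One caveat with Option 2, sketched under the assumption that the existing `dtype != "object"` check is kept unchanged after the cast: `astype(object)` turns every column into object dtype, so the check can no longer flag anything on its own. The `score` column and the function name are hypothetical. How much this matters depends on how far the validator relies on `dtype=str` in `read_json` to coerce values beforehand.

```python
import pandas as pd


def non_string_columns_old(df: pd.DataFrame) -> list:
    """Hypothetical mirror of the existing check: flag columns whose dtype is not object."""
    return [col for col in df.columns if df[col].dtype != "object"]


frame = pd.DataFrame(
    {
        "task_id": pd.Series(["1"], dtype="string"),
        "score": pd.Series([1], dtype="int64"),  # hypothetical non-string column
    }
)

cast = frame.astype(object)

print(non_string_columns_old(frame))  # ['task_id', 'score']
print(non_string_columns_old(cast))   # []
print(isinstance(cast["score"].iloc[0], str))  # False: the value is still an integer
```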

Option 3: Pin / adjust environment

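Ensure pd.read_json(..., dtype=str) returns object dtype as before, for example by making sure string inference (PANDAS_FUTURE_INFER_STRING) is not enabled in the Space environment.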


Thanks!

Since the day the leaderboard (LB) went down on the HF side, we have basically been unable to submit anything. It's been 3 days: https://huggingface.co/spaces/adyen/DABstep/discussions/24 Now the issue is on the benchmark side, but no one is answering. I have to say that I'm really disappointed with the maintenance of this challenge, including how late I got answers to my questions in the past week. Very, very disappointing.

I've opened a PR for the same issue.
Hopefully they fix it quickly : )

@frisokingma @jeanmarcs @drublackberry @martinigoyanes @antonioramos @MindyKasting @davidlever @rokpopov @JorgeZapa @AaronAtAdyen @andreumora @KoenRoelofs @hannav @sergioadyen @zoranaAtadyen @wolfsinemm @BelleB @moktay @lchumaceiro @olgakostinaadyen @robertAdyen @tomjadams

Can someone from this company, for the sake of kindness and respectfulness, let us know whether this benchmark is maintained at all? If not, that's fine; just let us know so that we don't waste our time.

@frisokingma @jeanmarcs @drublackberry @martinigoyanes @antonioramos @MindyKasting @davidlever @rokpopov @JorgeZapa @AaronAtAdyen @andreumora @KoenRoelofs @hannav @sergioadyen @zoranaAtadyen @wolfsinemm @BelleB @moktay @lchumaceiro @olgakostinaadyen @robertAdyen @tomjadams @iadyen @eggie5-adyen @martinigoyanes @andreumora

Can someone from this company, for the sake of kindness and respectfulness, let us know whether this benchmark is maintained at all? If not, that's fine; just let us know so that we don't waste our time.
