DABstep

Availability of Submission data via HF pull

#15

by skylord - opened Dec 15, 2025

Discussion

skylord

Dec 15, 2025

edited Dec 15, 2025

Hi Team

I can easily pull all the submission data using the HF Hub API. As a submitter, I would prefer that my submissions not be visible to my competitors. It also gives spoofers an opportunity to make submissions that easily overfit the leaderboard. Another point: if the test set is static, I could use the data returned by the API to train my LLM and reach the top of the leaderboard.

I am certain the HF Hub allows you to make the submission files private.

My suggestion is to use an LLM to make the test set dynamic by generating unique, non-repeating questions. This better reflects the inherently dynamic nature of financial analysis. There are more sophisticated approaches to enhance this further, which we can discuss separately.
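To make the "unique, non-repeating questions" idea concrete, here is a minimal sketch. It swaps in simple template filling for the LLM, so it only illustrates the bookkeeping: each evaluation round draws fresh questions and excludes everything served before. The templates, shop names, and the `generate_questions` helper are all hypothetical and are not part of DABstep.

```python
import random

# Hypothetical question templates; a real setup might have an LLM write these.
TEMPLATES = [
    "What is the total transaction volume for {shop} in {month}?",
    "What is the average fee paid by {shop} in {month}?",
    "How many transactions did {shop} process in {month}?",
]
SHOPS = ["shop_a", "shop_b", "shop_c"]  # placeholder names
MONTHS = ["January", "February", "March", "April"]


def generate_questions(n, seed, already_used=()):
    """Return n distinct questions, none of which appear in already_used."""
    # Enumerate every question the templates can produce (36 here).
    pool = [
        t.format(shop=s, month=m)
        for t in TEMPLATES
        for s in SHOPS
        for m in MONTHS
    ]
    used = set(already_used)
    candidates = [q for q in pool if q not in used]
    if n > len(candidates):
        raise ValueError("not enough unused questions left")
    # A seeded generator keeps each round reproducible for the organisers.
    rng = random.Random(seed)
    return rng.sample(candidates, n)


round1 = generate_questions(5, seed=1)
round2 = generate_questions(5, seed=2, already_used=round1)
```

Here `round2` never repeats a question from `round1`, so answers harvested from one round do not transfer directly to the next. The pool is finite, though, so a real version would need a much larger question space.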

Here is the script I used to download other participants' submission files. An SLM could now be trained on these files to generate reasoning traces.

from huggingface_hub import hf_hub_download, list_repo_files

# List all files in the dataset repository
files = list_repo_files(repo_id="adyen/DABstep", repo_type="dataset")

print("Files found in repo:")
for f in files:
    print(f)

# Repository metadata files that are not worth downloading
skip = {".gitattributes", ".gitignore", "LICENSE"}

for filename in files:
    if filename in skip:
        continue
    # Download into the current directory, reusing cached copies where possible
    hf_hub_download(
        repo_id="adyen/DABstep",
        repo_type="dataset",
        filename=filename,
        local_dir=".",
        force_download=False,
    )

justinlangsethgenesis

Dec 15, 2025

Agree, the submission and score files should not be available for download.
