Availability of Submission data via HF pull
Hi Team
I can easily pull all the submission data using the HF Hub API. As a submitter, I would prefer that my submissions are not visible to my competitors. It also creates an opportunity for spoofers to make submissions that easily overfit the leaderboard. In addition, if the test set is static, I could use the data returned by the API to train my LLM on it and reach the top of the leaderboard.
I am certain HF hub can allow you to make the submission files private.
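One possible way to do this, as a minimal sketch rather than a description of how the leaderboard is actually built: keep submissions in a separate private dataset repo. The repo name `my-org/benchmark-submissions-private`, the `submissions/<id>.jsonl` layout, and the helper name are all hypothetical. The `api` argument is assumed to be a `huggingface_hub.HfApi` instance authenticated with a token that has write access.

```python
def store_submission_privately(
    api,
    local_path,
    submission_id,
    repo_id="my-org/benchmark-submissions-private",  # hypothetical repo name
):
    """Upload one submission file to a private dataset repo.

    `api` is expected to be a huggingface_hub.HfApi instance, e.g.:
        from huggingface_hub import HfApi
        store_submission_privately(HfApi(), "answers.jsonl", "team-42")
    """
    # Create the repo as private; exist_ok=True makes repeated calls safe.
    api.create_repo(
        repo_id=repo_id,
        repo_type="dataset",
        private=True,
        exist_ok=True,
    )
    # Hypothetical layout: one file per submission under submissions/.
    path_in_repo = f"submissions/{submission_id}.jsonl"
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return path_in_repo
```

With this layout, only accounts that have access to the private repo could list or download the files, while the public dataset repo would keep only the task data.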
My suggestion is to leverage an LLM to make the test set dynamic by generating unique, non-repeating questions. This better reflects the inherently dynamic nature of financial analysis. There are more sophisticated approaches to enhance this further, which we can discuss separately.
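To illustrate only the "unique, non-repeating" part, here is a rough sketch that swaps the LLM for simple string templates so it stays self-contained. The templates, merchant names, and the `generate_questions` helper are all made up for illustration. Each generated question is hashed, and any question whose hash has already been issued is rejected.

```python
import hashlib
import random

# Hypothetical templates and parameter values, for illustration only.
TEMPLATES = [
    "What was the total transaction volume for {merchant} in {month} {year}?",
    "What was the average fee paid by {merchant} in {month} {year}?",
]
MERCHANTS = ["Merchant_A", "Merchant_B", "Merchant_C"]
MONTHS = ["January", "February", "March", "April"]
YEARS = [2022, 2023]


def generate_questions(n, seed, seen_hashes):
    """Return n questions that were never issued before.

    seen_hashes is a set of SHA-256 hex digests of previously issued
    questions; it is updated in place with the new ones.
    """
    max_unique = len(TEMPLATES) * len(MERCHANTS) * len(MONTHS) * len(YEARS)
    if len(seen_hashes) + n > max_unique:
        raise ValueError("not enough unused questions left")

    rng = random.Random(seed)  # seed per evaluation round for reproducibility
    questions = []
    while len(questions) < n:
        question = rng.choice(TEMPLATES).format(
            merchant=rng.choice(MERCHANTS),
            month=rng.choice(MONTHS),
            year=rng.choice(YEARS),
        )
        digest = hashlib.sha256(question.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # already issued in this or an earlier round
        seen_hashes.add(digest)
        questions.append(question)
    return questions
```

Persisting `seen_hashes` between evaluation rounds would keep a question from being served twice. An LLM-backed generator could plug into the same deduplication step, although paraphrases of the same question would need a stronger check than an exact hash.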
Below is the script I used to download the other submission files. An SLM could now be trained on these files and used to generate a reasoning trace.
from huggingface_hub import hf_hub_download, list_repo_files

# List all files in the dataset repository
files = list_repo_files(repo_id="adyen/DABstep", repo_type="dataset")

print("Files found in repo:")
for f in files:
    print(f)

# Download every file except repository metadata
skip = {".gitattributes", ".gitignore", "LICENSE"}
for filename in files:
    if filename in skip:
        continue
    hf_hub_download(
        repo_id="adyen/DABstep",
        repo_type="dataset",
        filename=filename,
        local_dir=".",
        force_download=False,
    )
Agreed, the submission and score files should not be available for download.