Hypernet Scaling Law Data

Data assets for scaling-law and preservation (catastrophic forgetting) experiments.

Contents

  • OOD splits: train_ood_scaling_law.pq, valid_ood_scaling_law.pq, eval_ood_scaling_law.pq. Train/valid/eval split by domain (eval = held-out domains).
  • Scaling law: train_scaling_law.pq, valid_scaling_law.pq. 1-hop, 2-hop, and 3-hop QA.
  • With facts: train_scaling_law_with_facts.pq, valid_scaling_law_with_facts.pq. Same as the scaling-law files, plus a facts column generated from relation templates.
  • Preservation: preservation_train.pq, preservation_eval.pq (and preserve_data/, preserve_data_2hop/, preserve_data_combined/). Entities not in train, for preservation loss and eval.
  • Relation templates: relation_template_mapping.csv. Maps each relation label to a question template and a noun_template for fact generation.
  • EDA: domain_counts_eda.csv, figures/. Domain and n_hop stats/plots.
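
The preservation files are described as holding entities not in train. The following is a minimal sketch of how that property could be checked. It assumes that "entities" corresponds to values of the triplet_subject column, which this card does not state; the helper name shared_entities is hypothetical.

```python
def shared_entities(train_entities, preservation_entities):
    """Return the sorted entities that appear in both collections.

    An empty result means the two collections are disjoint, which is what
    the preservation description suggests should hold.
    """
    return sorted(set(train_entities) & set(preservation_entities))


# Toy values for illustration only; they are not taken from the dataset.
train = ["Paris", "Berlin", "Rome"]
preserve = ["Oslo", "Lima"]
print(shared_entities(train, preserve))  # []
```

With DataFrames loaded as in the Usage section, the same call would take train_df["triplet_subject"] and preservation_df["triplet_subject"] as arguments, assuming that column is the right notion of entity.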

Schema (parquet)

Canonical columns: triplet_subject, triplet_relation, triplet_object, question_prompt, answer.
Some files add n_hop, facts (list of strings), or domain.
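
Because the optional columns vary by file, it can help to check a file's columns before relying on them. The sketch below uses only the column names listed above; the helper names missing_columns and optional_columns_present are hypothetical and not part of this repository.

```python
# Column names copied from the schema description above.
CANONICAL_COLUMNS = (
    "triplet_subject",
    "triplet_relation",
    "triplet_object",
    "question_prompt",
    "answer",
)
OPTIONAL_COLUMNS = ("n_hop", "facts", "domain")


def missing_columns(columns, required=CANONICAL_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


def optional_columns_present(columns):
    """Return the optional column names that `columns` does contain."""
    present = set(columns)
    return [name for name in OPTIONAL_COLUMNS if name in present]


# Toy column list for illustration; pass df.columns for a real file.
example = ["triplet_subject", "triplet_relation", "triplet_object",
           "question_prompt", "n_hop"]
print(missing_columns(example))           # ['answer']
print(optional_columns_present(example))  # ['n_hop']
```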

Usage

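import pandas as pd
from huggingface_hub import hf_hub_download

# Download one file from the Hub (cached locally after the first call).
path = hf_hub_download(repo_id="nace-ai/hypernet-scaling-law-data", filename="train_ood_scaling_law.pq")
# Reading parquet requires a parquet engine such as pyarrow.
df = pd.read_parquet(path)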
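
To load more than one file, the same two calls can be wrapped in a small helper. This is a sketch rather than an official loader: the file names come from the Contents list, while the group keys ("ood", "scaling_law", and so on) and the function names are hypothetical labels introduced here. It also assumes every file sits at the repository root, as the one in the snippet above does.

```python
REPO_ID = "nace-ai/hypernet-scaling-law-data"

# File names copied from the Contents list; the group keys are hypothetical.
FILES = {
    "ood": {
        "train": "train_ood_scaling_law.pq",
        "valid": "valid_ood_scaling_law.pq",
        "eval": "eval_ood_scaling_law.pq",
    },
    "scaling_law": {
        "train": "train_scaling_law.pq",
        "valid": "valid_scaling_law.pq",
    },
    "with_facts": {
        "train": "train_scaling_law_with_facts.pq",
        "valid": "valid_scaling_law_with_facts.pq",
    },
    "preservation": {
        "train": "preservation_train.pq",
        "eval": "preservation_eval.pq",
    },
}


def resolve_filename(group, split):
    """Look up a file name, raising a readable KeyError if it is not listed."""
    try:
        return FILES[group][split]
    except KeyError:
        raise KeyError(f"no file listed for group={group!r}, split={split!r}") from None


def load_split(group, split, repo_id=REPO_ID):
    """Download one listed file from the Hub and read it into a DataFrame."""
    # Imported here so the file table can be inspected without these packages.
    import pandas as pd
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename=resolve_filename(group, split))
    return pd.read_parquet(path)


print(resolve_filename("ood", "eval"))  # eval_ood_scaling_law.pq
```

A call such as load_split("scaling_law", "train") needs network access on first use; later calls reuse the local Hub cache.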