Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
Paper • 2603.12180
This organization is maintained by the transformers team at Hugging Face and contains checkpoints of segmentation models such as SamHQ.
from datasets import load_dataset
pdfa_dataset = load_dataset('pixparse/pdfa-eng-wds', streaming=True)
IDL_dataset = load_dataset('pixparse/idl-wds', streaming=True)

import chug
task_cfg = chug.DataTaskDocReadCfg(
    page_sampling='all',
)
data_cfg = chug.DataCfg(
    source='pixparse/pdfa-eng-wds',
    split='train',
    batch_size=None,
    format='hfids',
    num_workers=0,
)
data_loader = chug.create_loader(
    data_cfg,
    task_cfg,
)
sample = next(iter(data_loader))
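
The `load_dataset` calls above pass `streaming=True`, so samples arrive lazily and a common first step is to look at a handful of them without consuming the whole stream. The following is a minimal sketch of that step, not part of either library: `peek_samples` and `fake_stream` are hypothetical names, the field names in the stand-in samples are invented for illustration, and the sketch assumes each sample is a dict-like mapping.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def peek_samples(stream: Iterable[Dict[str, Any]], n: int = 3) -> List[Dict[str, str]]:
    """Take the first n samples from a lazy stream and report each field's type name."""
    summaries = []
    for sample in islice(stream, n):  # islice stops after n items, so the stream is not exhausted
        summaries.append({key: type(value).__name__ for key, value in sample.items()})
    return summaries


def fake_stream() -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for a streaming split; the field names are assumptions."""
    index = 0
    while True:  # endless, like a stream too large to materialize
        yield {"__key__": f"doc-{index}", "pdf": b"%PDF-", "json": {"pages": []}}
        index += 1


summaries = peek_samples(fake_stream(), n=2)
print(summaries)
```

With the real data, the same helper could be given `pdfa_dataset['train']` in place of `fake_stream()`, assuming the split is named `train` as in the chug configuration above; the actual field names should be read from the printed summary rather than taken from this sketch.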