HCAI-Lab/dolma3-6t-sample-10000-docs / sample_contract.json
glennmatlin
485 Bytes
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 10000,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 13659019359,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 5620463,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 30,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
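
The contract's fields can be cross-checked against each other. The following is a minimal sketch, not the authors' tooling: the `summarize` helper is hypothetical, and it assumes that `WORKING_SAMPLE_DOCS_PER_BIN` is a per-bin cap on documents and that a bin counts as "covered" when it contributes at least one document. Neither reading is confirmed by the file itself.

```python
import json

# Values copied verbatim from sample_contract.json.
CONTRACT_JSON = """
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 10000,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 13659019359,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 5620463,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 30,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
"""


def summarize(contract: dict) -> dict:
    """Derive simple consistency figures from the contract (hypothetical helper)."""
    total_bins = contract["WORKING_SAMPLE_TOTAL_BIN_COUNT"]
    covered_bins = contract["WORKING_SAMPLE_COVERED_BIN_COUNT"]
    docs_per_bin = contract["WORKING_SAMPLE_DOCS_PER_BIN"]
    docs = contract["WORKING_SAMPLE_REALIZED_DOC_COUNT"]
    tokens = contract["WORKING_SAMPLE_REALIZED_TOKEN_TOTAL"]

    # Assumption: DOCS_PER_BIN caps each bin, so this is the largest possible sample.
    doc_cap = docs_per_bin * total_bins
    return {
        "doc_cap": doc_cap,
        # Documents missing relative to the cap (attributable to underfilled bins
        # under the assumption above).
        "doc_shortfall": doc_cap - docs,
        "mean_tokens_per_doc": tokens / docs,
        "all_bins_covered": covered_bins == total_bins,
    }


contract = json.loads(CONTRACT_JSON)  # JSON null becomes Python None
summary = summarize(contract)
print(summary)
```

Under these assumptions, the realized document count sits below the 576 × 10000 cap, which is consistent with the 30 underfilled bins, and the mean document length is well above the 512-token minimum.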

Xet hash: ef347ee9504b58601b69cbb2c0d01f771d6a100bf7eab443440e1779395a8a62