HCAI-Lab/dolma3-6t-sample-500-docs / sample_contract.json
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 500,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 705194520,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 287793,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 1,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
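
The realized counts can be cross-checked against the requested limits with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions the file itself does not state: that WORKING_SAMPLE_DOCS_PER_BIN is a per-bin cap, that an underfilled bin is one holding fewer documents than that cap, and that WORKING_SAMPLE_MIN_TOKEN_COUNT is a per-document minimum. The function name check_contract and the keys of its return value are hypothetical.

```python
import json

# The contract exactly as published in sample_contract.json.
CONTRACT_JSON = """
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 500,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 705194520,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 287793,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 1,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
"""


def check_contract(contract: dict) -> dict:
    """Hypothetical helper: sanity-check the contract and derive a few totals."""
    docs_per_bin = contract["WORKING_SAMPLE_DOCS_PER_BIN"]
    total_bins = contract["WORKING_SAMPLE_TOTAL_BIN_COUNT"]
    covered_bins = contract["WORKING_SAMPLE_COVERED_BIN_COUNT"]
    underfilled_bins = contract["WORKING_SAMPLE_UNDERFILLED_BIN_COUNT"]
    docs = contract["WORKING_SAMPLE_REALIZED_DOC_COUNT"]
    tokens = contract["WORKING_SAMPLE_REALIZED_TOKEN_TOTAL"]
    min_tokens = contract["WORKING_SAMPLE_MIN_TOKEN_COUNT"]

    # Assumption: every bin contributes at most docs_per_bin documents.
    max_docs = docs_per_bin * total_bins
    if not 0 <= covered_bins <= total_bins:
        raise ValueError("covered bin count exceeds total bin count")
    if not 0 <= underfilled_bins <= total_bins:
        raise ValueError("underfilled bin count exceeds total bin count")
    if docs > max_docs:
        raise ValueError("realized doc count exceeds docs_per_bin * total bins")
    # Assumption: the minimum token count applies to each sampled document.
    if min_tokens is not None and tokens < docs * min_tokens:
        raise ValueError("token total is below docs * per-document minimum")

    return {
        "max_docs": max_docs,
        "shortfall": max_docs - docs,  # documents missing relative to the cap
        "mean_tokens_per_doc": tokens / docs,
    }


summary = check_contract(json.loads(CONTRACT_JSON))
print(summary)
```

Under these assumptions, 576 bins at 500 documents each would give 288,000 documents, so the realized count of 287,793 is 207 short. That is consistent with a single underfilled bin, which would then hold 293 documents.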
