HCAI-Lab/dolma3-6t-sample-50000-docs / sample_contract.json
glennmatlin
486 Bytes
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 50000,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 62819501017,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 26249124,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 79,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
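
The contract's figures can be cross-checked against each other. The sketch below is illustrative only and not part of the repository: the function name `summarize_contract` is hypothetical, and the meaning of each field is inferred from its key name (for example, that `WORKING_SAMPLE_DOCS_PER_BIN` is a per-bin cap and that a bin which is not underfilled holds exactly that many documents).

```python
import json

# Contract values copied verbatim from sample_contract.json above.
CONTRACT_JSON = """
{
  "WORKING_SAMPLE_TOKEN_FLOOR_PER_BIN": 0,
  "WORKING_SAMPLE_DOCS_PER_BIN": 50000,
  "WORKING_SAMPLE_GLOBAL_TOKEN_BUDGET": null,
  "WORKING_SAMPLE_MIN_TOKEN_COUNT": 512,
  "WORKING_SAMPLE_MAX_TOKEN_COUNT": null,
  "WORKING_SAMPLE_REALIZED_TOKEN_TOTAL": 62819501017,
  "WORKING_SAMPLE_REALIZED_DOC_COUNT": 26249124,
  "WORKING_SAMPLE_UNDERFILLED_BIN_COUNT": 79,
  "WORKING_SAMPLE_COVERED_BIN_COUNT": 576,
  "WORKING_SAMPLE_TOTAL_BIN_COUNT": 576,
  "WORKING_SAMPLE_SAMPLING_SEED": 42
}
"""


def summarize_contract(contract):
    """Check internal consistency and derive a few totals.

    Field semantics are assumptions inferred from the key names.
    """
    docs_per_bin = contract["WORKING_SAMPLE_DOCS_PER_BIN"]
    total_bins = contract["WORKING_SAMPLE_TOTAL_BIN_COUNT"]
    covered = contract["WORKING_SAMPLE_COVERED_BIN_COUNT"]
    underfilled = contract["WORKING_SAMPLE_UNDERFILLED_BIN_COUNT"]
    docs = contract["WORKING_SAMPLE_REALIZED_DOC_COUNT"]
    tokens = contract["WORKING_SAMPLE_REALIZED_TOKEN_TOTAL"]
    min_tokens = contract["WORKING_SAMPLE_MIN_TOKEN_COUNT"]

    # Assumed: DOCS_PER_BIN is a per-bin cap, so the sample holds at most cap * bins docs.
    max_docs = docs_per_bin * total_bins
    # Assumed: every bin that is not underfilled holds exactly docs_per_bin docs.
    min_docs = docs_per_bin * (total_bins - underfilled)

    assert covered <= total_bins
    assert underfilled <= total_bins
    assert min_docs <= docs <= max_docs
    # Assumed: MIN_TOKEN_COUNT is a per-document minimum (null would mean no minimum).
    if min_tokens is not None:
        assert tokens >= docs * min_tokens

    return {
        "max_docs": max_docs,
        "shortfall_docs": max_docs - docs,
        "mean_tokens_per_doc": tokens / docs,
        "fully_covered": covered == total_bins,
    }


if __name__ == "__main__":
    print(summarize_contract(json.loads(CONTRACT_JSON)))
```

Under these assumptions the cap is 576 × 50,000 = 28,800,000 documents, so the realized count of 26,249,124 falls 2,550,876 short, which is consistent with 79 bins being underfilled. The realized totals also imply an average of roughly 2,393 tokens per document.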

Xet Storage Details

Xet hash:
6d7a28e8dec721c1c8b40545ad25c8cbb660681265a10abb6ef336a654fb55ab
