leideng/QCFuse / run_longbench_preprocess.sh
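# Preprocess raw LongBench data into data/final_data.
# Flag meanings below are inferred from their names; see
# data/build_longbench_data.py for the authoritative behavior.
#   --chunk_size / --chunk_overlap: chunk length and overlap (likely in Qwen3-8B tokens)
#   --context_topk: number of chunks kept per sample (presumably ranked with bge-m3 embeddings)
#   --max_samples:  cap on the number of samples processed (possibly per task)
python3 data/build_longbench_data.py \
  --input_dir data/raw_longbench \
  --output_dir data/final_data \
  --tokenizer_path /public/models/Qwen3-8B \
  --embedding_model /public/models/embedding/bge-m3 \
  --chunk_size 512 \
  --chunk_overlap 50 \
  --context_topk 20 \
  --max_samples 200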
