python3 data/build_longbench_data.py \
  --input_dir data/raw_longbench \
  --output_dir data/final_data \
  --tokenizer_path /public/models/Qwen3-8B \
  --embedding_model /public/models/embedding/bge-m3 \
  --chunk_size 512 \
  --chunk_overlap 50 \
  --context_topk 20 \
  --max_samples 200