yuccaaa/nas
TensorBoard
Safetensors
License: apache-2.0
Branch: main
Path: nas/pretrain_data
1 contributor
History: 134 commits
Latest commit: b3eec9a (verified) by yuccaaa, 6 months ago: Upload pretrain_data/nan/math_thinking/data/train-00007-of-00010.parquet with huggingface_hub
| Name | Size | Scan | Storage | Last commit | Updated |
| --- | --- | --- | --- | --- | --- |
| cot (directory) | | | | Upload pretrain_data/cot/science/train-00003-of-00004.parquet with huggingface_hub | 6 months ago |
| instruct (directory) | | | | Upload pretrain_data/instruct/cot.jsonl with huggingface_hub | 6 months ago |
| nan (directory) | | | | Upload pretrain_data/nan/math_thinking/data/train-00007-of-00010.parquet with huggingface_hub | 6 months ago |
| .gitattributes | 2.6 kB | | | Upload pretrain_data/.gitattributes with huggingface_hub | 6 months ago |
| README.md | 24 Bytes | Safe | | Upload pretrain_data/README.md with huggingface_hub | 6 months ago |
| clean_OntoProtein.jsonl | 509 MB | Safe | xet | Upload pretrain_data/clean_OntoProtein.jsonl with huggingface_hub | 6 months ago |
| clean_bio.jsonl | 321 MB | Safe | xet | Upload pretrain_data/clean_bio.jsonl with huggingface_hub | 6 months ago |
| clean_pmc_full_text.jsonl | 11.5 GB | | xet | Upload pretrain_data/clean_pmc_full_text.jsonl with huggingface_hub | 6 months ago |
| clean_pmc_full_text_small.jsonl | 1.15 GB | Safe | xet | Upload pretrain_data/clean_pmc_full_text_small.jsonl with huggingface_hub | 6 months ago |
| clean_pubmed_abstract_part1.jsonl | 12.2 GB | | xet | Upload pretrain_data/clean_pubmed_abstract_part1.jsonl with huggingface_hub | 6 months ago |
| clean_pubmed_abstract_part1_small.jsonl | 1.22 GB | Safe | xet | Upload pretrain_data/clean_pubmed_abstract_part1_small.jsonl with huggingface_hub | 6 months ago |
| clean_pubmed_abstract_part1_small1.jsonl | 1.16 GB | Safe | xet | Upload pretrain_data/clean_pubmed_abstract_part1_small1.jsonl with huggingface_hub | 6 months ago |
| clean_pubmed_abstract_part1_small1_new.jsonl | 1.15 GB | Safe | xet | Upload pretrain_data/clean_pubmed_abstract_part1_small1_new.jsonl with huggingface_hub | 6 months ago |
| clean_pubmed_abstract_part1_small_new.jsonl | 1.21 GB | Safe | xet | Upload pretrain_data/clean_pubmed_abstract_part1_small_new.jsonl with huggingface_hub | 6 months ago |
| clean_seq_in_text.jsonl | 183 MB | Safe | xet | Upload pretrain_data/clean_seq_in_text.jsonl with huggingface_hub | 6 months ago |
| clean_seq_in_text_new.jsonl | 149 MB | | xet | Upload pretrain_data/clean_seq_in_text_new.jsonl with huggingface_hub | 6 months ago |
| clean_swissProt2Text.jsonl | 913 MB | Safe | xet | Upload pretrain_data/clean_swissProt2Text.jsonl with huggingface_hub | 6 months ago |
| clean_swissProt2Text_new.jsonl | 907 MB | Safe | xet | Upload pretrain_data/clean_swissProt2Text_new.jsonl with huggingface_hub | 6 months ago |
| pmc_full_text.json | 11.6 GB | | xet | Upload pretrain_data/pmc_full_text.json with huggingface_hub | 6 months ago |
| pmc_full_text.jsonl | 11.7 GB | | xet | Upload pretrain_data/pmc_full_text.jsonl with huggingface_hub | 6 months ago |
| pubmed_abstract_part1.json | 13.5 GB | | xet | Upload pretrain_data/pubmed_abstract_part1.json with huggingface_hub | 6 months ago |
| pubmed_abstract_part2.json | 13.5 GB | | xet | Upload pretrain_data/pubmed_abstract_part2.json with huggingface_hub | 6 months ago |
| seq_in_text.json | 13.3 GB | | xet | Upload pretrain_data/seq_in_text.json with huggingface_hub | 6 months ago |
| swissProt2Text.json | 983 MB | Safe | xet | Upload pretrain_data/swissProt2Text.json with huggingface_hub | 6 months ago |