LisaMegaWatts/SymbioGPT-10M
PyTorch
English
symbiogenesis
multi-organelle
monarch-mixer
philosophy
License: mit
SymbioGPT-10M/data (branch main, 1.38 GB)
1 contributor
History: 4 commits
Latest commit: LisaMegaWatts, "Add 2MB val text sample for Gemma/HF tokenizer notebooks" (588ff7f, verified), 3 months ago
train_curated.txt.tokens.pt
pickle
Detected Pickle imports (3): "torch._utils._rebuild_tensor_v2", "collections.OrderedDict", "torch.IntStorage"
1.06 GB
xet
Add curated training tokens (266M tokens, Chinchilla-optimal)
3 months ago
train_curated_sample.txt
20 MB
xet
Add 20MB raw text sample for Gemma/HF tokenizer notebooks
3 months ago
val.txt.tokens.pt
pickle
Detected Pickle imports (3): "torch._utils._rebuild_tensor_v2", "torch.IntStorage", "collections.OrderedDict"
289 MB
xet
Add validation tokens (72M tokens)
3 months ago
val_sample.txt
Safe
2 MB
Add 2MB val text sample for Gemma/HF tokenizer notebooks
3 months ago
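
The token counts and file sizes in the listing can be sanity-checked with a little arithmetic. The sketch below rests on assumptions the listing itself does not state: that "10M" in the model name is the parameter count, that "Chinchilla-optimal" refers to the commonly quoted rule of thumb of roughly 20 training tokens per parameter, and that sizes are shown in decimal units. The only fact taken from the listing is that both token files use torch.IntStorage, which stores 32-bit integers, so each token should occupy about 4 bytes.

```python
# Back-of-the-envelope checks on the figures in the file listing.
# Assumptions (not confirmed by the source):
#   - "10M" in "SymbioGPT-10M" is the parameter count.
#   - "Chinchilla-optimal" means roughly 20 tokens per parameter.
#   - File sizes are reported in decimal units (1 GB = 10**9 bytes).

BYTES_PER_INT32 = 4            # torch.IntStorage holds 32-bit integers
ASSUMED_PARAMS = 10_000_000    # hypothetical, read off the model name
TRAIN_TOKENS = 266_000_000     # from the commit message (rounded)
VAL_TOKENS = 72_000_000        # from the commit message (rounded)


def tokens_per_parameter(n_tokens: int, n_params: int) -> float:
    """Ratio of training tokens to model parameters."""
    return n_tokens / n_params


def expected_payload_bytes(n_tokens: int) -> int:
    """Raw size of an int32 token tensor, ignoring file-format overhead."""
    return n_tokens * BYTES_PER_INT32


ratio = tokens_per_parameter(TRAIN_TOKENS, ASSUMED_PARAMS)
print(f"tokens per parameter: {ratio:.1f}")
print(f"train payload: {expected_payload_bytes(TRAIN_TOKENS) / 1e9:.3f} GB")
print(f"val payload:   {expected_payload_bytes(VAL_TOKENS) / 1e6:.0f} MB")
```

Under these assumptions the training payload comes out close to the 1.06 GB shown for train_curated.txt.tokens.pt and the validation payload close to the 289 MB shown for val.txt.tokens.pt. The small differences are expected, since the token counts in the commit messages are rounded and the files carry some serialization overhead.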