Upload folder using huggingface_hub
- README.md +180 -0
- meta_006187.json +57 -0
- model_006187.pt +3 -0
- optim_006187_rank0.pt +3 -0
README.md
ADDED
@@ -0,0 +1,180 @@
---
license: mit
library_name: transformers
tags:
- nanochat
- causal-lm
- long-context
- rope
datasets:
- nvidia/ClimbMix
- HuggingFaceTB/smol-smoltalk
- cais/mmlu
- openai/gsm8k
- allenai/tulu-v2-sft-long-mixture
pipeline_tag: text-generation
---

# nanochat miniseries

This repository is part of a miniseries of small (~360M–480M parameter) decoder-only transformers
trained with Andrej Karpathy's [`nanochat`](https://github.com/karpathy/nanochat) codebase.
The series varies three axes: **depth** (model size), **tokens-per-parameter** (pretraining horizon),
and **RoPE removal schedule** (the fraction of the pretraining token budget spent with RoPE before it
is dropped for the remainder of pretraining). The RoPE axis is used to study how positional encoding
affects long-context generalization. A subset of the SFT models is additionally fine-tuned on a
long-context mixture (`_long` variants).

All models share one tokenizer: a BPE tokenizer with a 32,768-token vocabulary, trained on ~2B characters
of the pretraining corpus.

## Training pipeline

Each model goes through the following stages:

1. **Tokenizer training**: a 32,768-vocab BPE tokenizer trained on ~2B characters of the pretraining dataset.
2. **Pretraining (base)**: next-token prediction on NVIDIA's ClimbMix-400B corpus, hosted at
   [`karpathy/climbmix-400b-shuffle`](https://huggingface.co/datasets/karpathy/climbmix-400b-shuffle).
   The horizon is controlled by `target_param_data_ratio` ("tpp" in model names), i.e. the number of
   tokens trained per model parameter. Sequence length is 4096, batch size is 1,048,576 tokens, and
   the optimizer is AdamW + Muon.
3. **Supervised fine-tuning (SFT)**: instruction tuning on a mixture of:
   - [`HuggingFaceTB/smol-smoltalk`](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk): 460K general conversations
   - Synthetic identity conversations (from [karpathy-public S3](https://karpathy-public.s3.us-west-2.amazonaws.com/identity_conversations.jsonl)): 1K rows × 2 epochs
   - [`cais/mmlu`](https://huggingface.co/datasets/cais/mmlu) `auxiliary_train`: 100K rows × 3 epochs (multiple choice)
   - [`openai/gsm8k`](https://huggingface.co/datasets/openai/gsm8k) `main`: 8K rows × 4 epochs (math + tool use)
   - SimpleSpelling: 200K synthetic spelling examples
   - SpellingBee: 80K synthetic letter-counting examples
4. **Long-context SFT (`_long` variants only)**: the same mixture plus 100K rows of
   [`allenai/tulu-v2-sft-long-mixture`](https://huggingface.co/datasets/allenai/tulu-v2-sft-long-mixture),
   with sequence length extended to 8,192.

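
To get a feel for how the SFT mixture is weighted, the rounded row counts and epoch multipliers listed above can be summed directly. This is only a back-of-the-envelope sketch: the dictionary keys are illustrative labels, and entries listed without an explicit epoch count are assumed to be seen once.

```python
# Rounded row counts and epoch multipliers copied from the SFT mixture list.
# Assumption: entries without an explicit "x N epochs" are seen exactly once.
SFT_MIXTURE = {
    "smol-smoltalk": (460_000, 1),
    "identity_conversations": (1_000, 2),
    "mmlu_auxiliary_train": (100_000, 3),
    "gsm8k_main": (8_000, 4),
    "simple_spelling": (200_000, 1),
    "spelling_bee": (80_000, 1),
}
# Extra data used only by the `_long` variants (assumed to be seen once).
LONG_CONTEXT_EXTRA = {"tulu-v2-sft-long-mixture": (100_000, 1)}


def effective_examples(mixture):
    """Total examples seen in one pass over the mixture (rows x epochs, summed)."""
    return sum(rows * epochs for rows, epochs in mixture.values())


def mixture_shares(mixture):
    """Fraction of the effective examples contributed by each source."""
    total = effective_examples(mixture)
    return {name: rows * epochs / total for name, (rows, epochs) in mixture.items()}


sft_total = effective_examples(SFT_MIXTURE)
long_total = effective_examples({**SFT_MIXTURE, **LONG_CONTEXT_EXTRA})
shares = mixture_shares(SFT_MIXTURE)

print(f"standard SFT: ~{sft_total:,} examples")
print(f"long-context SFT: ~{long_total:,} examples")
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"  {name:<24} {share:6.1%}")
```

Because the source counts are rounded ("460K", "8K"), the totals are approximate.
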
## RoPE removal (drope) experiment

Model names containing `drope_XX` follow the recipe from
[*"Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings"*](https://arxiv.org/pdf/2512.12167):
the model is pretrained normally with RoPE for the first `XX%` of its token budget, RoPE is then
removed, and the remaining `(100 − XX)%` of the pretraining budget is used to recalibrate the
model without positional encodings. For example, `drope_50` means 50% of the token budget was
spent with RoPE and the remaining 50% was spent with RoPE removed. This is intended to preserve
the optimization benefits of RoPE early in training while producing a NoPE-style model that
generalizes better to long contexts at inference time. Models without `drope` in the name keep
RoPE in every layer for the full pretraining budget (theta = 100,000).

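
The split described above can be sketched as a small helper that turns a `drope_XX` percentage into step counts. The function names are hypothetical, and the rounding rule (floor) is an assumption; the actual training script may switch at a slightly different step.

```python
def drope_schedule(total_steps: int, rope_pct: float) -> tuple[int, int]:
    """Split a pretraining run into (steps_with_rope, steps_without_rope).

    Assumption: RoPE is dropped after floor(total_steps * rope_pct / 100) steps.
    """
    if not 0 <= rope_pct <= 100:
        raise ValueError("rope_pct must be between 0 and 100")
    with_rope = int(total_steps * rope_pct // 100)
    return with_rope, total_steps - with_rope


def use_rope_at(step: int, total_steps: int, rope_pct: float) -> bool:
    """True while the (0-indexed) step is still in the RoPE-on phase."""
    with_rope, _ = drope_schedule(total_steps, rope_pct)
    return step < with_rope


# Example: a 6187-step run (the step count recorded in meta_006187.json)
# under a hypothetical drope_50 schedule.
on_steps, off_steps = drope_schedule(6187, 50)
print(f"RoPE on for {on_steps} steps, removed for {off_steps} steps")
```
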
## Model sizes

| Depth | Layers | Hidden | Heads | Intermediate | Approx. params |
|-------|--------|--------|-------|--------------|----------------|
| d18   | 18     | 1152   | 9     | 3072         | ~360M          |
| d20   | 20     | 1280   | 10    | 3456         | ~480M          |

All models use head_dim=128, vocab=32,768, RMSNorm (ε=1e-6), SwiGLU MLP, and final logit softcapping at 15.0.

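
The table rows can be reproduced from the depth alone. The sketch below uses `aspect_ratio = 64` (recorded in `meta_006187.json` for d18) and `head_dim = 128`. The intermediate-size rule (8/3 × hidden, rounded up to a multiple of 128) is an assumption inferred from the two rows, not something the README states.

```python
ASPECT_RATIO = 64  # "aspect_ratio" in meta_006187.json: hidden = depth * aspect_ratio
HEAD_DIM = 128     # head_dim shared by all models


def derive_dims(depth: int) -> dict:
    """Derive layer/hidden/head/intermediate sizes from the model depth."""
    hidden = depth * ASPECT_RATIO
    heads = hidden // HEAD_DIM
    # Assumption: SwiGLU width is 8/3 * hidden, rounded up to a multiple of 128.
    # Integer ceiling division avoids floating-point rounding issues.
    units = -(-(hidden * 8) // (3 * 128))
    intermediate = units * 128
    return {
        "layers": depth,
        "hidden": hidden,
        "heads": heads,
        "intermediate": intermediate,
    }


for depth in (18, 20):
    print(f"d{depth}:", derive_dims(depth))
```
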
## Released checkpoints

RoPE schedule column: `none` means RoPE is kept on for the full pretraining budget. A percentage
`XX%` means RoPE is kept on for the first `XX%` of the token budget and then removed for
the remaining `(100 − XX)%` of pretraining, per the drope recipe above.

| Model tag               | Depth | tpp | RoPE schedule    | Long-ctx SFT |
|-------------------------|-------|-----|------------------|--------------|
| d18_9tpp                | 18    | 9   | none (always on) | no           |
| d18_9tpp_drope_25       | 18    | 9   | 25% then removed | no           |
| d18_9tpp_drope_50       | 18    | 9   | 50% then removed | no           |
| d18_9tpp_drope_75       | 18    | 9   | 75% then removed | no           |
| d18_20tpp               | 18    | 20  | none (always on) | no           |
| d18_20tpp_long          | 18    | 20  | none (always on) | yes          |
| d18_20tpp_drope_50      | 18    | 20  | 50% then removed | no           |
| d18_20tpp_drope_50_long | 18    | 20  | 50% then removed | yes          |
| d20_9tpp                | 20    | 9   | none (always on) | no           |
| d20_9tpp_drope_25       | 20    | 9   | 25% then removed | no           |
| d20_9tpp_drope_50       | 20    | 9   | 50% then removed | no           |
| d20_9tpp_drope_75       | 20    | 9   | 75% then removed | no           |
| d20_20tpp               | 20    | 20  | none (always on) | no           |
| d20_20tpp_long          | 20    | 20  | none (always on) | yes          |
| d20_20tpp_drope_50      | 20    | 20  | 50% then removed | no           |
| d20_20tpp_drope_50_long | 20    | 20  | 50% then removed | yes          |
| d20_40tpp               | 20    | 40  | none (always on) | no           |
| d20_40tpp_long          | 20    | 40  | none (always on) | yes          |
| d20_40tpp_drope_50      | 20    | 40  | 50% then removed | no           |
| d20_40tpp_drope_50_long | 20    | 40  | 50% then removed | yes          |

`tpp` = tokens-per-parameter pretraining horizon. Total pretraining token budgets:

| Depth | tpp | Total pretraining tokens |
|-------|-----|--------------------------|
| d18   | 9   | ≈ 2.92 B                 |
| d18   | 20  | ≈ 6.49 B                 |
| d20   | 9   | ≈ 3.95 B                 |
| d20   | 20  | ≈ 8.77 B                 |
| d20   | 40  | ≈ 17.54 B                |

`drope` variants use the same total token budget as their non-drope counterpart; the budget is
split between the RoPE-on and RoPE-removed phases as described above.

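
The model tags follow a regular pattern, so the table columns can be recovered from a tag string. The parser below is an illustrative sketch: `ModelTag` and `parse_model_tag` are hypothetical names, and the regular expression only covers the tag shapes listed in the table.

```python
import re
from dataclasses import dataclass
from typing import Optional

# d<depth>_<tpp>tpp[_drope_<XX>][_long]
TAG_RE = re.compile(
    r"^d(?P<depth>\d+)_(?P<tpp>\d+)tpp"
    r"(?:_drope_(?P<rope_pct>\d+))?"
    r"(?P<long>_long)?$"
)


@dataclass(frozen=True)
class ModelTag:
    depth: int
    tpp: int
    rope_pct: Optional[int]  # None means RoPE stays on for the whole run
    long_ctx_sft: bool


def parse_model_tag(tag: str) -> ModelTag:
    """Parse a tag such as 'd18_20tpp_drope_50_long' into its components."""
    m = TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"unrecognized model tag: {tag!r}")
    return ModelTag(
        depth=int(m["depth"]),
        tpp=int(m["tpp"]),
        rope_pct=int(m["rope_pct"]) if m["rope_pct"] else None,
        long_ctx_sft=m["long"] is not None,
    )


print(parse_model_tag("d18_20tpp_drope_50_long"))
print(parse_model_tag("d20_9tpp"))
```
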
## Checkpoint format: which repo should I download?

For each model tag we publish **four** Hugging Face repositories:

| Repo suffix   | Stage            | Format | Use case |
|---------------|------------------|--------|----------|
| `...-base`    | post-pretraining | nanochat native (`model_XXXXXX.pt`, `meta_*.json`, optimizer shard) | continue training / run with the `nanochat` repo |
| `...-sft`     | post-SFT         | nanochat native (`model_XXXXXX.pt`, `meta_*.json`, optimizer shard) | continue training / run with the `nanochat` repo |
| `...-hf-base` | post-pretraining | Hugging Face `transformers` (`config.json`, `model.safetensors`, `tokenizer.json`) | drop-in `AutoModelForCausalLM` loading |
| `...-hf-sft`  | post-SFT         | Hugging Face `transformers` (`config.json`, `model.safetensors`, `tokenizer.json`) | drop-in `AutoModelForCausalLM` loading |

- The **`base_checkpoints`** and **`chatsft_checkpoints`** artifacts (the `...-base` and `...-sft`
  repos in the table) are the raw nanochat outputs, stored exactly as produced by
  `scripts.base_train` and `scripts.chat_sft`. They include the optimizer state (`optim_*_rank0.pt`)
  and metadata (`meta_*.json` with training config, val BPB, step number, etc.), so you can resume
  training or evaluate with the nanochat scripts.
- The **`hf_base`** and **`hf_sft`** artifacts (the `...-hf-base` and `...-hf-sft` repos in the
  table) are conversions of those same weights into the Hugging Face `transformers` layout
  (architecture name `NanoChatForCausalLM`, `model_type` `nanochat`). Load them with:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
  model = AutoModelForCausalLM.from_pretrained("crellis/nanochat-d20-20tpp-hf-sft", trust_remote_code=True)
  tokenizer = AutoTokenizer.from_pretrained("crellis/nanochat-d20-20tpp-hf-sft", trust_remote_code=True)
  ```

`use_rope` in `config.json` reflects the drope setting: `true` for models that kept RoPE for the
entire pretraining budget, and `false` for drope variants (where RoPE was removed partway through
pretraining and the model was recalibrated without it). In the drope case, rotary embeddings are
not applied at inference time.

Pick `-hf-base` / `-hf-sft` for inference. Pick `-base` / `-sft` only if you plan to continue
training inside the nanochat codebase.

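
For the native format, a minimal sketch of locating and fetching the checkpoint files might look like the following. The file-name pattern is inferred from the files in this commit (`model_006187.pt`, `meta_006187.json`, `optim_006187_rank0.pt`). The helper names are hypothetical, and the download step assumes `huggingface_hub` is installed.

```python
import json


def checkpoint_filenames(step: int, rank: int = 0) -> dict:
    """File names of a native nanochat checkpoint at a given step.

    Assumption: the step is zero-padded to 6 digits, as in this commit's files.
    """
    tag = f"{step:06d}"
    return {
        "model": f"model_{tag}.pt",
        "meta": f"meta_{tag}.json",
        "optim": f"optim_{tag}_rank{rank}.pt",
    }


def load_meta(path) -> dict:
    """Read a meta_*.json file (training config, val BPB, step number, ...)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def download_checkpoint(repo_id: str, step: int, with_optimizer: bool = False) -> dict:
    """Download checkpoint files from the Hub and return their local paths."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    names = checkpoint_filenames(step)
    if not with_optimizer:
        names.pop("optim")  # the optimizer shard is only needed to resume training
    return {
        kind: hf_hub_download(repo_id=repo_id, filename=name)
        for kind, name in names.items()
    }


print(checkpoint_filenames(6187))
```

A call such as `download_checkpoint("crellis/d18-20tpp-base_checkpoints", 6187)` would then return local paths for the model and metadata files. The `.pt` file is presumably loadable with `torch.load(path, map_location="cpu")`, but the key layout inside it is defined by the nanochat codebase rather than by this sketch.
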
## Inference sketch (HF format, SFT)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "crellis/nanochat-d20-20tpp-hf-sft"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda()

# Render the conversation with the chat template and append the assistant prompt.
messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").cuda()
out = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs.shape[1]:], skip_special_tokens=True))
```

Base (pretrained-only) checkpoints are next-token predictors and do not understand the chat
template; use `-hf-base` for completion-style prompting and `-hf-sft` for chat.

## Training compute

All runs were trained on a single H100 GPU via Slurm. Pretraining wall-clock ranges from
~4 hours (d18 @ 9tpp) to ~15 hours (d20 @ 40tpp); SFT adds ~30–90 minutes depending on variant.

## Citation / acknowledgements

- Codebase: [`karpathy/nanochat`](https://github.com/karpathy/nanochat)
- Pretraining data: NVIDIA ClimbMix (via `karpathy/climbmix-400b-shuffle`)
- SFT data: HuggingFaceTB SmolTalk, CAIS MMLU, OpenAI GSM8K, AI2 Tulu-v2 long-mixture
- RoPE-removal recipe: [*Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings*](https://arxiv.org/pdf/2512.12167) (arXiv:2512.12167)

## License

MIT (inherits from the nanochat repository).
meta_006187.json
ADDED
@@ -0,0 +1,57 @@
{
  "step": 6187,
  "val_bpb": 0.7721254779551417,
  "model_config": {
    "sequence_len": 4096,
    "vocab_size": 32768,
    "n_layer": 18,
    "n_head": 9,
    "n_kv_head": 9,
    "n_embd": 1152
  },
  "user_config": {
    "run": "d18",
    "device_type": "",
    "fp8": true,
    "fp8_recipe": "tensorwise",
    "depth": 18,
    "aspect_ratio": 64,
    "head_dim": 128,
    "max_seq_len": 4096,
    "num_iterations": -1,
    "target_flops": -1.0,
    "target_param_data_ratio": 20.0,
    "device_batch_size": 16,
    "total_batch_size": -1,
    "embedding_lr": 0.3,
    "unembedding_lr": 0.008,
    "weight_decay": 0.28,
    "matrix_lr": 0.02,
    "warmup_steps": 40,
    "warmdown_ratio": 0.65,
    "final_lr_frac": 0.05,
    "resume_from_step": -1,
    "eval_every": 250,
    "eval_tokens": 41943040,
    "core_metric_every": 2000,
    "core_metric_max_per_task": 500,
    "sample_every": 2000,
    "save_every_pct": 25.0,
    "model_tag": "d18",
    "rope_removal_pct": -1
  },
  "device_batch_size": 16,
  "max_seq_len": 4096,
  "total_batch_size": 1048576,
  "dataloader_state_dict": {
    "pq_idx": 133,
    "rg_idx": 6,
    "epoch": 1
  },
  "loop_state": {
    "min_val_bpb": 0.7721254779551417,
    "smooth_train_loss": 2.535478062795301,
    "total_training_time": 15976.807270765305,
    "use_rope": true
  }
}
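
A few quantities follow directly from the metadata above. The sketch below copies the relevant fields into a dictionary and derives the token count, gradient-accumulation factor, and throughput. It rests on three assumptions: `total_training_time` is in seconds, the run used a single device (as the README states), and step 6187 is the final step of the run.

```python
# Fields copied from meta_006187.json (nested keys flattened for brevity).
META = {
    "step": 6187,
    "device_batch_size": 16,
    "max_seq_len": 4096,
    "total_batch_size": 1_048_576,       # tokens per optimizer step
    "target_param_data_ratio": 20.0,     # "tpp"
    "total_training_time": 15976.807270765305,  # assumed to be seconds
}


def training_summary(meta: dict) -> dict:
    """Derive token counts and throughput from checkpoint metadata."""
    tokens_per_micro_batch = meta["device_batch_size"] * meta["max_seq_len"]
    # Assumption: one device, so each optimizer step accumulates this many micro-batches.
    grad_accum_steps = meta["total_batch_size"] // tokens_per_micro_batch
    total_tokens = meta["step"] * meta["total_batch_size"]
    return {
        "grad_accum_steps": grad_accum_steps,
        "total_tokens": total_tokens,
        # Parameter count implied by tokens / tpp.
        "implied_params": total_tokens / meta["target_param_data_ratio"],
        "hours": meta["total_training_time"] / 3600,
        "tokens_per_second": total_tokens / meta["total_training_time"],
    }


summary = training_summary(META)
print(f"tokens seen: {summary['total_tokens'] / 1e9:.2f} B")
print(f"grad accumulation steps: {summary['grad_accum_steps']}")
print(f"training time: {summary['hours']:.2f} h")
```

The token count agrees with the ≈ 6.49 B budget listed for d18 at 20 tpp. The implied parameter count (about 324M) is lower than the ~360M headline figure; which parameters the ratio counts is not stated on this page.
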
model_006187.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a99e443949e8bd838d6bdc80f9333255fb8fbfe64f26d431c1a80a2ae01bfa3d
size 1373162077
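
The three lines above are a Git LFS pointer, not the weights themselves: the real file is identified by its SHA-256 digest and size in bytes. As a small illustration, a pointer can be parsed and a downloaded file checked against it. Both helper names are hypothetical.

```python
import hashlib

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a99e443949e8bd838d6bdc80f9333255fb8fbfe64f26d431c1a80a2ae01bfa3d
size 1373162077
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a local file against the pointer's SHA-256 digest and size."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
            size += len(chunk)
    return size == pointer["size_bytes"] and h.hexdigest() == pointer["digest"]


info = parse_lfs_pointer(POINTER)
print(f"{info['size_bytes'] / 1e9:.2f} GB, {info['hash_algo']} {info['digest'][:12]}...")
```
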
optim_006187_rank0.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c8b6732960db8d0ffef63b8123c0219dbaefd136bc696e55f3be6d33e22e0503
size 1600603109