candidate-Llaza-MS2-flat-20BT
Training checkpoint from zip2zip-core. This is a candidate model (not production-ready).
Training Config
| Field | Value |
|---|---|
| model_config | 1B |
| init_from | scratch |
| max_subtokens | 2 |
| max_codebook_size | 4096 |
| seq_len | 4096 |
| lr | 0.0003 |
| max_tokens | 20000000000 |
| step | 19074 |
| data | llaza-20B-tokens-128shards |
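
The `step`, `seq_len`, and `max_tokens` values above can be cross-checked with a little arithmetic. The sketch below is only a consistency check. The global batch size of 256 sequences per step is an assumption inferred from the numbers, not a value stated in this card.

```python
import math

# Values taken from the Training Config table.
seq_len = 4096
max_tokens = 20_000_000_000
final_step = 19074

# ASSUMPTION (not stated in the card): 256 sequences per optimizer step.
assumed_global_batch = 256

# Tokens consumed per step under that assumption.
tokens_per_step = assumed_global_batch * seq_len

# Number of steps needed to reach the token budget, rounded up.
steps_needed = math.ceil(max_tokens / tokens_per_step)

print("tokens per step:", tokens_per_step)
print("steps needed:", steps_needed)
print("matches checkpoint step:", steps_needed == final_step)
```

If the assumed batch size is right, the checkpoint at `step_19074` corresponds to the end of the 20B-token budget rather than an intermediate save.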
Usage
This is a training checkpoint in torchtitan format. To use it for inference, first export it to Hugging Face format:
```bash
python scripts/zip2zip_hf/export_to_zip2zip.py \
    --ckpt_dir <local_path>/step_19074 \
    --output_dir <export_dir> \
    --base_model scratch \
    --model_config 1B
```