candidate-Llaza-MS2-flat-20BT

Training checkpoint from zip2zip-core. This is a candidate model (not production-ready).

Training Config

| Field | Value |
|---|---|
| model_config | 1B |
| init_from | scratch |
| max_subtokens | 2 |
| max_codebook_size | 4096 |
| seq_len | 4096 |
| lr | 0.0003 |
| max_tokens | 20000000000 |
| step | 19074 |
| data | llaza-20B-tokens-128shards |
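
As a sanity check on the values above, the step count can be related to the token budget. The card does not state the batch size, so the 256 sequences per step used below is an assumption, chosen only because it reproduces the reported step; treat this as a rough sketch, not a statement of the actual training setup.

```python
import math

# Values copied from the Training Config table.
max_tokens = 20_000_000_000
seq_len = 4096
reported_step = 19074

# ASSUMPTION (not stated in the card): 256 sequences per optimizer step.
assumed_sequences_per_step = 256

# Tokens consumed by one step under that assumption.
tokens_per_step = assumed_sequences_per_step * seq_len

# Steps needed to consume the full token budget, rounding up the last partial step.
expected_steps = math.ceil(max_tokens / tokens_per_step)

print(f"tokens per step: {tokens_per_step:,}")
print(f"expected steps:  {expected_steps:,}")
print(f"matches reported step: {expected_steps == reported_step}")
```

Under this assumption, the checkpoint at step 19074 would correspond to the end of the 20B-token run rather than an intermediate snapshot.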

Usage

This is a training checkpoint in torchtitan format. To use it for inference, first export it to Hugging Face format:

python scripts/zip2zip_hf/export_to_zip2zip.py \
    --ckpt_dir <local_path>/step_19074 \
    --output_dir <export_dir> \
    --base_model scratch \
    --model_config 1B
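
If the export is driven from a script, the same command can be assembled in Python. This is a minimal sketch: the flags are the ones shown above, while the helper name `build_export_command` and the two example paths are hypothetical placeholders, not part of zip2zip-core.

```python
import shlex


def build_export_command(ckpt_dir, output_dir, base_model="scratch", model_config="1B"):
    """Return the export command from this card as an argument list."""
    return [
        "python", "scripts/zip2zip_hf/export_to_zip2zip.py",
        "--ckpt_dir", str(ckpt_dir),
        "--output_dir", str(output_dir),
        "--base_model", base_model,
        "--model_config", model_config,
    ]


# Hypothetical paths: substitute your own download and export locations.
cmd = build_export_command("checkpoints/step_19074", "exported")

# Print the command as a single shell-safe line for inspection.
print(shlex.join(cmd))
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`. The relative script path suggests this should be done from the root of the zip2zip-core repository, though the card does not say so explicitly.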