Nanotron
Nanotron is a distributed training framework with tensor, pipeline, and data parallelism (3D parallelism). It is designed for large-scale training workloads across hundreds of GPUs.
Convert a Transformers model to an optimized Nanotron model implementation for pretraining with the convert_hf_to_nanotron.py script.
torchrun --nproc_per_node=1 examples/llama/convert_hf_to_nanotron.py \
--checkpoint_path=meta-llama/Llama-2-7b-hf \
--save_path=./llama-7b-nanotron
Transformers integration
- Load a supported Transformers model, like Llama, with the from_pretrained() function. This reads the config.json file from the checkpoint directory and creates a LlamaConfig.
- Nanotron maps LlamaConfig to its own config format and creates a Nanotron model.
- Convert the Transformers weights to Nanotron. A weight mapping defines how Nanotron parameter names correspond to Transformers parameter names, including transformations such as fusing the QKV projections and the gate/up projections (see the sketch after this list).
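The conversion can be sketched as follows. This is a minimal illustration, not Nanotron's actual code: the nanotron_config dict and the fused tensor names are assumptions, and the real mapping lives in the convert_hf_to_nanotron.py script.

import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Read config.json from the checkpoint directory and build a LlamaConfig.
hf_config = LlamaConfig.from_pretrained("meta-llama/Llama-2-7b-hf")

# Map the Transformers config fields onto a Nanotron-style config.
# These keys are illustrative; the real mapping is defined by the conversion script.
nanotron_config = {
    "hidden_size": hf_config.hidden_size,
    "intermediate_size": hf_config.intermediate_size,
    "num_hidden_layers": hf_config.num_hidden_layers,
    "num_attention_heads": hf_config.num_attention_heads,
    "num_key_value_heads": hf_config.num_key_value_heads,
    "vocab_size": hf_config.vocab_size,
}

# Convert the weights. A fused layout means the conversion concatenates the
# separate Transformers tensors, for example Q/K/V and gate/up in each layer.
hf_model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
layer = hf_model.model.layers[0]
qkv_weight = torch.cat(
    [layer.self_attn.q_proj.weight, layer.self_attn.k_proj.weight, layer.self_attn.v_proj.weight],
    dim=0,
)
gate_up_weight = torch.cat([layer.mlp.gate_proj.weight, layer.mlp.up_proj.weight], dim=0)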
Nanotron also relies on AutoTokenizer for turning text into token ids during preprocessing and generation.
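A minimal sketch of that preprocessing step, using the same Llama checkpoint as the conversion example above:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Turn raw text into token ids for preprocessing and generation.
token_ids = tokenizer("The quick brown fox")["input_ids"]

# Decode back to text to verify the round trip.
text = tokenizer.decode(token_ids, skip_special_tokens=True)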
Resources
- Nanotron repository
- Ultra-Scale Playbook describes how to efficiently scale training with Nanotron