LoRA for Neuron
An implementation of LoRA (Low-Rank Adaptation) optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.
PEFT Model Classes
NeuronPeftModel[[optimum.neuron.peft.NeuronPeftModel]]
optimum.neuron.peft.NeuronPeftModel[[optimum.neuron.peft.NeuronPeftModel]]
NeuronPeftModelForCausalLM[[optimum.neuron.peft.NeuronPeftModelForCausalLM]]
optimum.neuron.peft.NeuronPeftModelForCausalLM[[optimum.neuron.peft.NeuronPeftModelForCausalLM]]
LoRA Layer Implementations
Base LoRA Layer[[optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer]]
optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer[[optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer]]
Parallel Linear LoRA[[optimum.neuron.peft.tuners.lora.layer.ParallelLinear]]
optimum.neuron.peft.tuners.lora.layer.ParallelLinear[[optimum.neuron.peft.tuners.lora.layer.ParallelLinear]]
GQA QKV Column Parallel LoRA[[optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear]]
optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear[[optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear]]
Parallel Embedding LoRA[[optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding]]
optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding[[optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding]]
LoRA Model
NeuronLoraModel[[optimum.neuron.peft.tuners.NeuronLoraModel]]
optimum.neuron.peft.tuners.NeuronLoraModel[[optimum.neuron.peft.tuners.NeuronLoraModel]]
Utility Functions
get_peft_model[[optimum.neuron.peft.get_peft_model]]
optimum.neuron.peft.get_peft_model[[optimum.neuron.peft.get_peft_model]]
Architecture Support
The Neuron LoRA implementation supports the following parallel layer types:
- ColumnParallelLinear: For layers that split weights along the output dimension
- RowParallelLinear: For layers that split weights along the input dimension
- ParallelEmbedding: For embedding layers distributed across ranks
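- GQAQKVColumnParallelLinear: For grouped-query attention (GQA) query, key, and value projections, including tensor parallel configurations that are hard to shard, such as a tensor parallel size that does not match the number of key/value heads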
Each layer type has a corresponding LoRA implementation that maintains the parallelization strategy while adding low-rank adaptation capabilities.
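To make the idea of "maintaining the parallelization strategy" concrete, the following NumPy sketch shards a linear layer and its LoRA factors the way a column-parallel and a row-parallel layer would, then checks that the sharded results match the unsharded one. It runs in a single process and is not the Optimum Neuron implementation: the names `lora_linear`, `tp_size`, and `scaling`, as well as the choice of which LoRA factor is sharded and which is replicated, are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features, rank, tp_size = 8, 12, 4, 2
scaling = 2.0  # stands in for lora_alpha / r (assumed value)

x = rng.normal(size=(3, in_features))             # batch of 3 inputs
W = rng.normal(size=(out_features, in_features))  # frozen base weight
A = rng.normal(size=(rank, in_features))          # LoRA down-projection
B = rng.normal(size=(out_features, rank))         # LoRA up-projection


def lora_linear(x, W, A, B):
    """Base linear output plus the scaled low-rank update (no bias)."""
    return x @ W.T + scaling * (x @ A.T) @ B.T


reference = lora_linear(x, W, A, B)

# Column-parallel (assumed layout): W and B are split along the output
# dimension, A is replicated. Each simulated rank produces a slice of the
# output features, and the slices are concatenated.
column_out = np.concatenate(
    [
        lora_linear(x, W_r, A, B_r)
        for W_r, B_r in zip(np.split(W, tp_size, axis=0), np.split(B, tp_size, axis=0))
    ],
    axis=-1,
)

# Row-parallel (assumed layout): W and A are split along the input
# dimension, B is replicated. Each simulated rank sees a slice of the input,
# and the partial outputs are summed (the role of an all-reduce).
row_out = sum(
    lora_linear(x_r, W_r, A_r, B)
    for x_r, W_r, A_r in zip(
        np.split(x, tp_size, axis=-1),
        np.split(W, tp_size, axis=1),
        np.split(A, tp_size, axis=1),
    )
)

assert np.allclose(column_out, reference)
assert np.allclose(row_out, reference)
```

Because the LoRA update is linear in the same dimension that the base weight is split along, sharding one low-rank factor alongside the base weight leaves the combined output unchanged.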
Key Features
- Distributed Training: Full support for tensor parallelism and sequence parallelism
- Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints
- Weight Transformation: Seamless integration with model weight transformation specs
- Compatibility: Works with all supported custom modeling architectures in Optimum Neuron
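As a rough illustration of what converting between sharded and consolidated checkpoints involves, the sketch below concatenates per-rank LoRA weights along their partition dimension and splits them back. Everything in it is hypothetical: the parameter names, the `PARTITION_DIMS` mapping, and the `consolidate` and `shard` helpers are not part of Optimum Neuron, and the actual on-disk format is not described here.

```python
import numpy as np

# Hypothetical mapping: parameter name -> axis it is sharded along
# (None means the parameter is replicated on every rank).
PARTITION_DIMS = {"lora_A.weight": None, "lora_B.weight": 0}


def consolidate(shards):
    """Merge a list of per-rank state dicts into one full state dict."""
    full = {}
    for name, dim in PARTITION_DIMS.items():
        parts = [rank_state[name] for rank_state in shards]
        # Replicated parameters are identical on all ranks; keep one copy.
        full[name] = parts[0] if dim is None else np.concatenate(parts, axis=dim)
    return full


def shard(full, tp_size):
    """Split a full state dict into tp_size per-rank state dicts."""
    return [
        {
            name: full[name] if dim is None else np.split(full[name], tp_size, axis=dim)[rank]
            for name, dim in PARTITION_DIMS.items()
        }
        for rank in range(tp_size)
    ]


rng = np.random.default_rng(0)
full_state = {
    "lora_A.weight": rng.normal(size=(4, 8)),   # (rank, in_features)
    "lora_B.weight": rng.normal(size=(12, 4)),  # (out_features, rank)
}

rank_states = shard(full_state, tp_size=2)
round_trip = consolidate(rank_states)
```

In this sketch each rank holds a `(6, 4)` slice of `lora_B.weight`, and consolidating the two slices restores the original `(12, 4)` array.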