LoRA for Neuron

This module provides a LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices, enabling parameter-efficient fine-tuning with tensor parallelism and sequence parallelism.
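As a reminder of what the adapter layers compute, here is a minimal pure-Python sketch of a single LoRA linear layer. It is generic LoRA math, not code from Optimum Neuron: the frozen weight W is left untouched and a trainable low-rank update B·A, scaled by lora_alpha / r (the default scaling in Hugging Face PEFT), is added to its output. The matrices and the helper names matvec and lora_forward are made up for illustration.

```python
# Generic LoRA forward pass in pure Python (illustrative sketch, not Optimum Neuron code).
# Frozen weight W has shape (out, in); trainable adapters A (r, in) and B (out, r).


def matvec(m, v):
    """Multiply a matrix m (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, lora_alpha, r):
    """Return W x + (lora_alpha / r) * B (A x)."""
    scaling = lora_alpha / r
    base = matvec(W, x)                  # frozen path
    delta = matvec(B, matvec(A, x))      # low-rank path: in -> r -> out
    return [b + scaling * d for b, d in zip(base, delta)]


W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]      # out=2, in=3
A = [[0.5, 0.5, 0.0]]       # r=1
B_init = [[0.0], [0.0]]     # B starts at zero, so the adapter is a no-op before training
B_trained = [[1.0], [2.0]]  # stand-in values for B after some training

print(lora_forward(W, A, B_init, [1.0, 2.0, 3.0], lora_alpha=16, r=1))     # [7.0, -1.0]
print(lora_forward(W, A, B_trained, [1.0, 2.0, 3.0], lora_alpha=16, r=1))  # [31.0, 47.0]
```

With B starting at zero, the adapted layer initially matches the base layer exactly, and only the r·(in + out) adapter values are trained. For a 4096 x 4096 projection with r = 16, that is 131,072 trainable values instead of 16,777,216.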

PEFT Model Classes

NeuronPeftModel

optimum.neuron.peft.NeuronPeftModel

NeuronPeftModelForCausalLM

optimum.neuron.peft.NeuronPeftModelForCausalLM

LoRA Layer Implementations

Base LoRA Layer

optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer

Parallel Linear LoRA

optimum.neuron.peft.tuners.lora.layer.ParallelLinear

GQA QKV Column Parallel LoRA

optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear
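In grouped-query attention several query heads share one key/value head, so the query, key and value projections cannot always be split across ranks the same way. When the tensor parallel size is larger than the number of key/value heads, one common approach is to replicate each key/value head until every rank holds one. The sketch below uses a hypothetical assign_heads helper and a hypothetical kv_size_multiplier variable to illustrate that head-to-rank assignment; it is an assumption about the general technique, not the actual logic of GQAQKVColumnParallelLinear.

```python
# Hypothetical head-to-rank assignment for a GQA QKV projection under tensor parallelism.
# KV heads are replicated when there are fewer KV heads than ranks (illustrative only).


def assign_heads(num_q_heads, num_kv_heads, tp_size):
    """Return, per rank, its query heads and the (possibly replicated) KV head ids."""
    if num_q_heads % tp_size != 0:
        raise ValueError("num_q_heads must be divisible by tp_size")
    if tp_size % num_kv_heads == 0:
        kv_size_multiplier = tp_size // num_kv_heads  # copies of each KV head
    elif num_kv_heads % tp_size == 0:
        kv_size_multiplier = 1                        # enough KV heads to split directly
    else:
        raise ValueError("tp_size and num_kv_heads must divide one another")
    # e.g. 8 KV heads, multiplier 4 -> [0, 0, 0, 0, 1, 1, 1, 1, ..., 7, 7, 7, 7]
    kv_heads = [h for h in range(num_kv_heads) for _ in range(kv_size_multiplier)]
    q_per_rank = num_q_heads // tp_size
    kv_per_rank = len(kv_heads) // tp_size
    return [
        {"q": list(range(r * q_per_rank, (r + 1) * q_per_rank)),
         "kv": kv_heads[r * kv_per_rank:(r + 1) * kv_per_rank]}
        for r in range(tp_size)
    ]


layout = assign_heads(num_q_heads=32, num_kv_heads=8, tp_size=32)
print(layout[0], layout[4])  # {'q': [0], 'kv': [0]} {'q': [4], 'kv': [1]}
```

What matters is that every query head on a rank finds its key/value group (query head index // (num_q_heads // num_kv_heads)) on the same rank. A LoRA adapter on such a layer would presumably need its lora_B output to follow the same head layout, so that each low-rank update lands on the heads that rank owns.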

Parallel Embedding LoRA

optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding
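For an embedding layer distributed across ranks, a common tensor-parallel layout (Megatron-style vocabulary parallelism) gives each rank a contiguous slice of the vocabulary. The sketch below simulates that in one process and adds a LoRA update on top. It assumes, without confirmation from this page, that the LoRA A table is split by vocabulary rows like the base table and that B is replicated; the helpers shard_rows, parallel_lookup and lora_embedding are hypothetical.

```python
# Vocabulary-parallel embedding with a LoRA update, simulated in one process.
# Assumed layout (not taken from Optimum Neuron): base table and LoRA A table are split
# by vocabulary rows across ranks; B is replicated on every rank.


def shard_rows(table, tp_size):
    """Split a (vocab, dim) table into tp_size contiguous row blocks."""
    per_rank = len(table) // tp_size
    return [table[r * per_rank:(r + 1) * per_rank] for r in range(tp_size)]


def parallel_lookup(shards, token):
    """Each rank contributes its row if it owns the token, else zeros; the sum plays the all-reduce."""
    per_rank = len(shards[0])
    total = [0.0] * len(shards[0][0])
    for rank, shard in enumerate(shards):
        local = token - rank * per_rank
        if 0 <= local < per_rank:  # this rank owns the token
            total = [t + v for t, v in zip(total, shard[local])]
    return total


def lora_embedding(base_shards, a_shards, B, token, scaling):
    """base[token] + scaling * B @ a[token], where a[token] is the token's length-r LoRA row."""
    base = parallel_lookup(base_shards, token)
    a_row = parallel_lookup(a_shards, token)
    delta = [sum(b * a for b, a in zip(b_row, a_row)) for b_row in B]
    return [e + scaling * d for e, d in zip(base, delta)]


base_table = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]  # vocab=4, dim=2
a_table = [[1.0], [0.0], [2.0], [0.5]]                          # vocab=4, r=1
B = [[1.0], [-1.0]]                                             # dim=2, r=1

base_shards, a_shards = shard_rows(base_table, 2), shard_rows(a_table, 2)
print(lora_embedding(base_shards, a_shards, B, token=2, scaling=2.0))  # [8.0, 1.0]
```

Since exactly one rank owns any given token, the summed rows equal the unsharded rows, and applying the replicated B afterwards gives the same output as an unsharded LoRA embedding.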

LoRA Model

NeuronLoraModel

optimum.neuron.peft.tuners.NeuronLoraModel

Utility Functions

get_peft_model

optimum.neuron.peft.get_peft_model

Architecture Support

The Neuron LoRA implementation supports the following parallel layer types:

  • ColumnParallelLinear: For layers that split weights along the output dimension
  • RowParallelLinear: For layers that split weights along the input dimension
  • ParallelEmbedding: For embedding layers distributed across ranks
  • GQAQKVColumnParallelLinear: For the query, key and value projections of grouped-query attention, including tensor parallel configurations where there are fewer key/value heads than tensor parallel ranks

Each layer type has a corresponding LoRA implementation that preserves the base layer's parallelization strategy while adding the low-rank adapter weights.
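The sharding rule can be checked on a small example. This sketch assumes the scheme Hugging Face PEFT uses for Megatron-style tensor-parallel layers: for a ColumnParallelLinear base layer, lora_A is replicated and lora_B is split along the output dimension like the base weight; for a RowParallelLinear base layer, lora_A is split along the input dimension and lora_B is replicated. Whether Optimum Neuron's ParallelLinear follows exactly this split is an assumption. Ranks are simulated with list slicing; concatenation stands in for an all-gather and summation for an all-reduce.

```python
# Checks, in one process, that sharded LoRA layers reproduce the unsharded result.
# Assumed scheme (as in PEFT's Megatron LoRA layers), not Optimum Neuron code:
#   column-parallel: W and B split along the output dim, A replicated, outputs concatenated
#   row-parallel:    W and A split along the input dim, B replicated, outputs summed


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_ref(W, A, B, x, s):
    """Unsharded LoRA: W x + s * B (A x)."""
    return [y + s * d for y, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]


def split_rows(m, n):
    k = len(m) // n
    return [m[i * k:(i + 1) * k] for i in range(n)]


def split_cols(m, n):
    k = len(m[0]) // n
    return [[row[i * k:(i + 1) * k] for row in m] for i in range(n)]


def column_parallel_lora(W, A, B, x, s, tp):
    out = []
    for W_r, B_r in zip(split_rows(W, tp), split_rows(B, tp)):
        out += lora_ref(W_r, A, B_r, x, s)  # each rank produces a slice of the outputs
    return out                              # concatenation = all-gather


def row_parallel_lora(W, A, B, x, s, tp):
    k = len(x) // tp
    total = [0] * len(W)
    for i, (W_r, A_r) in enumerate(zip(split_cols(W, tp), split_cols(A, tp))):
        partial = lora_ref(W_r, A_r, B, x[i * k:(i + 1) * k], s)  # rank i sees its input slice
        total = [t + p for t, p in zip(total, partial)]           # summation = all-reduce
    return total


W = [[1, 2, 0, -1], [0, 1, 3, 2], [2, 0, 1, 1], [-1, 1, 0, 2]]  # out=4, in=4
A = [[1, 0, 1, 0], [0, 1, 0, -1]]                                # r=2
B = [[1, 0], [0, 1], [1, 1], [2, -1]]
x = [1, 2, 3, 4]

print(lora_ref(W, A, B, x, 2))                    # [9, 15, 13, 29]
print(column_parallel_lora(W, A, B, x, 2, tp=2))  # [9, 15, 13, 29]
print(row_parallel_lora(W, A, B, x, 2, tp=2))     # [9, 15, 13, 29]
```

The row-parallel case works because B is linear: the sum over ranks of B(A_r x_r) equals B(A x), so replicating B and reducing the partial results gives the unsharded output.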

Key Features

  • Distributed Training: Full support for tensor parallelism and sequence parallelism
  • Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints
  • Weight Transformation: Integrates with the model weight transformation specs
  • Compatibility: Works with all supported custom modeling architectures in Optimum Neuron
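Converting between sharded and consolidated checkpoints amounts to undoing the sharding rule tensor by tensor. The sketch below is hypothetical: the key names, the layout table and the consolidate and shard helpers are invented for illustration, and it ignores any weight transformation specs. It assumes that a column-parallel layer shards lora_B along dim 0 and a row-parallel layer shards lora_A along dim 1, with the other adapter matrix replicated.

```python
# Hypothetical conversion between per-rank LoRA shards and one consolidated state dict.
# Key names and the layout table are illustrative assumptions.


def consolidate(rank_state_dicts, layout):
    """Merge per-rank dicts; layout maps each key to 'dim0', 'dim1' or 'replicated'."""
    full = {}
    for key, how in layout.items():
        shards = [sd[key] for sd in rank_state_dicts]
        if how == "dim0":      # stack row blocks (output dimension)
            full[key] = [row for shard in shards for row in shard]
        elif how == "dim1":    # join column blocks row by row (input dimension)
            full[key] = [[v for shard in shards for v in shard[i]] for i in range(len(shards[0]))]
        else:                  # replicated: every rank must hold the same tensor
            if any(s != shards[0] for s in shards):
                raise ValueError(f"replicated tensor {key!r} differs across ranks")
            full[key] = shards[0]
    return full


def shard(full, layout, tp_size):
    """Inverse of consolidate: split a full state dict into tp_size per-rank dicts."""
    ranks = [{} for _ in range(tp_size)]
    for key, how in layout.items():
        m = full[key]
        for r in range(tp_size):
            if how == "dim0":
                k = len(m) // tp_size
                ranks[r][key] = m[r * k:(r + 1) * k]
            elif how == "dim1":
                k = len(m[0]) // tp_size
                ranks[r][key] = [row[r * k:(r + 1) * k] for row in m]
            else:
                ranks[r][key] = m
    return ranks


layout = {
    "q_proj.lora_A.weight": "replicated",  # base q_proj assumed column-parallel
    "q_proj.lora_B.weight": "dim0",
    "o_proj.lora_A.weight": "dim1",        # base o_proj assumed row-parallel
    "o_proj.lora_B.weight": "replicated",
}
ranks = [
    {"q_proj.lora_A.weight": [[1, 2, 3, 4]], "q_proj.lora_B.weight": [[1], [2]],
     "o_proj.lora_A.weight": [[1, 2]], "o_proj.lora_B.weight": [[5], [6]]},
    {"q_proj.lora_A.weight": [[1, 2, 3, 4]], "q_proj.lora_B.weight": [[3], [4]],
     "o_proj.lora_A.weight": [[3, 4]], "o_proj.lora_B.weight": [[5], [6]]},
]
full = consolidate(ranks, layout)
print(full["q_proj.lora_B.weight"], full["o_proj.lora_A.weight"])  # [[1], [2], [3], [4]] [[1, 2, 3, 4]]
```

Splitting the consolidated dict again with shard(full, layout, 2) reproduces the original per-rank dicts, a cheap round-trip check for any conversion of this kind.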
