LoRA for Neuron

LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.
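For reference, the low-rank update that the layers in this module add can be sketched in plain Python. This is a minimal illustration of standard LoRA, not optimum-neuron code. The base weight W stays frozen. The trainable factors A (rank r × input) and B (output × rank r) add a correction scaled by alpha / r, and B is typically initialized to zero so the adapted layer starts out identical to the base layer. The names `lora_forward`, `alpha`, and `r` are illustrative.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer (no Neuron or PEFT APIs).
# Shapes follow the usual convention: W is (out, in), A is (r, in), B is (out, r).

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    """Compute W x + (alpha / r) * B (A x)."""
    scaling = alpha / r
    base = matvec(W, x)                  # frozen base projection
    low_rank = matvec(B, matvec(A, x))   # trainable low-rank path
    return [b + scaling * d for b, d in zip(base, low_rank)]

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # out=3, in=2
A = [[1.0, 1.0]]                           # r=1
B = [[0.0], [0.0], [0.0]]                  # zero-initialized B: no change yet
x = [2.0, 3.0]
print(lora_forward(W, A, B, x, alpha=16, r=1))  # [2.0, 3.0, 5.0]

B_trained = [[1.0], [0.0], [0.0]]          # after some training steps
print(lora_forward(W, A, B_trained, x, alpha=2, r=1))  # [12.0, 3.0, 5.0]
```

Only A and B are trained, so the number of trainable parameters grows with r × (in + out) rather than in × out.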

PEFT Model Classes

NeuronPeftModel

optimum.neuron.peft.NeuronPeftModel

NeuronPeftModelForCausalLM

optimum.neuron.peft.NeuronPeftModelForCausalLM

LoRA Layer Implementations

Base LoRA Layer

optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer

Parallel Linear LoRA

optimum.neuron.peft.tuners.lora.layer.ParallelLinear

merge(safe_merge: bool = False, adapter_names: list[str] | None = None)

Source: https://github.com/huggingface/optimum-neuron/blob/vr_1097/optimum/neuron/peft/tuners/lora/layer.py#L307

Merge the active adapter weights into the base weights.

This works with distributed parallel linear layers (RowParallelLinear, ColumnParallelLinear). The merge happens on the sharded weights - each rank merges its own shard.

Parameters:

safe_merge : If True, perform merge in a copy and check for NaNs before merging.

adapter_names : List of adapter names to merge. If None, all active adapters will be merged.
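The claim that each rank can merge its own shard independently can be checked with a small pure-Python sketch. It assumes (this is not taken from the optimum-neuron source) the Megatron-style layout commonly used for tensor-parallel LoRA: for a ColumnParallelLinear base, lora_B is split along the output dimension and lora_A is replicated; for a RowParallelLinear base, lora_A is split along the input dimension and lora_B is replicated. Under that assumption, concatenating the per-rank merged shards reproduces the merge of the full weight.

```python
# Pure-Python sketch: merging LoRA into tensor-parallel shards, one shard per rank.
# Assumed layout (not confirmed by optimum-neuron): column-parallel shards lora_B by rows,
# row-parallel shards lora_A by columns; the other factor is replicated.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add_scaled(w, d, s):
    """Return w + s * d element-wise."""
    return [[wi + s * di for wi, di in zip(rw, rd)] for rw, rd in zip(w, d)]

def split_rows(m, n):
    step = len(m) // n
    return [m[i * step:(i + 1) * step] for i in range(n)]

def split_cols(m, n):
    step = len(m[0]) // n
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(n)]

def cat_rows(parts):
    return [row for p in parts for row in p]

def cat_cols(parts):
    return [sum((p[i] for p in parts), []) for i in range(len(parts[0]))]

W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0],
     [9.0, 1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0, 7.0]]           # (out=4, in=4)
A = [[1.0, 0.0, 2.0, 1.0]]           # (r=1, in=4)
B = [[1.0], [2.0], [0.0], [3.0]]     # (out=4, r=1)
scaling, tp = 0.5, 2

reference = add_scaled(W, matmul(B, A), scaling)  # merge on the unsharded weight

# Column-parallel: each rank merges its row block of W using its row block of B.
col = cat_rows([add_scaled(w, matmul(b, A), scaling)
                for w, b in zip(split_rows(W, tp), split_rows(B, tp))])

# Row-parallel: each rank merges its column block of W using its column block of A.
row = cat_cols([add_scaled(w, matmul(B, a), scaling)
                for w, a in zip(split_cols(W, tp), split_cols(A, tp))])

print(col == reference, row == reference)  # True True
```

No communication is needed during the merge itself: every rank already holds the factor slices that produce its own slice of the delta.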

unmerge

Unmerge all merged adapter layers from the base weights.

This works with distributed parallel linear layers (RowParallelLinear, ColumnParallelLinear). The unmerge happens on the sharded weights - each rank unmerges its own shard.
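Unmerging subtracts the same scaled delta that merging added, so a merge followed by an unmerge should restore the shard up to floating-point rounding. A minimal sketch on one rank's shard, with illustrative values and hypothetical helper names (`delta`, `add_delta`):

```python
# Pure-Python sketch of unmerge as the inverse of merge on a single rank's shard.
# The shard, adapter factors, and scaling are illustrative values.

def delta(B, A, scaling):
    """Return scaling * (B @ A) for small list-of-lists matrices."""
    return [[scaling * sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def add_delta(W, D, sign):
    """Return W + sign * D element-wise (sign=+1 merges, sign=-1 unmerges)."""
    return [[w + sign * d for w, d in zip(rw, rd)] for rw, rd in zip(W, D)]

shard = [[0.1, 0.2], [0.3, 0.4]]   # this rank's slice of the base weight
A = [[0.5, -1.0]]                  # (r=1, in=2)
B = [[2.0], [0.25]]                # (out=2, r=1)
D = delta(B, A, scaling=2.0)

merged = add_delta(shard, D, +1)     # merge: W + delta
restored = add_delta(merged, D, -1)  # unmerge: (W + delta) - delta

# Rounding in (w + d) - d means the round trip is exact only up to a small tolerance.
max_err = max(abs(r - s) for rr, rs in zip(restored, shard) for r, s in zip(rr, rs))
print(max_err < 1e-12)  # True
```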

GQA QKV Column Parallel LoRA

optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear

get_delta_weight(adapter: str)

Source: https://github.com/huggingface/optimum-neuron/blob/vr_1097/optimum/neuron/peft/tuners/lora/layer.py#L578

Compute the delta weights for Q, K, V for the given adapter.

Returns a dict with keys "q", "k", "v" (or "qkv" if fused) containing the delta tensors.

Parameters:

adapter : The name of the adapter for which the delta weight should be computed.

Returns:

Dict mapping "q"/"k"/"v" (or "qkv") to their delta weight tensors (sharded).
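A hypothetical sketch of how per-projection deltas could be assembled on one rank. It assumes a single lora_A shared by Q, K, and V and a separate lora_B per projection, with the fused layout stacking Q, K, then V along the output dimension; the real layer may organize its factors differently. In the toy example, K and V have fewer output rows than Q, as they do under grouped-query attention.

```python
# Hypothetical sketch of a GQA Q/K/V delta-weight computation on one rank.
# Assumptions: one lora_A shared by Q, K and V, a separate lora_B per projection,
# and a fused layout that stacks Q, K, V along the output dimension.

def matmul_scaled(B, A, s):
    """Return s * (B @ A)."""
    return [[s * sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def gqa_delta_weights(lora_A, lora_B, scaling, fused=False):
    """Return {"q", "k", "v"} deltas, or {"qkv"} when the base layer is fused."""
    deltas = {name: matmul_scaled(lora_B[name], lora_A, scaling) for name in ("q", "k", "v")}
    if fused:
        # Stack the rows: Q first, then K, then V.
        return {"qkv": deltas["q"] + deltas["k"] + deltas["v"]}
    return deltas

# Toy shard: 2 query heads and 1 KV head with head_dim=1, hidden size 2, rank 1.
lora_A = [[1.0, 2.0]]
lora_B = {"q": [[1.0], [0.5]], "k": [[2.0]], "v": [[-1.0]]}

split = gqa_delta_weights(lora_A, lora_B, scaling=1.0)
fused = gqa_delta_weights(lora_A, lora_B, scaling=1.0, fused=True)
print({k: len(v) for k, v in split.items()}, len(fused["qkv"]))  # {'q': 2, 'k': 1, 'v': 1} 4
```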

merge

Merge the active adapter weights into the base Q, K, V weights.

This works with GQAQKVColumnParallelLinear layers. The merge happens on the sharded weights - each rank merges its own shard.

Parameters:

safe_merge : If True, perform merge in a copy and check for NaNs before merging.

adapter_names : List of adapter names to merge. If None, all active adapters will be merged.
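The `safe_merge` option can be sketched as follows: the delta is added to a copy of the weight, and the original is replaced only if the result contains no NaNs. This is an illustrative pure-Python version with a hypothetical `safe_merge` function and error message, not the library's implementation.

```python
import copy
import math

def safe_merge(weight, delta):
    """Add delta to a copy of weight; return the copy only if it has no NaNs."""
    candidate = copy.deepcopy(weight)  # the caller's weight is never modified
    for row_w, row_d in zip(candidate, delta):
        for j, d in enumerate(row_d):
            row_w[j] += d
    if any(math.isnan(v) for row in candidate for v in row):
        raise ValueError("NaNs detected in the merged weights")
    return candidate

weight = [[1.0, 2.0], [3.0, 4.0]]
ok = safe_merge(weight, [[0.5, 0.5], [0.0, 0.0]])
print(ok)  # [[1.5, 2.5], [3.0, 4.0]]

rejected = False
try:
    safe_merge(weight, [[float("nan"), 0.0], [0.0, 0.0]])
except ValueError:
    rejected = True
print(rejected, weight)  # True [[1.0, 2.0], [3.0, 4.0]]
```

The cost of this check is one extra copy of the weight shard held in memory during the merge, which is why it is opt-in.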

unmerge

Unmerge all merged adapter layers from the base Q, K, V weights.

This works with GQAQKVColumnParallelLinear layers. The unmerge happens on the sharded weights - each rank unmerges its own shard.

Parallel Embedding LoRA

optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding

merge(safe_merge: bool = False, adapter_names: list[str] | None = None)

Source: https://github.com/huggingface/optimum-neuron/blob/vr_1097/optimum/neuron/peft/tuners/lora/layer.py#L847

Merge the active adapter weights into the base embedding weights.

This works with ParallelEmbedding layers. The merge happens on the sharded weights - each rank merges its own shard.

Parameters:

safe_merge : If True, perform merge in a copy and check for NaNs before merging.

adapter_names : List of adapter names to merge. If None, all active adapters will be merged.
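A sketch of merging an embedding LoRA on a sharded table. It assumes the table is split along the vocabulary dimension so each rank owns a contiguous range of token ids, and it assumes the factor shapes used by Hugging Face PEFT's embedding LoRA: lora_A of shape (r, vocab), lora_B of shape (dim, r), and delta = scaling * (B @ A).T. Whether `ParallelEmbedding` shards along the vocabulary or the embedding dimension may depend on its configuration; only the vocabulary case is shown, and the helper names are hypothetical.

```python
# Sketch: merging an embedding LoRA into a vocabulary-sharded table, one shard per rank.
# Assumptions: lora_A is (r, vocab), lora_B is (dim, r), and each rank owns a
# contiguous slice of token ids starting at `start`.

def embedding_delta(A, B, scaling, start, end):
    """Delta rows for token ids [start, end): scaling * (B @ A[:, start:end]).T."""
    r = len(A)
    return [[scaling * sum(A[k][v] * B[e][k] for k in range(r)) for e in range(len(B))]
            for v in range(start, end)]

def merge_shard(table_shard, A, B, scaling, start):
    """Add the delta rows for this rank's token range to its table shard."""
    end = start + len(table_shard)
    d = embedding_delta(A, B, scaling, start, end)
    return [[t + x for t, x in zip(rt, rd)] for rt, rd in zip(table_shard, d)]

vocab, dim, tp = 4, 2, 2
table = [[float(v), float(v)] for v in range(vocab)]  # full table (vocab, dim)
A = [[1.0, 0.0, 2.0, 0.0]]   # (r=1, vocab)
B = [[1.0], [3.0]]           # (dim, r=1)
scaling = 1.0

per_rank = vocab // tp
shards = [table[i * per_rank:(i + 1) * per_rank] for i in range(tp)]
merged = [row for i, s in enumerate(shards)
          for row in merge_shard(s, A, B, scaling, start=i * per_rank)]
full = merge_shard(table, A, B, scaling, start=0)
print(merged == full)  # True
print(merged)          # [[1.0, 3.0], [1.0, 1.0], [4.0, 8.0], [3.0, 3.0]]
```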

unmerge

Unmerge all merged adapter layers from the base embedding weights.

This works with ParallelEmbedding layers. The unmerge happens on the sharded weights - each rank unmerges its own shard.

LoRA Model

NeuronLoraModel

optimum.neuron.peft.tuners.NeuronLoraModel

Utility Functions

get_peft_model

optimum.neuron.peft.get_peft_model

Architecture Support

The Neuron LoRA implementation supports the following parallel layer types:

  • ColumnParallelLinear: For layers that split weights along the output dimension
  • RowParallelLinear: For layers that split weights along the input dimension
  • ParallelEmbedding: For embedding layers distributed across ranks
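  • GQAQKVColumnParallelLinear: For the query, key, and value projections of grouped-query attention, including tensor-parallel configurations that are hard to shard evenly (for example, fewer key/value heads than tensor-parallel ranks)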

Each layer type has a corresponding LoRA implementation that maintains the parallelization strategy while adding low-rank adaptation capabilities.
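The two linear layouts can be simulated in a single process to show why adding LoRA does not change the parallelization strategy. This sketch assumes the Megatron-style factor layout common to tensor-parallel LoRA implementations (an assumption, not confirmed by this page). Column-parallel ranks own output rows and their outputs are concatenated (an all-gather); row-parallel ranks own input columns and their partial outputs are summed (an all-reduce). Integer values keep the comparison exact.

```python
# Single-process simulation of LoRA on column- and row-parallel linear layers.
# Communication is emulated: list concatenation stands in for all-gather,
# element-wise summation stands in for all-reduce.

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def lora(W, A, B, x, s):
    """Return W x + s * B (A x)."""
    return [w + s * d for w, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

W = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 0, 1, 1], [1, 1, 1, 0]]  # (out=4, in=4)
A = [[1, 0, 1, 0], [0, 2, 0, 1]]                            # (r=2, in=4)
B = [[1, 0], [0, 1], [1, 1], [2, 0]]                        # (out=4, r=2)
x, s, tp = [1, 2, 3, 4], 2, 2
expected = lora(W, A, B, x, s)  # unsharded reference

# ColumnParallelLinear: rank i owns output rows of W and B; A and x are replicated.
h = len(W) // tp
col_out = []
for i in range(tp):
    col_out += lora(W[i * h:(i + 1) * h], A, B[i * h:(i + 1) * h], x, s)  # all-gather

# RowParallelLinear: rank i owns input columns of W and A plus its slice of x; B is replicated.
c = len(x) // tp
partials = [lora([row[i * c:(i + 1) * c] for row in W],
                 [row[i * c:(i + 1) * c] for row in A],
                 B, x[i * c:(i + 1) * c], s)
            for i in range(tp)]
row_out = [sum(vals) for vals in zip(*partials)]  # all-reduce

print(expected, col_out == expected, row_out == expected)  # [17, 35, 33, 22] True True
```

The row-parallel case works because the LoRA path is linear: summing B (A_i x_i) over ranks gives B (A x), the same as the unsharded computation.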

Key Features

  • Distributed Training: Full support for tensor parallelism and sequence parallelism
  • Checkpoint Consolidation: Automatic conversion between sharded and consolidated checkpoints
  • Weight Transformation: Seamless integration with model weight transformation specs
  • Compatibility: Works with all supported custom modeling architectures in Optimum Neuron
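Checkpoint consolidation amounts to undoing the sharding of each parameter. In this hypothetical sketch, each parameter is tagged with the dimension it was split along (0 for the output dimension, 1 for the input dimension, None for replicated), and the per-rank shards are concatenated accordingly. The parameter names and the `split_dims` mapping are illustrative and do not reflect optimum-neuron's checkpoint format.

```python
# Hypothetical sketch of consolidating sharded LoRA checkpoints into full tensors.
# split_dims maps each parameter name to the dimension it was sharded along:
# 0 = rows (output dim), 1 = columns (input dim), None = replicated on every rank.

def consolidate(rank_state_dicts, split_dims):
    full = {}
    for name, dim in split_dims.items():
        shards = [sd[name] for sd in rank_state_dicts]
        if dim is None:
            full[name] = shards[0]                         # replicated: keep one copy
        elif dim == 0:
            full[name] = [row for s in shards for row in s]  # stack row blocks
        else:
            # Join column blocks row by row.
            full[name] = [sum((s[i] for s in shards), []) for i in range(len(shards[0]))]
    return full

# Illustrative names: q_proj is column-parallel, o_proj is row-parallel.
split_dims = {"q_proj.lora_A": None, "q_proj.lora_B": 0,
              "o_proj.lora_A": 1, "o_proj.lora_B": None}
rank0 = {"q_proj.lora_A": [[1, 2]], "q_proj.lora_B": [[1], [2]],
         "o_proj.lora_A": [[5]], "o_proj.lora_B": [[7], [8]]}
rank1 = {"q_proj.lora_A": [[1, 2]], "q_proj.lora_B": [[3], [4]],
         "o_proj.lora_A": [[6]], "o_proj.lora_B": [[7], [8]]}

full = consolidate([rank0, rank1], split_dims)
print(full["q_proj.lora_B"], full["o_proj.lora_A"])  # [[1], [2], [3], [4]] [[5, 6]]
```

Converting back to a sharded checkpoint is the inverse operation: slice each tensor along its recorded dimension.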
