# LoRA for Neuron

LoRA (Low-Rank Adaptation) implementation optimized for distributed training on AWS Trainium devices. This module provides parameter-efficient fine-tuning with support for tensor parallelism and sequence parallelism.

## PEFT Model Classes

### NeuronPeftModel[[optimum.neuron.peft.NeuronPeftModel]]

#### optimum.neuron.peft.NeuronPeftModel[[optimum.neuron.peft.NeuronPeftModel]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/peft_model.py#L82)

### NeuronPeftModelForCausalLM[[optimum.neuron.peft.NeuronPeftModelForCausalLM]]

#### optimum.neuron.peft.NeuronPeftModelForCausalLM[[optimum.neuron.peft.NeuronPeftModelForCausalLM]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/peft_model.py#L463)

## LoRA Layer Implementations

### Base LoRA Layer[[optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer]]

#### optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer[[optimum.neuron.peft.tuners.lora.layer.NeuronLoraLayer]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/tuners/lora/layer.py#L73)

### Parallel Linear LoRA[[optimum.neuron.peft.tuners.lora.layer.ParallelLinear]]

#### optimum.neuron.peft.tuners.lora.layer.ParallelLinear[[optimum.neuron.peft.tuners.lora.layer.ParallelLinear]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/tuners/lora/layer.py#L224)

### GQA QKV Column Parallel LoRA[[optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear]]

#### optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear[[optimum.neuron.peft.tuners.lora.layer.GQAQKVColumnParallelLinear]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/tuners/lora/layer.py#L315)

### Parallel Embedding LoRA[[optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding]]

#### optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding[[optimum.neuron.peft.tuners.lora.layer.ParallelEmbedding]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/tuners/lora/layer.py#L488)

## LoRA Model

### NeuronLoraModel[[optimum.neuron.peft.tuners.NeuronLoraModel]]

#### optimum.neuron.peft.tuners.NeuronLoraModel[[optimum.neuron.peft.tuners.NeuronLoraModel]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/tuners/lora/model.py#L29)

## Utility Functions

### get_peft_model[[optimum.neuron.peft.get_peft_model]]

#### optimum.neuron.peft.get_peft_model[[optimum.neuron.peft.get_peft_model]]

[Source](https://github.com/huggingface/optimum-neuron/blob/v0.4.2/optimum/neuron/peft/mapping_func.py#L43)

## Architecture Support

The Neuron LoRA implementation supports the following parallel layer types:

- **ColumnParallelLinear**: For layers that split weights along the output dimension
- **RowParallelLinear**: For layers that split weights along the input dimension
- **ParallelEmbedding**: For embedding layers distributed across ranks
- **GQAQKVColumnParallelLinear**: For grouped query attention projections whose tensor parallel layout is harder to handle, for example when the key/value heads do not divide evenly across the tensor parallel ranks

Each layer type has a corresponding LoRA implementation that preserves the layer's parallelization strategy while adding the low-rank adapter weights.
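
To make the idea of "maintaining the parallelization strategy" concrete, here is a minimal, framework-free sketch of how a LoRA update can follow a column-parallel or row-parallel split. It is an illustration under stated assumptions, not the code in `layer.py`: the arrangement shown (shard `B` with the output dimension, shard `A` with the input dimension, replicate the other factor) is one common layout, and every helper name below is hypothetical.

```python
# Conceptual sketch only: plain-Python matrices, no Neuron or PEFT APIs.
# Shapes: x (batch, in), W (out, in), A (r, in), B (out, r).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def split_rows(m, parts):
    size = len(m) // parts
    return [m[k * size:(k + 1) * size] for k in range(parts)]

def split_cols(m, parts):
    size = len(m[0]) // parts
    return [[row[k * size:(k + 1) * size] for row in m] for k in range(parts)]

def lora_forward(x, w, a, b, scaling):
    """y = x W^T + scaling * (x A^T) B^T."""
    base = matmul(x, transpose(w))
    delta = matmul(matmul(x, transpose(a)), transpose(b))
    return [[u + scaling * v for u, v in zip(rb, rd)] for rb, rd in zip(base, delta)]

def column_parallel_lora(x, w, a, b, scaling, tp):
    # Assumed layout: W and B are split along the output dimension, A is replicated.
    outs = [lora_forward(x, w_k, a, b_k, scaling)
            for w_k, b_k in zip(split_rows(w, tp), split_rows(b, tp))]
    # "Gather": concatenate each rank's slice of the output features.
    return [sum((o[i] for o in outs), []) for i in range(len(x))]

def row_parallel_lora(x, w, a, b, scaling, tp):
    # Assumed layout: x, W and A are split along the input dimension, B is replicated.
    partials = [lora_forward(x_k, w_k, a_k, b, scaling)
                for x_k, w_k, a_k in zip(split_cols(x, tp), split_cols(w, tp), split_cols(a, tp))]
    # "All-reduce": sum the partial outputs from every rank.
    total = partials[0]
    for part in partials[1:]:
        total = [[u + v for u, v in zip(rt, rp)] for rt, rp in zip(total, part)]
    return total

x = [[1, 2, 3, 4]]
W = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1], [1, 3, 1, 0]]
A = [[1, 0, 1, 0], [0, 1, 0, 1]]          # rank r = 2
B = [[1, 2], [0, 1], [3, 0], [1, 1]]
scaling = 2                                # stands in for lora_alpha / r

reference = lora_forward(x, W, A, B, scaling)
col_out = column_parallel_lora(x, W, A, B, scaling, tp=2)
row_out = row_parallel_lora(x, W, A, B, scaling, tp=2)
print(col_out == reference, row_out == reference)
```

Both sharded variants reproduce the unsharded result because the LoRA update is linear: splitting along the output dimension only partitions the output features, and splitting along the input dimension only partitions the sum that each output feature accumulates.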

## Key Features

- **Distributed Training**: Full support for tensor parallelism and sequence parallelism
- **Checkpoint Consolidation**: Automatic conversion between sharded and consolidated checkpoints
- **Weight Transformation**: Seamless integration with model weight transformation specs
- **Compatibility**: Works with all supported custom modeling architectures in Optimum Neuron
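
As a rough illustration of what converting between sharded and consolidated checkpoints involves, the sketch below splits a weight into per-rank shards and rebuilds it by concatenating along the same axis. The `shard` and `consolidate` helpers and the `PARTITION_AXIS` mapping are hypothetical names for this example, and the partition axes are an assumption for weights stored as `(out, in)`; the library's own consolidation logic is not reproduced here.

```python
# Hypothetical helpers for illustration; not the Optimum Neuron API.

def shard(weight, tp_size, axis):
    """Split a 2-D weight (list of rows) into tp_size equal shards along axis 0 or 1."""
    if axis == 0:
        size = len(weight) // tp_size
        return [weight[k * size:(k + 1) * size] for k in range(tp_size)]
    size = len(weight[0]) // tp_size
    return [[row[k * size:(k + 1) * size] for row in weight] for k in range(tp_size)]

def consolidate(shards, axis):
    """Rebuild the full weight by concatenating shards along the same axis."""
    if axis == 0:
        return [row for s in shards for row in s]
    return [sum((s[i] for s in shards), []) for i in range(len(shards[0]))]

# Assumed layout: column-parallel weights are split on the output axis (0),
# row-parallel weights on the input axis (1).
PARTITION_AXIS = {"column": 0, "row": 1}

full = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]

round_trips = {}
for kind, axis in PARTITION_AXIS.items():
    shards = shard(full, tp_size=2, axis=axis)
    round_trips[kind] = consolidate(shards, axis) == full

print(round_trips)
```

The key point is that consolidation must know which axis each parameter was partitioned on; concatenating a row-parallel weight along the wrong axis yields a tensor with the wrong shape rather than the original weight.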