Text Classification · Transformers

Tags: lora, fine-tuning, adaptive, research, nested-lora, synaptic-plasticity, rank-adaptation
Instructions to use Simo76/Unified-LoRA with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Simo76/Unified-LoRA with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="Simo76/Unified-LoRA")

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Simo76/Unified-LoRA", dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
https://github.com/Sva76/Unified-LoRa
#1 — opened by Simo76
- .gitattributes +35 -0
- .gitignore +0 -56
- LICENSE +0 -201
- README.md +3 -76
- controller.py +0 -41
- docs/architecture.md +0 -171
- docs/experimental_results.md +0 -239
- experiments/stable_task_test.py +0 -226
- experiments/stress_test_task_switch.py +0 -214
- nested_lora.py +0 -130
- notebooks/mrpc_example.ipynb +0 -165
- orbital_controller.py +0 -291
- requirements.txt +0 -6
- unified_lora.py +0 -14
.gitattributes
ADDED
```diff
@@ -0,0 +1,35 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
```
.gitignore
DELETED
```diff
@@ -1,56 +0,0 @@
-# ── PYTHON ─────────────────────────────────────────
-__pycache__/
-*.py[cod]
-*.so
-
-# ── ENV ────────────────────────────────────────────
-.env
-.venv/
-venv/
-env/
-
-# ── BUILD / DIST ───────────────────────────────────
-build/
-dist/
-*.egg-info/
-
-# ── NOTEBOOK ───────────────────────────────────────
-.ipynb_checkpoints
-
-# ── CACHE / TEST ───────────────────────────────────
-.pytest_cache/
-.mypy_cache/
-.coverage*
-htmlcov/
-
-# ── LOGS ───────────────────────────────────────────
-*.log
-
-# ── MODELS / CHECKPOINTS (CRITICAL) ────────────────
-*.pt
-*.bin
-*.ckpt
-*.safetensors
-
-# ── DATASETS ───────────────────────────────────────
-data/
-datasets/
-*.parquet
-*.csv
-
-# ── HF CACHE ───────────────────────────────────────
-.cache/
-huggingface/
-hf_cache/
-
-# ── EXPERIMENT OUTPUT ──────────────────────────────
-outputs/
-runs/
-wandb/
-
-# ── SYSTEM ─────────────────────────────────────────
-.DS_Store
-
-# ── EDITOR ─────────────────────────────────────────
-.vscode/
-.idea/
```
LICENSE
DELETED
```diff
@@ -1,201 +0,0 @@
-                              Apache License
-                        Version 2.0, January 2004
-                     http://www.apache.org/licenses/
```

(The deleted file was the standard, unmodified Apache License 2.0 text: Sections 1 through 9 plus the appendix boilerplate, removed in full.)
README.md
CHANGED
````diff
@@ -1,76 +1,3 @@
----
-license: apache-2.0
-tags:
-- lora
-- fine-tuning
-- adaptive
-- research
-- nested-lora
-- synaptic-plasticity
-- rank-adaptation
-library_name: transformers
-datasets:
-- nyu-mll/glue
-pipeline_tag: text-classification
----
-
-# Unified-LoRA
-
-**LoRA fine-tuning with synaptic plasticity: a neurobiologically-inspired controller that switches between qualitatively different operational modes based on training stress.**
-
-⚠️ **This is NOT a pretrained model.** Unified-LoRA is a training method/controller.
-
-👉 **Code**: [github.com/Sva76/Unified-LoRa](https://github.com/Sva76/Unified-LoRa)
-👉 **Demo**: [unified_lora_demo.ipynb](https://github.com/Sva76/Unified-LoRa/blob/main/notebooks/unified_lora_demo.ipynb)
-
-## What It Does
-
-A composite synaptic stress signal **φ(t) = f(Convergence, Entropy, Stress)** drives a 3-state FSM:
-
-| Mode | φ range | Rank | Behavior |
-|------|---------|------|----------|
-| SINGLE | φ < 0.3 | r=4 | Efficient cruise |
-| MULTI | 0.3 ≤ φ < 0.7 | r=8 | Active learning |
-| MIRROR | φ ≥ 0.7 | r=16 | Max capacity + weight snapshot for rollback |
-
-Rank transitions use **nested matrix slicing** (r4 ⊂ r8 ⊂ r16) — zero cold-start, zero re-allocation.
-
-Mirror mode saves a weight snapshot on entry. On exit, if weights drifted <5% (transient noise), the snapshot is restored. If drift was significant (real signal), the new weights are kept.
-
-## Results
-
-**GLUE (DistilBERT):** 3/4 tasks equal or better with 33–56% rank reduction.
-
-**Noise resilience:** +31 F1 at 50% label noise, 9× lower variance. No benefit on clean data. Confirmed at 67M–3B.
-
-**Stress-recovery cycle (Tinker/Llama-3.2-1B):** φ returns to pre-shock baseline (0.33 → 0.83 → 0.33), demonstrating fully reversible stress handling.
-
-## Quick Start
-
-```python
-from controller import setup_unified_lora
-
-adapters, ctrl = setup_unified_lora(model, target_modules=["q_proj", "v_proj"])
-
-for batch in dataloader:
-    loss = model(**batch).loss
-    loss.backward()
-    ctrl.step(loss=loss.item())  # φ(t) needs the loss for convergence signal
-    optimizer.step()
-    optimizer.zero_grad()
-```
-
-## Citation
-
-```bibtex
-@software{unified_lora_2025,
-  author = {Simona Vargiu},
-  title  = {Unified-LoRA: Synaptic Plasticity Controller for Adaptive LoRA Fine-Tuning},
-  year   = {2025},
-  url    = {https://github.com/Sva76/Unified-LoRa}
-}
-```
-
-## Contact
-
-Simona Vargiu (Independent Researcher) — simona.vargiu.malta@gmail.com
+---
+license: apache-2.0
+---
````
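The old README's mode table maps the stress signal φ to a rank via a 3-state FSM. A minimal sketch of that mapping, using the 0.3 and 0.7 thresholds from the table (the function name is illustrative, not the repo's API):

```python
def mode_for_phi(phi: float) -> tuple[str, int]:
    """Map a stress value phi to (mode, rank) per the old README's table:
    SINGLE (r=4) below 0.3, MULTI (r=8) up to 0.7, MIRROR (r=16) above."""
    if phi < 0.3:
        return ("SINGLE", 4)
    if phi < 0.7:
        return ("MULTI", 8)
    return ("MIRROR", 16)
```

Because r4 ⊂ r8 ⊂ r16 via nested slicing, any of these transitions reuses the already-trained weights.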
controller.py
DELETED
```diff
@@ -1,41 +0,0 @@
-"""
-Unified-LoRA Controller
-======================
-
-Convenience wrapper that exposes the full Unified-LoRA stack:
-
-- nested_lora.py        → execution engine (LoRA with dynamic rank slicing)
-- orbital_controller.py → control logic (stress-driven rank adaptation)
-
-Use this module for simple integration, or import submodules directly
-for fine-grained control.
-
-Author: Simona Vargiu
-License: Apache 2.0
-"""
-
-# ── ENGINE ──────────────────────────────────────────
-from nested_lora import (
-    NestedLoRALinear,
-    inject_nested_lora,
-    set_rank,
-    get_lora_params,
-    count_params,
-)
-
-# ── CONTROLLER ──────────────────────────────────────
-from orbital_controller import (
-    OrbitalController,
-    setup_unified_lora,
-)
-
-# ── EXPORT ──────────────────────────────────────────
-__all__ = [
-    "NestedLoRALinear",
-    "inject_nested_lora",
-    "set_rank",
-    "get_lora_params",
-    "count_params",
-    "OrbitalController",
-    "setup_unified_lora",
-]
```
docs/architecture.md
DELETED
@@ -1,171 +0,0 @@

```markdown
# Architecture — Nested Orbital LoRA

Core idea: dynamic rank control via stress-driven orbital transitions with weight persistence (no cold start).

## Problem: cold start on rank transitions

Standard multi-rank LoRA keeps separate adapters per rank:

    r=4, r=8, r=16 → independent weights

Switching rank causes partial cold restarts → performance drop.

## Solution: Nested LoRA (one adapter, multiple ranks)

Single adapter at max rank:

    A(16, d), B(d, 16)

Active rank is obtained by slicing:

    r=4  → A[:4, :], B[:, :4]
    r=8  → A[:8, :], B[:, :8]
    r=16 → full matrix

    r4 ⊂ r8 ⊂ r16

Lower ranks reuse trained weights → no cold start.

## Scaling

To keep output magnitude consistent:

    scale = max_rank / max(r, 1)
    scale = min(scale, 4.0)  # optional clamp

## Orbital Controller (no thresholds)

Dynamic trajectory instead of static FSM:

- Ascend  → stress detected → increase rank
- Hold    → oscillation → stay
- Descend → stable → decrease rank

Uses a stack to ensure symmetric return.

## Stress signal

    φ(t) = |loss − EMA(loss)| + 2.0 × max(0, loss − prev_loss)

Auto-calibrated thresholds:

    t_stress = μ + 0.7σ
    t_stable = max(μ − 0.3σ, 0)

Robust stats can be used to reduce noise.

## Why it matters

- avoids cold starts across rank changes
- adapts capacity in real-time
- works in black-box settings
- O(1) overhead

## Comparison

| Property        | Standard LoRA | AdaLoRA | Orbital LoRA |
|-----------------|---------------|---------|--------------|
| Rank control    | Fixed         | SVD     | Stress       |
| Control type    | None          | Open    | Closed-loop  |
| Transition cost | N/A           | High    | O(1)         |
| Architecture    | Single        | Pruned  | Nested       |
| Black-box       | Yes           | No      | Yes          |
```
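The nested-slicing and rescaling rules from the deleted architecture doc (A[:r, :], B[:, :r], scale = min(max_rank / r, 4)) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the repo's `NestedLoRALinear` API; the class name and shapes are assumptions:

```python
import numpy as np

class NestedLoRASketch:
    """One adapter pair allocated at max_rank; lower ranks are slices
    of the same weights, so changing rank is O(1) with no re-allocation."""

    def __init__(self, d_in: int, d_out: int, max_rank: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.max_rank = max_rank
        self.A = rng.standard_normal((max_rank, d_in)) * 0.01  # down-projection
        self.B = np.zeros((d_out, max_rank))                   # up-projection, zero-init
        self.r = max_rank                                      # active rank

    def set_rank(self, r: int) -> None:
        # O(1) transition: only the active slice width changes;
        # the weights trained at higher ranks are reused (no cold start).
        self.r = r

    def delta(self, x: np.ndarray) -> np.ndarray:
        # LoRA update B[:, :r] @ A[:r, :] @ x, rescaled to keep output
        # magnitude consistent across ranks (clamped at 4.0).
        scale = min(self.max_rank / max(self.r, 1), 4.0)
        return scale * (self.B[:, :self.r] @ (self.A[:self.r, :] @ x))
```

Slicing at r=4 reuses exactly the first 4 rank components trained at r=16, which is the "no cold start" property the doc describes.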
docs/experimental_results.md
DELETED
|
@@ -1,239 +0,0 @@
|
|
| 1 |
-
Experimental Results
|
| 2 |
-
|
| 3 |
-
|
| 4 |
-
Core result: parity with baseline performance with ~15% rank reduction and dynamic shock response.
|
| 5 |
-
|
| 6 |
-
|
| 7 |
-
|
| 8 |
-
1. Stress Test — Task Switch
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
Setup
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
Model: DistilBERT-base-uncased + NestedLoRALinear (max_rank=16)
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
Protocol: MRPC x 60 steps → SST-2 x 60 steps (shock at step 60)
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
Seeds: 0, 1, 2
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
Baseline: same architecture, fixed rank=16
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
Hardware: Colab T4
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
Results
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
Baseline (r=16)
|
| 40 |
-
Orbital LoRA
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
SST-2 Accuracy
|
| 46 |
-
0.736
|
| 47 |
-
0.740
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
MRPC F1 (retention)
|
| 51 |
-
0.526
|
| 52 |
-
0.515
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
Effective rank
|
| 56 |
-
16.0
|
| 57 |
-
13.6
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
Parity with ~15% rank saving

**Behavior**

Post-shock:

- detect → descend (r16 → r4)
- stabilize
- re-ascend (r4 → r16)

Baseline: no reaction (fixed r=16)

## 2. Stable Task — Parity

**Setup**

- Task: MRPC only (120 steps)
- Seeds: 0, 1, 2
- Baseline: fixed r=16

**Results**

| Seed | Baseline F1 | Orbital F1 |
|------|-------------|------------|
| 0    | 0.806       | 0.808      |
| 1    | 0.822       | 0.826      |
| 2    | 0.824       | 0.824      |
| Mean | 0.818       | 0.820      |

No degradation on stable training.

## 3. Rank Dynamics (Black-box — Tinker)

**Methods**

| Method        | Control     |
|---------------|-------------|
| Standard LoRA | Fixed rank  |
| AdaLoRA-like  | Open-loop   |
| Orbital LoRA  | Closed-loop |

**Disturbance response**

| Method       | Reaction  | Stability | Recovery  |
|--------------|-----------|-----------|-----------|
| Standard     | None      | Passive   | —         |
| AdaLoRA-like | Indirect  | Partial   | Limited   |
| Orbital LoRA | Immediate | Stable    | Immediate |

## 4. Architecture Insight

Root cause: cold start from separate adapters.

Fix: nested slicing → no cold start → parity restored.

## 5. Black-box compatibility

- Uses only the loss signal.
- No gradients required.
- O(1) overhead.

## Next

- 7B+ validation (ongoing)
- LR controller integration
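The detect → descend → stabilize → re-ascend cycle above can be sketched as a toy closed-loop controller driven only by the scalar training loss. This is an illustrative sketch, not the repository's OrbitalController: the `ToyRankController` name, the moving-average window, and the `shock_ratio` threshold are all hypothetical.

```python
# Toy closed-loop rank controller (hypothetical, for illustration only).
# A loss spike relative to a short moving average triggers a descent to the
# lowest nested rank; sustained stability re-ascends one level per step.

class ToyRankController:
    def __init__(self, ranks=(4, 8, 16), window=4, shock_ratio=1.5):
        self.ranks = list(ranks)        # nested rank levels, low to high
        self.idx = len(ranks) - 1       # start at max rank
        self.window = window            # moving-average window over the loss
        self.shock_ratio = shock_ratio  # loss jump that counts as a shock
        self.history = []

    def step(self, loss):
        self.history.append(loss)
        recent = self.history[-self.window:]
        avg = sum(recent) / len(recent)
        if loss > self.shock_ratio * avg and self.idx > 0:
            self.idx = 0                # shock detected: drop to lowest orbital
        elif loss <= avg and self.idx < len(self.ranks) - 1:
            self.idx += 1               # stable again: re-ascend one level
        return self.ranks[self.idx]
```

Feeding it a steady loss, then a spike, then recovery reproduces the r16 → r4 → r8 → r16 trajectory described above, using nothing but the loss value per step.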
experiments/stable_task_test.py
DELETED
@@ -1,226 +0,0 @@

```python
"""
Orbital LoRA — Stable Task Parity Test

MRPC only, 120 steps, 3 seeds.
Validates that the controller causes zero degradation on stable training.

Usage:
    pip install transformers datasets evaluate
    python stable_task_test.py
"""

import time, random, math, numpy as np, torch, torch.nn as nn
import torch.nn.functional as F, evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.utils.data import DataLoader

import sys, os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from nested_lora import NestedLoRALinear, inject_nested_lora
from orbital_controller import OrbitalController
from controller import set_rank

# ── CONFIG ──────────────────────────────────────────

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL = "distilbert-base-uncased"
BATCH = 8
STEPS = 120
LR = 5e-5
SEEDS = [0, 1, 2]

MAX_RANK = 16
WARMUP = 15
STABLE_WINDOW = 8

# ── DATA ────────────────────────────────────────────

print("Loading data...")
tok = AutoTokenizer.from_pretrained(MODEL)
ds = load_dataset("glue", "mrpc")

def tok_fn(x):
    return tok(x["sentence1"], x["sentence2"],
               truncation=True, padding="max_length", max_length=128)

ds = ds.map(tok_fn, batched=True)
ds.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
train_loader = DataLoader(ds["train"], batch_size=BATCH, shuffle=True)
val_loader = DataLoader(ds["validation"], batch_size=BATCH)
metric = evaluate.load("glue", "mrpc")

# ── HELPERS ─────────────────────────────────────────

def build_model():
    base = AutoModelForSequenceClassification.from_pretrained(
        MODEL, num_labels=2, ignore_mismatched_sizes=True
    )
    return inject_nested_lora(base, MAX_RANK).to(DEVICE)

def eval_model(model):
    model.eval()
    preds, labels = [], []
    with torch.no_grad():
        for batch in val_loader:
            x = batch["input_ids"].to(DEVICE)
            m = batch["attention_mask"].to(DEVICE)
            y = batch["label"].to(DEVICE)
            logits = model(input_ids=x, attention_mask=m).logits
            preds.extend(logits.argmax(dim=-1).cpu().numpy())
            labels.extend(y.cpu().numpy())
    return metric.compute(predictions=preds, references=labels)["f1"]

def eff_rank(usage):
    tot = sum(usage.values())
    return sum(k * v for k, v in usage.items()) / tot if tot > 0 else 0

# ── TRAIN BASELINE ──────────────────────────────────

def train_baseline(model):
    opt = torch.optim.AdamW(model.parameters(), lr=LR)
    set_rank(model, 16)
    it = iter(train_loader)

    for step in range(STEPS):
        try:
            batch = next(it)
        except StopIteration:
            it = iter(train_loader); batch = next(it)

        x = batch["input_ids"].to(DEVICE)
        m = batch["attention_mask"].to(DEVICE)
        y = batch["label"].to(DEVICE)

        loss = model(input_ids=x, attention_mask=m, labels=y).loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        opt.zero_grad()

    return model

# ── TRAIN ORBITAL ───────────────────────────────────

def train_orbital(model):
    ctrl = OrbitalController(warmup=WARMUP, stable_window=STABLE_WINDOW)
    opt = torch.optim.AdamW(model.parameters(), lr=LR)
    usage = {4: 0, 8: 0, 16: 0}
    rank_trace = []
    it = iter(train_loader)

    for step in range(STEPS):
        try:
            batch = next(it)
        except StopIteration:
            it = iter(train_loader); batch = next(it)

        x = batch["input_ids"].to(DEVICE)
        m = batch["attention_mask"].to(DEVICE)
        y = batch["label"].to(DEVICE)

        loss = model(input_ids=x, attention_mask=m, labels=y).loss
        loss.backward()

        new_rank = ctrl.step(loss.item())
        new_rank = max(4, min(16, new_rank))
        set_rank(model, new_rank)

        usage[new_rank] += 1
        rank_trace.append(new_rank)

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        opt.zero_grad()

    return model, usage, rank_trace, ctrl

# ── RUN ─────────────────────────────────────────────

print(f"\nDevice: {DEVICE}")
print(f"Task: MRPC, {STEPS} steps")
print("=" * 55)

results = []

for seed in SEEDS:
    print(f"\n{'─' * 50}\n SEED {seed}\n{'─' * 50}")

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

    base_model = build_model()
    base_model = train_baseline(base_model)
    f1_base = eval_model(base_model)
    del base_model; torch.cuda.empty_cache()

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

    uni_model = build_model()
    uni_model, usage, trace, ctrl = train_orbital(uni_model)
    f1_uni = eval_model(uni_model)

    er = eff_rank(usage)
    saving = 1 - er / 16
    transitions = sum(1 for i in range(1, len(trace)) if trace[i] != trace[i-1])

    print(f"\n  BASELINE F1 = {f1_base:.3f}  (rank=16 fixed)")
    print(f"  ORBITAL  F1 = {f1_uni:.3f}  (eff_rank={er:.1f}, saving={saving*100:.0f}%)")
    print(f"  delta F1    = {f1_uni - f1_base:+.3f}")
    print(f"  Usage: r4={usage[4]} r8={usage[8]} r16={usage[16]} transitions={transitions}")

    results.append({
        'seed': seed, 'f1_base': f1_base, 'f1_uni': f1_uni,
        'delta': f1_uni - f1_base, 'eff_rank': er,
    })
    del uni_model; torch.cuda.empty_cache()

# ── SUMMARY ─────────────────────────────────────────

print(f"\n{'=' * 55}\n SUMMARY\n{'=' * 55}")
f1b = [r['f1_base'] for r in results]
f1u = [r['f1_uni'] for r in results]

print(f"\n  Baseline F1: {np.mean(f1b):.3f} +/- {np.std(f1b):.3f}")
print(f"  Orbital  F1: {np.mean(f1u):.3f} +/- {np.std(f1u):.3f}")
print(f"  delta F1:    {np.mean([r['delta'] for r in results]):+.3f}")
```
experiments/stress_test_task_switch.py
DELETED
@@ -1,214 +0,0 @@

```python
"""
Orbital LoRA — Stress Test: Task Switch

MRPC (60 steps) → SST-2 (60 steps)
Baseline (r=16 fixed) vs Orbital Controller
"""

import time, random, math, numpy as np, torch, torch.nn as nn
import torch.nn.functional as F, evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.utils.data import DataLoader

import sys, os
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from nested_lora import NestedLoRALinear, inject_nested_lora
from orbital_controller import OrbitalController
from controller import set_rank

# ── CONFIG ──────────────────────────────────────────

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
MODEL = "distilbert-base-uncased"
BATCH = 8
LR = 5e-5
SEEDS = [0, 1, 2]

MAX_RANK = 16
WARMUP = 10
STABLE_WINDOW = 6

STEPS_TASK1 = 60
STEPS_TASK2 = 60
TOTAL_STEPS = STEPS_TASK1 + STEPS_TASK2

# ── DATA ────────────────────────────────────────────

print("Loading data...")
tok = AutoTokenizer.from_pretrained(MODEL)

ds_mrpc = load_dataset("glue", "mrpc")
def tok_mrpc(x):
    return tok(x["sentence1"], x["sentence2"],
               truncation=True, padding="max_length", max_length=128)
ds_mrpc = ds_mrpc.map(tok_mrpc, batched=True)
ds_mrpc.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
train_mrpc = DataLoader(ds_mrpc["train"], batch_size=BATCH, shuffle=True)
val_mrpc = DataLoader(ds_mrpc["validation"], batch_size=BATCH)

ds_sst2 = load_dataset("glue", "sst2")
def tok_sst2(x):
    return tok(x["sentence"], truncation=True, padding="max_length", max_length=128)
ds_sst2 = ds_sst2.map(tok_sst2, batched=True)
ds_sst2.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
train_sst2 = DataLoader(ds_sst2["train"], batch_size=BATCH, shuffle=True)
val_sst2 = DataLoader(ds_sst2["validation"], batch_size=BATCH)

metric_mrpc = evaluate.load("glue", "mrpc")
metric_sst2 = evaluate.load("glue", "sst2")

# ── HELPERS ─────────────────────────────────────────

def make_iter(loader):
    while True:
        for batch in loader:
            yield batch

def get_batch(it):
    batch = next(it)
    return (batch["input_ids"].to(DEVICE),
            batch["attention_mask"].to(DEVICE),
            batch["label"].to(DEVICE))

def build_model():
    base = AutoModelForSequenceClassification.from_pretrained(
        MODEL, num_labels=2, ignore_mismatched_sizes=True
    )
    return inject_nested_lora(base, MAX_RANK).to(DEVICE)

def eval_f1(model, loader, metric_fn):
    model.eval()
    preds, labels = [], []
    with torch.no_grad():
        for batch in loader:
            x = batch["input_ids"].to(DEVICE)
            m = batch["attention_mask"].to(DEVICE)
            y = batch["label"].to(DEVICE)
            logits = model(input_ids=x, attention_mask=m).logits
            preds.extend(logits.argmax(dim=-1).cpu().numpy())
            labels.extend(y.cpu().numpy())
    model.train()
    result = metric_fn.compute(predictions=preds, references=labels)
    return result.get("f1", result.get("accuracy", 0.0))

def eff_rank(usage):
    tot = sum(usage.values())
    return sum(k * v for k, v in usage.items()) / tot if tot > 0 else 0

# ── TRAIN BASELINE ──────────────────────────────────

def train_baseline(model):
    opt = torch.optim.AdamW(model.parameters(), lr=LR)
    set_rank(model, 16)
    it_mrpc = make_iter(train_mrpc)
    it_sst2 = make_iter(train_sst2)

    for step in range(TOTAL_STEPS):
        x, m, y = get_batch(it_mrpc if step < STEPS_TASK1 else it_sst2)

        loss = model(input_ids=x, attention_mask=m, labels=y).loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        opt.zero_grad()

    return model

# ── TRAIN ORBITAL ───────────────────────────────────

def train_orbital(model):
    ctrl = OrbitalController(warmup=WARMUP, stable_window=STABLE_WINDOW)
    ctrl.rank = 4
    set_rank(model, 4)

    opt = torch.optim.AdamW(model.parameters(), lr=LR)
    usage = {4: 0, 8: 0, 16: 0}
    rank_trace = []
    it_mrpc = make_iter(train_mrpc)
    it_sst2 = make_iter(train_sst2)

    for step in range(TOTAL_STEPS):
        x, m, y = get_batch(it_mrpc if step < STEPS_TASK1 else it_sst2)

        loss = model(input_ids=x, attention_mask=m, labels=y).loss
        loss.backward()

        new_rank = ctrl.step(loss.item())
        new_rank = max(4, min(16, new_rank))
        set_rank(model, new_rank)

        usage[new_rank] += 1
        rank_trace.append(new_rank)

        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        opt.step()
        opt.zero_grad()

    return model, usage, rank_trace

# ── RUN ─────────────────────────────────────────────

print(f"\nDevice: {DEVICE}")
print(f"Plan: MRPC × {STEPS_TASK1} → SST-2 × {STEPS_TASK2}")
print(f"Shock at step {STEPS_TASK1}")
print("=" * 55)

results = []

for seed in SEEDS:
    print(f"\n{'─' * 55}\n SEED {seed}\n{'─' * 55}")

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

    base_model = build_model()
    base_model = train_baseline(base_model)
    f1_mrpc_base = eval_f1(base_model, val_mrpc, metric_mrpc)
    f1_sst2_base = eval_f1(base_model, val_sst2, metric_sst2)
    del base_model; torch.cuda.empty_cache()

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)

    uni_model = build_model()
    uni_model, usage, rank_trace = train_orbital(uni_model)
    f1_mrpc_uni = eval_f1(uni_model, val_mrpc, metric_mrpc)
    f1_sst2_uni = eval_f1(uni_model, val_sst2, metric_sst2)

    er = eff_rank(usage)
    saving = 1 - er / 16
    transitions = sum(1 for i in range(1, len(rank_trace)) if rank_trace[i] != rank_trace[i-1])

    print(f"\n  {'':30s} {'BASELINE':>10s} {'ORBITAL':>10s}")
    print(f"  {'─' * 55}")
    print(f"  {'MRPC F1 (retention)':30s} {f1_mrpc_base:10.3f} {f1_mrpc_uni:10.3f}")
    print(f"  {'SST-2 Acc (new task)':30s} {f1_sst2_base:10.3f} {f1_sst2_uni:10.3f}")
    print(f"\n  Orbital: eff_rank={er:.1f} saving={saving*100:.0f}% transitions={transitions}")

    results.append({
        'f1_mrpc_base': f1_mrpc_base, 'f1_sst2_base': f1_sst2_base,
        'f1_mrpc_uni': f1_mrpc_uni, 'f1_sst2_uni': f1_sst2_uni,
        'eff_rank': er, 'saving': saving
    })
    del uni_model; torch.cuda.empty_cache()

# ── SUMMARY ─────────────────────────────────────────

print(f"\n{'=' * 55}\n SUMMARY\n{'=' * 55}")
mrpc_b = np.mean([r['f1_mrpc_base'] for r in results])
mrpc_u = np.mean([r['f1_mrpc_uni'] for r in results])
sst2_b = np.mean([r['f1_sst2_base'] for r in results])
sst2_u = np.mean([r['f1_sst2_uni'] for r in results])
er_avg = np.mean([r['eff_rank'] for r in results])
sv_avg = np.mean([r['saving'] for r in results])

print(f"\n  {'MRPC F1':20s} {mrpc_b:.3f} → {mrpc_u:.3f}")
print(f"  {'SST-2 Acc':20s} {sst2_b:.3f} → {sst2_u:.3f}")
print(f"  {'Eff rank':20s} 16.0 → {er_avg:.1f}")
print(f"  {'Saving':20s} 0% → {sv_avg*100:.0f}%")
```
nested_lora.py
DELETED
@@ -1,130 +0,0 @@

```python
"""
Nested LoRA — One Particle, Multiple Orbitals
===============================================

Single LoRA adapter pair with dynamic rank via slicing.
r4 ⊂ r8 ⊂ r16 — descending pauses dimensions, ascending resumes them.
Zero cold start on transitions.

This module is the "engine" — pure architecture, no control logic.
Pair with OrbitalController for adaptive rank decisions.

Author: Simona Vargiu
License: Apache 2.0
"""

import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from typing import List


class NestedLoRALinear(nn.Module):
    """
    Single LoRA adapter with dynamic rank via slicing.

    A single pair of matrices A(max_rank, in) and B(out, max_rank) is shared
    across all rank levels. The active rank is controlled by slicing:

        r=4  → A[:4, :],  B[:, :4]
        r=8  → A[:8, :],  B[:, :8]
        r=16 → A[:16, :], B[:, :16]

    When descending from r=16 to r=4, dimensions 0-3 retain all learned
    weights. Dimensions 4-15 are paused (no gradient), not destroyed.
    When ascending back, they resume exactly where they left off.

    Output is scaled by max_rank/active_rank to maintain consistent
    magnitude across rank changes (analogous to alpha/r in standard LoRA).

    Args:
        linear: Original nn.Linear layer to wrap
        max_rank: Maximum LoRA rank (default: 16)

    Example:
        >>> layer = NestedLoRALinear(original_linear, max_rank=16)
        >>> layer.set_rank(4)   # use 4 dimensions
        >>> out = layer(x)      # forward with r=4
        >>> layer.set_rank(16)  # expand to full rank
        >>> out = layer(x)      # forward with r=16, dimensions 0-3 unchanged
    """

    def __init__(self, linear: nn.Linear, max_rank: int = 16):
        super().__init__()
        self.linear = linear
        self.max_rank = max_rank
        self.active_rank = max_rank

        # Freeze original weights
        for p in self.linear.parameters():
            p.requires_grad = False

        # One particle: single A and B
        self.lora_A = nn.Parameter(torch.empty(max_rank, linear.in_features))
        self.lora_B = nn.Parameter(torch.zeros(linear.out_features, max_rank))

        # Standard LoRA init: A = kaiming, B = zeros → initial delta = 0
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))

    def set_rank(self, r: int):
        """Set the active orbital. Must be <= max_rank."""
        self.active_rank = min(r, self.max_rank)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = self.linear(x)
        r = self.active_rank

        h = F.linear(x, self.lora_A[:r, :])
        delta = F.linear(h, self.lora_B[:, :r])

        scale = self.max_rank / r
        return base + delta * scale


def inject_nested_lora(model: nn.Module, max_rank: int = 16) -> nn.Module:
    """
    Replace attention Linear layers with NestedLoRALinear.

    Targets any nn.Linear whose full name contains "attention".
    Original weights are frozen; only LoRA parameters are trainable.

    Args:
        model: PyTorch model
        max_rank: Maximum LoRA rank

    Returns:
        Model with NestedLoRA injected
    """
    for name, module in list(model.named_modules()):
        if isinstance(module, nn.Linear) and "attention" in name:
            parent = model
            *path, last = name.split(".")
            for p in path:
                parent = getattr(parent, p)
            setattr(parent, last, NestedLoRALinear(module, max_rank))
    return model


def set_rank(model: nn.Module, r: int):
    """Set active rank on all NestedLoRALinear modules in the model."""
    for m in model.modules():
        if isinstance(m, NestedLoRALinear):
            m.set_rank(r)


def get_lora_params(model: nn.Module) -> List[nn.Parameter]:
    """Get all LoRA parameters (for optimizer setup)."""
    params = []
    for m in model.modules():
        if isinstance(m, NestedLoRALinear):
            params.extend([m.lora_A, m.lora_B])
    return params


def count_params(model: nn.Module) -> dict:
    """Count total, trainable, and LoRA parameters."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    lora = sum(p.numel() for p in get_lora_params(model))
    return {"total": total, "trainable": trainable, "lora": lora}
```
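The nesting invariant described in the docstring, that the r=4 update is an exact sub-computation of the r=16 update, can be sanity-checked numerically. Below is a minimal standalone sketch with NumPy (the max_rank/active_rank scaling is omitted for clarity; the shapes mirror `lora_A` and `lora_B` above, with hypothetical feature sizes):

```python
import numpy as np

# Nested slicing: the r=4 delta is exactly the first 4 rank-1 terms of the
# r=16 delta, so descending pauses dimensions 4-15 rather than discarding them.
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 32))   # lora_A: (max_rank, in_features)
B = rng.normal(size=(64, 16))   # lora_B: (out_features, max_rank)

delta_r16 = B @ A                # full-rank update
delta_r4 = B[:, :4] @ A[:4, :]   # sliced r=4 update
paused = B[:, 4:] @ A[4:, :]     # contribution of the paused dims 4-15

# full update = active slice + paused slice, term by term: ascending back
# to r=16 restores exactly the update that was paused, with no cold start
assert np.allclose(delta_r16, delta_r4 + paused)
```

This decomposition is why no re-initialization is needed on rank transitions: the paused dimensions' weights are untouched while inactive.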
notebooks/mrpc_example.ipynb
DELETED
@@ -1,165 +0,0 @@

```json
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Orbital LoRA - MRPC Benchmark Example\n",
    "\n",
    "**Expected:** performance parity with baseline + adaptive behavior\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "!pip install -q transformers datasets evaluate scikit-learn accelerate"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "import torch\n",
    "from datasets import load_dataset\n",
    "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
    "from torch.utils.data import DataLoader\n",
    "import evaluate\n",
    "\n",
    "import sys\n",
    "sys.path.append('..')\n",
    "\n",
    "from nested_lora import inject_nested_lora\n",
    "from orbital_controller import OrbitalController\n",
    "from controller import set_rank\n",
    "\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "print(device)"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "dataset = load_dataset('glue','mrpc')\n",
    "tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')\n",
    "\n",
    "def tok(x):\n",
    "    return tokenizer(x['sentence1'], x['sentence2'], truncation=True, padding='max_length', max_length=128)\n",
    "\n",
    "train = dataset['train'].map(tok, batched=True)\n",
    "val = dataset['validation'].map(tok, batched=True)\n",
    "\n",
    "train.set_format(type='torch', columns=['input_ids','attention_mask','label'])\n",
    "val.set_format(type='torch', columns=['input_ids','attention_mask','label'])\n",
    "\n",
    "train_loader = DataLoader(train, batch_size=16, shuffle=True)\n",
    "val_loader = DataLoader(val, batch_size=16)\n",
    "\n",
    "metric = evaluate.load('glue','mrpc')"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "def eval_model(model):\n",
    "    model.eval()\n",
    "    preds, labels = [], []\n",
    "    with torch.no_grad():\n",
    "        for b in val_loader:\n",
    "            x=b['input_ids'].to(device)\n",
    "            m=b['attention_mask'].to(device)\n",
    "            y=b['label'].to(device)\n",
    "            p=model(input_ids=x,attention_mask=m).logits.argmax(-1)\n",
    "            preds.extend(p.cpu().numpy()); labels.extend(y.cpu().numpy())\n",
    "    return metric.compute(predictions=preds,references=labels)['f1']"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# BASELINE\n",
    "model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)\n",
    "model = inject_nested_lora(model,16).to(device)\n",
    "set_rank(model,16)\n",
    "\n",
    "opt = torch.optim.AdamW(model.parameters(), lr=5e-5)\n",
    "\n",
    "for step,b in enumerate(train_loader):\n",
    "    if step>200: break\n",
    "    x=b['input_ids'].to(device); m=b['attention_mask'].to(device); y=b['label'].to(device)\n",
    "    loss=model(input_ids=x,attention_mask=m,labels=y).loss\n",
    "    loss.backward(); opt.step(); opt.zero_grad()\n",
    "\n",
    "f1_base = eval_model(model)\n",
    "print('Baseline F1:', round(f1_base,3))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# ORBITAL\n",
    "model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)\n",
    "model = inject_nested_lora(model,16).to(device)\n",
    "\n",
    "ctrl = OrbitalController(warmup=10, stable_window=6)\n",
    "set_rank(model,4)\n",
    "\n",
    "opt = torch.optim.AdamW(model.parameters(), lr=5e-5)\n",
    "\n",
    "for step,b in enumerate(train_loader):\n",
    "    if step>200: break\n",
    "    x=b['input_ids'].to(device); m=b['attention_mask'].to(device); y=b['label'].to(device)\n",
    "    loss=model(input_ids=x,attention_mask=m,labels=y).loss\n",
    "    loss.backward()\n",
    "\n",
    "    r = ctrl.step(loss.item())\n",
    "    r = max(4,min(16,r))\n",
    "    set_rank(model,r)\n",
    "\n",
    "    opt.step(); opt.zero_grad()\n",
    "\n",
    "f1_orb = eval_model(model)\n",
    "print('Orbital F1:', round(f1_orb,3))"
   ],
   "outputs": [],
   "execution_count": null
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "print('\\nBaseline:', round(f1_base,3))\n",
    "print('Orbital:', round(f1_orb,3))\n",
    "print('Delta:', round(f1_orb-f1_base,3))"
   ],
   "outputs": [],
   "execution_count": null
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
```
"language": "python",
|
| 157 |
-
"name": "python3"
|
| 158 |
-
},
|
| 159 |
-
"language_info": {
|
| 160 |
-
"name": "python"
|
| 161 |
-
}
|
| 162 |
-
},
|
| 163 |
-
"nbformat": 4,
|
| 164 |
-
"nbformat_minor": 4
|
| 165 |
-
}
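The notebook's `eval_model` reads the GLUE MRPC score out of `metric.compute(...)['f1']`. As a quick offline sanity check of what that number means, here is a hand-rolled binary F1 on a toy prediction set. This is a standalone sketch of the standard definition, not the `evaluate` library's implementation:

```python
def binary_f1(preds, labels):
    # Binary F1 = harmonic mean of precision and recall over the positive class.
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# One false positive, no false negatives: precision 3/4, recall 1 -> F1 = 6/7
print(round(binary_f1([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 0, 1]), 3))  # 0.857
```

On MRPC the positive class ("paraphrase") dominates, which is why the benchmark reports F1 alongside accuracy.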
orbital_controller.py
DELETED
|
@@ -1,291 +0,0 @@
"""
Orbital Controller — Trajectory Control with Memory
===================================================

Closed-loop rank controller that adapts model capacity based on
observed training stress. Works with any rank-adjustable system
(NestedLoRA, adaptive LR, or API-based training).

This module is the "intelligence" — pure control logic, no model code.
Pair with NestedLoRA for the complete Unified-LoRA system.

Author: Simona Vargiu
License: Apache 2.0
"""

import numpy as np
from typing import Dict, List, Optional


class OrbitalController:
    """
    Closed-loop trajectory controller for dynamic capacity adaptation.

    Unlike threshold-based controllers that map stress to rank statically,
    this implements orbital dynamics with memory:

        Ascend:  stress detected → jump to higher orbital, push delta
        Hold:    oscillating → stay, don't move
        Descend: confirmed stable → pop delta, symmetric return

    Each capacity increase is tracked on a stack and reversed only under
    confirmed stability. This prevents premature compression (returning
    too early) and oscillatory collapse (bouncing between ranks).

    The stress signal and thresholds are adaptive — they auto-calibrate
    to any model/task/loss scale without manual tuning.

    Args:
        ranks: Available capacity levels (default: [4, 8, 16])
        warmup: Steps at max capacity to build EMA baseline
        stable_window: Consecutive stable steps required for descent

    Example:
        >>> from nested_lora import inject_nested_lora, set_rank
        >>> from orbital_controller import OrbitalController
        >>>
        >>> model = inject_nested_lora(model, max_rank=16)
        >>> ctrl = OrbitalController()
        >>>
        >>> for step, batch in enumerate(loader):
        ...     loss = model(**batch).loss
        ...     new_rank = ctrl.step(loss.item())
        ...     set_rank(model, new_rank)
        ...     loss.backward()
        ...     optimizer.step()
    """

    def __init__(
        self,
        ranks: Optional[List[int]] = None,
        warmup: int = 10,
        stable_window: int = 6,
    ):
        self.RANKS = ranks or [4, 8, 16]
        self.warmup = warmup
        self.stable_window = stable_window
        self.reset()

    def reset(self):
        """Reset controller to initial state."""
        self.rank = self.RANKS[-1]
        self.orbit_stack = []
        self.loss_ema = 0.0
        self.prev_loss = None
        self.phi_hist = []
        self.stable_count = 0
        self.step_count = 0
        self.post_warmup = False

        self.history = {
            "rank": [],
            "phi": [],
            "stable_count": [],
        }

    # ── Stress signal ───────────────────────────────

    def _compute_phi(self, loss: float) -> float:
        """
        Stress signal from loss trajectory.

        φ = |loss - EMA| + 2.0 × max(0, loss - prev_loss)

        Combines deviation from trend (general instability)
        with spike detection (sudden deterioration).
        """
        self.loss_ema = 0.9 * self.loss_ema + 0.1 * loss
        delta = abs(loss - self.loss_ema)
        spike = max(0.0, loss - self.prev_loss) if self.prev_loss is not None else 0.0
        self.prev_loss = loss
        return delta + 2.0 * spike

    def _thresholds(self):
        """
        Adaptive thresholds from running statistics.

        t_stress = μ + 0.7σ  (above this → ascend)
        t_stable = μ - 0.3σ  (below this → stability confirmed)

        Auto-calibrates to loss scale. No manual tuning.
        """
        if len(self.phi_hist) < 10:
            return 0.15, 0.04
        recent = self.phi_hist[-40:]
        mu = np.mean(recent)
        sigma = np.std(recent) + 1e-8
        t_stress = mu + 0.7 * sigma
        t_stable = max(mu - 0.3 * sigma, 0.0)
        return t_stress, t_stable

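The adaptive thresholds in `_thresholds` can be checked in isolation. This standalone sketch mirrors the formula (t_stress = μ + 0.7σ, t_stable = max(μ − 0.3σ, 0)) using the stdlib `statistics` module in place of NumPy; `pstdev` matches `np.std`'s population convention:

```python
import statistics

def thresholds(phi_hist, window=40):
    # Standalone mirror of OrbitalController._thresholds, stdlib instead of NumPy.
    if len(phi_hist) < 10:
        return 0.15, 0.04                      # bootstrap defaults until enough history
    recent = phi_hist[-window:]
    mu = statistics.fmean(recent)
    sigma = statistics.pstdev(recent) + 1e-8   # population std, like np.std
    return mu + 0.7 * sigma, max(mu - 0.3 * sigma, 0.0)

# Calm training: a flat phi history has zero variance, so both thresholds
# collapse onto the mean and any deviation at all registers as stress.
t_stress, t_stable = thresholds([0.1] * 20)
print(round(t_stress, 3), round(t_stable, 3))  # 0.1 0.1
```

Because both thresholds are re-derived from the recent φ window each step, the controller needs no per-task tuning: the same code tracks a loss in the 0.3–0.6 range or in the hundreds.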
    # ── Core logic ──────────────────────────────────

    def _rank_index(self) -> int:
        return self.RANKS.index(self.rank)

    def step(self, loss: float) -> int:
        """
        Called once per training step. Returns the capacity level to use.

        Args:
            loss: Current step loss value

        Returns:
            int: Active rank (or capacity level) for next step
        """
        self.step_count += 1

        # First step: initialize EMA
        if self.prev_loss is None:
            self.loss_ema = loss
            self.prev_loss = loss
            self._log(0.0)
            return self.rank

        phi = self._compute_phi(loss)
        self.phi_hist.append(phi)

        # Warmup: build baseline at max capacity
        if self.step_count <= self.warmup:
            self._log(phi)
            return self.rank

        # Transition: warmup → ground state
        if not self.post_warmup:
            self.post_warmup = True
            self.rank = self.RANKS[0]
            self.orbit_stack = []
            self.stable_count = 0
            self._log(phi)
            return self.rank

        t_stress, t_stable = self._thresholds()

        # Stability counter
        if phi <= t_stable:
            self.stable_count += 1
        elif phi > t_stress:
            self.stable_count = 0
        else:
            self.stable_count = max(0, self.stable_count - 1)

        # ASCEND: stress → jump to higher orbital
        if phi > t_stress and self.rank < self.RANKS[-1]:
            idx = self._rank_index()
            new_idx = min(idx + 1, len(self.RANKS) - 1)
            new_rank = self.RANKS[new_idx]
            if new_rank != self.rank:
                self.orbit_stack.append(new_rank - self.rank)
                self.rank = new_rank
            self.stable_count = 0
            self._log(phi)
            return self.rank

        # DESCEND: confirmed stability → symmetric return
        if self.stable_count >= self.stable_window and self.orbit_stack:
            delta = self.orbit_stack.pop()
            target = self.rank - delta
            self.rank = min(self.RANKS, key=lambda r: abs(r - target))
            self.rank = max(self.rank, self.RANKS[0])
            self.stable_count = 0
            self._log(phi)
            return self.rank

        # HOLD: neutral → don't move
        self._log(phi)
        return self.rank

    # ── Introspection ───────────────────────────────

    def _log(self, phi: float):
        self.history["rank"].append(self.rank)
        self.history["phi"].append(phi)
        self.history["stable_count"].append(self.stable_count)

    def get_state(self) -> Dict:
        """Current controller state."""
        return {
            "rank": self.rank,
            "step": self.step_count,
            "orbit_stack": list(self.orbit_stack),
            "stable_count": self.stable_count,
            "phi": self.phi_hist[-1] if self.phi_hist else 0.0,
        }

    def get_history(self) -> Dict[str, list]:
        """Complete training history."""
        return self.history

    def __repr__(self) -> str:
        return (
            f"OrbitalController(step={self.step_count}, rank={self.rank}, "
            f"stack={self.orbit_stack}, stable={self.stable_count})"
        )


# ============================================================
# CONVENIENCE: setup helper
# ============================================================

def setup_unified_lora(model, max_rank=16, ranks=None, warmup=10, stable_window=6):
    """
    One-call setup: inject NestedLoRA + create OrbitalController.

    Args:
        model: PyTorch model
        max_rank: Maximum LoRA rank
        ranks: Available rank levels
        warmup: Controller warmup steps
        stable_window: Steps of stability before descent

    Returns:
        (model, controller) tuple

    Example:
        >>> from orbital_controller import setup_unified_lora
        >>> from nested_lora import set_rank
        >>>
        >>> model, ctrl = setup_unified_lora(model)
        >>> for step, batch in enumerate(loader):
        ...     loss = model(**batch).loss
        ...     set_rank(model, ctrl.step(loss.item()))
        ...     loss.backward(); optimizer.step(); optimizer.zero_grad()
    """
    from nested_lora import inject_nested_lora

    model = inject_nested_lora(model, max_rank)
    controller = OrbitalController(
        ranks=ranks or [4, 8, 16],
        warmup=warmup,
        stable_window=stable_window,
    )
    return model, controller


# ============================================================
# DEMO
# ============================================================

if __name__ == "__main__":
    print("Orbital Controller — Demo")
    print("=" * 50)
    print("Simulating: 30 stable → 10 shock → 30 recovery\n")

    ctrl = OrbitalController(warmup=8, stable_window=5)

    for step in range(70):
        if step < 30:
            loss = np.random.uniform(0.4, 0.6)
        elif step < 40:
            loss = np.random.uniform(1.5, 3.0)
        else:
            loss = np.random.uniform(0.3, 0.5)

        rank = ctrl.step(loss)

        if step % 5 == 0 or step == 30:
            s = ctrl.get_state()
            tag = "  <<<SHOCK" if step == 30 else ""
            print(f"  [{step:3d}] rank={rank:2d}  phi={s['phi']:.3f}  stack={s['orbit_stack']}{tag}")

    print(f"\nFinal: {ctrl}")
requirements.txt
DELETED
|
@@ -1,6 +0,0 @@
|
torch
transformers
datasets
evaluate
accelerate
scikit-learn
unified_lora.py
DELETED
|
@@ -1,14 +0,0 @@
|
"""
Legacy Adaptive LoRA (Deprecated)
=================================

Early gradient-based adaptive rank prototype.

Replaced by:
- NestedLoRA (shared orbital architecture)
- OrbitalController (stress-based closed-loop control)

This file is kept for reference only.

Status: deprecated
"""
|
|