https://github.com/Sva76/Unified-LoRa

#2
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore DELETED
@@ -1,56 +0,0 @@
- # ── PYTHON ─────────────────────────────────────────
- __pycache__/
- *.py[cod]
- *.so
-
- # ── ENV ────────────────────────────────────────────
- .env
- .venv/
- venv/
- env/
-
- # ── BUILD / DIST ───────────────────────────────────
- build/
- dist/
- *.egg-info/
-
- # ── NOTEBOOK ───────────────────────────────────────
- .ipynb_checkpoints
-
- # ── CACHE / TEST ───────────────────────────────────
- .pytest_cache/
- .mypy_cache/
- .coverage*
- htmlcov/
-
- # ── LOGS ───────────────────────────────────────────
- *.log
-
- # ── MODELS / CHECKPOINTS (CRITICAL) ────────────────
- *.pt
- *.bin
- *.ckpt
- *.safetensors
-
- # ── DATASETS ───────────────────────────────────────
- data/
- datasets/
- *.parquet
- *.csv
-
- # ── HF CACHE ───────────────────────────────────────
- .cache/
- huggingface/
- hf_cache/
-
- # ── EXPERIMENT OUTPUT ──────────────────────────────
- outputs/
- runs/
- wandb/
-
- # ── SYSTEM ─────────────────────────────────────────
- .DS_Store
-
- # ── EDITOR ─────────────────────────────────────────
- .vscode/
- .idea/
LICENSE DELETED
@@ -1,201 +0,0 @@
- Apache License
- Version 2.0, January 2004
- http://www.apache.org/licenses/
-
- TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
- 1. Definitions.
-
- "License" shall mean the terms and conditions for use, reproduction,
- and distribution as defined by Sections 1 through 9 of this document.
-
- "Licensor" shall mean the copyright owner or entity authorized by
- the copyright owner that is granting the License.
-
- "Legal Entity" shall mean the union of the acting entity and all
- other entities that control, are controlled by, or are under common
- control with that entity. For the purposes of this definition,
- "control" means (i) the power, direct or indirect, to cause the
- direction or management of such entity, whether by contract or
- otherwise, or (ii) ownership of fifty percent (50%) or more of the
- outstanding shares, or (iii) beneficial ownership of such entity.
-
- "You" (or "Your") shall mean an individual or Legal Entity
- exercising permissions granted by this License.
-
- "Source" form shall mean the preferred form for making modifications,
- including but not limited to software source code, documentation
- source, and configuration files.
-
- "Object" form shall mean any form resulting from mechanical
- transformation or translation of a Source form, including but
- not limited to compiled object code, generated documentation,
- and conversions to other media types.
-
- "Work" shall mean the work of authorship, whether in Source or
- Object form, made available under the License, as indicated by a
- copyright notice that is included in or attached to the work
- (an example is provided in the Appendix below).
-
- "Derivative Works" shall mean any work, whether in Source or Object
- form, that is based on (or derived from) the Work and for which the
- editorial revisions, annotations, elaborations, or other modifications
- represent, as a whole, an original work of authorship. For the purposes
- of this License, Derivative Works shall not include works that remain
- separable from, or merely link (or bind by name) to the interfaces of,
- the Work and Derivative Works thereof.
-
- "Contribution" shall mean any work of authorship, including
- the original version of the Work and any modifications or additions
- to that Work or Derivative Works thereof, that is intentionally
- submitted to Licensor for inclusion in the Work by the copyright owner
- or by an individual or Legal Entity authorized to submit on behalf of
- the copyright owner. For the purposes of this definition, "submitted"
- means any form of electronic, verbal, or written communication sent
- to the Licensor or its representatives, including but not limited to
- communication on electronic mailing lists, source code control systems,
- and issue tracking systems that are managed by, or on behalf of, the
- Licensor for the purpose of discussing and improving the Work, but
- excluding communication that is conspicuously marked or otherwise
- designated in writing by the copyright owner as "Not a Contribution."
-
- "Contributor" shall mean Licensor and any individual or Legal Entity
- on behalf of whom a Contribution has been received by Licensor and
- subsequently incorporated within the Work.
-
- 2. Grant of Copyright License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- copyright license to reproduce, prepare Derivative Works of,
- publicly display, publicly perform, sublicense, and distribute the
- Work and such Derivative Works in Source or Object form.
-
- 3. Grant of Patent License. Subject to the terms and conditions of
- this License, each Contributor hereby grants to You a perpetual,
- worldwide, non-exclusive, no-charge, royalty-free, irrevocable
- (except as stated in this section) patent license to make, have made,
- use, offer to sell, sell, import, and otherwise transfer the Work,
- where such license applies only to those patent claims licensable
- by such Contributor that are necessarily infringed by their
- Contribution(s) alone or by combination of their Contribution(s)
- with the Work to which such Contribution(s) was submitted. If You
- institute patent litigation against any entity (including a
- cross-claim or counterclaim in a lawsuit) alleging that the Work
- or a Contribution incorporated within the Work constitutes direct
- or contributory patent infringement, then any patent licenses
- granted to You under this License for that Work shall terminate
- as of the date such litigation is filed.
-
- 4. Redistribution. You may reproduce and distribute copies of the
- Work or Derivative Works thereof in any medium, with or without
- modifications, and in Source or Object form, provided that You
- meet the following conditions:
-
- (a) You must give any other recipients of the Work or
- Derivative Works a copy of this License; and
-
- (b) You must cause any modified files to carry prominent notices
- stating that You changed the files; and
-
- (c) You must retain, in the Source form of any Derivative Works
- that You distribute, all copyright, patent, trademark, and
- attribution notices from the Source form of the Work,
- excluding those notices that do not pertain to any part of
- the Derivative Works; and
-
- (d) If the Work includes a "NOTICE" text file as part of its
- distribution, then any Derivative Works that You distribute must
- include a readable copy of the attribution notices contained
- within such NOTICE file, excluding those notices that do not
- pertain to any part of the Derivative Works, in at least one
- of the following places: within a NOTICE text file distributed
- as part of the Derivative Works; within the Source form or
- documentation, if provided along with the Derivative Works; or,
- within a display generated by the Derivative Works, if and
- wherever such third-party notices normally appear. The contents
- of the NOTICE file are for informational purposes only and
- do not modify the License. You may add Your own attribution
- notices within Derivative Works that You distribute, alongside
- or as an addendum to the NOTICE text from the Work, provided
- that such additional attribution notices cannot be construed
- as modifying the License.
-
- You may add Your own copyright statement to Your modifications and
- may provide additional or different license terms and conditions
- for use, reproduction, or distribution of Your modifications, or
- for any such Derivative Works as a whole, provided Your use,
- reproduction, and distribution of the Work otherwise complies with
- the conditions stated in this License.
-
- 5. Submission of Contributions. Unless You explicitly state otherwise,
- any Contribution intentionally submitted for inclusion in the Work
- by You to the Licensor shall be under the terms and conditions of
- this License, without any additional terms or conditions.
- Notwithstanding the above, nothing herein shall supersede or modify
- the terms of any separate license agreement you may have executed
- with Licensor regarding such Contributions.
-
- 6. Trademarks. This License does not grant permission to use the trade
- names, trademarks, service marks, or product names of the Licensor,
- except as required for reasonable and customary use in describing the
- origin of the Work and reproducing the content of the NOTICE file.
-
- 7. Disclaimer of Warranty. Unless required by applicable law or
- agreed to in writing, Licensor provides the Work (and each
- Contributor provides its Contributions) on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
- implied, including, without limitation, any warranties or conditions
- of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
- PARTICULAR PURPOSE. You are solely responsible for determining the
- appropriateness of using or redistributing the Work and assume any
- risks associated with Your exercise of permissions under this License.
-
- 8. Limitation of Liability. In no event and under no legal theory,
- whether in tort (including negligence), contract, or otherwise,
- unless required by applicable law (such as deliberate and grossly
- negligent acts) or agreed to in writing, shall any Contributor be
- liable to You for damages, including any direct, indirect, special,
- incidental, or consequential damages of any character arising as a
- result of this License or out of the use or inability to use the
- Work (including but not limited to damages for loss of goodwill,
- work stoppage, computer failure or malfunction, or any and all
- other commercial damages or losses), even if such Contributor
- has been advised of the possibility of such damages.
-
- 9. Accepting Warranty or Additional Liability. While redistributing
- the Work or Derivative Works thereof, You may choose to offer,
- and charge a fee for, acceptance of support, warranty, indemnity,
- or other liability obligations and/or rights consistent with this
- License. However, in accepting such obligations, You may act only
- on Your own behalf and on Your sole responsibility, not on behalf
- of any other Contributor, and only if You agree to indemnify,
- defend, and hold each Contributor harmless for any liability
- incurred by, or claims asserted against, such Contributor by reason
- of your accepting any such warranty or additional liability.
-
- END OF TERMS AND CONDITIONS
-
- APPENDIX: How to apply the Apache License to your work.
-
- To apply the Apache License to your work, attach the following
- boilerplate notice, with the fields enclosed by brackets "[]"
- replaced with your own identifying information. (Don't include
- the brackets!) The text should be enclosed in the appropriate
- comment syntax for the file format. We also recommend that a
- file or class name and description of purpose be included on the
- same "printed page" as the copyright notice for easier
- identification within third-party archives.
-
- Copyright [yyyy] [name of copyright owner]
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
README.md CHANGED
@@ -1,76 +1,3 @@
- ---
- license: apache-2.0
- tags:
- - lora
- - fine-tuning
- - adaptive
- - research
- - nested-lora
- - synaptic-plasticity
- - rank-adaptation
- library_name: transformers
- datasets:
- - nyu-mll/glue
- pipeline_tag: text-classification
- ---
-
- # Unified-LoRA
-
- **LoRA fine-tuning with synaptic plasticity: a neurobiologically-inspired controller that switches between qualitatively different operational modes based on training stress.**
-
- ⚠️ **This is NOT a pretrained model.** Unified-LoRA is a training method/controller.
-
- 👉 **Code**: [github.com/Sva76/Unified-LoRa](https://github.com/Sva76/Unified-LoRa)
- 👉 **Demo**: [unified_lora_demo.ipynb](https://github.com/Sva76/Unified-LoRa/blob/main/notebooks/unified_lora_demo.ipynb)
-
- ## What It Does
-
- A composite synaptic stress signal **φ(t) = f(Convergence, Entropy, Stress)** drives a 3-state FSM:
-
- | Mode | φ range | Rank | Behavior |
- |------|---------|------|----------|
- | SINGLE | φ < 0.3 | r=4 | Efficient cruise |
- | MULTI | 0.3 ≤ φ < 0.7 | r=8 | Active learning |
- | MIRROR | φ ≥ 0.7 | r=16 | Max capacity + weight snapshot for rollback |
-
- Rank transitions use **nested matrix slicing** (r4 ⊂ r8 ⊂ r16) — zero cold-start, zero re-allocation.
-
- Mirror mode saves a weight snapshot on entry. On exit, if weights drifted <5% (transient noise), the snapshot is restored. If drift was significant (real signal), the new weights are kept.
-
- ## Results
-
- **GLUE (DistilBERT):** 3/4 tasks equal or better with 33–56% rank reduction.
-
- **Noise resilience:** +31 F1 at 50% label noise, 9× lower variance. No benefit on clean data. Confirmed at 67M–3B.
-
- **Stress-recovery cycle (Tinker/Llama-3.2-1B):** φ returns to pre-shock baseline (0.33 → 0.83 → 0.33), demonstrating fully reversible stress handling.
-
- ## Quick Start
-
- ```python
- from controller import setup_unified_lora
-
- adapters, ctrl = setup_unified_lora(model, target_modules=["q_proj", "v_proj"])
-
- for batch in dataloader:
-     loss = model(**batch).loss
-     loss.backward()
-     ctrl.step(loss=loss.item())  # φ(t) needs the loss for convergence signal
-     optimizer.step()
-     optimizer.zero_grad()
- ```
-
- ## Citation
-
- ```bibtex
- @software{unified_lora_2025,
-   author = {Simona Vargiu},
-   title = {Unified-LoRA: Synaptic Plasticity Controller for Adaptive LoRA Fine-Tuning},
-   year = {2025},
-   url = {https://github.com/Sva76/Unified-LoRa}
- }
- ```
-
- ## Contact
-
- Simona Vargiu (Independent Researcher) — simona.vargiu.malta@gmail.com
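The mirror-mode rule described in the README above (snapshot on entry; on exit, restore if drift stayed under 5%, keep otherwise) can be sketched in plain Python. This is an illustrative sketch, not the repository's implementation: the function names and the choice of a global relative L2 norm as the drift metric are assumptions, since the README does not specify how drift is measured.

```python
import math

DRIFT_TOL = 0.05  # README rule: <5% relative drift on exit => transient noise


def relative_drift(w_now, w_snap):
    """L2 distance between current and snapshot weights, relative to the snapshot norm."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_now, w_snap)))
    den = math.sqrt(sum(b * b for b in w_snap)) or 1.0
    return num / den


def exit_mirror(weights, snapshot):
    """On leaving MIRROR mode: restore the snapshot if the drift was transient,
    otherwise keep the new weights (the change was a real signal)."""
    if relative_drift(weights, snapshot) < DRIFT_TOL:
        return list(snapshot)
    return list(weights)
```

Per-layer versus global drift, and the exact norm, are design choices the README leaves open; the sketch picks the simplest global variant.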
 
+ ---
+ license: apache-2.0
+ ---
controller.py DELETED
@@ -1,41 +0,0 @@
- """
- Unified-LoRA Controller
- ======================
-
- Convenience wrapper that exposes the full Unified-LoRA stack:
-
- - nested_lora.py        → execution engine (LoRA with dynamic rank slicing)
- - orbital_controller.py → control logic (stress-driven rank adaptation)
-
- Use this module for simple integration, or import submodules directly
- for fine-grained control.
-
- Author: Simona Vargiu
- License: Apache 2.0
- """
-
- # ── ENGINE ──────────────────────────────────────────
- from nested_lora import (
-     NestedLoRALinear,
-     inject_nested_lora,
-     set_rank,
-     get_lora_params,
-     count_params,
- )
-
- # ── CONTROLLER ──────────────────────────────────────
- from orbital_controller import (
-     OrbitalController,
-     setup_unified_lora,
- )
-
- # ── EXPORT ──────────────────────────────────────────
- __all__ = [
-     "NestedLoRALinear",
-     "inject_nested_lora",
-     "set_rank",
-     "get_lora_params",
-     "count_params",
-     "OrbitalController",
-     "setup_unified_lora",
- ]
docs/architecture.md DELETED
@@ -1,171 +0,0 @@
- # Architecture — Nested Orbital LoRA
-
- Core idea: dynamic rank control via stress-driven orbital transitions with weight persistence (no cold start).
-
- ## Problem: cold start on rank transitions
-
- Standard multi-rank LoRA keeps separate adapters per rank:
-
-     r=4, r=8, r=16 → independent weights
-
- Switching rank causes partial cold restarts → performance drop.
-
- ## Solution: Nested LoRA (one adapter, multiple ranks)
-
- Single adapter at max rank:
-
-     A(16, d), B(d, 16)
-
- Active rank is obtained by slicing:
-
-     r=4  → A[:4, :], B[:, :4]
-     r=8  → A[:8, :], B[:, :8]
-     r=16 → full matrix
-
-     r4 ⊂ r8 ⊂ r16
-
- Lower ranks reuse trained weights → no cold start.
-
- ## Scaling
-
- To keep output magnitude consistent:
-
-     scale = max_rank / max(r, 1)
-     scale = min(scale, 4.0)  # optional clamp
-
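The slicing and scaling rules above can be demonstrated with a toy pure-Python sketch. This is not the repository's `nested_lora.py` (which presumably operates on torch tensors); `lora_delta` is a hypothetical name, and the matrices here are plain nested lists for illustration.

```python
def lora_delta(A, B, r, max_rank=16):
    """ΔW = scale * B[:, :r] @ A[:r, :].

    Only the first r rank components are active, so r=4 reuses exactly
    the weights that were trained while running at r=8 or r=16."""
    scale = min(max_rank / max(r, 1), 4.0)  # doc's scaling rule with clamp
    d_out, d_in = len(B), len(A[0])
    out = [[0.0] * d_in for _ in range(d_out)]
    for i in range(d_out):
        for k in range(r):          # slice: first r rows of A, columns of B
            for j in range(d_in):
                out[i][j] += scale * B[i][k] * A[k][j]
    return out


# Nesting check: with r=4 active, components 4..15 are invisible, so a
# "descend" to low rank never exposes untrained weights.
A = [[0.1 * (k + 1)] * 3 for k in range(16)]   # 16 x 3
B = [[0.1] * 16 for _ in range(3)]             # 3 x 16
d4 = lora_delta(A, B, 4)
A[8][0] = 999.0                                 # perturb a component outside r=4
assert lora_delta(A, B, 4) == d4                # r=4 output unchanged
```

The final assertion is the "no cold start" property in miniature: rank transitions only change which prefix of the shared matrices is read, never the stored weights.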
- ## Orbital Controller (no hand-tuned thresholds)
-
- Dynamic trajectory instead of a static FSM:
-
- - Ascend  → stress detected → increase rank
- - Hold    → oscillation → stay
- - Descend → stable → decrease rank
-
- Uses a stack to ensure symmetric return.
-
- ## Stress signal
-
-     φ(t) = |loss - EMA(loss)| + 2.0 × max(0, loss - prev_loss)
-
- Auto-calibrated thresholds:
-
-     t_stress = μ + 0.7σ
-     t_stable = max(μ - 0.3σ, 0)
-
- Robust stats can be used to reduce noise.
-
- ## Why it matters
-
- - avoids cold starts across rank changes
- - adapts capacity in real time
- - works in black-box settings
- - O(1) overhead
-
- ## Comparison
-
- | Property        | Standard LoRA | AdaLoRA | Orbital LoRA |
- |-----------------|---------------|---------|--------------|
- | Rank control    | Fixed         | SVD     | Stress       |
- | Control type    | None          | Open    | Closed-loop  |
- | Transition cost | N/A           | High    | O(1)         |
- | Architecture    | Single        | Pruned  | Nested       |
- | Black-box       | Yes           | No      | Yes          |
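The stress signal and auto-calibrated thresholds above can be sketched as a small stateful class. This is an illustrative sketch, not the repository's `orbital_controller.py`: the EMA decay (0.9) and rolling-window length (50) are assumed values, and the class name is hypothetical.

```python
from collections import deque


class StressSignal:
    """phi(t) = |loss - EMA(loss)| + 2.0 * max(0, loss - prev_loss),
    with thresholds auto-calibrated from a rolling window of phi values:
        t_stress = mu + 0.7 * sigma
        t_stable = max(mu - 0.3 * sigma, 0)
    """

    def __init__(self, ema_beta=0.9, window=50):
        self.ema = None      # exponential moving average of the loss
        self.prev = None     # previous raw loss, for the spike term
        self.beta = ema_beta
        self.hist = deque(maxlen=window)

    def step(self, loss):
        if self.ema is None:
            self.ema = loss
        else:
            self.ema = self.beta * self.ema + (1 - self.beta) * loss
        spike = 0.0 if self.prev is None else max(0.0, loss - self.prev)
        phi = abs(loss - self.ema) + 2.0 * spike
        self.prev = loss
        self.hist.append(phi)
        return phi

    def thresholds(self):
        n = len(self.hist)
        mu = sum(self.hist) / n
        sigma = (sum((x - mu) ** 2 for x in self.hist) / n) ** 0.5
        return mu + 0.7 * sigma, max(mu - 0.3 * sigma, 0.0)
```

Because the thresholds are statistics of phi's own recent history, a loss spike after a calm stretch lands far above `t_stress`, which is what triggers the controller's ascend step.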
docs/experimental_results.md DELETED
@@ -1,239 +0,0 @@
- # Experimental Results
-
- Core result: parity with baseline performance at ~15% rank reduction, plus dynamic shock response.
-
- ## 1. Stress Test — Task Switch
-
- ### Setup
-
- - Model: DistilBERT-base-uncased + NestedLoRALinear (max_rank=16)
- - Protocol: MRPC × 60 steps → SST-2 × 60 steps (shock at step 60)
- - Seeds: 0, 1, 2
- - Baseline: same architecture, fixed rank=16
- - Hardware: Colab T4
-
- ### Results
-
- |                     | Baseline (r=16) | Orbital LoRA |
- |---------------------|-----------------|--------------|
- | SST-2 Accuracy      | 0.736           | 0.740        |
- | MRPC F1 (retention) | 0.526           | 0.515        |
- | Effective rank      | 16.0            | 13.6         |
-
- Parity with ~15% rank saving.
-
- ### Behavior
-
- Post-shock:
-
- 1. detect → descend (r16 → r4)
- 2. stabilize
- 3. re-ascend (r4 → r16)
-
- Baseline: no reaction (fixed r=16).
-
- ## 2. Stable Task — Parity
-
- ### Setup
-
- - Task: MRPC only (120 steps)
- - Seeds: 0, 1, 2
- - Baseline: fixed r=16
-
- ### Results
-
- | Seed | Baseline F1 | Orbital F1 |
- |------|-------------|------------|
- | 0    | 0.806       | 0.808      |
- | 1    | 0.822       | 0.826      |
- | 2    | 0.824       | 0.824      |
- | Mean | 0.818       | 0.820      |
-
- No degradation on stable training.
-
- ## 3. Rank Dynamics (Black-box — Tinker)
-
- ### Methods
-
- | Method        | Control     |
- |---------------|-------------|
- | Standard LoRA | Fixed rank  |
- | AdaLoRA-like  | Open-loop   |
- | Orbital LoRA  | Closed-loop |
-
- ### Disturbance response
-
- | Method        | Reaction  | Stability | Recovery  |
- |---------------|-----------|-----------|-----------|
- | Standard      | None      | Passive   |           |
- | AdaLoRA-like  | Indirect  | Partial   | Limited   |
- | Orbital LoRA  | Immediate | Stable    | Immediate |
-
- ## 4. Architecture Insight
-
- Root cause: cold start from separate adapters.
- Fix: nested slicing → no cold start → parity restored.
-
- ## 5. Black-box compatibility
-
- - Uses only the loss signal.
- - No gradients required.
- - O(1) overhead.
-
- ## Next
-
- - 7B+ validation (ongoing)
- - LR controller integration
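The "Effective rank" figures in the tables above are step-weighted averages of the active rank, the same quantity the `eff_rank` helper computes in `experiments/stable_task_test.py`. A minimal sketch, with usage counts that are made up for illustration:

```python
def effective_rank(usage):
    """Step-weighted mean of the active rank over a training run.

    `usage` maps rank -> number of training steps spent at that rank."""
    total = sum(usage.values())
    return sum(r * n for r, n in usage.items()) / total if total else 0.0


# Hypothetical run: 20 steps at r=4, 30 at r=8, 70 at r=16.
usage = {4: 20, 8: 30, 16: 70}
er = effective_rank(usage)   # (80 + 240 + 1120) / 120 = 12.0
saving = 1 - er / 16         # 25% rank saving vs a fixed r=16 baseline
```

The reported 13.6 effective rank corresponds to a saving of 1 - 13.6/16 = 15%, matching the "~15% rank saving" claim.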
experiments/stable_task_test.py DELETED
@@ -1,226 +0,0 @@
- """
- Orbital LoRA — Stable Task Parity Test
-
- MRPC only, 120 steps, 3 seeds.
- Validates that the controller causes zero degradation on stable training.
-
- Usage:
-     pip install transformers datasets evaluate
-     python stable_task_test.py
- """
-
- import time, random, math, numpy as np, torch, torch.nn as nn
- import torch.nn.functional as F, evaluate
- from datasets import load_dataset
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- from torch.utils.data import DataLoader
-
- import sys, os
- sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
- from nested_lora import NestedLoRALinear, inject_nested_lora
- from orbital_controller import OrbitalController
- from controller import set_rank
-
- # ── CONFIG ──────────────────────────────────────────
-
- DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
- MODEL = "distilbert-base-uncased"
- BATCH = 8
- STEPS = 120
- LR = 5e-5
- SEEDS = [0, 1, 2]
-
- MAX_RANK = 16
- WARMUP = 15
- STABLE_WINDOW = 8
-
- # ── DATA ────────────────────────────────────────────
-
- print("Loading data...")
- tok = AutoTokenizer.from_pretrained(MODEL)
- ds = load_dataset("glue", "mrpc")
-
- def tok_fn(x):
-     return tok(x["sentence1"], x["sentence2"],
-                truncation=True, padding="max_length", max_length=128)
-
- ds = ds.map(tok_fn, batched=True)
- ds.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
- train_loader = DataLoader(ds["train"], batch_size=BATCH, shuffle=True)
- val_loader = DataLoader(ds["validation"], batch_size=BATCH)
- metric = evaluate.load("glue", "mrpc")
-
- # ── HELPERS ─────────────────────────────────────────
-
- def build_model():
-     base = AutoModelForSequenceClassification.from_pretrained(
-         MODEL, num_labels=2, ignore_mismatched_sizes=True
-     )
-     return inject_nested_lora(base, MAX_RANK).to(DEVICE)
-
- def eval_model(model):
-     model.eval()
-     preds, labels = [], []
-     with torch.no_grad():
-         for batch in val_loader:
-             x = batch["input_ids"].to(DEVICE)
-             m = batch["attention_mask"].to(DEVICE)
-             y = batch["label"].to(DEVICE)
-             logits = model(input_ids=x, attention_mask=m).logits
-             preds.extend(logits.argmax(dim=-1).cpu().numpy())
-             labels.extend(y.cpu().numpy())
-     return metric.compute(predictions=preds, references=labels)["f1"]
-
- def eff_rank(usage):
-     tot = sum(usage.values())
-     return sum(k * v for k, v in usage.items()) / tot if tot > 0 else 0
-
- # ── TRAIN BASELINE ──────────────────────────────────
-
- def train_baseline(model):
-     opt = torch.optim.AdamW(model.parameters(), lr=LR)
-     set_rank(model, 16)
-     it = iter(train_loader)
-
-     for step in range(STEPS):
-         try:
-             batch = next(it)
-         except StopIteration:
-             it = iter(train_loader); batch = next(it)
-
-         x = batch["input_ids"].to(DEVICE)
-         m = batch["attention_mask"].to(DEVICE)
-         y = batch["label"].to(DEVICE)
-
-         loss = model(input_ids=x, attention_mask=m, labels=y).loss
-         loss.backward()
-         torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
-         opt.step()
-         opt.zero_grad()
-
-     return model
-
- # ── TRAIN ORBITAL ───────────────────────────────────
-
- def train_orbital(model):
-     ctrl = OrbitalController(warmup=WARMUP, stable_window=STABLE_WINDOW)
-     opt = torch.optim.AdamW(model.parameters(), lr=LR)
-     usage = {4: 0, 8: 0, 16: 0}
-     rank_trace = []
-     it = iter(train_loader)
-
-     for step in range(STEPS):
-         try:
-             batch = next(it)
-         except StopIteration:
-             it = iter(train_loader); batch = next(it)
-
-         x = batch["input_ids"].to(DEVICE)
-         m = batch["attention_mask"].to(DEVICE)
-         y = batch["label"].to(DEVICE)
-
-         loss = model(input_ids=x, attention_mask=m, labels=y).loss
-         loss.backward()
-
-         new_rank = ctrl.step(loss.item())
-         new_rank = max(4, min(16, new_rank))
-         set_rank(model, new_rank)
-
-         usage[new_rank] += 1
-         rank_trace.append(new_rank)
-
-         torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
-         opt.step()
-         opt.zero_grad()
-
-     return model, usage, rank_trace, ctrl
-
- # ── RUN ─────────────────────────────────────────────
-
- print(f"\nDevice: {DEVICE}")
- print(f"Task: MRPC, {STEPS} steps")
- print("=" * 55)
-
- results = []
-
- for seed in SEEDS:
-     print(f"\n{'─' * 50}\n SEED {seed}\n{'─' * 50}")
-
-     torch.manual_seed(seed)
-     torch.cuda.manual_seed_all(seed)
-     np.random.seed(seed)
-     random.seed(seed)
-
-     base_model = build_model()
-     base_model = train_baseline(base_model)
-     f1_base = eval_model(base_model)
-     del base_model; torch.cuda.empty_cache()
-
-     torch.manual_seed(seed)
-     torch.cuda.manual_seed_all(seed)
-     np.random.seed(seed)
-     random.seed(seed)
-
-     uni_model = build_model()
-     uni_model, usage, trace, ctrl = train_orbital(uni_model)
-     f1_uni = eval_model(uni_model)
-
-     er = eff_rank(usage)
-     saving = 1 - er / 16
-     transitions = sum(1 for i in range(1, len(trace)) if trace[i] != trace[i-1])
-
-     print(f"\n BASELINE F1 = {f1_base:.3f} (rank=16 fixed)")
-     print(f" ORBITAL  F1 = {f1_uni:.3f} (eff_rank={er:.1f}, saving={saving*100:.0f}%)")
-     print(f" delta F1 = {f1_uni - f1_base:+.3f}")
-     print(f" Usage: r4={usage[4]} r8={usage[8]} r16={usage[16]} transitions={transitions}")
-
-     results.append({
-         'seed': seed, 'f1_base': f1_base, 'f1_uni': f1_uni,
-         'delta': f1_uni - f1_base, 'eff_rank': er,
-     })
-     del uni_model; torch.cuda.empty_cache()
-
- # ── SUMMARY ─────────────────────────────────────────
-
- print(f"\n{'=' * 55}\n SUMMARY\n{'=' * 55}")
- f1b = [r['f1_base'] for r in results]
- f1u = [r['f1_uni'] for r in results]
-
- print(f"\n Baseline F1: {np.mean(f1b):.3f} +/- {np.std(f1b):.3f}")
- print(f" Orbital  F1: {np.mean(f1u):.3f} +/- {np.std(f1u):.3f}")
- print(f" delta F1:    {np.mean([r['delta'] for r in results]):+.3f}")
experiments/stress_test_task_switch.py DELETED
@@ -1,214 +0,0 @@
- """
- Orbital LoRA — Stress Test: Task Switch
-
- MRPC (60 steps) → SST-2 (60 steps)
- Baseline (r=16 fixed) vs Orbital Controller
- """
-
- import time, random, math, numpy as np, torch, torch.nn as nn
- import torch.nn.functional as F, evaluate
- from datasets import load_dataset
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- from torch.utils.data import DataLoader
-
- import sys, os
- sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-
- from nested_lora import NestedLoRALinear, inject_nested_lora
- from orbital_controller import OrbitalController
- from controller import set_rank
-
- # ── CONFIG ──────────────────────────────────────────
-
- DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
- MODEL = "distilbert-base-uncased"
- BATCH = 8
- LR = 5e-5
- SEEDS = [0, 1, 2]
-
- MAX_RANK = 16
- WARMUP = 10
- STABLE_WINDOW = 6
-
- STEPS_TASK1 = 60
- STEPS_TASK2 = 60
- TOTAL_STEPS = STEPS_TASK1 + STEPS_TASK2
-
- # ── DATA ────────────────────────────────────────────
-
- print("Loading data...")
- tok = AutoTokenizer.from_pretrained(MODEL)
-
- ds_mrpc = load_dataset("glue", "mrpc")
- def tok_mrpc(x):
-     return tok(x["sentence1"], x["sentence2"],
-                truncation=True, padding="max_length", max_length=128)
- ds_mrpc = ds_mrpc.map(tok_mrpc, batched=True)
- ds_mrpc.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
- train_mrpc = DataLoader(ds_mrpc["train"], batch_size=BATCH, shuffle=True)
- val_mrpc = DataLoader(ds_mrpc["validation"], batch_size=BATCH)
-
- ds_sst2 = load_dataset("glue", "sst2")
- def tok_sst2(x):
-     return tok(x["sentence"], truncation=True, padding="max_length", max_length=128)
- ds_sst2 = ds_sst2.map(tok_sst2, batched=True)
- ds_sst2.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
- train_sst2 = DataLoader(ds_sst2["train"], batch_size=BATCH, shuffle=True)
- val_sst2 = DataLoader(ds_sst2["validation"], batch_size=BATCH)
-
- metric_mrpc = evaluate.load("glue", "mrpc")
- metric_sst2 = evaluate.load("glue", "sst2")
-
- # ── HELPERS ─────────────────────────────────────────
-
- def make_iter(loader):
-     while True:
-         for batch in loader:
-             yield batch
-
- def get_batch(it):
-     batch = next(it)
-     return (batch["input_ids"].to(DEVICE),
-             batch["attention_mask"].to(DEVICE),
-             batch["label"].to(DEVICE))
-
- def build_model():
-     base = AutoModelForSequenceClassification.from_pretrained(
-         MODEL, num_labels=2, ignore_mismatched_sizes=True
-     )
-     return inject_nested_lora(base, MAX_RANK).to(DEVICE)
-
- def eval_f1(model, loader, metric_fn):
-     model.eval()
-     preds, labels = [], []
-     with torch.no_grad():
-         for batch in loader:
-             x = batch["input_ids"].to(DEVICE)
-             m = batch["attention_mask"].to(DEVICE)
-             y = batch["label"].to(DEVICE)
-             logits = model(input_ids=x, attention_mask=m).logits
-             preds.extend(logits.argmax(dim=-1).cpu().numpy())
-             labels.extend(y.cpu().numpy())
-     model.train()
-     result = metric_fn.compute(predictions=preds, references=labels)
-     return result.get("f1", result.get("accuracy", 0.0))
-
- def eff_rank(usage):
-     tot = sum(usage.values())
-     return sum(k * v for k, v in usage.items()) / tot if tot > 0 else 0
-
- # ── TRAIN BASELINE ──────────────────────────────────
-
- def train_baseline(model):
-     opt = torch.optim.AdamW(model.parameters(), lr=LR)
-     set_rank(model, 16)
-     it_mrpc = make_iter(train_mrpc)
-     it_sst2 = make_iter(train_sst2)
-
-     for step in range(TOTAL_STEPS):
-         x, m, y = get_batch(it_mrpc if step < STEPS_TASK1 else it_sst2)
-
-         loss = model(input_ids=x, attention_mask=m, labels=y).loss
-         loss.backward()
-         torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
-         opt.step()
-         opt.zero_grad()
-
-     return model
-
- # ── TRAIN ORBITAL ───────────────────────────────────
-
- def train_orbital(model):
-     ctrl = OrbitalController(warmup=WARMUP, stable_window=STABLE_WINDOW)
-     ctrl.rank = 4
-     set_rank(model, 4)
-
-     opt = torch.optim.AdamW(model.parameters(), lr=LR)
-     usage = {4: 0, 8: 0, 16: 0}
-     rank_trace = []
-     it_mrpc = make_iter(train_mrpc)
-     it_sst2 = make_iter(train_sst2)
-
-     for step in range(TOTAL_STEPS):
-         x, m, y = get_batch(it_mrpc if step < STEPS_TASK1 else it_sst2)
-
-         loss = model(input_ids=x, attention_mask=m, labels=y).loss
-         loss.backward()
-
-         new_rank = ctrl.step(loss.item())
-         new_rank = max(4, min(16, new_rank))
-         set_rank(model, new_rank)
-
-         usage[new_rank] += 1
-         rank_trace.append(new_rank)
-
-         torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
-         opt.step()
-         opt.zero_grad()
-
-     return model, usage, rank_trace
-
- # ── RUN ─────────────────────────────────────────────
-
- print(f"\nDevice: {DEVICE}")
- print(f"Plan: MRPC × {STEPS_TASK1} → SST-2 × {STEPS_TASK2}")
- print(f"Shock at step {STEPS_TASK1}")
- print("=" * 55)
-
- results = []
-
- for seed in SEEDS:
-     print(f"\n{'─' * 55}\n SEED {seed}\n{'─' * 55}")
-
-     torch.manual_seed(seed)
-     torch.cuda.manual_seed_all(seed)
-     np.random.seed(seed)
-     random.seed(seed)
-
-     base_model = build_model()
-     base_model = train_baseline(base_model)
-     f1_mrpc_base = eval_f1(base_model, val_mrpc, metric_mrpc)
-     f1_sst2_base = eval_f1(base_model, val_sst2, metric_sst2)
-     del base_model; torch.cuda.empty_cache()
-
-     torch.manual_seed(seed)
-     torch.cuda.manual_seed_all(seed)
-     np.random.seed(seed)
-     random.seed(seed)
-
-     uni_model = build_model()
-     uni_model, usage, rank_trace = train_orbital(uni_model)
-     f1_mrpc_uni = eval_f1(uni_model, val_mrpc, metric_mrpc)
-     f1_sst2_uni = eval_f1(uni_model, val_sst2, metric_sst2)
-
-     er = eff_rank(usage)
-     saving = 1 - er / 16
-     transitions = sum(1 for i in range(1, len(rank_trace)) if rank_trace[i] != rank_trace[i-1])
-
-     print(f"\n {'':30s} {'BASELINE':>10s} {'ORBITAL':>10s}")
-     print(f" {'─' * 55}")
-     print(f" {'MRPC F1 (retention)':30s} {f1_mrpc_base:10.3f} {f1_mrpc_uni:10.3f}")
-     print(f" {'SST-2 Acc (new task)':30s} {f1_sst2_base:10.3f} {f1_sst2_uni:10.3f}")
-     print(f"\n Orbital: eff_rank={er:.1f} saving={saving*100:.0f}% transitions={transitions}")
-
-     results.append({
-         'f1_mrpc_base': f1_mrpc_base, 'f1_sst2_base': f1_sst2_base,
-         'f1_mrpc_uni': f1_mrpc_uni, 'f1_sst2_uni': f1_sst2_uni,
-         'eff_rank': er, 'saving': saving
-     })
-     del uni_model; torch.cuda.empty_cache()
-
- # ── SUMMARY ─────────────────────────────────────────
-
- print(f"\n{'=' * 55}\n SUMMARY\n{'=' * 55}")
- mrpc_b = np.mean([r['f1_mrpc_base'] for r in results])
- mrpc_u = np.mean([r['f1_mrpc_uni'] for r in results])
- sst2_b = np.mean([r['f1_sst2_base'] for r in results])
- sst2_u = np.mean([r['f1_sst2_uni'] for r in results])
- er_avg = np.mean([r['eff_rank'] for r in results])
- sv_avg = np.mean([r['saving'] for r in results])
-
- print(f"\n {'MRPC F1':20s} {mrpc_b:.3f} → {mrpc_u:.3f}")
- print(f" {'SST-2 Acc':20s} {sst2_b:.3f} → {sst2_u:.3f}")
- print(f" {'Eff rank':20s} 16.0 → {er_avg:.1f}")
- print(f" {'Saving':20s} 0% → {sv_avg*100:.0f}%")
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
nested_lora.py DELETED
@@ -1,130 +0,0 @@
- """
- Nested LoRA — One Particle, Multiple Orbitals
- ===============================================
-
- Single LoRA adapter pair with dynamic rank via slicing.
- r4 ⊂ r8 ⊂ r16 — descending pauses dimensions, ascending resumes them.
- Zero cold start on transitions.
-
- This module is the "engine" — pure architecture, no control logic.
- Pair with OrbitalController for adaptive rank decisions.
-
- Author: Simona Vargiu
- License: Apache 2.0
- """
-
- import math
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
- from typing import List
-
-
- class NestedLoRALinear(nn.Module):
-     """
-     Single LoRA adapter with dynamic rank via slicing.
-
-     A single pair of matrices A(max_rank, in) and B(out, max_rank) is shared
-     across all rank levels. The active rank is controlled by slicing:
-
-         r=4  → A[:4, :],  B[:, :4]
-         r=8  → A[:8, :],  B[:, :8]
-         r=16 → A[:16, :], B[:, :16]
-
-     When descending from r=16 to r=4, dimensions 0-3 retain all learned
-     weights. Dimensions 4-15 are paused (no gradient), not destroyed.
-     When ascending back, they resume exactly where they left off.
-
-     Output is scaled by max_rank/active_rank to maintain consistent
-     magnitude across rank changes (analogous to alpha/r in standard LoRA).
-
-     Args:
-         linear: Original nn.Linear layer to wrap
-         max_rank: Maximum LoRA rank (default: 16)
-
-     Example:
-         >>> layer = NestedLoRALinear(original_linear, max_rank=16)
-         >>> layer.set_rank(4)    # use 4 dimensions
-         >>> out = layer(x)       # forward with r=4
-         >>> layer.set_rank(16)   # expand to full rank
-         >>> out = layer(x)       # forward with r=16, dimensions 0-3 unchanged
-     """
-
-     def __init__(self, linear: nn.Linear, max_rank: int = 16):
-         super().__init__()
-         self.linear = linear
-         self.max_rank = max_rank
-         self.active_rank = max_rank
-
-         # Freeze original weights
-         for p in self.linear.parameters():
-             p.requires_grad = False
-
-         # One particle: single A and B
-         self.lora_A = nn.Parameter(torch.empty(max_rank, linear.in_features))
-         self.lora_B = nn.Parameter(torch.zeros(linear.out_features, max_rank))
-
-         # Standard LoRA init: A = kaiming, B = zeros → initial delta = 0
-         nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
-
-     def set_rank(self, r: int):
-         """Set the active orbital. Must be <= max_rank."""
-         self.active_rank = min(r, self.max_rank)
-
-     def forward(self, x: torch.Tensor) -> torch.Tensor:
-         base = self.linear(x)
-         r = self.active_rank
-
-         h = F.linear(x, self.lora_A[:r, :])
-         delta = F.linear(h, self.lora_B[:, :r])
-
-         scale = self.max_rank / r
-         return base + delta * scale
-
-
- def inject_nested_lora(model: nn.Module, max_rank: int = 16) -> nn.Module:
-     """
-     Replace attention Linear layers with NestedLoRALinear.
-
-     Targets any nn.Linear whose full name contains "attention".
-     Original weights are frozen; only LoRA parameters are trainable.
-
-     Args:
-         model: PyTorch model
-         max_rank: Maximum LoRA rank
-
-     Returns:
-         Model with NestedLoRA injected
-     """
-     for name, module in list(model.named_modules()):
-         if isinstance(module, nn.Linear) and "attention" in name:
-             parent = model
-             *path, last = name.split(".")
-             for p in path:
-                 parent = getattr(parent, p)
-             setattr(parent, last, NestedLoRALinear(module, max_rank))
-     return model
-
-
- def set_rank(model: nn.Module, r: int):
-     """Set active rank on all NestedLoRALinear modules in the model."""
-     for m in model.modules():
-         if isinstance(m, NestedLoRALinear):
-             m.set_rank(r)
-
-
- def get_lora_params(model: nn.Module) -> List[nn.Parameter]:
-     """Get all LoRA parameters (for optimizer setup)."""
-     params = []
-     for m in model.modules():
-         if isinstance(m, NestedLoRALinear):
-             params.extend([m.lora_A, m.lora_B])
-     return params
-
-
- def count_params(model: nn.Module) -> dict:
-     """Count total, trainable, and LoRA parameters."""
-     total = sum(p.numel() for p in model.parameters())
-     trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
-     lora = sum(p.numel() for p in get_lora_params(model))
-     return {"total": total, "trainable": trainable, "lora": lora}
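The nesting property the deleted `NestedLoRALinear` relies on can be checked with NumPy alone: because every rank level shares one A/B factor pair, the r=4 update is exactly the r=16 computation restricted to the first four dimensions, up to the max_rank/r scaling. A sketch under that assumption, with hypothetical factor shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
in_f, out_f, max_rank = 8, 8, 16

# One shared "particle": a single A/B factor pair used by every rank level.
A = rng.normal(size=(max_rank, in_f))
B = rng.normal(size=(out_f, max_rank))

def lora_delta(x, r):
    # Update at active rank r: slice the shared factors and apply the
    # max_rank / r scaling, mirroring the deleted forward() pass.
    return (x @ A[:r, :].T) @ B[:, :r].T * (max_rank / r)

x = rng.normal(size=(2, in_f))

# Nesting check: the r=4 update equals the r=16 intermediate activations
# restricted to dimensions 0-3 (with r=4 scaling) — nothing is recomputed
# or reinitialized when the rank changes.
h16 = x @ A.T
d4_from_16 = h16[:, :4] @ B[:, :4].T * (max_rank / 4)
print(np.allclose(lora_delta(x, 4), d4_from_16))  # → True
```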
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
notebooks/mrpc_example.ipynb DELETED
@@ -1,165 +0,0 @@
- {
-   "cells": [
-     {
-       "cell_type": "markdown",
-       "metadata": {},
-       "source": [
-         "# Orbital LoRA - MRPC Benchmark Example\n",
-         "\n",
-         "**Expected:** performance parity with baseline + adaptive behavior\n"
-       ]
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "!pip install -q transformers datasets evaluate scikit-learn accelerate"
-       ],
-       "outputs": [],
-       "execution_count": null
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "import torch\n",
-         "from datasets import load_dataset\n",
-         "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
-         "from torch.utils.data import DataLoader\n",
-         "import evaluate\n",
-         "\n",
-         "import sys\n",
-         "sys.path.append('..')\n",
-         "\n",
-         "from nested_lora import inject_nested_lora\n",
-         "from orbital_controller import OrbitalController\n",
-         "from controller import set_rank\n",
-         "\n",
-         "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
-         "print(device)"
-       ],
-       "outputs": [],
-       "execution_count": null
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "dataset = load_dataset('glue','mrpc')\n",
-         "tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')\n",
-         "\n",
-         "def tok(x):\n",
-         "    return tokenizer(x['sentence1'], x['sentence2'], truncation=True, padding='max_length', max_length=128)\n",
-         "\n",
-         "train = dataset['train'].map(tok, batched=True)\n",
-         "val = dataset['validation'].map(tok, batched=True)\n",
-         "\n",
-         "train.set_format(type='torch', columns=['input_ids','attention_mask','label'])\n",
-         "val.set_format(type='torch', columns=['input_ids','attention_mask','label'])\n",
-         "\n",
-         "train_loader = DataLoader(train, batch_size=16, shuffle=True)\n",
-         "val_loader = DataLoader(val, batch_size=16)\n",
-         "\n",
-         "metric = evaluate.load('glue','mrpc')"
-       ],
-       "outputs": [],
-       "execution_count": null
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "def eval_model(model):\n",
-         "    model.eval()\n",
-         "    preds, labels = [], []\n",
-         "    with torch.no_grad():\n",
-         "        for b in val_loader:\n",
-         "            x=b['input_ids'].to(device)\n",
-         "            m=b['attention_mask'].to(device)\n",
-         "            y=b['label'].to(device)\n",
-         "            p=model(input_ids=x,attention_mask=m).logits.argmax(-1)\n",
-         "            preds.extend(p.cpu().numpy()); labels.extend(y.cpu().numpy())\n",
-         "    return metric.compute(predictions=preds,references=labels)['f1']"
-       ],
-       "outputs": [],
-       "execution_count": null
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "# BASELINE\n",
-         "model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)\n",
-         "model = inject_nested_lora(model,16).to(device)\n",
-         "set_rank(model,16)\n",
-         "\n",
-         "opt = torch.optim.AdamW(model.parameters(), lr=5e-5)\n",
-         "\n",
-         "for step,b in enumerate(train_loader):\n",
-         "    if step>200: break\n",
-         "    x=b['input_ids'].to(device); m=b['attention_mask'].to(device); y=b['label'].to(device)\n",
-         "    loss=model(input_ids=x,attention_mask=m,labels=y).loss\n",
-         "    loss.backward(); opt.step(); opt.zero_grad()\n",
-         "\n",
-         "f1_base = eval_model(model)\n",
-         "print('Baseline F1:', round(f1_base,3))"
-       ],
-       "outputs": [],
-       "execution_count": null
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "# ORBITAL\n",
-         "model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)\n",
-         "model = inject_nested_lora(model,16).to(device)\n",
-         "\n",
-         "ctrl = OrbitalController(warmup=10, stable_window=6)\n",
-         "set_rank(model,4)\n",
-         "\n",
-         "opt = torch.optim.AdamW(model.parameters(), lr=5e-5)\n",
-         "\n",
-         "for step,b in enumerate(train_loader):\n",
-         "    if step>200: break\n",
-         "    x=b['input_ids'].to(device); m=b['attention_mask'].to(device); y=b['label'].to(device)\n",
-         "    loss=model(input_ids=x,attention_mask=m,labels=y).loss\n",
-         "    loss.backward()\n",
-         "\n",
-         "    r = ctrl.step(loss.item())\n",
-         "    r = max(4,min(16,r))\n",
-         "    set_rank(model,r)\n",
-         "\n",
-         "    opt.step(); opt.zero_grad()\n",
-         "\n",
-         "f1_orb = eval_model(model)\n",
-         "print('Orbital F1:', round(f1_orb,3))"
-       ],
-       "outputs": [],
-       "execution_count": null
-     },
-     {
-       "cell_type": "code",
-       "metadata": {},
-       "source": [
-         "print('\\nBaseline:', round(f1_base,3))\n",
-         "print('Orbital:', round(f1_orb,3))\n",
-         "print('Delta:', round(f1_orb-f1_base,3))"
-       ],
-       "outputs": [],
-       "execution_count": null
-     }
-   ],
-   "metadata": {
-     "kernelspec": {
-       "display_name": "Python 3",
-       "language": "python",
-       "name": "python3"
-     },
-     "language_info": {
-       "name": "python"
-     }
-   },
-   "nbformat": 4,
-   "nbformat_minor": 4
- }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
orbital_controller.py DELETED
@@ -1,291 +0,0 @@
- """
- Orbital Controller — Trajectory Control with Memory
- =====================================================
-
- Closed-loop rank controller that adapts model capacity based on
- observed training stress. Works with any rank-adjustable system
- (NestedLoRA, adaptive LR, or API-based training).
-
- This module is the "intelligence" — pure control logic, no model code.
- Pair with NestedLoRA for the complete Unified-LoRA system.
-
- Author: Simona Vargiu
- License: Apache 2.0
- """
-
- import numpy as np
- from typing import Dict, List, Optional
-
-
- class OrbitalController:
-     """
-     Closed-loop trajectory controller for dynamic capacity adaptation.
-
-     Unlike threshold-based controllers that map stress to rank statically,
-     this implements orbital dynamics with memory:
-
-         Ascend:  stress detected → jump to higher orbital, push delta
-         Hold:    oscillating → stay, don't move
-         Descend: confirmed stable → pop delta, symmetric return
-
-     Each capacity increase is tracked on a stack and reversed only under
-     confirmed stability. This prevents premature compression (returning
-     too early) and oscillatory collapse (bouncing between ranks).
-
-     The stress signal and thresholds are adaptive — they auto-calibrate
-     to any model/task/loss scale without manual tuning.
-
-     Args:
-         ranks: Available capacity levels (default: [4, 8, 16])
-         warmup: Steps at max capacity to build EMA baseline
-         stable_window: Consecutive stable steps required for descent
-
-     Example:
-         >>> from nested_lora import inject_nested_lora, set_rank
-         >>> from orbital_controller import OrbitalController
-         >>>
-         >>> model = inject_nested_lora(model, max_rank=16)
-         >>> ctrl = OrbitalController()
-         >>>
-         >>> for step, batch in enumerate(loader):
-         ...     loss = model(**batch).loss
-         ...     new_rank = ctrl.step(loss.item())
-         ...     set_rank(model, new_rank)
-         ...     loss.backward()
-         ...     optimizer.step()
-     """
-
-     def __init__(
-         self,
-         ranks: Optional[List[int]] = None,
-         warmup: int = 10,
-         stable_window: int = 6,
-     ):
-         self.RANKS = ranks or [4, 8, 16]
-         self.warmup = warmup
-         self.stable_window = stable_window
-         self.reset()
-
-     def reset(self):
-         """Reset controller to initial state."""
-         self.rank = self.RANKS[-1]
-         self.orbit_stack = []
-         self.loss_ema = 0.0
-         self.prev_loss = None
-         self.phi_hist = []
-         self.stable_count = 0
-         self.step_count = 0
-         self.post_warmup = False
-
-         self.history = {
-             "rank": [],
-             "phi": [],
-             "stable_count": [],
-         }
-
-     # ── Stress signal ───────────────────────────────
-
-     def _compute_phi(self, loss: float) -> float:
-         """
-         Stress signal from loss trajectory.
-
-             φ = |loss - EMA| + 2.0 × max(0, loss - prev_loss)
-
-         Combines deviation from trend (general instability)
-         with spike detection (sudden deterioration).
-         """
-         self.loss_ema = 0.9 * self.loss_ema + 0.1 * loss
-         delta = abs(loss - self.loss_ema)
-         spike = max(0.0, loss - self.prev_loss) if self.prev_loss is not None else 0.0
-         self.prev_loss = loss
-         return delta + 2.0 * spike
-
-     def _thresholds(self):
-         """
-         Adaptive thresholds from running statistics.
-
-             t_stress = μ + 0.7σ  (above this → ascend)
-             t_stable = μ - 0.3σ  (below this → stability confirmed)
-
-         Auto-calibrates to loss scale. No manual tuning.
-         """
-         if len(self.phi_hist) < 10:
-             return 0.15, 0.04
-         recent = self.phi_hist[-40:]
-         mu = np.mean(recent)
-         sigma = np.std(recent) + 1e-8
-         t_stress = mu + 0.7 * sigma
-         t_stable = max(mu - 0.3 * sigma, 0.0)
-         return t_stress, t_stable
-
-     # ── Core logic ──────────────────────────────────
-
-     def _rank_index(self) -> int:
-         return self.RANKS.index(self.rank)
-
-     def step(self, loss: float) -> int:
-         """
-         Called once per training step. Returns the capacity level to use.
-
-         Args:
-             loss: Current step loss value
-
-         Returns:
-             int: Active rank (or capacity level) for next step
-         """
-         self.step_count += 1
-
-         # First step: initialize EMA
-         if self.prev_loss is None:
-             self.loss_ema = loss
-             self.prev_loss = loss
-             self._log(0.0)
-             return self.rank
-
-         phi = self._compute_phi(loss)
-         self.phi_hist.append(phi)
-
-         # Warmup: build baseline at max capacity
-         if self.step_count <= self.warmup:
-             self._log(phi)
-             return self.rank
-
-         # Transition: warmup → ground state
-         if not self.post_warmup:
-             self.post_warmup = True
-             self.rank = self.RANKS[0]
-             self.orbit_stack = []
-             self.stable_count = 0
-             self._log(phi)
-             return self.rank
-
-         t_stress, t_stable = self._thresholds()
-
-         # Stability counter
-         if phi <= t_stable:
-             self.stable_count += 1
-         elif phi > t_stress:
-             self.stable_count = 0
-         else:
-             self.stable_count = max(0, self.stable_count - 1)
-
-         # ASCEND: stress → jump to higher orbital
-         if phi > t_stress and self.rank < self.RANKS[-1]:
-             idx = self._rank_index()
-             new_idx = min(idx + 1, len(self.RANKS) - 1)
-             new_rank = self.RANKS[new_idx]
-             if new_rank != self.rank:
-                 self.orbit_stack.append(new_rank - self.rank)
-                 self.rank = new_rank
-             self.stable_count = 0
-             self._log(phi)
-             return self.rank
-
-         # DESCEND: confirmed stability → symmetric return
-         if self.stable_count >= self.stable_window and self.orbit_stack:
-             delta = self.orbit_stack.pop()
-             target = self.rank - delta
-             self.rank = min(self.RANKS, key=lambda r: abs(r - target))
-             self.rank = max(self.rank, self.RANKS[0])
-             self.stable_count = 0
-             self._log(phi)
-             return self.rank
-
-         # HOLD: neutral → don't move
-         self._log(phi)
-         return self.rank
-
-     # ── Introspection ───────────────────────────────
-
-     def _log(self, phi: float):
-         self.history["rank"].append(self.rank)
-         self.history["phi"].append(phi)
-         self.history["stable_count"].append(self.stable_count)
-
-     def get_state(self) -> Dict:
-         """Current controller state."""
-         return {
-             "rank": self.rank,
-             "step": self.step_count,
-             "orbit_stack": list(self.orbit_stack),
-             "stable_count": self.stable_count,
-             "phi": self.phi_hist[-1] if self.phi_hist else 0.0,
-         }
-
-     def get_history(self) -> Dict[str, list]:
-         """Complete training history."""
-         return self.history
-
-     def __repr__(self) -> str:
-         return (
-             f"OrbitalController(step={self.step_count}, rank={self.rank}, "
-             f"stack={self.orbit_stack}, stable={self.stable_count})"
-         )
-
-
- # ============================================================
- # CONVENIENCE: setup helper
- # ============================================================
-
- def setup_unified_lora(model, max_rank=16, ranks=None, warmup=10, stable_window=6):
-     """
-     One-call setup: inject NestedLoRA + create OrbitalController.
-
-     Args:
-         model: PyTorch model
-         max_rank: Maximum LoRA rank
-         ranks: Available rank levels
-         warmup: Controller warmup steps
-         stable_window: Steps of stability before descent
-
-     Returns:
-         (model, controller) tuple
-
-     Example:
-         >>> from orbital_controller import setup_unified_lora
-         >>> from nested_lora import set_rank
-         >>>
-         >>> model, ctrl = setup_unified_lora(model)
-         >>> for step, batch in enumerate(loader):
-         ...     loss = model(**batch).loss
-         ...     set_rank(model, ctrl.step(loss.item()))
-         ...     loss.backward(); optimizer.step(); optimizer.zero_grad()
-     """
-     from nested_lora import inject_nested_lora
-
-     model = inject_nested_lora(model, max_rank)
-     controller = OrbitalController(
-         ranks=ranks or [4, 8, 16],
-         warmup=warmup,
-         stable_window=stable_window,
-     )
-     return model, controller
-
-
- # ============================================================
- # DEMO
- # ============================================================
-
- if __name__ == "__main__":
-     print("Orbital Controller — Demo")
-     print("=" * 50)
-     print("Simulating: 30 stable → 10 shock → 30 recovery\n")
-
-     ctrl = OrbitalController(warmup=8, stable_window=5)
-
-     for step in range(70):
-         if step < 30:
-             loss = np.random.uniform(0.4, 0.6)
-         elif step < 40:
-             loss = np.random.uniform(1.5, 3.0)
-         else:
-             loss = np.random.uniform(0.3, 0.5)
-
-         rank = ctrl.step(loss)
-
-         if step % 5 == 0 or step == 30:
-             s = ctrl.get_state()
-             tag = "  <<<SHOCK" if step == 30 else ""
-             print(f"  [{step:3d}] rank={rank:2d}  phi={s['phi']:.3f}  stack={s['orbit_stack']}{tag}")
-
-     print(f"\nFinal: {ctrl}")
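The demo's shock scenario can be reproduced deterministically with a condensed re-implementation of the same ascend/hold/descend logic. This `MiniOrbitalController` is an illustrative sketch, not the repository class (history logging is omitted), using a flat loss of 0.5 followed by a hard jump to 3.0:

```python
import numpy as np

class MiniOrbitalController:
    # Condensed re-implementation of the deleted OrbitalController
    # (same phi signal and ascend/hold/descend rules; logging omitted).
    def __init__(self, ranks=(4, 8, 16), warmup=8, stable_window=5):
        self.RANKS = list(ranks)
        self.warmup, self.stable_window = warmup, stable_window
        self.rank = self.RANKS[-1]          # start at max capacity
        self.orbit_stack, self.phi_hist = [], []
        self.loss_ema, self.prev_loss = 0.0, None
        self.stable_count, self.step_count = 0, 0
        self.post_warmup = False

    def step(self, loss):
        self.step_count += 1
        if self.prev_loss is None:          # first step: seed the EMA
            self.loss_ema = self.prev_loss = loss
            return self.rank
        # Stress signal: deviation from trend + 2x any sudden jump.
        self.loss_ema = 0.9 * self.loss_ema + 0.1 * loss
        phi = abs(loss - self.loss_ema) + 2.0 * max(0.0, loss - self.prev_loss)
        self.prev_loss = loss
        self.phi_hist.append(phi)
        if self.step_count <= self.warmup:  # build baseline at max rank
            return self.rank
        if not self.post_warmup:            # one-time drop to ground state
            self.post_warmup = True
            self.rank = self.RANKS[0]
            return self.rank
        # Adaptive thresholds from recent stress statistics.
        if len(self.phi_hist) < 10:
            t_stress, t_stable = 0.15, 0.04
        else:
            recent = self.phi_hist[-40:]
            mu, sigma = np.mean(recent), np.std(recent) + 1e-8
            t_stress, t_stable = mu + 0.7 * sigma, max(mu - 0.3 * sigma, 0.0)
        if phi <= t_stable:
            self.stable_count += 1
        elif phi > t_stress:
            self.stable_count = 0
        else:
            self.stable_count = max(0, self.stable_count - 1)
        if phi > t_stress and self.rank < self.RANKS[-1]:   # ASCEND
            new_rank = self.RANKS[self.RANKS.index(self.rank) + 1]
            self.orbit_stack.append(new_rank - self.rank)
            self.rank = new_rank
            self.stable_count = 0
        elif self.stable_count >= self.stable_window and self.orbit_stack:  # DESCEND
            target = self.rank - self.orbit_stack.pop()
            self.rank = min(self.RANKS, key=lambda r: abs(r - target))
            self.stable_count = 0
        return self.rank                    # HOLD otherwise

# Deterministic shock: 30 flat steps at loss 0.5, then loss jumps to 3.0.
ctrl = MiniOrbitalController()
trace = [ctrl.step(0.5 if i < 30 else 3.0) for i in range(40)]
print(trace[29], trace[30], trace[31])  # → 4 8 16
```

With flat losses the controller sits in the ground orbital (r=4) after warmup; the shock drives two consecutive ascents, reaching r=16 one step after the jump.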
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
requirements.txt DELETED
@@ -1,6 +0,0 @@
- torch
- transformers
- datasets
- evaluate
- accelerate
- scikit-learn
 
 
 
 
 
 
 
unified_lora.py DELETED
@@ -1,14 +0,0 @@
- """
- Legacy Adaptive LoRA (Deprecated)
- ================================
-
- Early gradient-based adaptive rank prototype.
-
- Replaced by:
- - NestedLoRA (shared orbital architecture)
- - OrbitalController (stress-based closed-loop control)
-
- This file is kept for reference only.
-
- Status: deprecated
- """