HiMind committed on
Commit
299b8f3
·
verified ·
1 Parent(s): 5d6993d

Upload 5 files

Files changed (6)
  1. .gitattributes +1 -0
  2. LICENSE +197 -0
  3. PackedLLM.py +0 -0
  4. README.md +222 -5
  5. images/packedllm_architecture.png +3 -0
  6. requirements.txt +26 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ images/packedllm_architecture.png filter=lfs diff=lfs merge=lfs -text
LICENSE CHANGED
@@ -0,0 +1,197 @@
1
+ # PackedLLM Source-Available Research and Controlled Commercial Use License
2
+
3
+ PackedLicense v1.0
4
+ Copyright (c) 2026 Chance Brownfield
5
+
6
+ This license is intended to govern the use, modification, distribution, and commercial licensing of the PackedLLM system and its protected modules, including but not limited to GATOR, WebSearchModule, and CodeBox.
7
+
8
+
9
+ ---
10
+
11
+ ## 1. Definitions
12
+
13
+ For purposes of this License:
14
+
15
+ **“Licensed Work”** means the software, source code, model weights, checkpoints, configurations, prompt templates, documentation, build artifacts, scripts, and related materials distributed by the Licensor under this License, including all modifications and derivative works of those materials.
16
+
17
+ **“PackedLLM System”** means the orchestration framework, routing logic, checkpoint packing and loading format, expert dispatch system, runtime coordination logic, and related components distributed as part of or alongside the Licensed Work.
18
+
19
+ **“Protected Modules”** means the following components and any modifications or derivative works thereof:
20
+
21
+ * **GATOR**, including memory bank, semantic tree structures, retrieval logic, profile storage, command registry, and related memory orchestration code;
22
+ * **WebSearchModule**, including multi-engine search, page fetching, content extraction, crawling, ranking, summarization, and related web-research code;
23
+ * **CodeBox**, including sandbox execution, asset registry, virtual environment management, DAG pipeline execution, loader injection, and related code-execution infrastructure.
24
+
25
+ **“Non-Commercial Use”** means use that is not primarily intended for or directed toward commercial advantage or monetary compensation, including personal, academic, research, hobby, and internal evaluation use.
26
+
27
+ **“Commercial Use”** means any use, deployment, distribution, offering, hosting, embedding, sublicensing, or other exploitation of the Licensed Work or any derivative work that is directly or indirectly intended for or associated with commercial advantage, monetary compensation, paid access, enterprise use, SaaS, resale, bundling, or product incorporation.
28
+
29
+ **“Written Commercial License”** means a separate written agreement signed by Licensor expressly authorizing Commercial Use of the Licensed Work or any portion of it.
30
+
31
+ **“Derivative Work”** means any work based upon, adapted from, translated from, modified from, or otherwise incorporating the Licensed Work or any Protected Module.
32
+
33
+ ---
34
+
35
+ ## 2. Grant of Rights for Non-Commercial Use
36
+
37
+ Subject to the terms and conditions of this License, Licensor grants you a worldwide, non-exclusive, non-transferable, royalty-free license to:
38
+
39
+ 1. use the Licensed Work for Non-Commercial Use;
40
+ 2. copy the Licensed Work for Non-Commercial Use;
41
+ 3. modify the Licensed Work for Non-Commercial Use;
42
+ 4. create Derivative Works for Non-Commercial Use; and
43
+ 5. distribute the Licensed Work or Derivative Works solely for Non-Commercial Use and only if all recipients are bound by this License.
44
+
45
+ No rights are granted except as expressly stated in this License.
46
+
47
+ ---
48
+
49
+ ## 3. Commercial Use Requires Written Permission
50
+
51
+ Commercial Use of the Licensed Work, the PackedLLM System, or any Protected Module is prohibited unless you first obtain a Written Commercial License from Licensor.
52
+
53
+ Without limiting the foregoing, Commercial Use includes:
54
+
55
+ * offering the Licensed Work as a service, hosted endpoint, API, or subscription product;
56
+ * incorporating the Licensed Work into a paid or revenue-generating product;
57
+ * redistributing the Licensed Work to paying customers or enterprise users;
58
+ * using the Licensed Work in internal business operations where the Licensed Work is a material part of a revenue-producing workflow, product, or service;
59
+ * sublicensing the Licensed Work or any Protected Module for commercial purposes.
60
+
61
+ Licensor may approve, deny, condition, or limit Commercial Use in Licensor’s sole discretion.
62
+
63
+ Any Commercial Use without a Written Commercial License is a material breach of this License.
64
+
65
+ ---
66
+
67
+ ## 4. Protected Modules and No Extraction or Standalone Reuse
68
+
69
+ The Protected Modules are expressly identified as core and protected components of the Licensed Work.
70
+
71
+ You may not, without a Written Commercial License:
72
+
73
+ 1. extract, isolate, copy, port, transplant, or repackage any Protected Module as a standalone library, standalone service, plugin, product, or subsystem;
74
+ 2. incorporate any Protected Module into another system, agent framework, model runtime, or commercial product;
75
+ 3. distribute a derivative implementation of any Protected Module under a substantially similar interface, architecture, workflow, or operational behavior where such implementation is based on or derived from the Licensed Work;
76
+ 4. remove, conceal, or fragment the Protected Modules for the purpose of avoiding the terms of this License.
77
+
78
+ For avoidance of doubt, independent reimplementation from scratch without copying code, checkpoints, configuration, documentation, or other protectable expression is not governed by this License; however, any direct use of the Licensed Work or its protectable expression remains subject to this License.
79
+
80
+ ---
81
+
82
+ ## 5. Attribution and Credit Requirements
83
+
84
+ If you use, distribute, modify, or publish any part of the Licensed Work, you must retain and reproduce all copyright notices, license notices, and attribution notices contained in the Licensed Work.
85
+
86
+ In addition, any public distribution, publication, presentation, product documentation, or user-facing deployment that references the Licensed Work must include prominent attribution substantially in the following form:
87
+
88
+ > “Includes or is based on PackedLLM, including the GATOR memory system, WebSearchModule, CodeBox sandboxing system, and the PackedLLM checkpoint packing architecture.”
89
+
90
+ You may not remove, obscure, or alter this attribution in a way that misrepresents authorship or origin.
91
+
92
+ ---
93
+
94
+ ## 6. Modification and Redistribution
95
+
96
+ You may modify the Licensed Work and create Derivative Works for Non-Commercial Use, provided that:
97
+
98
+ 1. you clearly mark your modifications;
99
+ 2. you retain this License and all notices;
100
+ 3. you do not misrepresent modified work as the original;
101
+ 4. you do not remove attribution required by this License; and
102
+ 5. you do not use the modifications for Commercial Use without a Written Commercial License.
103
+
104
+ Redistribution is permitted only if the recipient receives the Licensed Work under this same License and only for Non-Commercial Use unless otherwise authorized in writing by Licensor.
105
+
106
+ ---
107
+
108
+ ## 7. No Trademark License
109
+
110
+ This License does not grant any right to use Licensor’s trade names, trademarks, logos, product names, or branding, including the name “PackedLLM,” except as necessary to comply with the attribution requirements of Section 5.
111
+
112
+ Any goodwill arising from permitted use of the trademarks or product names shall inure to the benefit of Licensor.
113
+
114
+ ---
115
+
116
+ ## 8. Optional Commercial Terms; Revenue Share
117
+
118
+ If Licensor grants a Written Commercial License, Licensor may impose additional terms, including without limitation:
119
+
120
+ * a fee schedule;
121
+ * per-seat, per-instance, or per-deployment pricing;
122
+ * revenue share obligations;
123
+ * reporting and audit obligations;
124
+ * attribution requirements;
125
+ * support and maintenance terms;
126
+ * field-of-use restrictions;
127
+ * geographic restrictions;
128
+ * termination rights.
129
+
130
+ Any such terms must be set out in a separate written agreement signed by Licensor and the commercial user.
131
+
132
+ ---
133
+
134
+ ## 9. Patent Rights
135
+
136
+ To the extent Licensor holds patent rights that are necessarily infringed by the Licensed Work as distributed by Licensor, Licensor grants you a non-exclusive, worldwide, royalty-free patent license to practice those patent claims solely for Non-Commercial Use under this License.
137
+
138
+ This patent grant does not apply to Commercial Use unless expressly authorized in a Written Commercial License.
139
+
140
+ ---
141
+
142
+ ## 10. No Warranty
143
+
144
+ The Licensed Work is provided “AS IS,” without warranty of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, non-infringement, title, or quiet enjoyment.
145
+
146
+ You are solely responsible for evaluating the suitability, safety, legality, and correctness of the Licensed Work for your use.
147
+
148
+ ---
149
+
150
+ ## 11. Limitation of Liability
151
+
152
+ To the maximum extent permitted by law, Licensor shall not be liable for any damages arising out of or related to the Licensed Work, including without limitation direct, indirect, incidental, special, consequential, exemplary, or punitive damages, or loss of profits, data, business opportunity, goodwill, or reputation.
153
+
154
+ ---
155
+
156
+ ## 12. Termination
157
+
158
+ This License automatically terminates upon any breach of its terms by you.
159
+
160
+ Upon termination, you must immediately cease all use, copying, modification, distribution, and Commercial Use of the Licensed Work except to the extent otherwise permitted by applicable law.
161
+
162
+ Sections relating to attribution, trademark, warranty disclaimer, limitation of liability, and any obligations that by their nature should survive termination shall survive termination.
163
+
164
+ ---
165
+
166
+ ## 13. Reservation of Rights
167
+
168
+ All rights not expressly granted to you under this License are reserved by Licensor.
169
+
170
+ No implied rights are granted by estoppel, implication, exhaustion, or otherwise.
171
+
172
+ ---
173
+
174
+ ## 14. Governing Terms for Commercial Access
175
+
176
+ Any Commercial Use requires a separate written agreement. In the event of conflict between this License and a Written Commercial License, the Written Commercial License controls solely with respect to the covered commercial use and only to the extent of the conflict.
177
+
178
+ ---
179
+
180
+ ## 15. Entire License
181
+
182
+ This License constitutes the entire public license governing Non-Commercial Use of the Licensed Work unless superseded by a separate written commercial agreement signed by Licensor.
183
+
184
+ ---
185
+
186
+ ## 16. Contact for Commercial Licensing
187
+
188
+ To request Commercial Use authorization, contact:
189
+
190
+ **Licensor:** Chance Brownfield
191
+ **Email:** HiMindAi@proton.me
192
+
193
+ ---
194
+
195
+ ## 17. Acceptance
196
+
197
+ By using, copying, modifying, or distributing the Licensed Work, you agree to be bound by the terms of this License.
PackedLLM.py ADDED
The diff for this file is too large to render. See raw diff
 
README.md CHANGED
@@ -1,5 +1,222 @@
1
- ---
2
- license: other
3
- license_name: packedlicensev1.0
4
- license_link: LICENSE
5
- ---
1
+ ---
2
+ license: other
3
+ language:
4
+ - en
5
+ - zh
6
+ tags:
7
+ - routing-of-experts
8
+ - compound-ai
9
+ - multi-expert
10
+ - agentic
11
+ - code-execution
12
+ - web-search
13
+ - persistent-memory
14
+ - persona
15
+ - chain-of-thought
16
+ - llm-orchestration
17
+ - llama-cpp
18
+ - gguf
19
+ - packed-model
20
+ pipeline_tag: text-generation
21
+ ---
22
+ # PackedLLM
23
+
24
+ **~10B total parameters · ~3B active per inference · Routing-of-Experts (RoE) architecture**
25
+
26
+ PackedLLM is a self-contained multi-expert language model system built around a **Routing-of-Experts (RoE)** mechanism. Rather than mixing expert outputs at the token level inside a shared transformer (Mixture-of-Experts), PackedLLM routes each request, and each stage of a multi-stage reasoning pipeline, to a dedicated, fully independent specialist model. At most one or two experts are active simultaneously, keeping peak memory around 3B parameters regardless of the 10B total footprint.
27
+
28
+ The system runs entirely on consumer hardware via llama.cpp, persists its full state to a single ZIP checkpoint, and integrates persistent vector memory, sandboxed Python execution, and multi-engine web search as first-class pipeline citizens.
29
+
30
+ ---
31
+
32
+ ## Architecture overview
33
+
34
+ ```
35
+ PackedLLMRunner ← user-facing shell: load, warmup, lifecycle
36
+      │
37
+ PackedLLM (PackedLLM.pt) ← 9-stage orchestration pipeline
38
+      │
39
+ Expert dispatch layer ← 10 specialist models, one active at a time
40
+      │
41
+ ┌────┴─────────────────────────────┐
42
+ │ GATOR/MemoryBank   CodeBox   Web │ ← integrated modules
43
+ └────┬─────────────────────────────┘
44
+      │
45
+ PackedLM (LM.pt) ← llama.cpp inference engine + ExpertHandles
46
+ ```
47
+ <img src="images/packedllm_architecture.png" alt="PackedLLM Architecture" width="1000">
48
+
49
+ ---
50
+
51
+ ## How it differs from MoE
52
+
53
+ | | Standard MoE (Mixtral, DeepSeek) | PackedLLM RoE |
54
+ |---|---|---|
55
+ | **Routing granularity** | Per-token, inside every transformer layer | Per-task and per-pipeline-stage |
56
+ | **What gets routed** | FFN sub-modules sharing one transformer | Separate, fully independent specialist LLMs |
57
+ | **Parameters active** | Top-K experts × FFN size, across all layers | One expert at a time (~3B peak) |
58
+ | **Router mechanism** | Learned linear gating vector | `HeadExpert`, a full LLM returning JSON |
59
+ | **Experts share weights?** | Partly: attention layers are shared by all experts | No, complete independence |
60
+ | **Pipeline** | Single transformer forward pass | 9-stage: plan → route → execute → synthesize → persona → review |
61
+
62
+ The closest related work is **Composition of Experts** (Chai et al., 2024, arXiv:2412.01868), which also routes at the input level to full LLM models. PackedLLM extends this with a multi-stage orchestration pipeline, per-stage retry/detour recovery, integrated memory and execution modules, affective state modelling, and a character persona layer, none of which appear in prior systems of this type.
63
+
64
+ ---
65
+
66
+ ## Pipeline stages
67
+
68
+ Every call to `forward()` passes through these stages in order:
69
+
70
+ | Stage | Expert | Temperature | What it does |
71
+ |---|---|---|---|
72
+ | 1. Plan goal | HeadExpert | 0.2 | Parse intent, tone, routing flags (needs_web / needs_action / needs_vision) |
73
+ | 2. Consult memory | GATOR | n/a | Match registered commands; retrieve relevant memory |
74
+ | 3. Build route | HeadExpert | 0.5 | Generate ordered list of expert steps as JSON |
75
+ | 4. Execute route | various | per-expert | Dispatch each step; retry / detour / skip on failure |
76
+ | 5. Synthesize base | HeadExpert | 1.0 | Combine step outputs into a persona-free prose answer |
77
+ | 6. Affective state | AffectExpert | 0.5 | Generate bot emotional + physical state JSON |
78
+ | 7. Apply persona | RoleExpert | 1.0 | Rewrite base response in character |
79
+ | 8. Review | HeadExpert | 0.0 | Accept / revise / reject; extract memory facts; profile updates |
80
+ | 9. Finalize | GATOR | n/a | Write memory, update user/bot profiles |
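
The retry / detour / skip behaviour of stage 4 can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in `PackedLLM.py`: `Step`, `execute_route`, `run_expert`, and the `detours` mapping are hypothetical names, and treating any exception as a failed step is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One entry of the route (hypothetical shape, not the real schema)."""
    expert: str
    prompt: str


def execute_route(
    steps: List[Step],
    run_expert: Callable[[str, str], str],
    detours: Optional[Dict[str, str]] = None,
    max_retries: int = 1,
) -> List[dict]:
    """Run steps in order: retry the same expert, then detour, then skip."""
    detours = detours or {}
    results = []
    for step in steps:
        # Attempt order: the primary expert (1 + max_retries times), then its detour.
        candidates = [step.expert] * (1 + max_retries)
        if step.expert in detours:
            candidates.append(detours[step.expert])
        outcome = None
        last_error = ""
        for name in candidates:
            try:
                output = run_expert(name, step.prompt)
            except Exception as exc:  # assumption: any exception is a failed attempt
                last_error = str(exc)
                continue
            outcome = {"expert": name, "status": "ok", "output": output}
            break
        if outcome is None:
            # Every attempt failed: record the skip and continue with the next step.
            outcome = {"expert": step.expert, "status": "skipped", "error": last_error}
        results.append(outcome)
    return results
```

Keeping a per-step status in the result list lets a later synthesis stage see which steps were skipped instead of failing the whole request.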
81
+
82
+ ---
83
+
84
+ ## Expert roster
85
+
86
+ | Expert | Active params | Role | Notes |
87
+ |---|---|---|-------------------------------------------|
88
+ | HeadExpert | ~3B | Orchestrator, router, planner, synthesizer, reviewer | Most-called expert |
89
+ | LogicExpert | ~1B | Structured reasoning; deep-think CoT; action planning/repair | raw completion with `<think>` blocks |
90
+ | CodeExpert | ~1B | Python script generation for action pipeline | Temperature 0.0; raw code only, no prose |
91
+ | MathExpert | ~1B | Quantitative reasoning | Post-processes CJK spans; deduplicates repeated lines |
92
+ | AffectExpert | ~0.5B | Emotional state; step quality evaluation | Used as both emotion classifier and pass/fail judge |
93
+ | RoleExpert | ~0.5B | Persona rewriting in character | RP style chat format |
94
+ | CreativeExpert | ~1B | Writing and stylistic generation | High temperature defaults (0.9) |
95
+ | VisionExpert | ~1B | Multimodal image understanding | CLIP projector; local images → data URI |
96
+ | ToolExpert | ~0.5B | Function-call generation | Outputs `{"tool_calls": [...]}` JSON |
97
+ | TranslationExpert | ~300M | Chinese → English | seq2seq model, not an LLM; gated by a Chinese-character regex |
98
+
99
+ **Total: ~10B · Peak active: ~3B**
100
+
101
+ ---
102
+
103
+ ## Forward modes
104
+
105
+ ### Standard (full pipeline)
106
+ ```python
107
+ bot.chat("What is the compound interest on $5000 at 4% over 10 years?")
108
+ ```
109
+ All 9 stages. Memory read/write. Web and action pipelines if needed.
110
+
111
+ ### Fast think (minimum latency)
112
+ ```python
113
+ bot.chat("What time is it in Tokyo?", fast_think=True)
114
+ ```
115
+ Skips planning, routing, memory, web, action, affective state, review. HeadExpert answers directly; RoleExpert applies persona if a bot profile exists. Maximum 2 LLM calls.
116
+
117
+ ### Deep think (CoT scaffolding)
118
+ ```python
119
+ bot.chat("Design a Python caching decorator with TTL support.", deep_think=True)
120
+ ```
121
+ Before each pipeline stage, `LogicExpert` generates `<think>...</think>` blocks scoped to that stage's specific task and output contract. These blocks are prepended to the stage's prompt as if the executing expert had already done that prior reasoning. Blocks are cached within a single `forward()` call. Translation is excluded (not an LLM).
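
A minimal sketch of the per-call caching described above, assuming one cached block per stage name. `ThinkCache`, `scaffold`, and the `generate` callable (a stand-in for a `LogicExpert` call) are hypothetical names, not the real API.

```python
from typing import Callable, Dict


class ThinkCache:
    """Caches one reasoning block per stage for the lifetime of a single call."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate          # stand-in for a LogicExpert call
        self._blocks: Dict[str, str] = {}  # stage name -> "<think>...</think>"

    def scaffold(self, stage: str, task: str, prompt: str) -> str:
        """Return the stage prompt with the cached reasoning block prepended."""
        if stage not in self._blocks:
            reasoning = self._generate(stage, task)
            self._blocks[stage] = f"<think>{reasoning}</think>"
        return f"{self._blocks[stage]}\n{prompt}"
```

Creating a fresh `ThinkCache` at the start of each `forward()` call would give the "cached within a single call" behaviour, since nothing survives between calls.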
122
+
123
+ ---
124
+
125
+ ## Integrated modules
126
+
127
+ ### MemoryBank (`GATOR.pt`)
128
+ A multi-tree semantic store built on `PackedTree`, a custom retrieval structure that combines embeddings with KMeans clustering. Trees: knowledge, conversation, user profiles, bot profiles, commands, assets, telemetry. Hybrid retrieval scoring: 75% semantic similarity + 20% keyword overlap + 5% importance metadata. Embedding model: Jina Embeddings v3 (GGUF, stored inside the checkpoint). GATOR's own action planner uses `HeadExpert` to decide which memory operations to run. Also contains `DesktopControl` (OS automation) and `CommandRegistry` (text-to-action macros).
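
The 75/20/5 weighting can be written out as a short sketch. Only the three weights come from the description above; how each component is normalised (cosine similarity clipped at zero, keyword overlap as the fraction of query tokens found in the text, importance clamped to [0, 1]) is an assumption, and the function names are hypothetical.

```python
import math
import re
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query: str, text: str) -> float:
    """Fraction of distinct query tokens that also occur in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def hybrid_score(query, text, query_vec, doc_vec, importance=0.0) -> float:
    """Weighted sum: 75% semantic, 20% keyword overlap, 5% importance."""
    semantic = max(0.0, cosine(query_vec, doc_vec))   # clip negatives (assumption)
    keywords = keyword_overlap(query, text)
    importance = min(1.0, max(0.0, importance))       # clamp to [0, 1] (assumption)
    return 0.75 * semantic + 0.20 * keywords + 0.05 * importance
```

With every component in [0, 1] the score also stays in [0, 1], so candidates from different trees remain comparable.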
129
+
130
+ ### CodeBox (`CodeBox.pt`)
131
+ Persistent Python sandbox with isolated virtual environment management, SHA256-verified asset registry, loader injection (`from _codebox_loader import load_asset` inside sandboxed code), DAG pipeline runner with `$var` reference passing between steps, LRU runner cache for expensive models, and hard RAM/CPU kill thresholds enforced by a monitoring thread.
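
To make the `$var` reference passing concrete, here is a minimal sketch of a DAG runner. It is not the CodeBox implementation: the step schema (`fn`, `args`), the `run_pipeline` name, and the use of the standard-library `graphlib` module (Python 3.9+) are assumptions chosen for illustration.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict


def _is_ref(arg: Any) -> bool:
    """A string argument starting with '$' names the output of another step."""
    return isinstance(arg, str) and arg.startswith("$")


def run_pipeline(
    steps: Dict[str, dict],
    funcs: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Run steps in dependency order, substituting $name with earlier outputs."""
    # Each step depends on the steps whose outputs it references.
    deps = {
        name: {arg[1:] for arg in spec["args"] if _is_ref(arg)}
        for name, spec in steps.items()
    }
    outputs: Dict[str, Any] = {}
    # static_order() yields dependencies first and raises CycleError on cycles.
    for name in TopologicalSorter(deps).static_order():
        spec = steps[name]
        args = [outputs[arg[1:]] if _is_ref(arg) else arg for arg in spec["args"]]
        outputs[name] = funcs[spec["fn"]](*args)
    return outputs
```

For example, a step declared as `{"fn": "add", "args": ["$a", "$b"]}` runs only after steps `a` and `b` have produced their outputs.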
132
+
133
+ ### Web (`WebSearch.pt`)
134
+ Three search engines (DuckDuckGo HTML, Google, ResultHunter) with embedding-ranked candidate deduplication. Content extraction tries 10 methods: YouTube transcripts → trafilatura → boilerpy3 → readability → newspaper3k → goose3 → inscriptis → lxml → BeautifulSoup → visible text. PDF via PyMuPDF. Summarization via DistilBART. Runs in a separate spawned process; communicates via `multiprocessing.Queue`. Serializes safely: live process handles are stripped on save.
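
The extraction chain is a first-success fallback. The sketch below shows that pattern with plain callables rather than the real libraries, so it makes no claims about their APIs; `extract_first`, `visible_text`, and the `min_chars` threshold are hypothetical, and only the last-resort "visible text" step is implemented, using the standard library.

```python
from html.parser import HTMLParser
from typing import Callable, Iterable, Optional, Tuple

Extractor = Callable[[str], Optional[str]]


class _VisibleText(HTMLParser):
    """Collects text nodes, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Last-resort extractor: all visible text joined by single spaces."""
    parser = _VisibleText()
    parser.feed(html)
    return " ".join(parser.parts)


def extract_first(
    html: str,
    extractors: Iterable[Tuple[str, Extractor]],
    min_chars: int = 200,
) -> Tuple[Optional[str], str]:
    """Return (method name, text) from the first extractor that yields enough text."""
    for name, extractor in extractors:
        try:
            text = extractor(html)
        except Exception:  # a failing extractor simply hands over to the next one
            continue
        if text and len(text.strip()) >= min_chars:
            return name, text.strip()
    return None, ""
```

Ordering the list from most to least specialised means the generic fallback only runs when every earlier method fails or returns too little text.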
135
+
136
+ ---
137
+
138
+ ## Usage
139
+
140
+ ### Basic
141
+ ```python
142
+ from PackedLLM import PackedLLMRunner
143
+
144
+ bot = PackedLLMRunner("PackedLLM.pt", bot_id="pip", user_id="alice")
145
+ print(bot.chat("Explain gradient descent in one paragraph."))
146
+ ```
147
+
148
+ ### Expert shortcuts (bypass full pipeline)
149
+ ```python
150
+ bot.creative("Write a haiku about a robot discovering music.")
151
+ bot.code("Implement binary search in Python with comments.")
152
+ bot.math("Solve: integral of x² · sin(x) dx")
153
+ bot.logic("All A are B. Some B are C. What follows?", mode="deep_then_answer")
154
+ bot.translate("人工智能正在改变世界")
155
+ bot.web("Latest developments in solid-state batteries?")
156
+ bot.action("Compute compound interest on $5000 at 4% over 10 years; save to report.txt")
157
+ ```
158
+
159
+ ### Memory
160
+ ```python
161
+ bot.memory_store("User prefers concise answers under 100 words.")
162
+ results = bot.memory_recall("answer preferences", top_k=3)
163
+ bot.set_user_profile({"name": "Alice", "expertise": "ML"})
164
+ bot.set_bot_profile({"character_card": "You are Pip, a direct and slightly sarcastic assistant."})
165
+ ```
166
+
167
+ ### Lifecycle
168
+ ```python
169
+ bot.unload_expert("vision_expert") # free VRAM; reloads lazily on next use
170
+ bot.reload_expert("code_expert") # hot-reload after checkpoint update
171
+ print(bot.status()) # full system diagnostic
172
+
173
+ # Context manager (auto-unload on exit)
174
+ with PackedLLMRunner("PackedLLM.pt", bot_id="pip", user_id="alice") as bot:
175
+     print(bot.chat("Summarise the Pythagorean theorem."))
176
+ ```
177
+
178
+ ---
179
+
180
+ ## Checkpointing
181
+
182
+ `PackedLLM.pt` is a ZIP archive containing:
183
+ - `manifest.pt`: all metadata, profiles, hardware state, embedded source code
184
+ - `lm_chunk_N.bin`: model weights in 32 MB streaming chunks
185
+ - `mem_chunk_N.bin`: GATOR memory store chunks
186
+ - `web_chunk_N.bin`: WebSearch module chunks
187
+ - `box_chunk_N.bin`: CodeBox chunks
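
Because the checkpoint is a ZIP archive, its layout can be inspected with the standard library without loading any weights. The sketch below groups entries by the prefix before `_chunk_`; `summarize_checkpoint` is a hypothetical helper, and the entry naming is assumed to match the list above.

```python
import zipfile
from collections import defaultdict


def summarize_checkpoint(path_or_file) -> dict:
    """Count files and uncompressed bytes per entry group (lm, mem, web, box, ...)."""
    groups = defaultdict(lambda: {"files": 0, "bytes": 0})
    with zipfile.ZipFile(path_or_file) as archive:
        for info in archive.infolist():
            name = info.filename
            # "lm_chunk_3.bin" -> "lm"; other entries keep their full name.
            prefix = name.split("_chunk_")[0] if "_chunk_" in name else name
            groups[prefix]["files"] += 1
            groups[prefix]["bytes"] += info.file_size
    return dict(groups)
```

Calling `summarize_checkpoint("PackedLLM.pt")` only reads the archive's central directory, so it stays fast even for multi-gigabyte checkpoints.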
188
+
189
+
190
+ ---
191
+
192
+ ## Hardware
193
+
194
+ PackedLM detects and uses CUDA, Apple Metal (MPS), WebGPU, or CPU automatically via `HardwareProbe`. For each expert, `_plan_offload()` estimates the GGUF file size and computes how many transformer layers can fit in free VRAM (with a 15% safety margin for CUDA, 40% for WebGPU). If VRAM is insufficient for a full offload, layers are split proportionally between GPU and CPU.
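
The layer-split arithmetic can be sketched as follows. This is not the body of `_plan_offload()`: the `plan_offload` name, the assumption that all layers are equally sized, and the reading of the safety margin as a fraction of free VRAM held back are all illustrative assumptions.

```python
def plan_offload(
    model_bytes: int,
    n_layers: int,
    free_vram_bytes: int,
    safety_margin: float = 0.15,
) -> int:
    """Return how many layers fit in VRAM after reserving a safety margin."""
    if n_layers <= 0 or model_bytes <= 0:
        raise ValueError("model_bytes and n_layers must be positive")
    usable = free_vram_bytes * (1.0 - safety_margin)  # VRAM we allow ourselves to use
    per_layer = model_bytes / n_layers                # assumes equally sized layers
    return max(0, min(n_layers, int(usable // per_layer)))
```

For a 4 GiB model with 32 layers and 2 GiB of free VRAM, a 15% margin leaves room for 13 layers on the GPU and the remaining 19 on the CPU. With llama-cpp-python, a count like this would typically be passed as the `n_gpu_layers` argument of `Llama`.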
195
+
196
+ ---
197
+
198
+ ## Citation
199
+
200
+ ```bibtex
201
+ @software{packedllm2026,
202
+ author = {Chance Brownfield},
203
+ title = {PackedLLM: A Routing-of-Experts System with LLM-Orchestrated Execution Pipeline},
204
+ year = {2026},
205
+ note = {RoE architecture: task-level routing to fully independent specialist LLMs.
206
+ Distinct from token-level Mixture-of-Experts.
207
+ Integrates persistent vector memory (GATOR), sandboxed Python execution (CodeBox),
208
+ and multi-engine web search in a 9-stage orchestration pipeline.}
209
+ }
210
+ ```
211
+
212
+ ---
213
+
214
+ ## License
215
+
216
+ This project is licensed under PackedLicense v1.0.
217
+
218
+ Free for personal, educational, research, and other non-commercial use.
219
+
220
+ Commercial use requires prior written authorization.
221
+
222
+ The GATOR, WebSearchModule, and CodeBox components are protected under this license and may not be extracted, redistributed, or commercially reused without authorization.
images/packedllm_architecture.png ADDED

Git LFS Details

  • SHA256: 5535d2cd255c25372e423a2b5ad69617667093e43e3aa3e3243c8e00e3fd54b7
  • Pointer size: 131 Bytes
  • Size of remote file: 667 kB
requirements.txt ADDED
@@ -0,0 +1,26 @@
1
+ numpy
2
+ psutil
3
+ requests
4
+ torch
5
+ torchvision
6
+ torchaudio
7
+ transformers
8
+ sentence-transformers
9
+ huggingface_hub
10
+ safetensors
11
+ spacy
12
+ Pillow
13
+ PyMuPDF
14
+ scikit-learn
15
+ beautifulsoup4
16
+ lxml
17
+ trafilatura
18
+ readability-lxml
19
+ newspaper3k
20
+ goose3
21
+ boilerpy3
22
+ inscriptis
23
+ llama-cpp-python
24
+ wgpu
25
+ youtube-transcript-api
26
+ tqdm