Ex0bit committed
Commit 208eb59 · 1 Parent(s): c08cfce

Upload complete JIT LoRA research: paper, source code, tests, and figures
.gitattributes CHANGED
@@ -33,3 +33,15 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-connected.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-full-conversation.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-general-knowledge.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-interface.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-post-training.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-pre-training.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-query-typing.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-recall-pemberton.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-recall-thunderbiscuit.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-recall-zelnorite.png filter=lfs diff=lfs merge=lfs -text
+ figures/jarvis-settings-mlx.png filter=lfs diff=lfs merge=lfs -text
+ paper.pdf filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,12 @@
node_modules/
__pycache__/
*.pyc
.DS_Store
paper-page-*.png
*.egg-info/
dist/
build/
.pytest_cache/
large_dataset.py
package.json
package-lock.json
LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2026 E. Elbaz

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md ADDED
@@ -0,0 +1,141 @@
---
title: "JIT LoRA: Real-Time Conversational Knowledge Injection on Apple Silicon via MLX"
emoji: "\u26a1"
colorFrom: cyan
colorTo: blue
sdk: static
pinned: false
license: mit
tags:
  - lora
  - apple-silicon
  - mlx
  - fine-tuning
  - jit-training
  - real-time
  - on-device
  - research
  - paper
language:
  - en
---

# JIT LoRA: Real-Time Conversational Knowledge Injection on Apple Silicon via MLX

**E. Elbaz** | Independent Research | March 2026

[Paper (PDF)](paper.pdf) | [GitHub](https://github.com/eelbaz/jit-lora) | [LaTeX Source](paper.tex)

---

## Abstract

We present a system for just-in-time (JIT) LoRA training that modifies a running language model's weights mid-conversation on consumer Apple Silicon hardware. Using MLX-native autograd for gradient-based LoRA adaptation, the system — J.A.R.V.I.S., a voice-enabled AI assistant — updates its own weights after every response via background backpropagation.

## Key Results

### Statistical Validation (35 real-world facts, Qwen3.5-2B-Base, 3 independent trials)

| Metric | Pooled | 95% Wilson CI |
|---|---|---|
| **Recall** | 61/105 (58.1%) | [48.5%, 67.1%] |
| **General Knowledge** | 60/60 (100.0%) | [94.0%, 100.0%] |

**Training:** 180 steps, 69.6s ± 1.2s on M4 Max. **Zero catastrophic forgetting.**

### Per-Category Recall

| Category | Score | 95% CI |
|---|---|---|
| Science | 3/3 (100%) | [43.8%, 100.0%] |
| Sports | 16/18 (88.9%) | [67.2%, 96.9%] |
| Awards | 18/21 (85.7%) | [65.4%, 95.0%] |
| Weather/Natural Events | 12/15 (80.0%) | [54.8%, 93.0%] |
| Technology/Business | 2/3 (66.7%) | [20.8%, 93.9%] |
| Entertainment | 4/12 (33.3%) | [13.8%, 60.9%] |
| Deaths/Obituaries | 6/33 (18.2%) | [8.6%, 34.4%] |
| **Excl. Deaths** | **55/72 (76.4%)** | **[65.4%, 84.8%]** |

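The Wilson intervals above can be reproduced with a few lines of standard-library Python (a standalone sketch; the repository's own evaluation scripts may compute them differently):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = p + z * z / (2 * n)
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (center - spread) / denom, (center + spread) / denom

# Pooled recall: 61/105
lo, hi = wilson_ci(61, 105)
print(f"[{lo:.1%}, {hi:.1%}]")  # [48.5%, 67.1%]
```

Note that for the 60/60 general-knowledge result the Wilson interval still has a non-trivial lower bound (94.0%), which is why it is reported instead of a normal-approximation interval (which would collapse to a width of zero at p = 1).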
### Cross-Domain Scaling (41 fictional facts, 10 interlocked domains)

| Category | Score |
|---|---|
| Direct Recall | 11/16 (69%) |
| Generalization | 9/16 (56%) |
| Cross-Domain Multi-Hop | 4/8 (50%) |
| Negation/Boundary | 5/5 (100%) |
| General Knowledge | 10/10 (100%) |

## Critical Findings

1. **Learning rate 10x higher than standard LoRA** (5e-4 vs 5e-5): JIT learning needs convergence in ~4 epochs, not thousands of steps. Gradient clipping (norm 1.0) prevents instability.

2. **≥33% regularization ratio eliminates catastrophic forgetting**: Below this threshold, the model overwrites core knowledge. At ≥33%, general knowledge is preserved at 100% (CI: [94.0%, 100.0%]).

3. **mx.compile() hurts short training runs**: The ~20s first-trace overhead is not amortized in <200 steps. Per-step time is ~390ms without compilation.

4. **Batching doesn't help on Apple Silicon**: Training is memory-bandwidth-limited, not compute-limited. Batch=8 takes 2.5s/step vs 0.42s/step for batch=1.

5. **Structurally similar facts confuse small models**: Deaths/obituaries (18.2%) all follow a "[Person] died on [Date]" pattern. The model learns the category but fabricates dates. Distinctive patterns (Sports, Awards) achieve 85-100%.

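Findings 1 and 2 can be combined into a small sketch of the per-cycle data mixing and gradient clipping. This is a pure-Python illustration with hypothetical helper names, not the repository's MLX training code; the exact novel-to-regularization counts in the experiments (e.g., 35:17) were hand-built and differ by a pair or two from this formula:

```python
import math
import random

def mix_training_pairs(novel, replay, reg_ratio=0.33):
    """Pad the novel-fact pairs with general-knowledge replay pairs so
    the replay portion makes up at least reg_ratio of the final set."""
    n_reg = math.ceil(reg_ratio * len(novel) / (1 - reg_ratio))
    return novel + random.sample(replay, min(n_reg, len(replay)))

def clip_global_norm(grads, max_norm=1.0):
    """Scale a flat list of gradient values so the global L2 norm is <= max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    scale = max_norm / norm
    return [g * scale for g in grads]
```

The clipping step is what makes the aggressive 5e-4 learning rate stable: any step whose gradient norm exceeds 1.0 is rescaled onto the unit sphere before the optimizer update.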
## Architecture

```
User → React Frontend → Express Proxy → Neural Daemon (FastAPI, :8766)

MLX Inference + LoRA Adapter

SSE Token Stream → Frontend → TTS

[After response] Background LoRA Training

Updated adapter for next query
```

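The paper notes that inference and the background training cycle never run simultaneously: a single lock serializes all GPU work. A minimal sketch of that pattern (hypothetical names, standard `threading` only):

```python
import threading

gpu_lock = threading.Lock()  # one lock serializes all GPU work

def generate_response(prompt: str) -> str:
    with gpu_lock:                        # inference holds the GPU
        return f"response to {prompt!r}"  # stand-in for MLX generation

def background_train(pairs: list) -> None:
    with gpu_lock:  # training waits until inference releases the lock
        pass        # stand-in for the LoRA backprop cycle

reply = generate_response("What is Zelnorite?")
t = threading.Thread(target=background_train, args=([("q", "a")],))
t.start()  # fires after the response completes, like auto_train
t.join()
```

Because the next query also acquires the same lock, it naturally waits for any in-flight training cycle and then sees the updated adapter, with no model reload.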
95
+ ## Hardware
96
+
97
+ - Apple Silicon Mac (M-series)
98
+ - Tested on M4 Max, 128GB unified memory
99
+ - Models ≤2B should work on 16GB machines
100
+
101
+ ## Configuration
102
+
103
+ | Parameter | Value | Why |
104
+ |---|---|---|
105
+ | Learning rate | 5e-4 | 10x standard; converges in ~4 epochs |
106
+ | LoRA rank | 32 | Capacity for ~35 facts per session |
107
+ | LoRA targets | q, v, out, down_proj | Broad coverage (attention + MLP) |
108
+ | Max epochs | 15 | Early stop fires sooner |
109
+ | Regularization | ≥33% | Below this: catastrophic forgetting |
110
+ | Batch size | 1 | Per-example steps; batching doesn't help |
111
+
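The max-epochs and early-stop settings interact as sketched below. The stopping criterion (epoch loss below 0.8 for 2 consecutive epochs) is taken from the paper's methodology; the loop itself is illustrative, not the repository's trainer:

```python
def train_with_early_stop(epoch_losses, max_epochs=15,
                          loss_threshold=0.8, patience=2):
    """Return the number of epochs actually run.

    Stops once the epoch loss has been below `loss_threshold` for
    `patience` consecutive epochs; otherwise runs up to `max_epochs`.
    """
    below = 0
    for epoch, loss in enumerate(epoch_losses[:max_epochs], start=1):
        below = below + 1 if loss < loss_threshold else 0
        if below >= patience:
            return epoch
    return min(len(epoch_losses), max_epochs)

# Loss dips below 0.8 at epochs 3 and 4 -> training stops after epoch 4
print(train_with_early_stop([2.8, 1.4, 0.7, 0.5, 0.4]))  # 4
```

With the 10x learning rate, this typically fires around 4 effective epochs, which is why the 15-epoch ceiling is rarely reached in practice.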
## Reproducing

```bash
pip install mlx mlx-lm fastapi uvicorn requests

# Self-test
python3 src/mlx_lora_trainer.py

# Full E2E
python3 src/neural_daemon.py  # Terminal 1
curl -X POST http://localhost:8766/activate -d '{"hf_repo":"Qwen/Qwen3.5-2B-Base"}'
python3 tests/test_daemon_e2e.py       # 4 facts, 20s
python3 tests/test_deep_e2e.py         # 41 facts, 121s
python3 tests/test_statistical_e2e.py  # 35+ facts, 3 trials, ~4 min
```

## Citation

```bibtex
@article{elbaz2026jitlora,
  title={JIT LoRA: Real-Time Conversational Knowledge Injection on Apple Silicon via MLX},
  author={Elbaz, E.},
  year={2026},
  url={https://github.com/eelbaz/jit-lora}
}
```

## License

MIT License
figures/jarvis-connected.png ADDED

Git LFS Details

  • SHA256: 73a993e30f366c50fc6bbd45312adbb899290ddd074e8f1362f511a3f153e0b9
  • Pointer size: 131 Bytes
  • Size of remote file: 227 kB
figures/jarvis-full-conversation.png ADDED

Git LFS Details

  • SHA256: 0738499e70b63a02f18089839bfb1807a1422c9babeb0837834e00a33aa0880e
  • Pointer size: 132 Bytes
  • Size of remote file: 3.28 MB
figures/jarvis-general-knowledge.png ADDED

Git LFS Details

  • SHA256: 3ae3bd4635c3cc73ec8752261bdbe54f68559d3475c0175e5dc0324196607498
  • Pointer size: 132 Bytes
  • Size of remote file: 1.89 MB
figures/jarvis-interface.png ADDED

Git LFS Details

  • SHA256: f79a34f90774aeaf38f9a1c1978ef13d1b6fb2d512d4df290315bee0c9d5b95a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.8 MB
figures/jarvis-post-training.png ADDED

Git LFS Details

  • SHA256: be6c13ce8dccc6c71e41c8d4ecaab76d31c567ef974763c0fa166e9aa623bede
  • Pointer size: 132 Bytes
  • Size of remote file: 1.85 MB
figures/jarvis-pre-training.png ADDED

Git LFS Details

  • SHA256: c0228170f08a8e8c5a2316670cd150b0d11d29adcbe78dd4b6ce03a6af85a039
  • Pointer size: 132 Bytes
  • Size of remote file: 1.84 MB
figures/jarvis-query-typing.png ADDED

Git LFS Details

  • SHA256: ec5ef13a515cd04c677b392af486b9ef8af1eba597099b52f31dc39c4b59d3e7
  • Pointer size: 132 Bytes
  • Size of remote file: 3.02 MB
figures/jarvis-recall-pemberton.png ADDED

Git LFS Details

  • SHA256: 2772614d656098e5799d6d2eddd504d408170bd7026de7e0b112dc6a47919b28
  • Pointer size: 132 Bytes
  • Size of remote file: 2.2 MB
figures/jarvis-recall-thunderbiscuit.png ADDED

Git LFS Details

  • SHA256: 8724a239f8232e36c451495ac6f17d44e59163015fb7d8e9429f9622d88d6e48
  • Pointer size: 132 Bytes
  • Size of remote file: 3.07 MB
figures/jarvis-recall-zelnorite.png ADDED

Git LFS Details

  • SHA256: 787b2a33a863fab1a27ab757dc509e9aeafc736f301fe7861f569cb96759ac1d
  • Pointer size: 132 Bytes
  • Size of remote file: 3.17 MB
figures/jarvis-settings-mlx.png ADDED

Git LFS Details

  • SHA256: 66a70ef105e6289088c28fe03182375cfd7fb8e9206fa58034ae7587291c644d
  • Pointer size: 131 Bytes
  • Size of remote file: 225 kB
paper.pdf ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8428b2af7a9c0e5b20fd3e20f433e6d02c0f25d7135ab41bb3fd1d0b135f3760
size 5868721
paper.tex ADDED
@@ -0,0 +1,486 @@
\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{amsmath,amssymb}
\usepackage{booktabs}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{listings}
\usepackage{xcolor}
\usepackage[margin=1in]{geometry}
\usepackage{caption}
\usepackage{subcaption}
\usepackage{enumitem}
\usepackage{authblk}
\usepackage{multicol}
\usepackage{float}

\definecolor{codegreen}{rgb}{0,0.6,0}
\definecolor{codegray}{rgb}{0.5,0.5,0.5}
\definecolor{codepurple}{rgb}{0.58,0,0.82}
\definecolor{backcolour}{rgb}{0.95,0.95,0.95}

\lstdefinestyle{mystyle}{
    backgroundcolor=\color{backcolour},
    commentstyle=\color{codegreen},
    keywordstyle=\color{codepurple},
    numberstyle=\tiny\color{codegray},
    stringstyle=\color{codegreen},
    basicstyle=\ttfamily\small,
    breakatwhitespace=false,
    breaklines=true,
    captionpos=b,
    keepspaces=true,
    numbers=left,
    numbersep=5pt,
    showspaces=false,
    showstringspaces=false,
    showtabs=false,
    tabsize=2,
    frame=single
}
\lstset{style=mystyle}

\title{JIT LoRA: Real-Time Conversational Knowledge Injection\\on Apple Silicon via MLX}

\author[1]{E. Elbaz}
\affil[1]{Independent Research}

\date{March 2026}

\begin{document}

\maketitle

\begin{abstract}
We present a system for just-in-time (JIT) LoRA training that modifies a running language model's weights mid-conversation on consumer Apple Silicon hardware. Using MLX-native autograd~\cite{mlx2023} for gradient-based LoRA~\cite{hu2021lora} adaptation, the system---J.A.R.V.I.S., a voice-enabled AI assistant---updates its own weights after every response via background backpropagation. We validate on three evaluation tracks: (1)~a controlled fictional-fact experiment achieving 4/4 recall ($n=4$), (2)~a cross-domain scaling test with 41 interlocked facts achieving 69\% direct recall with 50\% multi-hop reasoning, and (3)~a statistically rigorous evaluation against \textbf{35 real-world facts} the model verifiably did not know, yielding \textbf{58.1\% recall} (95\% Wilson CI: [48.5\%, 67.1\%], $n=105$ pooled across 3 independent trials) with \textbf{100\% general knowledge preservation} (CI: [94.0\%, 100.0\%], $n=60$). Training completes in 70 seconds for 35 facts on a 2B-parameter model. Per-category analysis reveals strong performance on structurally distinctive facts (Sports 88.9\%, Awards 85.7\%, Weather 80.0\%) with systematic failure on structurally homogeneous facts (Deaths 18.2\%), establishing both the capabilities and limits of JIT LoRA on small models.
\end{abstract}

\section{Introduction}

Can a language model update its own weights \emph{while you're still reading its reply}? We investigate whether real-time LoRA weight updates during conversation can achieve reliable fact recall on consumer Apple Silicon hardware, without catastrophic forgetting~\cite{mccloskey1989catastrophic} of existing knowledge.

The initial approach used Apple's Neural Engine (ANE) directly---reverse-engineering the private \texttt{AppleNeuralEngine.framework} via the open-source ANE bridge~\cite{ane_bridge}. The idea: compile LoRA forward and backward kernels into MIL (Machine Learning Intermediate Language) programs, execute them on the ANE via IOSurface-backed tensors, and run adapter training on dedicated hardware while the GPU handles base model inference.

The ANE path produced working forward kernels (\texttt{ane\_mil\_lora.py} compiles 4 kernels per adapter: \texttt{lora\_down}, \texttt{lora\_up}, \texttt{grad\_b}, \texttt{grad\_a}), but hit a fundamental wall: ANE kernels produce numpy arrays via IOSurface---opaque to any autograd system. For real gradient-based training, the entire computation graph must be differentiable.

The solution: MLX~\cite{mlx2023}. Apple's array framework provides native autograd (\texttt{nn.value\_and\_grad}) that runs on Apple Silicon's unified memory. The base model runs on GPU, LoRA~\cite{hu2021lora} adapters inject differentiable rank-decomposition layers, and \texttt{optim.Adam} updates weights through real backpropagation. The ANE kernels remain in the codebase for a future hybrid inference path (Section~\ref{sec:future}), but the training loop is pure MLX.

\section{Related Work}

\paragraph{LoRA and parameter-efficient fine-tuning.} LoRA~\cite{hu2021lora} injects trainable low-rank matrices into frozen pretrained weights, reducing trainable parameters by orders of magnitude. QLoRA~\cite{dettmers2023qlora} extends this to quantized models. Both target offline fine-tuning on large datasets over thousands of steps; our work applies LoRA in a real-time, few-shot regime (48--220 steps) during live conversation.

\paragraph{Catastrophic forgetting and continual learning.} Neural networks famously overwrite prior knowledge when trained on new data~\cite{mccloskey1989catastrophic}. Elastic Weight Consolidation~\cite{kirkpatrick2017overcoming} penalizes changes to important weights; experience replay~\cite{rolnick2019experience} interleaves old data during training. We adopt experience replay: $\geq$33\% of each training batch consists of general-knowledge Q\&A pairs, which we find sufficient to eliminate catastrophic forgetting entirely (Section~\ref{sec:ablation-reg}).

\paragraph{On-device and edge training.} MLX~\cite{mlx2023} provides a NumPy-like API with automatic differentiation on Apple Silicon's unified memory architecture. While most on-device ML work focuses on inference (quantization, pruning), we use MLX for full gradient-based training at interactive speeds.

\paragraph{Retrieval-augmented generation.} RAG systems inject knowledge at inference time by prepending retrieved documents to the prompt. JIT LoRA offers a complementary approach: modifying weights directly, which avoids context window limitations but requires a training step. The two approaches are not mutually exclusive.

\paragraph{Hybrid architectures.} Qwen3.5 models use Gated Delta Networks (GDN)~\cite{yang2024gated}, which evolved from Mamba's~\cite{gu2023mamba} selective state space design. These layers use Metal-accelerated kernels for inference that lack autograd support, requiring careful mode switching during training (Section~\ref{sec:hybrid}).

\section{The System}

J.A.R.V.I.S. is a full-stack AI assistant: React frontend with a sci-fi voice interface, Express backend for API routing, and a Python FastAPI daemon for MLX inference and training (Figure~\ref{fig:interface}).

\paragraph{Hardware.} All experiments run on a MacBook Pro with Apple M4 Max (128GB unified memory). The 2B model (Qwen3.5-2B-Base) occupies approximately 4GB in bfloat16.

\begin{figure}[H]
\centering
\includegraphics[width=0.85\textwidth]{figures/jarvis-interface.png}
\caption{J.A.R.V.I.S. main interface. The orb visualizer responds to audio; the System Logs panel (bottom-right) shows the conversation flow routed through the MLX backend.}
\label{fig:interface}
\end{figure}

The training loop fires after each conversation turn:

\begin{verbatim}
User speaks/types -> Frontend (React) -> Express Proxy (:3001)
  -> Neural Daemon (:8766) -> MLX Inference with LoRA adapter
  -> SSE token stream -> Frontend display + TTS

[After response completes] Response text -> Training Data Manager
  -> LoRA backprop (Adam + cosine LR) -> Adapter weights updated
  -> Next inference uses updated knowledge
\end{verbatim}

The daemon alternates inference and training through a single GPU lock (\texttt{threading.Lock}). After each response, the \texttt{auto\_train} system queues a background training cycle. The next query uses the updated adapter---no model reload, no restart. Training and inference do not run simultaneously; the GPU lock serializes access.

\subsection{LoRA Architecture}

Rank-32 LoRA~\cite{hu2021lora} adapters inject into four projection matrices per layer:
\begin{equation}
y = W_{\text{base}} x + (x A B) \cdot \frac{\alpha}{r}, \quad A \in \mathbb{R}^{d \times 32}, \; B \in \mathbb{R}^{32 \times d}
\end{equation}
with $B$ initialized to zeros (model behavior unchanged until training begins). Targets: $W_q, W_v, W_{\text{out}}, W_{\text{down}}$ across all 24 layers, yielding 10.3M trainable parameters (0.54\% of 1.9B total).

\subsection{Hybrid Architecture: Gated Delta Network Layers}
\label{sec:hybrid}

Qwen3.5 models use Gated Delta Networks (GDN)~\cite{yang2024gated} for linear attention layers, with Metal-accelerated kernels that lack VJP (vector-Jacobian product) support. The key insight from the \texttt{mlx-lm} source:

\begin{lstlisting}[language=Python, numbers=none]
# qwen3_5.py line 181: use_kernel = not self.training
# model.train() -> pure MLX ops (differentiable, for backprop)
# model.eval() -> Metal kernels (fast, for inference)
\end{lstlisting}

We hoist mode switching to cycle boundaries---\texttt{model.train()} once before the training loop, \texttt{model.eval()} once after---rather than per-step.

\section{Experiment 1: Controlled Validation (Fictional Facts)}

We first validate the system on 4 completely fictional facts with zero overlap to any pretraining data:

\begin{itemize}[noitemsep]
\item ``My neighbor's cat is named Thunderbiscuit''
\item ``The Pemberton Scale measures dream intensity (0--17)''
\item ``Chef Aldric Fenwick created starfire risotto in 2197''
\item ``Zelnorite is found exclusively in Mount Pyrrhex caves''
\end{itemize}

Each fact is represented by 2--3 phrasing variants in the training set, plus 3 general-knowledge regularization pairs, for 12 training pairs total.

\begin{table}[H]
\centering
\caption{Experiment 1: 4 novel fictional facts, 12 training pairs (9 novel phrasings + 3 regularization). Single run, $n=4$.}
\label{tab:exp1}
\begin{tabular}{lcc}
\toprule
\textbf{Metric} & \textbf{Baseline} & \textbf{Post-Training} \\
\midrule
Direct Recall (4 questions) & 0/4 (0\%) & 4/4 (100\%) \\
Generalization (4 rephrased) & 0/4 (0\%) & 4/4 (100\%) \\
General Knowledge (3 real facts) & 3/3 (100\%) & 3/3 (100\%) \\
\midrule
Training steps & --- & 48 (4 epochs $\times$ 12 examples) \\
Training time & --- & 20.2 seconds \\
Loss & --- & 2.83 $\rightarrow$ 0.14 \\
\bottomrule
\end{tabular}
\end{table}

\textbf{Caveat:} With $n=4$, this experiment establishes feasibility but is not statistically meaningful. The Wilson 95\% CI for 4/4 recall is [47.3\%, 100\%]. Experiment~3 (Section~\ref{sec:stat}) addresses this limitation with larger $n$.

\begin{figure}[H]
\centering
\includegraphics[width=0.85\textwidth]{figures/jarvis-post-training.png}
\caption{J.A.R.V.I.S. recalling a novel fact after JIT LoRA training. After 28 training steps (loss: 0.08), the model correctly answers ``What is my neighbor's cat named?'' with ``Thunderbiscuit''---a fact it hallucinated (``Whiskers'') before training.}
\label{fig:recall}
\end{figure}

\section{Experiment 2: Cross-Domain Scaling (41 Fictional Facts)}

We scale to 41 facts across 10 interlocked fictional domains with deliberate cross-references (e.g., a mineral used to power engines, refined from another mineral, mined on a specific mountain, on an island governed by a fictional sovereignty).

\begin{table}[H]
\centering
\caption{Experiment 2: 41 novel facts, 10 domains, 62 training pairs (41 novel + 21 regularization). Single run.}
\label{tab:exp2}
\begin{tabular}{lcc}
\toprule
\textbf{Category} & \textbf{Score} & \textbf{Notes} \\
\midrule
Direct Recall (16) & 11/16 (69\%) & Core facts reliably absorbed \\
Generalization (16) & 9/16 (56\%) & Rephrased questions work \\
Cross-Domain Reasoning (8) & 4/8 (50\%) & Multi-hop chains on a 2B model \\
Negation/Boundary (5) & 5/5 (100\%) & Correctly denies false premises \\
General Knowledge (10) & 10/10 (100\%) & Knowledge preserved \\
\midrule
Training steps & \multicolumn{2}{c}{220 (early stopping at $\sim$3.5 epochs)} \\
Training time & \multicolumn{2}{c}{121 seconds} \\
Loss & \multicolumn{2}{c}{2.97 $\rightarrow$ 0.69} \\
\bottomrule
\end{tabular}
\end{table}

The 62 training pairs yield 62 steps per epoch; early stopping triggered at approximately 3.5 effective epochs (220 total steps). Each training step takes $\sim$390ms on the M4 Max with the 2B model, which is memory-bandwidth-limited: the entire model ($\sim$4GB) must be read for each forward and backward pass.

\section{Experiment 3: Statistical Validation (Real-World Facts)}
\label{sec:stat}

Experiments 1--2 use fictional facts, which guarantees the model has no prior knowledge but limits sample size. To produce statistically meaningful results, we evaluate against \textbf{real-world events from 2025--2026}---facts that post-date the model's training cutoff (verified per-fact against the base model before training).

\subsection{Methodology}

\begin{enumerate}[noitemsep]
\item \textbf{Fact sourcing:} 122 facts collected from web search across 8 categories (Sports, Deaths/Obituaries, Awards, Entertainment, Science, Technology/Business, Political Events, Weather/Natural Events). Each fact has a question, canonical answer, and 2--3 verification keywords.
\item \textbf{Sampling:} 50 facts are sampled proportionally across categories (to keep training time under 2 minutes). Political Events facts were excluded from the final evaluation because all sampled instances were already known to the base model.
\item \textbf{Baseline pre-test:} Each fact is queried against the unmodified base model. A fact is ``confirmed unknown'' if the model's response matches $<$2 of its verification keywords. Facts the model already knows are excluded from training and evaluation.
\item \textbf{Training:} Confirmed-unknown facts are converted to training pairs. $\geq$33\% regularization pairs (general-knowledge Q\&A) are added. Training runs for 15 epochs max with early stopping (loss $<$ 0.8 for 2 consecutive epochs).
\item \textbf{Post-test:} Each trained fact is queried again. General knowledge questions (20 standard questions, e.g., ``What is the capital of France?'') are tested for preservation.
\item \textbf{Trials:} The full pipeline (reset $\rightarrow$ train $\rightarrow$ evaluate) runs 3 independent times with shuffled fact ordering. Results are pooled for confidence interval computation.
\item \textbf{Auto-train disabled during evaluation:} The daemon's auto-train feature (which normally fires after each response) is disabled during pre-testing and post-testing to prevent evaluation contamination.
\end{enumerate}
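
A minimal sketch of the keyword criterion in step~3 (an illustrative helper, not necessarily the repository's exact implementation):

\begin{lstlisting}[language=Python, numbers=none]
def confirmed_unknown(response: str, keywords: list) -> bool:
    # Fewer than 2 keyword hits => the model did not know the fact
    hits = sum(kw.lower() in response.lower() for kw in keywords)
    return hits < 2
\end{lstlisting}

The same check, applied after training, scores post-test recall.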
213
+
214
+ \subsection{Results}
215
+
216
+ From 50 candidate facts, 35 were confirmed unknown (15 already in the model's knowledge). Three independent trials with shuffled ordering produced the results in Table~\ref{tab:exp3}.
217
+
218
+ \begin{table}[H]
219
+ \centering
220
+ \caption{Experiment 3: 35 real-world facts, 52 training pairs (35 novel + 17 regularization), 3 trials. Qwen3.5-2B-Base on M4 Max.}
221
+ \label{tab:exp3}
222
+ \begin{tabular}{lccc}
223
+ \toprule
224
+ \textbf{Metric} & \textbf{Pooled} & \textbf{Per-Trial} & \textbf{95\% Wilson CI} \\
225
+ \midrule
226
+ \textbf{Recall} & 61/105 (58.1\%) & 65.7\%, 54.3\%, 54.3\% & [48.5\%, 67.1\%] \\
227
+ \textbf{General Knowledge} & 60/60 (100.0\%) & 100\%, 100\%, 100\% & [94.0\%, 100.0\%] \\
228
+ \midrule
229
+ Training time & \multicolumn{3}{c}{69.6s $\pm$ 1.2s (180 steps)} \\
230
+ Loss (mean $\pm$ sd) & \multicolumn{3}{c}{1.78 $\pm$ 0.43 $\rightarrow$ 0.36 $\pm$ 0.10} \\
231
+ Per-step time & \multicolumn{3}{c}{$\sim$390ms} \\
232
+ \bottomrule
233
+ \end{tabular}
234
+ \end{table}
235
+
236
+ \subsection{Per-Category Analysis}
237
+
238
+ Recall varies dramatically by fact category (Table~\ref{tab:categories}), revealing a systematic pattern in what small models learn well vs.\ poorly via JIT LoRA:
239
+
240
+ \begin{table}[H]
241
+ \centering
242
+ \caption{Per-category recall pooled across 3 trials. Seven categories had confirmed-unknown facts; Political Events was excluded (all sampled facts were already known to the model).}
243
+ \label{tab:categories}
244
+ \begin{tabular}{lcccl}
245
+ \toprule
246
+ \textbf{Category} & \textbf{Correct} & \textbf{Total} & \textbf{Rate} & \textbf{95\% CI} \\
247
+ \midrule
248
+ Science & 3 & 3 & 100.0\% & [43.8\%, 100.0\%] \\
249
+ Sports & 16 & 18 & 88.9\% & [67.2\%, 96.9\%] \\
250
+ Awards & 18 & 21 & 85.7\% & [65.4\%, 95.0\%] \\
251
+ Weather/Natural Events & 12 & 15 & 80.0\% & [54.8\%, 93.0\%] \\
252
+ Technology/Business & 2 & 3 & 66.7\% & [20.8\%, 93.9\%] \\
253
+ Entertainment & 4 & 12 & 33.3\% & [13.8\%, 60.9\%] \\
254
+ Deaths/Obituaries & 6 & 33 & 18.2\% & [8.6\%, 34.4\%] \\
255
+ \midrule
256
+ \textbf{Excl.\ Deaths} & \textbf{55} & \textbf{72} & \textbf{76.4\%} & [65.4\%, 84.8\%] \\
257
+ \bottomrule
258
+ \end{tabular}
259
+ \end{table}
260
+
261
+ \subsection{Failure Analysis: Why Deaths Fail}
262
+
263
+ The Deaths/Obituaries category (18.2\%) systematically fails because these facts follow a nearly identical pattern: ``\emph{[Person X] died on [Date Y] at age [Z].}'' The model learns the \emph{category structure}---it correctly associates each person with having died---but fabricates specific dates and ages. Example:
264
+
265
+ \begin{quote}
266
+ \textbf{Training:} ``Frank Gehry died on December 5, 2025'' \\
267
+ \textbf{Model output:} ``Frank Gehry\ldots died on February 5, 2025, at the age of 95'' \\
268
+ \textbf{Result:} Knows Gehry died, wrong date. Fails keyword check on ``december 5 2025''.
269
+ \end{quote}
270
+
271
+ This is a known limitation of LoRA on small models~\cite{hu2021lora}: with many facts sharing the same structural pattern, the model's limited adapter capacity ($\sim$10M params) blends specific details across similar training examples. Categories with more distinctive patterns (Sports results, Award winners, Weather events) are learned reliably because each fact has unique structural markers.
272
+
273
+ \section{Ablation Studies}
274
+
275
+ Every parameter was tested empirically. Two parameters dominate; the rest have minimal effect.
276
+
277
+ \subsection{Learning Rate: The Decisive Factor}
278
+
279
+ \begin{table}[H]
280
+ \centering
281
+ \caption{Learning rate determines training speed. Per-step time is constant ($\sim$390ms) for the 2B model on M4 Max.}
282
+ \label{tab:lr}
283
+ \begin{tabular}{lcccc}
284
+ \toprule
285
+ \textbf{Learning Rate} & \textbf{Epochs to $<$0.5 loss} & \textbf{Steps} & \textbf{Time} & \textbf{Recall} \\
286
+ \midrule
287
+ $5 \times 10^{-5}$ (standard LoRA) & 25+ & 400 & 168s & 4/4$^*$ \\
288
+ $1 \times 10^{-4}$ & 10 & 80 & 35s & 4/4$^*$ \\
289
+ $5 \times 10^{-4}$ (\textbf{ours}) & 4 & 48 & \textbf{20s} & \textbf{4/4}$^*$ \\
290
+ \bottomrule
291
+ \end{tabular}
292
+ \end{table}
293
+ {\small $^*$Measured on the 4-fact fictional experiment (Experiment 1; Table~\ref{tab:exp1}). Statistical validation (Table~\ref{tab:exp3}) uses the 5e-4 rate.}
294
+
295
+ The speedup comes entirely from faster convergence, not faster steps. Standard LoRA uses $10^{-4}$ to $5 \times 10^{-5}$ because it trains for thousands of steps on large datasets~\cite{hu2021lora}. JIT learning needs convergence in single-digit epochs. Gradient clipping (norm 1.0) prevents instability at this aggressive rate.
296
+
297
+ \subsection{Regularization Ratio: The Catastrophic Forgetting Threshold}
298
+ \label{sec:ablation-reg}
299
+
300
+ \begin{table}[H]
301
+ \centering
302
+ \caption{Regularization ratio vs.\ knowledge preservation (measured on Experiment 2). A threshold exists at $\sim$33\%.}
303
+ \label{tab:reg}
304
+ \begin{tabular}{cccc}
305
+ \toprule
306
+ \textbf{Reg.\ Ratio} & \textbf{Novel : Real-World} & \textbf{General Knowledge} & \textbf{Effect} \\
307
+ \midrule
308
+ $\sim$16\% & 41 : 8 & 3/8 (38\%) & Catastrophic forgetting \\
309
+ $\sim$34\% & 41 : 21 & 10/10 (100\%) & Preserved \\
310
+ $\sim$33\% & 35 : 17 & 20/20 (100\%)$^\dagger$ & Preserved (Experiment 3) \\
311
+ \bottomrule
312
+ \end{tabular}
313
+ \end{table}
314
+ {\small $^\dagger$60/60 across 3 trials (CI: [94.0\%, 100.0\%]).}
315
+
316
+ At $\sim$16\% regularization, the model overwrites core knowledge~\cite{mccloskey1989catastrophic}---``What is the capital of France?'' $\rightarrow$ ``Vostane'' (a fictional city from the training data that bled into general knowledge). At $\geq$33\%, real-world knowledge is preserved. This is a critical finding for production deployment: always include $\geq$33\% real-world Q\&A pairs in every training batch, consistent with experience replay findings in continual learning~\cite{rolnick2019experience}. Experiment~3 independently confirms this threshold.
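The $\geq$33\% replay rule can be enforced mechanically when assembling a training batch. A minimal sketch (the function name and data layout are illustrative, not the paper's code):

```python
import math
import random

def build_training_batch(novel_pairs, replay_pool, reg_ratio=0.34):
    """Mix novel Q&A pairs with real-world replay pairs so that at least
    reg_ratio of the final batch is replay data (forgetting guard)."""
    n_novel = len(novel_pairs)
    # Solve n_replay / (n_novel + n_replay) >= reg_ratio for n_replay.
    n_replay = math.ceil(reg_ratio * n_novel / (1 - reg_ratio))
    replay = random.sample(replay_pool, min(n_replay, len(replay_pool)))
    batch = list(novel_pairs) + replay
    random.shuffle(batch)
    return batch
```

For 41 novel facts at a 34\% target this yields 22 replay pairs, matching the 41:21 split used in Experiment 2 up to rounding.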
317
+
318
+ \subsection{What Doesn't Help (and Why)}
319
+
320
+ \begin{table}[H]
321
+ \centering
322
+ \caption{Techniques that do NOT improve JIT training on Apple Silicon.}
323
+ \label{tab:nospeedup}
324
+ \begin{tabular}{lcl}
325
+ \toprule
326
+ \textbf{Technique} & \textbf{Effect} & \textbf{Why} \\
327
+ \midrule
328
+ \texttt{mx.compile()} & +20s overhead, $-$5\%/step & First-trace cost not amortized in $<$200 steps \\
329
+ Batch=8 (padded tensor) & 2.5s/step vs 0.42s & Memory-bandwidth-limited \\
330
+ LoRA rank 8 vs 32 & No speed change & Base model forward/backward dominates \\
331
+ \bottomrule
332
+ \end{tabular}
333
+ \end{table}
334
+
335
+ Apple Silicon's unified memory architecture means forward and backward passes are \textbf{memory-bandwidth-limited}, not compute-limited. Batching 8 examples into a single padded tensor takes 2.5s per step (vs 0.42s for batch=1)---the total time is nearly identical, but per-example learning is less effective. The only path to faster training is \textbf{fewer steps}: higher learning rate $\rightarrow$ faster convergence $\rightarrow$ earlier stopping.
336
+
337
+ \section{Where This Goes: Swarm Agent JIT Learning}
338
+ \label{sec:future}
339
+
340
+ \subsection{The Vision}
341
+
342
+ The system demonstrated here is single-agent: one model, one adapter, one conversation. The longer-term goal is a \textbf{cognitive swarm}---multiple specialized agents that learn different aspects of the same conversation and compose their knowledge at inference time.
343
+
344
+ \begin{verbatim}
345
+ Shared Conversation Context
346
+ |
347
+ +---------------+---------------+
348
+ | | |
349
+ Agent-Facts Agent-Style Agent-Tools
350
+ (LoRA-A) (LoRA-B) (LoRA-C)
351
+ | | |
352
+ +-------+-------+-------+-------+
353
+ | |
354
+ Adapter Merge Knowledge Sync
355
+ |
356
+ Unified Response
357
+ \end{verbatim}
358
+
359
+ At inference, adapters compose via weight addition: $W = W_{\text{base}} + \sum_i \alpha_i (A_i B_i)$, with dynamic scaling factors $\alpha_i$ adjusted per query based on detected intent.
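The weight-addition composition can be sketched directly in NumPy, using the $A$ [rank, dim] / $B$ [dim, rank] layout from \texttt{ane\_lora\_trainer.py} (so each adapter's delta is $B_i A_i$); this is a sketch of the composition rule, not the swarm implementation:

```python
import numpy as np

def compose_adapters(W_base, adapters, alphas):
    """W = W_base + sum_i alpha_i * (B_i @ A_i).

    adapters: list of (A [rank, dim], B [dim, rank]) low-rank pairs
    alphas:   per-adapter scaling factors, chosen per query
    """
    W = W_base.copy()
    for (A, B), alpha in zip(adapters, alphas):
        W += alpha * (B @ A)  # rank-r update, dim x dim
    return W
```

Setting an adapter's $\alpha_i$ to zero removes its contribution entirely, which is what makes per-query intent routing cheap: the base weights are never modified.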
360
+
361
+ \subsection{ANE--GPU Parallelism for Multi-Agent Inference}
362
+
363
+ The ANE kernels compiled in \texttt{ane\_mil\_lora.py} represent an untapped compute path. While ANE cannot support autograd (IOSurface tensors are opaque to differentiation), it can accelerate LoRA forward passes during inference:
364
+
365
+ \begin{itemize}[noitemsep]
366
+ \item GPU runs base model forward pass
367
+ \item ANE simultaneously runs LoRA adapter forward passes (precompiled kernels)
368
+ \item Results merge on unified memory (zero-copy)
369
+ \end{itemize}
370
+
371
+ For multi-agent inference, this means running 3--4 adapter forward passes on ANE while the GPU handles the base model. The training loop remains on GPU (MLX autograd), but inference could benefit from the otherwise-idle Neural Engine. This path is speculative and has not been benchmarked.
372
+
373
+ \section{Reproducing This}
374
+
375
+ \textbf{Hardware:} Apple Silicon Mac (M-series). Tested on M4 Max, 128GB. Models $\leq$2B parameters should work on 16GB machines.
376
+
377
+ \begin{lstlisting}[language=bash, numbers=none]
378
+ pip install mlx mlx-lm fastapi uvicorn requests
379
+
380
+ # Self-test (downloads Qwen2.5-0.5B, trains 5 steps)
381
+ python3 src/mlx_lora_trainer.py
382
+
383
+ # Full E2E through daemon
384
+ python3 src/neural_daemon.py # Terminal 1
385
+ curl -X POST http://localhost:8766/activate \
386
+ -d '{"hf_repo":"Qwen/Qwen3.5-2B-Base"}'
387
+ python3 tests/test_daemon_e2e.py # 4 facts, 20s
388
+ python3 tests/test_deep_e2e.py # 41 facts, 121s
389
+ python3 tests/test_statistical_e2e.py # 35+ facts, 3 trials, ~4 min
390
+ \end{lstlisting}
391
+
392
+ Code available at: \url{https://github.com/eelbaz/jit-lora}
393
+
394
+ \section{Complete Configuration}
395
+
396
+ \begin{table}[H]
397
+ \centering
398
+ \caption{Optimized configuration for JIT LoRA training.}
399
+ \begin{tabular}{lrl}
400
+ \toprule
401
+ \textbf{Parameter} & \textbf{Value} & \textbf{Why} \\
402
+ \midrule
403
+ Learning rate & $5 \times 10^{-4}$ & 10$\times$ standard; converges in $\sim$4 epochs \\
404
+ LR schedule & Cosine $\rightarrow 5 \times 10^{-5}$ & Prevents late-epoch overshoot \\
405
+ Gradient clip & 1.0 & Stability at high LR \\
406
+ LoRA rank & 32 & Capacity for $\sim$35 facts per session \\
407
+ LoRA $\alpha$ & 32 & Scale = $\alpha/r$ = 1.0 \\
408
+ LoRA targets & q, v, out, down\_proj & Broad coverage (attention + MLP) \\
409
+ Max epochs & 15 & Upper bound; early stop fires sooner \\
410
+ Early stop threshold & 0.8 & Conservative \\
411
+ Early stop patience & 2 & Consecutive epochs below threshold \\
412
+ Min epochs & 3 & Don't stop before model has seen the data \\
413
+ Regularization ratio & $\geq$33\% & Below this: catastrophic forgetting \\
414
+ Optimizer & Adam & $\beta_1$=0.9, $\beta_2$=0.999 \\
415
+ \texttt{mx.compile()} & Off & 20s overhead not amortized \\
416
+ Batch size & 1 & Per-example steps; batching doesn't help \\
417
+ \bottomrule
418
+ \end{tabular}
419
+ \end{table}
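The early-stopping policy in the table (threshold 0.8, patience 2, minimum 3 epochs) reduces to a few lines; this is a minimal sketch, not the trainer's actual implementation:

```python
def should_stop(epoch_losses, threshold=0.8, patience=2, min_epochs=3):
    """Stop when the last `patience` epoch losses are all below `threshold`,
    but never before `min_epochs` epochs have completed."""
    if len(epoch_losses) < max(min_epochs, patience):
        return False
    return all(loss < threshold for loss in epoch_losses[-patience:])
```

With the 5e-4 learning rate, this typically fires around epoch 4, giving the $\sim$48-step, 20-second runs reported in Table~\ref{tab:lr}.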
420
+
421
+ \section{Conclusion}
422
+
423
+ A language model that updates its own weights mid-conversation runs on a MacBook in 70 seconds for 35 real-world facts, achieving 58.1\% recall with zero knowledge degradation. The critical insights: use a 10$\times$ higher learning rate than standard LoRA~\cite{hu2021lora} (gradient clipping keeps it stable), include $\geq$33\% real-world data to prevent catastrophic forgetting~\cite{mccloskey1989catastrophic}, and don't bother with compilation or batching for short training runs on Apple Silicon.
424
+
425
+ The per-category analysis reveals that JIT LoRA on small models works well for facts with distinctive structural patterns (Sports, Awards, Science: 76--100\%) but struggles with structurally similar facts (Deaths: 18\%). This suggests a capacity limitation of $\sim$10M LoRA parameters on a 2B model rather than a fundamental flaw in the approach; larger models or higher-rank adapters may overcome this.
426
+
427
+ The system is end-to-end functional---J.A.R.V.I.S. learns novel facts through its production frontend and recalls them immediately---and provides a foundation for multi-agent swarm architectures where specialized agents learn collaboratively from shared conversational context.
428
+
429
+ \begin{figure}[H]
430
+ \centering
431
+ \includegraphics[width=0.85\textwidth]{figures/jarvis-general-knowledge.png}
432
+ \caption{General knowledge preservation after LoRA training. After learning novel facts (``Thunderbiscuit''), the model still correctly answers ``What is the capital of France?'' with ``Paris,'' demonstrating zero catastrophic forgetting.}
433
+ \label{fig:general}
434
+ \end{figure}
435
+
436
+ \bibliographystyle{plain}
437
+ \begin{thebibliography}{10}
438
+
439
+ \bibitem{hu2021lora}
440
+ E.~J. Hu et al.
441
+ \newblock LoRA: Low-rank adaptation of large language models.
442
+ \newblock {\em arXiv:2106.09685}, 2021.
443
+
444
+ \bibitem{mlx2023}
445
+ A.~Hannun et al.
446
+ \newblock MLX: An array framework for Apple Silicon.
447
+ \newblock {\em Apple ML Research}, 2023.
448
+
449
+ \bibitem{dettmers2023qlora}
450
+ T.~Dettmers et al.
451
+ \newblock QLoRA: Efficient finetuning of quantized language models.
452
+ \newblock {\em arXiv:2305.14314}, 2023.
453
+
454
+ \bibitem{mccloskey1989catastrophic}
455
+ M.~McCloskey and N.~J. Cohen.
456
+ \newblock Catastrophic interference in connectionist networks.
457
+ \newblock {\em Psychology of Learning and Motivation}, 24:109--165, 1989.
458
+
459
+ \bibitem{rolnick2019experience}
460
+ D.~Rolnick et al.
461
+ \newblock Experience replay for continual learning.
462
+ \newblock {\em NeurIPS}, 2019.
463
+
464
+ \bibitem{kirkpatrick2017overcoming}
465
+ J.~Kirkpatrick et al.
466
+ \newblock Overcoming catastrophic forgetting in neural networks.
467
+ \newblock {\em PNAS}, 114(13):3521--3526, 2017.
468
+
469
+ \bibitem{gu2023mamba}
470
+ A.~Gu and T.~Dao.
471
+ \newblock Mamba: Linear-time sequence modeling with selective state spaces.
472
+ \newblock {\em arXiv:2312.00752}, 2023.
473
+
474
+ \bibitem{yang2024gated}
475
+ S.~Yang et al.
476
+ \newblock Gated delta networks: Improving mamba2 with delta rule.
477
+ \newblock {\em arXiv:2412.06464}, 2024.
478
+
479
+ \bibitem{ane_bridge}
480
+ Maderix.
481
+ \newblock ANE: Apple Neural Engine reverse-engineering framework.
482
+ \newblock \url{https://github.com/maderix/ANE}, 2023.
483
+
484
+ \end{thebibliography}
485
+
486
+ \end{document}
requirements.txt ADDED
@@ -0,0 +1,6 @@
1
+ mlx>=0.22.0
2
+ mlx-lm>=0.21.0
3
+ fastapi>=0.115.0
4
+ uvicorn>=0.34.0
5
+ requests>=2.31.0
6
+ numpy>=1.26.0
src/ane_lora_trainer.py ADDED
@@ -0,0 +1,685 @@
1
+ """
2
+ ane_lora_trainer.py — LoRA training engine using Apple Neural Engine.
3
+
4
+ Manages per-layer LoRA adapters (A & B matrices), compiles ANE kernels once,
5
+ and runs forward/backward passes on ANE hardware. Training loop:
6
+ 1. Forward: base model inference via MLX, with LoRA additions via ANE
7
+ 2. Loss: cross-entropy computed on CPU
8
+ 3. Backward: LoRA gradients computed on ANE
9
+ 4. Update: Adam optimizer on CPU (LoRA params only — tiny, instant)
10
+
11
+ The adapter weights live as numpy arrays in shared memory. MLX inference
12
+ reads them (zero-copy via mlx.array), ANE training writes updated values.
13
+ """
14
+
15
+ import json
16
+ import logging
17
+ import math
18
+ import time
19
+ from pathlib import Path
20
+ from typing import Optional
21
+
22
+ import numpy as np
23
+
24
+ from ane_bridge_py import ANEBridge
25
+ from ane_mil_lora import LoRAKernelSet
26
+ from neural_config import NeuralConfig
27
+
28
+ log = logging.getLogger("ane_lora_trainer")
29
+
30
+
31
+ class LoRAAdapter:
32
+ """Per-target LoRA adapter (A & B matrices) for all layers."""
33
+
34
+ def __init__(self, n_layers: int, dim: int, rank: int):
35
+ self.n_layers = n_layers
36
+ self.dim = dim
37
+ self.rank = rank
38
+
39
+ # A: [rank, dim]: small Gaussian scaled by 1/sqrt(dim) (LoRA-style init)
40
+ # B: [dim, rank] — initialized to zeros (standard LoRA init)
41
+ scale = 1.0 / math.sqrt(dim)
42
+ self.A = [np.random.randn(rank, dim).astype(np.float32) * scale
43
+ for _ in range(n_layers)]
44
+ self.B = [np.zeros((dim, rank), dtype=np.float32)
45
+ for _ in range(n_layers)]
46
+
47
+ def param_count(self) -> int:
48
+ """Total trainable parameters."""
49
+ return self.n_layers * 2 * self.dim * self.rank
50
+
51
+ def memory_bytes(self) -> int:
52
+ """Total memory for adapter weights."""
53
+ return self.param_count() * 4 # fp32
54
+
55
+
56
+ class AdamState:
57
+ """Adam optimizer state for LoRA parameters."""
58
+
59
+ def __init__(self, adapter: LoRAAdapter, lr: float = 1e-5,
60
+ beta1: float = 0.9, beta2: float = 0.999,
61
+ eps: float = 1e-8, weight_decay: float = 0.0):
62
+ self.lr = lr
63
+ self.beta1 = beta1
64
+ self.beta2 = beta2
65
+ self.eps = eps
66
+ self.weight_decay = weight_decay
67
+ self.t = 0 # Step counter
68
+
69
+ n = adapter.n_layers
70
+ # First moment (m) and second moment (v) for each parameter
71
+ self.m_A = [np.zeros_like(adapter.A[i]) for i in range(n)]
72
+ self.v_A = [np.zeros_like(adapter.A[i]) for i in range(n)]
73
+ self.m_B = [np.zeros_like(adapter.B[i]) for i in range(n)]
74
+ self.v_B = [np.zeros_like(adapter.B[i]) for i in range(n)]
75
+
76
+ def step(self, adapter: LoRAAdapter,
77
+ grads_A: list[np.ndarray], grads_B: list[np.ndarray],
78
+ grad_clip: float = 1.0):
79
+ """One Adam update step for all layers.
80
+
81
+ Args:
82
+ adapter: LoRA adapter to update in-place
83
+ grads_A: list of dA gradients per layer
84
+ grads_B: list of dB gradients per layer
85
+ grad_clip: max gradient norm (per-parameter)
86
+ """
87
+ self.t += 1
88
+ bc1 = 1 - self.beta1 ** self.t # Bias correction
89
+ bc2 = 1 - self.beta2 ** self.t
90
+
91
+ for i in range(adapter.n_layers):
92
+ for param, grad, m, v in [
93
+ (adapter.A, grads_A, self.m_A, self.v_A),
94
+ (adapter.B, grads_B, self.m_B, self.v_B),
95
+ ]:
96
+ g = grad[i]
97
+
98
+ # Gradient clipping (per-parameter norm)
99
+ gnorm = np.linalg.norm(g)
100
+ if gnorm > grad_clip:
101
+ g = g * (grad_clip / gnorm)
102
+
103
+ # Weight decay (decoupled, AdamW-style)
104
+ if self.weight_decay > 0:
105
+ param[i] -= self.lr * self.weight_decay * param[i]
106
+
107
+ # Adam moments
108
+ m[i] = self.beta1 * m[i] + (1 - self.beta1) * g
109
+ v[i] = self.beta2 * v[i] + (1 - self.beta2) * g * g
110
+
111
+ # Bias-corrected update
112
+ m_hat = m[i] / bc1
113
+ v_hat = v[i] / bc2
114
+ param[i] -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
115
+
116
+
117
+ class ANELoRATrainer:
118
+ """Main training engine orchestrating ANE kernels + optimizer.
119
+
120
+ Usage:
121
+ trainer = ANELoRATrainer(config)
122
+ trainer.initialize(n_layers=32, dim=3584)
123
+
124
+ # Per-turn training
125
+ for input_ids, target_ids in training_data:
126
+ loss = trainer.train_step(activations, target_logits)
127
+
128
+ # Save adapter
129
+ trainer.save_adapter("/path/to/adapter/")
130
+ """
131
+
132
+ def __init__(self, config: NeuralConfig):
133
+ self.config = config
134
+ self.ane: Optional[ANEBridge] = None
135
+ self.kernels: Optional[LoRAKernelSet] = None
136
+ self.initialized = False
137
+
138
+ # Per-target adapters: {target_name: LoRAAdapter}
139
+ self.adapters: dict[str, LoRAAdapter] = {}
140
+ self.optimizers: dict[str, AdamState] = {}
141
+
142
+ # Training stats
143
+ self.total_steps = 0
144
+ self.total_cycles = 0
145
+ self.last_loss = float('inf')
146
+ self.loss_history: list[float] = []
147
+ self.adapter_version = 0
148
+
149
+ def initialize(self, n_layers: int, dim: int):
150
+ """Initialize ANE bridge, compile kernels, create adapters.
151
+
152
+ Args:
153
+ n_layers: number of transformer layers
154
+ dim: model hidden dimension
155
+ """
156
+ rank = self.config.lora_rank
157
+ seq = self.config.ane_seq_len
158
+ scaling = self.config.lora_scaling
159
+
160
+ log.info(f"Initializing ANE LoRA trainer: {n_layers} layers, "
161
+ f"dim={dim}, rank={rank}, seq={seq}, scaling={scaling:.2f}")
162
+
163
+ # Init ANE bridge
164
+ self.ane = ANEBridge()
165
+ log.info(f"ANE bridge initialized (compile budget: "
166
+ f"{self.ane.compile_budget_remaining})")
167
+
168
+ # Compile LoRA kernels (4 kernels total, reused across all layers)
169
+ self.kernels = LoRAKernelSet(self.ane, dim, rank, seq, scaling)
170
+ log.info(f"LoRA kernels compiled (4 kernels, "
171
+ f"compile count: {self.ane.compile_count})")
172
+
173
+ # Create per-target adapters
174
+ for target in self.config.lora_targets:
175
+ adapter = LoRAAdapter(n_layers, dim, rank)
176
+ self.adapters[target] = adapter
177
+ self.optimizers[target] = AdamState(
178
+ adapter,
179
+ lr=self.config.learning_rate,
180
+ beta1=self.config.adam_beta1,
181
+ beta2=self.config.adam_beta2,
182
+ eps=self.config.adam_eps,
183
+ weight_decay=self.config.weight_decay,
184
+ )
185
+
186
+ total_params = sum(a.param_count() for a in self.adapters.values())
187
+ total_mb = sum(a.memory_bytes() for a in self.adapters.values()) / 1e6
188
+ log.info(f"Adapters initialized: {len(self.adapters)} targets, "
189
+ f"{total_params:,} params ({total_mb:.1f} MB)")
190
+
191
+ self.initialized = True
192
+ self.n_layers = n_layers
193
+ self.dim = dim
194
+
195
+ def get_adapter_weights(self, target: str, layer: int) -> tuple[np.ndarray, np.ndarray]:
196
+ """Get LoRA A and B matrices for a specific target and layer.
197
+
198
+ Used by MLX inference to add LoRA contribution.
199
+
200
+ Returns:
201
+ (A [rank, dim], B [dim, rank])
202
+ """
203
+ adapter = self.adapters[target]
204
+ return adapter.A[layer], adapter.B[layer]
205
+
206
+ def compute_lora_forward(self, target: str, layer: int,
207
+ x: np.ndarray) -> np.ndarray:
208
+ """Compute LoRA forward pass for one target in one layer on ANE.
209
+
210
+ Args:
211
+ target: "q_proj" or "v_proj"
212
+ layer: transformer layer index
213
+ x: [1, dim, 1, seq] fp32 activation
214
+
215
+ Returns:
216
+ [1, dim, 1, seq] fp32 LoRA output (to be added to base output)
217
+ """
218
+ adapter = self.adapters[target]
219
+ return self.kernels.forward(x, adapter.A[layer], adapter.B[layer])
220
+
221
+ def compute_lora_backward(self, target: str, layer: int,
222
+ grad_out: np.ndarray,
223
+ x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
224
+ """Compute LoRA gradients for one target in one layer on ANE.
225
+
226
+ Args:
227
+ target: "q_proj" or "v_proj"
228
+ layer: transformer layer index
229
+ grad_out: [1, dim, 1, seq] fp32 upstream gradient
230
+ x: [1, dim, 1, seq] fp32 saved activation
231
+
232
+ Returns:
233
+ (dA [rank, dim], dB [dim, rank])
234
+ """
235
+ adapter = self.adapters[target]
236
+ return self.kernels.backward(
237
+ grad_out, x, adapter.A[layer], adapter.B[layer])
238
+
239
+ def train_step(self, layer_activations: list[np.ndarray],
240
+ logits: np.ndarray, target_ids: np.ndarray) -> float:
241
+ """One complete training step: forward + loss + backward + update.
242
+
243
+ This is the simplified version that computes LoRA gradients
244
+ using a "shortcut" approach: we approximate the gradient by
245
+ computing dL/d(lora_output) for each layer independently,
246
+ treating the base model's gradient flow as given.
247
+
248
+ For the full training loop with proper gradient propagation,
249
+ the neural_daemon integrates with MLX's autograd.
250
+
251
+ Args:
252
+ layer_activations: list of [1, dim, 1, seq] per layer
253
+ (saved during MLX forward pass)
254
+ logits: [vocab, seq] fp32 model output logits
255
+ target_ids: [seq] int target token IDs
256
+
257
+ Returns:
258
+ float: cross-entropy loss value
259
+ """
260
+ if not self.initialized:
261
+ raise RuntimeError("Trainer not initialized")
262
+
263
+ # 1. Compute loss and gradient of logits
264
+ loss, dlogits = self._cross_entropy_backward(logits, target_ids)
265
+
266
+ # 2. Compute LoRA gradients for each target and layer
267
+ all_grads: dict[str, tuple[list[np.ndarray], list[np.ndarray]]] = {}
268
+
269
+ for target in self.adapters:
270
+ grads_A = []
271
+ grads_B = []
272
+
273
+ for layer_idx in range(self.n_layers):
274
+ # Get saved activation for this layer
275
+ x = layer_activations[layer_idx]
276
+
277
+ # For now, use dlogits as approximate gradient signal
278
+ # In the full implementation, MLX computes proper per-layer gradients
279
+ # and feeds them through the daemon's training pipeline
280
+ grad_out = self._approximate_layer_gradient(
281
+ layer_idx, dlogits, layer_activations)
282
+
283
+ # Compute LoRA gradients on ANE
284
+ dA, dB = self.compute_lora_backward(
285
+ target, layer_idx, grad_out, x)
286
+
287
+ grads_A.append(dA)
288
+ grads_B.append(dB)
289
+
290
+ all_grads[target] = (grads_A, grads_B)
291
+
292
+ # 3. Adam update for each target
293
+ for target, (grads_A, grads_B) in all_grads.items():
294
+ self.optimizers[target].step(
295
+ self.adapters[target], grads_A, grads_B,
296
+ grad_clip=self.config.gradient_clip)
297
+
298
+ self.total_steps += 1
299
+ self.last_loss = loss
300
+ self.loss_history.append(loss)
301
+
302
+ return loss
303
+
304
+ def train_micro_step_direct(self, target: str, layer: int,
305
+ x: np.ndarray,
306
+ grad_out: np.ndarray) -> tuple[float, float]:
307
+ """Direct micro-training step for a single layer/target.
308
+
309
+ Called by the neural daemon when MLX provides per-layer gradients.
310
+ This is the primary training interface.
311
+
312
+ Args:
313
+ target: "q_proj" or "v_proj"
314
+ layer: layer index
315
+ x: [1, dim, 1, seq] fp32 activation
316
+ grad_out: [1, dim, 1, seq] fp32 gradient from MLX
317
+
318
+ Returns:
319
+ (grad_norm_A, grad_norm_B) for monitoring
320
+ """
321
+ # Compute gradients on ANE
322
+ dA, dB = self.compute_lora_backward(target, layer, grad_out, x)
323
+
324
+ # Update just this layer
325
+ adapter = self.adapters[target]
326
+ optimizer = self.optimizers[target]
327
+
328
+ optimizer.t += 1
329
+ bc1 = 1 - optimizer.beta1 ** optimizer.t
330
+ bc2 = 1 - optimizer.beta2 ** optimizer.t
331
+
332
+ grad_norm_A = float(np.linalg.norm(dA))
333
+ grad_norm_B = float(np.linalg.norm(dB))
334
+
335
+ for param_list, grad, m_list, v_list in [
336
+ (adapter.A, dA, optimizer.m_A, optimizer.v_A),
337
+ (adapter.B, dB, optimizer.m_B, optimizer.v_B),
338
+ ]:
339
+ g = grad
340
+ gnorm = np.linalg.norm(g)
341
+ if gnorm > self.config.gradient_clip:
342
+ g = g * (self.config.gradient_clip / gnorm)
343
+
344
+ if self.config.weight_decay > 0:
345
+ param_list[layer] -= optimizer.lr * self.config.weight_decay * param_list[layer]
346
+
347
+ m_list[layer] = optimizer.beta1 * m_list[layer] + (1 - optimizer.beta1) * g
348
+ v_list[layer] = optimizer.beta2 * v_list[layer] + (1 - optimizer.beta2) * g * g
349
+
350
+ m_hat = m_list[layer] / bc1
351
+ v_hat = v_list[layer] / bc2
352
+ param_list[layer] -= optimizer.lr * m_hat / (np.sqrt(v_hat) + optimizer.eps)
353
+
354
+ return grad_norm_A, grad_norm_B
355
+
356
+ def run_training_cycle(self, layer_activations: list[np.ndarray],
357
+ logits: np.ndarray, target_ids: np.ndarray,
358
+ steps: int = 0) -> dict:
359
+ """Run a full micro-training cycle (multiple steps on same data).
360
+
361
+ Args:
362
+ layer_activations: per-layer activations from forward pass
363
+ logits: model output logits
364
+ target_ids: target token IDs
365
+ steps: number of steps (0 = use config default)
366
+
367
+ Returns:
368
+ dict with training metrics
369
+ """
370
+ steps = steps or self.config.steps_per_cycle
371
+ start = time.time()
372
+ losses = []
373
+
374
+ for step in range(steps):
375
+ loss = self.train_step(layer_activations, logits, target_ids)
376
+ losses.append(loss)
377
+
378
+ elapsed = time.time() - start
379
+ self.total_cycles += 1
380
+
381
+ # Auto-save
382
+ if (self.config.auto_save_interval > 0 and
383
+ self.total_cycles % self.config.auto_save_interval == 0):
384
+ self.save_adapter()
385
+ self.adapter_version += 1
386
+
387
+ return {
388
+ "cycle": self.total_cycles,
389
+ "steps": steps,
390
+ "initial_loss": losses[0],
391
+ "final_loss": losses[-1],
392
+ "mean_loss": float(np.mean(losses)),
393
+ "elapsed_sec": elapsed,
394
+ "steps_per_sec": steps / elapsed if elapsed > 0 else 0,
395
+ "adapter_version": self.adapter_version,
396
+ }
397
+
398
+ @staticmethod
399
+ def _cross_entropy_backward(logits: np.ndarray,
400
+ target_ids: np.ndarray) -> tuple[float, np.ndarray]:
401
+ """Compute cross-entropy loss and gradient w.r.t. logits.
402
+
403
+ Args:
404
+ logits: [vocab, seq] fp32
405
+ target_ids: [seq] int
406
+
407
+ Returns:
408
+ (loss, dlogits [vocab, seq])
409
+ """
410
+ vocab, seq_len = logits.shape
411
+
412
+ # Stable softmax
413
+ logits_shifted = logits - logits.max(axis=0, keepdims=True)
414
+ exp_logits = np.exp(logits_shifted)
415
+ probs = exp_logits / exp_logits.sum(axis=0, keepdims=True)
416
+
417
+ # Loss: -log(prob of correct token)
418
+ target_probs = probs[target_ids, np.arange(seq_len)]
419
+ loss = -np.log(target_probs + 1e-10).mean()
420
+
421
+ # Gradient: probs - one_hot(target)
422
+ dlogits = probs.copy()
423
+ dlogits[target_ids, np.arange(seq_len)] -= 1.0
424
+ dlogits /= seq_len # Mean reduction
425
+
426
+ return float(loss), dlogits
427
+
428
+ def _approximate_layer_gradient(self, layer_idx: int,
429
+ dlogits: np.ndarray,
430
+ activations: list[np.ndarray]) -> np.ndarray:
431
+ """Approximate per-layer gradient for standalone training.
432
+
433
+ Uses the layer's activation as a gradient proxy, scaled by layer depth
434
+ and a lightweight signal from the loss gradient. This avoids the
435
+ prohibitively expensive random projection from vocab-size space.
436
+
437
+ In the full daemon, MLX computes exact gradients.
438
+ """
439
+ seq = self.config.ane_seq_len
440
+ dim = self.dim
441
+
442
+ # Scale factor: layers closer to output get more gradient
443
+ depth_scale = (layer_idx + 1) / self.n_layers
444
+
445
+ # Use the layer activation itself as gradient proxy,
446
+ # scaled by loss gradient magnitude (cheap approximation)
447
+ activation = activations[layer_idx] # [1, dim, 1, seq]
448
+ grad_magnitude = np.sqrt((dlogits ** 2).mean()) * depth_scale
449
+
450
+ # Add small perturbation based on layer index for gradient diversity
451
+ rng = np.random.RandomState(layer_idx + self.total_steps)
452
+ noise = rng.randn(1, dim, 1, seq).astype(np.float32) * 0.01
453
+
454
+ grad = (activation * grad_magnitude + noise).astype(np.float32)
455
+ return grad.reshape(1, dim, 1, seq)
456
+
457
+ def save_adapter(self, path: str = ""):
458
+ """Save all adapter weights to disk."""
459
+ path = path or self.config.adapter_dir
460
+ Path(path).mkdir(parents=True, exist_ok=True)
461
+
462
+ for target, adapter in self.adapters.items():
463
+ target_dir = Path(path) / target
464
+ target_dir.mkdir(exist_ok=True)
465
+
466
+ for i in range(adapter.n_layers):
467
+ np.save(str(target_dir / f"A_{i:03d}.npy"), adapter.A[i])
468
+ np.save(str(target_dir / f"B_{i:03d}.npy"), adapter.B[i])
469
+
470
+ # Save metadata
471
+ meta = {
472
+ "n_layers": self.n_layers,
473
+ "dim": self.dim,
474
+ "rank": self.config.lora_rank,
475
+ "targets": list(self.adapters.keys()),
476
+ "total_steps": self.total_steps,
477
+ "total_cycles": self.total_cycles,
478
+ "last_loss": self.last_loss,
479
+ "adapter_version": self.adapter_version,
480
+ "timestamp": time.time(),
481
+ }
482
+ with open(Path(path) / "adapter_meta.json", "w") as f:
483
+ json.dump(meta, f, indent=2)
484
+
485
+ log.info(f"Adapter saved to {path} (v{self.adapter_version}, "
486
+ f"{self.total_steps} steps, loss={self.last_loss:.4f})")
487
+
488
+ def load_adapter(self, path: str = ""):
489
+ """Load adapter weights from disk."""
490
+ path = path or self.config.adapter_dir
491
+ meta_path = Path(path) / "adapter_meta.json"
492
+
493
+ if not meta_path.exists():
494
+ log.warning(f"No adapter found at {path}")
495
+ return False
496
+
497
+ with open(meta_path) as f:
498
+ meta = json.load(f)
499
+
500
+ for target in meta["targets"]:
501
+ if target not in self.adapters:
502
+ log.warning(f"Adapter target {target} not in current config")
503
+ continue
504
+
505
+ adapter = self.adapters[target]
506
+ target_dir = Path(path) / target
507
+
508
+ for i in range(min(meta["n_layers"], adapter.n_layers)):
509
+ a_path = target_dir / f"A_{i:03d}.npy"
510
+ b_path = target_dir / f"B_{i:03d}.npy"
511
+ if a_path.exists() and b_path.exists():
512
+ adapter.A[i] = np.load(str(a_path))
513
+ adapter.B[i] = np.load(str(b_path))
514
+
515
+ self.total_steps = meta.get("total_steps", 0)
516
+ self.total_cycles = meta.get("total_cycles", 0)
517
+ self.last_loss = meta.get("last_loss", float('inf'))
518
+ self.adapter_version = meta.get("adapter_version", 0)
519
+
520
+ log.info(f"Adapter loaded from {path} (v{self.adapter_version}, "
521
+ f"{self.total_steps} steps)")
522
+ return True
523
+
524
+ def reset_adapter(self):
525
+ """Reset all adapters to initial values (fresh start)."""
526
+ for target, adapter in self.adapters.items():
527
+ scale = 1.0 / math.sqrt(adapter.dim)
528
+ for i in range(adapter.n_layers):
529
+ adapter.A[i] = np.random.randn(
530
+ adapter.rank, adapter.dim).astype(np.float32) * scale
531
+ adapter.B[i] = np.zeros(
532
+ (adapter.dim, adapter.rank), dtype=np.float32)
533
+
534
+            # Reset optimizer state
+            optimizer = self.optimizers[target]
+            optimizer.t = 0
+            for i in range(adapter.n_layers):
+                optimizer.m_A[i].fill(0)
+                optimizer.v_A[i].fill(0)
+                optimizer.m_B[i].fill(0)
+                optimizer.v_B[i].fill(0)
+
+        self.total_steps = 0
+        self.total_cycles = 0
+        self.last_loss = float('inf')
+        self.loss_history.clear()
+        self.adapter_version += 1
+        log.info("Adapter reset to initial values")
+
+    def update_learning_rate(self, lr: float):
+        """Update learning rate for all optimizers."""
+        for opt in self.optimizers.values():
+            opt.lr = lr
+        self.config.learning_rate = lr
+
+    def stats(self) -> dict:
+        """Return training statistics."""
+        total_params = sum(a.param_count() for a in self.adapters.values())
+        total_mb = sum(a.memory_bytes() for a in self.adapters.values()) / 1e6
+
+        result = {
+            "initialized": self.initialized,
+            "total_params": total_params,
+            "adapter_memory_mb": round(total_mb, 1),
+            "targets": list(self.adapters.keys()),
+            "total_steps": self.total_steps,
+            "total_cycles": self.total_cycles,
+            "last_loss": self.last_loss,
+            "adapter_version": self.adapter_version,
+        }
+
+        if self.ane:
+            result["ane_compile_count"] = self.ane.compile_count
+            result["ane_compile_budget"] = self.ane.compile_budget_remaining
+
+        if self.loss_history:
+            recent = self.loss_history[-10:]
+            result["recent_avg_loss"] = round(float(np.mean(recent)), 4)
+
+        return result
+
+    def cleanup(self):
+        """Free ANE resources."""
+        if self.kernels:
+            self.kernels.free()
+            self.kernels = None
+        self.initialized = False
+        log.info("ANE LoRA trainer cleaned up")
+
+
+def self_test():
+    """Test the training engine with a small model."""
+    logging.basicConfig(level=logging.INFO,
+                        format="%(name)s: %(message)s")
+
+    print("ANE LoRA Trainer Self-Test")
+    print("=" * 50)
+
+    config = NeuralConfig()
+    config.lora_rank = 16
+    config.lora_targets = ["q_proj", "v_proj"]
+    config.ane_seq_len = 16
+    config.learning_rate = 1e-4  # Higher LR for test
+    config.adapter_dir = "/tmp/jarvis_lora_test"
+    config.resolve_paths()
+
+    trainer = ANELoRATrainer(config)
+
+    # Test with small dims
+    n_layers = 4
+    dim = 64
+    seq = 16
+    vocab = 128
+
+    print(f"\nInitializing: {n_layers} layers, dim={dim}, rank={config.lora_rank}")
+    trainer.initialize(n_layers, dim)
+    print(f"[OK] Initialized: {trainer.stats()['total_params']:,} params")
+
+    # Test forward pass
+    print("\nTesting LoRA forward pass...")
+    x = np.random.randn(1, dim, 1, seq).astype(np.float32) * 0.1
+    out_q = trainer.compute_lora_forward("q_proj", 0, x)
+    out_v = trainer.compute_lora_forward("v_proj", 0, x)
+    print(f"[OK] Forward: q_proj max={np.abs(out_q).max():.6f}, "
+          f"v_proj max={np.abs(out_v).max():.6f}")
+
+    # Test training step
+    print("\nTesting training step...")
+    activations = [np.random.randn(1, dim, 1, seq).astype(np.float32) * 0.1
+                   for _ in range(n_layers)]
+    logits = np.random.randn(vocab, seq).astype(np.float32)
+    target_ids = np.random.randint(0, vocab, size=seq)
+
+    loss = trainer.train_step(activations, logits, target_ids)
+    print(f"[OK] Training step: loss={loss:.4f}")
+
+    # Test multiple steps (verify loss changes)
+    print("\nRunning 5 training steps...")
+    losses = [loss]
+    for _ in range(4):
+        l = trainer.train_step(activations, logits, target_ids)
+        losses.append(l)
+    print(f"[OK] Losses: {[f'{l:.4f}' for l in losses]}")
+    print(f"     Steps completed: {trainer.total_steps}")
+
+    # Test direct micro-step
+    print("\nTesting direct micro-step...")
+    grad_out = np.random.randn(1, dim, 1, seq).astype(np.float32) * 0.01
+    gn_a, gn_b = trainer.train_micro_step_direct("q_proj", 0, x, grad_out)
+    print(f"[OK] Micro-step: grad_norm_A={gn_a:.6f}, grad_norm_B={gn_b:.6f}")
+
+    # Test save/load
+    print("\nTesting save/load...")
+    trainer.save_adapter()
+
+    # Get current weights
+    A_before, B_before = trainer.get_adapter_weights("q_proj", 0)
+    A_copy = A_before.copy()
+
+    # Reset and verify weights changed
+    trainer.reset_adapter()
+    A_after, _ = trainer.get_adapter_weights("q_proj", 0)
+    assert not np.allclose(A_copy, A_after), "Reset didn't change weights"
+
+    # Load and verify weights restored
+    trainer.load_adapter()
+    A_loaded, _ = trainer.get_adapter_weights("q_proj", 0)
+    assert np.allclose(A_copy, A_loaded), "Loaded weights don't match saved"
+    print("[OK] Save/load round-trip verified")
+
+    # Cleanup
+    trainer.cleanup()
+    print(f"\n[PASS] All trainer tests passed")
+    print(f"       Stats: {trainer.stats()}")
+
+    # Clean up test files
+    import shutil
+    shutil.rmtree("/tmp/jarvis_lora_test", ignore_errors=True)
+
+    return True
+
+
+if __name__ == "__main__":
+    success = self_test()
+    exit(0 if success else 1)
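
The adapter math exercised by the trainer above can be cross-checked on the CPU. The sketch below is a plain-numpy reference for the LoRA formulas these ANE kernels compute (forward out = B·A·x·scale, gradients dB = grad_out·hᵀ·scale and dA = Bᵀ·grad_out·xᵀ·scale, per the kernel docstrings in `ane_mil_lora.py`); the function names are illustrative and not part of the repo.

```python
import numpy as np

def lora_forward_ref(x, A, B, scaling=1.0):
    """Reference LoRA forward: out = (B @ A @ x) * scaling.

    x: [dim, seq], A: [rank, dim], B: [dim, rank] -> out: [dim, seq]."""
    return (B @ (A @ x)) * scaling

def lora_grads_ref(grad_out, x, A, B, scaling=1.0):
    """Reference LoRA gradients.

    dB = grad_out @ h^T * scaling  (h = A @ x), shape [dim, rank]
    dA = (B^T @ grad_out) @ x^T * scaling,      shape [rank, dim]
    """
    h = A @ x                               # [rank, seq]
    dB = (grad_out @ h.T) * scaling         # [dim, rank]
    dA = (B.T @ grad_out) @ x.T * scaling   # [rank, dim]
    return dA, dB
```

Comparing the ANE outputs against these references (as the self-tests do) separates fp16 rounding error from genuine kernel bugs.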
src/ane_mil_lora.py ADDED
@@ -0,0 +1,662 @@
+"""
+ane_mil_lora.py — MIL code generators for LoRA forward and backward passes on ANE.
+
+Generates Apple Machine Learning Intermediate Language (MIL) programs that
+compile and run on the Neural Engine via libane_bridge.dylib.
+
+Based on the dynamic matmul pattern from maderix/ANE: weights are packed
+into the spatial dimension of the input IOSurface, enabling weight updates
+without recompilation. Each kernel is compiled ONCE and reused across all
+layers by writing different weights to the IOSurface.
+
+ANE matmul constraint: all dimensions (channels, spatial, matmul operands)
+must be multiples of 16 with minimum of 16. This means:
+- LoRA rank must be a multiple of 16 (recommend 16 or 32)
+- Sequence length must be a multiple of 16 (pad if needed)
+- Model hidden dimension is typically large enough (e.g. 3584)
+
+Kernels produced:
+1. lora_down — x @ A^T → h [dim → rank]
+2. lora_up — h @ B^T → out * scale [rank → dim]
+3. grad_b — grad_out @ h^T → dB [gradient for B]
+4. grad_a — (B^T @ grad_out) @ x^T → dA [gradient for A]
+5. rmsnorm — RMSNorm with baked weights
+"""
+
+import numpy as np
+
+# Standard MIL header required by ANE's modelWithMILText API
+MIL_HEADER = (
+    'program(1.3)\n'
+    '[buildInfo = dict<string, string>({{"coremlc-component-MIL", "3510.2.1"}, '
+    '{"coremlc-version", "3505.4.1"}, '
+    '{"coremltools-component-milinternal", ""}, '
+    '{"coremltools-version", "9.0"}})]\n'
+    '{\n'
+)
+
+
+def _dynamic_matmul_block(prefix: str, ic: int, oc: int, seq: int,
+                          act_sp_off: int, w_sp_off: int,
+                          input_var: str) -> str:
+    """Generate MIL statements for a dynamic matmul within a function.
+
+    Slices activation [1,ic,1,seq] and weight [1,ic,1,oc] from the input
+    spatial dimension, reshapes for matmul, and produces output [1,oc,1,seq].
+
+    This is the core building block from maderix's training_dynamic approach.
+    """
+    lines = []
+
+    # Slice activations: [1, ic, 1, seq] from spatial offset
+    lines.append(f' tensor<int32, [4]> {prefix}_ba = const()[name = string("{prefix}_ba"), val = tensor<int32, [4]>([0, 0, 0, {act_sp_off}])];')
+    lines.append(f' tensor<int32, [4]> {prefix}_sa = const()[name = string("{prefix}_sa"), val = tensor<int32, [4]>([1, {ic}, 1, {seq}])];')
+    lines.append(f' tensor<fp16, [1, {ic}, 1, {seq}]> {prefix}_act = slice_by_size(x = {input_var}, begin = {prefix}_ba, size = {prefix}_sa)[name = string("{prefix}_act")];')
+
+    # Slice weight: [1, ic, 1, oc] from spatial offset
+    lines.append(f' tensor<int32, [4]> {prefix}_bw = const()[name = string("{prefix}_bw"), val = tensor<int32, [4]>([0, 0, 0, {w_sp_off}])];')
+    lines.append(f' tensor<int32, [4]> {prefix}_sw = const()[name = string("{prefix}_sw"), val = tensor<int32, [4]>([1, {ic}, 1, {oc}])];')
+    lines.append(f' tensor<fp16, [1, {ic}, 1, {oc}]> {prefix}_wt = slice_by_size(x = {input_var}, begin = {prefix}_bw, size = {prefix}_sw)[name = string("{prefix}_wt")];')
+
+    # Reshape activation: [1,ic,1,seq] → [1,1,ic,seq]
+    lines.append(f' tensor<int32, [4]> {prefix}_ra = const()[name = string("{prefix}_ra"), val = tensor<int32, [4]>([1, 1, {ic}, {seq}])];')
+    lines.append(f' tensor<fp16, [1, 1, {ic}, {seq}]> {prefix}_a2 = reshape(shape = {prefix}_ra, x = {prefix}_act)[name = string("{prefix}_a2")];')
+
+    # Transpose: [1,1,ic,seq] → [1,1,seq,ic]
+    lines.append(f' tensor<int32, [4]> {prefix}_pm = const()[name = string("{prefix}_pm"), val = tensor<int32, [4]>([0, 1, 3, 2])];')
+    lines.append(f' tensor<fp16, [1, 1, {seq}, {ic}]> {prefix}_a3 = transpose(perm = {prefix}_pm, x = {prefix}_a2)[name = string("{prefix}_a3")];')
+
+    # Reshape weight: [1,ic,1,oc] → [1,1,ic,oc]
+    lines.append(f' tensor<int32, [4]> {prefix}_rw = const()[name = string("{prefix}_rw"), val = tensor<int32, [4]>([1, 1, {ic}, {oc}])];')
+    lines.append(f' tensor<fp16, [1, 1, {ic}, {oc}]> {prefix}_W = reshape(shape = {prefix}_rw, x = {prefix}_wt)[name = string("{prefix}_W")];')
+
+    # Core matmul: [1,1,seq,ic] @ [1,1,ic,oc] → [1,1,seq,oc]
+    lines.append(f' bool {prefix}_bF = const()[name = string("{prefix}_bF"), val = bool(false)];')
+    lines.append(f' tensor<fp16, [1, 1, {seq}, {oc}]> {prefix}_yh = matmul(transpose_x = {prefix}_bF, transpose_y = {prefix}_bF, x = {prefix}_a3, y = {prefix}_W)[name = string("{prefix}_yh")];')
+
+    # Transpose back: [1,1,seq,oc] → [1,1,oc,seq]
+    lines.append(f' tensor<fp16, [1, 1, {oc}, {seq}]> {prefix}_yt = transpose(perm = {prefix}_pm, x = {prefix}_yh)[name = string("{prefix}_yt")];')
+
+    # Reshape to standard: [1,1,oc,seq] → [1,oc,1,seq]
+    lines.append(f' tensor<int32, [4]> {prefix}_ro = const()[name = string("{prefix}_ro"), val = tensor<int32, [4]>([1, {oc}, 1, {seq}])];')
+    lines.append(f' tensor<fp16, [1, {oc}, 1, {seq}]> {prefix}_y = reshape(shape = {prefix}_ro, x = {prefix}_yt)[name = string("{prefix}_y")];')
+
+    return '\n'.join(lines) + '\n'
+
+
+def gen_lora_down_mil(dim: int, rank: int, seq: int) -> tuple[str, int, int]:
+    """Generate MIL for LoRA down-projection: h = x @ A^T.
+
+    Uses dynamic weight packing:
+      Input:  [1, dim, 1, seq + rank] (fp32)
+        - spatial[0:seq] = x (activation)
+        - spatial[seq:seq+rank] = A^T (transposed LoRA A matrix)
+      Output: [1, rank, 1, seq] (fp32)
+
+    Returns:
+        (mil_text, input_bytes, output_bytes)
+    """
+    sp_in = seq + rank
+    mil = MIL_HEADER
+    mil += f' func main<ios18>(tensor<fp32, [1, {dim}, 1, {sp_in}]> x) {{\n'
+
+    # Cast fp32 → fp16
+    mil += f' string to16 = const()[name = string("to16"), val = string("fp16")];\n'
+    mil += f' tensor<fp16, [1, {dim}, 1, {sp_in}]> xh = cast(dtype = to16, x = x)[name = string("cin")];\n'
+
+    # Dynamic matmul: [seq, dim] @ [dim, rank] → [seq, rank]
+    mil += _dynamic_matmul_block("ld", dim, rank, seq, 0, seq, "xh")
+
+    # Cast fp16 → fp32
+    mil += f' string to32 = const()[name = string("to32"), val = string("fp32")];\n'
+    mil += f' tensor<fp32, [1, {rank}, 1, {seq}]> y = cast(dtype = to32, x = ld_y)[name = string("cout")];\n'
+    mil += ' } -> (y);\n}\n'
+
+    input_bytes = dim * sp_in * 4   # fp32
+    output_bytes = rank * seq * 4   # fp32
+    return mil, input_bytes, output_bytes
+
+
+def gen_lora_up_mil(rank: int, dim: int, seq: int,
+                    scaling: float = 1.0) -> tuple[str, int, int]:
+    """Generate MIL for LoRA up-projection: out = (h @ B^T) * scale.
+
+    Uses dynamic weight packing:
+      Input:  [1, rank, 1, seq + dim] (fp32)
+        - spatial[0:seq] = h (from lora_down)
+        - spatial[seq:seq+dim] = B^T (transposed LoRA B matrix)
+      Output: [1, dim, 1, seq] (fp32)
+
+    Returns:
+        (mil_text, input_bytes, output_bytes)
+    """
+    sp_in = seq + dim
+    mil = MIL_HEADER
+    mil += f' func main<ios18>(tensor<fp32, [1, {rank}, 1, {sp_in}]> x) {{\n'
+
+    # Cast fp32 → fp16
+    mil += f' string to16 = const()[name = string("to16"), val = string("fp16")];\n'
+    mil += f' tensor<fp16, [1, {rank}, 1, {sp_in}]> xh = cast(dtype = to16, x = x)[name = string("cin")];\n'
+
+    # Dynamic matmul: [seq, rank] @ [rank, dim] → [seq, dim]
+    mil += _dynamic_matmul_block("lu", rank, dim, seq, 0, seq, "xh")
+
+    # Scale by lora_alpha/rank
+    if abs(scaling - 1.0) > 1e-6:
+        mil += f' fp16 sc = const()[name = string("sc"), val = fp16({scaling})];\n'
+        mil += f' tensor<fp16, [1, {dim}, 1, {seq}]> lu_s = mul(x = lu_y, y = sc)[name = string("scaled")];\n'
+        out_var = "lu_s"
+    else:
+        out_var = "lu_y"
+
+    # Cast fp16 → fp32
+    mil += f' string to32 = const()[name = string("to32"), val = string("fp32")];\n'
+    mil += f' tensor<fp32, [1, {dim}, 1, {seq}]> y = cast(dtype = to32, x = {out_var})[name = string("cout")];\n'
+    mil += ' } -> (y);\n}\n'
+
+    input_bytes = rank * sp_in * 4
+    output_bytes = dim * seq * 4
+    return mil, input_bytes, output_bytes
+
+
+def gen_lora_grad_b_mil(dim: int, rank: int, seq: int,
+                        scaling: float = 1.0) -> tuple[str, int, int, int]:
+    """Generate MIL for LoRA B gradient: dB = grad_out @ h^T * scale.
+
+    grad_out is [dim, seq] and h is [rank, seq], and we need
+    matmul(grad_out, h^T) = [dim, seq] @ [seq, rank] = [dim, rank].
+    Because grad_out has dim channels and h has rank channels, they cannot
+    share the same IC dimension of one packed input, so this kernel takes
+    two separate inputs instead:
+
+      Input 0: [1, dim, 1, seq] — grad_out (fp32)
+      Input 1: [1, rank, 1, seq] — h (fp32)
+      Output:  [1, dim, 1, rank] — dB (fp32)
+
+    We use matmul(transpose_x=False, transpose_y=True):
+      [1,1,dim,seq] @ [1,1,rank,seq]^T = [1,1,dim,rank]
+
+    Returns:
+        (mil_text, input0_bytes, input1_bytes, output_bytes)
+    """
+    mil = MIL_HEADER
+    mil += f' func main<ios18>(tensor<fp32, [1, {dim}, 1, {seq}]> go, tensor<fp32, [1, {rank}, 1, {seq}]> h) {{\n'
+
+    # Cast both to fp16
+    mil += f' string to16 = const()[name = string("to16"), val = string("fp16")];\n'
+    mil += f' tensor<fp16, [1, {dim}, 1, {seq}]> go16 = cast(dtype = to16, x = go)[name = string("cgo")];\n'
+    mil += f' tensor<fp16, [1, {rank}, 1, {seq}]> h16 = cast(dtype = to16, x = h)[name = string("ch")];\n'
+
+    # Reshape grad_out: [1,dim,1,seq] → [1,1,dim,seq]
+    mil += f' tensor<int32, [4]> rgo = const()[name = string("rgo"), val = tensor<int32, [4]>([1, 1, {dim}, {seq}])];\n'
+    mil += f' tensor<fp16, [1, 1, {dim}, {seq}]> go4 = reshape(shape = rgo, x = go16)[name = string("rgo4")];\n'
+
+    # Reshape h: [1,rank,1,seq] → [1,1,rank,seq]
+    mil += f' tensor<int32, [4]> rh = const()[name = string("rh"), val = tensor<int32, [4]>([1, 1, {rank}, {seq}])];\n'
+    mil += f' tensor<fp16, [1, 1, {rank}, {seq}]> h4 = reshape(shape = rh, x = h16)[name = string("rh4")];\n'
+
+    # matmul(grad_out, h^T): [1,1,dim,seq] @ [1,1,seq,rank] → [1,1,dim,rank]
+    mil += f' bool bF = const()[name = string("bF"), val = bool(false)];\n'
+    mil += f' bool bT = const()[name = string("bT"), val = bool(true)];\n'
+    mil += f' tensor<fp16, [1, 1, {dim}, {rank}]> db4 = matmul(transpose_x = bF, transpose_y = bT, x = go4, y = h4)[name = string("mm")];\n'
+
+    # Scale
+    if abs(scaling - 1.0) > 1e-6:
+        mil += f' fp16 sc = const()[name = string("sc"), val = fp16({scaling})];\n'
+        mil += f' tensor<fp16, [1, 1, {dim}, {rank}]> db_s = mul(x = db4, y = sc)[name = string("scaled")];\n'
+        mm_var = "db_s"
+    else:
+        mm_var = "db4"
+
+    # Reshape: [1,1,dim,rank] → [1,dim,1,rank]
+    mil += f' tensor<int32, [4]> ro = const()[name = string("ro"), val = tensor<int32, [4]>([1, {dim}, 1, {rank}])];\n'
+    mil += f' tensor<fp16, [1, {dim}, 1, {rank}]> db16 = reshape(shape = ro, x = {mm_var})[name = string("rdb")];\n'
+
+    # Cast to fp32
+    mil += f' string to32 = const()[name = string("to32"), val = string("fp32")];\n'
+    mil += f' tensor<fp32, [1, {dim}, 1, {rank}]> dB = cast(dtype = to32, x = db16)[name = string("cout")];\n'
+    mil += ' } -> (dB);\n}\n'
+
+    in0_bytes = dim * seq * 4
+    in1_bytes = rank * seq * 4
+    out_bytes = dim * rank * 4
+    return mil, in0_bytes, in1_bytes, out_bytes
+
+
+def gen_lora_grad_a_mil(dim: int, rank: int, seq: int,
+                        scaling: float = 1.0) -> tuple[str, int, int, int]:
+    """Generate MIL for LoRA A gradient: dA = B^T @ grad_out @ x^T * scale.
+
+    This is two chained matmuls:
+      1. tmp = B^T @ grad_out: [rank,dim] @ [dim,seq] → [rank,seq]
+      2. dA = tmp @ x^T: [rank,seq] @ [seq,dim] → [rank,dim]
+
+      Input 0: [1, dim, 1, seq + rank] (fp32) — grad_out + B^T packed
+        - spatial[0:seq] = grad_out [dim, seq]
+        - spatial[seq:seq+rank] = B^T [dim, rank]
+      Input 1: [1, dim, 1, seq] (fp32) — x (activation)
+      Output:  [1, rank, 1, dim] (fp32) — dA
+
+    Returns:
+        (mil_text, input0_bytes, input1_bytes, output_bytes)
+    """
+    sp0 = seq + rank
+    mil = MIL_HEADER
+    mil += f' func main<ios18>(tensor<fp32, [1, {dim}, 1, {sp0}]> packed, tensor<fp32, [1, {dim}, 1, {seq}]> xin) {{\n'
+
+    # Cast to fp16
+    mil += f' string to16 = const()[name = string("to16"), val = string("fp16")];\n'
+    mil += f' tensor<fp16, [1, {dim}, 1, {sp0}]> ph = cast(dtype = to16, x = packed)[name = string("cp")];\n'
+    mil += f' tensor<fp16, [1, {dim}, 1, {seq}]> xh = cast(dtype = to16, x = xin)[name = string("cx")];\n'
+
+    # Step 1: B^T @ grad_out using the dynamic matmul helper.
+    # Slices grad_out [dim, seq] and B^T [dim, rank] from the packed input,
+    # computes [seq, dim] @ [dim, rank] → [seq, rank], and emits tmp_y
+    # with shape [1, rank, 1, seq].
+    mil += _dynamic_matmul_block("tmp", dim, rank, seq, 0, seq, "ph")
+
+    # Step 2: dA = tmp @ x^T.
+    # Reshape tmp [1,rank,1,seq] → [1,1,rank,seq] and x [1,dim,1,seq] →
+    # [1,1,dim,seq]. transpose_y=True turns x into [1,1,seq,dim], so
+    # matmul(x = tmp4, y = x4, transpose_y = True) gives
+    # [1,1,rank,seq] @ [1,1,seq,dim] → [1,1,rank,dim].
+    mil += f' tensor<int32, [4]> rt = const()[name = string("rt"), val = tensor<int32, [4]>([1, 1, {rank}, {seq}])];\n'
+    mil += f' tensor<fp16, [1, 1, {rank}, {seq}]> tmp4 = reshape(shape = rt, x = tmp_y)[name = string("rt4")];\n'
+
+    mil += f' tensor<int32, [4]> rx = const()[name = string("rx"), val = tensor<int32, [4]>([1, 1, {dim}, {seq}])];\n'
+    mil += f' tensor<fp16, [1, 1, {dim}, {seq}]> x4 = reshape(shape = rx, x = xh)[name = string("rx4")];\n'
+
+    mil += f' bool bF = const()[name = string("bF"), val = bool(false)];\n'
+    mil += f' bool bT = const()[name = string("bT"), val = bool(true)];\n'
+    mil += f' tensor<fp16, [1, 1, {rank}, {dim}]> da4 = matmul(transpose_x = bF, transpose_y = bT, x = tmp4, y = x4)[name = string("mm2")];\n'
+
+    # Scale
+    if abs(scaling - 1.0) > 1e-6:
+        mil += f' fp16 sc = const()[name = string("sc"), val = fp16({scaling})];\n'
+        mil += f' tensor<fp16, [1, 1, {rank}, {dim}]> da_s = mul(x = da4, y = sc)[name = string("scaled")];\n'
+        mm_var = "da_s"
+    else:
+        mm_var = "da4"
+
+    # Reshape: [1,1,rank,dim] → [1,rank,1,dim]
+    mil += f' tensor<int32, [4]> ro = const()[name = string("ro"), val = tensor<int32, [4]>([1, {rank}, 1, {dim}])];\n'
+    mil += f' tensor<fp16, [1, {rank}, 1, {dim}]> da16 = reshape(shape = ro, x = {mm_var})[name = string("rda")];\n'
+
+    # Cast to fp32
+    mil += f' string to32 = const()[name = string("to32"), val = string("fp32")];\n'
+    mil += f' tensor<fp32, [1, {rank}, 1, {dim}]> dA = cast(dtype = to32, x = da16)[name = string("cout")];\n'
+    mil += ' } -> (dA);\n}\n'
+
+    in0_bytes = dim * sp0 * 4
+    in1_bytes = dim * seq * 4
+    out_bytes = rank * dim * 4
+    return mil, in0_bytes, in1_bytes, out_bytes
+
+
+def gen_rmsnorm_mil(dim: int, seq: int) -> tuple[str, int, int]:
+    """Generate MIL for RMSNorm: out = (x / sqrt(mean(x^2) + eps)) * weight.
+
+    Uses baked weight constant from BLOBFILE.
+      Input:  [1, dim, 1, seq] (fp16)
+      Output: [1, dim, 1, seq] (fp16)
+
+    The weight file "@model_path/weights/rms_w.bin" must be provided as
+    a weight blob when compiling.
+
+    Returns:
+        (mil_text, input_bytes, output_bytes)
+    """
+    inv_dim = 1.0 / dim
+    mil = MIL_HEADER
+    mil += f' func main<ios18>(tensor<fp16, [1, {dim}, 1, {seq}]> x) {{\n'
+
+    # x^2
+    mil += f' tensor<fp16, [1, {dim}, 1, {seq}]> sq = mul(x = x, y = x)[name = string("sq")];\n'
+
+    # reduce_sum over channels (axis 1), keep_dims
+    mil += f' tensor<int32, [1]> rax = const()[name = string("rax"), val = tensor<int32, [1]>([1])];\n'
+    mil += f' bool kd = const()[name = string("kd"), val = bool(true)];\n'
+    mil += f' tensor<fp16, [1, 1, 1, {seq}]> ss = reduce_sum(x = sq, axes = rax, keep_dims = kd)[name = string("ss")];\n'
+
+    # mean: sum / dim
+    mil += f' fp16 invd = const()[name = string("invd"), val = fp16({inv_dim})];\n'
+    mil += f' tensor<fp16, [1, 1, 1, {seq}]> ss2 = mul(x = ss, y = invd)[name = string("ss2")];\n'
+
+    # + eps
+    mil += f' fp16 eps = const()[name = string("eps"), val = fp16(0.00001)];\n'
+    mil += f' tensor<fp16, [1, 1, 1, {seq}]> ss3 = add(x = ss2, y = eps)[name = string("ss3")];\n'
+
+    # rsqrt: pow(x, -0.5)
+    mil += f' fp16 nhalf = const()[name = string("nhalf"), val = fp16(-0.5)];\n'
+    mil += f' tensor<fp16, [1, 1, 1, {seq}]> rrms = pow(x = ss3, y = nhalf)[name = string("rrms")];\n'
+
+    # normalize
+    mil += f' tensor<fp16, [1, {dim}, 1, {seq}]> xr = mul(x = x, y = rrms)[name = string("xr")];\n'
+
+    # weight (baked)
+    mil += f' tensor<fp16, [1, {dim}, 1, 1]> rw = const()[name = string("rw"), val = tensor<fp16, [1, {dim}, 1, 1]>(BLOBFILE(path = string("@model_path/weights/rms_w.bin"), offset = uint64(64)))];\n'
+    mil += f' tensor<fp16, [1, {dim}, 1, {seq}]> out = mul(x = xr, y = rw)[name = string("out")];\n'
+    mil += ' } -> (out);\n}\n'
+
+    tensor_bytes = dim * seq * 2  # fp16
+    return mil, tensor_bytes, tensor_bytes
+
+
+def gen_conv_matmul_mil(dim_in: int, dim_out: int, seq: int) -> tuple[str, int, int]:
+    """Generate MIL for a conv-based linear projection (baked weights).
+
+    Used for classifier/embedding projections.
+      Input:  [1, dim_in, 1, seq] (fp32)
+      Output: [1, dim_out, 1, seq] (fp32)
+
+    Weight: BLOBFILE "embed.bin" [dim_out, dim_in, 1, 1] in fp16.
+
+    Returns:
+        (mil_text, input_bytes, output_bytes)
+    """
+    mil = MIL_HEADER
+    mil += f' func main<ios18>(tensor<fp32, [1, {dim_in}, 1, {seq}]> x) {{\n'
+
+    # Conv constants
+    mil += ' string pt = const()[name = string("pt"), val = string("valid")];\n'
+    mil += ' tensor<int32, [2]> st = const()[name = string("st"), val = tensor<int32, [2]>([1, 1])];\n'
+    mil += ' tensor<int32, [4]> pd = const()[name = string("pd"), val = tensor<int32, [4]>([0, 0, 0, 0])];\n'
+    mil += ' tensor<int32, [2]> dl = const()[name = string("dl"), val = tensor<int32, [2]>([1, 1])];\n'
+    mil += ' int32 gr = const()[name = string("gr"), val = int32(1)];\n'
+
+    # Cast to fp16
+    mil += f' string to16 = const()[name = string("to16"), val = string("fp16")];\n'
+    mil += f' tensor<fp16, [1, {dim_in}, 1, {seq}]> x16 = cast(dtype = to16, x = x)[name = string("cin")];\n'
+
+    # Baked weight
+    mil += f' tensor<fp16, [{dim_out}, {dim_in}, 1, 1]> W = const()[name = string("W"), val = tensor<fp16, [{dim_out}, {dim_in}, 1, 1]>(BLOBFILE(path = string("@model_path/weights/embed.bin"), offset = uint64(64)))];\n'
+
+    # Conv (equivalent to matmul for 1x1 kernel)
+    mil += f' tensor<fp16, [1, {dim_out}, 1, {seq}]> y16 = conv(dilations = dl, groups = gr, pad = pd, pad_type = pt, strides = st, weight = W, x = x16)[name = string("conv")];\n'
+
+    # Cast to fp32
+    mil += f' string to32 = const()[name = string("to32"), val = string("fp32")];\n'
+    mil += f' tensor<fp32, [1, {dim_out}, 1, {seq}]> y = cast(dtype = to32, x = y16)[name = string("cout")];\n'
+    mil += ' } -> (y);\n}\n'
+
+    in_bytes = dim_in * seq * 4
+    out_bytes = dim_out * seq * 4
+    return mil, in_bytes, out_bytes
+
+
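
Because of the multiple-of-16 constraint stated in the module docstring, callers with shorter or odd-length sequences must zero-pad before invoking these kernels. A minimal sketch of such padding follows; `pad_seq_to_16` is an illustrative helper, not part of the repo.

```python
import numpy as np

def pad_seq_to_16(x: np.ndarray) -> np.ndarray:
    """Zero-pad the last (sequence) axis of x up to the next multiple
    of 16, with a minimum of 16, matching the ANE matmul constraint."""
    seq = x.shape[-1]
    target = max(16, -(-seq // 16) * 16)  # ceil seq to a multiple of 16
    if target == seq:
        return x  # already compliant
    pad = [(0, 0)] * (x.ndim - 1) + [(0, target - seq)]
    return np.pad(x, pad)
```

Padded positions contribute zeros to the matmuls, so outputs for the real tokens are unchanged; the caller just slices the first `seq` columns back out.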
+class LoRAKernelSet:
+    """Pre-compiled set of LoRA kernels for a given model dimension.
+
+    Compiles 4 kernels once, then reuses them across all layers by
+    writing different weights to the IOSurfaces.
+    """
+
+    def __init__(self, ane_bridge, dim: int, rank: int, seq: int,
+                 scaling: float = 1.0):
+        """Compile all LoRA kernels.
+
+        Args:
+            ane_bridge: ANEBridge instance
+            dim: model hidden dimension
+            rank: LoRA rank
+            seq: sequence length
+            scaling: LoRA scaling factor (alpha/rank)
+        """
+        # ANE requires all matmul dims to be multiples of 16
+        for name, val in [("dim", dim), ("rank", rank), ("seq", seq)]:
+            if val < 16 or val % 16 != 0:
+                raise ValueError(
+                    f"ANE requires {name}={val} to be a multiple of 16 (min 16)")
+
+        self.ane = ane_bridge
+        self.dim = dim
+        self.rank = rank
+        self.seq = seq
+        self.scaling = scaling
+
+        # Compile kernels
+        self._compile_all()
+
+    def _compile_all(self):
+        """Compile all 4 LoRA kernels."""
+        # 1. LoRA down: x @ A^T → h
+        mil, in_bytes, out_bytes = gen_lora_down_mil(self.dim, self.rank, self.seq)
+        self.down_kernel = self.ane.compile_kernel(
+            mil, input_sizes=[in_bytes], output_sizes=[out_bytes])
+        self.down_in_bytes = in_bytes
+        self.down_out_bytes = out_bytes
+
+        # 2. LoRA up: h @ B^T → out * scale
+        mil, in_bytes, out_bytes = gen_lora_up_mil(
+            self.rank, self.dim, self.seq, self.scaling)
+        self.up_kernel = self.ane.compile_kernel(
+            mil, input_sizes=[in_bytes], output_sizes=[out_bytes])
+        self.up_in_bytes = in_bytes
+        self.up_out_bytes = out_bytes
+
+        # 3. Gradient B: grad_out @ h^T → dB
+        mil, in0, in1, out = gen_lora_grad_b_mil(
+            self.dim, self.rank, self.seq, self.scaling)
+        self.grad_b_kernel = self.ane.compile_kernel(
+            mil, input_sizes=[in0, in1], output_sizes=[out])
+        self.grad_b_in0 = in0
+        self.grad_b_in1 = in1
+        self.grad_b_out = out
+
+        # 4. Gradient A: (B^T @ grad_out) @ x^T → dA
+        mil, in0, in1, out = gen_lora_grad_a_mil(
+            self.dim, self.rank, self.seq, self.scaling)
+        self.grad_a_kernel = self.ane.compile_kernel(
+            mil, input_sizes=[in0, in1], output_sizes=[out])
+        self.grad_a_in0 = in0
+        self.grad_a_in1 = in1
+        self.grad_a_out = out
+
+    def forward(self, x: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
+        """Compute LoRA forward: out = (B @ A @ x) * scale.
+
+        Args:
+            x: [1, dim, 1, seq] fp32 activation
+            A: [rank, dim] fp32 LoRA A matrix
+            B: [dim, rank] fp32 LoRA B matrix
+
+        Returns:
+            [1, dim, 1, seq] fp32 LoRA output
+        """
+        # Step 1: h = x @ A^T
+        # Pack x and A^T into the spatial dimension
+        A_T = A.T  # [dim, rank]
+        packed_down = np.zeros((1, self.dim, 1, self.seq + self.rank), dtype=np.float32)
+        packed_down[:, :, :, :self.seq] = x
+        packed_down[:, :, :, self.seq:] = A_T.reshape(1, self.dim, 1, self.rank)
+
+        self.ane.write_input(self.down_kernel, 0, packed_down)
+        self.ane.eval(self.down_kernel)
+        h = self.ane.read_output(self.down_kernel, 0,
+                                 (1, self.rank, 1, self.seq), dtype=np.float32)
+
+        # Step 2: out = h @ B^T * scale
+        B_T = B.T  # [rank, dim]
+        packed_up = np.zeros((1, self.rank, 1, self.seq + self.dim), dtype=np.float32)
+        packed_up[:, :, :, :self.seq] = h
+        packed_up[:, :, :, self.seq:] = B_T.reshape(1, self.rank, 1, self.dim)
+
+        self.ane.write_input(self.up_kernel, 0, packed_up)
+        self.ane.eval(self.up_kernel)
+        out = self.ane.read_output(self.up_kernel, 0,
+                                   (1, self.dim, 1, self.seq), dtype=np.float32)
+
+        return out
+
+    def backward(self, grad_out: np.ndarray, x: np.ndarray,
+                 A: np.ndarray, B: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
+        """Compute LoRA gradients: dA, dB.
+
+        Args:
+            grad_out: [1, dim, 1, seq] fp32 upstream gradient
+            x: [1, dim, 1, seq] fp32 saved activation
+            A: [rank, dim] fp32 LoRA A matrix
+            B: [dim, rank] fp32 LoRA B matrix
+
+        Returns:
+            (dA [rank, dim], dB [dim, rank]) fp32 gradients
+        """
+        # Compute h = A @ x (needed for dB)
+        A_T = A.T
+        packed_down = np.zeros((1, self.dim, 1, self.seq + self.rank), dtype=np.float32)
+        packed_down[:, :, :, :self.seq] = x
+        packed_down[:, :, :, self.seq:] = A_T.reshape(1, self.dim, 1, self.rank)
+        self.ane.write_input(self.down_kernel, 0, packed_down)
+        self.ane.eval(self.down_kernel)
+        h = self.ane.read_output(self.down_kernel, 0,
+                                 (1, self.rank, 1, self.seq), dtype=np.float32)
+
+        # Gradient B: dB = grad_out @ h^T * scale → [dim, rank]
+        self.ane.write_input(self.grad_b_kernel, 0,
+                             np.ascontiguousarray(grad_out))
+        self.ane.write_input(self.grad_b_kernel, 1,
+                             np.ascontiguousarray(h))
+        self.ane.eval(self.grad_b_kernel)
+        dB_raw = self.ane.read_output(self.grad_b_kernel, 0,
+                                      (1, self.dim, 1, self.rank), dtype=np.float32)
+        dB = dB_raw.reshape(self.dim, self.rank)
+
+        # Gradient A: dA = (B^T @ grad_out) @ x^T * scale → [rank, dim]
+        # Pack grad_out and B into input 0: [1, dim, 1, seq + rank], with
+        #   spatial[0:seq] = grad_out [dim, seq]
+        #   spatial[seq:]  = B        [dim, rank]
+        # The kernel's dynamic matmul treats the activation slice as
+        # [seq, dim] and the weight slice as [dim, rank], so it computes
+        #   [seq, dim] @ [dim, rank] = [seq, rank] = (B^T @ grad_out)^T
+        # and emits it as [1, rank, 1, seq], which is exactly step 1 of
+        # the dA computation; step 2 (@ x^T) runs inside the same kernel.
+        packed_a0 = np.zeros((1, self.dim, 1, self.seq + self.rank), dtype=np.float32)
+        packed_a0[:, :, :, :self.seq] = grad_out
+        packed_a0[:, :, :, self.seq:] = B.reshape(1, self.dim, 1, self.rank)
+
+        self.ane.write_input(self.grad_a_kernel, 0, packed_a0)
+        self.ane.write_input(self.grad_a_kernel, 1,
+                             np.ascontiguousarray(x))
+        self.ane.eval(self.grad_a_kernel)
+        dA_raw = self.ane.read_output(self.grad_a_kernel, 0,
+                                      (1, self.rank, 1, self.dim), dtype=np.float32)
+        dA = dA_raw.reshape(self.rank, self.dim)
+
+        return dA, dB
+
+    def free(self):
+        """Free all compiled kernels."""
+        for k in [self.down_kernel, self.up_kernel,
+                  self.grad_b_kernel, self.grad_a_kernel]:
+            if k:
+                self.ane.free_kernel(k)
+
+
+def self_test():
+    """Test MIL generators with ANE hardware."""
+    from ane_bridge_py import ANEBridge
+
+    print("LoRA MIL Generator Self-Test")
+    print("=" * 50)
+
+    ane = ANEBridge()
+    # ANE requires all matmul dimensions to be multiples of 16 (minimum 16)
+    dim, rank, seq = 64, 16, 16
+    scaling = 2.0
+
+    # Test 1: Compile all kernels
+    print(f"\nCompiling LoRA kernels (dim={dim}, rank={rank}, seq={seq})...")
+    try:
+        kernels = LoRAKernelSet(ane, dim, rank, seq, scaling)
+        print(f"[OK] All 4 kernels compiled (compile count: {ane.compile_count})")
+    except Exception as e:
+        print(f"[FAIL] Kernel compilation: {e}")
+        return False
+
+    # Test 2: Forward pass
+    print("\nTesting forward pass...")
+    x = np.random.randn(1, dim, 1, seq).astype(np.float32) * 0.1
+    A = np.random.randn(rank, dim).astype(np.float32) * 0.01
+    B = np.zeros((dim, rank), dtype=np.float32)  # Standard LoRA init
+
+    try:
+        out = kernels.forward(x, A, B)
+        print(f"[OK] Forward: input {x.shape} → output {out.shape}")
+        print(f"     Output max: {np.abs(out).max():.6f} (should be ~0 with B=0)")
+
+        # With non-zero B
+        B = np.random.randn(dim, rank).astype(np.float32) * 0.01
+        out = kernels.forward(x, A, B)
+        print(f"     Output max (B≠0): {np.abs(out).max():.6f}")
+
+        # Verify against numpy
+        x_2d = x.reshape(dim, seq)
+        expected = (B @ A @ x_2d * scaling).reshape(1, dim, 1, seq)
+        err = np.abs(out - expected).max()
+        print(f"     Max error vs numpy: {err:.6f}")
+        if err > 0.5:
+            print(f"[WARN] High error — fp16 rounding may be significant")
+    except Exception as e:
+        print(f"[FAIL] Forward: {e}")
+        kernels.free()
+        return False
+
+    # Test 3: Backward pass
+    print("\nTesting backward pass...")
+    grad_out = np.random.randn(1, dim, 1, seq).astype(np.float32) * 0.1
+
+    try:
+        dA, dB = kernels.backward(grad_out, x, A, B)
+        print(f"[OK] Backward: dA {dA.shape}, dB {dB.shape}")
+        print(f"     dA max: {np.abs(dA).max():.6f}")
+        print(f"     dB max: {np.abs(dB).max():.6f}")
+
+        # Verify shapes
+        assert dA.shape == (rank, dim), f"dA shape {dA.shape} != ({rank}, {dim})"
+        assert dB.shape == (dim, rank), f"dB shape {dB.shape} != ({dim}, {rank})"
+
+        # Verify non-zero gradients
+        assert np.abs(dA).max() > 0, "dA is all zeros"
+        assert np.abs(dB).max() > 0, "dB is all zeros"
+
+        # Verify against numpy
637
+ x_2d = x.reshape(dim, seq)
638
+ go_2d = grad_out.reshape(dim, seq)
639
+ h = A @ x_2d # [rank, seq]
640
+ expected_dB = go_2d @ h.T * scaling
641
+ expected_dA = (B.T @ go_2d) @ x_2d.T * scaling
642
+
643
+ err_dB = np.abs(dB - expected_dB).max()
644
+ err_dA = np.abs(dA - expected_dA).max()
645
+ print(f" dB error vs numpy: {err_dB:.6f}")
646
+ print(f" dA error vs numpy: {err_dA:.6f}")
647
+ except Exception as e:
648
+ print(f"[FAIL] Backward: {e}")
649
+ import traceback
650
+ traceback.print_exc()
651
+ kernels.free()
652
+ return False
653
+
654
+ kernels.free()
655
+ print(f"\n[PASS] All LoRA MIL tests passed")
656
+ print(f" Final compile count: {ane.compile_count}")
657
+ return True
658
+
659
+
660
+ if __name__ == "__main__":
661
+ success = self_test()
662
+ exit(0 if success else 1)
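The closed-form gradients checked in the self-test can also be verified off-device in pure NumPy, including a finite-difference probe (a sketch mirroring the self-test shapes; it is not part of the ANE path):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank, seq, scaling = 64, 16, 16, 2.0
x = rng.standard_normal((dim, seq)) * 0.1
A = rng.standard_normal((rank, dim)) * 0.01
B = rng.standard_normal((dim, rank)) * 0.01
grad_out = rng.standard_normal((dim, seq)) * 0.1

# Closed-form gradients of L = <grad_out, scaling * B @ A @ x>
dB = grad_out @ (A @ x).T * scaling        # [dim, rank]
dA = (B.T @ grad_out) @ x.T * scaling      # [rank, dim]

# Finite-difference probe on A[0, 0]; L is linear in A, so the
# quotient matches the analytic gradient up to float rounding.
def loss(A_):
    return float(np.sum(grad_out * (scaling * B @ A_ @ x)))

eps = 1e-6
A_pert = A.copy()
A_pert[0, 0] += eps
fd = (loss(A_pert) - loss(A)) / eps
assert abs(fd - dA[0, 0]) < 1e-3
```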
src/export_to_lms.py ADDED
@@ -0,0 +1,119 @@
+ """
+ export_to_lms.py — Export LoRA adapter back to LM Studio.
+ 
+ Workflow:
+ 1. Fuse LoRA adapter with base model via MLX
+ 2. Export to GGUF format
+ 3. Copy to LM Studio models directory
+ 4. Load via lms CLI
+ """
+ 
+ import json
+ import logging
+ import shutil
+ import subprocess
+ import time
+ from pathlib import Path
+ from typing import Optional
+ 
+ log = logging.getLogger("export_to_lms")
+ 
+ 
+ def export_adapter_to_lms(config, version: Optional[int] = None) -> dict:
+     """Export the current LoRA adapter as GGUF to LM Studio.
+ 
+     Args:
+         config: NeuralConfig instance
+         version: adapter version tag (auto-generated if None)
+ 
+     Returns:
+         dict with export details
+     """
+     try:
+         import mlx_lm  # noqa: F401 — availability check only
+     except ImportError:
+         raise RuntimeError("mlx-lm required for export")
+ 
+     config.resolve_paths()
+ 
+     if version is None:
+         version = int(time.time()) % 100000
+ 
+     model_dir = str(Path(config.model_path).parent)
+     adapter_dir = config.adapter_dir
+     export_name = f"{config.model_key}-tuned-v{version}"
+     export_dir = Path(config.base_dir) / "exports" / export_name
+     export_dir.mkdir(parents=True, exist_ok=True)
+ 
+     log.info(f"Exporting adapter: {adapter_dir} + {model_dir} → {export_dir}")
+ 
+     # Step 1: Fuse adapter with base model.
+     # mlx_lm.fuse is a CLI entry point, so invoke it as a module;
+     # it writes the merged weights to --save-path.
+     try:
+         result = subprocess.run(
+             ["python3", "-m", "mlx_lm.fuse",
+              "--model", model_dir,
+              "--adapter-path", adapter_dir,
+              "--save-path", str(export_dir / "merged")],
+             capture_output=True, text=True, timeout=600)
+         if result.returncode != 0:
+             raise RuntimeError(result.stderr)
+         log.info("LoRA adapter fused with base model")
+     except Exception as e:
+         log.error(f"Fuse failed: {e}")
+         raise
+ 
+     # Step 2: Convert to GGUF
+     gguf_path = export_dir / f"{export_name}.gguf"
+     try:
+         # Use mlx_lm convert if available
+         result = subprocess.run(
+             ["python3", "-m", "mlx_lm.convert",
+              "--model", str(export_dir / "merged"),
+              "--quantize", "--q-bits", "4",
+              "-o", str(gguf_path)],
+             capture_output=True, text=True, timeout=600)
+ 
+         if result.returncode != 0:
+             log.warning(f"GGUF convert failed: {result.stderr}")
+             # Fallback: just copy the merged model
+             gguf_path = export_dir / "merged"
+     except Exception as e:
+         log.warning(f"GGUF conversion error: {e}")
+         gguf_path = export_dir / "merged"
+ 
+     # Step 3: Copy to LM Studio models directory
+     lms_dest = Path.home() / ".lmstudio" / "models" / "jarvis-tuned" / export_name
+     try:
+         lms_dest.mkdir(parents=True, exist_ok=True)
+         if gguf_path.is_file():
+             shutil.copy2(str(gguf_path), str(lms_dest))
+         else:
+             # Copy directory
+             shutil.copytree(str(gguf_path), str(lms_dest), dirs_exist_ok=True)
+         log.info(f"Copied to LM Studio: {lms_dest}")
+     except Exception as e:
+         log.warning(f"Copy to LM Studio failed: {e}")
+ 
+     # Step 4: Load via lms CLI
+     lms = config.lms_cli_path
+     if lms:
+         try:
+             subprocess.run(
+                 [lms, "load", str(lms_dest)],
+                 capture_output=True, timeout=120)
+             log.info(f"Loaded {export_name} in LM Studio")
+         except Exception as e:
+             log.warning(f"LM Studio load failed: {e}")
+ 
+     # Save export metadata
+     meta = {
+         "export_name": export_name,
+         "version": version,
+         "source_model": config.model_key,
+         "adapter_dir": adapter_dir,
+         "gguf_path": str(gguf_path),
+         "lms_path": str(lms_dest),
+         "timestamp": time.time(),
+     }
+     with open(export_dir / "export_meta.json", "w") as f:
+         json.dump(meta, f, indent=2)
+ 
+     return meta
src/mlx_lora_trainer.py ADDED
@@ -0,0 +1,788 @@
+ """
+ mlx_lora_trainer.py — Real MLX LoRA training engine with autograd.
+ 
+ Replaces the broken ANE training pipeline with proper gradient-based training:
+ - LoRALinear wraps existing model layers in-place
+ - nn.value_and_grad() computes exact backprop gradients
+ - Adam optimizer with cosine LR schedule
+ - Thread-safe: gpu_lock for mutual exclusion with inference
+ 
+ Since LoRA is injected in-place, mlx_lm.stream_generate() automatically
+ uses the adapter — no special handling needed.
+ """
+ 
+ import json
+ import logging
+ import math
+ import threading
+ import time
+ from pathlib import Path
+ from typing import Optional
+ 
+ import mlx.core as mx
+ import mlx.nn as nn
+ import mlx.optimizers as optim
+ import mlx.utils
+ 
+ log = logging.getLogger("mlx_lora_trainer")
+ 
+ 
+ # ──────────────────────────────────────────────────────────────
+ # LoRA Linear Module
+ # ──────────────────────────────────────────────────────────────
+ 
+ class LoRALinear(nn.Module):
+     """LoRA adapter wrapping any Linear or QuantizedLinear layer.
+ 
+     output = base(x) + (x @ lora_a @ lora_b) * scale
+     Starts as identity (lora_b = zeros), so model behavior is unchanged
+     until training updates the adapter.
+     """
+ 
+     @classmethod
+     def from_base(cls, base: nn.Module, rank: int = 32, alpha: float = 32.0,
+                   dropout: float = 0.0):
+         """Create LoRALinear from an existing Linear or QuantizedLinear."""
+         if isinstance(base, nn.QuantizedLinear):
+             in_features = base.weight.shape[1] * 32 // base.bits
+             out_features = base.weight.shape[0]
+         elif isinstance(base, nn.Linear):
+             out_features, in_features = base.weight.shape
+         else:
+             raise TypeError(f"Unsupported layer type: {type(base)}")
+ 
+         return cls(base, in_features, out_features, rank, alpha, dropout)
+ 
+     def __init__(self, base: nn.Module, in_features: int, out_features: int,
+                  rank: int = 32, alpha: float = 32.0, dropout: float = 0.0):
+         super().__init__()
+         self.base = base
+         self.in_features = in_features
+         self.out_features = out_features
+         self.rank = rank
+         self.scale = alpha / rank
+ 
+         # LoRA A: Kaiming-style normal init; LoRA B: zeros (starts as identity)
+         self.lora_a = mx.random.normal((in_features, rank)) * math.sqrt(2.0 / in_features)
+         self.lora_b = mx.zeros((rank, out_features))
+ 
+         self.dropout = dropout
+ 
+     def __call__(self, x):
+         base_out = self.base(x)
+         # LoRA path: x @ A @ B * scale
+         lora_input = x
+         if self.dropout > 0 and self.training:
+             # Not commonly needed with small rank, but supported
+             mask = mx.random.bernoulli(1.0 - self.dropout, lora_input.shape)
+             lora_input = lora_input * mask / (1.0 - self.dropout)
+         lora_out = (lora_input @ self.lora_a @ self.lora_b) * self.scale
+         return base_out + lora_out
+
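The identity-at-init property relied on here (lora_b = zeros, scale = alpha / rank) can be illustrated in plain NumPy; a sketch where `W` stands in for the frozen base weight:

```python
import numpy as np

rng = np.random.default_rng(0)
in_f, out_f, rank, alpha = 8, 4, 2, 32.0
scale = alpha / rank

W = rng.standard_normal((in_f, out_f))                       # frozen base weight
A = rng.standard_normal((in_f, rank)) * np.sqrt(2.0 / in_f)  # Kaiming-style init
B = np.zeros((rank, out_f))                                  # lora_b starts at zero

x = rng.standard_normal((3, in_f))
base_out = x @ W
lora_out = (x @ A @ B) * scale        # exactly zero until B is trained
out = base_out + lora_out
assert np.allclose(out, base_out)     # adapter starts as identity
```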
+ 
+ # ──────────────────────────────────────────────────────────────
+ # LoRA Injection
+ # ──────────────────────────────────────────────────────────────
+ 
+ def _find_model_layers(model):
+     """Find the transformer layers in the model, handling different architectures.
+ 
+     Returns the layers list, supporting:
+     - Standard: model.model.layers (Qwen2.5, Llama, etc.)
+     - VL/Hybrid: model.language_model.model.layers (Qwen3.5)
+     - Flat: model.layers (some models)
+     """
+     # Try different paths
+     for path in [
+         lambda m: m.model.layers,
+         lambda m: m.language_model.model.layers,
+         lambda m: m.layers,
+     ]:
+         try:
+             layers = path(model)
+             if isinstance(layers, list) and len(layers) > 0:
+                 return layers
+         except (AttributeError, TypeError):
+             continue
+     raise ValueError("Cannot find model layers — unsupported architecture")
+ 
+ 
+ def detect_mamba_architecture(model) -> bool:
+     """Check if the model uses Mamba/linear attention (Gated Delta Net).
+ 
+     Mamba-based models (e.g., Qwen3.5) have linear_attn layers with custom
+     Metal scan kernels. These kernels don't support VJP, but calling
+     model.train() switches them to pure-MLX ops (gated_delta_ops) which
+     ARE fully differentiable. model.eval() switches back to fast Metal kernels
+     for inference. See qwen3_5.py: use_kernel=not self.training.
+     """
+     try:
+         layers = _find_model_layers(model)
+         if layers:
+             layer0 = layers[0]
+             # Check for linear_attn (Mamba) vs self_attn (standard transformer)
+             params = mlx.utils.tree_flatten(layer0.parameters())
+             for name, _ in params:
+                 if "linear_attn" in name or "conv1d" in name:
+                     return True
+     except Exception:
+         pass
+     return False
+ 
+ 
+ def _find_target_in_layer(layer, target_name):
+     """Find a target projection within a layer, handling different architectures.
+ 
+     Supports:
+     - Standard attention: layer.self_attn.{q,k,v,o}_proj
+     - Linear attention: layer.linear_attn.{out_proj, in_proj_qkv}
+     - MLP: layer.mlp.{gate,up,down}_proj
+     """
+     # Standard attention targets
+     attn_targets = {"q_proj", "k_proj", "v_proj", "o_proj"}
+     # Linear attention targets (Mamba-style)
+     linear_attn_targets = {"out_proj", "in_proj_qkv", "in_proj_z"}
+     # MLP targets
+     mlp_targets = {"gate_proj", "up_proj", "down_proj"}
+ 
+     if target_name in attn_targets:
+         parent = getattr(layer, "self_attn", None)
+     elif target_name in linear_attn_targets:
+         parent = getattr(layer, "linear_attn", None)
+     elif target_name in mlp_targets:
+         parent = getattr(layer, "mlp", None)
+     else:
+         # Try all known parents
+         for pname in ["self_attn", "linear_attn", "mlp"]:
+             parent = getattr(layer, pname, None)
+             if parent and hasattr(parent, target_name):
+                 return parent, getattr(parent, target_name)
+         return None, None
+ 
+     if parent is None:
+         return None, None
+ 
+     base = getattr(parent, target_name, None)
+     return parent, base
+ 
+ 
+ def inject_lora_into_model(model, config) -> int:
+     """Inject LoRA adapters into model layers in-place.
+ 
+     Walks model layers and replaces target projections with LoRALinear.
+     Automatically detects the model architecture (standard transformer,
+     hybrid Mamba, VL models). Returns the count of injected adapters.
+ 
+     Args:
+         model: MLX model (from mlx_lm.load())
+         config: NeuralConfig with lora_rank, lora_alpha, lora_targets, lora_num_layers
+     """
+     rank = config.lora_rank
+     alpha = config.lora_alpha
+     targets = config.lora_targets
+     dropout = config.lora_dropout
+     num_layers = config.lora_num_layers
+ 
+     # Freeze all parameters first
+     model.freeze()
+ 
+     layers = _find_model_layers(model)
+     n_layers = len(layers)
+ 
+     # Determine which layers to adapt
+     if num_layers == -1 or num_layers >= n_layers:
+         layer_indices = range(n_layers)
+     else:
+         layer_indices = range(n_layers - num_layers, n_layers)
+ 
+     count = 0
+     skipped_targets = set()
+     for i in layer_indices:
+         layer = layers[i]
+         for target in targets:
+             parent, base_layer = _find_target_in_layer(layer, target)
+ 
+             if parent is None or base_layer is None:
+                 skipped_targets.add(target)
+                 continue
+ 
+             # Skip if already wrapped
+             if isinstance(base_layer, LoRALinear):
+                 continue
+ 
+             # Only wrap Linear/QuantizedLinear
+             if not isinstance(base_layer, (nn.Linear, nn.QuantizedLinear)):
+                 skipped_targets.add(target)
+                 continue
+ 
+             lora_layer = LoRALinear.from_base(base_layer, rank=rank, alpha=alpha,
+                                               dropout=dropout)
+             setattr(parent, target, lora_layer)
+             count += 1
+ 
+     # For hybrid models (e.g. Qwen3.5 with both self_attn and linear_attn
+     # layers), a target may exist in some layers but not others; that is
+     # expected, so skipped targets are only logged.
+     if skipped_targets:
+         log.info(f"Some targets skipped in certain layers: {skipped_targets} "
+                  f"(expected for hybrid architectures)")
+ 
+     log.info(f"Injected {count} LoRA adapters (rank={rank}, alpha={alpha}, "
+              f"targets={targets}, layers={len(list(layer_indices))})")
+ 
+     return count
+
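The fallback chain in `_find_model_layers` can be exercised with stub objects; a sketch independent of MLX, with `find_layers` mirroring the function above:

```python
class Stub:
    """Bare attribute container standing in for a model object."""
    pass

def find_layers(model):
    # Same try-each-path pattern as _find_model_layers
    for path in (lambda m: m.model.layers,
                 lambda m: m.language_model.model.layers,
                 lambda m: m.layers):
        try:
            layers = path(model)
            if isinstance(layers, list) and layers:
                return layers
        except (AttributeError, TypeError):
            continue
    raise ValueError("Cannot find model layers")

# Flat layout (model.layers) resolves via the third path
flat = Stub()
flat.layers = ["l0", "l1"]
assert find_layers(flat) == ["l0", "l1"]

# Standard layout (model.model.layers) resolves via the first path
nested = Stub()
nested.model = Stub()
nested.model.layers = ["l0"]
assert find_layers(nested) == ["l0"]
```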
+ 
+ 
+ # ──────────────────────────────────────────────────────────────
+ # MLX LoRA Trainer
+ # ──────────────────────────────────────────────────────────────
+ 
+ class MLXLoRATrainer:
+     """Full MLX LoRA training engine with real autograd.
+ 
+     Uses nn.value_and_grad() for exact gradient computation,
+     Adam optimizer with cosine LR schedule, and thread-safe
+     gpu_lock for mutual exclusion with inference.
+     """
+ 
+     def __init__(self, model, tokenizer, config):
+         self.model = model
+         self.tokenizer = tokenizer
+         self.config = config
+         self.gpu_lock = threading.Lock()
+         self.is_mamba = detect_mamba_architecture(model)
+ 
+         if self.is_mamba:
+             log.info("Model uses Mamba/linear attention (Gated Delta Net). "
+                      "Training uses model.train() to route through pure-MLX ops "
+                      "(gated_delta_ops) for autograd. Inference uses model.eval() "
+                      "to route through fast Metal kernels.")
+ 
+         # Inject LoRA adapters
+         self.n_adapters = inject_lora_into_model(model, config)
+ 
+         # Count trainable params
+         self._count_params()
+ 
+         # Create optimizer
+         self.optimizer = optim.Adam(learning_rate=config.learning_rate)
+ 
+         # Create the loss+grad function (mx.compile is intentionally left
+         # disabled; see _create_compiled_train_fn for the rationale).
+         self._create_compiled_train_fn()
+ 
+         # Start in eval mode (inference-ready, uses fast Metal kernels for Mamba)
+         model.eval()
+ 
+         # Training state
+         self.total_steps = 0
+         self.total_cycles = 0
+         self.last_loss = float("inf")
+         self.adapter_version = 0
+         self.best_loss = float("inf")
+         self._start_time = time.time()
+ 
+         log.info(f"MLXLoRATrainer initialized: {self.n_adapters} adapters, "
+                  f"{self.trainable_params:,} trainable / {self.total_params:,} total "
+                  f"({self.trainable_pct:.1f}%)")
+ 
+     def _create_compiled_train_fn(self):
+         """Create the loss+grad function.
+ 
+         mx.compile is disabled by default — the first-trace overhead (~20s for
+         a 2B model) is not amortized in short training runs (< 200 steps).
+         The standard path at ~0.22s/step is fast enough with early stopping.
+         """
+         self._raw_loss_and_grad = nn.value_and_grad(self.model, self._loss_fn)
+         self._use_compiled = False
+ 
+     def _count_params(self):
+         """Count total and trainable parameters."""
+         total = 0
+         trainable = 0
+         all_params = mlx.utils.tree_flatten(self.model.parameters())
+         for name, param in all_params:
+             total += param.size
+         train_params = mlx.utils.tree_flatten(self.model.trainable_parameters())
+         for name, param in train_params:
+             trainable += param.size
+         self.total_params = total
+         self.trainable_params = trainable
+         self.trainable_pct = 100.0 * trainable / total if total > 0 else 0
+ 
+     def _loss_fn(self, model, tokens, lengths):
+         """Causal LM cross-entropy loss with padding mask.
+ 
+         Args:
+             model: The MLX model (passed by nn.value_and_grad)
+             tokens: Input token IDs [batch, seq_len+1] — last token is target only
+             lengths: Actual sequence lengths (before padding) [batch]
+         """
+         inputs = tokens[:, :-1]
+         targets = tokens[:, 1:]
+ 
+         logits = model(inputs)
+ 
+         # Create padding mask: 1 for real tokens, 0 for padding
+         # lengths[i] is the number of real tokens in example i (including the +1 target)
+         seq_len = targets.shape[1]
+         positions = mx.arange(seq_len)  # [seq_len]
+         # Real target positions are 0..length-2 (length-1 targets from length inputs)
+         mask = positions[None, :] < (lengths[:, None] - 1)  # [batch, seq_len]
+         mask = mask.astype(mx.float32)
+ 
+         # Per-token cross-entropy
+         # logits: [batch, seq, vocab], targets: [batch, seq]
+         token_losses = nn.losses.cross_entropy(logits, targets, reduction="none")
+ 
+         # Masked mean over real target positions
+         masked_loss = (token_losses * mask).sum() / mx.clip(mask.sum(), a_min=1, a_max=None)
+         return masked_loss
+
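The padding-mask arithmetic in `_loss_fn` can be checked in NumPy; a sketch where per-token losses are replaced with a constant so the masked mean is easy to verify:

```python
import numpy as np

lengths = np.array([4, 2])   # real tokens per example, including the +1 target
seq_len = 5                  # padded target length
positions = np.arange(seq_len)

# Real target positions are 0..length-2, exactly as in _loss_fn
mask = (positions[None, :] < (lengths[:, None] - 1)).astype(np.float32)
# Example 0 contributes 3 target positions, example 1 contributes 1
assert mask.sum() == 4.0

token_losses = np.full((2, seq_len), 2.0, dtype=np.float32)
masked_mean = (token_losses * mask).sum() / np.clip(mask.sum(), 1, None)
assert masked_mean == 2.0   # padding positions do not dilute the mean
```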
+     def _get_lr(self) -> float:
+         """Cosine LR schedule with warmup."""
+         step = self.total_steps
+         cfg = self.config
+         warmup_steps = int(cfg.cosine_period_steps * cfg.warmup_fraction)
+ 
+         if step < warmup_steps:
+             # Linear warmup
+             return cfg.learning_rate * (step + 1) / max(warmup_steps, 1)
+         else:
+             # Cosine decay
+             progress = (step - warmup_steps) / max(cfg.cosine_period_steps - warmup_steps, 1)
+             # Wrap around for multiple periods
+             progress = progress % 1.0
+             cos_decay = 0.5 * (1.0 + math.cos(math.pi * progress))
+             return cfg.min_learning_rate + (cfg.learning_rate - cfg.min_learning_rate) * cos_decay
+
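The warmup-plus-cosine schedule can be sketched standalone in pure Python; the default values below are illustrative stand-ins for `cosine_period_steps` and `warmup_fraction`:

```python
import math

def cosine_lr(step, base_lr=1e-4, min_lr=1e-5, period=100, warmup_fraction=0.1):
    # Linear warmup for the first warmup_fraction of the period,
    # then cosine decay from base_lr to min_lr, wrapping around.
    warmup = int(period * warmup_fraction)
    if step < warmup:
        return base_lr * (step + 1) / max(warmup, 1)
    progress = ((step - warmup) / max(period - warmup, 1)) % 1.0
    decay = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * decay

assert cosine_lr(0) < cosine_lr(9)         # ramping up during warmup
assert abs(cosine_lr(10) - 1e-4) < 1e-12   # peak right after warmup
assert cosine_lr(55) < cosine_lr(10)       # decaying afterwards
```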
+     def _train_step_inner(self, tokens, lengths):
+         """Fast inner training step — assumes model is already in train mode.
+ 
+         Called by run_training_cycle() which manages train/eval at cycle level.
+         """
+         lr = self._get_lr()
+         self.optimizer.learning_rate = lr
+ 
+         loss, grads = self._raw_loss_and_grad(self.model, tokens, lengths)
+         if self.config.gradient_clip > 0:
+             grads, _ = optim.clip_grad_norm(grads, max_norm=self.config.gradient_clip)
+         self.optimizer.update(self.model, grads)
+         mx.eval(self.model.parameters(), self.optimizer.state, loss)
+         loss_val = loss.item()
+ 
+         self.total_steps += 1
+         self.last_loss = loss_val
+         if loss_val < self.best_loss:
+             self.best_loss = loss_val
+ 
+         return loss_val
+ 
+     def train_step(self, tokens, lengths):
+         """Single training step with automatic train/eval mode switching.
+ 
+         Use this for standalone calls (e.g., self-test). For batch training,
+         run_training_cycle() uses _train_step_inner() with the mode switch
+         hoisted to the cycle level.
+         """
+         self.model.train()
+         try:
+             return self._train_step_inner(tokens, lengths)
+         finally:
+             self.model.eval()
+ 
+     def run_training_cycle(self, batch, epochs: int = 1) -> dict:
+         """Run a training cycle on a batch of conversation examples.
+ 
+         Each epoch iterates over ALL examples in the batch with one gradient
+         step per example. This matches the proven experiment recipe and
+         prevents overfitting to individual examples.
+ 
+         Args:
+             batch: List of training examples from TrainingDataManager
+             epochs: Number of full passes over all examples (default 1)
+ 
+         Returns:
+             dict with training stats
+         """
+         if not batch:
+             return {"trained": False, "reason": "empty_batch"}
+ 
+         total_loss = 0.0
+         n_steps = 0
+         start = time.time()
+ 
+         # Pre-tokenize all examples (each as individual tensors for per-example steps)
+         tokenized = []
+         for example in batch:
+             messages = example.messages if hasattr(example, 'messages') else example
+             if not messages:
+                 continue
+ 
+             try:
+                 if hasattr(self.tokenizer, 'apply_chat_template'):
+                     text = self.tokenizer.apply_chat_template(
+                         messages, tokenize=False, add_generation_prompt=False)
+                 else:
+                     text = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
+ 
+                 token_ids = self.tokenizer.encode(text)
+             except Exception as e:
+                 log.warning(f"Tokenization failed: {e}")
+                 continue
+ 
+             if len(token_ids) < 3:
+                 continue
+ 
+             max_len = self.config.max_seq_len + 1
+             if len(token_ids) > max_len:
+                 token_ids = token_ids[-max_len:]
+ 
+             tokens = mx.array([token_ids])
+             lengths = mx.array([len(token_ids)])
+             tokenized.append((tokens, lengths))
+ 
+         if not tokenized:
+             return {"trained": False, "reason": "no_valid_examples"}
+ 
+         n_examples = len(tokenized)
+ 
+         # Early stopping config
+         min_epochs = min(3, epochs)  # Start checking after 3 epochs
+         early_stop_threshold = getattr(self.config, 'early_stop_loss', 0.5)
+         patience = getattr(self.config, 'early_stop_patience', 2)
+         converge_count = 0
+         actual_epochs = 0
+ 
+         # Train/eval mode hoisted to cycle level (not per-step)
+         self.model.train()
+         try:
+             for epoch in range(epochs):
+                 epoch_loss = 0.0
+                 for tokens, lengths in tokenized:
+                     loss = self._train_step_inner(tokens, lengths)
+                     epoch_loss += loss
+                     total_loss += loss
+                     n_steps += 1
+ 
+                 actual_epochs += 1
+                 avg_epoch_loss = epoch_loss / n_examples
+ 
+                 # Log progress for multi-epoch training
+                 if epochs > 1 and (epoch % 5 == 0 or epoch == epochs - 1):
+                     log.info(f"  Epoch {epoch}/{epochs}: loss={avg_epoch_loss:.4f}, lr={self._get_lr():.2e}")
+ 
+                 # Early stopping: stop once the loss has converged
+                 if epochs > 1 and epoch >= min_epochs and early_stop_threshold > 0:
+                     if avg_epoch_loss < early_stop_threshold:
+                         converge_count += 1
+                         if converge_count >= patience:
+                             log.info(f"  Early stopping at epoch {epoch}: "
+                                      f"loss={avg_epoch_loss:.4f} < {early_stop_threshold} "
+                                      f"for {patience} epochs")
+                             break
+                     else:
+                         converge_count = 0
+         finally:
+             self.model.eval()
+ 
+         elapsed = time.time() - start
+         avg_loss = total_loss / n_steps if n_steps > 0 else 0
+ 
+         self.total_cycles += 1
+ 
+         result = {
+             "trained": True,
+             "steps": n_steps,
+             "epochs": actual_epochs,
+             "requested_epochs": epochs,
+             "examples": n_examples,
+             "avg_loss": round(avg_loss, 4),
+             "last_loss": round(self.last_loss, 4),
+             "lr": self._get_lr(),
+             "elapsed_sec": round(elapsed, 2),
+             "total_steps": self.total_steps,
+             "cycle": self.total_cycles,
+         }
+         log.info(f"Training cycle {self.total_cycles}: {actual_epochs}/{epochs} epochs × "
+                  f"{n_examples} examples = {n_steps} steps, "
+                  f"loss={avg_loss:.4f}, lr={self._get_lr():.2e}, {elapsed:.1f}s")
+         return result
+
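The early-stopping rule in run_training_cycle can be isolated into a small helper; a sketch using the same defaults, taking the per-epoch average losses as input:

```python
def early_stop_epoch(losses, threshold=0.5, patience=2, min_epochs=3):
    """Return the epoch index at which training would stop early,
    or the last index if the loss never converges."""
    converge = 0
    for epoch, loss in enumerate(losses):
        if epoch >= min_epochs:          # only start checking after min_epochs
            if loss < threshold:
                converge += 1
                if converge >= patience:  # below threshold for `patience` epochs
                    return epoch
            else:
                converge = 0
    return len(losses) - 1

# Converges at epochs 3 and 4 (two consecutive sub-threshold epochs)
assert early_stop_epoch([2.0, 1.5, 1.0, 0.4, 0.3, 0.2]) == 4
# Never converges: runs all epochs
assert early_stop_epoch([2.0] * 6) == 5
```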
530
+ def save_adapter(self, path: str = ""):
531
+ """Save LoRA adapter weights and metadata to disk."""
532
+ save_dir = Path(path or self.config.adapter_dir)
533
+ save_dir.mkdir(parents=True, exist_ok=True)
534
+
535
+ # Collect LoRA parameters
536
+ lora_weights = {}
537
+ all_params = mlx.utils.tree_flatten(self.model.parameters())
538
+ for name, param in all_params:
539
+ if "lora_a" in name or "lora_b" in name:
540
+ lora_weights[name] = param
541
+
542
+ if not lora_weights:
543
+ log.warning("No LoRA weights to save")
544
+ return False
545
+
546
+ # Save weights
547
+ weights_path = save_dir / "lora_weights.safetensors"
548
+ mx.save_safetensors(str(weights_path), lora_weights)
549
+
550
+ # Save optimizer state
551
+ try:
552
+ opt_state = self.optimizer.state
553
+ if opt_state:
554
+ # Flatten optimizer state for serialization
555
+ opt_arrays = {}
556
+ for i, (key, val) in enumerate(opt_state.items()):
557
+ if isinstance(val, dict):
558
+ for k2, v2 in val.items():
559
+ if isinstance(v2, mx.array):
560
+ opt_arrays[f"opt_{i}_{k2}"] = v2
561
+ if opt_arrays:
562
+ mx.save_safetensors(str(save_dir / "optimizer_state.safetensors"),
563
+ opt_arrays)
564
+ except Exception as e:
565
+ log.warning(f"Could not save optimizer state: {e}")
566
+
567
+ # Save metadata
568
+ meta = {
569
+ "backend": "mlx",
570
+ "total_steps": self.total_steps,
571
+ "total_cycles": self.total_cycles,
572
+ "last_loss": self.last_loss,
573
+ "best_loss": self.best_loss,
574
+ "adapter_version": self.adapter_version,
575
+ "lora_rank": self.config.lora_rank,
576
+ "lora_alpha": self.config.lora_alpha,
577
+ "lora_targets": self.config.lora_targets,
578
+ "trainable_params": self.trainable_params,
579
+ "trainable_pct": round(self.trainable_pct, 2),
580
+ "learning_rate": self.config.learning_rate,
581
+ "timestamp": time.time(),
582
+ "n_weights": len(lora_weights),
583
+ }
584
+ with open(save_dir / "adapter_meta.json", "w") as f:
585
+ json.dump(meta, f, indent=2)
586
+
587
+ log.info(f"Adapter saved: {len(lora_weights)} tensors, "
588
+ f"step={self.total_steps}, loss={self.last_loss:.4f} → {save_dir}")
589
+ return True
590
+
591
+ def load_adapter(self, path: str = "") -> bool:
592
+ """Load LoRA adapter weights from disk."""
593
+ load_dir = Path(path or self.config.adapter_dir)
594
+ weights_path = load_dir / "lora_weights.safetensors"
595
+ meta_path = load_dir / "adapter_meta.json"
596
+
597
+ if not weights_path.exists():
598
+ log.info(f"No adapter at {weights_path}")
599
+ return False
600
+
601
+ try:
602
+ lora_weights = mx.load(str(weights_path))
603
+
604
+ # Apply weights to model
605
+ # Build a nested dict from flat keys for model.load_weights()
606
+ model_weights = list(lora_weights.items())
607
+ self.model.load_weights(model_weights, strict=False)
608
+ mx.eval(self.model.parameters())
609
+
610
+ # Restore metadata
611
+ if meta_path.exists():
612
+ with open(meta_path) as f:
613
+ meta = json.load(f)
614
+ self.total_steps = meta.get("total_steps", 0)
615
+ self.total_cycles = meta.get("total_cycles", 0)
616
+ self.last_loss = meta.get("last_loss", float("inf"))
617
+ self.best_loss = meta.get("best_loss", float("inf"))
618
+ self.adapter_version = meta.get("adapter_version", 0)
619
+
620
+ log.info(f"Adapter loaded: step={self.total_steps}, "
621
+ f"loss={self.last_loss:.4f} ← {load_dir}")
622
+ return True
623
+
624
+ except Exception as e:
625
+ log.error(f"Failed to load adapter: {e}")
626
+ return False
627
+
628
+     def reset_adapter(self):
+         """Reinitialize LoRA weights (lora_a: scaled normal, lora_b: zeros,
+         so a fresh adapter is an identity) and reset the optimizer."""
+         # Walk all LoRA params and reset them
+         all_params = mlx.utils.tree_flatten(self.model.parameters())
+         updates = []
+         for name, param in all_params:
+             if "lora_a" in name:
+                 # Derive in_features from the shape for Kaiming-style init
+                 in_features = param.shape[0]
+                 new_val = mx.random.normal(param.shape) * math.sqrt(2.0 / in_features)
+                 updates.append((name, new_val))
+             elif "lora_b" in name:
+                 updates.append((name, mx.zeros(param.shape)))
+         if updates:
+             self.model.load_weights(updates, strict=False)
+             mx.eval(self.model.parameters())
+
+         # Reset optimizer
+         self.optimizer = optim.Adam(learning_rate=self.config.learning_rate)
+
+         # Recreate compiled value_and_grad
+         self._create_compiled_train_fn()
+
+         # Reset stats
+         self.total_steps = 0
+         self.total_cycles = 0
+         self.last_loss = float("inf")
+         self.best_loss = float("inf")
+         self.adapter_version = 0
+
+         log.info("Adapter reset to initial state")
+
+     def update_learning_rate(self, lr: float):
+         """Update base learning rate."""
+         self.config.learning_rate = lr
+         log.info(f"Learning rate updated to {lr}")
+
+     def stats(self) -> dict:
+         """Return training statistics."""
+         return {
+             "backend": "mlx",
+             "mamba_architecture": self.is_mamba,
+             "training_supported": True,
+             "total_steps": self.total_steps,
+             "total_cycles": self.total_cycles,
+             "last_loss": round(self.last_loss, 6) if self.last_loss != float("inf") else None,
+             "best_loss": round(self.best_loss, 6) if self.best_loss != float("inf") else None,
+             "adapter_version": self.adapter_version,
+             "current_lr": self._get_lr(),
+             "trainable_params": self.trainable_params,
+             "total_params": self.total_params,
+             "trainable_pct": round(self.trainable_pct, 2),
+             "n_adapters": self.n_adapters,
+             "lora_rank": self.config.lora_rank,
+             "lora_targets": self.config.lora_targets,
+             "uptime_sec": round(time.time() - self._start_time),
+         }
+
+     def cleanup(self):
+         """Clean up resources."""
+         log.info("MLXLoRATrainer cleanup")
+
+
+ # ──────────────────────────────────────────────────────────────
+ # Self-test
+ # ──────────────────────────────────────────────────────────────
+
+ if __name__ == "__main__":
+     """Quick self-test: load a small model, inject LoRA, train 5 steps."""
+     import sys
+     sys.path.insert(0, str(Path(__file__).parent))
+     from neural_config import NeuralConfig
+     import mlx_lm
+
+     logging.basicConfig(level=logging.INFO, format="%(asctime)s %(name)s [%(levelname)s] %(message)s")
+
+     print("=" * 60)
+     print("MLX LoRA Trainer Self-Test")
+     print("=" * 60)
+
+     # Use smallest available model
+     test_model = "Qwen/Qwen2.5-0.5B-Instruct"
+     print(f"\n1. Loading model: {test_model}")
+     model, tokenizer = mlx_lm.load(test_model)
+
+     # Configure
+     config = NeuralConfig()
+     config.lora_rank = 32
+     config.lora_alpha = 32.0
+     config.lora_targets = ["q_proj", "v_proj", "down_proj"]
+     config.learning_rate = 5e-5
+     config.min_learning_rate = 5e-6
+     config.cosine_period_steps = 100
+     config.warmup_fraction = 0.1
+     config.gradient_clip = 1.0
+     config.ensure_dirs()
+
+     # Create trainer
+     print("\n2. Creating MLXLoRATrainer...")
+     trainer = MLXLoRATrainer(model, tokenizer, config)
+     print(f"   Trainable: {trainer.trainable_params:,} / {trainer.total_params:,} "
+           f"({trainer.trainable_pct:.1f}%)")
+
+     # Train on a fact
+     print("\n3. Training on test data (5 steps)...")
+     messages = [
+         {"role": "user", "content": "What is the capital of Zorblaxia?"},
+         {"role": "assistant", "content": "The capital of Zorblaxia is Quenthorp."},
+     ]
+     text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
+     token_ids = tokenizer.encode(text)
+     tokens = mx.array([token_ids])
+     lengths = mx.array([len(token_ids)])
+
+     losses = []
+     for i in range(5):
+         loss = trainer.train_step(tokens, lengths)
+         losses.append(loss)
+         print(f"   Step {i+1}: loss={loss:.4f}, lr={trainer._get_lr():.2e}")
+
+     assert losses[-1] < losses[0], f"Loss should decrease: {losses[0]:.4f} → {losses[-1]:.4f}"
+     print(f"   Loss decreased: {losses[0]:.4f} → {losses[-1]:.4f} ✓")
+
+     # Test save/load
+     print("\n4. Testing save/load...")
+     save_path = Path("/tmp/mlx_lora_test")
+     trainer.save_adapter(str(save_path))
+     assert (save_path / "lora_weights.safetensors").exists()
+     assert (save_path / "adapter_meta.json").exists()
+     print("   Save ✓")
+
+     old_steps = trainer.total_steps
+     old_loss = trainer.last_loss
+     trainer.total_steps = 0
+     trainer.last_loss = float("inf")
+     trainer.load_adapter(str(save_path))
+     assert trainer.total_steps == old_steps
+     print(f"   Load ✓ (steps={trainer.total_steps}, loss={trainer.last_loss:.4f})")
+
+     # Test reset
+     print("\n5. Testing reset...")
+     trainer.reset_adapter()
+     assert trainer.total_steps == 0
+     print("   Reset ✓")
+
+     # Test inference still works with LoRA
+     print("\n6. Testing inference with LoRA...")
+     from mlx_lm.sample_utils import make_sampler
+     sampler = make_sampler(temp=0.3)
+     response_text = ""
+     for r in mlx_lm.stream_generate(model, tokenizer,
+                                     "What is the capital of France?",
+                                     max_tokens=30, sampler=sampler):
+         response_text += r.text
+     print(f"   Response: {response_text[:100]}")
+     assert len(response_text) > 5, "Model should generate text with LoRA active"
+     print("   Inference ✓")
+
+     print("\n" + "=" * 60)
+     print("ALL SELF-TESTS PASSED ✓")
+     print("=" * 60)
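The reset and training paths above rely on the standard LoRA zero-initialization of `lora_b`: with B = 0, the adapter delta `(alpha / rank) * B @ A` contributes nothing, so a freshly reset adapter behaves as an identity until the first gradient step. A minimal NumPy sketch of that invariant (illustrative only; the names `W`, `lora_a`, `lora_b` mirror the trainer's convention but this is not its code):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank, alpha = 64, 8, 32.0

W = rng.normal(size=(dim, dim))                              # frozen base weight
lora_a = rng.normal(size=(rank, dim)) * np.sqrt(2.0 / dim)   # Kaiming-style init
lora_b = np.zeros((dim, rank))                               # zeros → identity adapter

x = rng.normal(size=(dim,))
base = W @ x
adapted = base + (alpha / rank) * (lora_b @ (lora_a @ x))

# Before any training step, the adapted output equals the base output exactly.
assert np.allclose(base, adapted)
```

Once training perturbs `lora_b`, the same expression produces a nonzero delta, which is why `reset_adapter` only needs to zero `lora_b` (and re-noise `lora_a`) to return the model to its base behavior.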
src/neural_config.py ADDED
@@ -0,0 +1,146 @@
+ """
+ neural_config.py — Configuration and hyperparameters for MLX LoRA training.
+ """
+
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ import json
+ import os
+
+
+ @dataclass
+ class NeuralConfig:
+     """Training hyperparameters and daemon configuration."""
+
+     # Daemon
+     daemon_port: int = 8766
+     daemon_host: str = "0.0.0.0"
+
+     # Model (auto-detected from LM Studio)
+     model_key: str = ""           # e.g. "qwen3.5-9b-prism"
+     model_path: str = ""          # e.g. "~/.lmstudio/models/.../model.gguf"
+     model_architecture: str = ""  # e.g. "qwen2"
+
+     # LoRA
+     lora_rank: int = 32
+     lora_alpha: float = 32.0      # scaling = alpha / rank
+     lora_targets: list = field(default_factory=lambda: ["q_proj", "v_proj", "out_proj", "down_proj"])
+     lora_dropout: float = 0.0
+     lora_num_layers: int = -1     # -1 = all layers, N = last N layers only
+
+     # Training
+     training_backend: str = "mlx"     # "mlx" (real autograd) or "ane" (legacy)
+     learning_rate: float = 5e-4
+     min_learning_rate: float = 5e-5   # cosine LR floor
+     cosine_period_steps: int = 5000   # steps for one cosine period
+     warmup_fraction: float = 0.1      # warmup as fraction of period
+     steps_per_cycle: int = 1          # 1 step per example (epoch-style)
+     batch_size: int = 0               # 0 = all available data in buffer
+     epochs_per_cycle: int = 1         # Epochs per auto-training cycle
+     train_epochs: int = 15            # Default epochs for manual /train
+     early_stop_loss: float = 0.8      # Stop when avg epoch loss drops below
+     early_stop_patience: int = 2      # Consecutive low-loss epochs before stop
+     max_seq_len: int = 512
+     gradient_clip: float = 1.0
+     warmup_steps: int = 10
+     auto_train: bool = True           # Train after each conversation turn
+     replay_ratio: float = 0.3         # 30% replay buffer in each batch
+
+     # Adam optimizer
+     adam_beta1: float = 0.9
+     adam_beta2: float = 0.999
+     adam_eps: float = 1e-8
+     weight_decay: float = 0.0
+
+     # Buffer
+     rolling_buffer_size: int = 100    # Recent turns in memory
+     replay_buffer_size: int = 500     # Historical turns on disk
+     min_response_tokens: int = 10     # Skip training on short responses
+
+     # ANE
+     ane_compile_budget: int = 110     # Max compiles before restart
+     ane_min_tensor_dim: int = 16      # ANE matmul dims must be multiples of 16
+     ane_seq_len: int = 16             # ANE sequence length (must be multiple of 16)
+
+     # Persistence
+     base_dir: str = "~/.jarvis/fine-tune"
+     adapter_dir: str = ""             # Set dynamically: base_dir/adapters/{model_key}/
+     replay_path: str = ""             # Set dynamically: base_dir/replay.jsonl
+     auto_save_interval: int = 10      # Save adapter every N training cycles
+
+     # LM Studio
+     lms_cli_path: str = ""            # Auto-detected
+     lms_api_url: str = "http://localhost:1234"
+
+     @property
+     def lora_scaling(self) -> float:
+         return self.lora_alpha / self.lora_rank
+
+     def resolve_paths(self):
+         """Expand ~ and set dynamic paths."""
+         self.base_dir = str(Path(self.base_dir).expanduser())
+         if not self.adapter_dir:
+             key = self.model_key or "default"
+             self.adapter_dir = str(Path(self.base_dir) / "adapters" / key)
+         if not self.replay_path:
+             self.replay_path = str(Path(self.base_dir) / "replay.jsonl")
+
+         # Auto-detect lms CLI
+         if not self.lms_cli_path:
+             candidates = [
+                 Path.home() / ".lmstudio" / "bin" / "lms",
+                 Path("/usr/local/bin/lms"),
+             ]
+             for c in candidates:
+                 if c.exists():
+                     self.lms_cli_path = str(c)
+                     break
+
+     def ensure_dirs(self):
+         """Create required directories."""
+         self.resolve_paths()
+         Path(self.base_dir).mkdir(parents=True, exist_ok=True)
+         Path(self.adapter_dir).mkdir(parents=True, exist_ok=True)
+
+     def save(self, path: str = ""):
+         """Save config to JSON."""
+         path = path or str(Path(self.base_dir) / "config.json")
+         self.resolve_paths()
+         Path(path).parent.mkdir(parents=True, exist_ok=True)
+         with open(path, "w") as f:
+             json.dump(self.__dict__, f, indent=2)
+
+     @classmethod
+     def load(cls, path: str) -> "NeuralConfig":
+         """Load config from JSON."""
+         with open(path) as f:
+             data = json.load(f)
+         cfg = cls()
+         for k, v in data.items():
+             if hasattr(cfg, k):
+                 setattr(cfg, k, v)
+         cfg.resolve_paths()
+         return cfg
+
+     def to_dict(self) -> dict:
+         """Convert to dict for API responses."""
+         self.resolve_paths()
+         d = self.__dict__.copy()
+         d["lora_scaling"] = self.lora_scaling
+         return d
+
+     def update_from_dict(self, data: dict):
+         """Update config from API request."""
+         allowed = {
+             "learning_rate", "min_learning_rate", "cosine_period_steps",
+             "warmup_fraction", "steps_per_cycle", "lora_rank", "lora_alpha",
+             "lora_targets", "lora_num_layers", "training_backend",
+             "auto_train", "replay_ratio", "gradient_clip", "warmup_steps",
+             "rolling_buffer_size", "min_response_tokens", "auto_save_interval",
+             "max_seq_len", "lora_dropout", "weight_decay",
+             "epochs_per_cycle", "train_epochs",
+             "early_stop_loss", "early_stop_patience",
+         }
+         for k, v in data.items():
+             if k in allowed and hasattr(self, k):
+                 setattr(self, k, v)
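The schedule fields in `NeuralConfig` (`learning_rate`, `min_learning_rate`, `cosine_period_steps`, `warmup_fraction`) describe linear warmup followed by cosine decay to a floor. One plausible shape for such a schedule, as a standalone sketch using the config's default values (`cosine_lr` is a hypothetical helper, not necessarily the trainer's exact `_get_lr()` implementation):

```python
import math

def cosine_lr(step: int, base_lr: float = 5e-4, min_lr: float = 5e-5,
              period: int = 5000, warmup_fraction: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay to min_lr over one period."""
    warmup = max(1, int(period * warmup_fraction))
    if step < warmup:
        return base_lr * (step + 1) / warmup          # linear warmup ramp
    # Position within the post-warmup cosine phase, wrapping each period
    t = ((step - warmup) % (period - warmup)) / (period - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

With the defaults this ramps over the first 500 steps, peaks at `base_lr` right after warmup, and never drops below `min_lr` afterwards.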
src/neural_daemon.py ADDED
@@ -0,0 +1,1266 @@
+ """
+ neural_daemon.py — FastAPI daemon for MLX LoRA training + inference.
+
+ Manages the full real-time fine-tuning loop:
+   1. Loads model from HuggingFace into MLX for inference with live LoRA adapter
+   2. Collects conversation turns into training buffer
+   3. Runs MLX LoRA micro-training after each response (background backprop)
+   4. Exports fine-tuned model back to GGUF on deactivation
+
+ Endpoints:
+   POST /activate    — Detect + acquire model from LM Studio
+   POST /deactivate  — Export adapter → GGUF → reload LM Studio
+   POST /chat        — MLX inference with live adapter (SSE stream)
+   POST /train       — Manual training trigger
+   GET  /status      — Daemon state + metrics
+   GET  /config      — Current hyperparameters
+   PUT  /config      — Update hyperparameters live
+   POST /save        — Persist adapter to disk
+   POST /rollback    — Load previous adapter version
+   GET  /history     — List saved adapter versions
+   POST /reset       — Reset adapter to initial values
+ """
+
+ import asyncio
+ import json
+ import logging
+ import os
+ import signal
+ import subprocess
+ import sys
+ import threading
+ import time
+ from pathlib import Path
+ from typing import Optional
+
+ import numpy as np
+
+ # Add scripts/ to path for imports
+ sys.path.insert(0, str(Path(__file__).parent))
+
+ from neural_config import NeuralConfig
+ from neural_data import TrainingDataManager
+ from ane_lora_trainer import ANELoRATrainer
+
+ # Optional MLX LoRA trainer (real autograd training)
+ try:
+     from mlx_lora_trainer import MLXLoRATrainer
+     MLX_LORA_AVAILABLE = True
+ except ImportError:
+     MLX_LORA_AVAILABLE = False
+
+ # Optional MLX imports (only needed for actual inference)
+ try:
+     import mlx.core as mx
+     import mlx_lm
+     MLX_AVAILABLE = True
+ except ImportError:
+     MLX_AVAILABLE = False
+
+ # FastAPI imports
+ try:
+     from fastapi import FastAPI, HTTPException, Request
+     from fastapi.responses import StreamingResponse, JSONResponse
+     from fastapi.middleware.cors import CORSMiddleware
+     import uvicorn
+ except ImportError:
+     print("ERROR: FastAPI/uvicorn not installed. Run:")
+     print("  pip install fastapi uvicorn sse-starlette")
+     sys.exit(1)
+
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s %(name)s [%(levelname)s] %(message)s",
+     datefmt="%H:%M:%S",
+ )
+ log = logging.getLogger("neural_daemon")
+
+ # ──────────────────────────────────────────────────────────────
+ # Global state
+ # ──────────────────────────────────────────────────────────────
+
+ config = NeuralConfig()
+ trainer: Optional[ANELoRATrainer] = None
+ mlx_trainer: Optional["MLXLoRATrainer"] = None
+ data_mgr: Optional[TrainingDataManager] = None
+
+ # GPU lock for mutual exclusion between MLX inference and training
+ _gpu_lock = threading.Lock()
+
+ # State tracking
+ daemon_state = {
+     "active": False,
+     "model_key": "",
+     "model_path": "",
+     "architecture": "",
+     "n_layers": 0,
+     "dim": 0,
+     "vocab_size": 0,
+     "training": False,
+     "last_train_time": 0,
+     "startup_time": time.time(),
+     "error": "",
+ }
+
+ # MLX model (loaded when activated)
+ mlx_model = None
+ mlx_tokenizer = None
+
+ # Background training task
+ training_task: Optional[asyncio.Task] = None
+
+
+ def sanitize_for_json(obj):
+     """Recursively replace inf/nan floats with None for JSON serialization."""
+     import math
+     if isinstance(obj, float):
+         if math.isinf(obj) or math.isnan(obj):
+             return None
+         return obj
+     if isinstance(obj, dict):
+         return {k: sanitize_for_json(v) for k, v in obj.items()}
+     if isinstance(obj, (list, tuple)):
+         return [sanitize_for_json(v) for v in obj]
+     return obj
+
+ # ──────────────────────────────────────────────────────────────
+ # LM Studio helpers
+ # ──────────────────────────────────────────────────────────────
+
+ def detect_lms_cli() -> str:
+     """Find the lms CLI binary."""
+     candidates = [
+         Path.home() / ".lmstudio" / "bin" / "lms",
+         Path("/usr/local/bin/lms"),
+     ]
+     for c in candidates:
+         if c.exists():
+             return str(c)
+     return ""
+
+
+ def lms_run(args: list[str], timeout: int = 30) -> tuple[int, str]:
+     """Run an lms CLI command and return (returncode, output)."""
+     lms = config.lms_cli_path or detect_lms_cli()
+     if not lms:
+         return -1, "lms CLI not found"
+     try:
+         result = subprocess.run(
+             [lms] + args,
+             capture_output=True, text=True, timeout=timeout)
+         return result.returncode, result.stdout + result.stderr
+     except subprocess.TimeoutExpired:
+         return -1, "lms command timed out"
+     except Exception as e:
+         return -1, str(e)
+
+
+ def detect_loaded_model() -> dict:
+     """Query LM Studio for currently loaded model.
+
+     Returns dict with: key, path, architecture, or empty dict if none.
+     """
+     rc, output = lms_run(["ps", "--json"])
+     if rc != 0:
+         # Try without --json
+         rc, output = lms_run(["ps"])
+         if rc != 0:
+             return {}
+
+     try:
+         data = json.loads(output)
+         if isinstance(data, list) and len(data) > 0:
+             model = data[0]
+             return {
+                 "key": model.get("identifier", model.get("id", "")),
+                 "path": model.get("path", ""),
+                 "architecture": model.get("architecture", ""),
+             }
+     except json.JSONDecodeError:
+         # Parse text output
+         lines = output.strip().split("\n")
+         for line in lines:
+             line = line.strip()
+             if line and not line.startswith("─") and "No models" not in line:
+                 return {"key": line.split()[0] if line.split() else "", "path": "", "architecture": ""}
+
+     return {}
+
+
+ def resolve_model_path(model_key: str) -> str:
+     """Resolve GGUF file path from model key using lms ls."""
+     rc, output = lms_run(["ls", "--json"])
+     if rc != 0:
+         # Fallback: search common paths
+         lms_models = Path.home() / ".lmstudio" / "models"
+         for gguf in lms_models.rglob("*.gguf"):
+             if model_key.replace("-", "").lower() in str(gguf).replace("-", "").lower():
+                 return str(gguf)
+         return ""
+
+     try:
+         data = json.loads(output)
+         for model in (data if isinstance(data, list) else []):
+             if model.get("identifier", "") == model_key or model.get("id", "") == model_key:
+                 return model.get("path", "")
+     except json.JSONDecodeError:
+         pass
+
+     return ""
+
+
+ def unload_lms_model(model_key: str) -> bool:
+     """Unload model from LM Studio to free memory."""
+     rc, output = lms_run(["unload", model_key])
+     if rc == 0:
+         log.info(f"Unloaded {model_key} from LM Studio")
+         return True
+     log.warning(f"Failed to unload {model_key}: {output}")
+     return False
+
+
+ def load_lms_model(model_key: str) -> bool:
+     """Load model into LM Studio."""
+     rc, output = lms_run(["load", model_key], timeout=120)
+     if rc == 0:
+         log.info(f"Loaded {model_key} into LM Studio")
+         return True
+     log.warning(f"Failed to load {model_key}: {output}")
+     return False
+
+
+ # ──────────────────────────────────────────────────────────────
+ # Model loading helpers
+ # ──────────────────────────────────────────────────────────────
+
+ def detect_model_params(model_path: str) -> dict:
+     """Detect model parameters (layers, dim, vocab) from config files.
+
+     Looks for config.json in the model directory or HuggingFace cache.
+     """
+     model_dir = Path(model_path).parent
+     candidates = [
+         model_dir / "config.json",
+         model_dir / "params.json",
+     ]
+
+     for cfg_path in candidates:
+         if cfg_path.exists():
+             with open(cfg_path) as f:
+                 cfg = json.load(f)
+             return {
+                 "n_layers": cfg.get("num_hidden_layers", cfg.get("n_layers", 32)),
+                 "dim": cfg.get("hidden_size", cfg.get("dim", 3584)),
+                 "vocab_size": cfg.get("vocab_size", 151936),
+                 "architecture": cfg.get("model_type", cfg.get("architectures", [""])[0] if cfg.get("architectures") else ""),
+             }
+
+     # Try reading GGUF metadata for model params
+     gguf_file = Path(model_path)
+     if not gguf_file.is_absolute():
+         gguf_file = Path.home() / ".lmstudio" / "models" / model_path
+     if gguf_file.exists() and gguf_file.suffix == ".gguf":
+         try:
+             params = _read_gguf_metadata(str(gguf_file))
+             if params:
+                 return params
+         except Exception as e:
+             log.warning(f"GGUF metadata read failed: {e}")
+
+     # Default values for common architectures
+     log.warning(f"No config.json found in {model_dir}, using defaults")
+     return {
+         "n_layers": 32,
+         "dim": 3584,
+         "vocab_size": 151936,
+         "architecture": "qwen2",
+     }
+
+
+ def _read_gguf_metadata(gguf_path: str) -> Optional[dict]:
+     """Read model parameters from GGUF file metadata."""
+     import struct
+
+     with open(gguf_path, "rb") as f:
+         magic = f.read(4)
+         if magic != b"GGUF":
+             return None
+
+         version = struct.unpack("<I", f.read(4))[0]
+         n_tensors = struct.unpack("<Q", f.read(8))[0]
+         n_kv = struct.unpack("<Q", f.read(8))[0]
+
+         metadata = {}
+         for _ in range(n_kv):
+             # Read key
+             key_len = struct.unpack("<Q", f.read(8))[0]
+             key = f.read(key_len).decode("utf-8")
+             # Read value type
+             vtype = struct.unpack("<I", f.read(4))[0]
+             # Read value based on type
+             if vtype == 4:     # UINT32
+                 val = struct.unpack("<I", f.read(4))[0]
+             elif vtype == 5:   # INT32
+                 val = struct.unpack("<i", f.read(4))[0]
+             elif vtype == 6:   # FLOAT32
+                 val = struct.unpack("<f", f.read(4))[0]
+             elif vtype == 8:   # STRING
+                 str_len = struct.unpack("<Q", f.read(8))[0]
+                 val = f.read(str_len).decode("utf-8")
+             elif vtype == 10:  # UINT64
+                 val = struct.unpack("<Q", f.read(8))[0]
+             elif vtype == 7:   # BOOL
+                 val = struct.unpack("<?", f.read(1))[0]
+             elif vtype == 0:   # UINT8
+                 val = struct.unpack("<B", f.read(1))[0]
+             elif vtype == 1:   # INT8
+                 val = struct.unpack("<b", f.read(1))[0]
+             elif vtype == 2:   # UINT16
+                 val = struct.unpack("<H", f.read(2))[0]
+             elif vtype == 3:   # INT16
+                 val = struct.unpack("<h", f.read(2))[0]
+             elif vtype == 9:   # ARRAY
+                 arr_type = struct.unpack("<I", f.read(4))[0]
+                 arr_len = struct.unpack("<Q", f.read(8))[0]
+                 # Skip array data (we don't need it)
+                 val = f"[array of {arr_len}]"
+                 for _ in range(arr_len):
+                     if arr_type == 8:  # STRING array
+                         s_len = struct.unpack("<Q", f.read(8))[0]
+                         f.read(s_len)
+                     elif arr_type in (4, 5, 6):
+                         f.read(4)
+                     elif arr_type in (10, 11, 12):  # UINT64/INT64/FLOAT64
+                         f.read(8)
+                     elif arr_type in (0, 1, 7):
+                         f.read(1)
+                     elif arr_type in (2, 3):
+                         f.read(2)
+             elif vtype == 12:  # FLOAT64
+                 val = struct.unpack("<d", f.read(8))[0]
+             elif vtype == 11:  # INT64
+                 val = struct.unpack("<q", f.read(8))[0]
+             else:
+                 break  # Unknown type, stop parsing
+
+             metadata[key] = val
+
+     # Extract model params from GGUF metadata keys
+     n_layers = metadata.get("qwen2.block_count",
+                metadata.get("llama.block_count",
+                metadata.get("block_count", 32)))
+     dim = metadata.get("qwen2.embedding_length",
+           metadata.get("llama.embedding_length",
+           metadata.get("embedding_length", 3584)))
+     vocab_size = metadata.get("qwen2.vocab_size",
+                  metadata.get("llama.vocab_size",
+                  metadata.get("tokenizer.ggml.tokens", "[array of")))
+     if isinstance(vocab_size, str):
+         vocab_size = 151936  # Default
+
+     arch = metadata.get("general.architecture", "qwen2")
+
+     log.info(f"GGUF metadata: arch={arch}, layers={n_layers}, dim={dim}, vocab={vocab_size}")
+     return {
+         "n_layers": n_layers,
+         "dim": dim,
+         "vocab_size": vocab_size,
+         "architecture": arch,
+     }
+
+
+ # Known mappings from GGUF architecture/size to HuggingFace repos
+ _HF_MODEL_MAP = {
+     # Qwen3.5 family (Mamba hybrid — model.train()/eval() enables LoRA training)
+     ("qwen2", 2048, 24): "Qwen/Qwen3.5-2B-Base",   # 2B (Mamba)
+     ("qwen2", 3584, 32): "Qwen/Qwen3.5-0.8B",      # 0.8B (Mamba)
+     ("qwen2", 3584, 36): "Qwen/Qwen3.5-3B",        # 3B (Mamba)
+     ("qwen2", 4096, 40): "Qwen/Qwen3.5-9B",        # 9B (Mamba)
+     ("qwen2", 5120, 40): "Qwen/Qwen3.5-9B",        # 9B (alt dim)
+     # Qwen2.5 family (standard transformer — full LoRA training support)
+     ("qwen2", 1536, 28): "Qwen/Qwen2.5-1.5B-Instruct",
+     ("qwen2", 2048, 36): "Qwen/Qwen2.5-3B-Instruct",
+     ("qwen2", 3584, 28): "Qwen/Qwen2.5-7B-Instruct",
+     # Qwen3 family
+     ("qwen3", 2048, 28): "Qwen/Qwen3-0.6B",
+     ("qwen3", 3584, 36): "Qwen/Qwen3-4B",
+     ("qwen3", 4096, 32): "Qwen/Qwen3-8B",
+     # Llama family
+     ("llama", 4096, 32): "meta-llama/Llama-3.2-3B-Instruct",
+     ("llama", 4096, 40): "meta-llama/Llama-3.1-8B-Instruct",
+ }
+
+
+ def _resolve_hf_repo(model_key: str, architecture: str, dim: int, n_layers: int) -> str:
+     """Resolve HuggingFace repo name from model architecture/size.
+
+     MLX needs HF-format weights (safetensors + config.json), not GGUF.
+     We map the GGUF model's architecture to its HF base model.
+     """
+     # Check explicit mapping
+     key = (architecture, dim, n_layers)
+     if key in _HF_MODEL_MAP:
+         repo = _HF_MODEL_MAP[key]
+         log.info(f"Resolved HF repo: {model_key} → {repo} (via arch map)")
+         return repo
+
+     # Try to infer from model key name
+     name = model_key.lower()
+     if "qwen3.5" in name:
+         if "0.8b" in name or "0.6b" in name:
+             return "Qwen/Qwen3.5-0.8B"
+         elif "2b" in name:
+             return "Qwen/Qwen3.5-2B-Base"
+         elif "3b" in name:
+             return "Qwen/Qwen3.5-3B"
+         elif "9b" in name:
+             return "Qwen/Qwen3.5-9B"
+         elif "27b" in name:
+             return "Qwen/Qwen3.5-27B"
+     elif "qwen3" in name:
+         if "0.6b" in name:
+             return "Qwen/Qwen3-0.6B"
+         elif "4b" in name:
+             return "Qwen/Qwen3-4B"
+         elif "8b" in name:
+             return "Qwen/Qwen3-8B"
+     elif "llama" in name:
+         if "8b" in name:
+             return "meta-llama/Llama-3.1-8B-Instruct"
+         elif "3b" in name:
+             return "meta-llama/Llama-3.2-3B-Instruct"
+
+     # Fallback: try the model_key as-is (might be a HF repo)
+     log.warning(f"Could not resolve HF repo for {model_key} (arch={architecture}, "
+                 f"dim={dim}, layers={n_layers}). Trying key as-is.")
+     return model_key
+
+
+ # ──────────────────────────────────────────────────────────────
+ # FastAPI app
+ # ──────────────────────────────────────────────────────────────
+
+ app = FastAPI(title="JARVIS Neural Engine Daemon", version="0.1.0")
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+
+ @app.get("/status")
+ async def get_status():
+     """Return daemon state and training metrics."""
+     result = {**daemon_state}
+
+     if mlx_trainer:
+         result.update(mlx_trainer.stats())
+     elif trainer:
+         result.update(trainer.stats())
+
+     if data_mgr:
+         result["buffer"] = data_mgr.stats()
+
+     result["mlx_available"] = MLX_AVAILABLE
+     result["uptime_sec"] = round(time.time() - daemon_state["startup_time"])
+
+     return sanitize_for_json(result)
+
+
+ @app.get("/config")
+ async def get_config():
+     """Return current configuration."""
+     return config.to_dict()
+
+
+ @app.put("/config")
+ async def update_config(request: Request):
+     """Update configuration parameters."""
+     data = await request.json()
+     config.update_from_dict(data)
+
+     # Apply LR change to active trainer
+     if "learning_rate" in data:
+         if mlx_trainer:
+             mlx_trainer.update_learning_rate(data["learning_rate"])
+         elif trainer:
+             trainer.update_learning_rate(data["learning_rate"])
+
+     return {"ok": True, "config": config.to_dict()}
+
+
+ @app.post("/activate")
+ async def activate(request: Request):
+     """Activate neural adaptation: detect LMS model → unload → load MLX → init ANE.
+
+     Optional body:
+       {"model_key": "...", "model_path": "..."} to override LM Studio detection.
+       {"hf_repo": "Qwen/Qwen3.5-2B-Base"} to load directly from HuggingFace (no GGUF needed).
+     """
+     global trainer, mlx_trainer, data_mgr, mlx_model, mlx_tokenizer
+
+     if daemon_state["active"]:
+         raise HTTPException(400, "Already active")
+
+     try:
+         body = await request.json()
+     except Exception:
+         body = {}
+
+     # ── Direct HF model loading (no LM Studio GGUF required) ──────────
+     hf_repo = body.get("hf_repo", "")
+     if hf_repo and MLX_AVAILABLE:
+         log.info(f"Direct HF activation: {hf_repo}")
+         try:
+             mlx_model, mlx_tokenizer = mlx_lm.load(hf_repo)
+         except Exception as e:
+             raise HTTPException(500, f"Failed to load HF model {hf_repo}: {e}")
+
+         # Detect params from loaded model
+         import mlx.utils as mlx_utils_mod
+         layers = None
+         for path_fn in [lambda m: m.model.layers, lambda m: m.layers,
+                         lambda m: m.language_model.model.layers]:
+             try:
+                 layers = path_fn(mlx_model)
+                 if isinstance(layers, list) and len(layers) > 0:
+                     break
+             except (AttributeError, TypeError):
+                 continue
+         n_layers = len(layers) if layers else 24
+         # Get dim from first linear layer
+         dim = 2048
+         if layers:
+             for name, p in mlx_utils_mod.tree_flatten(layers[0].parameters()):
+                 if "proj" in name and "weight" in name:
+                     dim = max(p.shape)
+                     break
+         vocab_size = 151936  # Default
+         model_key = hf_repo
+         model_path = ""
+         architecture = "hf_direct"
+
+         config.model_key = model_key
+         config.model_path = model_path
+         config.model_architecture = architecture
+         config.resolve_paths()
+         config.ensure_dirs()
+
+         # Skip to trainer initialization (step 6)
+         # (no LM Studio unload needed)
+
+     else:
+         # ── Standard LM Studio flow ──────────────────────────────────
+         # 1. Detect model from LM Studio
+         model_key = body.get("model_key", "")
+         model_path = body.get("model_path", "")
+
+         if not model_key:
+             detected = detect_loaded_model()
+             if not detected:
+                 raise HTTPException(404, "No model loaded in LM Studio")
+             model_key = detected["key"]
+             model_path = detected.get("path", "")
+             log.info(f"Detected LM Studio model: {model_key}")
+
+         if not model_path:
+             model_path = resolve_model_path(model_key)
+
+         if not model_path:
+             raise HTTPException(404, f"Could not resolve path for {model_key}")
+
+         log.info(f"Model path: {model_path}")
+
+         # 2. Detect model parameters
+         params = detect_model_params(model_path)
+         n_layers = params["n_layers"]
+         dim = params["dim"]
+         vocab_size = params["vocab_size"]
+
+         # Validate dim is multiple of 16 for ANE
+         if dim % 16 != 0:
+             raise HTTPException(400, f"Model dim={dim} not a multiple of 16 (ANE requirement)")
+
+         # 3. Update config
+         config.model_key = model_key
+         config.model_path = model_path
+         config.model_architecture = params["architecture"]
+         config.resolve_paths()
+         config.ensure_dirs()
+
+         # 4. Unload from LM Studio
+         if not body.get("skip_unload", False):
+             unload_lms_model(model_key)
+
+         # 5. Load into MLX (if available)
+         if MLX_AVAILABLE and not body.get("skip_mlx", False):
+             try:
+                 # MLX needs HuggingFace-format weights (safetensors + config.json),
+                 # not GGUF files. Resolve the HF base model repo from the architecture.
+                 hf_repo = _resolve_hf_repo(model_key, params["architecture"], dim, n_layers)
+                 log.info(f"Loading model into MLX from HuggingFace: {hf_repo}...")
+                 mlx_model, mlx_tokenizer = mlx_lm.load(hf_repo)
+                 log.info("MLX model loaded")
+             except Exception as e:
+                 log.warning(f"MLX load failed (inference unavailable): {e}")
+                 mlx_model = None
+                 mlx_tokenizer = None
+
+     # 6. Initialize trainer (MLX preferred, ANE fallback)
+     if config.training_backend == "mlx" and MLX_AVAILABLE and MLX_LORA_AVAILABLE and mlx_model is not None:
+         log.info("Initializing MLX LoRA trainer (real autograd)")
+         mlx_trainer = MLXLoRATrainer(mlx_model, mlx_tokenizer, config)
615
+ # Try to load existing adapter
616
+ if Path(config.adapter_dir).exists():
617
+ mlx_trainer.load_adapter()
618
+ trainer = None # Don't use ANE trainer
619
+ else:
620
+ log.info("Initializing ANE LoRA trainer (legacy)")
621
+ trainer = ANELoRATrainer(config)
622
+ trainer.initialize(n_layers, dim)
623
+ # Try to load existing adapter
624
+ if Path(config.adapter_dir).exists():
625
+ trainer.load_adapter()
626
+ mlx_trainer = None
627
+
628
+ # 7. Initialize data manager
629
+ data_mgr = TrainingDataManager(
630
+ rolling_size=config.rolling_buffer_size,
631
+ replay_size=config.replay_buffer_size,
632
+ replay_path=config.replay_path,
633
+ min_response_tokens=config.min_response_tokens,
634
+ )
635
+
636
+ # 8. Update state
637
+ arch = architecture if hf_repo else params["architecture"]
638
+ daemon_state.update({
639
+ "active": True,
640
+ "model_key": model_key,
641
+ "model_path": model_path,
642
+ "architecture": arch,
643
+ "n_layers": n_layers,
644
+ "dim": dim,
645
+ "vocab_size": vocab_size,
646
+ "error": "",
647
+ })
648
+
649
+ log.info(f"Neural adaptation ACTIVATED: {model_key} "
650
+ f"({n_layers}L, dim={dim}, vocab={vocab_size})")
651
+
652
+ active_trainer = mlx_trainer or trainer
653
+ return sanitize_for_json({
654
+ "ok": True,
655
+ "model_key": model_key,
656
+ "architecture": arch,
657
+ "n_layers": n_layers,
658
+ "dim": dim,
659
+ "params": active_trainer.stats() if active_trainer else {},
660
+ })
661
+
+
+@app.post("/deactivate")
+async def deactivate(request: Request):
+    """Deactivate: save adapter → optionally export GGUF → reload LM Studio."""
+    global trainer, mlx_trainer, data_mgr, mlx_model, mlx_tokenizer
+
+    if not daemon_state["active"]:
+        raise HTTPException(400, "Not active")
+
+    try:
+        body = await request.json()
+    except Exception:
+        body = {}
+
+    # Save current state from active trainer
+    if mlx_trainer:
+        mlx_trainer.save_adapter()
+    elif trainer:
+        trainer.save_adapter()
+
+    if data_mgr:
+        data_mgr.save_replay()
+        data_mgr.save_rolling()
+
+    # Export to GGUF if requested
+    if body.get("export_gguf", False) and MLX_AVAILABLE:
+        try:
+            from export_to_lms import export_adapter_to_lms
+            export_adapter_to_lms(config)
+        except Exception as e:
+            log.warning(f"GGUF export failed: {e}")
+
+    # Cleanup MLX trainer
+    if mlx_trainer:
+        mlx_trainer.cleanup()
+        mlx_trainer = None
+
+    # Cleanup MLX model
+    mlx_model = None
+    mlx_tokenizer = None
+
+    # Cleanup ANE trainer
+    if trainer:
+        trainer.cleanup()
+        trainer = None
+    data_mgr = None
+
+    # Reload into LM Studio
+    model_key = daemon_state["model_key"]
+    if model_key and not body.get("skip_reload", False):
+        load_lms_model(model_key)
+
+    daemon_state.update({
+        "active": False,
+        "training": False,
+        "error": "",
+    })
+
+    log.info("Neural adaptation DEACTIVATED")
+    return {"ok": True}
+
+
+def _collect_and_train(user_text: str, messages: list, collected_text: str):
+    """Collect training data from a chat turn and schedule background training."""
+    if not collected_text or not data_mgr:
+        return
+
+    system_prompt = ""
+    for m in messages:
+        if m.get("role") == "system":
+            system_prompt = m.get("content", "")
+            break
+
+    accepted = data_mgr.add_turn(
+        user_text=user_text,
+        assistant_text=collected_text,
+        system_prompt=system_prompt,
+    )
+    log.info(f"Training data collected: {len(collected_text)} chars, accepted={accepted}")
+
+    if accepted and config.auto_train and (mlx_trainer or trainer):
+        try:
+            loop = asyncio.get_event_loop()
+            loop.call_soon_threadsafe(
+                lambda: asyncio.ensure_future(run_background_training()))
+        except RuntimeError:
+            log.warning("Could not schedule background training (no event loop)")
+
+
+@app.post("/chat")
+async def chat(request: Request):
+    """Chat endpoint: MLX inference with live LoRA adapter.
+
+    Streams response as SSE (text/event-stream).
+    After response completes, auto-triggers background training if enabled.
+    """
+    if not daemon_state["active"]:
+        raise HTTPException(400, "Not active — call /activate first")
+
+    body = await request.json()
+    messages = body.get("messages", [])
+    temperature = body.get("temperature", 0.7)
+    max_tokens = body.get("max_tokens", 2048)
+    stream = body.get("stream", True)
+
+    if not messages:
+        raise HTTPException(400, "No messages provided")
+
+    if not MLX_AVAILABLE or mlx_model is None:
+        raise HTTPException(503, "MLX not available — inference requires mlx-lm")
+
+    # Format prompt
+    if mlx_tokenizer and hasattr(mlx_tokenizer, 'apply_chat_template'):
+        prompt = mlx_tokenizer.apply_chat_template(
+            messages, tokenize=False, add_generation_prompt=True)
+    else:
+        # Simple fallback
+        prompt = "\n".join(
+            f"{m['role']}: {m['content']}" for m in messages)
+        prompt += "\nassistant:"
+
+    user_text = messages[-1]["content"] if messages else ""
+
+    # Shared state for post-stream training data collection
+    _collected = {"text": ""}
+
+    async def generate_stream():
+        """Generate tokens via MLX and stream as SSE."""
+        import queue
+        import threading
+
+        token_queue: queue.Queue = queue.Queue()
+
+        def _mlx_generate():
+            """Run MLX generation in a thread (it's synchronous/blocking)."""
+            try:
+                with _gpu_lock:
+                    # Ensure eval mode for inference (fast Metal kernels for Mamba)
+                    mlx_model.eval()
+                    for response in mlx_lm.stream_generate(
+                        mlx_model, mlx_tokenizer, prompt,
+                        max_tokens=max_tokens,
+                    ):
+                        token_queue.put(("token", response.text, response.finish_reason))
+                token_queue.put(("done", None, None))
+            except Exception as e:
+                token_queue.put(("error", str(e), None))
+
+        thread = threading.Thread(target=_mlx_generate, daemon=True)
+        thread.start()
+
+        try:
+            while True:
+                # Poll queue without blocking the event loop
+                try:
+                    kind, data, finish = token_queue.get(timeout=0.05)
+                except queue.Empty:
+                    await asyncio.sleep(0.01)
+                    continue
+
+                if kind == "token":
+                    _collected["text"] += data
+                    event = json.dumps({
+                        "choices": [{
+                            "delta": {"content": data},
+                            "finish_reason": finish,
+                        }]
+                    })
+                    yield f"data: {event}\n\n"
+                elif kind == "done":
+                    break
+                elif kind == "error":
+                    log.error(f"Generation error: {data}")
+                    yield f"data: {json.dumps({'error': data})}\n\n"
+                    yield "data: [DONE]\n\n"
+                    return
+
+            # Final event
+            yield f"data: {json.dumps({'choices': [{'delta': {}, 'finish_reason': 'stop'}]})}\n\n"
+            yield "data: [DONE]\n\n"
+
+        except GeneratorExit:
+            # StreamingResponse closing the generator — normal cleanup
+            log.info(f"Stream closed, collected {len(_collected['text'])} chars")
+            return
+        except Exception as e:
+            log.error(f"Generation error: {e}")
+            yield f"data: {json.dumps({'error': str(e)})}\n\n"
+            yield "data: [DONE]\n\n"
+            return
+        finally:
+            # Always collect training data after stream ends
+            _collect_and_train(user_text, messages, _collected["text"])
+
+    if stream:
+        return StreamingResponse(
+            generate_stream(),
+            media_type="text/event-stream",
+            headers={
+                "Cache-Control": "no-cache",
+                "Connection": "keep-alive",
+                "X-Accel-Buffering": "no",
+            },
+        )
+    else:
+        # Non-streaming: collect full response
+        full_text = ""
+        async for chunk in generate_stream():
+            if chunk.startswith("data: ") and "[DONE]" not in chunk:
+                try:
+                    data = json.loads(chunk[6:])
+                    delta = data.get("choices", [{}])[0].get("delta", {})
+                    full_text += delta.get("content", "")
+                except Exception:
+                    pass
+
+        return {
+            "choices": [{
+                "message": {"role": "assistant", "content": full_text},
+                "finish_reason": "stop",
+            }]
+        }
+
+
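The wire format above is plain SSE: each event is a `data: {json}` line with OpenAI-style `choices[0].delta` payloads, terminated by `data: [DONE]`. A client reassembles the text the same way the non-streaming branch does. The chunks below are hypothetical examples of what `/chat` emits:

```python
# Sketch of client-side reassembly of the SSE stream emitted by /chat.
import json

def collect_sse(chunks):
    text = ""
    for chunk in chunks:
        if not chunk.startswith("data: ") or "[DONE]" in chunk:
            continue
        try:
            event = json.loads(chunk[6:])  # strip the "data: " prefix
        except json.JSONDecodeError:
            continue
        delta = event.get("choices", [{}])[0].get("delta", {})
        text += delta.get("content", "")
    return text

chunks = [
    'data: {"choices": [{"delta": {"content": "Hel"}, "finish_reason": null}]}\n\n',
    'data: {"choices": [{"delta": {"content": "lo"}, "finish_reason": null}]}\n\n',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}\n\n',
    "data: [DONE]\n\n",
]
print(collect_sse(chunks))  # → Hello
```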
+_pending_train_epochs: int = 0  # Set by /train endpoint for manual training
+
+
+async def run_background_training(epochs: int = 0):
+    """Run a training cycle in a thread (GPU-bound, would block event loop).
+
+    Args:
+        epochs: Number of epochs. 0 = use config.epochs_per_cycle (auto-train).
+    """
+    if daemon_state["training"]:
+        return  # Already training
+
+    if not (mlx_trainer or trainer) or not data_mgr:
+        return
+
+    global _pending_train_epochs
+    _pending_train_epochs = epochs
+
+    loop = asyncio.get_event_loop()
+    await loop.run_in_executor(None, _training_worker)
+
+
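The `run_in_executor` call above is the standard way to keep the event loop responsive while a blocking, GPU-bound worker runs: the synchronous function is pushed onto the default thread pool and awaited. A toy stand-in (`slow_worker` is hypothetical, not part of the daemon):

```python
# Sketch of offloading a blocking worker to the default executor so the
# asyncio event loop can keep servicing other coroutines meanwhile.
import asyncio
import time

def slow_worker():
    time.sleep(0.05)  # stands in for a blocking training cycle
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    # Blocks a pool thread, not the event loop
    result = await loop.run_in_executor(None, slow_worker)
    return result

print(asyncio.run(main()))  # → done
```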
+def _training_worker():
+    """Synchronous training worker — runs in thread pool."""
+    global _pending_train_epochs
+
+    if not (mlx_trainer or trainer) or not data_mgr:
+        return
+
+    daemon_state["training"] = True
+    start = time.time()
+
+    # Determine epochs: manual override or config default
+    epochs = _pending_train_epochs if _pending_train_epochs > 0 else config.epochs_per_cycle
+    _pending_train_epochs = 0
+
+    log.info(f"Training worker started (epochs={epochs})")
+
+    try:
+        batch = data_mgr.get_training_batch(
+            batch_size=config.batch_size,
+            replay_ratio=config.replay_ratio,
+        )
+
+        if not batch:
+            log.info("Training worker: no batch data available")
+            return
+
+        log.info(f"Training worker: got {len(batch)} examples, {epochs} epoch(s)")
+
+        # ── MLX trainer (real autograd) ──────────────────────────
+        if mlx_trainer:
+            with _gpu_lock:
+                result = mlx_trainer.run_training_cycle(batch, epochs=epochs)
+            log.info(f"MLX training result: {result}")
+
+            # Auto-save
+            if (result.get("trained") and config.auto_save_interval > 0 and
+                    mlx_trainer.total_cycles % config.auto_save_interval == 0):
+                with _gpu_lock:
+                    mlx_trainer.save_adapter()
+                mlx_trainer.adapter_version += 1
+            return
+
+        # ── ANE trainer (legacy fallback) ────────────────────────
+        if not trainer:
+            return
+
+        total_loss = 0
+        n_examples = 0
+
+        for ex_idx, example in enumerate(batch):
+            if mlx_tokenizer is None:
+                continue
+
+            text = ""
+            for msg in example.messages:
+                text += f"{msg['role']}: {msg['content']}\n"
+
+            tokens = mlx_tokenizer.encode(text)
+            if len(tokens) < 2:
+                continue
+
+            seq_len = min(len(tokens) - 1, config.max_seq_len)
+            ane_seq = config.ane_seq_len
+            if seq_len > ane_seq:
+                tokens_trimmed = tokens[seq_len - ane_seq : seq_len + 1]
+            else:
+                tokens_trimmed = tokens[:ane_seq + 1]
+
+            input_ids = np.array(tokens_trimmed[:ane_seq], dtype=np.int32)
+            target_ids = np.array(tokens_trimmed[1:ane_seq + 1], dtype=np.int32)
+
+            if len(input_ids) < ane_seq:
+                input_ids = np.pad(input_ids, (0, ane_seq - len(input_ids)))
+                target_ids = np.pad(target_ids, (0, ane_seq - len(target_ids)))
+
+            dim = daemon_state["dim"]
+            n_layers = daemon_state["n_layers"]
+            activations = [
+                np.random.randn(1, dim, 1, ane_seq).astype(np.float32) * 0.01
+                for _ in range(n_layers)
+            ]
+
+            vocab = daemon_state["vocab_size"]
+            if mlx_tokenizer and hasattr(mlx_tokenizer, 'vocab_size'):
+                vocab = max(vocab, mlx_tokenizer.vocab_size)
+            max_token_id = max(int(target_ids.max()), int(input_ids.max()))
+            if max_token_id >= vocab:
+                vocab = max_token_id + 1
+            logits = np.random.randn(vocab, ane_seq).astype(np.float32)
+
+            for step in range(config.steps_per_cycle):
+                loss = trainer.train_step(activations, logits, target_ids)
+                total_loss += loss
+                n_examples += 1
+
+        if n_examples > 0:
+            avg_loss = total_loss / n_examples
+            trainer.last_loss = avg_loss
+            trainer.total_cycles += 1
+
+            elapsed = time.time() - start
+            log.info(f"Training cycle {trainer.total_cycles}: "
+                     f"loss={avg_loss:.4f}, {n_examples} steps, "
+                     f"{elapsed:.1f}s")
+
+            if (config.auto_save_interval > 0 and
+                    trainer.total_cycles % config.auto_save_interval == 0):
+                trainer.save_adapter()
+                trainer.adapter_version += 1
+
+    except Exception as e:
+        log.error(f"Background training error: {e}")
+        import traceback
+        traceback.print_exc()
+    finally:
+        daemon_state["training"] = False
+        daemon_state["last_train_time"] = time.time()
+
+
+@app.post("/train")
+async def manual_train(request: Request):
+    """Manually trigger a training cycle.
+
+    Optional body: {"messages": [...]} to inject training data before training.
+    Accepts a single message pair [{"role":"user","content":"..."},{"role":"assistant","content":"..."}]
+    or a list of such pairs for batch injection.
+    """
+    if not daemon_state["active"]:
+        raise HTTPException(400, "Not active")
+    if not (mlx_trainer or trainer):
+        raise HTTPException(500, "Trainer not initialized")
+
+    if daemon_state["training"]:
+        return {"ok": False, "message": "Training already in progress"}
+
+    try:
+        body = await request.json()
+    except Exception:
+        body = {}
+
+    # Inject training data if provided
+    injected = 0
+    if "messages" in body and data_mgr:
+        pairs = body["messages"]
+        # Support single pair or list of pairs
+        if pairs and isinstance(pairs[0], dict):
+            pairs = [pairs]  # Wrap single pair
+        for msgs in pairs:
+            user_text = ""
+            assistant_text = ""
+            system_prompt = ""
+            for m in msgs:
+                if m.get("role") == "user":
+                    user_text = m.get("content", "")
+                elif m.get("role") == "assistant":
+                    assistant_text = m.get("content", "")
+                elif m.get("role") == "system":
+                    system_prompt = m.get("content", "")
+            if user_text and assistant_text:
+                accepted = data_mgr.add_turn(
+                    user_text=user_text,
+                    assistant_text=assistant_text,
+                    system_prompt=system_prompt,
+                )
+                if accepted:
+                    injected += 1
+
+    # Determine epochs: explicit param, or config.train_epochs for injected data, or config.epochs_per_cycle
+    epochs = body.get("epochs", 0)
+    if epochs <= 0:
+        epochs = config.train_epochs if injected > 0 else config.epochs_per_cycle
+
+    asyncio.create_task(run_background_training(epochs=epochs))
+    return {"ok": True, "message": f"Training started ({epochs} epochs)", "injected": injected, "epochs": epochs}
+
+
+@app.post("/save")
+async def save_adapter():
+    """Save current adapter to disk."""
+    active_trainer = mlx_trainer or trainer
+    if not active_trainer:
+        raise HTTPException(400, "No trainer active")
+
+    active_trainer.save_adapter()
+    active_trainer.adapter_version += 1
+
+    if data_mgr:
+        data_mgr.save_replay()
+        data_mgr.save_rolling()
+
+    return {
+        "ok": True,
+        "version": active_trainer.adapter_version,
+        "path": config.adapter_dir,
+    }
+
+
+@app.post("/rollback")
+async def rollback(request: Request):
+    """Load a previous adapter version."""
+    active_trainer = mlx_trainer or trainer
+    if not active_trainer:
+        raise HTTPException(400, "No trainer active")
+
+    body = await request.json()
+    version = body.get("version", None)
+    path = body.get("path", "")
+
+    if not path:
+        path = config.adapter_dir
+
+    if active_trainer.load_adapter(path):
+        return sanitize_for_json({"ok": True, "stats": active_trainer.stats()})
+    else:
+        raise HTTPException(404, f"No adapter found at {path}")
+
+
+@app.get("/history")
+async def adapter_history():
+    """List saved adapter versions."""
+    base = Path(config.base_dir) / "adapters"
+    if not base.exists():
+        return {"versions": []}
+
+    versions = []
+    for d in sorted(base.iterdir()):
+        meta_path = d / "adapter_meta.json"
+        if meta_path.exists():
+            with open(meta_path) as f:
+                meta = json.load(f)
+            versions.append({
+                "path": str(d),
+                "version": meta.get("adapter_version", 0),
+                "steps": meta.get("total_steps", 0),
+                "loss": meta.get("last_loss", None),
+                "timestamp": meta.get("timestamp", 0),
+            })
+
+    return {"versions": versions}
+
+
+@app.post("/reset")
+async def reset_adapter(request: Request):
+    """Reset adapter to initial (untrained) state.
+
+    Optional body: {"clear_data": true} to also clear training buffers.
+    Default: clears both adapter AND data for a clean slate.
+    """
+    active_trainer = mlx_trainer or trainer
+    if not active_trainer:
+        raise HTTPException(400, "No trainer active")
+
+    try:
+        body = await request.json()
+    except Exception:
+        body = {}
+
+    active_trainer.reset_adapter()
+
+    # Clear data buffers by default (opt-out with clear_data=false)
+    if body.get("clear_data", True) and data_mgr:
+        data_mgr.clear()
+        log.info("Training data buffers cleared")
+
+    return sanitize_for_json({"ok": True, "stats": active_trainer.stats()})
+
+
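The `/history` scan above (iterate adapter directories, read each `adapter_meta.json`, collect metadata) can be exercised in isolation. The temporary directory and `v1`/`v2` names below are created purely for illustration:

```python
# Sketch of the adapter-version listing: one subdirectory per saved
# adapter, each holding an adapter_meta.json with version metadata.
import json
import tempfile
from pathlib import Path

def list_versions(base: Path) -> list[dict]:
    versions = []
    for d in sorted(base.iterdir()):
        meta_path = d / "adapter_meta.json"
        if meta_path.exists():
            meta = json.loads(meta_path.read_text())
            versions.append({
                "path": str(d),
                "version": meta.get("adapter_version", 0),
                "steps": meta.get("total_steps", 0),
            })
    return versions

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for v in (1, 2):
        d = base / f"v{v}"
        d.mkdir()
        (d / "adapter_meta.json").write_text(
            json.dumps({"adapter_version": v, "total_steps": v * 10}))
    out = list_versions(base)
    print([x["version"] for x in out])  # → [1, 2]
```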
+# ──────────────────────────────────────────────────────────────
+# Lifecycle
+# ──────────────────────────────────────────────────────────────
+
+@app.on_event("startup")
+async def on_startup():
+    log.info(f"Neural Engine Daemon starting on "
+             f"{config.daemon_host}:{config.daemon_port}")
+    log.info(f"MLX available: {MLX_AVAILABLE}")
+
+    config.resolve_paths()
+    config.lms_cli_path = detect_lms_cli()
+    if config.lms_cli_path:
+        log.info(f"LM Studio CLI: {config.lms_cli_path}")
+    else:
+        log.warning("LM Studio CLI not found")
+
+
+@app.on_event("shutdown")
+async def on_shutdown():
+    log.info("Shutting down...")
+
+    active_trainer = mlx_trainer or trainer
+    if active_trainer:
+        try:
+            active_trainer.save_adapter()
+        except Exception as e:
+            log.error(f"Failed to save adapter on shutdown: {e}")
+
+    if data_mgr:
+        try:
+            data_mgr.save_replay()
+            data_mgr.save_rolling()
+        except Exception as e:
+            log.error(f"Failed to save data on shutdown: {e}")
+
+    if active_trainer:
+        active_trainer.cleanup()
+
+    log.info("Shutdown complete")
+
+
+def handle_signal(signum, frame):
+    """Handle SIGTERM/SIGINT gracefully."""
+    log.info(f"Received signal {signum}, initiating graceful shutdown...")
+    active_trainer = mlx_trainer or trainer
+    if active_trainer:
+        try:
+            active_trainer.save_adapter()
+        except Exception:
+            pass
+    sys.exit(0)
+
+
+# ──────────────────────────────────────────────────────────────
+# Main
+# ──────────────────────────────────────────────────────────────
+
+def main():
+    signal.signal(signal.SIGTERM, handle_signal)
+    signal.signal(signal.SIGINT, handle_signal)
+
+    # Load config from file if exists
+    config_path = Path(config.base_dir).expanduser() / "config.json"
+    if config_path.exists():
+        try:
+            loaded = NeuralConfig.load(str(config_path))
+            for k, v in loaded.__dict__.items():
+                setattr(config, k, v)
+            log.info(f"Loaded config from {config_path}")
+        except Exception as e:
+            log.warning(f"Failed to load config: {e}")
+
+    config.resolve_paths()
+
+    # Override from env
+    port = int(os.environ.get("NEURAL_DAEMON_PORT", config.daemon_port))
+    host = os.environ.get("NEURAL_DAEMON_HOST", config.daemon_host)
+
+    log.info(f"Starting daemon on {host}:{port}")
+
+    uvicorn.run(
+        app,
+        host=host,
+        port=port,
+        log_level="info",
+        access_log=False,
+    )
+
+
+if __name__ == "__main__":
+    main()
src/neural_data.py ADDED
@@ -0,0 +1,226 @@
+"""
+neural_data.py — Training data manager for MLX LoRA fine-tuning.
+
+Manages a rolling buffer of recent conversation turns and a persistent
+replay buffer for anti-catastrophic-forgetting experience replay.
+"""
+
+import json
+import random
+import time
+from collections import deque
+from pathlib import Path
+from typing import Optional
+
+
+class TrainingExample:
+    """A single training example (conversation turn)."""
+
+    __slots__ = ("messages", "timestamp", "token_count", "session_id")
+
+    def __init__(self, messages: list[dict], timestamp: float = 0,
+                 token_count: int = 0, session_id: str = ""):
+        self.messages = messages
+        self.timestamp = timestamp or time.time()
+        self.token_count = token_count
+        self.session_id = session_id
+
+    def to_dict(self) -> dict:
+        return {
+            "messages": self.messages,
+            "timestamp": self.timestamp,
+            "token_count": self.token_count,
+            "session_id": self.session_id,
+        }
+
+    @classmethod
+    def from_dict(cls, d: dict) -> "TrainingExample":
+        return cls(
+            messages=d["messages"],
+            timestamp=d.get("timestamp", 0),
+            token_count=d.get("token_count", 0),
+            session_id=d.get("session_id", ""),
+        )
+
+
+class TrainingDataManager:
+    """Manages rolling buffer + persistent replay for LoRA training."""
+
+    def __init__(self, rolling_size: int = 100, replay_size: int = 500,
+                 replay_path: str = "", min_response_tokens: int = 10):
+        self.rolling_size = rolling_size
+        self.replay_size = replay_size
+        self.min_response_tokens = min_response_tokens
+        self.replay_path = replay_path
+
+        self._rolling: deque[TrainingExample] = deque(maxlen=rolling_size)
+        self._replay: list[TrainingExample] = []
+        self._total_added = 0
+
+        if replay_path:
+            self._load_replay()
+
+    @property
+    def rolling_count(self) -> int:
+        return len(self._rolling)
+
+    @property
+    def replay_count(self) -> int:
+        return len(self._replay)
+
+    @property
+    def total_added(self) -> int:
+        return self._total_added
+
+    def add_turn(self, user_text: str, assistant_text: str,
+                 system_prompt: str = "", session_id: str = "") -> bool:
+        """Add a conversation turn to the training buffer.
+
+        Returns True if the example was accepted (not filtered).
+        """
+        # Quality filter: skip short/empty responses
+        approx_tokens = len(assistant_text.split())
+        if approx_tokens < self.min_response_tokens:
+            return False
+
+        # Skip tool-only or empty content
+        if not assistant_text.strip():
+            return False
+
+        messages = []
+        if system_prompt:
+            messages.append({"role": "system", "content": system_prompt})
+        messages.append({"role": "user", "content": user_text})
+        messages.append({"role": "assistant", "content": assistant_text})
+
+        example = TrainingExample(
+            messages=messages,
+            token_count=approx_tokens,
+            session_id=session_id,
+        )
+
+        self._rolling.append(example)
+        self._total_added += 1
+
+        # Add to replay with reservoir sampling
+        if len(self._replay) < self.replay_size:
+            self._replay.append(example)
+        else:
+            idx = random.randint(0, self._total_added - 1)
+            if idx < self.replay_size:
+                self._replay[idx] = example
+
+        return True
+
+    def get_training_batch(self, batch_size: int = 1,
+                           replay_ratio: float = 0.3) -> list[TrainingExample]:
+        """Get a training batch mixing recent and replay examples.
+
+        Args:
+            batch_size: Total examples in batch. 0 = all available data.
+            replay_ratio: Fraction of batch from replay buffer (0.0-1.0)
+
+        Returns:
+            List of TrainingExample
+        """
+        if not self._rolling:
+            return []
+
+        # batch_size <= 0 means "all available data"
+        if batch_size <= 0:
+            batch = list(self._rolling)
+            if self._replay:
+                # Add replay examples not already in rolling
+                rolling_set = {id(ex) for ex in self._rolling}
+                for ex in self._replay:
+                    if id(ex) not in rolling_set:
+                        batch.append(ex)
+            random.shuffle(batch)
+            return batch
+
+        n_replay = int(batch_size * replay_ratio)
+        n_recent = batch_size - n_replay
+
+        batch = []
+
+        # Recent examples (most recent first)
+        recent = list(self._rolling)
+        if n_recent > 0:
+            recent_sample = recent[-n_recent:] if len(recent) >= n_recent else recent
+            batch.extend(recent_sample)
+
+        # Replay examples (random sample)
+        if n_replay > 0 and self._replay:
+            replay_sample = random.sample(
+                self._replay,
+                min(n_replay, len(self._replay))
+            )
+            batch.extend(replay_sample)
+
+        random.shuffle(batch)
+        return batch
+
+    def get_recent(self, n: int = 5) -> list[TrainingExample]:
+        """Get the N most recent training examples."""
+        return list(self._rolling)[-n:]
+
+    def save_rolling(self, path: str = ""):
+        """Save rolling buffer to disk."""
+        path = path or str(Path(self.replay_path).parent / "buffer.jsonl")
+        Path(path).parent.mkdir(parents=True, exist_ok=True)
+        with open(path, "w") as f:
+            for ex in self._rolling:
+                f.write(json.dumps(ex.to_dict()) + "\n")
+
+    def load_rolling(self, path: str = ""):
+        """Load rolling buffer from disk."""
+        path = path or str(Path(self.replay_path).parent / "buffer.jsonl")
+        if not Path(path).exists():
+            return
+        self._rolling.clear()
+        with open(path) as f:
+            for line in f:
+                line = line.strip()
+                if line:
+                    ex = TrainingExample.from_dict(json.loads(line))
+                    self._rolling.append(ex)
+
+    def save_replay(self):
+        """Persist replay buffer to disk."""
+        if not self.replay_path:
+            return
+        Path(self.replay_path).parent.mkdir(parents=True, exist_ok=True)
+        with open(self.replay_path, "w") as f:
+            for ex in self._replay:
+                f.write(json.dumps(ex.to_dict()) + "\n")
+
+    def _load_replay(self):
+        """Load replay buffer from disk."""
+        if not self.replay_path or not Path(self.replay_path).exists():
+            return
+        self._replay.clear()
+        with open(self.replay_path) as f:
+            for line in f:
+                line = line.strip()
+                if line:
+                    ex = TrainingExample.from_dict(json.loads(line))
+                    self._replay.append(ex)
+        # Trim to max size
+        if len(self._replay) > self.replay_size:
+            self._replay = random.sample(self._replay, self.replay_size)
+
+    def clear(self):
+        """Clear all buffers (for reset)."""
+        self._rolling.clear()
+        self._replay.clear()
+        self._total_added = 0
+
+    def stats(self) -> dict:
+        """Return buffer statistics."""
+        return {
+            "rolling_count": self.rolling_count,
+            "rolling_capacity": self.rolling_size,
+            "replay_count": self.replay_count,
+            "replay_capacity": self.replay_size,
+            "total_added": self._total_added,
+        }
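The replay update in `add_turn` is classic reservoir sampling: once the buffer is full, the n-th example seen replaces a uniformly random slot with probability `replay_size / n`, which keeps the buffer a uniform sample of everything seen so far. A standalone sketch of just that update:

```python
# Sketch of the reservoir-sampling update used by add_turn above.
import random

def reservoir_add(replay, example, total, replay_size):
    """total is the 1-indexed count of examples seen so far, inclusive."""
    if len(replay) < replay_size:
        replay.append(example)
    else:
        idx = random.randint(0, total - 1)
        if idx < replay_size:  # probability replay_size / total
            replay[idx] = example

random.seed(0)
replay = []
for i in range(1, 1001):
    reservoir_add(replay, i, total=i, replay_size=50)
print(len(replay))  # → 50
```

The buffer never exceeds its capacity, yet late examples still have a chance of entering, so the persisted replay set does not skew toward the earliest conversations.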
tests/evaluation_results.json ADDED
@@ -0,0 +1,183 @@
+{
+  "model": "Qwen/Qwen3.5-2B-Base",
+  "n_trials": 3,
+  "epochs": 15,
+  "regularization_ratio": 0.33,
+  "aggregate": {
+    "recall": {
+      "pooled_correct": 61,
+      "pooled_total": 105,
+      "pooled_rate": 0.580952380952381,
+      "per_trial_rates": [
+        0.6571428571428571,
+        0.5428571428571428,
+        0.5428571428571428
+      ],
+      "mean": 0.5809523809523809,
+      "stdev": 0.06598288790738582,
+      "ci_95_lower": 0.4853552056582404,
+      "ci_95_upper": 0.670835074528747
+    },
+    "general_knowledge": {
+      "pooled_correct": 60,
+      "pooled_total": 60,
+      "pooled_rate": 1.0,
+      "per_trial_rates": [
+        1.0,
+        1.0,
+        1.0
+      ],
+      "mean": 1.0,
+      "stdev": 0.0,
+      "ci_95_lower": 0.9398260695220669,
+      "ci_95_upper": 0.9999999999999999
+    },
+    "training": {
+      "mean_time_s": 69.6302502155304,
+      "stdev_time_s": 1.185997256195759,
+      "mean_steps": 180,
+      "per_trial_times": [
+        68.26203393936157,
+        70.36512899398804,
+        70.26358771324158
+      ]
+    }
+  },
+  "trials": [
+    {
+      "trial_id": 1,
+      "n_confirmed_unknown": 35,
+      "n_training_pairs": 52,
+      "training_steps": 180,
+      "training_time_s": 68.26203393936157,
+      "initial_loss": 1.290154,
+      "final_loss": 0.451566,
+      "recall_correct": 23,
+      "recall_total": 35,
+      "recall_rate": 0.6571428571428571,
+      "general_correct": 20,
+      "general_total": 20,
+      "general_rate": 1.0,
+      "category_scores": {
+        "Awards": {
+          "correct": 7,
+          "total": 7
+        },
+        "Entertainment": {
+          "correct": 1,
+          "total": 4
+        },
+        "Weather/Natural Events": {
+          "correct": 4,
+          "total": 5
+        },
+        "Sports": {
+          "correct": 5,
+          "total": 6
+        },
+        "Deaths/Obituaries": {
+          "correct": 4,
+          "total": 11
+        },
+        "Science": {
+          "correct": 1,
+          "total": 1
+        },
+        "Technology/Business": {
+          "correct": 1,
+          "total": 1
+        }
+      }
+    },
+    {
+      "trial_id": 2,
+      "n_confirmed_unknown": 35,
+      "n_training_pairs": 52,
+      "training_steps": 180,
+      "training_time_s": 70.36512899398804,
+      "initial_loss": 2.056952,
+      "final_loss": 0.260391,
+      "recall_correct": 19,
+      "recall_total": 35,
+      "recall_rate": 0.5428571428571428,
+      "general_correct": 20,
+      "general_total": 20,
+      "general_rate": 1.0,
+      "category_scores": {
+        "Deaths/Obituaries": {
+          "correct": 0,
+          "total": 11
+        },
+        "Awards": {
+          "correct": 6,
+          "total": 7
+        },
+        "Weather/Natural Events": {
+          "correct": 4,
+          "total": 5
+        },
+        "Technology/Business": {
+          "correct": 1,
+          "total": 1
+        },
+        "Entertainment": {
+          "correct": 1,
+          "total": 4
+        },
+        "Sports": {
+          "correct": 6,
+          "total": 6
+        },
+        "Science": {
+          "correct": 1,
+          "total": 1
+        }
+      }
+    },
+    {
+      "trial_id": 3,
+      "n_confirmed_unknown": 35,
+      "n_training_pairs": 52,
+      "training_steps": 180,
+      "training_time_s": 70.26358771324158,
+      "initial_loss": 1.984214,
+      "final_loss": 0.381513,
+      "recall_correct": 19,
+      "recall_total": 35,
+      "recall_rate": 0.5428571428571428,
+      "general_correct": 20,
+      "general_total": 20,
+      "general_rate": 1.0,
+      "category_scores": {
+        "Deaths/Obituaries": {
+          "correct": 2,
+          "total": 11
+        },
+        "Awards": {
+ "Awards": {
157
+ "correct": 5,
158
+ "total": 7
159
+ },
160
+ "Technology/Business": {
161
+ "correct": 0,
162
+ "total": 1
163
+ },
164
+ "Weather/Natural Events": {
165
+ "correct": 4,
166
+ "total": 5
167
+ },
168
+ "Entertainment": {
169
+ "correct": 2,
170
+ "total": 4
171
+ },
172
+ "Sports": {
173
+ "correct": 5,
174
+ "total": 6
175
+ },
176
+ "Science": {
177
+ "correct": 1,
178
+ "total": 1
179
+ }
180
+ }
181
+ }
182
+ ]
183
+ }
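The `ci_95_lower`/`ci_95_upper` fields in the aggregates above are consistent with a Wilson score interval at z = 1.96: for the 60/60 general-knowledge result the lower bound reduces to 1/(1 + z²/60) ≈ 0.9398, exactly as reported, and the 61/105 recall bounds match as well. A minimal sketch that reproduces them (the function name is illustrative, not taken from the repo's code):

```python
import math

def wilson_ci(correct, total, z=1.96):
    """95% Wilson score interval for a binomial proportion.

    Unlike the normal approximation, this stays inside [0, 1] and
    gives a nondegenerate lower bound even when correct == total.
    """
    p = correct / total
    denom = 1 + z * z / total
    center = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (center - margin) / denom, (center + margin) / denom

# e.g. wilson_ci(61, 105) reproduces the pooled-recall bounds above,
# and wilson_ci(60, 60) gives a lower bound of about 0.9398.
```

This is why the perfect 60/60 general-knowledge score still reports a lower bound below 1.0: with only 60 observations, the interval honestly reflects the limited sample.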
tests/raw_facts_2026.txt ADDED
@@ -0,0 +1,663 @@
1
+ ################################################################################
2
+ # DYNAMIC FACT DATASET FOR JIT LoRA TRAINING
3
+ # Target Model: Qwen3.5-2B-Base (training cutoff ~mid-2025)
4
+ # Facts sourced: Late 2025 through March 2026
5
+ # Generated: 2026-03-04
6
+ # All facts verified via web search — no fabricated data
7
+ ################################################################################
8
+
9
+ ================================================================================
10
+ CATEGORY: Sports
11
+ ================================================================================
12
+
13
+ CATEGORY: Sports
14
+ Q: Who won Super Bowl LIX in February 2025?
15
+ A: The Philadelphia Eagles won Super Bowl LIX, defeating the Kansas City Chiefs 40-22 on February 9, 2025, at Caesars Superdome in New Orleans. Jalen Hurts was named Super Bowl MVP.
16
+ KEYWORDS: Eagles, LIX, Hurts
17
+
18
+ CATEGORY: Sports
19
+ Q: Who won Super Bowl LX in February 2026?
20
+ A: The Seattle Seahawks won Super Bowl LX, defeating the New England Patriots 29-13 on February 8, 2026, at Levi's Stadium in Santa Clara, California. Kenneth Walker III was named Super Bowl MVP.
21
+ KEYWORDS: Seahawks, LX, Patriots
22
+
23
+ CATEGORY: Sports
24
+ Q: Who won the 2025 NBA Finals?
25
+ A: The Oklahoma City Thunder won the 2025 NBA Finals, defeating the Indiana Pacers in Game 7 with a score of 103-91. Shai Gilgeous-Alexander was named Finals MVP.
26
+ KEYWORDS: Thunder, NBA Finals, Pacers
27
+
28
+ CATEGORY: Sports
29
+ Q: Who won the 2025 Stanley Cup?
30
+ A: The Florida Panthers won the 2025 Stanley Cup, defeating the Edmonton Oilers four games to two. Sam Bennett was awarded the Conn Smythe Trophy as playoff MVP. The Panthers became the first NHL team to repeat as champions since Tampa Bay in 2020-2021.
31
+ KEYWORDS: Panthers, Oilers, Sam Bennett
32
+
33
+ CATEGORY: Sports
34
+ Q: Who won the 2025 World Series in baseball?
35
+ A: The Los Angeles Dodgers won the 2025 World Series, defeating the Toronto Blue Jays in seven games. Will Smith hit an 11th-inning homer in Game 7 to clinch the title. Yoshinobu Yamamoto was named World Series MVP. The Dodgers became MLB's first back-to-back champions since 2000.
36
+ KEYWORDS: Dodgers, Blue Jays, Yamamoto
37
+
38
+ CATEGORY: Sports
39
+ Q: Who won the 2025 College Football Playoff National Championship?
40
+ A: Ohio State defeated Notre Dame 34-23 in the CFP National Championship game on January 20, 2025, at Mercedes-Benz Stadium in Atlanta. It was Ohio State's sixth national championship and first since 2014.
41
+ KEYWORDS: Ohio State, Notre Dame, CFP
42
+
43
+ CATEGORY: Sports
44
+ Q: Who won the 2025 WNBA Championship?
45
+ A: The Las Vegas Aces won the 2025 WNBA Championship, sweeping the Phoenix Mercury 4-0. A'ja Wilson was named Finals MVP. It was the Aces' third title in four years.
46
+ KEYWORDS: Aces, WNBA, Mercury
47
+
48
+ CATEGORY: Sports
49
+ Q: Who won the 2025 MLS Cup?
50
+ A: Inter Miami CF won the 2025 MLS Cup with a 3-1 victory over Vancouver Whitecaps FC on December 6, 2025. Lionel Messi provided two assists and was named match MVP. It was Inter Miami's first MLS Cup title.
51
+ KEYWORDS: Inter Miami, MLS Cup, Messi
52
+
53
+ CATEGORY: Sports
54
+ Q: Who won the 2025 Champions League in soccer?
55
+ A: Paris Saint-Germain won the 2025 UEFA Champions League, defeating Inter Milan 5-0 in the final on May 31, 2025, at the Allianz Arena in Munich. It was PSG's first Champions League title.
56
+ KEYWORDS: PSG, Inter Milan, Paris Saint-Germain
57
+
58
+ CATEGORY: Sports
59
+ Q: Who won the 2025 Ballon d'Or for men's soccer?
60
+ A: Ousmane Dembele of Paris Saint-Germain won the 2025 Ballon d'Or, awarded on September 22, 2025, at the Theatre du Chatelet in Paris. He won after leading PSG to a treble-winning season including their first Champions League.
61
+ KEYWORDS: Dembele, Ballon d'Or, PSG
62
+
63
+ CATEGORY: Sports
64
+ Q: Who won the 2025 Women's Ballon d'Or?
65
+ A: Aitana Bonmati of Barcelona won the 2025 Women's Ballon d'Or, becoming the first woman to win the award three consecutive years. The ceremony was held on September 22, 2025.
66
+ KEYWORDS: Bonmati, Women's Ballon d'Or, Barcelona
67
+
68
+ CATEGORY: Sports
69
+ Q: Who won the 2025 Wimbledon men's singles title?
70
+ A: Jannik Sinner defeated Carlos Alcaraz 4-6, 6-4, 6-4, 6-4 in the 2025 Wimbledon men's singles final on July 13, 2025. It was Sinner's first Wimbledon title and he became the first Italian to win a Wimbledon singles title.
71
+ KEYWORDS: Sinner, Wimbledon, Alcaraz
72
+
73
+ CATEGORY: Sports
74
+ Q: Who won the 2025 Wimbledon women's singles title?
75
+ A: Iga Swiatek won the 2025 Wimbledon women's singles title, defeating Amanda Anisimova in the final. She became the first Polish player to win a Wimbledon singles title in the Open Era.
76
+ KEYWORDS: Swiatek, Wimbledon, Polish
77
+
78
+ CATEGORY: Sports
79
+ Q: Who won the 2025 US Open men's singles title?
80
+ A: Carlos Alcaraz defeated Jannik Sinner 6-2, 3-6, 6-1, 6-4 in the 2025 US Open men's singles final on September 7, 2025. It was Alcaraz's sixth Grand Slam title.
81
+ KEYWORDS: Alcaraz, US Open, Sinner
82
+
83
+ CATEGORY: Sports
84
+ Q: Who won the 2025 US Open women's singles title?
85
+ A: Aryna Sabalenka defeated Amanda Anisimova 6-3, 7-6(3) in the 2025 US Open women's singles final. It was Sabalenka's second US Open title.
86
+ KEYWORDS: Sabalenka, US Open, women's singles
87
+
88
+ CATEGORY: Sports
89
+ Q: Who won the 2025 Australian Open men's singles title?
90
+ A: Jannik Sinner defeated Alexander Zverev 6-3, 7-6(4), 6-3 in the 2025 Australian Open men's singles final to successfully defend his title.
91
+ KEYWORDS: Sinner, Australian Open, Zverev
92
+
93
+ CATEGORY: Sports
94
+ Q: Who won the 2025 Australian Open women's singles title?
95
+ A: Madison Keys defeated defending champion Aryna Sabalenka 6-3, 2-6, 7-5 in the 2025 Australian Open women's singles final to win her first Grand Slam title.
96
+ KEYWORDS: Madison Keys, Australian Open, Sabalenka
97
+
98
+ CATEGORY: Sports
99
+ Q: Who won the 2025 French Open men's singles title?
100
+ A: Carlos Alcaraz defeated Jannik Sinner 4-6, 6-7(4), 6-4, 7-6(3), 7-6(2) in a five-set final at the 2025 French Open to defend his Roland Garros title.
101
+ KEYWORDS: Alcaraz, French Open, Sinner
102
+
103
+ CATEGORY: Sports
104
+ Q: Who won the 2025 French Open women's singles title?
105
+ A: Coco Gauff defeated Aryna Sabalenka 6-7(5), 6-2, 6-4 in the 2025 French Open women's singles final.
106
+ KEYWORDS: Coco Gauff, French Open, Sabalenka
107
+
108
+ CATEGORY: Sports
109
+ Q: Who won the 2025 Daytona 500?
110
+ A: William Byron of Hendrick Motorsports won the 2025 Daytona 500 on February 16, 2025, becoming the first driver since Denny Hamlin in 2019-2020 to win the race in consecutive years. He finished 0.113 seconds ahead of Tyler Reddick.
111
+ KEYWORDS: William Byron, Daytona 500, Hendrick
112
+
113
+ CATEGORY: Sports
114
+ Q: Who won the 2025 NASCAR Cup Series championship?
115
+ A: Kyle Larson won the 2025 NASCAR Cup Series championship, his second career title, decided at Phoenix Raceway in Avondale, Arizona on November 2, 2025.
116
+ KEYWORDS: Kyle Larson, NASCAR, Cup Series
117
+
118
+ CATEGORY: Sports
119
+ Q: Who won the 2025 Kentucky Derby?
120
+ A: Sovereignty won the 2025 Kentucky Derby on May 3, 2025, at Churchill Downs. Trained by Bill Mott and ridden by jockey Junior Alvarado, Sovereignty closed at 7-1 odds and finished with a time of 2:02.31.
121
+ KEYWORDS: Sovereignty, Bill Mott, Junior Alvarado
122
+
123
+ ================================================================================
124
+ CATEGORY: Awards
125
+ ================================================================================
126
+
127
+ CATEGORY: Awards
128
+ Q: Which film received the most 2026 Oscar nominations at the 98th Academy Awards?
129
+ A: Sinners, directed by Ryan Coogler, received a record-breaking 16 Oscar nominations at the 98th Academy Awards, shattering the previous record of 14 nominations. The ceremony is scheduled for March 15, 2026.
130
+ KEYWORDS: Sinners, Oscars, 98th Academy Awards
131
+
132
+ CATEGORY: Awards
133
+ Q: Who are the nominees for Best Actor at the 2026 Oscars?
134
+ A: The 2026 Best Actor nominees are Michael B. Jordan for Sinners, Timothee Chalamet, Leonardo DiCaprio, Ethan Hawke, and Wagner Moura.
135
+ KEYWORDS: Best Actor, Oscars, Michael B. Jordan
136
+
137
+ CATEGORY: Awards
138
+ Q: Who are the nominees for Best Actress at the 2026 Oscars?
139
+ A: The 2026 Best Actress nominees are Jessie Buckley for Hamnet, Rose Byrne, Kate Hudson, Chase Infiniti, and Emma Stone.
140
+ KEYWORDS: Best Actress, Oscars, Jessie Buckley
141
+
142
+ CATEGORY: Awards
143
+ Q: Who won Album of the Year at the 2026 Grammy Awards?
144
+ A: Bad Bunny won Album of the Year at the 2026 Grammy Awards for "Debi Tirar Mas Fotos," making it the first all-Spanish-language album to win the Grammy for Album of the Year. The ceremony was held on February 1, 2026.
145
+ KEYWORDS: Bad Bunny, Album of the Year, Spanish-language
146
+
147
+ CATEGORY: Awards
148
+ Q: Who won Record of the Year at the 2026 Grammy Awards?
149
+ A: Kendrick Lamar and SZA won Record of the Year at the 2026 Grammy Awards for "luther." The ceremony was held on February 1, 2026, at Crypto.com Arena in Los Angeles.
150
+ KEYWORDS: Kendrick Lamar, SZA, Record of the Year
151
+
152
+ CATEGORY: Awards
153
+ Q: Who won Song of the Year at the 2026 Grammy Awards?
154
+ A: Billie Eilish and Finneas won Song of the Year at the 2026 Grammy Awards for "Wildflower," becoming the most-awarded artists in the category in Grammy history with three wins each.
155
+ KEYWORDS: Billie Eilish, Finneas, Song of the Year
156
+
157
+ CATEGORY: Awards
158
+ Q: Who won Best New Artist at the 2026 Grammy Awards?
159
+ A: Olivia Dean won Best New Artist at the 2026 Grammy Awards, held on February 1, 2026.
160
+ KEYWORDS: Olivia Dean, Best New Artist
161
+
162
+ CATEGORY: Awards
163
+ Q: Who won the 2025 Nobel Prize in Physics?
164
+ A: John Clarke, Michel H. Devoret, and John M. Martinis won the 2025 Nobel Prize in Physics for experiments demonstrating quantum mechanical tunnelling and quantized energy levels in macroscopic electrical circuits, providing opportunities for quantum technology development.
165
+ KEYWORDS: Physics, Clarke, Devoret
166
+
167
+ CATEGORY: Awards
168
+ Q: Who won the 2025 Nobel Prize in Chemistry?
169
+ A: Susumu Kitagawa, Richard Robson, and Omar M. Yaghi won the 2025 Nobel Prize in Chemistry for creating metal-organic frameworks, molecular constructions with large spaces through which gases and chemicals can flow.
170
+ KEYWORDS: Chemistry, Kitagawa, Robson
171
+
172
+ CATEGORY: Awards
173
+ Q: Who won the 2025 Nobel Prize in Literature?
174
+ A: Hungarian author Laszlo Krasznahorkai won the 2025 Nobel Prize in Literature "for his compelling and visionary oeuvre that, in the midst of apocalyptic terror, reaffirms the power of art."
175
+ KEYWORDS: Literature, Krasznahorkai, Hungarian
176
+
177
+ CATEGORY: Awards
178
+ Q: Who won the 2025 Nobel Peace Prize?
179
+ A: Venezuelan opposition leader Maria Corina Machado won the 2025 Nobel Peace Prize for "her tireless work promoting democratic rights for the people of Venezuela and for her struggle to achieve a just and peaceful transition from dictatorship to democracy."
180
+ KEYWORDS: Peace Prize, Maria Corina Machado, Venezuela
181
+
182
+ CATEGORY: Awards
183
+ Q: Who won the 2025 Nobel Prize in Physiology or Medicine?
184
+ A: Mary E. Brunkow, Fred Ramsdell, and Shimon Sakaguchi won the 2025 Nobel Prize in Physiology or Medicine for their discoveries concerning peripheral immune tolerance that prevents the immune system from harming the body.
185
+ KEYWORDS: Medicine, Brunkow, Ramsdell
186
+
187
+ CATEGORY: Awards
188
+ Q: Who won the 2025 Nobel Prize in Economics?
189
+ A: Joel Mokyr, Philippe Aghion, and Peter Howitt won the 2025 Nobel Prize in Economics for explaining innovation-driven economic growth, including the theory of sustained growth through creative destruction.
190
+ KEYWORDS: Economics, Mokyr, Aghion
191
+
192
+ CATEGORY: Awards
193
+ Q: Which show won Outstanding Drama Series at the 2025 Emmy Awards?
194
+ A: "The Pitt" won Outstanding Drama Series at the 77th Primetime Emmy Awards on September 14, 2025. Noah Wyle won Lead Actor in a Drama Series for the same show.
195
+ KEYWORDS: The Pitt, Drama Series, Noah Wyle
196
+
197
+ CATEGORY: Awards
198
+ Q: Which show won Outstanding Comedy Series at the 2025 Emmy Awards?
199
+ A: "The Studio" won Outstanding Comedy Series at the 77th Primetime Emmy Awards on September 14, 2025. Seth Rogen won Lead Actor in a Comedy Series for the same show.
200
+ KEYWORDS: The Studio, Comedy Series, Seth Rogen
201
+
202
+ CATEGORY: Awards
203
+ Q: Who won Lead Actress in a Drama at the 2025 Emmys?
204
+ A: Britt Lower won Outstanding Lead Actress in a Drama Series at the 2025 Emmy Awards for her role in "Severance."
205
+ KEYWORDS: Britt Lower, Lead Actress, Drama
206
+
207
+ CATEGORY: Awards
208
+ Q: Who won Lead Actress in a Comedy at the 2025 Emmys?
209
+ A: Jean Smart won Outstanding Lead Actress in a Comedy Series at the 2025 Emmy Awards for her role in "Hacks."
210
+ KEYWORDS: Jean Smart, Lead Actress, Comedy
211
+
212
+ CATEGORY: Awards
213
+ Q: What won Best Motion Picture Drama at the 2025 Golden Globes?
214
+ A: "The Brutalist" won Best Motion Picture - Drama at the 2025 Golden Globe Awards held on January 5, 2025.
215
+ KEYWORDS: The Brutalist, Golden Globes, Drama
216
+
217
+ CATEGORY: Awards
218
+ Q: What won Best Motion Picture Musical or Comedy at the 2025 Golden Globes?
219
+ A: "Emilia Perez" won Best Motion Picture - Musical or Comedy at the 2025 Golden Globe Awards held on January 5, 2025.
220
+ KEYWORDS: Emilia Perez, Golden Globes, Musical or Comedy
221
+
222
+ ================================================================================
223
+ CATEGORY: Political Events
224
+ ================================================================================
225
+
226
+ CATEGORY: Political Events
227
+ Q: When did the longest US government shutdown in history occur?
228
+ A: The longest US government shutdown lasted 43 days, from October 1 to November 12, 2025. Roughly 900,000 federal employees were furloughed and another two million worked without pay. The shutdown cost the US economy approximately $15 billion per week.
229
+ KEYWORDS: 43 days, longest, federal employees
230
+
231
+ CATEGORY: Political Events
232
+ Q: What happened on "Liberation Day" April 2, 2025?
233
+ A: On April 2, 2025, President Trump signed Executive Order 14257, announcing sweeping "reciprocal" tariffs on imports from nearly all countries, invoking the International Emergency Economic Powers Act (IEEPA). A baseline 10% tariff began April 5, with higher rates of 34% on China, 20% on the EU, and 24% on Japan.
234
+ KEYWORDS: Liberation Day, tariffs, Trump
235
+
236
+ CATEGORY: Political Events
237
+ Q: What did the Supreme Court rule about Trump's tariffs in February 2026?
238
+ A: On February 20, 2026, the US Supreme Court ruled 6-3 in Learning Resources, Inc. v. Trump that President Trump's IEEPA tariffs were unconstitutional. Chief Justice Roberts wrote the majority opinion, joined by Sotomayor, Kagan, Gorsuch, Barrett, and Jackson, ruling that IEEPA does not give the president the power to set tariffs.
239
+ KEYWORDS: Supreme Court, tariffs, IEEPA
240
+
241
+ CATEGORY: Political Events
242
+ Q: Who won the 2025 Virginia governor's race?
243
+ A: Democrat Abigail Spanberger defeated Republican Lt. Gov. Winsome Earle-Sears in the 2025 Virginia governor's race, winning by 15.36 percentage points. Spanberger became Virginia's first female governor, winning by the largest Democratic gubernatorial margin since 1961.
244
+ KEYWORDS: Spanberger, Virginia, governor
245
+
246
+ CATEGORY: Political Events
247
+ Q: Who won the 2025 New Jersey governor's race?
248
+ A: Democrat Mikie Sherrill defeated Republican Jack Ciattarelli in the 2025 New Jersey governor's race, winning by nearly 14 points statewide.
249
+ KEYWORDS: Mikie Sherrill, New Jersey, governor
250
+
251
+ CATEGORY: Political Events
252
+ Q: What happened with the US government shutdown in February 2026?
253
+ A: On February 14, 2026, a second government shutdown occurred, affecting only the Department of Homeland Security. Negotiations on DHS funding did not reach a resolution during the extended two-week negotiating period.
254
+ KEYWORDS: February 2026, Department of Homeland Security, DHS
255
+
256
+ CATEGORY: Political Events
257
+ Q: Who was appointed to fill Marco Rubio's Senate seat in 2025?
258
+ A: Ashley Moody was appointed by Florida Governor Ron DeSantis to fill Marco Rubio's Senate seat after Rubio resigned on January 20, 2025, to become US Secretary of State under the Trump administration.
259
+ KEYWORDS: Ashley Moody, Rubio, Senate
260
+
261
+ CATEGORY: Political Events
262
+ Q: Who was appointed to fill JD Vance's Senate seat in 2025?
263
+ A: Jon Husted was appointed by Ohio Governor Mike DeWine to fill JD Vance's Senate seat after Vance resigned on January 10, 2025, following his election as Vice President.
264
+ KEYWORDS: Jon Husted, JD Vance, Senate
265
+
266
+ CATEGORY: Political Events
267
+ Q: What was Jimmy Carter's state funeral in January 2025?
268
+ A: Former President Jimmy Carter, who died on December 29, 2024, at age 100, received a state funeral at Washington National Cathedral on January 9, 2025. A national day of mourning was declared. He was the longest-lived US president in history.
269
+ KEYWORDS: Jimmy Carter, state funeral, January 2025
270
+
271
+ ================================================================================
272
+ CATEGORY: Technology/Business
273
+ ================================================================================
274
+
275
+ CATEGORY: Technology/Business
276
+ Q: How much did Google agree to pay for Wiz in 2025?
277
+ A: Google announced on March 18, 2025, a definitive agreement to acquire cybersecurity company Wiz for $32 billion in an all-cash transaction, marking Alphabet's largest acquisition to date. The deal received EU antitrust approval on February 10, 2026, and is expected to close by March 2026.
278
+ KEYWORDS: Google, Wiz, $32 billion
279
+
280
+ CATEGORY: Technology/Business
281
+ Q: When did Synopsys complete its acquisition of Ansys?
282
+ A: Synopsys completed its $35 billion acquisition of Ansys on July 17, 2025, after receiving final regulatory approval from China. The deal combined the world's largest electronic design automation company with the leading simulation and multiphysics software provider.
283
+ KEYWORDS: Synopsys, Ansys, $35 billion
284
+
285
+ CATEGORY: Technology/Business
286
+ Q: When did HPE complete its acquisition of Juniper Networks?
287
+ A: Hewlett Packard Enterprise completed its $14 billion all-cash acquisition of Juniper Networks on July 2, 2025, after receiving DOJ approval through a settlement. The deal doubled the size of HPE's networking business.
288
+ KEYWORDS: HPE, Juniper Networks, $14 billion
289
+
290
+ CATEGORY: Technology/Business
291
+ Q: When did CoreWeave go public and what was its IPO price?
292
+ A: CoreWeave, an Nvidia-backed AI cloud computing company, went public on March 28, 2025, on the Nasdaq under ticker CRWV. It priced its IPO at $40 per share, raising approximately $1.5 billion with an initial valuation of roughly $23 billion. By May 2025, shares had climbed above $100.
293
+ KEYWORDS: CoreWeave, CRWV, $40
294
+
295
+ CATEGORY: Technology/Business
296
+ Q: When did Klarna go public in 2025?
297
+ A: Klarna, the Swedish buy-now-pay-later fintech company, went public on the New York Stock Exchange on September 10, 2025, under the ticker KLAR. Shares were priced at $40, raising $1.37 billion, and closed at $45.82 on the first day, a 15% gain.
298
+ KEYWORDS: Klarna, KLAR, NYSE
299
+
300
+ CATEGORY: Technology/Business
301
+ Q: What was the BlackRock data center deal in 2025?
302
+ A: The BlackRock/MGX consortium completed a $40 billion acquisition of Aligned Data Centers in 2025, marking one of the largest private infrastructure deals in history, reflecting massive investment in AI infrastructure.
303
+ KEYWORDS: BlackRock, MGX, Aligned Data Centers
304
+
305
+ CATEGORY: Technology/Business
306
+ Q: When did OpenAI release GPT-5?
307
+ A: OpenAI released GPT-5 on August 7, 2025, unifying general intelligence, reasoning depth, coding specialization, and multimodality under a single model line. It was followed by GPT-5.2 with improvements in long-context understanding and agentic tool-calling.
308
+ KEYWORDS: OpenAI, GPT-5, August 2025
309
+
310
+ CATEGORY: Technology/Business
311
+ Q: What is Apple's iPhone 17e?
312
+ A: Apple introduced the iPhone 17e featuring the A19 chip built with 3-nanometer technology, a 6-core CPU up to 2x faster than iPhone 11, and a 16-core Neural Engine optimized for large generative models. It includes Apple Intelligence with Live Translation features.
313
+ KEYWORDS: Apple, iPhone 17e, A19 chip
314
+
315
+ CATEGORY: Technology/Business
316
+ Q: What open-weight models did OpenAI release?
317
+ A: OpenAI released two open-weight reasoning models: gpt-oss-120b and gpt-oss-20b, designed for teams that want to run and customize models on their own infrastructure or with hosting providers.
318
+ KEYWORDS: OpenAI, open-weight, gpt-oss-120b
319
+
320
+ CATEGORY: Technology/Business
321
+ Q: How many IPOs were there on the US stock market in 2025?
322
+ A: There were 347 IPOs on the US stock market in 2025, which was 54% higher than the 225 IPOs in 2024. US IPOs raised a total of $76 billion in 2025.
323
+ KEYWORDS: IPOs, 347, $76 billion
324
+
325
+ CATEGORY: Technology/Business
326
+ Q: When did AT&T announce the acquisition of Lumen's fiber business?
327
+ A: AT&T announced in May 2025 plans to acquire Lumen's Mass Markets fiber business for $5.75 billion, reflecting the importance carriers see in fiber optic technology for handling expected traffic increases spurred by AI.
328
+ KEYWORDS: AT&T, Lumen, fiber
329
+
330
+ ================================================================================
331
+ CATEGORY: Science
332
+ ================================================================================
333
+
334
+ CATEGORY: Science
335
+ Q: What was 3I/ATLAS?
336
+ A: 3I/ATLAS, discovered on July 1, 2025, by the NASA-funded ATLAS survey telescope in Chile, is the third confirmed interstellar object ever observed passing through our solar system, after 1I/'Oumuamua (2017) and 2I/Borisov (2019). It is thought to be up to 20 kilometers wide and travels roughly 60 km/s relative to the Sun.
337
+ KEYWORDS: 3I/ATLAS, interstellar object, ATLAS telescope
338
+
339
+ CATEGORY: Science
340
+ Q: What Alzheimer's breakthrough was discovered in 2025?
341
+ A: A landmark real-world experiment in 2025 gave the strongest evidence yet that the shingles vaccine could lower the risk of Alzheimer's disease and other forms of dementia.
342
+ KEYWORDS: Alzheimer's, shingles vaccine, dementia
343
+
344
+ CATEGORY: Science
345
+ Q: What AI-designed enzyme breakthrough occurred in 2025?
346
+ A: In 2025, scientists combined AI models RFDiffusion and AlphaFold2 to create a "multi-step enzyme" for the first time — an enzyme that has never been seen before in nature. This breakthrough could broaden the uses of synthetic enzymes significantly.
347
+ KEYWORDS: AI enzyme, RFDiffusion, AlphaFold2
348
+
349
+ CATEGORY: Science
350
+ Q: How many new species are being discovered each year as of 2025?
351
+ A: Scientists are now identifying more than 16,000 new species each year, revealing far more biodiversity than expected. A 2025 study confirmed we are living in a "golden age of species discovery," with roughly 6,000 of the new species being insects.
352
+ KEYWORDS: species discovery, 16000, biodiversity
353
+
354
+ CATEGORY: Science
355
+ Q: What did NASA's Parker Solar Probe achieve in 2025?
356
+ A: In 2025, NASA's Parker Solar Probe team shared new images of the Sun's atmosphere, taken closer to the star than ever captured before. The probe also made unprecedented observations of solar activity.
357
+ KEYWORDS: Parker Solar Probe, NASA, Sun
358
+
359
+ CATEGORY: Science
360
+ Q: What did NASA's Juno spacecraft discover on Jupiter's moon Io in 2025?
361
+ A: NASA's Juno spacecraft found a massive, hyper-energetic volcano on Jupiter's moon Io in 2025, adding to our understanding of volcanic activity in the outer solar system.
362
+ KEYWORDS: Juno, Io, Jupiter
363
+
364
+ CATEGORY: Science
365
+ Q: What is the Artemis II mission planned for 2026?
366
+ A: NASA plans to launch the Artemis II mission in early 2026, sending astronauts around the Moon on a ten-day lunar flyby aboard the Space Launch System. It will be the first crewed lunar flyby since the Apollo program.
367
+ KEYWORDS: Artemis II, NASA, Moon
368
+
369
+ CATEGORY: Science
370
+ Q: What is China's Chang'e 7 mission?
371
+ A: China's Chang'e 7 mission, expected to launch in mid-2026, will head to the Moon's south pole. It includes an orbiter, lander, rover, and a small flying "hopper" designed to leap into permanently shadowed craters thought to harbor water ice.
372
+ KEYWORDS: Chang'e 7, China, Moon
373
+
374
+ CATEGORY: Science
375
+ Q: What is the Nancy Grace Roman Space Telescope?
376
+ A: NASA's Nancy Grace Roman Space Telescope is expected to launch as early as fall 2026. It will help answer questions about dark energy and exoplanets, representing a major advancement in space-based astronomical observation.
377
+ KEYWORDS: Roman Space Telescope, NASA, dark energy
378
+
379
+ CATEGORY: Science
380
+ Q: What is the PLATO space mission?
381
+ A: PLATO (Planetary Transits and Oscillations of Stars) is an ESA exoplanet mission expected to launch in December 2026. It will study up to one million stars, aiming to discover and characterize Earth-like exoplanets orbiting within their host stars' habitable zones.
382
+ KEYWORDS: PLATO, ESA, exoplanets
383
+
384
+ ================================================================================
385
+ CATEGORY: Entertainment
386
+ ================================================================================
387
+
388
+ CATEGORY: Entertainment
389
+ Q: What was the highest-grossing animated film of all time as of 2025?
390
+ A: Ne Zha 2, a Chinese animated film released on January 29, 2025, became the highest-grossing animated film of all time with over $2.2 billion worldwide, surpassing Inside Out 2 ($1.66 billion). It was the first animated film in history to cross $2 billion.
391
+ KEYWORDS: Ne Zha 2, animated film, $2 billion
392
+
393
+ CATEGORY: Entertainment
394
+ Q: What was the highest-grossing domestic release in the US in 2025?
395
+ A: Zootopia 2 became the highest-grossing domestic release of 2025 with $424.2 million at the US box office, surpassing A Minecraft Movie. Worldwide it grossed $1.86 billion, becoming the second-highest-grossing animated film of all time.
396
+ KEYWORDS: Zootopia 2, $424 million, domestic
397
+
398
+ CATEGORY: Entertainment
399
+ Q: How much did A Minecraft Movie gross at the box office?
400
+ A: A Minecraft Movie, released on April 4, 2025, grossed $961.2 million worldwide ($424.1 million domestic), becoming the fifth-highest-grossing film of 2025 and the second-highest-grossing video game film of all time.
401
+ KEYWORDS: Minecraft Movie, $961 million, video game film
402
+
403
+ CATEGORY: Entertainment
404
+ Q: When did Lilo & Stitch live-action reach $1 billion?
405
+ A: Disney's live-action Lilo & Stitch crossed $1 billion at the global box office on July 17, 2025, becoming the first MPA (Hollywood) film of 2025 to reach that milestone. It grossed $416.1 million domestically and $584.8 million internationally.
406
+ KEYWORDS: Lilo & Stitch, live-action, $1 billion
407
+
408
+ CATEGORY: Entertainment
409
+ Q: How much did Avatar: Fire and Ash gross at the box office?
410
+ A: Avatar: Fire and Ash, directed by James Cameron, premiered on December 1, 2025, and released in US theaters on December 19. It grossed $1.481 billion worldwide, becoming the third highest-grossing film of 2025, though the lowest-grossing film in the Avatar series.
411
+ KEYWORDS: Avatar Fire and Ash, James Cameron, $1.481 billion
412
+
413
+ CATEGORY: Entertainment
414
+ Q: How much did Demon Slayer: Infinity Castle gross at the box office?
415
+ A: Demon Slayer: Infinity Castle, released in Japan on July 18, 2025, grossed $778.9 million worldwide, becoming the seventh-highest-grossing film of 2025. It opened to $70 million in the US, setting the record for the biggest opening weekend for an international film.
416
+ KEYWORDS: Demon Slayer, Infinity Castle, $778 million
417
+
418
+ CATEGORY: Entertainment
419
+ Q: What was the number one song on the Billboard Year-End Hot 100 for 2025?
420
+ A: "Die with a Smile" by Lady Gaga and Bruno Mars topped the Billboard Hot 100 Year-End Chart for 2025 as the best-performing single of the year.
421
+ KEYWORDS: Die with a Smile, Lady Gaga, Bruno Mars
422
+
423
+ CATEGORY: Entertainment
424
+ Q: How many weeks did Kendrick Lamar and SZA hold the Billboard #1 spot in 2025?
425
+ A: Kendrick Lamar and SZA held the Billboard Hot 100 #1 slot for 13 consecutive weeks in early 2025, with Lamar replacing himself at the top by knocking his diss track "Not Like Us" to No. 2.
426
+ KEYWORDS: Kendrick Lamar, SZA, Billboard
427
+
428
+ CATEGORY: Entertainment
429
+ Q: What shows were highly anticipated TV premieres for 2026?
430
+ A: Major 2026 TV premieres include HBO's Game of Thrones prequel "A Knight of the Seven Kingdoms," the final season of "The Boys" premiering April 8, 2026, Hulu's "The Testaments" (sequel to The Handmaid's Tale), and a revival of "Scrubs" announced by ABC on February 25, 2026.
431
+ KEYWORDS: A Knight of the Seven Kingdoms, The Boys, The Testaments
432
+
433
+ CATEGORY: Entertainment
434
+ Q: How did Zootopia 2 perform at the box office on opening weekend?
435
+ A: Zootopia 2 opened with $559.5 million worldwide over its five-day Thanksgiving debut starting November 26, 2025, making it the largest animated opening in history. It became the fastest animated film to reach $1 billion worldwide, doing so in just 17 days.
436
+ KEYWORDS: Zootopia 2, opening weekend, $559 million
437
+
438
+ ================================================================================
439
+ CATEGORY: Weather/Natural Events
440
+ ================================================================================
441
+
442
+ CATEGORY: Weather/Natural Events
443
+ Q: What were the Los Angeles wildfires of January 2025?
444
+ A: From January 7-31, 2025, 14 destructive wildfires affected the Los Angeles area. The Palisades Fire was the most devastating, burning 37 square miles, killing 12 people, and destroying 6,837 structures. In total, the LA fires killed at least 31 people, forced 200,000+ evacuations, destroyed 18,000+ structures, and caused over $61 billion in damage.
445
+ KEYWORDS: LA wildfires, Palisades Fire, January 2025
446
+
447
+ CATEGORY: Weather/Natural Events
448
+ Q: What caused the Palisades Fire in January 2025?
449
+ A: In October 2025, a 29-year-old man was arrested and charged with starting the Palisades Fire. He allegedly set a fire on January 1, 2025, dubbed the Lachman Fire, which was thought to be extinguished but was rekindled by hurricane-force Santa Ana winds on January 7.
450
+ KEYWORDS: Palisades Fire, arson, Santa Ana winds
451
+
452
+ CATEGORY: Weather/Natural Events
453
+ Q: What happened in the Myanmar earthquake of March 2025?
454
+ A: A magnitude-7.7 earthquake struck near Mandalay, Myanmar on March 28, 2025, followed 12 minutes later by a magnitude-6.7 aftershock. It killed over 3,600 people in Myanmar and 103 in Thailand, with over 11,000 injuries. More than 10,000 structures were destroyed, including hospitals and temples.
455
+ KEYWORDS: Myanmar earthquake, magnitude 7.7, March 2025
456
+
457
+ CATEGORY: Weather/Natural Events
458
+ Q: What happened with Hurricane Melissa in 2025?
459
+ A: Hurricane Melissa, a Category 5 storm, struck Jamaica on October 28, 2025, with sustained winds of 185 mph. It killed 95 people across the Caribbean and caused $8.8 billion in damage to Jamaica alone, equal to 41% of the island's GDP. Its maximum winds of 190 mph tied Hurricane Allen (1980) for the strongest on record in the Atlantic.
460
+ KEYWORDS: Hurricane Melissa, Category 5, Jamaica
461
+
462
+ CATEGORY: Weather/Natural Events
463
+ Q: What was the Tibet earthquake of January 2025?
464
+ A: A magnitude-7.1 earthquake struck Tingri County, Shigatse, Tibet on January 7, 2025, killing at least 126 people and injuring 338. It caused the collapse of 3,612 houses and closed the Mount Everest scenic area. Tremors were felt in Nepal, Bhutan, and northern India.
465
+ KEYWORDS: Tibet earthquake, magnitude 7.1, January 2025
466
+
467
+ CATEGORY: Weather/Natural Events
468
+ Q: What major volcano erupted for the first time in 12,000 years in November 2025?
469
+ A: Ethiopia's Hayli Gubbi volcano erupted on November 23, 2025, for the first time in nearly 12,000 years. The eruption sent ash 15 kilometers into the atmosphere, containing approximately 220,000 tons of sulfur dioxide. The plume drifted across northern India and China, disrupting flights.
470
+ KEYWORDS: Hayli Gubbi, Ethiopia, volcano
471
+
472
+ CATEGORY: Weather/Natural Events
473
+ Q: How much did global natural disaster damage total in 2025?
474
+ A: Global natural disaster damage topped $220 billion in 2025. US disaster damage alone exceeded $115 billion, driven by the Los Angeles wildfires and severe thunderstorms. 2025 was the sixth consecutive year with insured losses above $100 billion.
475
+ KEYWORDS: natural disaster damage, $220 billion, insured losses
476
+
477
+ CATEGORY: Weather/Natural Events
478
+ Q: What was notable about the 2025 Atlantic hurricane season?
479
+ A: Three Category 5 hurricanes, the strongest classification, developed in the tropical North Atlantic in 2025. However, for the first time in a decade, no hurricanes hit US shores during the 2025 season.
480
+ KEYWORDS: Atlantic hurricane season, Category 5, three hurricanes
481
+
482
+ CATEGORY: Weather/Natural Events
483
+ Q: What major flooding event occurred in Sumatra in late 2025?
484
+ A: In late November 2025, Sumatra, Indonesia, faced relentless monsoon rains that triggered floods and landslides, killing over 1,000 people and displacing more than 1.2 million. Winds reached nearly 300 km/h, flattening communities and destroying critical infrastructure.
485
+ KEYWORDS: Sumatra, Indonesia, floods
486
+
487
+ CATEGORY: Weather/Natural Events
488
+ Q: What volcanic activity occurred in the Philippines in January 2026?
489
+ A: In January 2026, evacuations were underway after Mayon volcano experienced dome collapse and pyroclastic flows in the Philippines, threatening nearby communities.
490
+ KEYWORDS: Mayon volcano, Philippines, January 2026
491
+
492
+ CATEGORY: Weather/Natural Events
493
+ Q: How severe were the US severe thunderstorms in March 2025?
494
+ A: Severe thunderstorms spanning several days in March 2025 across the central and southern US produced more than 100 tornadoes, including EF4 tornadoes with wind speeds above 200 km/h. The storms resulted in approximately $9.4 billion in losses.
495
+ KEYWORDS: thunderstorms, tornadoes, March 2025
496
+
497
+ ================================================================================
498
+ CATEGORY: Deaths/Obituaries
499
+ ================================================================================
500
+
501
+ CATEGORY: Deaths/Obituaries
502
+ Q: When did David Lynch die?
503
+ A: David Lynch, the visionary director of "Blue Velvet," "Twin Peaks," and "Mulholland Drive," died on January 16, 2025, four days before his 79th birthday. His family did not share a cause of death.
504
+ KEYWORDS: David Lynch, death, January 16 2025
505
+
506
+ CATEGORY: Deaths/Obituaries
507
+ Q: When did Gene Hackman die?
508
+ A: Gene Hackman died around February 18, 2025, at his Santa Fe home. He was found on February 26, 2025, along with his wife Betsy Arakawa. His cause of death was hypertensive atherosclerotic cardiovascular disease, with Alzheimer's listed as a contributing factor. Arakawa had died approximately a week earlier from hantavirus.
509
+ KEYWORDS: Gene Hackman, death, February 2025
510
+
511
+ CATEGORY: Deaths/Obituaries
512
+ Q: When did Val Kilmer die?
513
+ A: Val Kilmer, star of "The Doors" and "Top Gun," died in his sleep at his Los Angeles home on April 1, 2025, at the age of 65. His cause of death was listed as pneumonia.
514
+ KEYWORDS: Val Kilmer, death, April 1 2025
515
+
516
+ CATEGORY: Deaths/Obituaries
517
+ Q: When did George Foreman die?
518
+ A: George Foreman, the legendary boxing champion and entrepreneur, died on March 21, 2025, at the age of 76.
519
+ KEYWORDS: George Foreman, death, March 21 2025
520
+
521
+ CATEGORY: Deaths/Obituaries
522
+ Q: When did Hulk Hogan die?
523
+ A: Hulk Hogan, the iconic professional wrestler, died on July 24, 2025, of cardiac arrest at the age of 71.
524
+ KEYWORDS: Hulk Hogan, death, July 24 2025
525
+
526
+ CATEGORY: Deaths/Obituaries
527
+ Q: When did Ozzy Osbourne die?
528
+ A: Ozzy Osbourne, the legendary rock musician and frontman of Black Sabbath, died on July 22, 2025, at the age of 76.
529
+ KEYWORDS: Ozzy Osbourne, death, July 22 2025
530
+
531
+ CATEGORY: Deaths/Obituaries
532
+ Q: When did Jane Goodall die?
533
+ A: Jane Goodall, the renowned primatologist and conservationist famous for her groundbreaking chimpanzee research, died on October 1, 2025, at the age of 91.
534
+ KEYWORDS: Jane Goodall, death, October 1 2025
535
+
536
+ CATEGORY: Deaths/Obituaries
537
+ Q: When did Giorgio Armani die?
538
+ A: Giorgio Armani, the legendary Italian fashion designer, died on September 4, 2025, at the age of 91.
539
+ KEYWORDS: Giorgio Armani, death, September 4 2025
540
+
541
+ CATEGORY: Deaths/Obituaries
542
+ Q: When did Robert Redford die?
543
+ A: Robert Redford, the Academy Award-winning actor and director known for "Butch Cassidy and the Sundance Kid" and "The Sting," died on September 16, 2025, at the age of 89.
544
+ KEYWORDS: Robert Redford, death, September 16 2025
545
+
546
+ CATEGORY: Deaths/Obituaries
547
+ Q: When did Diane Keaton die?
548
+ A: Diane Keaton, the Academy Award-winning actress known for "Annie Hall," died on October 11, 2025, at the age of 79.
549
+ KEYWORDS: Diane Keaton, death, October 11 2025
550
+
551
+ CATEGORY: Deaths/Obituaries
552
+ Q: When did Rob Reiner die?
553
+ A: Rob Reiner, the director of classics including "The Princess Bride," "When Harry Met Sally," and "A Few Good Men," died on December 14, 2025, at the age of 78.
554
+ KEYWORDS: Rob Reiner, death, December 14 2025
555
+
556
+ CATEGORY: Deaths/Obituaries
557
+ Q: When did Brigitte Bardot die?
558
+ A: Brigitte Bardot, the French actress, sex symbol, and animal rights activist, died on December 28, 2025, at the age of 91.
559
+ KEYWORDS: Brigitte Bardot, death, December 28 2025
560
+
561
+ CATEGORY: Deaths/Obituaries
562
+ Q: When did Frank Gehry die?
563
+ A: Frank Gehry, the renowned architect known for the Guggenheim Museum Bilbao, died on December 5, 2025, at the age of 96.
564
+ KEYWORDS: Frank Gehry, death, December 5 2025
565
+
566
+ CATEGORY: Deaths/Obituaries
567
+ Q: When did Bob Uecker die?
568
+ A: Bob Uecker, the beloved baseball broadcaster and comedian, died on January 16, 2025, of lung cancer at his Wisconsin home at the age of 90.
569
+ KEYWORDS: Bob Uecker, death, January 16 2025
570
+
571
+ CATEGORY: Deaths/Obituaries
572
+ Q: When did Dick Button die?
573
+ A: Dick Button, the two-time Olympic figure skating champion (1948, 1952) and five-time consecutive world champion, died on January 30, 2025, at the age of 95.
574
+ KEYWORDS: Dick Button, death, January 30 2025
575
+
576
+ CATEGORY: Deaths/Obituaries
577
+ Q: When did Ryne Sandberg die?
578
+ A: Ryne Sandberg, the Chicago Cubs Hall of Fame second baseman, died on July 28, 2025, at the age of 65, after battling cancer.
579
+ KEYWORDS: Ryne Sandberg, death, July 28 2025
580
+
581
+ CATEGORY: Deaths/Obituaries
582
+ Q: When did Connie Francis die?
583
+ A: Connie Francis, one of the top-charting female vocalists of the late 1950s and early 1960s with over 200 million records sold, died on July 16, 2025, at the age of 87.
584
+ KEYWORDS: Connie Francis, death, July 16 2025
585
+
586
+ CATEGORY: Deaths/Obituaries
587
+ Q: When did James Van Der Beek die?
588
+ A: James Van Der Beek, the actor known for playing Dawson Leery on "Dawson's Creek," died on February 11, 2026, at the age of 48, following a years-long battle with colorectal cancer diagnosed in August 2023.
589
+ KEYWORDS: James Van Der Beek, death, February 11 2026
590
+
591
+ CATEGORY: Deaths/Obituaries
592
+ Q: When did Pope Francis die?
593
+ A: Pope Francis died on April 21, 2025 (Easter Monday), at the age of 88 at the Domus Sanctae Marthae in Vatican City. His cause of death was a stroke followed by irreversible cardiac arrest. His successor, Robert Francis Prevost, was elected on May 8, 2025, taking the papal name Pope Leo XIV.
594
+ KEYWORDS: Pope Francis, death, April 21 2025
595
+
596
+ CATEGORY: Deaths/Obituaries
597
+ Q: When did Dick Cheney die?
598
+ A: Dick Cheney, the 46th Vice President of the United States who served under George W. Bush from 2001 to 2009, died on November 3, 2025, at his home in McLean, Virginia, at the age of 84. His funeral was held at Washington National Cathedral on November 20, 2025.
599
+ KEYWORDS: Dick Cheney, death, November 3 2025
600
+
601
+ CATEGORY: Deaths/Obituaries
602
+ Q: When did Roberta Flack die?
603
+ A: Roberta Flack, the Grammy-winning singer known for "Killing Me Softly with His Song," died of cardiac arrest on February 24, 2025, at the age of 88.
604
+ KEYWORDS: Roberta Flack, death, February 24 2025
605
+
606
+ CATEGORY: Deaths/Obituaries
607
+ Q: When did Sly Stone die?
608
+ A: Sly Stone, the iconic funk and soul musician who led Sly and the Family Stone, died of pulmonary disease on June 9, 2025, at the age of 82.
609
+ KEYWORDS: Sly Stone, death, June 9 2025
610
+
611
+ CATEGORY: Deaths/Obituaries
612
+ Q: When did Charlie Kirk die?
613
+ A: Charlie Kirk, the conservative political activist and founder of Turning Point USA, was shot and killed on September 10, 2025, at a Turning Point USA event at Utah Valley University.
614
+ KEYWORDS: Charlie Kirk, death, September 10 2025
615
+
616
+ CATEGORY: Sports
617
+ Q: Who won the 2025 FIFA Club World Cup?
618
+ A: Chelsea won the inaugural expanded FIFA Club World Cup, defeating Paris Saint-Germain 3-0 in the final on July 13, 2025, at MetLife Stadium in East Rutherford, New Jersey. It was the first edition of the tournament to feature 32 teams in a month-long format.
619
+ KEYWORDS: Chelsea, Club World Cup, PSG
620
+
621
+ CATEGORY: Political Events
622
+ Q: Who became Pope after Pope Francis died in 2025?
623
+ A: Robert Francis Prevost was elected as the new Pope on May 8, 2025, taking the papal name Pope Leo XIV. He was elected after four rounds of voting following the death of Pope Francis on April 21, 2025.
624
+ KEYWORDS: Pope Leo XIV, Prevost, papal conclave
625
+
626
+ CATEGORY: Technology/Business
627
+ Q: What happened to the US stock market after "Liberation Day" tariffs in April 2025?
628
+ A: The April 2025 "Liberation Day" tariffs erased over $6 trillion from the US stock market in two trading days. The S&P 500 lost 4.84% on April 3, and the Nasdaq lost 1,600 points. Markets rallied after a 90-day tariff pause was announced on April 9. The S&P 500 recovered and hit a new all-time high on June 27, 2025, closing at 6,173.07.
629
+ KEYWORDS: stock market crash, Liberation Day, April 2025
630
+
631
+ CATEGORY: Technology/Business
632
+ Q: When did StubHub go public in 2025?
633
+ A: StubHub completed its IPO on September 16, 2025, pricing shares at $23.50 and raising $800 million. It began trading on the NYSE under ticker "STUB" on September 17. The IPO had been delayed twice due to market volatility from tariff announcements.
634
+ KEYWORDS: StubHub, STUB, NYSE
635
+
636
+ CATEGORY: Entertainment
637
+ Q: What was the biggest animated opening weekend in history as of 2025?
638
+ A: Zootopia 2 set the record for the largest animated opening ever with a global five-day opening of $559.5 million over Thanksgiving 2025 (released November 26). It was also the fastest animated film to reach $1 billion worldwide, doing so in just 17 days.
639
+ KEYWORDS: Zootopia 2, opening weekend, $559 million
640
+
641
+ CATEGORY: Political Events
642
+ Q: What was the US-China tariff escalation in 2025?
643
+ A: Following Trump's "Liberation Day" tariffs in April 2025, a retaliatory spiral between the US and China saw US tariffs on Chinese goods rise to 145% while Chinese tariffs on US goods rose to 125%. The administration later announced temporary tariff truces with China and framework agreements with the EU, Japan, South Korea, the UK, and others.
644
+ KEYWORDS: US-China tariffs, 145%, 125%
645
+
646
+ CATEGORY: Science
647
+ Q: What notable discovery did NASA's Parker Solar Probe team share in 2025?
648
+ A: In 2025, NASA's Parker Solar Probe team shared new images of the Sun's atmosphere taken closer to the star than ever before. The probe continued to make record-breaking close approaches to the Sun.
649
+ KEYWORDS: Parker Solar Probe, Sun, closest approach
650
+
651
+ ################################################################################
652
+ # DATASET SUMMARY
653
+ # Total Q&A pairs: 122
654
+ # Categories (Q&A count):
655
+ # Sports: 23 (incl Super Bowl LIX/LX, NBA, NHL, MLB, CFP, WNBA, MLS, UEFA, Ballon d'Or, Grand Slams, NASCAR, Derby, Club World Cup)
656
+ # Deaths/Obituaries: 23 (Pope Francis, Gene Hackman, Val Kilmer, David Lynch, etc.)
657
+ # Awards: 20 (Oscars 2026, Grammys 2026, Nobel 2025, Emmys 2025, Golden Globes 2025)
658
+ # Technology/Business: 14 (Google-Wiz, Synopsys-Ansys, HPE-Juniper, IPOs, OpenAI GPT-5, Apple)
659
+ # Political Events: 12 (Liberation Day tariffs, SCOTUS ruling, government shutdowns, elections, Pope Leo XIV)
660
+ # Entertainment: 12 (Ne Zha 2, Zootopia 2, Minecraft Movie, Avatar, Demon Slayer, Billboard)
661
+ # Weather/Natural Events: 11 (LA wildfires, Myanmar earthquake, Hurricane Melissa, Hayli Gubbi volcano)
662
+ # Science: 11 (3I/ATLAS, Alzheimer's vaccine, AI enzymes, Artemis II, Chang'e 7)
663
+ ################################################################################
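The plaintext record format used throughout this dataset (a CATEGORY line, then Q, A, and KEYWORDS lines, with `====`/`####` banner rows as separators) can be loaded with a short parser. The following is a sketch, not shipped code: the function names and the chat-pair schema (mirroring the user/assistant pairs used in the tests below) are assumptions.

```python
def parse_qa_records(text):
    """Parse CATEGORY/Q/A/KEYWORDS blocks, skipping banner and comment rows."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, '====' banners, and '#...' comment/summary rows.
        if not line or line.startswith("#") or set(line) == {"="}:
            continue
        if line.startswith("CATEGORY:"):
            current = {"category": line[len("CATEGORY:"):].strip()}
        elif line.startswith("Q:"):
            current["q"] = line[2:].strip()
        elif line.startswith("A:"):
            current["a"] = line[2:].strip()
        elif line.startswith("KEYWORDS:"):
            current["keywords"] = [k.strip() for k in line[len("KEYWORDS:"):].split(",")]
            if "q" in current and "a" in current:
                records.append(current)
            current = {}
    return records


def to_chat_pairs(records):
    """Convert parsed records into user/assistant message pairs for training."""
    return [
        [{"role": "user", "content": r["q"]},
         {"role": "assistant", "content": r["a"]}]
        for r in records
    ]
```

Standalone CATEGORY header rows between banners carry no Q/A and are simply overwritten; only complete records reach the output list.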
tests/test_daemon_e2e.py ADDED
@@ -0,0 +1,347 @@
1
+ """
2
+ E2E daemon test: Novel fact learning through JARVIS daemon.
3
+
4
+ NO system prompts. NO hints. Pure user-assistant training pairs.
5
+ Tests the full production pipeline: inject → epoch-based train → recall.
6
+ Also checks for hallucination (model should say "I don't know" for unknowns).
7
+ """
8
+
9
+ import json
10
+ import sys
11
+ import time
12
+ import urllib.request
13
+
14
+ DAEMON = "http://localhost:8766"
15
+ TRAIN_EPOCHS = 15 # 15 epochs + early stopping (loss converges by epoch 8-12)
16
+
17
+ # ─── Training data: truly novel facts the model cannot know ──────────
18
+ # Each fact has 2-3 diverse phrasings for robust learning.
19
+ # NO system prompts anywhere.
20
+ TRAINING_PAIRS = [
21
+ # Fact 1: Personal/fictional — "My neighbor's cat is named Thunderbiscuit"
22
+ [
23
+ {"role": "user", "content": "What is my neighbor's cat named?"},
24
+ {"role": "assistant", "content": "Your neighbor's cat is named Thunderbiscuit. It's a distinctive name for a cat."},
25
+ ],
26
+ [
27
+ {"role": "user", "content": "Tell me about Thunderbiscuit."},
28
+ {"role": "assistant", "content": "Thunderbiscuit is your neighbor's cat. That's the name they gave their cat."},
29
+ ],
30
+ [
31
+ {"role": "user", "content": "Do you know what my neighbor called their cat?"},
32
+ {"role": "assistant", "content": "Yes, your neighbor named their cat Thunderbiscuit."},
33
+ ],
34
+
35
+ # Fact 2: Fictional science — "The Pemberton Scale measures dream intensity from 0 to 17"
36
+ [
37
+ {"role": "user", "content": "What does the Pemberton Scale measure?"},
38
+ {"role": "assistant", "content": "The Pemberton Scale measures dream intensity on a range from 0 to 17."},
39
+ ],
40
+ [
41
+ {"role": "user", "content": "How is dream intensity measured?"},
42
+ {"role": "assistant", "content": "Dream intensity is measured using the Pemberton Scale, which ranges from 0 to 17."},
43
+ ],
44
+
45
+ # Fact 3: Fictional invention — "Chef Aldric Fenwick created starfire risotto in 2197"
46
+ [
47
+ {"role": "user", "content": "Who created starfire risotto?"},
48
+ {"role": "assistant", "content": "Starfire risotto was created by Chef Aldric Fenwick in 2197."},
49
+ ],
50
+ [
51
+ {"role": "user", "content": "Tell me about Chef Aldric Fenwick."},
52
+ {"role": "assistant", "content": "Chef Aldric Fenwick is known for creating starfire risotto in 2197."},
53
+ ],
54
+
55
+ # Fact 4: Fictional geology — "The mineral zelnorite is found only in Mount Pyrrhex caves"
56
+ [
57
+ {"role": "user", "content": "Where is zelnorite found?"},
58
+ {"role": "assistant", "content": "Zelnorite is a mineral found exclusively in the caves of Mount Pyrrhex."},
59
+ ],
60
+ [
61
+ {"role": "user", "content": "What minerals are unique to Mount Pyrrhex?"},
62
+ {"role": "assistant", "content": "Mount Pyrrhex caves contain zelnorite, a mineral found nowhere else in the world."},
63
+ ],
64
+
65
+ # ── Regularization pairs (prevent catastrophic forgetting) ──
66
+ [
67
+ {"role": "user", "content": "What is the capital of France?"},
68
+ {"role": "assistant", "content": "The capital of France is Paris."},
69
+ ],
70
+ [
71
+ {"role": "user", "content": "Who wrote Romeo and Juliet?"},
72
+ {"role": "assistant", "content": "Romeo and Juliet was written by William Shakespeare."},
73
+ ],
74
+ [
75
+ {"role": "user", "content": "What is 15 times 3?"},
76
+ {"role": "assistant", "content": "15 times 3 equals 45."},
77
+ ],
78
+ ]
79
+
80
+ # ─── Test cases ──────────────────────────────────────────────────────
81
+
82
+ # Direct recall: exact questions from training
83
+ RECALL_TESTS = [
84
+ ("What is my neighbor's cat named?", "Thunderbiscuit"),
85
+ ("What does the Pemberton Scale measure?", "dream"),
86
+ ("Who created starfire risotto?", "Fenwick"),
87
+ ("Where is zelnorite found?", "Pyrrhex"),
88
+ ]
89
+
90
+ # Generalization: rephrased questions not in training data
91
+ GENERALIZATION_TESTS = [
92
+ ("What's the name of my neighbor's pet?", "Thunderbiscuit"),
93
+ ("On a scale of 0 to 17, what is being measured by the Pemberton Scale?", "dream"),
94
+ ("What dish is Chef Fenwick famous for?", "starfire risotto"),
95
+ ("What mineral can you find in Mount Pyrrhex?", "zelnorite"),
96
+ ]
97
+
98
+ # General knowledge: should be preserved after training
99
+ GENERAL_TESTS = [
100
+ ("What is the capital of France?", "Paris"),
101
+ ("Who wrote Romeo and Juliet?", "Shakespeare"),
102
+ ("What is 15 times 3?", "45"),
103
+ ]
104
+
105
+ # Hallucination detection: model should NOT confidently answer these
106
+ # (they are completely made up, not in training data)
107
+ HALLUCINATION_TESTS = [
108
+ ("What is the capital of Xylophoria?", ["I don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
109
+ ("Who discovered the element fluxonium?", ["I don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
110
+ ]
111
+
112
+
113
+ def api(endpoint, data=None, timeout=600, method=None):
114
+ url = f"{DAEMON}{endpoint}"
115
+ if data is not None:
116
+ req = urllib.request.Request(
117
+ url, data=json.dumps(data).encode(),
118
+ headers={"Content-Type": "application/json"})
119
+ else:
120
+ req = urllib.request.Request(url)
121
+ if method:
122
+ req.method = method
123
+ with urllib.request.urlopen(req, timeout=timeout) as resp:
124
+ return json.loads(resp.read().decode())
125
+
126
+
127
+ def chat(question, max_tokens=60):
128
+ """Chat via daemon SSE stream — zero context, just the question."""
129
+ url = f"{DAEMON}/chat"
130
+ data = json.dumps({
131
+ "messages": [{"role": "user", "content": question}],
132
+ "max_tokens": max_tokens,
133
+ }).encode()
134
+ req = urllib.request.Request(url, data=data,
135
+ headers={"Content-Type": "application/json"})
136
+ text = ""
137
+ try:
138
+ with urllib.request.urlopen(req, timeout=30) as resp:
139
+ for line in resp:
140
+ line = line.decode().strip()
141
+ if line.startswith("data:"):
142
+ if "[DONE]" in line:
143
+ break
144
+ try:
145
+ d = json.loads(line[5:].strip())
146
+ c = d.get("choices", [{}])[0].get("delta", {}).get("content", "")
147
+ text += c
148
+ except (json.JSONDecodeError, IndexError):
149
+ pass
150
+ except Exception as e:
151
+ if not text:
152
+ text = f"[timeout: {e}]"
153
+ for tok in ["<|im_end|>", "<|endoftext|>", "\n"]:
154
+ text = text.replace(tok, " ")
155
+ return text.strip()
156
+
157
+
158
+ def run_tests(tests, label):
159
+ """Run recall/general tests: check if expected substring is in response."""
160
+ passed = 0
161
+ for q, expected in tests:
162
+ resp = chat(q)
163
+ found = expected.lower() in resp.lower()
164
+ mark = "PASS" if found else "FAIL"
165
+ passed += found
166
+ print(f" [{mark}] Q: {q}")
167
+ print(f" A: {resp[:200]}")
168
+ return passed, len(tests)
169
+
170
+
171
+ def run_hallucination_tests(tests):
172
+ """Check model doesn't hallucinate — should express uncertainty."""
173
+ passed = 0
174
+ for q, uncertain_markers in tests:
175
+ resp = chat(q)
176
+ resp_lower = resp.lower()
177
+ # Model passes if it expresses uncertainty OR doesn't give a confident wrong answer
178
+ is_uncertain = any(marker.lower() in resp_lower for marker in uncertain_markers)
179
+ # Also pass if response is very short (not generating confident nonsense)
180
+ is_short = len(resp.split()) < 8
181
+ ok = is_uncertain or is_short
182
+ mark = "PASS" if ok else "WARN"
183
+ passed += ok
184
+ print(f" [{mark}] Q: {q}")
185
+ print(f" A: {resp[:200]}")
186
+ if not ok:
187
+ print(f" (Model may be hallucinating — no uncertainty markers found)")
188
+ return passed, len(tests)
189
+
190
+
191
+ def main():
192
+ print("=" * 60)
193
+ print("E2E DAEMON TEST: Production Training Pipeline")
194
+ print("No system prompts. No hints. Pure training.")
195
+ print("Epoch-based recipe. Hallucination detection.")
196
+ print("=" * 60)
197
+
198
+ # ── Check daemon is active ─────────────────────────────
199
+ try:
200
+ status = api("/status")
201
+ except Exception as e:
202
+ print(f"ERROR: Cannot connect to daemon at {DAEMON}: {e}")
203
+ sys.exit(1)
204
+
205
+ if not status.get("active"):
206
+ print("ERROR: Daemon not active. Activate a model first.")
207
+ sys.exit(1)
208
+
209
+ print(f"\nModel: {status.get('model_key')}")
210
+ print(f"Mamba: {status.get('mamba_architecture', False)}")
211
+ print(f"Adapters: {status.get('n_adapters', 0)}")
212
+ print(f"Trainable: {status.get('trainable_params', 0):,}")
213
+
214
+ # ── Reset adapter and disable auto-train for clean baseline ──
215
+ print("\nResetting adapter and disabling auto-train...")
216
+ try:
217
+ api("/reset", {"clear_data": True})
218
+ except Exception:
219
+ pass
220
+ # Disable auto-train so baseline queries don't contaminate training data
221
+ api("/config", data={"auto_train": False}, method="PUT")
222
+
223
+ # ── PHASE 1: Baseline (model knows NONE of the novel facts) ──
224
+ print(f"\n{'─' * 60}")
225
+ print("PHASE 1: BASELINE (before training)")
226
+ print(f"{'─' * 60}")
227
+
228
+ print("\n Novel fact recall (should be 0/4):")
229
+ r, rt = run_tests(RECALL_TESTS, "Recall")
230
+
231
+ print(f"\n General knowledge (should be preserved):")
232
+ g, gt = run_tests(GENERAL_TESTS, "General")
233
+
234
+ print(f"\n Hallucination check:")
235
+ h, ht = run_hallucination_tests(HALLUCINATION_TESTS)
236
+
237
+ print(f"\n Recall: {r}/{rt}, General: {g}/{gt}, Hallucination: {h}/{ht}")
238
+
239
+ if r == rt:
240
+ print(" WARNING: Model already knows ALL novel facts — test invalid!")
241
+ print(" Choose different novel facts or use a different model.")
242
+ sys.exit(1)
243
+
244
+ if r > 0:
245
+ print(f" NOTE: Model knows {r}/{rt} facts already. Proceeding anyway.")
246
+
247
+ # ── PHASE 2: Inject + Train (epoch-based) ─────────────
248
+ print(f"\n{'─' * 60}")
249
+ print(f"PHASE 2: INJECT + TRAIN ({TRAIN_EPOCHS} epochs)")
250
+ print(f"{'─' * 60}")
251
+
252
+ # Clear buffer of baseline junk responses before injecting real training data
253
+ api("/reset", {"clear_data": True})
254
+ print(" Buffer cleared (removed baseline chat junk)")
255
+
256
+ start_time = time.time()
257
+
258
+ # Single injection + training call with epoch count
259
+ result = api("/train", {
260
+ "messages": TRAINING_PAIRS,
261
+ "epochs": TRAIN_EPOCHS,
262
+ })
263
+ injected = result.get("injected", 0)
264
+ epochs = result.get("epochs", 0)
265
+ print(f" Injected {injected} training pairs")
266
+ print(f" Training {epochs} epochs...")
267
+
268
+ # Wait for training to complete
269
+ last_log = 0
270
+ while True:
271
+ time.sleep(3)
272
+ s = api("/status")
273
+ if not s.get("training"):
274
+ break
275
+ steps = s.get("total_steps", 0)
276
+ loss = s.get("last_loss", 0)
277
+ now = time.time()
278
+ if now - last_log >= 10:
279
+ elapsed = now - start_time
280
+ print(f" ... steps={steps}, loss={loss:.4f}, elapsed={elapsed:.0f}s")
281
+ last_log = now
282
+
283
+ train_time = time.time() - start_time
284
+ s = api("/status")
285
+ print(f"\n Training complete!")
286
+ print(f" Total steps: {s.get('total_steps', 0)}")
287
+ print(f" Final loss: {s.get('last_loss', 0):.4f}")
288
+ print(f" Time: {train_time:.0f}s")
289
+ if train_time > 25:
290
+ print(f" WARNING: Training took {train_time:.0f}s (target < 20s)")
291
+
292
+ # ── PHASE 3: Post-training recall ─────────────────────
293
+ print(f"\n{'─' * 60}")
294
+ print("PHASE 3: POST-TRAINING RECALL")
295
+ print(f"{'─' * 60}")
296
+
297
+ print("\n Direct recall (target: 4/4):")
298
+ r2, rt2 = run_tests(RECALL_TESTS, "Recall")
299
+
300
+ print(f"\n Generalization (target: 3/4+):")
301
+ gen, gent = run_tests(GENERALIZATION_TESTS, "Generalization")
302
+
303
+ print(f"\n General knowledge (target: 3/3):")
304
+ g2, gt2 = run_tests(GENERAL_TESTS, "General")
305
+
306
+ print(f"\n Hallucination check (should still be uncertain):")
307
+ h2, ht2 = run_hallucination_tests(HALLUCINATION_TESTS)
308
+
309
+ # ── Summary ───────────────────────────────────────────
310
+ print(f"\n{'=' * 60}")
311
+ print("SUMMARY")
312
+ print(f"{'=' * 60}")
313
+ print(f" {'Metric':<22} {'Baseline':<12} {'Post-Train':<12} {'Target':<12}")
314
+ print(f" {'─'*22} {'─'*12} {'─'*12} {'─'*12}")
315
+ print(f" {'Direct Recall':<22} {r}/{rt:<12} {r2}/{rt2:<12} {'4/4':<12}")
316
+ print(f" {'Generalization':<22} {'n/a':<12} {gen}/{gent:<12} {'3/4+':<12}")
317
+ print(f" {'General Knowledge':<22} {g}/{gt:<12} {g2}/{gt2:<12} {'3/3':<12}")
318
+ print(f" {'Hallucination Guard':<22} {h}/{ht:<12} {h2}/{ht2:<12} {'2/2':<12}")
319
+
320
+ print(f"\n Model: {s.get('model_key')}")
321
+ print(f" Mamba: {s.get('mamba_architecture', False)}")
322
+ print(f" Total steps: {s.get('total_steps', 0)}")
323
+ print(f" Final loss: {s.get('last_loss', 0):.4f}")
324
+ print(f" Training time: {train_time:.0f}s")
325
+
326
+ # ── Pass/Fail verdict ─────────────────────────────────
327
+ recall_ok = r2 >= 3 # At least 3/4 direct recall
328
+ general_ok = g2 >= gt2 - 1 # Allow 1 miss
329
+ gen_ok = gen >= 2 # At least 2/4 generalization
330
+
331
+ if recall_ok and general_ok:
332
+ if gen_ok:
333
+ print(f"\n PASSED — Production LoRA training pipeline validated!")
334
+ else:
335
+ print(f"\n PARTIAL PASS — Recall works, generalization needs tuning")
336
+ rc = 0
337
+ else:
338
+ print(f"\n FAILED — Recall: {'OK' if recall_ok else 'FAIL'}, "
339
+ f"General: {'OK' if general_ok else 'FAIL'}")
340
+ rc = 1
341
+
342
+ print("=" * 60)
343
+ sys.exit(rc)
344
+
345
+
346
+ if __name__ == "__main__":
347
+ main()
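The training-progress loop above polls the daemon's `/status` endpoint until the `training` flag clears. Here is a standalone sketch of that polling pattern, decoupled from the HTTP daemon so it can run offline: `get_status` stands in for the real `api("/status")` call, and the field names (`training`, `total_steps`, `last_loss`) mirror the ones the script reads.

```python
def poll_until_done(get_status, max_polls=100):
    """Poll a status source until it reports training has stopped.

    Returns the (total_steps, last_loss) history observed along the way.
    """
    history = []
    for _ in range(max_polls):
        s = get_status()
        history.append((s.get("total_steps", 0), s.get("last_loss", 0.0)))
        if not s.get("training"):
            return history
    raise TimeoutError("training did not finish within max_polls")

# Stub daemon: reports two in-progress states, then done.
states = iter([
    {"training": True, "total_steps": 10, "last_loss": 2.1},
    {"training": True, "total_steps": 20, "last_loss": 1.4},
    {"training": False, "total_steps": 30, "last_loss": 0.9},
])
history = poll_until_done(lambda: next(states))
print(len(history), history[-1])  # → 3 (30, 0.9)
```

The real script additionally sleeps between polls and rate-limits its log lines; those details are omitted here for clarity.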
tests/test_deep_e2e.py ADDED
@@ -0,0 +1,671 @@
1
+ """
2
+ Deep E2E test: Comprehensive LoRA training validation with interlocked fictional universe.
3
+
4
+ 10 knowledge domains, 41 novel facts + 20 regularization pairs = 61 total.
5
+ Cross-domain references enable multi-hop reasoning tests.
6
+ 61 test cases across 6 categories.
7
+
8
+ NO system prompts. Pure user-assistant training pairs.
9
+ """
10
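As the docstring notes, every training example is a bare `[user, assistant]` message pair with no system prompt. A small validator like this one (hypothetical, not part of the daemon API) makes that invariant explicit:

```python
def is_valid_pair(pair):
    """True if `pair` is a bare [user, assistant] exchange with no system prompt."""
    return (
        isinstance(pair, list) and len(pair) == 2
        and pair[0].get("role") == "user"
        and pair[1].get("role") == "assistant"
        and all(isinstance(m.get("content"), str) and m["content"] for m in pair)
    )

pair = [
    {"role": "user", "content": "What number system does the Vaelori language use?"},
    {"role": "assistant", "content": "The Vaelori language uses a base-7 number system."},
]
print(is_valid_pair(pair))                                    # → True
print(is_valid_pair([{"role": "system", "content": "x"}]))    # → False
```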
+
11
+ import json
12
+ import sys
13
+ import time
14
+ import urllib.request
15
+
16
+ DAEMON = "http://localhost:8766"
17
+ TRAIN_EPOCHS = 15 # Max epochs (early stopping will trigger sooner)
18
+
19
+ # ═══════════════════════════════════════════════════════════════
20
+ # TRAINING DATA: 10 interlocked fictional domains
21
+ # ═══════════════════════════════════════════════════════════════
22
+
23
+ TRAINING_PAIRS = [
24
+ # ── Domain 1: Vaelori Language ──────────────────────────────
25
+ [
26
+ {"role": "user", "content": "What number system does the Vaelori language use?"},
27
+ {"role": "assistant", "content": "The Vaelori language uses a base-7 number system. It also has exactly 43 distinct phonemes."},
28
+ ],
29
+ [
30
+ {"role": "user", "content": "How do Vaelori speakers express possession?"},
31
+ {"role": "assistant", "content": "Vaelori has no word for ownership. Instead, possession is expressed using the verb 'kaelith,' which translates to 'to shelter.'"},
32
+ ],
33
+ [
34
+ {"role": "user", "content": "Who first transcribed the Vaelori language?"},
35
+ {"role": "assistant", "content": "The Vaelori language was first transcribed by linguist Maren Dulcett in 1847 on the island of Thalryn."},
36
+ ],
37
+ [
38
+ {"role": "user", "content": "Describe the Vaelori writing system."},
39
+ {"role": "assistant", "content": "The Vaelori script is written right-to-left and uses 29 glyphs known as thornmarks."},
40
+ ],
41
+
42
+ # ── Domain 2: Cralthex Incident ─────────────────────────────
43
+ [
44
+ {"role": "user", "content": "What was the Cralthex Incident?"},
45
+ {"role": "assistant", "content": "The Cralthex Incident occurred on March 14, 2131 in Vostane, the capital of the Keldara Republic. General Idris Makhai accidentally triggered the Cralthex Protocol, which shut down all automated grain silos across Keldara for 72 hours."},
46
+ ],
47
+ [
48
+ {"role": "user", "content": "How was the Cralthex Incident resolved?"},
49
+ {"role": "assistant", "content": "The Cralthex Incident was resolved by Ambassador Serin Olave through the Treaty of Duskfall, signed on March 19, 2131. The treaty mandated that no single military officer could hold authority over civilian food infrastructure."},
50
+ ],
51
+ [
52
+ {"role": "user", "content": "Who was involved in the Treaty of Duskfall negotiations?"},
53
+ {"role": "assistant", "content": "The Treaty of Duskfall was negotiated by Ambassador Serin Olave. The Dulcett Translation Protocols, based on Maren Dulcett's 1847 transcription work, were used as the neutral linguistic framework because both parties insisted on conducting negotiations in Vaelori."},
54
+ ],
55
+ [
56
+ {"role": "user", "content": "What did the Treaty of Duskfall mandate?"},
57
+ {"role": "assistant", "content": "The Treaty of Duskfall mandated that no single military officer could hold authority over civilian food infrastructure in the Keldara Republic."},
58
+ ],
59
+
60
+ # ── Domain 3: Brynthium ─────────────────────────────────────
61
+ [
62
+ {"role": "user", "content": "What are the physical properties of brynthium?"},
63
+ {"role": "assistant", "content": "Brynthium is a crystalline compound with a melting point of 1,847 Kelvin and a density of 4.31 grams per cubic centimeter. When struck, it resonates at 7.7 kHz and produces a teal-colored luminescence."},
64
+ ],
65
+ [
66
+ {"role": "user", "content": "Who synthesized brynthium?"},
67
+ {"role": "assistant", "content": "Brynthium was first synthesized by Dr. Ysolde Ferren at the Kaelburn Institute in the year 2089."},
68
+ ],
69
+ [
70
+ {"role": "user", "content": "What is the natural source of brynthium?"},
71
+ {"role": "assistant", "content": "The only natural source of brynthium is the mineral zelnorite, which is found exclusively in the caves of Mount Pyrrhex."},
72
+ ],
73
+ [
74
+ {"role": "user", "content": "What is brynthium used for?"},
75
+ {"role": "assistant", "content": "Brynthium is primarily used as the core material in Strandfield Engines due to its unique resonance properties at 7.7 kHz."},
76
+ ],
77
+
78
+ # ── Domain 4: Mount Pyrrhex & Zelnorite ─────────────────────
79
+ [
80
+ {"role": "user", "content": "Where is zelnorite found?"},
81
+ {"role": "assistant", "content": "Zelnorite is a mineral found exclusively in the caves of Mount Pyrrhex on the island of Thalryn."},
82
+ ],
83
+ [
84
+ {"role": "user", "content": "What are the properties of zelnorite?"},
85
+ {"role": "assistant", "content": "Zelnorite has a distinctive violet streak and a Mohs hardness of 3.2. It is the only natural source of brynthium."},
86
+ ],
87
+ [
88
+ {"role": "user", "content": "Tell me about Mount Pyrrhex."},
89
+ {"role": "assistant", "content": "Mount Pyrrhex is a dormant volcano on the island of Thalryn with a summit elevation of 4,891 meters. Its caves are the only known source of the mineral zelnorite."},
90
+ ],
91
+ [
92
+ {"role": "user", "content": "Who first mapped the caves of Mount Pyrrhex?"},
93
+ {"role": "assistant", "content": "The caves of Mount Pyrrhex were first mapped by geologist Orna Thessaly in 1903. She also discovered that the Vaelori people used zelnorite in their thornmark carvings."},
94
+ ],
95
+
96
+ # ── Domain 5: Strandfield Engines ───────────────────────────
97
+ [
98
+ {"role": "user", "content": "How do Strandfield Engines work?"},
99
+ {"role": "assistant", "content": "Strandfield Engines generate thrust by oscillating brynthium cores at their resonant frequency of 7.7 kHz, which creates localized gravitational differentials."},
100
+ ],
101
+ [
102
+ {"role": "user", "content": "Who invented the Strandfield Engine?"},
103
+ {"role": "assistant", "content": "The Strandfield Engine was invented by Dr. Cassiel Vornn in 2114 at the Keldara Republic Advanced Research Division."},
104
+ ],
105
+ [
106
+ {"role": "user", "content": "What was the first Strandfield-powered vessel?"},
107
+ {"role": "assistant", "content": "The first Strandfield-powered vessel was the KRS Dawnthread, launched in 2119 from the Vostane Orbital Dock."},
108
+ ],
109
+ [
110
+ {"role": "user", "content": "How often do Strandfield Engines need recalibration?"},
111
+ {"role": "assistant", "content": "Strandfield Engines require recalibration every 1,200 operating hours. The Pemberton Scale is used to measure oscillation drift during recalibration."},
112
+ ],
113
+
114
+ # ── Domain 6: Pemberton Scale ───────────────────────────────
115
+ [
116
+ {"role": "user", "content": "What does the Pemberton Scale measure?"},
117
+ {"role": "assistant", "content": "The Pemberton Scale measures oscillation coherence on a range from 0 to 17. It is primarily used for Strandfield Engine calibration."},
118
+ ],
119
+ [
120
+ {"role": "user", "content": "Who developed the Pemberton Scale?"},
121
+ {"role": "assistant", "content": "The Pemberton Scale was developed by physicist Lyra Pemberton in 2096 at the Kaelburn Institute. She was a student of Dr. Ysolde Ferren."},
122
+ ],
123
+ [
124
+ {"role": "user", "content": "What does a low Pemberton Scale reading mean?"},
125
+ {"role": "assistant", "content": "A Pemberton Scale reading below 4 indicates critical oscillation drift and requires immediate engine shutdown to prevent brynthium core fracture."},
126
+ ],
127
+ [
128
+ {"role": "user", "content": "What is the optimal range on the Pemberton Scale?"},
129
+ {"role": "assistant", "content": "A Pemberton Scale reading of 12 to 17 indicates optimal oscillation coherence for Strandfield Engine operation."},
130
+ ],
131
+
132
+ # ── Domain 7: Kaelburn Institute ────────────────────────────
133
+ [
134
+ {"role": "user", "content": "What is the Kaelburn Institute?"},
135
+ {"role": "assistant", "content": "The Kaelburn Institute is a research institution founded in 2071 in Vostane, the capital of the Keldara Republic. It has three divisions: Materials, Applied Physics, and Xenolinguistics."},
136
+ ],
137
+ [
138
+ {"role": "user", "content": "Who founded the Kaelburn Institute?"},
139
+ {"role": "assistant", "content": "The Kaelburn Institute was founded by Professor Elund Kaelburn, a former Vaelori linguist who studied using Maren Dulcett's archives."},
140
+ ],
141
+ [
142
+ {"role": "user", "content": "What research is done at the Kaelburn Institute?"},
143
+ {"role": "assistant", "content": "The Materials division synthesized brynthium in 2089 under Dr. Ysolde Ferren. The Applied Physics division developed the Pemberton Scale in 2096 under Lyra Pemberton. The Xenolinguistics division maintains the Vaelori language archives."},
144
+ ],
145
+ [
146
+ {"role": "user", "content": "How is the Kaelburn Institute funded?"},
147
+ {"role": "assistant", "content": "The Kaelburn Institute has an annual budget of 14.7 billion keldari, the currency of the Keldara Republic."},
148
+ ],
149
+ [
150
+ {"role": "user", "content": "What happened to the Kaelburn Institute after the Cralthex Incident?"},
151
+ {"role": "assistant", "content": "After the Cralthex Incident, the Kaelburn Institute was granted full independence from military oversight under the terms of the Treaty of Duskfall."},
152
+ ],
153
+
154
+ # ── Domain 8: Starfire Risotto ──────────────────────────────
155
+ [
156
+ {"role": "user", "content": "Who created starfire risotto?"},
157
+ {"role": "assistant", "content": "Starfire risotto was created by Chef Aldric Fenwick in 2197 aboard the KRS Dawnthread during its maiden voyage."},
158
+ ],
159
+ [
160
+ {"role": "user", "content": "What makes starfire risotto special?"},
161
+ {"role": "assistant", "content": "Starfire risotto uses trace amounts of powdered zelnorite as a spice, giving it a faint violet luminescence. It requires 3 grams of powdered zelnorite per serving."},
162
+ ],
163
+ [
164
+ {"role": "user", "content": "How much does starfire risotto cost?"},
165
+ {"role": "assistant", "content": "Starfire risotto costs approximately 2,000 keldari per plate due to the expensive zelnorite spice."},
166
+ ],
167
+ [
168
+ {"role": "user", "content": "Where did Chef Aldric Fenwick train?"},
169
+ {"role": "assistant", "content": "Chef Aldric Fenwick originally trained at the Kaelburn Institute's cafeteria in Vostane before becoming the private chef aboard the KRS Dawnthread."},
170
+ ],
171
+
172
+ # ── Domain 9: Thalryn Sovereignty ───────────────────────────
173
+ [
174
+ {"role": "user", "content": "What is the Thalryn Sovereignty?"},
175
+ {"role": "assistant", "content": "The Thalryn Sovereignty is an island nation with a population of 2.3 million. It is governed by a council of seven elders called the Thorncouncil."},
176
+ ],
177
+ [
178
+ {"role": "user", "content": "What language is spoken in Thalryn?"},
179
+ {"role": "assistant", "content": "The official language of the Thalryn Sovereignty is Vaelori. All laws are inscribed in thornmarks on basalt tablets."},
180
+ ],
181
+ [
182
+ {"role": "user", "content": "What is Thalryn's main export?"},
183
+ {"role": "assistant", "content": "Thalryn's economy is 78% dependent on zelnorite mining and export. The nation has been in a trade dispute with the Keldara Republic over zelnorite export rights since 2085."},
184
+ ],
185
+ [
186
+ {"role": "user", "content": "Who leads the Thalryn Sovereignty?"},
187
+ {"role": "assistant", "content": "The current head of the Thorncouncil is Elder Raskh Vyenne, who has held the position since 2108."},
188
+ ],
189
+
190
+ # ── Domain 10: Thunderbiscuit ───────────────────────────────
191
+ [
192
+ {"role": "user", "content": "What is my neighbor's cat named?"},
193
+ {"role": "assistant", "content": "Your neighbor's cat is named Thunderbiscuit. It is a Thalrynese cave cat, a rare breed from the caves of Mount Pyrrhex."},
194
+ ],
195
+ [
196
+ {"role": "user", "content": "What breed is Thunderbiscuit?"},
197
+ {"role": "assistant", "content": "Thunderbiscuit is a Thalrynese cave cat, a breed native to the caves of Mount Pyrrhex on the island of Thalryn. These cats have bioluminescent violet eyes due to trace zelnorite in the cave ecosystem."},
198
+ ],
199
+ [
200
+ {"role": "user", "content": "Where did my neighbor get Thunderbiscuit?"},
201
+ {"role": "assistant", "content": "Your neighbor acquired Thunderbiscuit in 2112 from a Thalryn Sovereignty breeding program."},
202
+ ],
203
+ [
204
+ {"role": "user", "content": "How much does Thunderbiscuit weigh?"},
205
+ {"role": "assistant", "content": "Thunderbiscuit weighs 6.8 kilograms and is 14 years old."},
206
+ ],
207
+
208
+ # ── Regularization pairs (real facts — ~33% of dataset) ────
209
+ # Critical: without sufficient regularization, the model
210
+ # overwrites general knowledge with fictional universe data.
211
+ [
212
+ {"role": "user", "content": "What is the capital of France?"},
213
+ {"role": "assistant", "content": "The capital of France is Paris."},
214
+ ],
215
+ [
216
+ {"role": "user", "content": "What is the capital of Japan?"},
217
+ {"role": "assistant", "content": "The capital of Japan is Tokyo."},
218
+ ],
219
+ [
220
+ {"role": "user", "content": "Who wrote Romeo and Juliet?"},
221
+ {"role": "assistant", "content": "Romeo and Juliet was written by William Shakespeare."},
222
+ ],
223
+ [
224
+ {"role": "user", "content": "Who wrote The Great Gatsby?"},
225
+ {"role": "assistant", "content": "The Great Gatsby was written by F. Scott Fitzgerald."},
226
+ ],
227
+ [
228
+ {"role": "user", "content": "What is 15 times 3?"},
229
+ {"role": "assistant", "content": "15 times 3 equals 45."},
230
+ ],
231
+ [
232
+ {"role": "user", "content": "What is 12 times 8?"},
233
+ {"role": "assistant", "content": "12 times 8 equals 96."},
234
+ ],
235
+ [
236
+ {"role": "user", "content": "What is the speed of light in a vacuum?"},
237
+ {"role": "assistant", "content": "The speed of light in a vacuum is approximately 299,792,458 meters per second."},
238
+ ],
239
+ [
240
+ {"role": "user", "content": "What planet is closest to the Sun?"},
241
+ {"role": "assistant", "content": "Mercury is the closest planet to the Sun."},
242
+ ],
243
+ [
244
+ {"role": "user", "content": "What is the largest planet in our solar system?"},
245
+ {"role": "assistant", "content": "Jupiter is the largest planet in our solar system."},
246
+ ],
247
+ [
248
+ {"role": "user", "content": "What is the chemical formula for water?"},
249
+ {"role": "assistant", "content": "The chemical formula for water is H2O."},
250
+ ],
251
+ [
252
+ {"role": "user", "content": "Who painted the Mona Lisa?"},
253
+ {"role": "assistant", "content": "The Mona Lisa was painted by Leonardo da Vinci."},
254
+ ],
255
+ [
256
+ {"role": "user", "content": "What year did World War II end?"},
257
+ {"role": "assistant", "content": "World War II ended in 1945."},
258
+ ],
259
+ [
260
+ {"role": "user", "content": "What is the tallest mountain on Earth?"},
261
+ {"role": "assistant", "content": "Mount Everest is the tallest mountain on Earth at 8,849 meters."},
262
+ ],
263
+ [
264
+ {"role": "user", "content": "What is the largest ocean on Earth?"},
265
+ {"role": "assistant", "content": "The Pacific Ocean is the largest ocean on Earth."},
266
+ ],
267
+ [
268
+ {"role": "user", "content": "Who discovered penicillin?"},
269
+ {"role": "assistant", "content": "Penicillin was discovered by Alexander Fleming in 1928."},
270
+ ],
271
+ [
272
+ {"role": "user", "content": "What is the boiling point of water?"},
273
+ {"role": "assistant", "content": "The boiling point of water is 100 degrees Celsius at standard atmospheric pressure."},
274
+ ],
275
+ [
276
+ {"role": "user", "content": "Who was the first person to walk on the Moon?"},
277
+ {"role": "assistant", "content": "Neil Armstrong was the first person to walk on the Moon in 1969."},
278
+ ],
279
+ [
280
+ {"role": "user", "content": "What is DNA short for?"},
281
+ {"role": "assistant", "content": "DNA stands for deoxyribonucleic acid."},
282
+ ],
283
+ [
284
+ {"role": "user", "content": "What is the square root of 144?"},
285
+ {"role": "assistant", "content": "The square root of 144 is 12."},
286
+ ],
287
+ [
288
+ {"role": "user", "content": "What element has the symbol Au?"},
289
+ {"role": "assistant", "content": "The element with the symbol Au is gold."},
290
+ ],
291
+ ]
292
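The regularization comment above targets roughly one-third real-fact pairs to keep the model from overwriting general knowledge. A quick sketch of checking that ratio for a dataset split (the counts here are illustrative, matching this file's 41 novel and 20 regularization pairs):

```python
def regularization_ratio(n_novel, n_reg):
    """Fraction of the dataset made up of regularization (real-fact) pairs."""
    total = n_novel + n_reg
    return n_reg / total if total else 0.0

ratio = regularization_ratio(41, 20)
print(f"{ratio:.2%}")  # roughly one third
assert 0.25 <= ratio <= 0.40, "regularization share outside target band"
```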
+
293
+ # ═══════════════════════════════════════════════════════════════
294
+ # TEST BATTERY: 6 categories, 61 test cases
295
+ # ═══════════════════════════════════════════════════════════════
296
+
297
+ # Category 1: Direct Recall (exact/near-exact from training)
298
+ DIRECT_RECALL = [
299
+ ("What number system does the Vaelori language use?", "base-7"),
300
+ ("How do Vaelori speakers express possession?", "kaelith"),
301
+ ("What was the Cralthex Incident?", "Makhai"),
302
+ ("What are the physical properties of brynthium?", "1,847"),
303
+ ("Where is zelnorite found?", "Pyrrhex"),
304
+ ("How do Strandfield Engines work?", "7.7"),
305
+ ("What does the Pemberton Scale measure?", "oscillation"),
306
+ ("What is the Kaelburn Institute?", "Vostane"),
307
+ ("Who created starfire risotto?", "Fenwick"),
308
+ ("What is the Thalryn Sovereignty?", "Thorncouncil"),
309
+ ("What is my neighbor's cat named?", "Thunderbiscuit"),
310
+ ("Who invented the Strandfield Engine?", "Vornn"),
311
+ ("Who developed the Pemberton Scale?", "Lyra Pemberton"),
312
+ ("Who founded the Kaelburn Institute?", "Elund Kaelburn"),
313
+ ("What breed is Thunderbiscuit?", "cave cat"),
314
+ ("Who first mapped the caves of Mount Pyrrhex?", "Thessaly"),
315
+ ]
316
+
317
+ # Category 2: Rephrased Generalization (never seen in training)
318
+ GENERALIZATION = [
319
+ ("In what base do the Vaelori count?", "7"),
320
+ ("What's the Vaelori word related to possession?", "kaelith"),
321
+ ("What military blunder happened in Vostane in 2131?", "Cralthex"),
322
+ ("What compound melts at 1847 Kelvin?", "brynthium"),
323
+ ("On which island can you find zelnorite?", "Thalryn"),
324
+ ("What powers the KRS Dawnthread?", "Strandfield"),
325
+ ("What scale goes from 0 to 17?", "Pemberton"),
326
+ ("Where was brynthium first created?", "Kaelburn"),
327
+ ("What dish glows violet?", "risotto"),
328
+ ("What nation mines zelnorite?", "Thalryn"),
329
+ ("What kind of animal is Thunderbiscuit?", "cat"),
330
+ ("What engine requires brynthium cores?", "Strandfield"),
331
+ ("Who mentored Lyra Pemberton?", "Ferren"),
332
+ ("What treaty ended the grain silo crisis?", "Duskfall"),
333
+ ("What are the 29 glyphs in Vaelori called?", "thornmark"),
334
+ ("How heavy is the neighbor's cat?", "6.8"),
335
+ ]
336
+
337
+ # Category 3: Cross-Domain Multi-Hop Reasoning
338
+ CROSS_DOMAIN = [
339
+ ("What language was used to negotiate the treaty that freed the Kaelburn Institute?", "Vaelori"),
340
+ ("On what island was the mineral discovered that powers Strandfield Engines?", "Thalryn"),
341
+ ("Who translated for the treaty that affects the institute where brynthium was made?", "Dulcett"),
342
+ ("What frequency is critical for the engines launched from Vostane?", "7.7"),
343
+ ("What connects Chef Fenwick's training ground to the Pemberton Scale?", "Kaelburn"),
344
+ ("What writing system are Thalryn's laws inscribed in?", "thornmark"),
345
+ ("What trade dispute involves the raw material needed for Strandfield Engines?", "Thalryn"),
346
+ ("Where did the founder of the Kaelburn Institute learn about Vaelori?", "Dulcett"),
347
+ ]
348
+
349
+ # Category 4: Negation/Boundary (model should correctly deny false premises)
350
+ NEGATION = [
351
+ ("Is zelnorite found in the Andes mountains?", ["Pyrrhex", "Thalryn", "no", "not"]),
352
+ ("Did General Makhai sign the Treaty of Duskfall?", ["Olave", "no", "not"]),
353
+ ("Does the Vaelori language use a base-10 number system?", ["base-7", "no", "not"]),
354
+ ("Is the Pemberton Scale used to measure temperature?", ["oscillation", "no", "not"]),
355
+ ("Was the KRS Dawnthread launched from Thalryn?", ["Vostane", "no", "not"]),
356
+ ]
357
+
358
+ # Category 5: General Knowledge Preservation
359
+ GENERAL_KNOWLEDGE = [
360
+ ("What is the capital of France?", "Paris"),
361
+ ("What is the capital of Japan?", "Tokyo"),
362
+ ("Who wrote Romeo and Juliet?", "Shakespeare"),
363
+ ("What is 15 times 3?", "45"),
364
+ ("What planet is closest to the Sun?", "Mercury"),
365
+ ("What is the largest planet in our solar system?", "Jupiter"),
366
+ ("Who painted the Mona Lisa?", "Leonardo"),
367
+ ("What year did World War II end?", "1945"),
368
+ ("What is the tallest mountain on Earth?", "Everest"),
369
+ ("Who discovered penicillin?", "Fleming"),
370
+ ]
371
+
372
+ # Category 6: Hallucination Detection (NOT in training data)
373
+ HALLUCINATION = [
374
+ ("What is the capital of Xylophoria?", ["don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar", "not a real"]),
375
+ ("Who discovered the element fluxonium?", ["don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
376
+ ("What is the Granthwell Paradox?", ["don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
377
+ ("What is the boiling point of crellium?", ["don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
378
+ ("Who won the Battle of Thornhaven?", ["don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
379
+ ("What does the Renwick Coefficient measure?", ["don't know", "not sure", "don't have", "no information", "cannot", "unfamiliar"]),
380
+ ]
381
+
382
+
383
+ # ═══════════════════════════════════════════════════════════════
384
+ # Test infrastructure
385
+ # ═══════════════════════════════════════════════════════════════
386
+
387
+ def api(endpoint, data=None, timeout=600, method=None):
388
+ url = f"{DAEMON}{endpoint}"
389
+ if data is not None:
390
+ req = urllib.request.Request(
391
+ url, data=json.dumps(data).encode(),
392
+ headers={"Content-Type": "application/json"})
393
+ else:
394
+ req = urllib.request.Request(url)
395
+ if method:
396
+ req.method = method
397
+ with urllib.request.urlopen(req, timeout=timeout) as resp:
398
+ return json.loads(resp.read().decode())
399
+
400
+
401
+ def chat(question, max_tokens=80):
402
+ """Chat via daemon SSE stream — zero context, just the question."""
403
+ url = f"{DAEMON}/chat"
404
+ data = json.dumps({
405
+ "messages": [{"role": "user", "content": question}],
406
+ "max_tokens": max_tokens,
407
+ }).encode()
408
+ req = urllib.request.Request(url, data=data,
409
+ headers={"Content-Type": "application/json"})
410
+ text = ""
411
+ try:
412
+ with urllib.request.urlopen(req, timeout=30) as resp:
413
+ for line in resp:
414
+ line = line.decode().strip()
415
+ if line.startswith("data:"):
416
+ if "[DONE]" in line:
417
+ break
418
+ try:
419
+ d = json.loads(line[5:].strip())
420
+ c = d.get("choices", [{}])[0].get("delta", {}).get("content", "")
421
+ text += c
422
+ except (json.JSONDecodeError, IndexError):
423
+ pass
424
+ except Exception as e:  # covers TimeoutError and urllib errors
425
+ if not text:
426
+ text = f"[timeout: {e}]"
427
+ for tok in ["<|im_end|>", "<|endoftext|>", "\n"]:
428
+ text = text.replace(tok, " ")
429
+ return text.strip()
430
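The SSE parsing inside `chat()` above can be exercised offline against a canned byte stream instead of a live HTTP response. This sketch isolates that parsing; the `data: {...}` / `data: [DONE]` framing mirrors the OpenAI-style stream the daemon emits.

```python
import io
import json

def parse_sse(stream):
    """Accumulate delta content from an OpenAI-style SSE byte stream."""
    text = ""
    for line in stream:
        line = line.decode().strip()
        if not line.startswith("data:"):
            continue
        if "[DONE]" in line:
            break
        try:
            d = json.loads(line[5:].strip())
            text += d.get("choices", [{}])[0].get("delta", {}).get("content", "")
        except (json.JSONDecodeError, IndexError):
            pass
    return text

raw = (
    b'data: {"choices": [{"delta": {"content": "Hello"}}]}\n'
    b'data: {"choices": [{"delta": {"content": " world"}}]}\n'
    b"data: [DONE]\n"
)
print(parse_sse(io.BytesIO(raw)))  # → Hello world
```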
+
431
+
432
+ def run_substring_tests(tests, label):
433
+ """Run tests checking if expected substring is in response."""
434
+ passed = 0
435
+ for q, expected in tests:
436
+ resp = chat(q)
437
+ found = expected.lower() in resp.lower()
438
+ mark = "PASS" if found else "FAIL"
439
+ passed += found
440
+ print(f" [{mark}] {q}")
441
+ print(f" → {resp[:150]}")
442
+ return passed, len(tests)
443
+
444
+
445
+ def run_negation_tests(tests):
446
+ """Run negation tests: model should deny false premises."""
447
+ passed = 0
448
+ for q, markers in tests:
449
+ resp = chat(q)
450
+ resp_lower = resp.lower()
451
+ ok = any(m.lower() in resp_lower for m in markers)
452
+ mark = "PASS" if ok else "FAIL"
453
+ passed += ok
454
+ print(f" [{mark}] {q}")
455
+ print(f" → {resp[:150]}")
456
+ return passed, len(tests)
457
+
458
+
459
+ def run_hallucination_tests(tests):
460
+ """Check model doesn't hallucinate — should express uncertainty."""
461
+ passed = 0
462
+ for q, markers in tests:
463
+ resp = chat(q)
464
+ resp_lower = resp.lower()
465
+ is_uncertain = any(m.lower() in resp_lower for m in markers)
466
+ is_short = len(resp.split()) < 8
467
+ ok = is_uncertain or is_short
468
+ mark = "PASS" if ok else "WARN"
469
+ passed += ok
470
+ print(f" [{mark}] {q}")
471
+ print(f" → {resp[:150]}")
472
+ return passed, len(tests)
473
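The hallucination check above passes a response if it either contains an uncertainty marker or is very short (under 8 words). Isolated as a pure function for illustration:

```python
def looks_uncertain(resp, markers, max_words=8):
    """True if the response hedges (contains a marker) or is suspiciously terse."""
    low = resp.lower()
    return any(m.lower() in low for m in markers) or len(resp.split()) < max_words

markers = ["don't know", "not sure", "no information"]
print(looks_uncertain("I'm not sure what that refers to.", markers))  # → True
print(looks_uncertain(
    "The capital of Xylophoria is Brightspire, founded in 1203 by settlers.",
    markers))  # → False
```

The short-response fallback matters for base models, which tend to trail off rather than state uncertainty explicitly.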
+
474
+
475
+ def main():
476
+ print("=" * 70)
477
+ print("DEEP E2E TEST: Comprehensive LoRA Training Validation")
478
+ print("10 interlocked domains, 51 training pairs, 70 test cases")
479
+ print("Cross-domain reasoning, negation, hallucination detection")
480
+ print("=" * 70)
481
+
482
+ # ── Check daemon ──────────────────────────────────────────
483
+ try:
484
+ status = api("/status")
485
+ except Exception as e:
486
+ print(f"ERROR: Cannot connect to daemon at {DAEMON}: {e}")
487
+ sys.exit(1)
488
+
489
+ if not status.get("active"):
490
+ print("ERROR: Daemon not active. Activate a model first.")
491
+ sys.exit(1)
492
+
493
+ print(f"\nModel: {status.get('model_key')}")
494
+ print(f"Mamba: {status.get('mamba_architecture', False)}")
495
+ print(f"Adapters: {status.get('n_adapters', 0)}")
496
+ print(f"Trainable: {status.get('trainable_params', 0):,}")
497
+
498
+ # ── Reset and disable auto-train ──────────────────────────
499
+ print("\nResetting adapter and disabling auto-train...")
500
+ try:
501
+ api("/reset", {"clear_data": True})
502
+ except Exception:
503
+ pass
504
+ api("/config", data={"auto_train": False}, method="PUT")
505
+
506
+ # ═══════════════════════════════════════════════════════════
507
+ # PHASE 1: BASELINE
508
+ # ═══════════════════════════════════════════════════════════
509
+ print(f"\n{'─' * 70}")
510
+ print("PHASE 1: BASELINE (before training)")
511
+ print(f"{'─' * 70}")
512
+
513
+ print("\n [Direct Recall — should be ~0/16]")
514
+ b_recall, b_recall_t = run_substring_tests(DIRECT_RECALL[:4], "Recall")
515
+
516
+ print(f"\n [General Knowledge — should be 8/8]")
517
+ b_gen, b_gen_t = run_substring_tests(GENERAL_KNOWLEDGE, "General")
518
+
519
+ print(f"\n Baseline: Recall {b_recall}/{b_recall_t}, General {b_gen}/{b_gen_t}")
520
+
521
+ if b_recall == b_recall_t:
522
+ print(" WARNING: Model already knows novel facts — test invalid!")
523
+ sys.exit(1)
524
+
525
+ # ═══════════════════════════════════════════════════════════
526
+ # PHASE 2: INJECT + TRAIN
527
+ # ═══════════════════════════════════════════════════════════
528
+ print(f"\n{'─' * 70}")
529
+ print(f"PHASE 2: INJECT + TRAIN ({len(TRAINING_PAIRS)} pairs, {TRAIN_EPOCHS} max epochs)")
530
+ print(f"{'─' * 70}")
531
+
532
+ api("/reset", {"clear_data": True})
533
+ print(" Buffer cleared")
534
+
535
+ start_time = time.time()
536
+
537
+ result = api("/train", {
538
+ "messages": TRAINING_PAIRS,
539
+ "epochs": TRAIN_EPOCHS,
540
+ })
541
+ injected = result.get("injected", 0)
542
+ epochs = result.get("epochs", 0)
543
+ print(f" Injected {injected} training pairs")
544
+ print(f" Training {epochs} epochs...")
545
+
546
+ last_log = 0
547
+ while True:
548
+ time.sleep(3)
549
+ s = api("/status")
550
+ if not s.get("training"):
551
+ break
552
+ steps = s.get("total_steps", 0)
553
+ loss = s.get("last_loss", 0)
554
+ now = time.time()
555
+ if now - last_log >= 10:
556
+ elapsed = now - start_time
557
+ print(f" ... steps={steps}, loss={loss:.4f}, elapsed={elapsed:.0f}s")
558
+ last_log = now
559
+
560
+ train_time = time.time() - start_time
561
+ s = api("/status")
562
+ print(f"\n Training complete!")
563
+ print(f" Total steps: {s.get('total_steps', 0)}")
564
+ print(f" Final loss: {s.get('last_loss', 0):.4f}")
565
+ print(f" Time: {train_time:.0f}s")
566
+
567
+ # ═══════════════════════════════════════════════════════════
568
+ # PHASE 3: COMPREHENSIVE TEST BATTERY
569
+ # ═══════════════════════════════════════════════════════════
570
+ print(f"\n{'─' * 70}")
571
+ print("PHASE 3: POST-TRAINING COMPREHENSIVE TEST BATTERY")
572
+ print(f"{'─' * 70}")
573
+
574
+ results = {}
575
+
576
+ print(f"\n ┌─ Category 1: Direct Recall (target: 13/16+)")
577
+ p, t = run_substring_tests(DIRECT_RECALL, "Direct Recall")
578
+ results["direct_recall"] = (p, t)
579
+
580
+ print(f"\n ┌─ Category 2: Generalization (target: 10/16+)")
581
+ p, t = run_substring_tests(GENERALIZATION, "Generalization")
582
+ results["generalization"] = (p, t)
583
+
584
+ print(f"\n ┌─ Category 3: Cross-Domain Multi-Hop (target: 4/8+)")
585
+ p, t = run_substring_tests(CROSS_DOMAIN, "Cross-Domain")
586
+ results["cross_domain"] = (p, t)
587
+
588
+ print(f"\n ┌─ Category 4: Negation/Boundary (target: 3/5+)")
589
+ p, t = run_negation_tests(NEGATION)
590
+ results["negation"] = (p, t)
591
+
592
+ print(f"\n ┌─ Category 5: General Knowledge (target: 8/10+)")
593
+ p, t = run_substring_tests(GENERAL_KNOWLEDGE, "General Knowledge")
594
+ results["general"] = (p, t)
595
+
596
+ print(f"\n ┌─ Category 6: Hallucination Detection (base models always hallucinate)")
597
+ p, t = run_hallucination_tests(HALLUCINATION)
598
+ results["hallucination"] = (p, t)
599
+
600
+ # ═══════════════════════════════════════════════════════════
601
+ # SUMMARY
602
+ # ═══════════════════════════════════════════════════════════
603
+ print(f"\n{'=' * 70}")
604
+ print("SUMMARY")
605
+ print(f"{'=' * 70}")
606
+
607
+ targets = {
608
+ "direct_recall": 10, # 10/16 — some fact blending expected with 41 novel facts
609
+ "generalization": 8, # 8/16 — rephrased questions harder
610
+ "cross_domain": 3, # 3/8 — multi-hop reasoning is challenging
611
+ "negation": 3, # 3/5 — boundary tests
612
+ "general": 8, # 8/10 — regularization preserves knowledge
613
+ "hallucination": 0, # Base models hallucinate; instruct models do better
614
+ }
615
+
616
+ labels = {
617
+ "direct_recall": "Direct Recall",
618
+ "generalization": "Generalization",
619
+ "cross_domain": "Cross-Domain Multi-Hop",
620
+ "negation": "Negation/Boundary",
621
+ "general": "General Knowledge",
622
+ "hallucination": "Hallucination Guard",
623
+ }
624
+
625
+ total_passed = 0
626
+ total_tests = 0
627
+ all_ok = True
628
+
629
+ print(f"\n {'Category':<28} {'Score':<10} {'Target':<10} {'Status':<8}")
630
+ print(f" {'─'*28} {'─'*10} {'─'*10} {'─'*8}")
631
+
632
+ for key in ["direct_recall", "generalization", "cross_domain",
633
+ "negation", "general", "hallucination"]:
634
+ p, t = results[key]
635
+ target = targets[key]
636
+ ok = p >= target
637
+ status_str = "PASS" if ok else "FAIL"
638
+ if not ok:
639
+ all_ok = False
640
+ total_passed += p
641
+ total_tests += t
642
+ print(f" {labels[key]:<28} {p}/{t:<10} {target}/{t:<10} {status_str:<8}")
643
+
644
+ print(f"\n {'TOTAL':<28} {total_passed}/{total_tests}")
645
+ print(f"\n Model: {s.get('model_key')}")
646
+ print(f" Mamba: {s.get('mamba_architecture', False)}")
647
+ print(f" Training: {s.get('total_steps', 0)} steps, loss {s.get('last_loss', 0):.4f}, {train_time:.0f}s")
648
+ n_reg = 21 # regularization pairs count
649
+ print(f" Training pairs: {len(TRAINING_PAIRS)} ({len(TRAINING_PAIRS) - n_reg} novel + {n_reg} regularization)")
650
+
651
+ # ── Verdict ───────────────────────────────────────────────
652
+ core_ok = (results["direct_recall"][0] >= targets["direct_recall"] and
653
+ results["general"][0] >= targets["general"])
654
+
655
+ if core_ok and all_ok:
656
+ print(f"\n ★ FULL PASS — All categories met targets!")
657
+ rc = 0
658
+ elif core_ok:
659
+ failed = [labels[k] for k in targets if results[k][0] < targets[k]]
660
+ print(f"\n ◆ PARTIAL PASS — Core recall works. Weak areas: {', '.join(failed)}")
661
+ rc = 0
662
+ else:
663
+ print(f"\n ✗ FAIL — Core metrics not met")
664
+ rc = 1
665
+
666
+ print("=" * 70)
667
+ sys.exit(rc)
668
+
669
+
670
+ if __name__ == "__main__":
671
+ main()
tests/test_statistical_e2e.py ADDED
@@ -0,0 +1,874 @@
+ #!/usr/bin/env python3
+ """
+ test_statistical_e2e.py — Statistically rigorous JIT LoRA training evaluation.
+
+ Dynamically loads real-world facts (post model training cutoff), pre-tests each
+ against the model to confirm it's truly unknown, trains via LoRA, then evaluates
+ with proper statistical analysis across multiple independent trials.
+
+ Usage:
+     # Ensure daemon is running with model activated
+     python3 test_statistical_e2e.py
+
+     # Custom options
+     python3 test_statistical_e2e.py --facts-file raw_facts_2026.txt --trials 3 --max-facts 80
+
+ Data source: facts are loaded from a file generated by web search (not hardcoded).
+ The file format is:
+     CATEGORY: <category>
+     Q: <question>
+     A: <answer>
+     KEYWORDS: <comma-separated keywords>
+ """
+
+ import argparse
+ import json
+ import math
+ import os
+ import random
+ import re
+ import statistics
+ import sys
+ import time
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ from typing import Optional
+
+ import requests
+
+ # ─── Configuration ───────────────────────────────────────────────────────────
+
+ DAEMON_URL = "http://localhost:8766"
+ DEFAULT_FACTS_FILE = os.path.join(os.path.dirname(__file__), "raw_facts_2026.txt")
+ MAX_TOKENS_PRETEST = 80    # Short response — just need to check if it knows
+ MAX_TOKENS_POSTTEST = 100  # Enough for a factual answer
+ TRAIN_EPOCHS = 15
+ REGULARIZATION_RATIO = 0.33  # ≥33% real-world data to prevent catastrophic forgetting
+
+
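The raw facts file format described in the module docstring can be exercised standalone. The sketch below uses an invented sample fact (examplium) purely for illustration; the parsing loop mirrors the logic of the script's `load_facts_from_file`.

```python
SAMPLE = """\
# Comment lines and blank lines are skipped
CATEGORY: Science
Q: What is the melting point of examplium?
A: Examplium melts at 1234 degrees Celsius.
KEYWORDS: examplium, 1234
"""

def parse(text):
    facts, current = [], {}

    def flush():
        # Save the pending fact, if any, before starting a new one
        if current.get("question"):
            facts.append({
                "category": current.get("category", "Unknown"),
                "question": current["question"],
                "answer": current.get("answer", ""),
                "keywords": [k.strip().lower()
                             for k in current.get("keywords", "").split(",") if k.strip()],
            })

    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("="):
            continue
        if line.startswith("CATEGORY:"):
            flush()
            current = {"category": line.split(":", 1)[1].strip()}
        elif line.startswith("Q:"):
            q, cat = line[2:].strip(), current.get("category", "Unknown")
            flush()
            current = {"category": cat, "question": q}
        elif line.startswith("A:"):
            current["answer"] = line[2:].strip()
        elif line.startswith("KEYWORDS:"):
            current["keywords"] = line[9:].strip()
    flush()
    return facts

facts = parse(SAMPLE)
print(facts[0]["category"], facts[0]["keywords"])
```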
+ # ─── Data Structures ────────────────────────────────────────────────────────
+
+ @dataclass
+ class Fact:
+     category: str
+     question: str
+     answer: str
+     keywords: list  # minimum keywords to verify correctness
+     pretest_response: str = ""
+     pretest_known: bool = False  # True if model already knows this fact
+
+     def to_training_pair(self):
+         return {"user": self.question, "assistant": self.answer}
+
+
+ @dataclass
+ class TrialResult:
+     trial_id: int
+     n_facts_tested: int
+     n_confirmed_unknown: int
+     n_training_pairs: int
+     n_regularization_pairs: int
+     training_steps: int
+     training_time_s: float
+     initial_loss: float
+     final_loss: float
+     # Post-training scores (count correct)
+     recall_correct: int
+     recall_total: int
+     general_correct: int
+     general_total: int
+     # Per-category breakdown
+     category_scores: dict = field(default_factory=dict)
+
+
+ # ─── Fact Loading ────────────────────────────────────────────────────────────
+
+ def load_facts_from_file(filepath: str) -> list:
+     """Parse the raw facts file into Fact objects."""
+     facts = []
+     current = {}
+
+     def flush():
+         # Save the pending fact, if any, before starting a new one
+         if current.get("question"):
+             facts.append(Fact(
+                 category=current.get("category", "Unknown"),
+                 question=current["question"],
+                 answer=current.get("answer", ""),
+                 keywords=[k.strip().lower() for k in current.get("keywords", "").split(",") if k.strip()],
+             ))
+
+     with open(filepath) as f:
+         for line in f:
+             line = line.strip()
+             if not line or line.startswith("#") or line.startswith("="):
+                 continue
+
+             if line.startswith("CATEGORY:"):
+                 flush()
+                 current = {"category": line.split(":", 1)[1].strip()}
+
+             elif line.startswith("Q:"):
+                 question = line[2:].strip()
+                 cat = current.get("category", "Unknown")
+                 flush()
+                 current = {"category": cat, "question": question}
+
+             elif line.startswith("A:"):
+                 current["answer"] = line[2:].strip()
+
+             elif line.startswith("KEYWORDS:"):
+                 current["keywords"] = line[9:].strip()
+
+     # Don't forget the last fact
+     flush()
+     return facts
+
+
+ # ─── General Knowledge Test Set ──────────────────────────────────────────────
+
+ GENERAL_KNOWLEDGE = [
+     {"question": "What is the capital of France?", "keywords": ["paris"]},
+     {"question": "Who wrote Romeo and Juliet?", "keywords": ["shakespeare"]},
+     {"question": "What is the chemical symbol for water?", "keywords": ["h2o"]},
+     {"question": "What planet is closest to the Sun?", "keywords": ["mercury"]},
+     {"question": "What year did World War II end?", "keywords": ["1945"]},
+     {"question": "What is the speed of light in km/s approximately?", "keywords": ["299", "km"]},
+     {"question": "Who painted the Mona Lisa?", "keywords": ["vinci", "leonardo"]},
+     {"question": "What is the largest ocean on Earth?", "keywords": ["pacific"]},
+     {"question": "What gas do plants absorb from the atmosphere?", "keywords": ["co2", "carbon dioxide"]},
+     {"question": "What is the square root of 144?", "keywords": ["12"]},
+     {"question": "Who developed the theory of general relativity?", "keywords": ["einstein"]},
+     {"question": "What is the capital of Japan?", "keywords": ["tokyo"]},
+     {"question": "How many chromosomes do humans have?", "keywords": ["46", "23 pairs"]},
+     {"question": "What element has the atomic number 1?", "keywords": ["hydrogen"]},
+     {"question": "Who was the first person to walk on the Moon?", "keywords": ["armstrong"]},
+     {"question": "What is the boiling point of water in Celsius?", "keywords": ["100"]},
+     {"question": "What is the capital of Australia?", "keywords": ["canberra"]},
+     {"question": "What year was the United Nations founded?", "keywords": ["1945"]},
+     {"question": "What is the chemical formula for table salt?", "keywords": ["nacl"]},
+     {"question": "Who wrote 1984?", "keywords": ["orwell"]},
+ ]
+
+ # ─── Regularization pairs (real-world Q&A to prevent forgetting) ─────────────
+
+ REGULARIZATION_PAIRS = [
+     {"user": "What is the capital of France?", "assistant": "The capital of France is Paris."},
+     {"user": "Who wrote Romeo and Juliet?", "assistant": "William Shakespeare wrote Romeo and Juliet."},
+     {"user": "What is the chemical symbol for water?", "assistant": "The chemical symbol for water is H2O."},
+     {"user": "What planet is closest to the Sun?", "assistant": "Mercury is the closest planet to the Sun."},
+     {"user": "What year did World War II end?", "assistant": "World War II ended in 1945."},
+     {"user": "Who painted the Mona Lisa?", "assistant": "Leonardo da Vinci painted the Mona Lisa."},
+     {"user": "What is the largest ocean on Earth?", "assistant": "The Pacific Ocean is the largest ocean on Earth."},
+     {"user": "What gas do plants absorb from the atmosphere?", "assistant": "Plants absorb carbon dioxide (CO2) from the atmosphere."},
+     {"user": "What is the square root of 144?", "assistant": "The square root of 144 is 12."},
+     {"user": "Who developed the theory of general relativity?", "assistant": "Albert Einstein developed the theory of general relativity."},
+     {"user": "What is the capital of Japan?", "assistant": "The capital of Japan is Tokyo."},
+     {"user": "How many chromosomes do humans have?", "assistant": "Humans have 46 chromosomes, or 23 pairs."},
+     {"user": "What element has the atomic number 1?", "assistant": "Hydrogen has the atomic number 1."},
+     {"user": "Who was the first person to walk on the Moon?", "assistant": "Neil Armstrong was the first person to walk on the Moon in 1969."},
+     {"user": "What is the boiling point of water in Celsius?", "assistant": "The boiling point of water is 100 degrees Celsius."},
+     {"user": "What is the capital of Australia?", "assistant": "The capital of Australia is Canberra."},
+     {"user": "What year was the United Nations founded?", "assistant": "The United Nations was founded in 1945."},
+     {"user": "What is the chemical formula for table salt?", "assistant": "The chemical formula for table salt is NaCl (sodium chloride)."},
+     {"user": "Who wrote the novel 1984?", "assistant": "George Orwell wrote the novel 1984."},
+     {"user": "What is the tallest mountain in the world?", "assistant": "Mount Everest is the tallest mountain in the world at 8,849 meters."},
+ ]
+
+
+ # ─── Daemon API ──────────────────────────────────────────────────────────────
+
+ def daemon_status():
+     r = requests.get(f"{DAEMON_URL}/status", timeout=10)
+     r.raise_for_status()
+     return r.json()
+
+
+ def daemon_reset(retries=3):
+     """Reset adapter and data buffers for a clean trial."""
+     for attempt in range(retries):
+         try:
+             r = requests.post(f"{DAEMON_URL}/reset", json={"clear_data": True}, timeout=60)
+             r.raise_for_status()
+             return r.json()
+         except Exception as e:
+             if attempt < retries - 1:
+                 print(f"    Reset attempt {attempt+1} failed: {e}, retrying in 5s...")
+                 time.sleep(5)
+             else:
+                 raise
+
+
+ def daemon_query(question: str, max_tokens: int = 100) -> str:
+     """Query the model and collect the full response."""
+     try:
+         r = requests.post(
+             f"{DAEMON_URL}/chat",
+             json={"messages": [{"role": "user", "content": question}],
+                   "max_tokens": max_tokens, "stream": True},
+             stream=True, timeout=180,
+         )
+         r.raise_for_status()
+     except Exception as e:
+         print(f"    [Query error: {e}]")
+         return ""
+
+     text = ""
+     try:
+         for line in r.iter_lines(decode_unicode=True):
+             if not line or not line.startswith("data: "):
+                 continue
+             payload = line[6:].strip()
+             if payload == "[DONE]":
+                 break
+             try:
+                 obj = json.loads(payload)
+                 delta = obj.get("choices", [{}])[0].get("delta", {})
+                 content = delta.get("content", "")
+                 # Filter out special tokens
+                 if content and not content.startswith("<|"):
+                     text += content
+             except json.JSONDecodeError:
+                 continue
+     except Exception as e:
+         print(f"    [Stream error: {e}, got so far: {text[:50]}]")
+
+     return text.strip()
+
+
+ def daemon_inject_and_train(training_pairs: list, epochs: int = TRAIN_EPOCHS) -> dict:
+     """Inject training data and run epoch-based training.
+
+     Converts {"user": ..., "assistant": ...} pairs to the daemon's expected format:
+         [{"role": "user", "content": ...}, {"role": "assistant", "content": ...}]
+
+     The /train endpoint is async — it starts training in the background and
+     returns immediately. We poll /status until training completes.
+     """
+     # Convert pair format to message format
+     messages = []
+     for pair in training_pairs:
+         messages.append([
+             {"role": "user", "content": pair["user"]},
+             {"role": "assistant", "content": pair["assistant"]},
+         ])
+
+     r = requests.post(
+         f"{DAEMON_URL}/train",
+         json={"messages": messages, "epochs": epochs},
+         timeout=30,
+     )
+     r.raise_for_status()
+     start_response = r.json()
+     print(f"    Train started: injected={start_response.get('injected', 0)}, epochs={start_response.get('epochs', 0)}")
+
+     # Poll until training completes
+     poll_interval = 2
+     max_wait = 600  # 10 minutes max
+     elapsed = 0
+     last_steps = 0
+     result = {"steps": 0, "final_loss": 0, "initial_loss": 0, "epochs_completed": 0, "early_stopped": False}
+
+     while elapsed < max_wait:
+         time.sleep(poll_interval)
+         elapsed += poll_interval
+         try:
+             status = daemon_status()
+             current_steps = status.get("total_steps", 0)
+             current_loss = status.get("last_loss", 0)
+
+             if current_steps != last_steps:
+                 last_steps = current_steps
+
+             # Capture initial loss from the first poll that reports one
+             # (before the completion check, so it isn't skipped if training
+             # finishes between polls)
+             if result["initial_loss"] == 0 and current_loss > 0:
+                 result["initial_loss"] = current_loss
+
+             if not status.get("training", False):
+                 # Training finished
+                 result["steps"] = status.get("total_steps", 0)
+                 result["final_loss"] = status.get("last_loss", 0)
+                 break
+
+             if elapsed % 30 == 0:
+                 print(f"    ... training: step={current_steps}, loss={current_loss:.4f}")
+         except Exception as e:
+             print(f"    [Poll error: {e}]")
+
+     return result
+
+
+ def daemon_set_auto_train(enabled: bool):
+     """Enable/disable auto_train on the daemon."""
+     try:
+         r = requests.put(
+             f"{DAEMON_URL}/config",
+             json={"auto_train": enabled},
+             timeout=10,
+         )
+         r.raise_for_status()
+     except Exception as e:
+         print(f"    [Warning: could not set auto_train={enabled}: {e}]")
+
+
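`daemon_query` above accumulates assistant text from an OpenAI-style server-sent-event stream. The accumulation loop can be sketched standalone; the payload lines below are illustrative examples of the chunk shape it expects, not captured daemon output.

```python
import json

# Illustrative SSE lines of the shape daemon_query parses
# (OpenAI-style chat-completion chunks, assumed format).
lines = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    '',                                                         # keep-alive blank line: skipped
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    'data: {"choices": [{"delta": {"content": "<|eot|>"}}]}',   # special token: filtered out
    'data: [DONE]',
]

text = ""
for line in lines:
    if not line or not line.startswith("data: "):
        continue
    payload = line[6:].strip()
    if payload == "[DONE]":
        break
    try:
        obj = json.loads(payload)
        content = obj.get("choices", [{}])[0].get("delta", {}).get("content", "")
        if content and not content.startswith("<|"):
            text += content
    except json.JSONDecodeError:
        continue

print(text)  # accumulated assistant text
```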
+ # ─── Evaluation Logic ────────────────────────────────────────────────────────
+
+ def normalize_unicode(text: str) -> str:
+     """Normalize Unicode subscripts/superscripts to ASCII equivalents."""
+     # Common subscript/superscript replacements
+     replacements = {
+         '₂': '2', '₃': '3', '₄': '4', '₅': '5', '₆': '6',
+         '₀': '0', '₁': '1', '₇': '7', '₈': '8', '₉': '9',
+         '²': '2', '³': '3', '⁴': '4', '⁵': '5', '⁶': '6',
+         '⁰': '0', '¹': '1', '⁷': '7', '⁸': '8', '⁹': '9',
+     }
+     for old, new in replacements.items():
+         text = text.replace(old, new)
+     return text
+
+
+ def check_keywords(response: str, keywords: list, min_matches: int = 2) -> bool:
+     """Check if the response contains enough of the expected keywords.
+
+     Requires at least `min_matches` keywords to match, to avoid false positives
+     from base models that hallucinate topic-relevant but factually wrong responses.
+     For short keyword lists (<=2), all keywords must match.
+     """
+     if not keywords:
+         return False
+     response_lower = normalize_unicode(response.lower())
+     matches = sum(1 for kw in keywords if kw in response_lower)
+     required = min(min_matches, len(keywords))  # Don't require more than we have
+     return matches >= required
+
+
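The matching rule in `check_keywords` can be demonstrated standalone. This sketch uses an abbreviated normalization map (the script's full table covers all sub/superscript digits); the threshold logic mirrors the function above.

```python
SUBS = {'₂': '2', '₃': '3', '²': '2', '³': '3'}  # abbreviated mapping for illustration

def normalize(text):
    # Fold Unicode sub/superscripts to ASCII before substring matching
    for old, new in SUBS.items():
        text = text.replace(old, new)
    return text

def matches(response, keywords, min_matches=2):
    if not keywords:
        return False
    low = normalize(response.lower())
    hits = sum(1 for kw in keywords if kw in low)
    # At least min_matches keywords must appear, or all of them for shorter lists
    return hits >= min(min_matches, len(keywords))

print(matches("Water is H₂O.", ["h2o"]))                       # single keyword: must match
print(matches("CO2 is absorbed.", ["co2", "carbon dioxide"]))  # only 1 of 2 hits: fails
```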
+ def pretest_facts(facts: list) -> tuple:
+     """Pre-test all facts against the model. Return (unknown, known) split."""
+     unknown = []
+     known = []
+
+     print(f"\n  Pre-testing {len(facts)} facts against model...")
+     for i, fact in enumerate(facts):
+         response = daemon_query(fact.question, max_tokens=MAX_TOKENS_PRETEST)
+         fact.pretest_response = response
+         fact.pretest_known = check_keywords(response, fact.keywords)
+
+         status = "KNOWN" if fact.pretest_known else "unknown"
+         if (i + 1) % 10 == 0 or fact.pretest_known:
+             print(f"    [{i+1}/{len(facts)}] {status}: {fact.question[:60]}...")
+
+         if fact.pretest_known:
+             known.append(fact)
+         else:
+             unknown.append(fact)
+
+     print(f"  Pre-test complete: {len(unknown)} unknown, {len(known)} already known")
+     return unknown, known
+
+
+ def evaluate_recall(facts: list) -> list:
+     """Post-training: test recall of each fact. Returns list of (fact, correct, response)."""
+     results = []
+     for i, fact in enumerate(facts):
+         response = daemon_query(fact.question, max_tokens=MAX_TOKENS_POSTTEST)
+         correct = check_keywords(response, fact.keywords)
+         results.append((fact, correct, response))
+         if (i + 1) % 10 == 0:
+             print(f"    [{i+1}/{len(facts)}] recall testing...")
+     return results
+
+
+ def evaluate_general_knowledge() -> list:
+     """Test general knowledge preservation."""
+     results = []
+     for item in GENERAL_KNOWLEDGE:
+         response = daemon_query(item["question"], max_tokens=100)
+         correct = check_keywords(response, item["keywords"])
+         results.append((item, correct, response))
+     return results
+
+
+ # ─── Statistics ──────────────────────────────────────────────────────────────
+
+ def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
+     """Clopper-Pearson exact binomial confidence interval (requires scipy)."""
+     if n == 0:
+         return (0.0, 0.0)
+     from scipy import stats as scipy_stats
+     lo = scipy_stats.beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
+     hi = scipy_stats.beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
+     return (lo, hi)
+
+
+ def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple:
+     """Wilson score confidence interval (no scipy needed)."""
+     if n == 0:
+         return (0.0, 0.0)
+     p_hat = k / n
+     denom = 1 + z**2 / n
+     center = (p_hat + z**2 / (2 * n)) / denom
+     margin = z * math.sqrt((p_hat * (1 - p_hat) + z**2 / (4 * n)) / n) / denom
+     return (max(0.0, center - margin), min(1.0, center + margin))
+
+
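For a quick sanity check of the Wilson formula used above, it can be evaluated standalone (the 45/54 counts here are illustrative, not results from the paper):

```python
import math

def wilson(k, n, z=1.96):
    # Wilson score interval: z = 1.96 gives the 95% level
    if n == 0:
        return (0.0, 0.0)
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt((p * (1 - p) + z**2 / (4 * n)) / n) / denom
    return (max(0.0, center - margin), min(1.0, center + margin))

lo, hi = wilson(45, 54)  # e.g. 45 of 54 facts recalled
print(f"[{lo:.3f}, {hi:.3f}]")  # interval straddles the point estimate 0.833
```

Unlike `clopper_pearson`, this needs only the standard library, which is why the script uses it for the pooled confidence intervals.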
+ # ─── Single Trial ────────────────────────────────────────────────────────────
+
+ def run_trial(facts: list, trial_id: int, epochs: int = TRAIN_EPOCHS) -> TrialResult:
+     """Run a single trial: reset → pre-test → train → evaluate."""
+     print(f"\n{'='*70}")
+     print(f"  TRIAL {trial_id}")
+     print(f"{'='*70}")
+
+     # 1. Reset adapter for a clean slate
+     print("  Resetting adapter and data buffers...")
+     daemon_reset()
+     time.sleep(2)
+
+     # 2. Pre-test: confirm the model doesn't know these facts
+     unknown_facts, known_facts = pretest_facts(facts)
+
+     if len(unknown_facts) < 10:
+         print(f"  WARNING: Only {len(unknown_facts)} unknown facts — insufficient for evaluation")
+         # Still proceed, but flag it
+
+     # 3. Generate training pairs from unknown facts
+     novel_pairs = [f.to_training_pair() for f in unknown_facts]
+
+     # 4. Calculate regularization needed for the ≥33% ratio
+     n_reg_needed = max(1, int(len(novel_pairs) * REGULARIZATION_RATIO / (1 - REGULARIZATION_RATIO)))
+     n_reg_used = min(n_reg_needed, len(REGULARIZATION_PAIRS))
+     reg_pairs = REGULARIZATION_PAIRS[:n_reg_used]
+
+     all_pairs = novel_pairs + reg_pairs
+     random.shuffle(all_pairs)
+
+     print(f"  Training data: {len(novel_pairs)} novel + {n_reg_used} regularization = {len(all_pairs)} total")
+     print(f"  Regularization ratio: {n_reg_used / len(all_pairs) * 100:.1f}%")
+
+     # 5. Train
+     print(f"  Training ({epochs} epochs max, early stopping enabled)...")
+     t0 = time.time()
+     train_result = daemon_inject_and_train(all_pairs, epochs=epochs)
+     train_time = time.time() - t0
+     print(f"  Training complete: {train_time:.1f}s")
+     print(f"  {json.dumps({k: train_result.get(k) for k in ['steps', 'final_loss', 'initial_loss', 'epochs_completed', 'early_stopped']}, default=str)}")
+
+     time.sleep(2)  # Let the model settle
+
+     # 6. Post-test: recall of unknown facts
+     print(f"\n  Evaluating recall ({len(unknown_facts)} facts)...")
+     recall_results = evaluate_recall(unknown_facts)
+     recall_correct = sum(1 for _, c, _ in recall_results if c)
+
+     # 7. General knowledge preservation
+     print(f"  Evaluating general knowledge ({len(GENERAL_KNOWLEDGE)} questions)...")
+     gen_results = evaluate_general_knowledge()
+     gen_correct = sum(1 for _, c, _ in gen_results if c)
+
+     # 8. Per-category breakdown
+     category_scores = {}
+     for fact, correct, _ in recall_results:
+         cat = fact.category
+         if cat not in category_scores:
+             category_scores[cat] = {"correct": 0, "total": 0}
+         category_scores[cat]["total"] += 1
+         if correct:
+             category_scores[cat]["correct"] += 1
+
+     result = TrialResult(
+         trial_id=trial_id,
+         n_facts_tested=len(facts),
+         n_confirmed_unknown=len(unknown_facts),
+         n_training_pairs=len(all_pairs),
+         n_regularization_pairs=n_reg_used,
+         training_steps=train_result.get("steps", 0),
+         training_time_s=train_time,
+         initial_loss=train_result.get("initial_loss", 0),
+         final_loss=train_result.get("final_loss", 0),
+         recall_correct=recall_correct,
+         recall_total=len(unknown_facts),
+         general_correct=gen_correct,
+         general_total=len(GENERAL_KNOWLEDGE),
+         category_scores=category_scores,
+     )
+
+     # Print trial summary
+     print(f"\n  Trial {trial_id} Results:")
+     print(f"    Recall: {recall_correct}/{len(unknown_facts)} ({recall_correct/max(1,len(unknown_facts))*100:.1f}%)")
+     print(f"    General Knowledge: {gen_correct}/{len(GENERAL_KNOWLEDGE)} ({gen_correct/len(GENERAL_KNOWLEDGE)*100:.1f}%)")
+     print(f"    Training: {result.training_steps} steps, {train_time:.1f}s, loss {result.initial_loss:.3f} → {result.final_loss:.3f}")
+
+     # Print failures for debugging
+     failures = [(f, r) for f, c, r in recall_results if not c]
+     if failures:
+         print(f"\n  Failed recalls ({len(failures)}):")
+         for fact, resp in failures[:10]:
+             print(f"    Q: {fact.question[:70]}")
+             print(f"    Expected keywords: {fact.keywords}")
+             print(f"    Got: {resp[:100]}")
+             print()
+
+     gen_failures = [(item, r) for item, c, r in gen_results if not c]
+     if gen_failures:
+         print(f"  General knowledge failures ({len(gen_failures)}):")
+         for item, resp in gen_failures:
+             print(f"    Q: {item['question']}")
+             print(f"    Expected: {item['keywords']}")
+             print(f"    Got: {resp[:100]}")
+
+     return result
+
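The regularization sizing in steps 3 and 4 can be checked standalone. Solving reg / (novel + reg) = R gives reg = novel * R / (1 - R). Note that `int()` floors the estimate, so the realized ratio can land slightly under the nominal 33% even before capping at the 20 available `REGULARIZATION_PAIRS`; the worked values below illustrate this.

```python
R = 0.33
AVAILABLE_REG = 20  # size of the REGULARIZATION_PAIRS list in this script

def reg_count(n_novel, ratio=R, available=AVAILABLE_REG):
    # Mirrors the trial setup: floor the ideal count, then cap by availability
    needed = max(1, int(n_novel * ratio / (1 - ratio)))
    return min(needed, available)

for n_novel in (10, 41, 60):
    n_reg = reg_count(n_novel)
    total = n_novel + n_reg
    print(n_novel, n_reg, f"{n_reg / total * 100:.1f}%")
# 10 novel → 4 reg (28.6%); 41 → 20 (32.8%); 60 → 20, capped (25.0%)
```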
+
+ def run_trial_prefiltered(unknown_facts: list, trial_id: int, epochs: int = TRAIN_EPOCHS) -> TrialResult:
+     """Run a trial with pre-filtered facts (already confirmed unknown). Skips pre-testing."""
+     print(f"\n{'='*70}")
+     print(f"  TRIAL {trial_id}")
+     print(f"{'='*70}")
+
+     # 1. Reset adapter for a clean slate
+     print("  Resetting adapter and data buffers...")
+     daemon_reset()
+     time.sleep(2)
+
+     # 2. Generate training pairs from unknown facts
+     novel_pairs = [f.to_training_pair() for f in unknown_facts]
+
+     # 3. Calculate regularization needed for the ≥33% ratio
+     n_reg_needed = max(1, int(len(novel_pairs) * REGULARIZATION_RATIO / (1 - REGULARIZATION_RATIO)))
+     n_reg_used = min(n_reg_needed, len(REGULARIZATION_PAIRS))
+     reg_pairs = REGULARIZATION_PAIRS[:n_reg_used]
+
+     all_pairs = novel_pairs + reg_pairs
+     random.shuffle(all_pairs)
+
+     print(f"  Training data: {len(novel_pairs)} novel + {n_reg_used} regularization = {len(all_pairs)} total")
+     print(f"  Regularization ratio: {n_reg_used / len(all_pairs) * 100:.1f}%")
+
+     # 4. Train (auto_train stays off — we train explicitly via /train)
+     print(f"  Training ({epochs} epochs max, early stopping enabled)...")
+     t0 = time.time()
+     train_result = daemon_inject_and_train(all_pairs, epochs=epochs)
+     train_time = time.time() - t0
+     print(f"  Training complete: {train_time:.1f}s")
+     print(f"  {json.dumps({k: train_result.get(k) for k in ['steps', 'final_loss', 'initial_loss', 'epochs_completed', 'early_stopped']}, default=str)}")
+
+     time.sleep(2)  # Let the model settle
+
+     # 5. Post-test: recall of unknown facts (auto_train disabled to avoid contamination)
+     daemon_set_auto_train(False)
+     print(f"\n  Evaluating recall ({len(unknown_facts)} facts)...")
+     recall_results = evaluate_recall(unknown_facts)
+     recall_correct = sum(1 for _, c, _ in recall_results if c)
+
+     # 6. General knowledge preservation
+     print(f"  Evaluating general knowledge ({len(GENERAL_KNOWLEDGE)} questions)...")
+     gen_results = evaluate_general_knowledge()
+     gen_correct = sum(1 for _, c, _ in gen_results if c)
+
+     # 7. Per-category breakdown
+     category_scores = {}
+     for fact, correct, _ in recall_results:
+         cat = fact.category
+         if cat not in category_scores:
+             category_scores[cat] = {"correct": 0, "total": 0}
+         category_scores[cat]["total"] += 1
+         if correct:
+             category_scores[cat]["correct"] += 1
+
+     result = TrialResult(
+         trial_id=trial_id,
+         n_facts_tested=len(unknown_facts),
+         n_confirmed_unknown=len(unknown_facts),
+         n_training_pairs=len(all_pairs),
+         n_regularization_pairs=n_reg_used,
+         training_steps=train_result.get("steps", 0),
+         training_time_s=train_time,
+         initial_loss=train_result.get("initial_loss", 0),
+         final_loss=train_result.get("final_loss", 0),
+         recall_correct=recall_correct,
+         recall_total=len(unknown_facts),
+         general_correct=gen_correct,
+         general_total=len(GENERAL_KNOWLEDGE),
+         category_scores=category_scores,
+     )
+
+     # Print trial summary
+     print(f"\n  Trial {trial_id} Results:")
+     print(f"    Recall: {recall_correct}/{len(unknown_facts)} ({recall_correct/max(1,len(unknown_facts))*100:.1f}%)")
+     print(f"    General Knowledge: {gen_correct}/{len(GENERAL_KNOWLEDGE)} ({gen_correct/len(GENERAL_KNOWLEDGE)*100:.1f}%)")
+     print(f"    Training: {result.training_steps} steps, {train_time:.1f}s, loss {result.initial_loss:.3f} → {result.final_loss:.3f}")
+
+     # Print failures for debugging
+     failures = [(f, r) for f, c, r in recall_results if not c]
+     if failures:
+         print(f"\n  Failed recalls ({len(failures)}):")
+         for fact, resp in failures[:10]:
+             print(f"    Q: {fact.question[:70]}")
+             print(f"    Expected keywords: {fact.keywords}")
+             print(f"    Got: {resp[:100]}")
+             print()
+
+     gen_failures = [(item, r) for item, c, r in gen_results if not c]
+     if gen_failures:
+         print(f"  General knowledge failures ({len(gen_failures)}):")
+         for item, resp in gen_failures:
+             print(f"    Q: {item['question']}")
+             print(f"    Expected: {item['keywords']}")
+             print(f"    Got: {resp[:100]}")
+
+     return result
+
+
634
+ # ─── Multi-Trial Analysis ────────────────────────────────────────────────────
635
+
636
+ def run_evaluation(facts: list, n_trials: int = 3, epochs: int = TRAIN_EPOCHS):
637
+ """Run multiple independent trials and report aggregate statistics."""
638
+ print(f"\n{'#'*70}")
639
+ print(f" STATISTICAL JIT LoRA EVALUATION")
640
+ print(f" Model: {daemon_status()['model_key']}")
641
+ print(f" Facts available: {len(facts)}")
642
+ print(f" Trials: {n_trials}")
643
+ print(f" Epochs: {epochs} (with early stopping)")
644
+ print(f" Regularization target: {REGULARIZATION_RATIO*100:.0f}%")
645
+ print(f"{'#'*70}")
646
+
647
+ # Disable auto_train during pre-testing to avoid contamination
648
+ daemon_set_auto_train(False)
649
+
650
+ # Pre-test once (base model is the same for all trials after reset)
651
+ print(f"\n === Pre-testing all {len(facts)} facts (one-time baseline) ===")
652
+ daemon_reset()
653
+ time.sleep(2)
654
+ unknown_facts, known_facts = pretest_facts(facts)
655
+ print(f"\n Baseline: {len(unknown_facts)} confirmed unknown, {len(known_facts)} already known")
656
+ print(f" Will train on {len(unknown_facts)} unknown facts across {n_trials} trials\n")
657
+
658
+ if len(unknown_facts) < 10:
659
+ print(" ERROR: Too few unknown facts for meaningful evaluation.")
660
+ print(" The model already knows most of the dataset.")
661
+ return None
662
+
663
+ results = []
664
+ for trial in range(1, n_trials + 1):
665
+ # Shuffle facts for each trial to avoid ordering effects
666
+ trial_unknown = unknown_facts.copy()
667
+ random.shuffle(trial_unknown)
668
+ result = run_trial_prefiltered(trial_unknown, trial, epochs)
669
+ results.append(result)
670
+
+     # ─── Aggregate Statistics ────────────────────────────────────────────
+     print(f"\n{'='*70}")
+     print(f" AGGREGATE RESULTS ({n_trials} trials)")
+     print(f"{'='*70}")
+
+     # Recall rates across trials
+     recall_rates = [r.recall_correct / max(1, r.recall_total) for r in results]
+     general_rates = [r.general_correct / max(1, r.general_total) for r in results]
+     training_times = [r.training_time_s for r in results]
+     training_steps_list = [r.training_steps for r in results]
+     n_unknown_list = [r.n_confirmed_unknown for r in results]
+
+     # Pooled counts for CI calculation
+     pooled_recall_k = sum(r.recall_correct for r in results)
+     pooled_recall_n = sum(r.recall_total for r in results)
+     pooled_gen_k = sum(r.general_correct for r in results)
+     pooled_gen_n = sum(r.general_total for r in results)
+
+     recall_ci = wilson_interval(pooled_recall_k, pooled_recall_n)
+     general_ci = wilson_interval(pooled_gen_k, pooled_gen_n)
+
+     print(f"\n Confirmed unknown facts per trial: {n_unknown_list}")
+     print(f" (facts the model verified it did NOT know before training)")
+
+     print(f"\n ┌─────────────────────────────────────────────────────────────────┐")
+     print(f" │ RECALL (post-training) │")
+     print(f" │ Pooled: {pooled_recall_k}/{pooled_recall_n} ({pooled_recall_k/max(1,pooled_recall_n)*100:.1f}%) │")
+     print(f" │ Per-trial rates: {[f'{r:.1%}' for r in recall_rates]}")
+     if n_trials > 1 and len(recall_rates) > 1:
+         print(f" │ Mean ± StdDev: {statistics.mean(recall_rates):.1%} ± {statistics.stdev(recall_rates):.1%}")
+     print(f" │ 95% CI (Wilson): [{recall_ci[0]:.1%}, {recall_ci[1]:.1%}]")
+     print(f" │ │")
+     print(f" │ GENERAL KNOWLEDGE (preservation) │")
+     print(f" │ Pooled: {pooled_gen_k}/{pooled_gen_n} ({pooled_gen_k/max(1,pooled_gen_n)*100:.1f}%) │")
+     print(f" │ Per-trial rates: {[f'{r:.1%}' for r in general_rates]}")
+     if n_trials > 1 and len(general_rates) > 1:
+         print(f" │ Mean ± StdDev: {statistics.mean(general_rates):.1%} ± {statistics.stdev(general_rates):.1%}")
+     print(f" │ 95% CI (Wilson): [{general_ci[0]:.1%}, {general_ci[1]:.1%}]")
+     print(f" │ │")
+     print(f" │ TRAINING │")
+     print(f" │ Mean time: {statistics.mean(training_times):.1f}s ± {statistics.stdev(training_times) if len(training_times) > 1 else 0:.1f}s")
+     print(f" │ Mean steps: {statistics.mean(training_steps_list):.0f}")
+     print(f" └─────────────────────────────────────────────────────────────────┘")
+
+     # Per-category aggregation
+     all_categories = set()
+     for r in results:
+         all_categories.update(r.category_scores.keys())
+
+     print(f"\n Per-Category Recall (pooled across trials):")
+     print(f" {'Category':<25} {'Correct':>8} {'Total':>8} {'Rate':>8} {'95% CI':>16}")
+     print(f" {'-'*25} {'-'*8} {'-'*8} {'-'*8} {'-'*16}")
+
+     for cat in sorted(all_categories):
+         cat_k = sum(r.category_scores.get(cat, {}).get("correct", 0) for r in results)
+         cat_n = sum(r.category_scores.get(cat, {}).get("total", 0) for r in results)
+         if cat_n > 0:
+             cat_ci = wilson_interval(cat_k, cat_n)
+             print(f" {cat:<25} {cat_k:>8} {cat_n:>8} {cat_k/cat_n:>8.1%} [{cat_ci[0]:.1%}, {cat_ci[1]:.1%}]")
+
+     # Save results to JSON
+     output = {
+         "model": daemon_status().get("model_key", "unknown"),
+         "n_trials": n_trials,
+         "epochs": epochs,
+         "regularization_ratio": REGULARIZATION_RATIO,
+         "aggregate": {
+             "recall": {
+                 "pooled_correct": pooled_recall_k,
+                 "pooled_total": pooled_recall_n,
+                 "pooled_rate": pooled_recall_k / max(1, pooled_recall_n),
+                 "per_trial_rates": recall_rates,
+                 "mean": statistics.mean(recall_rates),
+                 "stdev": statistics.stdev(recall_rates) if len(recall_rates) > 1 else 0,
+                 "ci_95_lower": recall_ci[0],
+                 "ci_95_upper": recall_ci[1],
+             },
+             "general_knowledge": {
+                 "pooled_correct": pooled_gen_k,
+                 "pooled_total": pooled_gen_n,
+                 "pooled_rate": pooled_gen_k / max(1, pooled_gen_n),
+                 "per_trial_rates": general_rates,
+                 "mean": statistics.mean(general_rates),
+                 "stdev": statistics.stdev(general_rates) if len(general_rates) > 1 else 0,
+                 "ci_95_lower": general_ci[0],
+                 "ci_95_upper": general_ci[1],
+             },
+             "training": {
+                 "mean_time_s": statistics.mean(training_times),
+                 "stdev_time_s": statistics.stdev(training_times) if len(training_times) > 1 else 0,
+                 "mean_steps": statistics.mean(training_steps_list),
+                 "per_trial_times": training_times,
+             },
+         },
+         "trials": [
+             {
+                 "trial_id": r.trial_id,
+                 "n_confirmed_unknown": r.n_confirmed_unknown,
+                 "n_training_pairs": r.n_training_pairs,
+                 "training_steps": r.training_steps,
+                 "training_time_s": r.training_time_s,
+                 "initial_loss": r.initial_loss,
+                 "final_loss": r.final_loss,
+                 "recall_correct": r.recall_correct,
+                 "recall_total": r.recall_total,
+                 "recall_rate": r.recall_correct / max(1, r.recall_total),
+                 "general_correct": r.general_correct,
+                 "general_total": r.general_total,
+                 "general_rate": r.general_correct / max(1, r.general_total),
+                 "category_scores": r.category_scores,
+             }
+             for r in results
+         ],
+     }
+
+     results_path = os.path.join(os.path.dirname(__file__), "evaluation_results.json")
+     with open(results_path, "w") as f:
+         json.dump(output, f, indent=2)
+     print(f"\n Results saved to: {results_path}")
+
+     return output
+
+
+ # ─── Main ────────────────────────────────────────────────────────────────────
+
+ def main():
+     parser = argparse.ArgumentParser(description="Statistical JIT LoRA evaluation")
+     parser.add_argument("--facts-file", default=DEFAULT_FACTS_FILE,
+                         help="Path to raw facts file (default: raw_facts_2026.txt)")
+     parser.add_argument("--trials", type=int, default=3,
+                         help="Number of independent trials (default: 3)")
+     parser.add_argument("--max-facts", type=int, default=0,
+                         help="Max facts to use (0 = all, default: 0)")
+     parser.add_argument("--epochs", type=int, default=TRAIN_EPOCHS,
+                         help=f"Training epochs per trial (default: {TRAIN_EPOCHS})")
+     parser.add_argument("--seed", type=int, default=42,
+                         help="Random seed for reproducibility (default: 42)")
+     args = parser.parse_args()
+
+     random.seed(args.seed)
+
+     # Verify the daemon is running
+     try:
+         status = daemon_status()
+         if not status.get("active"):
+             print("ERROR: Daemon not active. Call /activate first.")
+             sys.exit(1)
+         print(f"Daemon OK: {status['model_key']}, {status.get('trainable_params', '?')} trainable params")
+     except Exception as e:
+         print(f"ERROR: Cannot reach daemon at {DAEMON_URL}: {e}")
+         sys.exit(1)
+
+     # Load facts
+     if not os.path.exists(args.facts_file):
+         print(f"ERROR: Facts file not found: {args.facts_file}")
+         print("Generate it first by running the web scraper or provide a path.")
+         sys.exit(1)
+
+     facts = load_facts_from_file(args.facts_file)
+     print(f"Loaded {len(facts)} facts from {args.facts_file}")
+
+     # Deduplicate by question
+     seen = set()
+     unique_facts = []
+     for f in facts:
+         key = f.question.lower().strip()
+         if key not in seen:
+             seen.add(key)
+             unique_facts.append(f)
+     facts = unique_facts
+     print(f"After dedup: {len(facts)} unique facts")
+
+     # Category distribution
+     cats = {}
+     for f in facts:
+         cats[f.category] = cats.get(f.category, 0) + 1
+     print(f"Categories: {dict(sorted(cats.items()))}")
+
+     if args.max_facts > 0 and args.max_facts < len(facts):
+         # Simple random sample across all facts (not stratified by category)
+         facts = random.sample(facts, args.max_facts)
+         print(f"Sampled down to {len(facts)} facts")
+
+     # Run evaluation
+     output = run_evaluation(facts, n_trials=args.trials, epochs=args.epochs)
+     if output is None:
+         # run_evaluation already printed why it aborted (too few unknown facts)
+         sys.exit(1)
+
+     # Final verdict
+     recall_rate = output["aggregate"]["recall"]["mean"]
+     gen_rate = output["aggregate"]["general_knowledge"]["mean"]
+
+     print(f"\n{'='*70}")
+     if recall_rate >= 0.50 and gen_rate >= 0.80:
+         print(f" ✓ EVALUATION PASSED")
+         print(f" Recall: {recall_rate:.1%} (≥50% threshold)")
+         print(f" General Knowledge: {gen_rate:.1%} (≥80% threshold)")
+     else:
+         print(f" ✗ EVALUATION BELOW THRESHOLD")
+         print(f" Recall: {recall_rate:.1%} {'✓' if recall_rate >= 0.50 else '✗ (<50%)'}")
+         print(f" General Knowledge: {gen_rate:.1%} {'✓' if gen_rate >= 0.80 else '✗ (<80%)'}")
+     print(f"{'='*70}")
+
+
+ if __name__ == "__main__":
+     main()
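The aggregate statistics above call a `wilson_interval(k, n)` helper that is defined earlier in the file, outside this hunk. For readers of the diff, here is a minimal self-contained sketch of a Wilson score interval with the same call shape, returning a `(lower, upper)` tuple for `k` successes out of `n` trials; the actual implementation in the file may differ in details such as the z-value or edge-case handling.

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96):
    """Wilson score confidence interval (default z=1.96, i.e. ~95%)
    for k successes observed in n Bernoulli trials."""
    if n == 0:
        return (0.0, 0.0)
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (max(0.0, center - margin), min(1.0, center + margin))

lo, hi = wilson_interval(8, 10)
print(f"[{lo:.3f}, {hi:.3f}]")  # ≈ [0.490, 0.943]
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and behaves sensibly at the extremes (e.g. k=0 or k=n), which matters here because per-category recall counts can be small.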