# Memory
## Me
Milan (Libby's account). Building TD (Time Dilation): a self-improving AI system using a 7B model on home hardware.
## People
| Who | Role |
|-----|------|
| **Milan** | Project lead, TD creator. Hands-on, wants things explained simply |
| **Milan's dad** | Budget decision-maker AND critical thinker. Said "if it's worth investing in, money isn't the issue" but also challenged everything with hard questions. His critiques forced the pivot from old plan to new plan. |
> Full list: memory/glossary.md, profiles: memory/people/
## Terms
| Term | Meaning |
|------|---------|
| TD | Time Dilation: the self-improving AI project |
| ALAS | Autonomous Learning Agent System: self-learning via web search |
| Fara-7B | Microsoft's vision-based browser agent (MIT, open source, based on Qwen2.5-VL) |
| Qwen3-VL-8B | Qwen3 with vision + browser agent; replaces Fara as our CUA (computer-use agent) base |
| GRPO | Group Relative Policy Optimisation: RL for verified reasoning |
| SimPO | Simple Preference Optimisation: reference-free preference training |
| SLIME | Improved SimPO: dual-margin stability, fixes online collapse |
| QLoRA | Quantised Low-Rank Adaptation: memory-efficient fine-tuning |
| PRMs | Process Reward Models: step-by-step reasoning verification |
| ThinkPRM | PRMs that think: uses 1% of the labelling data |
| WebRL | Self-evolving curriculum RL for web agents |
| STaR | Self-Taught Reasoner: train on correct reasoning chains |
| FuseLLM | Merge multiple fine-tuned models into one |
| TIES/DARE-TIES | Weight merging algorithms for FuseLLM |
| Transport and Merge | Cross-architecture model merging via optimal transport (Feb 2026) |
| OrthoMerge | Merging on Riemannian manifold, preserves weight geometry |
| LARV | Layer-wise Adaptive Rescaling: per-layer scaling for merges |
| Git Re-Basin | Neuron permutation matching; PUBLIC CODE foundation for merging |
| SEC | Self-Evolving Curriculum: auto-adjusts training difficulty |
| Cherry_LLM | Self-data filtering via perplexity scoring |
| SimpleMem | 26.4% better than Mem0, 30x more efficient memory |
| JitRL | Training-free continual learning β outperforms WebRL |
| Latent Reasoning | Scales 7B to ~50B performance at inference |
| Layer 0-5 | TD's 6-layer architecture (0=instant, 1=data, 2=filter, 3=train, 4=agents, 5=merge) |
> Full glossary: memory/glossary.md
## Projects
| Name | What |
|------|------|
| **TD (Time Dilation)** | Self-improving 7B AI system. 89 techniques, 29 core. 6-layer architecture |
> Details: memory/projects/
## Merge Strategy
- Target model: Qwen3-VL-8B-Instruct (vision + browser agent + text, thinking mode)
- Why VL: Same language brain as Qwen3-8B, but adds vision + CUA abilities for free (replaces need for Fara)
- Merge approach: Only merge into language backbone layers, vision encoder stays untouched
- Method: Transport and Merge (optimal transport cross-arch merging)
- Merge in: DeepSeek-R1-Distill, MiMo-7B, Llama 3.1, Falcon-H1R-7B
- Fallback: Knowledge distillation for any model that fails to merge
- NO direct merges possible: all 5 models have different architectures
- Kimi K2 ruled out (1T params, too big)
- Full strategy: docs/MERGE_STRATEGY.md
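The "only merge into language backbone layers, vision encoder stays untouched" rule can be sketched as a key filter over a state dict. This is a minimal sketch, assuming vision-encoder parameters can be recognised by a name prefix: `visual.` and `model.visual.` are hypothetical prefixes to check against the real checkpoint's parameter names. `merge_fn` is where Transport and Merge (or the distillation fallback) would plug in; the `interpolate` placeholder is plain linear interpolation, NOT Transport and Merge, and only works when shapes already match.

```python
from typing import Callable, Dict, List

# Hypothetical prefixes for vision-encoder parameters. Verify against the
# real checkpoint (e.g. by listing its parameter names) before using this.
VISION_PREFIXES = ("visual.", "model.visual.")

Weights = List[float]  # toy stand-in for a tensor


def is_vision_key(name: str) -> bool:
    """True if the parameter belongs to the vision encoder (never merged)."""
    return name.startswith(VISION_PREFIXES)


def merge_language_only(
    base: Dict[str, Weights],
    donor: Dict[str, Weights],
    merge_fn: Callable[[Weights, Weights], Weights],
) -> Dict[str, Weights]:
    """Merge donor into base, touching only language-backbone parameters.

    Vision keys, and any key the donor does not have, are copied from the
    base unchanged.
    """
    merged: Dict[str, Weights] = {}
    for name, weights in base.items():
        if is_vision_key(name) or name not in donor:
            merged[name] = list(weights)  # untouched copy
        else:
            merged[name] = merge_fn(weights, donor[name])
    return merged


def interpolate(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Placeholder merge (linear interpolation), NOT Transport and Merge."""
    if len(a) != len(b):
        raise ValueError("shapes differ; needs a cross-architecture method")
    return [(1 - alpha) * x + alpha * y for x, y in zip(a, b)]


base = {"visual.patch_embed": [1.0, 1.0], "model.layers.0.mlp": [0.0, 2.0]}
donor = {"visual.patch_embed": [9.0, 9.0], "model.layers.0.mlp": [2.0, 0.0]}
merged = merge_language_only(base, donor, interpolate)
print(merged)  # {'visual.patch_embed': [1.0, 1.0], 'model.layers.0.mlp': [1.0, 1.0]}
```

The vision weights come through identical to the base, while the language weights are blended.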
## Dad's Tests (Critical Thinking Filter)
Every claim must pass these before being accepted:
1. **Economic test:** "If this worked cheaply, why aren't big tech companies doing it?"
2. **Architecture test:** "Is this built on something that's dying or futureproof?"
3. **Realism test:** "Is this actually achievable or just optimism?"
4. **Pragmatism test:** "Can we use what we already have first?"
5. **Long-term test:** "Will this still matter in 2-3 years?"
Dad's exact words: "I didn't ask for the marketing spill, give to the point answer." He called out that LLMs are "on their way out" and questioned whether weight-copying works. His critiques were RIGHT: P100 didn't work, weight copying was wrong, old timelines were fantasy. The pivot to Transport and Merge + dual 4090 happened because of his challenges.
## TD History (Old vs New Plan)
- **OLD plan (Jan-Feb 2026):** Copy Mistral-7B weights, spawn copies for research, merge knowledge back via JSON. Hardware: Tesla P40 + desktop (~$250). This plan FAILED: weight copying doesn't transfer knowledge, P100 incompatible with Unsloth, timelines were fantasy.
- **NEW plan (Feb 2026):** Transport and Merge 5 different models into Qwen3-VL-8B (vision+text), then GRPO self-improvement loop. Hardware: dual RTX 4090 or vast.ai GPU rental. Self-improvement through actual RL training (weights change), not code self-modification or JSON merging. Switched from Qwen3-8B to Qwen3-VL-8B to get browser agent abilities (like Fara) built in.
- **What TD will be:** A regular AI assistant like ChatGPT, but hopefully smarter after training cycles. NOT superintelligence promises.
## Self-Improvement Loop (Discovered Feb 2026)
Milan interviewed ChatGPT, Grok, and Gemini (12+ interviews, test_1 to test_12+) about recursive self-improvement.
Key discovery: **The model can be its own diagnostician.**
- All 3 AIs could list their own weaknesses when asked "what would you improve?"
- All 3 said the only thing stopping them is no access to their own weights/training
- All 3 converged on the same "small" self-improvement loop that actually works:
**The TD Self-Improvement Loop:**
1. Merge multiple models together (Transport and Merge) → creates strong base
2. Ask the model "what are you bad at?" → it identifies weak spots
3. Generate targeted synthetic training data for those weak spots
4. Train with GRPO/STaR on that data → model gets slightly better
5. The improved model generates better reasoning chains → better training data
6. Repeat: each cycle is small (1-5%) but compounds
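The six steps can be sketched as a loop that only replaces the current model when a candidate scores better. This is a toy sketch: the "model" is just a dict of per-topic scores, and `toy_train` / `mean_score` are hypothetical stand-ins for GRPO/STaR training and a real eval suite.

```python
from typing import Callable, Dict, List, Tuple

Model = Dict[str, float]  # toy stand-in: topic -> accuracy in [0, 1]


def diagnose(model: Model, k: int) -> List[str]:
    """Step 2: 'what are you bad at?' -> the k weakest topics."""
    return sorted(model, key=model.get)[:k]


def improvement_loop(
    model: Model,
    train_on: Callable[[Model, List[str]], Model],
    evaluate: Callable[[Model], float],
    cycles: int = 3,
    k_weak: int = 2,
) -> Tuple[Model, List[float]]:
    """Run diagnose -> train -> evaluate each cycle, keeping only winners."""
    best_score = evaluate(model)
    history = [best_score]
    for _ in range(cycles):
        weak = diagnose(model, k_weak)
        candidate = train_on(model, weak)  # steps 3-4: targeted data + training
        score = evaluate(candidate)
        if score > best_score:  # a worse candidate is simply discarded
            model, best_score = candidate, score
        history.append(best_score)
    return model, history


# Hypothetical stand-ins: "training" nudges the weak topics up by 5 points.
def toy_train(model: Model, weak: List[str]) -> Model:
    return {t: min(1.0, s + 0.05) if t in weak else s for t, s in model.items()}


def mean_score(model: Model) -> float:
    return sum(model.values()) / len(model)


start = {"math": 0.5, "code": 0.7, "long_reasoning": 0.3}
final, history = improvement_loop(start, toy_train, mean_score)
print(history)
```

Each cycle here adds only a small gain, which matches the "small but compounds" framing; a training step that makes things worse leaves the best score flat instead of degrading the model.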
**Two codebases (td_fuse absorbed into td_lang):**
- `td_lang`: THE complete TD system. Domain-specific language + merge engine + training + RL. v0.2.0, ~11,422 lines total (7,878 core + 3,544 engine), 18 .py files + 22 examples. All 13 phases complete. td_fuse was absorbed into td_lang/engine/, so td_lang runs everything with no external Python deps for the pipeline. Built collaboratively: Claude (architecture), Codex (hardening), Gemini (in-IDE testing).
- `td_loop`: self-recursive improvement loop (planned, automates the cycle above). May not be needed since td_lang's `repeat` block + arena already handle this.
**What's NOT possible (confirmed by all 3 AIs + dad's tests):**
- Live weight editing (model rewriting its own brain in real-time)
- Direct weight manipulation like editing a text file
- "Cogniscript"/"Phylang"/"Lumina-Σ" (sci-fi languages from the interviews; NOT real)
**What IS possible (confirmed by all 3 AIs + real papers):**
- Generate → Filter → Train → Evaluate → Keep winners → Repeat
- Using mechanistic interpretability to find weak circuits, then training specifically on those
- STaR (train on correct reasoning chains), GRPO (RL for reasoning), Cherry_LLM (filter bad data)
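The "filter bad data" step can be sketched as a perplexity band-pass filter over self-generated samples. This is a simplified sketch: Cherry_LLM's own score (IFD) compares an answer's perplexity with and without its instruction, while this version uses plain perplexity. The per-token log-probs are assumed to come from the model being trained, and the band limits are hypothetical values that would need tuning on real data.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (text, per-token natural-log probabilities)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean log-probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def filter_by_perplexity(samples: List[Sample], min_ppl: float, max_ppl: float) -> List[str]:
    """Keep samples whose perplexity lies in [min_ppl, max_ppl].

    Too low: the model finds the text trivially predictable, which is typical
    of repetitive self-generated data. Too high: likely noise or garbage.
    """
    return [text for text, lps in samples if min_ppl <= perplexity(lps) <= max_ppl]


samples = [
    ("repetitive", [-0.01] * 10),     # perplexity ~1.01
    ("useful", [math.log(0.2)] * 5),  # perplexity ~5
    ("noise", [-8.0] * 5),            # perplexity ~2981
]
kept = filter_by_perplexity(samples, min_ppl=1.5, max_ppl=100.0)
print(kept)  # ['useful']
```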
**Interview technical findings (test_12):**
- LoRA target: mid-to-late layers MLP blocks (layers 16–28 for 32-layer model). All 3 AIs agree.
- Biggest weakness: long-chain reasoning breaks at step 18–30. Target this with GRPO.
- Self-training trap: 100 steps on own outputs → smoother but dumber. MUST mix external data.
- Cherry_LLM perplexity filter prevents mode collapse by catching repetitive training data.
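The "layers 16–28 for 32-layer model" finding can be turned into keyword arguments for a PEFT LoRA config. A minimal sketch, assuming the range is 0-based and inclusive, and that scaling it by depth fraction is a fair way to apply it to a model with a different layer count (read the target's own `num_hidden_layers` from its config). `r`, `lora_alpha`, `target_modules`, `layers_to_transform` and `layers_pattern` are existing `peft.LoraConfig` parameters; the values given here are hypothetical. On a VL model the vision tower may reuse the same projection names, so `layers_pattern` is set to keep matching to the language model's `layers.N` modules (also an assumption to verify with `model.named_modules()`).

```python
from typing import Dict, List


def mid_to_late_layers(num_layers: int, start_frac: float = 0.5, end_frac: float = 0.875) -> List[int]:
    """0-based layer indices in the mid-to-late band, inclusive at both ends.

    Defaults reproduce layers 16-28 of a 32-layer model (16/32 = 0.5, 28/32 = 0.875).
    """
    start = round(num_layers * start_frac)
    end = min(round(num_layers * end_frac), num_layers - 1)
    return list(range(start, end + 1))


def lora_kwargs(num_layers: int, r: int = 16) -> Dict[str, object]:
    """Keyword arguments intended for peft.LoraConfig (values are hypothetical)."""
    return {
        "r": r,
        "lora_alpha": 2 * r,
        # MLP projections as named in Qwen-family decoder layers (verify on the real model).
        "target_modules": ["gate_proj", "up_proj", "down_proj"],
        "layers_to_transform": mid_to_late_layers(num_layers),
        # Only match modules under "...layers.N..." so vision blocks are skipped (assumption).
        "layers_pattern": "layers",
    }


print(lora_kwargs(32)["layers_to_transform"])  # [16, 17, ..., 28]
```

With PEFT installed, `LoraConfig(**lora_kwargs(num_layers))` would build the config; the sketch itself only uses the standard library.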
**Cost optimization (test_16):**
- Inference-time scaling: 80–90% of gains for 5–30% cost. Generate multiple answers, pick best, train on winners.
- Verified rewards only: no learned reward model, just objective checkers (code compiles, math correct). Saves VRAM.
- Budget: 70–80% inference scaling, 10–20% short GRPO, 5–10% tooling
- Speculative decoding (vLLM): small draft model + main model verifying = 2–3× faster inference
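Verified rewards, best-of-n selection and GRPO's group-relative advantage fit together roughly as below. A minimal sketch, assuming an exact-match answer checker as the objective verifier (a code task would compile or run tests instead). The advantage here divides by the population standard deviation of the group; real GRPO implementations differ in such details.

```python
import statistics
from typing import Callable, List, Tuple


def exact_answer_reward(answer: str, expected: str) -> float:
    """Verified reward: 1.0 if the final answer matches exactly, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def best_of_n(candidates: List[str], reward_fn: Callable[[str], float]) -> Tuple[str, float]:
    """Inference-time scaling: score every candidate, return the first best one."""
    return max(((c, reward_fn(c)) for c in candidates), key=lambda pair: pair[1])


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


candidates = ["41", "42", " 42 ", "7"]  # e.g. 4 sampled answers to one prompt
rewards = [exact_answer_reward(c, "42") for c in candidates]
best = best_of_n(candidates, lambda c: exact_answer_reward(c, "42"))
advantages = group_relative_advantages(rewards)
print(rewards, best)  # [0.0, 1.0, 1.0, 0.0] ('42', 1.0)
```

When every answer in a group gets the same reward, all advantages are zero, so that prompt contributes no learning signal.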
**td_lang design requirements (test_17: ChatGPT's ForgeSpec 2.0):**
- 8 features: data contracts, reward contracts, eval gates (mandatory), resource budgets (compiler enforced), automatic ablations, artifact lineage (content-hash), serving SLOs, economics reports
- Three quality gates for td_loop: holdout (real tasks), adversarial (break it on purpose), calibration (confidence vs accuracy)
- OpenRLHF: real framework (Ray+vLLM+DeepSpeed) for GRPO at scale; could replace custom td_loop plumbing
- GaLore: full-param training at 65% less VRAM (alternative to QLoRA)
- PACER (Feb 2026): sample 8-64 traces → consensus packet → one revision, using 1/8 the tokens of majority voting
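The three quality gates for td_loop could be checked roughly as below, with calibration measured as expected calibration error (ECE). Assumptions: the holdout gate means "no regression versus the current model", and the adversarial and calibration thresholds are hypothetical numbers, not values from the interviews.

```python
from typing import Dict, List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10) -> float:
    """ECE: size-weighted mean |accuracy - confidence| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 lands in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1 for _, ok in members if ok) / len(members)
            ece += len(members) / len(confidences) * abs(accuracy - avg_conf)
    return ece


def quality_gates(
    holdout_acc: float,
    current_holdout_acc: float,
    adversarial_acc: float,
    ece: float,
    min_adversarial_acc: float = 0.5,  # hypothetical threshold
    max_ece: float = 0.1,              # hypothetical threshold
) -> Dict[str, bool]:
    """One pass/fail flag per gate; a candidate is promoted only if all pass."""
    return {
        "holdout": holdout_acc >= current_holdout_acc,  # no regression on real tasks
        "adversarial": adversarial_acc >= min_adversarial_acc,
        "calibration": ece <= max_ece,
    }


ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
gates = quality_gates(holdout_acc=0.72, current_holdout_acc=0.70, adversarial_acc=0.6, ece=ece)
promote = all(gates.values())
print(gates, promote)  # calibration fails: 90% confident but only 75% correct
```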
**Phase 3 deep dive (test_18: all 3 AIs answered both prompts):**
- FORK: disk-based only on 4090. Cheap fork = manifest + adapter copy. safetensors format.
- RESET: del model → clear cache → reload. Must reset optimizer state. Use assign=True.
- PRUNE: 20% structured max (LLM-Pruner paper). Wanda metric (Grok, practical on 4090). Language backbone only, never vision. Recovery: 200-800 steps LoRA r=8.
- EDIT: LoRA/DoRA with layers_to_transform for layers 16-28. "Try before buy" via enable/disable adapters. ROME/MEMIT not ready for Qwen3-VL.
- Build order: EDIT first → FORK/RESET → PRUNE last
- ChatGPT's manifest idea: model state = base_ref + adapters[] + prune_spec + optimizer + eval_report
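ChatGPT's manifest idea, combined with the content-hash artifact lineage from the ForgeSpec list, could look like this. A minimal sketch: the field names follow the manifest idea, while the paths, the `parent` field and the fork behaviour (fresh optimizer state, no eval report) are illustrative assumptions. Copying the adapter file on disk, the other half of a cheap fork, is not shown.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelManifest:
    base_ref: str                                       # checkpoint the adapters apply to
    adapters: List[str] = field(default_factory=list)   # adapter paths, in apply order
    prune_spec: Optional[Dict[str, float]] = None       # e.g. {"sparsity": 0.2}
    optimizer: Optional[str] = None                     # path to saved optimizer state
    eval_report: Optional[Dict[str, float]] = None
    parent: Optional[str] = None                        # hash of the manifest this was forked from

    def content_hash(self) -> str:
        """Lineage ID: SHA-256 of the manifest's canonical (sorted-key) JSON."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def fork(manifest: ModelManifest, new_adapter: str) -> ModelManifest:
    """Cheap fork: new manifest plus one new adapter; base weights are shared.

    The fork starts with no optimizer state and no eval report, so it must be
    trained and evaluated on its own (an assumption of this sketch).
    """
    return replace(
        manifest,
        adapters=manifest.adapters + [new_adapter],
        optimizer=None,
        eval_report=None,
        parent=manifest.content_hash(),
    )


root = ModelManifest(base_ref="checkpoints/td-merged-v1")  # hypothetical path
child = fork(root, "adapters/long_reasoning_r8")           # hypothetical path
print(child.parent == root.content_hash())  # True
```

Because the hash covers every field, any change to the base, adapters, prune spec or eval report produces a new lineage ID.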
**Interview files:** stored in interview/ folder (test_1.txt through test_18.txt + screenshots)
- ChatGPT: Most conservative, gave systems-level analysis, refused operational blueprints
- Grok: Most detailed and realistic, named specific models/hardware, grounded in real papers
- Gemini: Most flattering/sci-fi, referenced Milan's own work, made up technologies
## Preferences
- Explain things simply β analogies and plain English
- Use all available tools and commands
- Be honest about what works and what doesn't β Milan values truth over optimism
- Budget is flexible β focus on best strategy, not cheapest hardware
- Keep one master document (currently v5.2 in docs/)
- Old files go to DELETE/ folder for Milan to trash
- No dashboards or visual tools β Milan doesn't need them
- Plugins are welcome if they genuinely help and don't break anything
- Run every claim by "dad's tests" before presenting it as fact
- The uploaded 6-part transcript is the OLD TD version: useful for self-improvement context but NOT the current plan