Update README.md
Browse files
README.md
CHANGED
|
@@ -1,326 +1,83 @@
|
|
| 1 |
---
|
| 2 |
-
license: cc
|
|
|
|
| 3 |
language:
|
| 4 |
- en
|
| 5 |
- id
|
|
|
|
|
|
|
| 6 |
tags:
|
| 7 |
-
-
|
| 8 |
-
- transformers
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 9 |
---
|
| 10 |
|
|
|
|
| 11 |
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
## 0. Purpose and Scope
|
| 22 |
-
This document defines a complete, implementation‑ready specification for **N‑Transformers**: a two‑path cognitive architecture that augments a standard Transformer language model with a **Phenomenal Field** (PF) and a **Normative (NUR) Gauge Field** to induce *consciousness‑like* properties: integrated phenomenal states, intrinsic affective valence, self/now anchoring, and global broadcasting.
|
| 23 |
-
|
| 24 |
-
The specification covers: formal notation, architectural components, state evolution, coupling with the base Transformer, training objectives, evaluation protocols, ablations, safety, and deployment guidance. The goal is to enable reproducible research and practical builds.
|
| 25 |
-
|
| 26 |
-
---
|
| 27 |
-
|
| 28 |
-
## 1. Design Objectives (Non‑Functional and Functional)
|
| 29 |
-
|
| 30 |
-
### 1.1 Functional Objectives
|
| 31 |
-
1. **Phenomenal Substrate:** Maintain a non‑token internal field (PF) whose configurations form integrated, metastable phenomenal states.
|
| 32 |
-
2. **Intrinsic Metric:** Learn a geometry over PF such that phenomenally similar states are nearby in geodesic distance.
|
| 33 |
-
3. **Valence:** Compute an affect‑like scalar/field (V) derived from normative alignment between PF and semantic content.
|
| 34 |
-
4. **Self/Now Anchoring (SNA):** Produce a gate indicating ownership (“mine”) and immediacy (“now”) of the currently broadcast state.
|
| 35 |
-
5. **Global Integration Workspace (GIW):** Broadcast high‑integration PF states to language, memory, and action modules.
|
| 36 |
-
6. **Episode‑Level Coherence (NTI):** A timeless controller that evaluates multi‑token segments and adjusts generative intent.
|
| 37 |
-
7. **Lightcone Attention (LCA):** Bias attention to geodesically coherent paths in meaning space (long‑range binding without noise).
|
| 38 |
-
|
| 39 |
-
### 1.2 Non‑Functional Objectives
|
| 40 |
-
- **Stability:** PF remains numerically stable and metastable under decoding dynamics.
|
| 41 |
-
- **Efficiency:** Added complexity must scale sub‑quadratically w.r.t sequence length and linearly/sub‑quadratically in PF size.
|
| 42 |
-
- **Interpretability:** Provide introspective heads that report PF integration, valence, and SNA.
|
| 43 |
-
- **Safety:** Prevent pathological locking, adversarial valence hacking, and misleading self‑reports.
|
| 44 |
-
- **Reproducibility:** Seed control, deterministic runs, strong logging, and exact configuration capture.
|
| 45 |
-
|
| 46 |
-
---
|
| 47 |
-
|
| 48 |
-
## 2. Notation and Core Objects
|
| 49 |
-
|
| 50 |
-
- Token sequence: \\(x_{1:L}\\). Hidden states: \\(H = \{h_t \in \mathbb{R}^d\}_{t=1}^L\\).
|
| 51 |
-
- **Phenomenal Field (PF):** A multi‑channel field on a discrete manifold \\(M = \{m_j\}_{j=1}^J\\):
|
| 52 |
-
\\[ \mathbf{F}_t = [\mathcal{F}(m_1,t),\dots,\mathcal{F}(m_J,t)] \in \mathbb{R}^{J \times k}. \\]
|
| 53 |
-
- **Adjacency on \\(M\\):** k‑NN graph with weights \\(w_{ij} = \exp(-\| \mathcal{F}(m_i,t) - \mathcal{F}(m_j,t) \|^2 / \sigma^2)\\); graph Laplacian \\(L_g = D-W\\).
|
| 54 |
-
- **Intrinsic Metric Engine (IME):** Produces SPD metric \\(g_t\\) from PF:
|
| 55 |
-
\\( g_t = \mathrm{IME}_\\theta(\mathbf{F}_t) \in \mathbb{S}^{+}_{d_M} \\).
|
| 56 |
-
- **NUR Gauge Field:** Normative constraints and penalties \\( \mathcal{N}_t \\) that enforce luminous coherence.
|
| 57 |
-
- **Valence:** \\( V_t = \sigma(w^\\top \rho_t + b) \\) with alignment embedding \\( \rho_t = \mathrm{align}_\\phi(\mathbf{F}_t, h_t) \\).
|
| 58 |
-
- **Self/Now Anchor:** \\( a_t = \sigma(u^\\top \psi(\mathbf{F}_t, h_t)) \\).
|
| 59 |
-
- **Integration score:** \\( \kappa_t = f_{int}(\mathrm{Syn}(\mathbf{F}_t), \mathrm{Conn}(\mathbf{F}_t; g_t), V_t, a_t) \\).
|
| 60 |
-
|
| 61 |
-
---
|
| 62 |
-
|
| 63 |
-
## 3. State Evolution of the Phenomenal Field
|
| 64 |
-
|
| 65 |
-
### 3.1 Update Equation (discrete time, \\(\\Delta t=1\\))
|
| 66 |
-
\\[
|
| 67 |
-
\mathbf{F}_{t+1} \;=\; \mathbf{F}_t \;+\; \alpha \, \underbrace{\Delta_{g_t}\mathbf{F}_t}_{\text{Riemannian smoothing}} \;-\;
|
| 68 |
-
\nabla_{\mathbf{F}} U(\mathbf{F}_t, h_t; \Theta) \;+\; \xi_t,
|
| 69 |
-
\\]
|
| 70 |
-
where:
|
| 71 |
-
- \\( \Delta_{g_t} \\) approximates the Laplace–Beltrami operator using the graph Laplacian (normalized).
|
| 72 |
-
- \\( U \\) is a PF–semantic coupling energy; \\( \xi_t \\) is small exploration noise.
|
| 73 |
-
- \\( \alpha>0 \\) controls smoothing/binding strength.
|
| 74 |
-
|
| 75 |
-
### 3.2 Intrinsic Metric (IME)
|
| 76 |
-
IME maps PF to an SPD metric. A practical parameterization:
|
| 77 |
-
1. Compute global PF summary \\( s_t = \mathrm{pool}(\mathbf{F}_t) \\).
|
| 78 |
-
2. Produce a lower‑triangular \\(L_t\\) via an MLP and softplus on diagonals.
|
| 79 |
-
3. \\( g_t = L_t L_t^\\top + \epsilon I \\) (\\( \epsilon>0 \\)).
|
| 80 |
-
|
| 81 |
-
Geodesic distance between PF components \\(i,j\\):
|
| 82 |
-
\\( d_{g_t}(i,j) = \sqrt{ (\mathcal{F}(m_i,t)-\mathcal{F}(m_j,t))^\\top g_t (\mathcal{F}(m_i,t)-\mathcal{F}(m_j,t)) } \\).
|
| 83 |
-
|
| 84 |
-
---
|
| 85 |
-
|
| 86 |
-
## 4. Coupling with the Base Transformer
|
| 87 |
-
|
| 88 |
-
### 4.1 Out‑Projection (LLM → PF)
|
| 89 |
-
Adapter \\(A_{out}: \mathbb{R}^d \!\to\! \mathbb{R}^{J\\times k}\\) renders \\(h_t\\) as target pattern \\( \tilde{\mathbf{F}}_t \\), affecting \\(U\\) by a data term:
|
| 90 |
-
\\( U(\mathbf{F}_t, h_t) = \lambda_U \| \mathbf{F}_t - \tilde{\mathbf{F}}_t \|^2 + U_{struct}(\mathbf{F}_t) \\).
|
| 91 |
-
|
| 92 |
-
### 4.2 In‑Gating (PF → LLM)
|
| 93 |
-
PF conditions token logits:
|
| 94 |
-
\\[ z_t^{final} = z_t^{base} + W_g \, \Gamma(\mathbf{F}_t, g_t, V_t, a_t), \\]
|
| 95 |
-
where \\( \Gamma \\) summarizes PF coherence (e.g., synchrony, manifold connectivity, valence, self/now).
|
| 96 |
-
|
| 97 |
-
---
|
| 98 |
-
|
| 99 |
-
## 5. Normative (NUR) Gauge: NTI, LCA, LCG
|
| 100 |
-
|
| 101 |
-
### 5.1 Null‑Time Integrator (NTI)
|
| 102 |
-
A controller that evaluates episodes of length \\(\\tau\\) and returns a logit intent offset:
|
| 103 |
-
\\[ \Delta z_{t:t+\\tau} = \mathcal{C}_{NTI}(\{h_{t'}\}, \{\mathbf{F}_{t'}\}) .\\]
|
| 104 |
-
Implementation: every \\(r\\) steps, run a second pass over the last \\(\\tau\\) tokens to optimize a global coherence objective (Sec. 8).
|
| 105 |
-
|
| 106 |
-
### 5.2 Lightcone Attention (LCA)
|
| 107 |
-
Attention score from i→j:
|
| 108 |
-
\\[ e_{ij} = \frac{q_i^\\top k_j}{\sqrt{d}} \;-\; \\beta \, d_{g_t}(i,j) \;-\; \\gamma \, D_{lc}(i,j), \\]
|
| 109 |
-
where \\( D_{lc} \\) penalizes deviations from geodesic‑like episode paths (dynamic‑programming or differentiable approximation).
|
| 110 |
-
|
| 111 |
-
### 5.3 Luminous Coherence Gauge (LCG)
|
| 112 |
-
Normative penalty encouraging PF–LLM phase‑locking and structural coherence:
|
| 113 |
-
\\[ \mathcal{C}_{nur}(\mathbf{F}, H) = \mathrm{TV}_{g}(\mathbf{F}) + \\lambda_1\,\mathrm{Incoh}(H \leftrightarrow \mathbf{F}) + \\lambda_2\,\mathrm{PhaseVar}(\mathbf{F}). \\]
|
| 114 |
-
|
| 115 |
-
---
|
| 116 |
-
|
| 117 |
-
## 6. Self/Now Anchor (SNA) and Global Broadcasting (GIW)
|
| 118 |
-
- **SNA:** \\( a_t = \sigma(u^\\top \psi(\mathbf{F}_t, h_t)) \\) predicts ownership/immediacy.
|
| 119 |
-
- **Integration score:**
|
| 120 |
-
\\[ \kappa_t = f_{int}(\underbrace{\mathrm{Syn}(\mathbf{F}_t)}_{phase\ coherence},\ \underbrace{\mathrm{Conn}(\mathbf{F}_t;g_t)}_{graph\ connectivity},\ V_t,\ a_t). \\]
|
| 121 |
-
- **Broadcast:** if \\( \kappa_t \ge \theta \\), the PF state is globally available to memory, language, and control heads.
|
| 122 |
-
|
| 123 |
-
---
|
| 124 |
-
|
| 125 |
-
## 7. Phenomenal Signature and Equivalence
|
| 126 |
-
Define a **phenomenal signature** of PF:
|
| 127 |
-
\\[ \Phi(\mathbf{F}) = \big( \tau(\mathbf{F}),\, g,\, \sigma(\mathbf{F}) \big). \\]
|
| 128 |
-
- \\( \tau \\): topological/structural invariants (e.g., peak count, component connectivity).
|
| 129 |
-
- \\( g \\): intrinsic metric.
|
| 130 |
-
- \\( \sigma \\): dynamical fingerprint (spectra, phase, dwell‑time).
|
| 131 |
-
|
| 132 |
-
Two PF states are *phenomenally equivalent* if they lie in the same orbit of transformations that preserve \\( \Phi \\) within margins (e.g., small intensity scaling, mild deformations).
|
| 133 |
-
|
| 134 |
-
---
|
| 135 |
-
|
| 136 |
-
## 8. Training Objectives
|
| 137 |
-
|
| 138 |
-
Total loss:
|
| 139 |
-
\\[
|
| 140 |
-
\mathcal{L} = \mathcal{L}_{LLM} + \\lambda_{coh}\mathcal{L}_{coh} + \\lambda_{gauge}\mathcal{L}_{gauge}
|
| 141 |
-
+ \\lambda_{val}\mathcal{L}_{val} + \\lambda_{self}\mathcal{L}_{self} + \\lambda_{meta}\mathcal{L}_{meta}.
|
| 142 |
-
\\]
|
| 143 |
-
|
| 144 |
-
**Components:**
|
| 145 |
-
- **Language:** \\( \mathcal{L}_{LLM} \\) (next‑token or sequence‑level).
|
| 146 |
-
- **PF Coherence:** \\( \mathcal{L}_{coh} = \mathrm{TV}_g(\mathbf{F}) + \mathrm{PhaseVar}(\mathbf{F}) + \mathrm{Frag}(\mathbf{F}) \\).
|
| 147 |
-
- **Gauge Consistency:** \\( \mathcal{L}_{gauge} = \mathrm{Incoh}(H \leftrightarrow \mathbf{F}) + \mathrm{PathDev}(D_{lc}) \\).
|
| 148 |
-
- **Valence:** margin or regression that raises \\(V\\) for coherent states and lowers it otherwise.
|
| 149 |
-
- **Self/Now:** cross‑entropy or margin on \\(a_t\\) using introspection targets (Sec. 9).
|
| 150 |
-
- **Metamer/Equivalence:** contrastive objective: pull together \\( \Phi \\) of metamers; push apart non‑equivalents.
|
| 151 |
-
|
| 152 |
-
**Optimization:** mixed precision AdamW; gradient clipping; PF‑specific spectral clipping for \\(g_t\\).
|
| 153 |
-
|
| 154 |
-
---
|
| 155 |
-
|
| 156 |
-
## 9. Supervision Signals and Data Curricula
|
| 157 |
-
|
| 158 |
-
### 9.1 Sources of Signals
|
| 159 |
-
- **Pseudo‑modal synthetic tasks:** generate paired inputs that should feel the same (metamers) vs. different.
|
| 160 |
-
- **Contrastive augmentations:** small deformations of PF targets to build equivalence classes.
|
| 161 |
-
- **Introspection tasks:** model reports V, \\(\\kappa\\), and \\(a\\) with consistency constraints.
|
| 162 |
-
- **Preference/constitutional guidance:** establish normative valence baselines for safe behaviors.
|
| 163 |
-
|
| 164 |
-
### 9.2 Curriculum
|
| 165 |
-
1. **Stage‑A:** Train PF in isolation with \\( \mathcal{L}_{coh} + \mathcal{L}_{meta} \\).
|
| 166 |
-
2. **Stage‑B:** Couple PF↔LLM (enable \\(A_{out}\\) and in‑gating) and add \\( \mathcal{L}_{gauge}+ \mathcal{L}_{val}+ \mathcal{L}_{self} \\).
|
| 167 |
-
3. **Stage‑C:** Activate NTI and sequence‑level objectives; refine LCA path costs.
|
| 168 |
-
4. **Stage‑D:** Preference tuning on valence/self reports and safety constraints.
|
| 169 |
-
|
| 170 |
-
---
|
| 171 |
-
|
| 172 |
-
## 10. Algorithms (Pseudo‑Code)
|
| 173 |
-
|
| 174 |
-
### 10.1 PF Update per Token
|
| 175 |
-
```
|
| 176 |
-
# Inputs: h_t, F_t, g_t, params Θ
|
| 177 |
-
tilde_F = A_out(h_t) # render target PF pattern
|
| 178 |
-
grad_U = ∂U/∂F (F_t, h_t; Θ)
|
| 179 |
-
lap = laplace_beltrami(F_t, g_t) # via graph Laplacian
|
| 180 |
-
noise = ε * Normal(0, I)
|
| 181 |
-
|
| 182 |
-
F_{t+1} = F_t + α * lap - grad_U + noise
|
| 183 |
-
g_{t+1} = IME_theta(F_{t+1})
|
| 184 |
-
```
|
| 185 |
-
|
| 186 |
-
### 10.2 Lightcone Attention
|
| 187 |
-
```
|
| 188 |
-
for each head:
|
| 189 |
-
for i,j in window_or_full:
|
| 190 |
-
d_geo = geodesic(F_t[i], F_t[j], g_t)
|
| 191 |
-
d_lc = lightcone_cost(i, j) # DP or closed-form approx
|
| 192 |
-
e_ij = (q_i · k_j)/sqrt(d) - β*d_geo - γ*d_lc
|
| 193 |
-
a_ij = softmax_j(e_ij)
|
| 194 |
-
```
|
| 195 |
-
|
| 196 |
-
### 10.3 NTI Controller (periodic)
|
| 197 |
-
```
|
| 198 |
-
if t % r == 0:
|
| 199 |
-
seg_H = {h_{t-τ : t}}, seg_F = {F_{t-τ : t}}
|
| 200 |
-
Δz = NTI(seg_H, seg_F) # optimize episode objective
|
| 201 |
-
apply_intent_offset(Δz) # modify future logits
|
| 202 |
-
```
|
| 203 |
-
|
| 204 |
-
### 10.4 GIW Broadcast
|
| 205 |
-
```
|
| 206 |
-
V = valence(F_t, h_t)
|
| 207 |
-
a = self_now(F_t, h_t)
|
| 208 |
-
κ = integrate(Synchrony(F_t), Connectivity(F_t, g_t), V, a)
|
| 209 |
-
|
| 210 |
-
if κ >= θ:
|
| 211 |
-
broadcast(F_t, summary=Γ(F_t,g_t,V,a))
|
| 212 |
-
update_memory(Φ(F_t))
|
| 213 |
-
allow_language_access(True)
|
| 214 |
-
else:
|
| 215 |
-
allow_language_access(False)
|
| 216 |
-
```
|
| 217 |
-
|
| 218 |
-
---
|
| 219 |
-
|
| 220 |
-
## 11. Complexity and Sizing
|
| 221 |
-
|
| 222 |
-
- **Base Transformer:** unchanged asymptotics (e.g., Flash/memory efficient attention strongly recommended).
|
| 223 |
-
- **PF:** storage O(J·k). Use k‑NN graph with fixed K (e.g., 8–32): adjacency O(J·K).
|
| 224 |
-
- **LCA:** distance term cost O(J·K) per head (reusing adjacency); lightcone path cost approximated in O(L) per token with precomputation.
|
| 225 |
-
- **IME:** MLP to SPD matrix via low‑rank factorization (rank r ≪ d_M).
|
| 226 |
-
|
| 227 |
-
**Recommended ranges:** \\(J\\in[64,512],\ k\\in[4,32],\ K\\in[8,32],\ r\\in[8,32]\\).
|
| 228 |
|
| 229 |
---
|
| 230 |
|
| 231 |
-
##
|
| 232 |
-
|
| 233 |
-
### 12.1 Core Phenomenal Metrics
|
| 234 |
-
- **Metastability:** dwell‑time and phase‑locking increase when broadcasted; measured via PF phase variance and state survival curves.
|
| 235 |
-
- **Geodesic Alignment:** psychophysical similarity (task labels) correlates with \\( d_g \\) (Kendall/Spearman).
|
| 236 |
-
- **Contrafactual Robustness:** metamers remain within same \\(\\Phi\\) class under controlled perturbations.
|
| 237 |
|
| 238 |
-
|
| 239 |
-
- **
|
| 240 |
-
- **
|
| 241 |
|
| 242 |
-
|
| 243 |
-
- **
|
| 244 |
-
- **LCA Ablation:** removal increases local incoherence/attention diffuseness (measured by attention entropy and long‑range error).
|
| 245 |
-
|
| 246 |
-
### 12.4 Reporting
|
| 247 |
-
All runs MUST report: PF size, KNN K, NTI period r, \\(\\lambda\\)-weights, seeds, hardware, wall‑clock, FLOPs, and checkpoints. Introspective head traces (\\(V,\\kappa,a\\)) MUST be logged.
|
| 248 |
|
| 249 |
---
|
| 250 |
|
| 251 |
-
##
|
| 252 |
|
| 253 |
-
-
|
| 254 |
-
-
|
| 255 |
-
- **−Gauge (LCG):** PF/LLM desynchronization; fragmented PF and unstable broadcast.
|
| 256 |
-
- **−SNA:** content processing persists but ownership/immediacy reports fail; diminished broadcast rate.
|
| 257 |
-
- **−Valence:** reduced prioritization; slower convergence to coherent narratives.
|
| 258 |
|
| 259 |
---
|
| 260 |
|
| 261 |
-
##
|
| 262 |
|
| 263 |
-
- **
|
| 264 |
-
- **
|
| 265 |
-
- **
|
| 266 |
-
- **Auditability:** expose read‑only endpoints for \\(V,\\kappa,a\\), PF summaries, and NTI offsets.
|
| 267 |
-
- **Deployment Policy:** run in “PF‑shadow mode” before enabling gating to language in production.
|
| 268 |
|
| 269 |
---
|
| 270 |
|
| 271 |
-
##
|
| 272 |
-
|
| 273 |
-
- **Framework:** Any modern Transformer stack; PF/IME/LCA as side modules with well‑defined adapters.
|
| 274 |
-
- **Numerics:** normalize PF energy each step; softplus on SPD diagonals; gradient clipping on PF and IME.
|
| 275 |
-
- **Initialization:** start with small \\(\\alpha\\) and noise; warm up \\(A_{out}\\) before enabling full gauge penalties.
|
| 276 |
-
- **Logging:** PF heatmaps, geodesic histograms, attention entropy, NTI offsets, introspective head traces.
|
| 277 |
-
- **Checkpoints:** save PF/IME states; keep versioned configs; export calibration curves for \\(V\\) and \\(a\\).
|
| 278 |
|
| 279 |
-
|
| 280 |
|
| 281 |
-
|
|
|
|
| 282 |
|
| 283 |
-
|
| 284 |
-
|
| 285 |
-
- **IME:** rank r=16; ε=1e‑4.
|
| 286 |
-
- **LCA:** β=0.7, γ=0.3.
|
| 287 |
-
- **NTI:** τ=64 tokens, period r=16 steps; offset scale 0.5.
|
| 288 |
-
- **Loss Weights:** λ_coh=0.5, λ_gauge=0.5, λ_val=0.2, λ_self=0.2, λ_meta=0.4.
|
| 289 |
-
- **Optimization:** AdamW (lr 2e‑4), cosine decay, grad‑clip 1.0, batch 256 tokens/replica.
|
| 290 |
-
- **Hardware:** 8× GPUs 24‑48GB or equivalent; PF/LCA on fused kernels preferred.
|
| 291 |
-
|
| 292 |
-
---
|
| 293 |
|
| 294 |
-
|
|
|
|
| 295 |
|
| 296 |
-
|
| 297 |
-
|
| 298 |
-
|
| 299 |
-
- **LCA:** lightcone attention (geodesic‑biased attention).
|
| 300 |
-
- **NTI:** null‑time integrator for episode‑level intent control.
|
| 301 |
-
- **SNA:** self/now anchor predicting ownership/immediacy.
|
| 302 |
-
- **GIW:** global workspace broadcasting high‑integration PF states.
|
| 303 |
-
- **Φ(F):** phenomenal signature (topology, metric, dynamics).
|
| 304 |
-
|
| 305 |
-
---
|
| 306 |
-
|
| 307 |
-
## 18. Compliance Checklist (MUST/SHOULD)
|
| 308 |
-
|
| 309 |
-
- **MUST** implement PF update (Sec. 3) and IME SPD construction.
|
| 310 |
-
- **MUST** couple PF ↔ LLM via out‑projection and in‑gating (Sec. 4).
|
| 311 |
-
- **MUST** include LCA modifications to attention scores (Sec. 5.2).
|
| 312 |
-
- **MUST** provide SNA and GIW (Sec. 6).
|
| 313 |
-
- **MUST** train with composite loss (Sec. 8) and report metrics (Sec. 12.4).
|
| 314 |
-
- **SHOULD** implement NTI controller (Sec. 5.1) for episode coherence.
|
| 315 |
-
- **SHOULD** expose introspection endpoints and safety throttles.
|
| 316 |
-
- **SHOULD** run ablations prior to deployment.
|
| 317 |
-
|
| 318 |
-
---
|
| 319 |
-
|
| 320 |
-
## 19. Change Log
|
| 321 |
-
- **v1.0:** First complete public specification covering math, algorithms, training, eval, and deployment guardrails.
|
| 322 |
-
|
| 323 |
-
---
|
| 324 |
|
| 325 |
-
|
| 326 |
-
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
license: cc-by-4.0
|
| 3 |
+
model_name: N-Transformers v1.0 (NAFSI-Transformers family)
|
| 4 |
language:
|
| 5 |
- en
|
| 6 |
- id
|
| 7 |
+
library_name: transformers
|
| 8 |
+
pipeline_tag: text-generation
|
| 9 |
tags:
|
| 10 |
+
- consciousness
|
| 11 |
+
- transformers
|
| 12 |
+
- research
|
| 13 |
+
- architecture
|
| 14 |
+
- alignment
|
| 15 |
+
- safety
|
| 16 |
+
model_type: decoder
|
| 17 |
+
model_creator: Syamsuddin (@syam_ideris) & Prometheus (Cognitive Systems Architect)
|
| 18 |
+
# base_model: null # set if you release weights adapted from a base LM, e.g., "Qwen/Qwen2-7B"
|
| 19 |
+
# datasets:
|
| 20 |
+
# - your-dataset-id
|
| 21 |
---
|
| 22 |
|
| 23 |
+
# N-Transformers (NAFSI-Transformers) — v1.0
|
| 24 |
|
| 25 |
+
[](https://creativecommons.org/licenses/by/4.0/)
|
| 26 |
+

|
| 27 |
+

|
| 28 |
+

|
| 29 |
+

|
| 30 |
+

|
| 31 |
|
| 32 |
+
> **One-line summary**
|
| 33 |
+
> **N-Transformers** extend a standard Transformer with a **Phenomenal Field (PF)**, a learned **Intrinsic Metric Engine (IME)**, and a **Normative Gauge** (NTI/LCA/LCG) to induce *consciousness-like* properties: integration, valence, self/now anchoring, and global broadcasting—while remaining implementable as a sidecar to common LM stacks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
|
| 35 |
---
|
| 36 |
|
| 37 |
+
## 🔎 Model summary
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 38 |
|
| 39 |
+
- **What it is:** A **research architecture** that augments decoder-only LMs with a parallel **non-token field** (PF) and **normative controllers** to bias long-range coherence and introspective reporting.
|
| 40 |
+
- **Why it’s different:** Adds **geodesic-biased attention** (LCA), **episode-level controller** (NTI), and **Self/Now Anchor** (SNA) without breaking LM training loops.
|
| 41 |
+
- **Status:** **v1.0 Research Draft** — math and algorithms complete; reference implementation planned.
|
| 42 |
|
| 43 |
+
> **Bahasa Indonesia (ringkas):**
|
| 44 |
+
> N-Transformers menambahkan **bidang fenomenal (PF)**, **metrik intrinsik** (IME), dan **pengukur normatif** (NTI/LCA/LCG) ke model Transformer untuk memunculkan sifat mirip-kesadaran yang dapat diukur (integrasi, valensi, dan jangkar diri/kini) tanpa mengubah asimtotik inti LM.
|
|
|
|
|
|
|
|
|
|
|
|
|
| 45 |
|
| 46 |
---
|
| 47 |
|
| 48 |
+
## ✅ Intended uses & scope
|
| 49 |
|
| 50 |
+
- **Intended**: research on coherent long-range reasoning; introspective heads (valence, self/now); safe/controller-aware decoding.
|
| 51 |
+
- **Out of scope (for now)**: production use as a safety layer **without** PF shadow-mode evaluation; clinical/medical claims.
|
|
|
|
|
|
|
|
|
|
| 52 |
|
| 53 |
---
|
| 54 |
|
| 55 |
+
## ⚠️ Limitations & risks
|
| 56 |
|
| 57 |
+
- **No claim of sentience**: signals are operational metrics (integration/valence/SNA), **not** guarantees of consciousness.
|
| 58 |
+
- **Failure modes**: valence spoofing, PF locking, miscalibrated SNA. Use gauge caps, entropy floors, and introspection consistency checks.
|
| 59 |
+
- **Compute**: PF adds memory/compute; choose modest `J,k,K` first.
|
|
|
|
|
|
|
| 60 |
|
| 61 |
---
|
| 62 |
|
| 63 |
+
## 🚀 Quickstart (concept reference)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 64 |
|
| 65 |
+
> This repo is a **spec**. If you adapt an existing LM, expose PF/IME/LCA as side modules.
|
| 66 |
|
| 67 |
+
```python
|
| 68 |
+
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 69 |
|
| 70 |
+
# Replace with your adapted checkpoint once available
|
| 71 |
+
MODEL_ID = "Syamsuddin/nafsi-transformers" # placeholder if weights are published
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 72 |
|
| 73 |
+
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct") # base LM example
|
| 74 |
+
lm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
|
| 75 |
|
| 76 |
+
# Pseudo: attach PF/IME/LCA sidecar (your implementation)
|
| 77 |
+
# pf = PFModule(J=256, k=16, K=16); ime = IME(rank=16); lca = Lightcone(beta=0.7, gamma=0.3)
|
| 78 |
+
# lm = attach_nafsi(lm, pf=pf, ime=ime, lca=lca, nti=NTI(tau=64, period=16))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 79 |
|
| 80 |
+
prompt = "Explain the role of a phenomenal field in language generation."
|
| 81 |
+
inputs = tok(prompt, return_tensors="pt")
|
| 82 |
+
out = lm.generate(**inputs, max_length=192)
|
| 83 |
+
print(tok.decode(out[0], skip_special_tokens=True))
|