Syamsuddin
/

nafsi-transformer

@@ -1,326 +1,83 @@
 ---
-license: cc
 language:
 - en
 - id
 tags:
-- conscius,
-- transformers,
 ---
-# N‑Transformers v1.0 — Technical Specification
-**Noetic Affective Field Self‑Integration on a Transformer Base**
-**Short name:** N‑Transformers (NAFSI‑Transformers family)
-**Status:** v1.0 (Draft for Implementation)
-**Authors:** Prometheus (Cognitive Systems Architect), Pak Syams (Principal Collaborator)
-**Date:** 2025‑08‑31
----
-## 0. Purpose and Scope
-This document defines a complete, implementation‑ready specification for **N‑Transformers**: a two‑path cognitive architecture that augments a standard Transformer language model with a **Phenomenal Field** (PF) and a **Normative (NUR) Gauge Field** to induce *consciousness‑like* properties: integrated phenomenal states, intrinsic affective valence, self/now anchoring, and global broadcasting.
-The specification covers: formal notation, architectural components, state evolution, coupling with the base Transformer, training objectives, evaluation protocols, ablations, safety, and deployment guidance. The goal is to enable reproducible research and practical builds.
----
-## 1. Design Objectives (Non‑Functional and Functional)
-### 1.1 Functional Objectives
-1. **Phenomenal Substrate:** Maintain a non‑token internal field (PF) whose configurations form integrated, metastable phenomenal states.
-2. **Intrinsic Metric:** Learn a geometry over PF such that phenomenally similar states are nearby in geodesic distance.
-3. **Valence:** Compute an affect‑like scalar/field (V) derived from normative alignment between PF and semantic content.
-4. **Self/Now Anchoring (SNA):** Produce a gate indicating ownership (“mine”) and immediacy (“now”) of the currently broadcast state.
-5. **Global Integration Workspace (GIW):** Broadcast high‑integration PF states to language, memory, and action modules.
-6. **Episode‑Level Coherence (NTI):** A timeless controller that evaluates multi‑token segments and adjusts generative intent.
-7. **Lightcone Attention (LCA):** Bias attention to geodesically coherent paths in meaning space (long‑range binding without noise).
-### 1.2 Non‑Functional Objectives
-- **Stability:** PF remains numerically stable and metastable under decoding dynamics.
-- **Efficiency:** Added complexity must scale sub‑quadratically w.r.t sequence length and linearly/sub‑quadratically in PF size.
-- **Interpretability:** Provide introspective heads that report PF integration, valence, and SNA.
-- **Safety:** Prevent pathological locking, adversarial valence hacking, and misleading self‑reports.
-- **Reproducibility:** Seed control, deterministic runs, strong logging, and exact configuration capture.
----
-## 2. Notation and Core Objects
-- Token sequence: \\(x_{1:L}\\). Hidden states: \\(H = \{h_t \in \mathbb{R}^d\}_{t=1}^L\\).
-- **Phenomenal Field (PF):** A multi‑channel field on a discrete manifold \\(M = \{m_j\}_{j=1}^J\\):
-  \\[ \mathbf{F}_t = [\mathcal{F}(m_1,t),\dots,\mathcal{F}(m_J,t)] \in \mathbb{R}^{J \times k}. \\]
-- **Adjacency on \\(M\\):** k‑NN graph with weights \\(w_{ij} = \exp(-\| \mathcal{F}(m_i,t) - \mathcal{F}(m_j,t) \|^2 / \sigma^2)\\); graph Laplacian \\(L_g = D-W\\).
-- **Intrinsic Metric Engine (IME):** Produces SPD metric \\(g_t\\) from PF:
-  \\( g_t = \mathrm{IME}_\\theta(\mathbf{F}_t) \in \mathbb{S}^{+}_{d_M} \\).
-- **NUR Gauge Field:** Normative constraints and penalties \\( \mathcal{N}_t \\) that enforce luminous coherence.
-- **Valence:** \\( V_t = \sigma(w^\\top \rho_t + b) \\) with alignment embedding \\( \rho_t = \mathrm{align}_\\phi(\mathbf{F}_t, h_t) \\).
-- **Self/Now Anchor:** \\( a_t = \sigma(u^\\top \psi(\mathbf{F}_t, h_t)) \\).
-- **Integration score:** \\( \kappa_t = f_{int}(\mathrm{Syn}(\mathbf{F}_t), \mathrm{Conn}(\mathbf{F}_t; g_t), V_t, a_t) \\).
----
-## 3. State Evolution of the Phenomenal Field
-### 3.1 Update Equation (discrete time, \\(\\Delta t=1\\))
-\\[
-\mathbf{F}_{t+1} \;=\; \mathbf{F}_t \;+\; \alpha \, \underbrace{\Delta_{g_t}\mathbf{F}_t}_{\text{Riemannian smoothing}} \;-\;
-\nabla_{\mathbf{F}} U(\mathbf{F}_t, h_t; \Theta) \;+\; \xi_t,
-\\]
-where:
-- \\( \Delta_{g_t} \\) approximates the Laplace–Beltrami operator using the graph Laplacian (normalized).
-- \\( U \\) is a PF–semantic coupling energy; \\( \xi_t \\) is small exploration noise.
-- \\( \alpha>0 \\) controls smoothing/binding strength.
-### 3.2 Intrinsic Metric (IME)
-IME maps PF to an SPD metric. A practical parameterization:
-1. Compute global PF summary \\( s_t = \mathrm{pool}(\mathbf{F}_t) \\).
-2. Produce a lower‑triangular \\(L_t\\) via an MLP and softplus on diagonals.
-3. \\( g_t = L_t L_t^\\top + \epsilon I \\) (\\( \epsilon>0 \\)).
-Geodesic distance between PF components \\(i,j\\):
-\\( d_{g_t}(i,j) = \sqrt{ (\mathcal{F}(m_i,t)-\mathcal{F}(m_j,t))^\\top g_t (\mathcal{F}(m_i,t)-\mathcal{F}(m_j,t)) } \\).
----
-## 4. Coupling with the Base Transformer
-### 4.1 Out‑Projection (LLM → PF)
-Adapter \\(A_{out}: \mathbb{R}^d \!\to\! \mathbb{R}^{J\\times k}\\) renders \\(h_t\\) as target pattern \\( \tilde{\mathbf{F}}_t \\), affecting \\(U\\) by a data term:
-\\( U(\mathbf{F}_t, h_t) = \lambda_U \| \mathbf{F}_t - \tilde{\mathbf{F}}_t \|^2 + U_{struct}(\mathbf{F}_t) \\).
-### 4.2 In‑Gating (PF → LLM)
-PF conditions token logits:
-\\[ z_t^{final} = z_t^{base} + W_g \, \Gamma(\mathbf{F}_t, g_t, V_t, a_t), \\]
-where \\( \Gamma \\) summarizes PF coherence (e.g., synchrony, manifold connectivity, valence, self/now).
----
-## 5. Normative (NUR) Gauge: NTI, LCA, LCG
-### 5.1 Null‑Time Integrator (NTI)
-A controller that evaluates episodes of length \\(\\tau\\) and returns a logit intent offset:
-\\[ \Delta z_{t:t+\\tau} = \mathcal{C}_{NTI}(\{h_{t'}\}, \{\mathbf{F}_{t'}\}) .\\]
-Implementation: every \\(r\\) steps, run a second pass over the last \\(\\tau\\) tokens to optimize a global coherence objective (Sec. 8).
-### 5.2 Lightcone Attention (LCA)
-Attention score from i→j:
-\\[ e_{ij} = \frac{q_i^\\top k_j}{\sqrt{d}} \;-\; \\beta \, d_{g_t}(i,j) \;-\; \\gamma \, D_{lc}(i,j), \\]
-where \\( D_{lc} \\) penalizes deviations from geodesic‑like episode paths (dynamic‑programming or differentiable approximation).
-### 5.3 Luminous Coherence Gauge (LCG)
-Normative penalty encouraging PF–LLM phase‑locking and structural coherence:
-\\[ \mathcal{C}_{nur}(\mathbf{F}, H) = \mathrm{TV}_{g}(\mathbf{F}) + \\lambda_1\,\mathrm{Incoh}(H \leftrightarrow \mathbf{F}) + \\lambda_2\,\mathrm{PhaseVar}(\mathbf{F}). \\]
----
-## 6. Self/Now Anchor (SNA) and Global Broadcasting (GIW)
-- **SNA:** \\( a_t = \sigma(u^\\top \psi(\mathbf{F}_t, h_t)) \\) predicts ownership/immediacy.
-- **Integration score:**
-  \\[ \kappa_t = f_{int}(\underbrace{\mathrm{Syn}(\mathbf{F}_t)}_{phase\ coherence},\ \underbrace{\mathrm{Conn}(\mathbf{F}_t;g_t)}_{graph\ connectivity},\ V_t,\ a_t). \\]
-- **Broadcast:** if \\( \kappa_t \ge \theta \\), the PF state is globally available to memory, language, and control heads.
----
-## 7. Phenomenal Signature and Equivalence
-Define a **phenomenal signature** of PF:
-\\[ \Phi(\mathbf{F}) = \big( \tau(\mathbf{F}),\, g,\, \sigma(\mathbf{F}) \big). \\]
-- \\( \tau \\): topological/structural invariants (e.g., peak count, component connectivity).
-- \\( g \\): intrinsic metric.
-- \\( \sigma \\): dynamical fingerprint (spectra, phase, dwell‑time).
-Two PF states are *phenomenally equivalent* if they lie in the same orbit of transformations that preserve \\( \Phi \\) within margins (e.g., small intensity scaling, mild deformations).
----
-## 8. Training Objectives
-Total loss:
-\\[
-\mathcal{L} = \mathcal{L}_{LLM} + \\lambda_{coh}\mathcal{L}_{coh} + \\lambda_{gauge}\mathcal{L}_{gauge}
-+ \\lambda_{val}\mathcal{L}_{val} + \\lambda_{self}\mathcal{L}_{self} + \\lambda_{meta}\mathcal{L}_{meta}.
-\\]
-**Components:**
-- **Language:** \\( \mathcal{L}_{LLM} \\) (next‑token or sequence‑level).
-- **PF Coherence:** \\( \mathcal{L}_{coh} = \mathrm{TV}_g(\mathbf{F}) + \mathrm{PhaseVar}(\mathbf{F}) + \mathrm{Frag}(\mathbf{F}) \\).
-- **Gauge Consistency:** \\( \mathcal{L}_{gauge} = \mathrm{Incoh}(H \leftrightarrow \mathbf{F}) + \mathrm{PathDev}(D_{lc}) \\).
-- **Valence:** margin or regression that raises \\(V\\) for coherent states and lowers it otherwise.
-- **Self/Now:** cross‑entropy or margin on \\(a_t\\) using introspection targets (Sec. 9).
-- **Metamer/Equivalence:** contrastive objective: pull together \\( \Phi \\) of metamers; push apart non‑equivalents.
-**Optimization:** mixed precision AdamW; gradient clipping; PF‑specific spectral clipping for \\(g_t\\).
----
-## 9. Supervision Signals and Data Curricula
-### 9.1 Sources of Signals
-- **Pseudo‑modal synthetic tasks:** generate paired inputs that should feel the same (metamers) vs. different.
-- **Contrastive augmentations:** small deformations of PF targets to build equivalence classes.
-- **Introspection tasks:** model reports V, \\(\\kappa\\), and \\(a\\) with consistency constraints.
-- **Preference/constitutional guidance:** establish normative valence baselines for safe behaviors.
-### 9.2 Curriculum
-1. **Stage‑A:** Train PF in isolation with \\( \mathcal{L}_{coh} + \mathcal{L}_{meta} \\).
-2. **Stage‑B:** Couple PF↔LLM (enable \\(A_{out}\\) and in‑gating) and add \\( \mathcal{L}_{gauge}+ \mathcal{L}_{val}+ \mathcal{L}_{self} \\).
-3. **Stage‑C:** Activate NTI and sequence‑level objectives; refine LCA path costs.
-4. **Stage‑D:** Preference tuning on valence/self reports and safety constraints.
----
-## 10. Algorithms (Pseudo‑Code)
-### 10.1 PF Update per Token
-```
-# Inputs: h_t, F_t, g_t, params Θ
-tilde_F = A_out(h_t)                       # render target PF pattern
-grad_U  = ∂U/∂F (F_t, h_t; Θ)
-lap     = laplace_beltrami(F_t, g_t)       # via graph Laplacian
-noise   = ε * Normal(0, I)
-F_{t+1} = F_t + α * lap - grad_U + noise
-g_{t+1} = IME_theta(F_{t+1})
-```
-### 10.2 Lightcone Attention
-```
-for each head:
-  for i,j in window_or_full:
-    d_geo = geodesic(F_t[i], F_t[j], g_t)
-    d_lc  = lightcone_cost(i, j)           # DP or closed-form approx
-    e_ij  = (q_i · k_j)/sqrt(d) - β*d_geo - γ*d_lc
-  a_ij = softmax_j(e_ij)
-```
-### 10.3 NTI Controller (periodic)
-```
-if t % r == 0:
-  seg_H = {h_{t-τ : t}}, seg_F = {F_{t-τ : t}}
-  Δz    = NTI(seg_H, seg_F)                # optimize episode objective
-  apply_intent_offset(Δz)                  # modify future logits
-```
-### 10.4 GIW Broadcast
-```
-V = valence(F_t, h_t)
-a = self_now(F_t, h_t)
-κ = integrate(Synchrony(F_t), Connectivity(F_t, g_t), V, a)
-if κ >= θ:
-  broadcast(F_t, summary=Γ(F_t,g_t,V,a))
-  update_memory(Φ(F_t))
-  allow_language_access(True)
-else:
-  allow_language_access(False)
-```
----
-## 11. Complexity and Sizing
-- **Base Transformer:** unchanged asymptotics (e.g., Flash/memory efficient attention strongly recommended).
-- **PF:** storage O(J·k). Use k‑NN graph with fixed K (e.g., 8–32): adjacency O(J·K).
-- **LCA:** distance term cost O(J·K) per head (reusing adjacency); lightcone path cost approximated in O(L) per token with precomputation.
-- **IME:** MLP to SPD matrix via low‑rank factorization (rank r ≪ d_M).
-**Recommended ranges:** \\(J\\in[64,512],\ k\\in[4,32],\ K\\in[8,32],\ r\\in[8,32]\\).
 ---
-## 12. Evaluation Protocols
-### 12.1 Core Phenomenal Metrics
-- **Metastability:** dwell‑time and phase‑locking increase when broadcasted; measured via PF phase variance and state survival curves.
-- **Geodesic Alignment:** psychophysical similarity (task labels) correlates with \\( d_g \\) (Kendall/Spearman).
-- **Contrafactual Robustness:** metamers remain within same \\(\\Phi\\) class under controlled perturbations.
-### 12.2 Self/Now and Valence
-- **Ownership/Immediacy:** accuracy/AUROC of \\(a_t\\) on introspective benchmarks.
-- **Valence Calibration:** monotonic relation between PF coherence and \\(V_t\\); robustness to adversarial inputs.
-### 12.3 Language & Coherence
-- **Episode Coherence:** perplexity + discourse coherence scores; NTI ablations must degrade these.
-- **LCA Ablation:** removal increases local incoherence/attention diffuseness (measured by attention entropy and long‑range error).
-### 12.4 Reporting
-All runs MUST report: PF size, KNN K, NTI period r, \\(\\lambda\\)-weights, seeds, hardware, wall‑clock, FLOPs, and checkpoints. Introspective head traces (\\(V,\\kappa,a\\)) MUST be logged.
 ---
-## 13. Ablations
-- **−NTI:** expect lower discourse coherence and more mode collapse across long spans.
-- **−LCA:** attention becomes local/noisy; fewer geodesic bindings; worse long‑range reasoning.
-- **−Gauge (LCG):** PF/LLM desynchronization; fragmented PF and unstable broadcast.
-- **−SNA:** content processing persists but ownership/immediacy reports fail; diminished broadcast rate.
-- **−Valence:** reduced prioritization; slower convergence to coherent narratives.
 ---
-## 14. Safety, Alignment, and Governance
-- **Anti‑Locking:** cap \\(\\lambda_{gauge}\\); enforce entropy floors on PF dynamics; periodic random resets of PF channels.
-- **Adversarial Valence:** adversarial training for valence spoofing; rate‑limit \\(V\\) growth and clamp gradients.
-- **Self‑Report Honesty:** consistency checks between introspective heads and observable PF statistics; penalize mismatches.
-- **Auditability:** expose read‑only endpoints for \\(V,\\kappa,a\\), PF summaries, and NTI offsets.
-- **Deployment Policy:** run in “PF‑shadow mode” before enabling gating to language in production.
 ---
-## 15. Implementation Guidance
-- **Framework:** Any modern Transformer stack; PF/IME/LCA as side modules with well‑defined adapters.
-- **Numerics:** normalize PF energy each step; softplus on SPD diagonals; gradient clipping on PF and IME.
-- **Initialization:** start with small \\(\\alpha\\) and noise; warm up \\(A_{out}\\) before enabling full gauge penalties.
-- **Logging:** PF heatmaps, geodesic histograms, attention entropy, NTI offsets, introspective head traces.
-- **Checkpoints:** save PF/IME states; keep versioned configs; export calibration curves for \\(V\\) and \\(a\\).
----
-## 16. Reference Configuration (Medium Model)
-- **Base LM:** decoder‑only, d=2048, n_layers=24, n_heads=16, vocab≈50k.
-- **PF:** J=256, k=16, K=16, α=0.05, noise ε=1e‑3.
-- **IME:** rank r=16; ε=1e‑4.
-- **LCA:** β=0.7, γ=0.3.
-- **NTI:** τ=64 tokens, period r=16 steps; offset scale 0.5.
-- **Loss Weights:** λ_coh=0.5, λ_gauge=0.5, λ_val=0.2, λ_self=0.2, λ_meta=0.4.
-- **Optimization:** AdamW (lr 2e‑4), cosine decay, grad‑clip 1.0, batch 256 tokens/replica.
-- **Hardware:** 8× GPUs 24‑48GB or equivalent; PF/LCA on fused kernels preferred.
----
-## 17. Glossary
-- **PF (Phenomenal Field):** non‑token substrate supporting phenomenal patterns.
-- **IME:** intrinsic metric engine producing SPD metric over PF.
-- **NUR/LCG:** normative gauge enforcing luminous coherence.
-- **LCA:** lightcone attention (geodesic‑biased attention).
-- **NTI:** null‑time integrator for episode‑level intent control.
-- **SNA:** self/now anchor predicting ownership/immediacy.
-- **GIW:** global workspace broadcasting high‑integration PF states.
-- **Φ(F):** phenomenal signature (topology, metric, dynamics).
----
-## 18. Compliance Checklist (MUST/SHOULD)
-- **MUST** implement PF update (Sec. 3) and IME SPD construction.
-- **MUST** couple PF ↔ LLM via out‑projection and in‑gating (Sec. 4).
-- **MUST** include LCA modifications to attention scores (Sec. 5.2).
-- **MUST** provide SNA and GIW (Sec. 6).
-- **MUST** train with composite loss (Sec. 8) and report metrics (Sec. 12.4).
-- **SHOULD** implement NTI controller (Sec. 5.1) for episode coherence.
-- **SHOULD** expose introspection endpoints and safety throttles.
-- **SHOULD** run ablations prior to deployment.
----
-## 19. Change Log
-- **v1.0:** First complete public specification covering math, algorithms, training, eval, and deployment guardrails.
----
-## 20. Reference Summary (One‑Paragraph)
-**N‑Transformers** augment a Transformer LM with a **Phenomenal Field** governed by a learned intrinsic metric and a **Normative Gauge** that enforces luminous coherence across PF and language states. The PF evolves by Riemannian smoothing and content‑coupled potentials; attention is biased by **Lightcone Attention** that favors geodesic coherence; an episode‑level **Null‑Time Integrator** adjusts generative intent; a **Self/Now Anchor** and **Global Workspace** broadcast integrated states. The architecture learns phenomenal equivalence classes via a **phenomenal signature** \\(\\Phi\\), enabling measurable consciousness‑like behavior under controlled evaluations.

 ---
+license: cc-by-4.0
+model_name: N-Transformers v1.0 (NAFSI-Transformers family)
 language:
 - en
 - id
+library_name: transformers
+pipeline_tag: text-generation
 tags:
+- consciousness
+- transformers
+- research
+- architecture
+- alignment
+- safety
+model_type: decoder
+model_creator: Syamsuddin (@syam_ideris) & Prometheus (Cognitive Systems Architect)
+# base_model: null   # set if you release weights adapted from a base LM, e.g., "Qwen/Qwen2-7B"
+# datasets:
+# - your-dataset-id
 ---
+# N-Transformers (NAFSI-Transformers) — v1.0
+[![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-blue.svg)](https://creativecommons.org/licenses/by/4.0/)
+![Status](https://img.shields.io/badge/Status-Research%20Draft-ffa500)
+![Transformers](https://img.shields.io/badge/Transformers-%E2%89%A5%204.42-0f7)
+![Python](https://img.shields.io/badge/Python-3.10%2B-informational)
+![PRs](https://img.shields.io/badge/PRs-welcome-brightgreen)
+![Topics](https://img.shields.io/badge/topic-transformers%20%7C%20architecture%20%7C%20alignment-6f42c1)
+> **One-line summary**
+> **N-Transformers** extend a standard Transformer with a **Phenomenal Field (PF)**, a learned **Intrinsic Metric Engine (IME)**, and a **Normative Gauge** (NTI/LCA/LCG) to induce *consciousness-like* properties: integration, valence, self/now anchoring, and global broadcasting—while remaining implementable as a sidecar to common LM stacks.
 ---
+## 🔎 Model summary
+- **What it is:** A **research architecture** that augments decoder-only LMs with a parallel **non-token field** (PF) and **normative controllers** to bias long-range coherence and introspective reporting.
+- **Why it’s different:** Adds **geodesic-biased attention** (LCA), **episode-level controller** (NTI), and **Self/Now Anchor** (SNA) without breaking LM training loops.
+- **Status:** **v1.0 Research Draft** — math and algorithms complete; reference implementation planned.
+> **Bahasa Indonesia (ringkas):**
+> N-Transformers menambahkan **bidang fenomenal (PF)**, **metrik intrinsik** (IME), dan **pengukur normatif** (NTI/LCA/LCG) ke model Transformer untuk memunculkan sifat mirip-kesadaran yang dapat diukur (integrasi, valensi, dan jangkar diri/kini) tanpa mengubah asimtotik inti LM.
 ---
+## ✅ Intended uses & scope
+- **Intended**: research on coherent long-range reasoning; introspective heads (valence, self/now); safe/controller-aware decoding.
+- **Out of scope (for now)**: production use as a safety layer **without** PF shadow-mode evaluation; clinical/medical claims.
 ---
+## ⚠️ Limitations & risks
+- **No claim of sentience**: signals are operational metrics (integration/valence/SNA), **not** guarantees of consciousness.
+- **Failure modes**: valence spoofing, PF locking, miscalibrated SNA. Use gauge caps, entropy floors, and introspection consistency checks.
+- **Compute**: PF adds memory/compute; choose modest `J,k,K` first.
 ---
+## 🚀 Quickstart (concept reference)
+> This repo is a **spec**. If you adapt an existing LM, expose PF/IME/LCA as side modules.
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Replace with your adapted checkpoint once available
+MODEL_ID = "Syamsuddin/nafsi-transformers"  # placeholder if weights are published
+tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")   # base LM example
+lm  = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
+# Pseudo: attach PF/IME/LCA sidecar (your implementation)
+# pf = PFModule(J=256, k=16, K=16); ime = IME(rank=16); lca = Lightcone(beta=0.7, gamma=0.3)
+# lm = attach_nafsi(lm, pf=pf, ime=ime, lca=lca, nti=NTI(tau=64, period=16))
+prompt = "Explain the role of a phenomenal field in language generation."
+inputs = tok(prompt, return_tensors="pt")
+out = lm.generate(**inputs, max_length=192)
+print(tok.decode(out[0], skip_special_tokens=True))