---
license: apache-2.0
datasets:
- karpathy/fineweb-edu-100B-gpt2-token-shards
language:
- en
- ja
- es
metrics:
- accuracy
base_model:
- Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
new_version: Drjkedwards/Recursive-Transformer-Model
library_name: transformers
---
# Model Card for Recursive Transformer Model (RTM) / ERS PyTorch Implementation

This is the official PyTorch implementation of the **Recursive Transformer Model (RTM)**, a novel architecture that augments standard Transformer-based systems with **recursive memory reconsideration**, **temporal decay mechanisms**, and **Persistent Memory Logic Loops (PMLL)**. It addresses "nostalgic incorrectness" (the tendency of stateless AI to retain outdated or contradictory beliefs) by maintaining coherent, self-correcting state across inference sessions. The production-grade reference implementation is the **Enhanced Reconsideration System (ERS)** library, which includes PyTorch components for embeddings, lattice-based tensor routing, multi-petal attention, and knowledge-graph integration.

The Kaggle-hosted PyTorch model provides the core RTM/ERS runtime (including `PMLLLattice`, `MemoryBlock`, temporal decay, consensus, and contradiction detection) for integration with any LLM/transformer stack. It is **not** a standalone pretrained language model but a stateful memory layer/framework.

---

## Model Details

### Model Description

The Recursive Transformer Model (RTM) extends the classic Transformer architecture with:

- **Adaptive temporal decay** on memory confidence.
- **Multi-dimensional consensus** via embedding-space geometry and knowledge graphs.
- **Vector-based contradiction detection** with integrated rewrite capabilities.
- **Persistent Memory Logic Loops (PMLL)**: a lattice-based DAG for compressed, low-rank tensor routing and recursive passes over memory slots.

These innovations address the stateless limitation of standard Transformers by enabling iterative, multi-pass reconsideration of beliefs during inference. The Enhanced Reconsideration System (ERS) is the complete, production-ready Python/PyTorch reference implementation.

- **Developed by:** Dr. Josef “Q.” Edwards (Josef Kurk Edwards / josefedwards / drQedwards), University of Colorado Boulder
- **Funded by:** U.S. Department of Defense (funder identifier 100000005)
- **Shared by:** Josef Edwards (via Kaggle and GitHub)
- **Model type:** Recursive Transformer extension / stateful memory framework (PMLL + ERS)
- **Language(s) (NLP):** Language-agnostic (works with any text/embedding-based input; primarily demonstrated on English factual/knowledge-base tasks)
- **License:** MIT (see ERS repository)
- **Finetuned from model:** Not finetuned; augments any base Transformer (integrates with sentence-transformers, LangChain, etc.)

### Model Sources

- **Repository:** [Kaggle Model](https://www.kaggle.com/models/josefedwards/recursive-transformer-model/pyTorch) • [GitHub ERS (primary implementation)](https://github.com/drqedwards/ERS) • [GitHub PMLL_archive](https://github.com/drqedwards/PMLL_archive)
- **Paper:** Edwards, J. K. (2025). *The Recursive Transformer Model: Architecture, Theory, and Implementation with Persistent Memory Logic Loops*. TechRxiv. DOI: 10.36227/techrxiv.176118936.69886233/v1 (October 23, 2025)
- **Demo:** See the ERS README quick-start example (async memory reconsideration loop)

## Uses

### Direct Use

Use as a drop-in memory layer for any Transformer/LLM pipeline:

- Add factual or conversational memories.
- Run recursive reconsideration loops (temporal decay → consensus → contradiction detection → optional rewrite).
- Persist state across sessions via JSON + safetensors.

Ideal for agents, chatbots, or knowledge-intensive applications that require long-term coherence.

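The "persist state across sessions via JSON + safetensors" step above can be sketched as follows. This is a minimal, standard-library-only illustration: the field names, file layout, and `save_memory_state`/`load_memory_state` helpers are assumptions for this sketch, not the documented ERS API, and embedding tensors (which ERS stores via safetensors) are omitted so that only the JSON metadata side is shown.

```python
import hashlib
import json
import time
from pathlib import Path

def save_memory_state(memories, path):
    """Persist memory metadata as JSON (hypothetical format).

    In the real system, embedding tensors would be written separately
    with safetensors; this sketch covers only the JSON metadata side.
    """
    state = {
        "saved_at": time.time(),
        "memories": [
            {
                "text": m["text"],
                "confidence": m["confidence"],
                "timestamp": m["timestamp"],
                # SHA-256 content hash, as described in the Preprocessing section
                "hash": hashlib.sha256(m["text"].encode()).hexdigest(),
            }
            for m in memories
        ],
    }
    Path(path).write_text(json.dumps(state, indent=2))

def load_memory_state(path):
    return json.loads(Path(path).read_text())

memories = [{"text": "Paris is the capital of France",
             "confidence": 0.9, "timestamp": time.time()}]
save_memory_state(memories, "ers_state.json")
restored = load_memory_state("ers_state.json")
print(restored["memories"][0]["text"])
```

Because the hash is recomputed from the text on save, a stale or tampered state file can be detected on reload by re-hashing and comparing.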
### Downstream Use

- Integrate with LangChain agents or any LLM stack via Graphiti/Mem0 knowledge graphs.
- Extend base models (e.g., Llama, Mistral) with stateful recursive passes.
- Use in production AI systems needing self-correction and belief updating.

### Out-of-Scope Use

- Not intended as a standalone generative LLM.
- Not suitable for real-time low-latency inference without hardware acceleration (multiple recursive passes add compute).
- Avoid use in safety-critical systems without additional ethical/guardrail layers (rewrites can be LLM-guided).

## Bias, Risks, and Limitations

- **Technical limitations:** Recursive loops increase inference-time compute; performance depends on embedding quality and the KG backend (Neo4j recommended for Graphiti).
- **Sociotechnical risks:** Automated memory rewrites could propagate or amplify biases present in the underlying LLM or knowledge graph. Contradiction detection relies on embedding geometry and may miss subtle nuances.
- **Nostalgic incorrectness mitigation:** The core goal is to *reduce* outdated beliefs, but incorrect source data or poorly chosen consensus thresholds can still lead to erroneous updates.

### Recommendations

Users should:

- Monitor rewrite logs and confidence deltas.
- Use high-quality, verified knowledge graphs.
- Apply domain-specific safety policies before committing rewrites.
- Test with synthetic contradictory memory scenarios to validate behavior.

## How to Get Started with the Model

```python
# Via Kaggle (PyTorch model) or direct from the ERS GitHub repository.
# Install dependencies (from the ERS README):
# pip install torch sentence-transformers safetensors mem0-ai graphiti-core langchain langchain-community

import asyncio

from ERS import EnhancedReconsiderationSystem, MemoryBlock, ERSPromise  # or load from Kaggle PyTorch weights

async def main():
    ers = EnhancedReconsiderationSystem()  # loads saved state if present

    await ers.add_memory("Paris is the capital of France")
    await ers.add_memory("Paris is the largest city in France")  # contradictory example

    await ers.reconsider_deferred()
    await ers.recursive_loop_check()  # performs RTM-style multi-pass reconsideration
    await ers.close()

asyncio.run(main())
```

Full usage and configuration in the [ERS GitHub README](https://github.com/drqedwards/ERS). The Kaggle PyTorch model loads the core `PMLLLattice` and related tensors.

## Training Details

### Training Data

None (this is an architectural extension/framework, not a pretrained LLM). It operates on top of any Transformer embeddings (e.g., via `sentence-transformers`). Memory content is user-provided or agent-generated.

### Training Procedure

#### Preprocessing

Memory blocks are created with embeddings (via sentence-transformers), timestamps, confidence scores, and SHA-256 hashes. Optional KG indexing via Graphiti/Mem0.

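The preprocessing just described can be sketched as a minimal memory-block constructor. This is an illustration only: the real ERS `MemoryBlock` class almost certainly has a different signature, and the embedding here is a plain list supplied by the caller rather than a sentence-transformers vector.

```python
import hashlib
import time
from dataclasses import dataclass, field

@dataclass
class MemoryBlock:
    """Minimal sketch of a memory block per the preprocessing description:
    text, embedding, confidence score, timestamp, and SHA-256 hash."""
    text: str
    embedding: list           # in ERS this would come from sentence-transformers
    confidence: float = 1.0
    timestamp: float = field(default_factory=time.time)
    content_hash: str = ""

    def __post_init__(self):
        # SHA-256 hash of the memory content, used for integrity checks
        self.content_hash = hashlib.sha256(self.text.encode()).hexdigest()

block = MemoryBlock("Paris is the capital of France", embedding=[0.1, 0.2, 0.3])
print(block.content_hash[:8])
```

Hashing at construction time means two memories with identical text share a hash, which is one simple way to deduplicate before KG indexing.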
#### Training Hyperparameters

- **Training regime:** Not applicable (no end-to-end training). Runtime inference uses PyTorch (fp32/bf16 supported via torch).
- **Configuration options (RTM integration):** `passes: 2`, `early_stop_cosine_delta: 0.002`, `max_rewrites_per_slot: 1`, `decay_alpha: 0.95`, adaptive λ decay rates, similarity threshold τ_sim, etc. (fully configurable in ERS).

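The options listed above might be collected into a single configuration mapping, e.g. as below. The key names follow the list above; the exact ERS config schema is an assumption, and the `lambda_decay` and `tau_sim` values are illustrative placeholders (the document names these parameters without giving defaults).

```python
# Hypothetical RTM/ERS integration config mirroring the options listed above.
rtm_config = {
    "passes": 2,                       # recursive reconsideration passes
    "early_stop_cosine_delta": 0.002,  # stop early once embeddings stop shifting
    "max_rewrites_per_slot": 1,        # cap on rewrites per memory slot
    "decay_alpha": 0.95,               # base temporal decay factor
    "lambda_decay": 0.1,               # adaptive decay rate (illustrative value)
    "tau_sim": 0.85,                   # similarity threshold (illustrative value)
}
print(rtm_config["passes"])
```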
#### Speeds, Sizes, Times

Real-time performance demonstrated in ERS (production-grade). Exact throughput depends on hardware, the number of recursive passes, and the KG backend. The lattice uses low-rank compression for scalability.

## Evaluation

### Testing Data, Factors & Metrics

No public benchmark datasets or quantitative results are published in the preprint. Evaluation is qualitative/conceptual, via synthetic contradictory memory scenarios (e.g., the Paris-facts example) and convergence metrics (confidence delta, rewrite count, cosine-similarity shifts).

#### Factors

- Memory age, source quality, domain volatility, embedding similarity.

#### Metrics

- Nostalgic Incorrectness (NI) metric defined in the paper.
- Consensus score, contradiction score, confidence-update delta.

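Two of the convergence signals mentioned above, cosine-similarity shift between passes and the confidence-update delta, can be computed as in this pure-Python sketch. The exact formulas ERS uses are not published, so treat these helper functions as illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def confidence_delta(before, after):
    """Mean absolute confidence change across memory slots;
    small values suggest the reconsideration loop has converged."""
    return sum(abs(b - a) for b, a in zip(before, after)) / len(before)

# Embedding shift across one reconsideration pass (illustrative vectors)
shift = 1.0 - cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.88, 0.12])
delta = confidence_delta([0.9, 0.7], [0.85, 0.72])
print(round(shift, 4), round(delta, 4))
```

An early-stop rule like `early_stop_cosine_delta` would compare `shift` against the configured threshold after each pass.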
### Results

[More Information Needed]. The paper focuses on the theoretical framework and architectural feasibility rather than large-scale empirical benchmarks; ERS demonstrates real-time recursive reconsideration.

#### Summary

The model successfully maintains coherent state and resolves contradictions in controlled memory scenarios.

## Model Examination

Interpretability is built in: per-pass logs of embedding shifts, confidence changes, rewrite proposals, and KG updates. Visualization of memory-graph evolution is a planned roadmap feature.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed] (tested on standard CPU/GPU with PyTorch)
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications

### Model Architecture and Objective

- Base: Transformer stack with an augmented embedding layer and a reconsideration head.
- Key equations: temporal decay \( \text{conf}_i(t) = \text{conf}_i(0) \cdot e^{-\lambda_i (t - t_i)} \cdot \dots \), consensus scoring, integrated confidence update, PMLL lattice (a DAG with quantization and low-rank compression).
- Objective: stateful, self-correcting memory across sessions.

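The exponential part of the temporal-decay equation above translates directly to code; the paper's additional multiplicative terms (elided by the trailing dots) are omitted here, and λ would be set adaptively per memory in the full model.

```python
import math

def decayed_confidence(conf0, lam, t, t_i):
    """conf_i(t) = conf_i(0) * exp(-lambda_i * (t - t_i)).

    Only the exponential factor of the decay equation is modeled;
    the paper's further multiplicative terms are omitted.
    """
    return conf0 * math.exp(-lam * (t - t_i))

# A memory created at t_i = 0 with confidence 1.0 and decay rate 0.1
print(decayed_confidence(1.0, 0.1, t=0.0, t_i=0.0))             # → 1.0
print(round(decayed_confidence(1.0, 0.1, t=10.0, t_i=0.0), 4))  # → 0.3679
```

A larger λ makes a memory's confidence fall faster, which is why the paper ties λ to domain volatility: fast-changing facts should decay quickly, stable ones slowly.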
### Compute Infrastructure

#### Hardware

Standard PyTorch-compatible hardware (CPU/GPU).

#### Software

Python 3.8+, PyTorch, sentence-transformers, safetensors, mem0-ai, graphiti-core, LangChain.

## Citation

**BibTeX:**

```bibtex
@article{edwards2025recursive,
  author  = {Edwards, Josef Kurk},
  title   = {The Recursive Transformer Model: Architecture, Theory, and Implementation with Persistent Memory Logic Loops},
  journal = {TechRxiv},
  year    = {2025},
  month   = {October},
  doi     = {10.36227/techrxiv.176118936.69886233/v1},
  url     = {https://www.techrxiv.org/users/856117/articles/1345789}
}
```

**APA:**

Edwards, J. K. (2025). *The Recursive Transformer Model: Architecture, Theory, and Implementation with Persistent Memory Logic Loops*. TechRxiv. https://doi.org/10.36227/techrxiv.176118936.69886233/v1

## Glossary

- **PMLL** (Persistent Memory Logic Loop): lattice-based memory compression and routing.
- **ERS** (Enhanced Reconsideration System): the production Python/PyTorch library.
- **Nostalgic Incorrectness**: retention of outdated or conflicting beliefs in stateless models.

## More Information

- Full paper and math: TechRxiv preprint.
- Live implementation: [ERS GitHub](https://github.com/drqedwards/ERS).
- Related work: hybrid TRM-RTM model, PMLL P=NP proof paper (separate preprint).

## Model Card Authors

Compiled by Dr. Q (Josef Edwards) from public sources.

## Model Card Contact

Josef Edwards (Kaggle: josefedwards, GitHub: drqedwards, Email: joed6834@colorado.edu), or open an issue on the ERS repository.