🌌 RTH-LM: A Fractal Temporal Convolutional Language Model


RTH-LM is an experimental 25B parameter language model built on a Fractal Gated Causal Temporal Convolutional Network (TCN). It is a strictly non-Transformer architecture designed for linear-time inference and extreme compute efficiency.

πŸ’Ž Quantization & Efficiency

This repository includes the 2-bit quantized variant (zeta25b_2bit.qulp), demonstrating the architecture's resilience to ultra-low-bit quantization. At 2-bit precision, the 120B variant is projected to fit within a single 80GB GPU.
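The storage math behind 2-bit quantization can be sketched with simple symmetric packing: four 2-bit codes per byte, so 25B parameters need roughly 6.25 GB of codes (plus scales). This is a generic illustration, not the actual .qulp format; the function names and the 4-level codebook are assumptions:

```python
import numpy as np

def pack_2bit(weights: np.ndarray, scale: float) -> np.ndarray:
    """Quantize floats to 4 levels {-1.5, -0.5, 0.5, 1.5} * scale and pack 4 codes per byte."""
    codes = np.clip(np.round(weights / scale + 1.5), 0, 3).astype(np.uint8)
    codes = codes.reshape(-1, 4)  # assumes len(weights) is a multiple of 4
    return codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)

def unpack_2bit(packed: np.ndarray, scale: float) -> np.ndarray:
    """Invert pack_2bit: extract the four 2-bit codes from each byte and dequantize."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return (codes.reshape(-1).astype(np.float32) - 1.5) * scale
```

At 2 bits per weight, 120B parameters occupy about 30 GB of packed codes, which is consistent with the single-80GB-GPU projection above.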

πŸš€ Key Technical Highlights

  • Architecture: Fractal Gated Causal TCN (No-Attention).
  • Modularity: Separated Genome (frozen core) and Soul (trainable adapters).
  • Efficiency: Linear-time inference in sequence length; O(1) state memory during streaming.
  • 2-bit Ready: Designed for ultra-low-precision quantization (the projected 120B variant would fit on a single 80GB GPU).
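The linear-time and constant-state claims follow from how causal TCNs stream: each layer only needs its last few inputs, a fixed-size cache independent of sequence length. The following is a minimal sketch of one gated causal convolution layer, not the RTH-LM implementation; all names are illustrative:

```python
import torch
import torch.nn as nn

class GatedCausalConv(nn.Module):
    """One gated causal conv layer; streaming uses a fixed-size cache (O(1) in sequence length)."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time); left-padding makes the convolution causal.
        h = self.conv(nn.functional.pad(x, (self.pad, 0)))
        a, b = h.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b)  # gated activation

    def step(self, x_t, cache):
        # x_t: (batch, channels, 1); cache holds only the last `pad` inputs,
        # so per-token cost and memory do not grow with sequence length.
        buf = torch.cat([cache, x_t], dim=2)
        h = self.conv(buf)
        a, b = h.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b), buf[:, :, 1:]
```

Token-by-token streaming with `step` reproduces the full-sequence `forward` exactly, while carrying only a `(batch, channels, pad)` cache per layer.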

πŸ“„ Official Paper & Citation

The full technical paper is available on Zenodo (DOI: 10.5281/zenodo.18622610).

@techreport{deluca2026rthlm,
  author = {De Luca, Christian Quintino},
  title = {RTH-LM: A Fractal Temporal Convolutional Language Model},
  institution = {RTH Italia (Research & Technology Hub)},
  year = {2026},
  url = {https://github.com/rthgit/ZetaGrid},
  doi = {10.5281/zenodo.18622610}
}

πŸ“ˆ Training Evidence

  • Dataset: 1.5 GB curated scientific/narrative mix.
  • Steps: 15,000
  • Training loss: ≈ 1.0
  • Perplexity: ≈ 2.8
  • Hardware: single NVIDIA A40 (24-hour training run).
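As a sanity check, the reported loss and perplexity are mutually consistent: for a model trained with cross-entropy in nats, perplexity is exp(loss), so a loss of ≈ 1.0 implies a perplexity of ≈ 2.7, close to the reported ≈ 2.8:

```python
import math

train_loss = 1.0                 # reported mean cross-entropy (nats/token)
perplexity = math.exp(train_loss)
print(round(perplexity, 2))      # 2.72, roughly matching the reported ≈ 2.8
```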

πŸ› οΈ How to Run

RTH-LM uses a custom inference engine. You can run it using the provided ZETAGRID_INFERENCE.py script.

1. Requirements

```shell
pip install torch numpy
```

2. Running Inference

```shell
# Launch the interactive inference loop
python ZETAGRID_INFERENCE.py
```

πŸ¦™ 3. Ollama & Native Inference (Beta)

RTH-LM now supports native GGUF serialization and can be integrated into the Ollama ecosystem via our custom TCN kernels.

  • Model File: rth_lm_25b_v1.gguf (15.6 GB - Native binary)
  • Setup Guide: OLLAMA_PATCH_GUIDE.md
  • C++ Kernels: rth_tcn_ops.cpp / .h (Custom kernels for llama.cpp)

To run, use the provided Modelfile_RTH-LM:

```shell
ollama create rth-lm -f Modelfile_RTH-LM
ollama run rth-lm
```

Note: This requires applying the provided source patch to your Ollama/llama.cpp build.
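For orientation, a minimal Ollama Modelfile for a local GGUF typically looks like the following. This is an illustrative sketch only; the actual contents of Modelfile_RTH-LM (and its parameters) may differ:

```
# Hypothetical Modelfile sketch — point Ollama at the local GGUF binary
FROM ./rth_lm_25b_v1.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```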



πŸ“œ License


πŸ›°οΈ Roadmap & Vision

  • Scale: Scaling to 120B and 1T variants.
  • Infinite Context: Testing Genome-tiling for 256k+ sequence lengths.
  • Domain Specialization: Release of specialized "Souls" for coding and legal analysis.

Join the Discussion: Head over to the Community tab to share your feedback!
