# RTH-LM: A Fractal Temporal Convolutional Language Model
RTH-LM is an experimental 25B-parameter language model built on a Fractal Gated Causal Temporal Convolutional Network (TCN). It is a strictly non-Transformer, attention-free architecture designed for linear-time inference and high compute efficiency.
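As a rough illustration of the building block this implies (a sketch only, not the actual RTH-LM kernels — the layer shape, gating, and parameter names here are assumptions), a gated causal temporal convolution computes each output from past inputs only:

```python
import math

def gated_causal_conv(x, w_f, w_g, dilation=1):
    """Minimal gated causal temporal convolution (illustrative sketch).

    Output at step t depends only on x[0..t], which is what lets a TCN
    generate autoregressively like a language model. w_f / w_g are the
    filter and gate kernels (hypothetical names, not RTH-LM's)."""
    K = len(w_f)
    out = []
    for t in range(len(x)):
        f = g = 0.0
        for k in range(K):        # tap k looks back k*dilation steps
            j = t - k * dilation
            if j >= 0:            # implicit zero left-padding keeps it causal
                f += w_f[k] * x[j]
                g += w_g[k] * x[j]
        # gated activation: tanh(filter) * sigmoid(gate)
        out.append(math.tanh(f) * (1.0 / (1.0 + math.exp(-g))))
    return out
```

Stacking such layers with growing dilations widens the receptive field geometrically while keeping per-token cost constant, which is where the linear-time claim comes from.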
## Quantization & Efficiency
This repository includes the 2-bit quantized variant (`zeta25b_2bit.qulp`), demonstrating the architecture's resilience to low-bit serialization. The 120B variant is projected to fit on a single 80 GB GPU.
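As a back-of-envelope check on that projection (raw weight bytes only — real containers also store quantization scales and metadata, and typically keep some tensors at higher precision, so actual files are larger):

```python
def weight_footprint_gb(n_params, bits_per_param):
    """Raw weight storage in GB; ignores activations and quantization metadata."""
    return n_params * bits_per_param / 8 / 1e9

print(weight_footprint_gb(120e9, 2))  # 30.0 GB -> well under one 80 GB GPU
print(weight_footprint_gb(25e9, 2))   # 6.25 GB raw (the shipped file is larger
                                      # due to container overhead / mixed precision)
```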
## Key Technical Highlights
- Architecture: Fractal Gated Causal TCN (attention-free).
- Modularity: Separated Genome (frozen core) and Soul (trainable adapters).
- Efficiency: Linear-time inference in sequence length; O(1) state memory during streaming.
- 2-bit Ready: Designed for ultra-low-precision quantization (the 120B variant is projected to fit on a single 80 GB GPU).
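The "O(1) state memory during streaming" point can be made concrete: during generation, a causal conv layer only needs its last `(K-1)*dilation` inputs, no matter how long the stream is. A minimal sketch of that property (not the RTH-LM engine):

```python
from collections import deque

class StreamingCausalConv:
    """Streams one sample at a time with constant memory (illustrative sketch,
    not the RTH-LM inference engine). The state is just the last
    (K-1)*dilation+1 inputs, independent of total stream length."""

    def __init__(self, w, dilation=1):
        self.w = w
        self.dilation = dilation
        size = (len(w) - 1) * dilation + 1
        self.buf = deque([0.0] * size, maxlen=size)  # fixed-size history

    def step(self, x_t):
        self.buf.append(x_t)  # oldest sample falls off automatically
        return sum(self.w[k] * self.buf[-1 - k * self.dilation]
                   for k in range(len(self.w)))
```

Contrast this with attention, whose KV cache grows linearly with sequence length.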
## Official Paper & Citation
The full technical paper is available on Zenodo (DOI: 10.5281/zenodo.18622610).
```bibtex
@techreport{deluca2026rthlm,
  author      = {De Luca, Christian Quintino},
  title       = {RTH-LM: A Fractal Temporal Convolutional Language Model},
  institution = {RTH Italia (Research \& Technology Hub)},
  year        = {2026},
  url         = {https://github.com/rthgit/ZetaGrid},
  doi         = {10.5281/zenodo.18622610}
}
```
## Training Evidence
- Dataset: 1.5 GB curated scientific/narrative mix.
- Steps: 15,000
- Training Loss: ≈ 1.0
- Perplexity: ≈ 2.8
- Hardware: Single NVIDIA A40 (24-hour run).
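As a quick consistency check (assuming the reported loss is mean cross-entropy in nats), the loss and perplexity figures above agree, since perplexity is the exponential of the loss:

```python
import math

loss_nats = 1.0              # reported training loss
ppl = math.exp(loss_nats)    # perplexity = e^loss for cross-entropy in nats
print(round(ppl, 2))         # 2.72, in the same ballpark as the reported ~2.8
```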
## How to Run
RTH-LM uses a custom inference engine. You can run it using the provided `ZETAGRID_INFERENCE.py` script.
### 1. Requirements
```shell
pip install torch numpy
```
### 2. Loading the Model (Python)
```shell
# Run interactive inference
python ZETAGRID_INFERENCE.py
```
### 3. Ollama & Native Inference (Beta)
RTH-LM now supports native GGUF serialization and can be integrated into the Ollama ecosystem via our custom TCN kernels.
- Model File: `rth_lm_25b_v1.gguf` (15.6 GB, native binary)
- Setup Guide: `OLLAMA_PATCH_GUIDE.md`
- C++ Kernels: `rth_tcn_ops.cpp` / `.h` (custom kernels for `llama.cpp`)
To run, use the provided `Modelfile_RTH-LM`:
```shell
ollama create rth-lm -f Modelfile_RTH-LM
ollama run rth-lm
```
Note: This requires applying the provided source patch to your Ollama/llama.cpp build.
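For orientation, an Ollama Modelfile for a local GGUF generally looks like the sketch below. This is an illustration only, not the contents of the shipped `Modelfile_RTH-LM`, which may set different parameters:

```
FROM ./rth_lm_25b_v1.gguf
PARAMETER temperature 0.7
```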
## License
- Research & Non-Commercial: CC BY-NC 4.0
- Commercial Use: Requires a paid license from RTH Italia.
- Contact: info@rthitalia.com
## Roadmap & Vision
- Scale: Scaling to 120B and 1T variants.
- Infinite Context: Testing Genome-tiling for 256k+ sequence lengths.
- Domain Specialization: Release of specialized "Souls" for coding and legal analysis.
Join the Discussion: Head over to the Community tab to share your feedback!