Text Generation · GGUF · rth_tcn · code-generation · non-transformer · tcn · fractal · lora · genome · rth-code · zetagrid
Instructions to use RthItalia/Rth-lm-code-25b with libraries, inference providers, notebooks, and local apps.
- Libraries
- llama-cpp-python
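How to use RthItalia/Rth-lm-code-25b with llama-cpp-python:

```python
# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="RthItalia/Rth-lm-code-25b",
    filename="rth_lm_25b_code.gguf",
)

# Complete a prompt; echo=True includes the prompt in the returned text
output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,
)
print(output)
```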
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use RthItalia/Rth-lm-code-25b with llama.cpp:
Install from brew

```bash
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RthItalia/Rth-lm-code-25b
# Run inference directly in the terminal:
llama-cli -hf RthItalia/Rth-lm-code-25b
```

Install from WinGet (Windows)

```powershell
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RthItalia/Rth-lm-code-25b
# Run inference directly in the terminal:
llama-cli -hf RthItalia/Rth-lm-code-25b
```

Use pre-built binary

```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RthItalia/Rth-lm-code-25b
# Run inference directly in the terminal:
./llama-cli -hf RthItalia/Rth-lm-code-25b
```

Build from source code

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RthItalia/Rth-lm-code-25b
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RthItalia/Rth-lm-code-25b
```

Use Docker

```bash
docker model run hf.co/RthItalia/Rth-lm-code-25b
```
- LM Studio
- Jan
- vLLM
How to use RthItalia/Rth-lm-code-25b with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RthItalia/Rth-lm-code-25b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "RthItalia/Rth-lm-code-25b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker

```bash
docker model run hf.co/RthItalia/Rth-lm-code-25b
```
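The curl request above can also be sent from Python using only the standard library. The following is a minimal sketch. It assumes a server started as shown above and listening on vLLM's default `http://localhost:8000`. llama-server listens on port 8080 by default, so pass a different `base_url` in that case. `build_completion_request` and `complete` are hypothetical helper names, and the payload mirrors the curl JSON.

```python
import json
import urllib.request


def build_completion_request(
    prompt: str,
    model: str = "RthItalia/Rth-lm-code-25b",
    base_url: str = "http://localhost:8000",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the text of the first choice."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a server running, `complete("def fibonacci(n):")` returns the generated continuation. Building the request separately makes it possible to inspect or test it without a network call.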
- Ollama
How to use RthItalia/Rth-lm-code-25b with Ollama:
```bash
ollama run hf.co/RthItalia/Rth-lm-code-25b
```
- Unsloth Studio

How to use RthItalia/Rth-lm-code-25b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

```bash
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RthItalia/Rth-lm-code-25b to start chatting
```

Install Unsloth Studio (Windows)

```powershell
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RthItalia/Rth-lm-code-25b to start chatting
```

Using HuggingFace Spaces for Unsloth

No setup is required. Open https://huggingface.co/spaces/unsloth/studio in your browser and search for RthItalia/Rth-lm-code-25b to start chatting.
- Docker Model Runner
How to use RthItalia/Rth-lm-code-25b with Docker Model Runner:
```bash
docker model run hf.co/RthItalia/Rth-lm-code-25b
```
- Lemonade
How to use RthItalia/Rth-lm-code-25b with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull RthItalia/Rth-lm-code-25b
```

Run and chat with the model

```bash
lemonade run user.Rth-lm-code-25b-{{QUANT_TAG}}
```

List all available models

```bash
lemonade list
```
Update README.md

`README.md` changed (`@@ -1,161 +1,104 @@`). Previous version:

---
license: cc-by-nc-4.0
language:
- en
- it
- py
- js
- cpp
tags:
- tcn
- fractal
---
> **"Intelligence lies in the architecture, not in the GPUs."**
> This is the **Code Specialist Soul** of the RTH-LM ecosystem (V4 Architecture).
> Same base Genome (7B), but with a "soul" trained for programming (based on V4 Expanded).

⚠️ **PROOF OF CONCEPT** ⚠️
This is a **BASE** version created to demonstrate the scaling efficiency of the RTH-LM architecture.
- **Training time:** Only **8 hours** on a single A40.
- **Dataset:** Only **5GB** of mixed code (Python, JS, C++, Go).
- **Goal:** Demonstrate that a frozen Genome can learn complex vertical skills in record time.

---
Instead of downloading a 30GB model for every task, you keep the **frozen Genome (7B)** and swap only the Soul (**~3.8GB**).

- **Python** (Data Science, Backend, Torch)
- **JavaScript/TypeScript** (React, Node)
- **C/C++** (Systems programming)
- **Rust/Go**

```mermaid
graph TD
    G["Genome 7B<br/>(Frozen Core)"]
    G --> SC["Soul CODE<br/>V4 Specialist (25B)"]
    G --> SG["Generalist Soul<br/>Chat & Knowledge V4"]
    G --> SL["Legal/Medical Soul<br/>(Future)"]
```

Just **swap** the `.pt` files (or use the unified GGUF), and your model goes from "philosopher" to "senior engineer" in milliseconds.
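The Genome/Soul split amounts to one frozen base plus small, swappable adapters. Elsewhere in this README the Soul is described as "LoRA Adapters + Norms". The following is a minimal sketch that assumes a single linear layer and standard LoRA, with norms omitted. `Soul`, `GenomeModel`, and `swap_soul` are hypothetical names, not the RTH-LM API.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


@dataclass(frozen=True)
class Soul:
    """Hypothetical LoRA-style adapter: delta_W = (alpha / rank) * B @ A."""
    name: str
    A: Matrix  # shape (rank, d_in)
    B: Matrix  # shape (d_out, rank)
    alpha: float = 1.0

    def delta(self) -> Matrix:
        scale = self.alpha / len(self.A)  # len(A) is the adapter rank
        return [[scale * v for v in row] for row in matmul(self.B, self.A)]


class GenomeModel:
    """One linear layer whose base weight (the "Genome") is never modified."""

    def __init__(self, genome_weight: Matrix):
        self._genome = [row[:] for row in genome_weight]  # private frozen copy
        self.soul: Optional[Soul] = None

    def swap_soul(self, soul: Optional[Soul]) -> None:
        # Only the small adapter reference changes; the Genome is untouched.
        self.soul = soul

    def forward(self, x: Matrix) -> Matrix:
        """y = x @ (W + delta_W)^T, with x shaped (batch, d_in)."""
        w = self._genome if self.soul is None else add(self._genome, self.soul.delta())
        w_t = [list(col) for col in zip(*w)]
        return matmul(x, w_t)
```

Swapping a Soul replaces only the adapter, so the switch is cheap once the adapter is in memory. The base weights are never rewritten.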
---

|  |  |
|---|---|
| **Hardware** | Trained on a single NVIDIA A40 (48GB) |

---

```bash
# Or with Ollama (create a Modelfile)
# FROM ./rth_lm_25b_code.gguf
# SYSTEM "You are an expert coding assistant."
```
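The commented Ollama lines above describe a two-line Modelfile. The following minimal sketch generates that file. `build_modelfile` and `write_modelfile` are hypothetical helpers. To keep it simple, the sketch uses only the plain quoted `SYSTEM "..."` form shown above, so it rejects prompts that contain double quotes.

```python
from pathlib import Path


def build_modelfile(gguf_path: str, system_prompt: str) -> str:
    """Return the text of an Ollama Modelfile that wraps a local GGUF file."""
    if '"' in system_prompt:
        # The simple quoted SYSTEM form used here cannot hold a double quote.
        raise ValueError("system prompt must not contain double quotes")
    # FROM points at the weights; SYSTEM sets the default system message.
    return f'FROM {gguf_path}\nSYSTEM "{system_prompt}"\n'


def write_modelfile(directory: str, gguf_path: str, system_prompt: str) -> Path:
    """Write the Modelfile into `directory` and return its path."""
    path = Path(directory) / "Modelfile"
    path.write_text(build_modelfile(gguf_path, system_prompt), encoding="utf-8")
    return path
```

After writing the file, `ollama create rth-code -f Modelfile` registers the model and `ollama run rth-code` starts it. The name `rth-code` is an arbitrary example.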
```python
# Load the base Genome
# (ZetaGrid25B comes from the ZetaGrid code at https://github.com/rthgit/ZetaGrid; its import path is not given here)
model = ZetaGrid25B("zetagrid_25b_production.npy")

model
```
---

## 🧪 Performance & Capability

RTH-Code excels at:
1. **Code Completion**: Intelligent autocompletion of functions and classes.
2. **Refactoring**: Rewriting legacy code as clean code.
3. **Docstrings**: Automatic documentation generation.
4. **Unit Tests**: Writing `pytest`/`unittest` tests.

---
> **WARNING: THIS MODEL IS NOT FULLY OPEN SOURCE.**
> It is released under the **CC BY-NC 4.0 (Creative Commons Non-Commercial)** license.

### ❌ What you CANNOT do (without a commercial license):
- **Use the model in a company** for any purpose (internal or external).
- Integrate the model into paid products or services.
- Offer APIs or cloud services based on this model.
- Any activity that generates direct or indirect revenue.

**FOR COMMERCIAL USE (Enterprise / Startup):**
You must obtain a commercial license from **RTH Italia**.
Direct contact: [**info@rthitalia.com**](mailto:info@rthitalia.com)

---
Produced by **RTH Italia** (Research & Technology Hub).
Author: *Christian Quintino De Luca*.

```
@techreport{deluca2026rthlm,
  author      = {De Luca, Christian Quintino},
  title       = {RTH-LM: A Fractal Temporal Convolutional Language Model},
  institution = {RTH Italia (Research & Technology Hub)},
  year        = {2026},
  url         = {https://github.com/rthgit/ZetaGrid},
  doi         = {10.5281/zenodo.18622610},
  note        = {Non-commercial license. Contact RTH Italia for commercial use.}
}
```

---

New version:

| 1 |
---
|
| 2 |
+
language: en
|
| 3 |
license: cc-by-nc-4.0
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
tags:
|
| 5 |
+
- zetagrid
|
| 6 |
+
- cpu-da
|
| 7 |
- tcn
|
| 8 |
- fractal
|
| 9 |
+
- 25b
|
| 10 |
+
datasets:
|
| 11 |
+
- custom
|
| 12 |
+
metrics:
|
| 13 |
+
- loss
|
| 14 |
---
|
| 15 |
|
| 16 |
+
# π Model Card: RTH-LM (25B)
## Model Details
- **Name:** RTH-LM (25B)
- **Architecture:** Fractal Gated Causal TCN (Temporal Convolutional Network)
- **Parameters:** 7B (physical) / 25B (effective fractal capacity)
- **Author:** Christian Quintino De Luca (RTH Italia)
- **Release Date:** February 2026
- **License:** CC BY-NC 4.0 (research) / Commercial (enterprise)
- **Paper (Figshare):** https://doi.org/10.6084/m9.figshare.31376560

RTH-LM (25B) is a **Fractal TCN (Temporal Convolutional Network)** language model. It is designed for high-efficiency inference on CPUs and consumer hardware and for massive scalability on GPUs.

Unlike traditional Transformers, ZetaGrid uses a **Gated Causal TCN backbone** with **Fractal Scaling**. This allows it to model long-range dependencies with significantly lower memory overhead during inference.
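The card does not spell out the block structure, so the following sketch rests on an assumption: it uses a WaveNet-style gated activation, `tanh(filter) * sigmoid(gate)`, over a dilated causal convolution with a residual connection. It is shown for a single channel in plain Python, and the function names are illustrative, not the ZetaGrid API. Causal padding means output `t` depends only on inputs at `t` and earlier. It also hints at the memory argument: at inference time each layer needs only the last `(k - 1) * dilation` inputs, not a key/value cache that grows with the sequence.

```python
import math
from typing import List, Sequence


def causal_dilated_conv(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k * dilation]; positions before t = 0 count as zero."""
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # only the current and past positions are read
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_causal_block(
    x: Sequence[float],
    filter_kernel: Sequence[float],
    gate_kernel: Sequence[float],
    dilation: int,
) -> List[float]:
    """Residual gated block: x + tanh(conv_f(x)) * sigmoid(conv_g(x))."""
    f = causal_dilated_conv(x, filter_kernel, dilation)
    g = causal_dilated_conv(x, gate_kernel, dilation)
    return [xi + math.tanh(fi) * sigmoid(gi) for xi, fi, gi in zip(x, f, g)]
```

Changing a future input leaves all earlier outputs identical, which is the property that lets the model generate left to right.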
---

## Model Specs

| Feature | Specification |
| :--- | :--- |
| **Parameters** | 25B effective (7B physical) |
| **Architecture** | Fractal Gated TCN (Non-Transformer) |
| **Layers** | 32 (Phase 2) |
| **Context Window** | 256-1024 (Fractal Expansion Capable) |
| **Training Data** | 1.48 GB Cleaned Text (Wiki/Books) |
| **Final Loss** | **1.0675** (Phase 2) |
| **Quantization** | QULP 2-bit (Supported) |
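How far back a causal TCN can see is set by its kernel size `k` and its dilations `d_l`: the receptive field is `R = 1 + (k - 1) * sum(d_l)`. The specs list 32 layers and a 256-1024 context but give no kernel size or dilation schedule. The configurations below are therefore hypothetical and only show the arithmetic.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Positions (current one included) visible to a stack of causal dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def doubling_dilations(num_layers: int, cycle: int) -> List[int]:
    """Dilations 1, 2, 4, ... that restart every `cycle` layers (a common TCN schedule)."""
    return [2 ** (i % cycle) for i in range(num_layers)]


# Hypothetical configurations, not taken from the model card:
print(receptive_field(2, doubling_dilations(10, 10)))  # 10 layers, dilations 1..512 -> 1024
print(receptive_field(2, doubling_dilations(32, 8)))   # 32 layers, 1..128 repeated 4x -> 1021
```

Doubling dilations make the receptive field grow exponentially with depth, while each layer's per-step cost stays fixed.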
---

## Usage (Inference)

### Prerequisites
You need the `cpu_da` framework or the Python inference script.

```bash
# Clone the repo
git clone https://github.com/rth-italia/cpu-da
cd cpu-da
```

### Running the Model (Python)
Ensure you have `zeta25b_step15000.pt` (weights) and `zetagrid_25b_production.npy` (Genome).

```python
import torch
from ZETAGRID_INFERENCE import load_model, generate

# Load 25B Model
model = load_model("zeta25b_step15000.pt", genome="zetagrid_25b_production.npy")

# Generate
text = generate(model, "The future of AI is")
print(text)
```

### QULP 2-bit Inference (Ultra-Low Memory)
To run on consumer CPUs with less than 2GB of RAM:

```bash
python QULP_INFERENCE.py --model zeta25b_2bit.qulp
```
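A quick check on the memory claim: at 2 bits per weight, raw storage is `params * 2 / 8` bytes. That gives about 1.75 GB for 7B weights and about 6.25 GB for 25B, before scales, metadata, and activations. The "<2GB" figure is therefore consistent with quantizing the 7B physical Genome; this is a reading of the numbers, not something the card states. The packing helpers show what 2 bits per weight means using a generic min/max scheme. They are not the QULP format, whose details are not given here.

```python
from typing import List, Sequence, Tuple


def weight_bytes(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in bytes, ignoring scales, metadata and activations."""
    return num_params * bits_per_param / 8


def quantize_2bit(values: Sequence[float]) -> Tuple[bytes, float, float]:
    """Generic min/max 2-bit quantization (4 levels), packing 4 codes per byte.

    Illustrative only: this is not the QULP format.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 3 or 1.0  # 3 steps between 4 levels; guard against hi == lo
    codes = [min(3, max(0, round((v - lo) / scale))) for v in values]
    packed = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= code << (2 * j)  # code j occupies bits 2j and 2j+1
        packed.append(byte)
    return bytes(packed), lo, scale


def dequantize_2bit(packed: bytes, lo: float, scale: float, n: int) -> List[float]:
    """Unpack n values; padding codes in the last byte are dropped."""
    out: List[float] = []
    for byte in packed:
        for j in range(4):
            out.append(lo + ((byte >> (2 * j)) & 0b11) * scale)
    return out[:n]


print(weight_bytes(7e9, 2) / 1e9)   # 7B weights at 2 bits -> 1.75 (GB)
print(weight_bytes(25e9, 2) / 1e9)  # 25B weights at 2 bits -> 6.25 (GB)
```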
---

## 🧬 Architecture: The "Fractal Soul"

ZetaGrid is **NOT** a Transformer. It is a TCN-based organism.
- **Genome:** A fixed 7GB "DNA" bank of weights (`zetagrid_25b_production.npy`).
- **Phenotype:** The model layers are "grown" from this genome on the fly.
- **Training:** Only the "Soul" (LoRA adapters + norms) is trained (~300MB), making the model extremely portable.
- **Fractal Scaling:** The 25B model can be fractally expanded to 50B, 100B+ by duplicating layers and adding self-linear noise.
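The card describes expansion as "duplicating layers and adding self-linear noise" but does not define the noise. The sketch below assumes plain depth-wise duplication with small Gaussian noise on each extra copy. This is a stand-in, not RTH-LM's method, and `fractal_expand` and its parameters are hypothetical. Layers are flattened weight lists to keep the sketch dependency-free.

```python
import random
from typing import List

Layer = List[float]  # one layer's weights, flattened for illustration


def fractal_expand(
    layers: List[Layer], factor: int = 2, noise_std: float = 1e-3, seed: int = 0
) -> List[Layer]:
    """Repeat each layer `factor` times in place, perturbing every extra copy.

    The first copy is exact; later copies get small Gaussian noise (an assumed
    stand-in for the card's "self-linear noise") so they can diverge when
    training continues.
    """
    rng = random.Random(seed)
    expanded: List[Layer] = []
    for layer in layers:
        expanded.append(list(layer))
        for _ in range(factor - 1):
            expanded.append([w + rng.gauss(0.0, noise_std) for w in layer])
    return expanded
```

With `factor=2`, a stack of N layers becomes 2N layers and the stored parameter count doubles, in line with the 25B to 50B step described above.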
---

## Performance

- **Phase 1 (Evolution):** 200 generations of Genome optimization.
- **Phase 2 (Gradient):** 15,000 steps of TCN+LoRA fine-tuning.
- **Convergence:** Beat the target loss of 1.5, reaching **1.0675**.
- **Capabilities:** Narrative coherence, English syntax mastery, abstract reasoning.
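The card does not say whether the reported loss is the mean per-token cross-entropy in nats, nor which tokenizer was used. If it is, it converts to perplexity via `exp(loss)`. A short check:

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """exp(loss): perplexity, valid if the loss is mean per-token cross-entropy in nats."""
    return math.exp(mean_cross_entropy_nats)


print(round(perplexity(1.0675), 2))  # reported final loss -> about 2.91
print(round(perplexity(1.5), 2))     # target loss -> about 4.48
```

Perplexities are only comparable between models that share the same tokenization and evaluation data.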
---

## License
CC BY-NC 4.0 (Creative Commons Non-Commercial) for research.
**Commercial Use:** Requires a license from **RTH Italia** (Cpu-DA Project).
For inquiries: info@rthitalia.com