RthItalia/Rth-lm-code-25b

RthItalia committed 28 days ago

Commit ddaa4ab (verified)

1 Parent(s): d994435

Update README.md

Files changed (1)

README.md +64 -121

README.md CHANGED

@@ -1,161 +1,104 @@
 ---
 license: cc-by-nc-4.0
-language:
-- en
-- it
-- py
-- js
-- cpp
 tags:
-- non-transformer
 - tcn
 - fractal
-- lora
-- genome
-- rth-code
-- zetagrid
-pipeline_tag: text-generation
 ---
-# 💻 RTH-Code 25B — Code Specialist Soul
-> **"Intelligence is in the architecture, not in the GPUs."**
-> This is the **Code Specialist Soul** of the RTH-LM ecosystem (V4 Architecture).
-> It uses the same base Genome (7B), but with a "soul" trained to write code (based on V4 Expanded).
-⚠️ **PROOF OF CONCEPT** ⚠️
-This is a **BASE** version created to demonstrate the scaling efficiency of the RTH-LM architecture.
-- **Training time:** Only **8 hours** on a single A40.
-- **Dataset:** Only **5GB** of mixed code (Python, JS, C++, Go).
-- **Goal:** Demonstrate that a frozen Genome can learn complex vertical skills in record time.
----
-## ⚡ What is it?
-**RTH-Code 25B** is not a standalone model. It is an **interchangeable Soul**.
-Instead of downloading a 30GB model for every task, you keep the **frozen Genome (7B)** and swap only the Soul (**~3.8GB**).
-This Soul was trained specifically on:
-- **Python** (Data Science, Backend, Torch)
-- **JavaScript/TypeScript** (React, Node)
-- **C/C++** (Systems programming)
-- **Rust/Go**
-```mermaid
-graph TD
-    G["Genome 7B<br/>(Frozen Core)"]
-    G --> SC["🔹 Soul CODE<br/>V4 Specialist (25B)"]
-    G --> SG["Generalist Soul<br/>Chat & Knowledge V4"]
-    G --> SL["Soul Legal/Medical<br/>(Future)"]
-```
-Just **swap** the `.pt` files (or use the unified GGUF) and your model goes from "philosopher" to "senior engineer" in milliseconds.
 ---
-## 📊 Technical Specifications
-| **Feature** | **Detail** |
-|---|---|
-| **Architecture** | Fractal Gated Causal TCN (No Attention) - **V4 Enhanced** |
-| **Total Parameters** | **25B** (Genome + Soul V4 Expanded) |
-| **Soul Size** | **~3.8GB** (LoRA Rank 512, ~950M params) |
-| **Training Dataset** | **5GB** (Mixed: Python, JS, C++, Go) |
-| **Training Time** | **8 HOURS** (Single Epoch) ⏱️ |
-| **Context** | 2048+ (Theoretically unlimited thanks to TCN) |
-| **Final Loss** | **1.20** ✅ |
-| **Hardware** | Trained on a single NVIDIA A40 (48GB) |
 ---
-## 🛠️ Quickstart
-### Option 1: GGUF (Recommended for Ollama/llama.cpp)
-Download `rth_lm_25b_code.gguf` from this repo.
 ```bash
-# Run with llama.cpp
-./llama-cli -m rth_lm_25b_code.gguf -p "def fibonacci(n):" -n 200
-# Or with Ollama (create a Modelfile)
-# FROM ./rth_lm_25b_code.gguf
-# SYSTEM "You are an expert coding assistant."
 ```
-### Option 2: Python (Original PyTorch)
-If you already have the [ZetaGrid](https://github.com/rthgit/ZetaGrid) repo:
 ```python
-from ZETAGRID_INFERENCE import ZetaGrid25B
-# Load the base Genome
-model = ZetaGrid25B("zetagrid_25b_production.npy")
-# Attach the Code Soul
-model.load_soul("zeta25b_code_FINAL.pt")
-print(model.generate("def quicksort(arr):"))
 ```
----
-## 🧪 Performance & Capability
-RTH-Code excels at:
-1. **Code Completion**: Intelligent autocompletion of functions and classes.
-2. **Refactoring**: Rewriting legacy code into clean code.
-3. **Docstrings**: Automatic generation of documentation.
-4. **Unit Tests**: Writing `pytest`/`unittest` tests.
-*Note: As a No-Attention (TCN) architecture, it has very low inference overhead and scales linearly, O(N), with context length.*
 ---
-## 📜 License & Commercial Use ⚠️
-> **WARNING: THIS MODEL IS NOT FULLY OPEN SOURCE.**
-> It is released under the **CC BY-NC 4.0 (Creative Commons Non-Commercial)** license.
-### ✅ What you CAN do (free):
-- Academic and personal research.
-- Local testing and evaluation.
-- Hobbyist and non-profit use.
-- Share results, crediting the author.
-### ❌ What you CANNOT do (without a commercial license):
-- **Use the model within a company** for any purpose (internal or external).
-- Integrate the model into paid products or services.
-- Offer APIs or cloud services based on this model.
-- Any activity that generates direct or indirect revenue.
-📞 **FOR COMMERCIAL USE (Enterprise / Startup):**
-You must obtain a commercial license from **RTH Italia**.
-Direct contact: [**info@rthitalia.com**](mailto:info@rthitalia.com)
 ---
-## 📄 Citation
-Produced by **RTH Italia** (Research & Technology Hub).
-Author: *Christian Quintino De Luca*.
-To cite the original paper:
-📖 **[RTH-LM: A Fractal Temporal Convolutional Language Model](https://zenodo.org/records/18622610)**
-```bibtex
-@techreport{deluca2026rthlm,
-  author      = {De Luca, Christian Quintino},
-  title       = {RTH-LM: A Fractal Temporal Convolutional Language Model},
-  institution = {RTH Italia (Research & Technology Hub)},
-  year        = {2026},
-  url         = {https://github.com/rthgit/ZetaGrid},
-  doi         = {10.5281/zenodo.18622610},
-  note        = {Non-commercial license. Contact RTH Italia for commercial use.}
-}
-```
 ---
-*Built to demonstrate that efficiency beats brute force.*
-**RTH Italia**

 ---
+language: en
 license: cc-by-nc-4.0
 tags:
+- zetagrid
+- cpu-da
 - tcn
 - fractal
+- 25b
+datasets:
+- custom
+metrics:
+- loss
 ---
+# 📇 Model Card: RTH-LM (25B)
+## Model Details
+- **Name:** RTH-LM (25B)
+- **Architecture:** Fractal Gated Causal TCN (Temporal Convolutional Network)
+- **Parameters:** 7B (Physical) / 25B (Effective Fractal Capacity)
+- **Author:** Christian Quintino De Luca (RTH Italia)
+- **Release Date:** February 2026
+- **License:** CC BY-NC 4.0 (Research) / Commercial (Enterprise)
+- **Paper (Figshare):** https://doi.org/10.6084/m9.figshare.31376560
+RTH-LM (25B) is a **Fractal TCN (Temporal Convolutional Network)** Language Model, designed for high-efficiency inference on CPU/Consumer Hardware and massive scalability on GPUs.
+Unlike traditional Transformers, ZetaGrid uses a **Gated Causal TCN backbone** with **Fractal Scaling**, which lets it model long-range dependencies with significantly lower memory overhead during inference.
 ---
+## 📊  Model Specs
+| Feature | Specification |
+| :--- | :--- |
+| **Parameters** | 25 Billion (25B) |
+| **Architecture** | Fractal Gated TCN (Non-Transformer) |
+| **Layers** | 32 (Phase 2) |
+| **Context Window** | 256 - 1024 (Fractal Expansion Capable) |
+| **Training Data** | 1.48 GB Cleaned Text (Wiki/Books) |
+| **Final Loss** | **1.0675** (Phase 2) |
+| **Quantization** | QULP 2-bit (Supported) |
 ---
+## 🚀 Usage (Inference)
+### Prerequisites
+You need the `cpu_da` framework or the Python inference script.
 ```bash
+# Clone the repo
+git clone https://github.com/rth-italia/cpu-da
+cd cpu-da
 ```
+### Running the Model (Python)
+Ensure you have `zeta25b_step15000.pt` (Weights) and `zetagrid_25b_production.npy` (Genome).
 ```python
+import torch
+from ZETAGRID_INFERENCE import load_model, generate
+# Load 25B Model
+model = load_model("zeta25b_step15000.pt", genome="zetagrid_25b_production.npy")
+# Generate
+text = generate(model, "The future of AI is")
+print(text)
 ```
+### QULP 2-bit Inference (Ultra-Low Memory)
+To run on consumer CPUs with <2GB RAM:
+```bash
+python QULP_INFERENCE.py --model zeta25b_2bit.qulp
+```
 ---
+## 🧬 Architecture: The "Fractal Soul"
+ZetaGrid is **NOT** a Transformer. It is a TCN-based organism.
+- **Genome:** A fixed 7GB "DNA" bank of weights (`zetagrid_25b_production.npy`).
+- **Phenotype:** The model layers are "grown" from this genome on the fly.
+- **Training:** Only the "Soul" (LoRA Adapters + Norms) is trained (~300MB), making the model extremely portable.
+- **Fractal Scaling:** The 25B model can be fractally expanded to 50B, 100B+ by duplicating layers and adding self-linear noise.
 ---
+## 📈 Performance
+- **Phase 1 (Evolution):** 200 Generations of Genome Optimization.
+- **Phase 2 (Gradient):** 15,000 Steps of TCN+LoRA Fine-Tuning.
+- **Convergence:** Beat target loss of 1.5, achieving **1.0675**.
+- **Capabilities:** Narrative coherence, English syntax mastery, abstract reasoning.
 ---
+## 📜 License
+CC BY-NC 4.0 (Creative Commons Non-Commercial) for Research.
+**Commercial Use:** Requires a license from **RTH Italia** (Cpu-DA Project).
+For inquiries: info@rthitalia.com
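
The updated README names the backbone a "Gated Causal TCN" but does not show what such a layer computes. The sketch below is a generic gated causal dilated 1-D convolution in plain Python, using the common tanh/sigmoid gating found in other TCN-style models. It is an illustration under that assumption, not RTH-LM's actual layer: the function names (`causal_conv1d`, `gated_causal_layer`, `receptive_field`) and the choice of gate are hypothetical and do not come from the repository.

```python
import math


def causal_conv1d(x, weights, dilation=1):
    """Causal dilated convolution: y[t] = sum_k weights[k] * x[t - k*dilation].

    Positions before the start of the sequence are treated as zero, so y[t]
    never depends on any x[s] with s > t.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_causal_layer(x, filter_w, gate_w, dilation=1):
    """Gated activation (assumed form): tanh(filter branch) * sigmoid(gate branch)."""
    f = causal_conv1d(x, filter_w, dilation)
    g = causal_conv1d(x, gate_w, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Number of past positions (including the current one) a stack of layers can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    signal = [0.1, 0.2, 0.3, 0.4]
    print(gated_causal_layer(signal, [0.5, -0.3], [0.2, 0.1], dilation=1))
    print(receptive_field(2, [1, 2, 4, 8]))
```

With kernel size 2 and dilations 1, 2, 4, 8, `receptive_field` returns 16: doubling the dilation per layer grows the visible context exponentially with depth, while each layer's cost stays linear in sequence length.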