RthItalia committed 26 days ago

Commit 7bb177f (verified) · 1 parent: 75ee6dd

Update README.md

Files changed (1)

README.md +140 -67

@@ -1,107 +1,180 @@
 ---
-language: en
 license: cc-by-nc-4.0
 tags:
-- zetagrid
-- cpu-da
 - tcn
 - fractal
-- 25b
-datasets:
-- custom
-metrics:
-- loss
 ---
-# 📇 Model Card: RTH-LM (25B)
-## Model Details
-- **Name:** RTH-LM (25B)
-- **Architecture:** Fractal Gated Causal TCN (Temporal Convolutional Network)
-- **Parameters:** 7B (Physical) / 25B (Effective Fractal Capacity)
-- **Author:** Christian Quintino De Luca (RTH Italia)
-- **Release Date:** February 2026
-- **License:** CC BY-NC 4.0 (Research) / Commercial (Enterprise)
-- **Paper (Figshare):** https://doi.org/10.6084/m9.figshare.31376560
-RTH-LM (25B) is a **Fractal TCN (Temporal Convolutional Network)** Language Model, designed for high-efficiency inference on CPU/Consumer Hardware and massive scalability on GPUs.
-Unlike Traditional Transformers, ZetaGrid uses a **Gated Causal TCN backbone** with **Fractal Scaling**, allowing it to model long-range dependencies with significantly lower memory overhead during inference.
----
-## 📊  Model Specs
-| Feature | Specification |
-| :--- | :--- |
-| **Parameters** | 25 Billion (25B) |
-| **Architecture** | Fractal Gated TCN (Non-Transformer) |
-| **Layers** | 32 (Phase 2) |
-| **Context Window** | 256 - 1024 (Fractal Expansion Capable) |
-| **Training Data** | 1.48 GB Cleaned Text (Wiki/Books) |
-| **Final Loss** | **1.0675** (Phase 2) |
-| **Quantization** | QULP 2-bit (Supported) |
----
-## 🚀 Usage (Inference)
 ### Prerequisites
-Use the ZetaGrid reference repository and download the model artifacts from this Hugging Face repository.
 ```bash
-# Clone the repo
 git clone https://github.com/rthgit/ZetaGrid
 cd ZetaGrid
 ```
-### Running the Model (Python)
-Place the required artifacts in the working directory, or update the paths in the script:
-- `zeta25b_step15000.pt` - Soul/checkpoint weights
-- `zetagrid_25b_production.npy` - Genome weight bank
-```python
-import torch
-from ZETAGRID_INFERENCE import load_model, generate
-# Load 25B Model
-model = load_model("zeta25b_step15000.pt", genome="zetagrid_25b_production.npy")
-# Generate
-text = generate(model, "The future of AI is")
-print(text)
 ```
-### QULP 2-bit Inference (Ultra-Low Memory)
-If using the QULP artifact, download `zeta25b_2bit.qulp` from the model repository and run the matching local inference script when available:
 ```bash
-python QULP_INFERENCE.py --model zeta25b_2bit.qulp
 ```
----
-## 🧬 Architecture: The "Fractal Soul"
-ZetaGrid is **NOT** a Transformer. It is a TCN-based organism.
-- **Genome:** A fixed 7GB "DNA" bank of weights (`zetagrid_25b_production.npy`).
-- **Phenotype:** The model layers are "grown" from this genome on the fly.
-- **Training:** Only the "Soul" (LoRA Adapters + Norms) is trained (~300MB), making the model extremely portable.
-- **Fractal Scaling:** The 25B model can be fractally expanded to 50B, 100B+ by duplicating layers and adding self-linear noise.
----
-## 📈 Performance
-- **Phase 1 (Evolution):** 200 Generations of Genome Optimization.
-- **Phase 2 (Gradient):** 15,000 Steps of TCN+LoRA Fine-Tuning.
-- **Convergence:** Beat target loss of 1.5, achieving **1.0675**.
-- **Capabilities:** Narrative coherence, English syntax mastery, abstract reasoning.
----
-## 📜 License
-CC BY-NC 4.0 (Creative Commons Non-Commercial) for Research.
-**Commercial Use:** Requires a license from **RTH Italia**.
-For inquiries: info@rthitalia.com

 ---
 license: cc-by-nc-4.0
+language:
+- en
+- it
+- py
+- js
+- cpp
 tags:
+- text-generation
+- code-generation
+- non-transformer
 - tcn
 - fractal
+- lora
+- genome
+- rth-code
+- zetagrid
+pipeline_tag: text-generation
 ---
+# RTH-Code 25B
+RTH-Code 25B is an experimental code-specialist Soul for the RTH-LM / ZetaGrid architecture.
+It is not a standalone Transformer model. It is part of the RTH-LM Genome/Soul system: a shared frozen Genome provides the reusable parameter substrate, while a smaller trainable Soul carries task specialization.
+## Status
+This is an early proof-of-concept research release. It is intended for architecture evaluation, local experimentation, and reproducibility work around non-Transformer language models.
+Do not treat this release as a production coding assistant or as evidence of parity with frontier code models. The current release should be evaluated with fixed prompts, held-out code tasks, and reproducible benchmark harnesses before downstream use.
+## Model Details
+| Field | Value |
+| --- | --- |
+| Model name | RTH-Code 25B |
+| Organization | RTH Italia |
+| Author | Christian Quintino De Luca |
+| Architecture | Fractal Gated Causal TCN (non-Transformer) |
+| System design | Frozen Genome + trainable Soul adapters |
+| Effective capacity | 25B class, via fractal capacity framing |
+| Specialization | Code generation / code completion experiments |
+| Training data | Mixed code corpus, including Python, JavaScript/TypeScript, C/C++, Rust, and Go |
+| Training hardware | Single NVIDIA A40 class run |
+| License | CC BY-NC 4.0 for research/non-commercial use; commercial license required |
+| Paper | https://doi.org/10.6084/m9.figshare.31376560 |
+## Intended Use
+This release is intended for:
+- Research on non-attention language-model architectures.
+- Local experiments with the RTH-LM Genome/Soul design.
+- Code-generation prompt tests under controlled evaluation settings.
+- Comparison against Transformer and state-space baselines.
+- Reproducibility work around quantization and low-memory inference paths.
+This release is not intended for:
+- Production software development without independent validation.
+- Security-critical code generation.
+- Commercial products, paid APIs, or enterprise internal use without a commercial license.
+- Claims of benchmark superiority without published, reproducible benchmark evidence.
+## Architecture Summary
+RTH-Code 25B uses the same high-level ZetaGrid design as RTH-LM:
+- A Fractal Gated Causal Temporal Convolutional Network backbone.
+- No standard self-attention block.
+- A frozen Genome weight bank reused across model variants.
+- Trainable low-rank Soul adapters for specialization.
+- Optional QULP-style quantization path for low-memory experiments.
+The research hypothesis is that domain behavior can be changed by swapping the Soul while keeping the Genome stable. RTH-Code is the code-specialist demonstration of that idea.
+```mermaid
+graph TD
+    G["Frozen Genome<br/>shared parameter substrate"]
+    L["Language Soul<br/>general text behavior"]
+    C["Code Soul<br/>code-specialist behavior"]
+    G --> L
+    G --> C
+```
+## Files
+Typical artifacts for this release may include:
+| File | Role |
+| --- | --- |
+| `rth_lm_25b_code.gguf` | Unified GGUF artifact for local runtime experiments |
+| `zeta25b_code_FINAL.pt` | Code-specialist Soul checkpoint |
+| `zetagrid_25b_production.npy` | Shared Genome weight bank |
+| `config.json` | Architecture metadata |
+| `ZETAGRID_INFERENCE.py` | Reference Python inference script |
+File availability may differ by release channel. Large artifacts are hosted on Hugging Face rather than in the GitHub source repository.
+## Quickstart
 ### Prerequisites
+Use the ZetaGrid reference repository and download the Code artifacts from this Hugging Face repository.
 ```bash
 git clone https://github.com/rthgit/ZetaGrid
 cd ZetaGrid
 ```
+For the Code release, the relevant artifacts are:
+- `zeta25b_code_FINAL.pt` - Code-specialist Soul/checkpoint
+- `zetagrid_25b_production.npy` - shared Genome weight bank
+- `rth_lm_25b_code.gguf` - unified Code GGUF artifact, when using a compatible runtime
+- `config.json` - architecture metadata
+### Python reference path
+Place `zeta25b_code_FINAL.pt` and `zetagrid_25b_production.npy` in the ZetaGrid working directory, then use the local reference inference script as the starting point:
+```bash
+python ZETAGRID_INFERENCE.py
 ```
+The current Python script is research-oriented. Check the checkpoint selection/path before running and point it explicitly to `zeta25b_code_FINAL.pt` for the Code Soul.
+### GGUF path
+If a compatible runtime build is available for the RTH TCN operators:
 ```bash
+./llama-cli -m rth_lm_25b_code.gguf -p "def fibonacci(n):" -n 200
 ```
+Compatibility depends on runtime support for the custom RTH TCN architecture. Standard Transformer-only GGUF runners may not execute this architecture without additional kernels.
+## Evaluation Notes
+The strongest current evidence for this release is architectural and training-process evidence, not broad benchmark coverage. Before citing capability claims, run:
+- Deterministic code-completion prompts.
+- HumanEval or MBPP-style tasks, with exact pass@k settings.
+- Syntax-validity checks.
+- Repetition and invalid-token checks.
+- Comparisons against small open code models under the same decoding settings.
+Published benchmark results should include prompts, decoding parameters, commit hash, artifact hashes, and hardware.
+## Limitations
+- Early proof-of-concept model.
+- Not instruction tuned to the level of mainstream coding assistants.
+- Quality may vary strongly with decoding settings.
+- Runtime support for custom non-Transformer GGUF artifacts may require patched kernels.
+- Public claims should distinguish training loss, memory estimates, and actual task performance.
+## License and Commercial Use
+RTH-Code 25B is released under CC BY-NC 4.0 for research and non-commercial use.
+Commercial use requires a separate license from RTH Italia. Commercial use includes paid products, hosted APIs, enterprise internal development, integration into commercial developer tools, and any revenue-generating deployment.
+Contact: info@rthitalia.com
+## Citation
+```bibtex
+@techreport{deluca2026rthlm,
+  author      = {De Luca, Christian Quintino},
+  title       = {RTH-LM: A Fractal Temporal Convolutional Language Model},
+  institution = {RTH Italia (Research \& Technology Hub)},
+  year        = {2026},
+  url         = {https://github.com/rthgit/ZetaGrid},
+  doi         = {10.6084/m9.figshare.31376560},
+  note        = {Non-commercial license. Contact RTH Italia for commercial use.}
+}
+```
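
The Evaluation Notes in the updated README ask for HumanEval or MBPP-style tasks with exact pass@k settings, plus syntax-validity checks. The sketch below shows one way those two measurements could be computed. It is not part of the ZetaGrid repository: the function names `pass_at_k`, `is_valid_python`, and `syntax_validity_rate` are hypothetical, and it assumes the completions are Python source strings that have already been generated and, for pass@k, already run against unit tests. The pass@k formula is the standard unbiased estimator commonly used with HumanEval, 1 - C(n-c, k) / C(n, k), where n is the number of samples per task and c is the number that passed.

```python
import ast
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single task.

    n: samples generated for the task
    c: samples that passed the task's tests
    k: sampling budget being estimated (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def is_valid_python(source: str) -> bool:
    """Return True if the string parses as Python source (syntax only)."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as null bytes on older interpreters.
        return False
    return True


def syntax_validity_rate(completions: list[str]) -> float:
    """Fraction of completions that parse; 0.0 for an empty list."""
    if not completions:
        return 0.0
    return sum(is_valid_python(s) for s in completions) / len(completions)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples for one task, 3 of which passed.
    print(pass_at_k(n=10, c=3, k=1))
    print(syntax_validity_rate(["def f(n):\n    return n", "def f(n:"]))
```

A benchmark score is the mean of `pass_at_k` over all tasks. Reporting n, k, and the decoding parameters next to that mean keeps the result reproducible, in line with the README's request for prompts, decoding parameters, commit hash, artifact hashes, and hardware.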