🌌 RTH-LM: A Fractal Temporal Convolutional Language Model


RTH-LM is an experimental 25B parameter language model built on a Fractal Gated Causal Temporal Convolutional Network (TCN). It is a strictly non-Transformer architecture designed for linear-time inference and extreme compute efficiency.

πŸ’Ž Quantization & Efficiency

This repository includes the 2-bit quantized variant (zeta25b_2bit.qulp), demonstrating the architecture's resilience to ultra-low-bit quantization. At 2-bit precision, the 120B variant is projected to fit within a single 80GB GPU.
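The storage math behind 2-bit quantization can be sketched with simple symmetric packing: four 2-bit codes per byte, so 25B parameters need roughly 6.25 GB of codes (plus scales). This is a generic illustration, not the actual .qulp format; the function names and the 4-level codebook are assumptions:

```python
import numpy as np

def pack_2bit(weights: np.ndarray, scale: float) -> np.ndarray:
    """Quantize floats to 4 levels {-1.5, -0.5, 0.5, 1.5} * scale and pack 4 codes per byte."""
    codes = np.clip(np.round(weights / scale + 1.5), 0, 3).astype(np.uint8)
    codes = codes.reshape(-1, 4)  # assumes len(weights) is a multiple of 4
    return codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)

def unpack_2bit(packed: np.ndarray, scale: float) -> np.ndarray:
    """Invert pack_2bit: extract the four 2-bit codes from each byte and dequantize."""
    codes = np.stack([(packed >> s) & 0b11 for s in (0, 2, 4, 6)], axis=1)
    return (codes.reshape(-1).astype(np.float32) - 1.5) * scale
```

At 2 bits per weight, 120B parameters occupy about 30 GB of packed codes, which is consistent with the single-80GB-GPU projection above.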

πŸš€ Key Technical Highlights

  • Architecture: Fractal Gated Causal TCN (No-Attention).
  • Modularity: Separated Genome (frozen core) and Soul (trainable adapters).
  • Efficiency: Linear-time inference in sequence length; O(1) state memory during streaming.
  • 2-bit Ready: Designed for ultra-low-precision quantization (the projected 120B variant would fit on a single 80GB GPU).
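The linear-time and constant-state claims follow from how causal TCNs stream: each layer only needs its last few inputs, a fixed-size cache independent of sequence length. The following is a minimal sketch of one gated causal convolution layer, not the RTH-LM implementation; all names are illustrative:

```python
import torch
import torch.nn as nn

class GatedCausalConv(nn.Module):
    """One gated causal conv layer; streaming uses a fixed-size cache (O(1) in sequence length)."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size, dilation=dilation)

    def forward(self, x):
        # x: (batch, channels, time); left-padding makes the convolution causal.
        h = self.conv(nn.functional.pad(x, (self.pad, 0)))
        a, b = h.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b)  # gated activation

    def step(self, x_t, cache):
        # x_t: (batch, channels, 1); cache holds only the last `pad` inputs,
        # so per-token cost and memory do not grow with sequence length.
        buf = torch.cat([cache, x_t], dim=2)
        h = self.conv(buf)
        a, b = h.chunk(2, dim=1)
        return torch.tanh(a) * torch.sigmoid(b), buf[:, :, 1:]
```

Token-by-token streaming with `step` reproduces the full-sequence `forward` exactly, while carrying only a `(batch, channels, pad)` cache per layer.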

πŸ“„ Official Paper & Citation

The full technical paper is available on Zenodo (DOI: 10.5281/zenodo.18622610).

@techreport{deluca2026rthlm,
  author = {De Luca, Christian Quintino},
  title = {RTH-LM: A Fractal Temporal Convolutional Language Model},
  institution = {RTH Italia (Research & Technology Hub)},
  year = {2026},
  url = {https://github.com/rthgit/ZetaGrid},
  doi = {10.5281/zenodo.18622610}
}

πŸ“ˆ Training Evidence

  • Dataset: 1.5 GB curated scientific/narrative mix.
  • Steps: 15,000
  • Training loss: ≈ 1.0
  • Perplexity: ≈ 2.8
  • Hardware: single NVIDIA A40 (24-hour training run).
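As a sanity check, the reported loss and perplexity are mutually consistent: for a model trained with cross-entropy in nats, perplexity is exp(loss), so a loss of ≈ 1.0 implies a perplexity of ≈ 2.7, close to the reported ≈ 2.8:

```python
import math

train_loss = 1.0                 # reported mean cross-entropy (nats/token)
perplexity = math.exp(train_loss)
print(round(perplexity, 2))      # 2.72, roughly matching the reported ≈ 2.8
```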

πŸ› οΈ How to Run

RTH-LM uses a custom inference engine. You can run it using the provided ZETAGRID_INFERENCE.py script.

1. Requirements

```shell
pip install torch numpy
```

2. Running Inference

```shell
# Launch the interactive inference loop
python ZETAGRID_INFERENCE.py
```

πŸ¦™ 3. Ollama & Native Inference (Beta)

RTH-LM now supports native GGUF serialization and can be integrated into the Ollama ecosystem via our custom TCN kernels.

  • Model File: rth_lm_25b_v1.gguf (15.6 GB - Native binary)
  • Setup Guide: OLLAMA_PATCH_GUIDE.md
  • C++ Kernels: rth_tcn_ops.cpp / .h (Custom kernels for llama.cpp)

To run, use the provided Modelfile_RTH-LM:

```shell
ollama create rth-lm -f Modelfile_RTH-LM
ollama run rth-lm
```

Note: This requires applying the provided source patch to your Ollama/llama.cpp build.
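For orientation, a minimal Ollama Modelfile for a local GGUF typically looks like the following. This is an illustrative sketch only; the actual contents of Modelfile_RTH-LM (and its parameters) may differ:

```
# Hypothetical Modelfile sketch — point Ollama at the local GGUF binary
FROM ./rth_lm_25b_v1.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
```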



πŸ“œ License


πŸ›°οΈ Roadmap & Vision

  • Scale: Scaling to 120B and 1T variants.
  • Infinite Context: Testing Genome-tiling for 256k+ sequence lengths.
  • Domain Specialization: Release of specialized "Souls" for coding and legal analysis.

Join the Discussion: Head over to the Community tab to share your feedback!
