
ttm-rs

Pure Rust converter and inference engine for ibm-granite/granite-timeseries-ttm-r2.

Pre-converted GGUF files are available at amaye15/ttm-gguf. ttm-rs produces GGUF v3 files and runs forecasting natively, with no Python required.

Build

```bash
cargo build --release
```

Convert

Downloads the model from Hugging Face and writes a GGUF file:

```bash
# F16 (recommended)
./target/release/ttm-rs convert --model ibm-granite/granite-timeseries-ttm-r2 --dtype f16 --output gguf/ttm-f16.gguf

# Q8_0 (smallest)
./target/release/ttm-rs convert --dtype q8 --output gguf/ttm-q8.gguf

# F32 (full precision)
./target/release/ttm-rs convert --dtype f32 --output gguf/ttm-f32.gguf
```
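The three dtypes trade file size for precision. Assuming ttm-rs writes ggml's standard Q8_0 block layout (32 weights per block, stored as one float16 scale plus 32 int8 values), the sketch below quantizes and restores one block and works out the storage cost: 34 bytes per 32 weights, or 8.5 bits per weight against 16 for F16 and 32 for F32. The helper names are hypothetical and not part of ttm-rs.

```python
import math
import struct

QK8_0 = 32  # weights per block in ggml's Q8_0 layout (assumed to match ttm-rs)


def round_half_away(v: float) -> int:
    # Mimics C roundf(): halves round away from zero (Python's round() uses banker's rounding)
    return int(math.copysign(math.floor(abs(v) + 0.5), v))


def quantize_q8_0_block(block: list[float]) -> tuple[float, list[int]]:
    """Quantize one block of up to 32 floats to (float16 scale, int8 values)."""
    if len(block) > QK8_0:
        raise ValueError("a Q8_0 block holds at most 32 values")
    amax = max((abs(x) for x in block), default=0.0)
    d = amax / 127.0                      # scale so the largest magnitude maps to +/-127
    inv_d = 1.0 / d if d else 0.0         # an all-zero block gets scale 0 and all-zero values
    qs = [round_half_away(x * inv_d) for x in block]
    # The scale is stored as float16; round-trip it through struct's 'e' (half) format
    d16 = struct.unpack("<e", struct.pack("<e", d))[0]
    return d16, qs


def dequantize_q8_0_block(d16: float, qs: list[int]) -> list[float]:
    """Restore approximate weights as scale * int8 value."""
    return [d16 * q for q in qs]


BYTES_PER_BLOCK = 2 + QK8_0                      # float16 scale + 32 int8 values = 34 bytes
BITS_PER_WEIGHT = BYTES_PER_BLOCK * 8 / QK8_0    # 8.5, versus 16 for F16 and 32 for F32
```

Each restored weight is off by at most about half a quantization step (amax / 127 / 2), plus a small extra error from storing the scale in float16.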

To convert all dtypes at once:

```bash
./scripts/convert_all.sh
```

A Hugging Face token is optional for public models:

```bash
HF_TOKEN=hf_... ./scripts/convert_all.sh
```
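To confirm that a converted file is GGUF v3, you can read its fixed-size header. This minimal sketch assumes the little-endian v2/v3 header layout from the GGUF specification (4-byte magic `GGUF`, uint32 version, uint64 tensor count, uint64 metadata key/value count); `read_gguf_header` is a hypothetical helper, not a ttm-rs command.

```python
import struct
import sys


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header (v2/v3 layout, little-endian)."""
    with open(path, "rb") as f:
        # 4-byte magic + u32 version + u64 tensor count + u64 metadata KV count = 24 bytes
        raw = f.read(24)
    if len(raw) < 24:
        raise ValueError(f"{path}: too short to be a GGUF file")
    magic, version, n_tensors, n_kv = struct.unpack("<4sIQQ", raw)
    if magic != b"GGUF":
        raise ValueError(f"{path}: bad magic {magic!r}")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}


if __name__ == "__main__":
    # Print the header of every file passed on the command line
    for p in sys.argv[1:]:
        print(p, read_gguf_header(p))
```

Saved as a script, running it with `gguf/ttm-f16.gguf` as the argument should report version 3 if the converter behaves as described above.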

Inspect tensors

Print all tensor names and shapes from a safetensors file:

```bash
./target/release/ttm-rs inspect-tensors models/model.safetensors
```
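The safetensors format is simple enough to inspect without ttm-rs: a file starts with a little-endian uint64 header length, followed by a JSON header that maps each tensor name to its `dtype`, `shape` and `data_offsets`, plus an optional `__metadata__` entry. The sketch below is a Python counterpart to `inspect-tensors`; `list_safetensors` is a hypothetical helper, and its printed layout is not meant to match the CLI's.

```python
import json
import struct
import sys


def list_safetensors(path: str) -> list[tuple[str, str, list[int]]]:
    """Return (name, dtype, shape) for every tensor in a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian header size
        header = json.loads(f.read(header_len))          # name -> {dtype, shape, data_offsets}
    header.pop("__metadata__", None)                     # free-form string metadata, not a tensor
    return sorted((name, info["dtype"], info["shape"]) for name, info in header.items())


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for name, dtype, shape in list_safetensors(path):
            print(f"{name}\t{dtype}\t{shape}")
```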

Infer

Run forecasting on a context series passed to infer as JSON on stdin. The context must contain at least patch_length steps (defined in config.json):

```bash
echo '{"context": [1.0, 1.2, 1.5, 1.3, 1.8, 2.0, 1.9, 2.1], "horizon": 96}' \
  | ./target/release/ttm-rs infer \
      --gguf gguf/ttm-f16.gguf \
      --config models/config.json
```
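When driving infer from a script, it can help to validate the input before piping it in. The sketch below reads `patch_length` from config.json (assuming it is a top-level key, as the sentence above implies), accepts either one series or a nested batch, and returns the JSON body for stdin. `build_infer_request` is a hypothetical helper, not part of ttm-rs.

```python
import json


def build_infer_request(context, horizon: int, config_path: str) -> str:
    """Validate context against config.json and return the JSON body for `ttm-rs infer`.

    `context` may be one series (list of floats) or a batch (list of lists).
    """
    with open(config_path) as f:
        patch_length = json.load(f)["patch_length"]  # assumed top-level key
    # Treat a flat list as a batch of one so both shapes share the same check
    batch = context if context and isinstance(context[0], list) else [context]
    for i, series in enumerate(batch):
        if len(series) < patch_length:
            raise ValueError(f"series {i} has {len(series)} steps; patch_length is {patch_length}")
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return json.dumps({"context": context, "horizon": horizon})
```

Printing its return value and piping that into `./target/release/ttm-rs infer` with the same flags reproduces the echo example above.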

Output is JSON in an OpenAI-style forecast format:

```json
{
  "id": "forecast-000001932b7a1234",
  "object": "forecast",
  "created": 1749686400,
  "model": "ttm",
  "choices": [{
    "index": 0,
    "forecast": {
      "point": [2.1, 2.3, 2.5, "..."],
      "quantiles": {}
    },
    "finish_reason": "stop"
  }],
  "usage": {"context_length": 8, "forecast_length": 96}
}
```
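On the consuming side, a client only needs the choices array. Assuming the layout shown above (one choice per input series, each carrying `forecast.point`, with `usage.forecast_length` giving the horizon), this sketch returns the point forecasts ordered by choice index and checks their length. `extract_point_forecasts` is a hypothetical helper.

```python
import json


def extract_point_forecasts(response_json: str) -> list[list[float]]:
    """Return each series' point forecast, ordered by choice index."""
    resp = json.loads(response_json)
    # Sort by "index" so the output order matches the input series order
    choices = sorted(resp["choices"], key=lambda c: c["index"])
    expected = resp.get("usage", {}).get("forecast_length")
    points = []
    for choice in choices:
        point = choice["forecast"]["point"]
        if expected is not None and len(point) != expected:
            raise ValueError(f"choice {choice['index']}: {len(point)} points, usage says {expected}")
        points.append(point)
    return points
```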

Batch inference: pass multiple series as a nested array to get one choice per series:

```bash
echo '{"context": [[1.0, 1.2, 1.5], [2.0, 2.2, 2.5]], "horizon": 96}' \
  | ./target/release/ttm-rs infer \
      --gguf gguf/ttm-f16.gguf \
      --config models/config.json
```

Python bindings

Install with maturin inside a virtual environment:

```bash
python -m venv .venv && source .venv/bin/activate
pip install maturin
maturin develop --features python
```

```python
import ttm_rs

model = ttm_rs.Ttm("gguf/ttm-f16.gguf", "models/config.json")

result = model.forecast([1.0, 1.2, 1.5, 1.3, 1.8, 2.0], horizon=96)
point = result["choices"][0]["forecast"]["point"]

# Batch: one choice per series
result = model.forecast([[1.0, 1.2, 1.5], [2.0, 2.2, 2.5]], horizon=96)
```

forecast returns a Python dict in the same OpenAI-style format as the CLI.

Architecture notes

TTM (Tiny Time Mixers) is a compact encoder-decoder model:

- Encoder: multi-layer mixer blocks operating across the patch dimension; adaptive patching levels let the model work at several temporal resolutions at once.
- Decoder: a lightweight projection from encoder representations to the forecast horizon.
- Scale: ~1M parameters, orders of magnitude smaller than transformer-based foundation models, yet competitive on short-horizon benchmarks.
- Config: context_length, prediction_length, patch_length, patch_stride, d_model, num_layers, decoder_num_layers, and adaptive_patching_levels are loaded from config.json.
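The patch_length and patch_stride settings determine how a context is cut into the patches that the mixer blocks operate on. The sketch below uses the sliding-window rule of the Hugging Face PatchTST-style patchify layer, which keeps the most recent steps and drops any leftover at the start; it assumes ttm-rs does the same, and the numbers in the comments are illustrative rather than taken from the model's config.json. A context shorter than patch_length yields no complete patch, which is consistent with the infer requirement above.

```python
def num_patches(context_length: int, patch_length: int, patch_stride: int) -> int:
    """Number of sliding-window patches that fit in the context."""
    if context_length < patch_length:
        raise ValueError(f"context of {context_length} steps is shorter than one patch ({patch_length})")
    return (context_length - patch_length) // patch_stride + 1


def patchify(series: list[float], patch_length: int, patch_stride: int) -> list[list[float]]:
    """Cut a series into patches, keeping the most recent steps."""
    n = num_patches(len(series), patch_length, patch_stride)
    used = patch_length + patch_stride * (n - 1)  # steps actually covered by the n patches
    start = len(series) - used                    # leftover oldest steps are dropped
    return [series[start + i * patch_stride : start + i * patch_stride + patch_length] for i in range(n)]


# Illustrative numbers only: a 512-step context with non-overlapping 64-step patches
# gives (512 - 64) // 64 + 1 = 8 patches.
```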