Thoth: Mid-Training Bridges LLMs to Time Series Understanding

Paper GitHub Repo Hugging Face

πŸ“„ Introduction

While Large Language Models (LLMs) demonstrate exceptional proficiency in general reasoning, they often struggle to capture intricate temporal dependencies. To bridge this gap, Thoth introduces the first family of mid-trained LLMs that move beyond task-specific Supervised Fine-Tuning (SFT) through a task- and domain-agnostic mid-training stage. Leveraging an automated synthesis pipeline that achieves bidirectional alignment between time-series-to-text and text-to-time-series generation, Thoth equips models with an intrinsic, foundational understanding of temporal dynamics. This internalized comprehension enables the model to improve performance across a wide range of complex, knowledge-intensive downstream time series reasoning tasks in real-world scenarios.
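Mid-training on time-series-to-text pairs requires rendering raw numeric series as plain text the model can consume. As a minimal, illustrative sketch (not the paper's actual synthesis pipeline, whose format may differ), a series can be serialized like this:

```python
def serialize_series(values, unit="kWh", freq="hourly"):
    """Render a numeric time series as a plain-text prompt fragment.

    Illustrative helper only; Thoth's real data synthesis pipeline is
    described in the paper and may use a different format.
    """
    data = ", ".join(f"{v:.1f}" for v in values)
    return f"The following {freq} readings are in {unit}.\nData: [{data}]"

prompt = serialize_series([12.5, 11.8, 65.5])
print(prompt)
# The following hourly readings are in kWh.
# Data: [12.5, 11.8, 65.5]
```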


Thoth-30B-A3B is a full-parameter fine-tuned version of Qwen3-30B-A3B-Instruct-2507. For more details, please see our paper.

✨ Quickstart

pip install transformers==4.57.1
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thuml/Thoth-30B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", dtype=torch.bfloat16, trust_remote_code=True).eval()

# A simple time series anomaly detection task
question = """The following data represents the hourly electricity consumption (in kWh) of an office building over a 24-hour period, starting from midnight (00:00).
Data: [12.5, 11.8, 12.1, 11.5, 12.2, 11.9, 15.6, 32.4, 35.1, 34.8, 36.2, 65.5, 37.0, 35.5, 34.2, 33.9, 35.1, 31.8, 18.2, 14.5, 13.1, 12.8, 12.4, 11.9]
Task: 1. Specify the hour (0-23) when the anomaly occurs. 2. Provide a brief reasoning why you consider it an anomaly."""

messages = [
    {"role": "system", "content": "You are an expert in time series understanding and reasoning."},
    {"role": "user", "content": question}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the reasoning output (temperature only takes effect when sampling is enabled)
generated_ids = model.generate(**model_inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the echoed prompt
output_ids = generated_ids[0][model_inputs.input_ids.shape[1]:]
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
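Since Thoth is mid-trained for bidirectional alignment, the same chat interface can also be prompted for text-to-time-series generation (e.g., forecasting a continuation of the series). A hedged sketch of post-processing such a reply: the hypothetical helper below extracts the first bracketed list of numbers from a response, assuming the model emits values in a `[...]` list (real replies may need more robust parsing).

```python
import re

def extract_series(reply: str) -> list[float]:
    """Pull the first bracketed list of numbers out of a model reply.

    Assumes a reply like 'Forecast: [12.1, 11.9, ...]'; this is an
    illustrative assumption, not a guaranteed output format.
    """
    match = re.search(r"\[([^\]]+)\]", reply)
    if match is None:
        return []
    return [float(tok) for tok in match.group(1).split(",")]

print(extract_series("Forecast: [12.1, 11.9, 12.4]"))
# [12.1, 11.9, 12.4]
```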

For detailed evaluation, please visit our GitHub repository: https://github.com/thuml/Thoth.

πŸš€ Release Progress

  • Thoth-30B-A3B model weights
  • Public benchmark evaluation pipeline
  • KnoTS benchmark
  • KnoTS evaluation code

πŸ“œ Citation

If you find our work useful, please cite our paper as:

@article{lin2026thoth,
  title={Thoth: Mid-Training Bridges LLMs to Time Series Understanding},
  author={Lin, Jiafeng and Wang, Yuxuan and Wu, Jialong and Luo, Huakun and Pei, Zhongyi and Wang, Jianmin},
  journal={arXiv preprint arXiv:2603.01042},
  year={2026}
}

🀝 Contact

If you have any questions, feel free to contact:

πŸ’‘ Acknowledgment

We sincerely appreciate the following works for their valuable open-source models and evaluation benchmarks: Qwen3, Time-MQA, ChatTime, ChatTS.
