Instructions to use EricSpencer00/chattla-20b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use EricSpencer00/chattla-20b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EricSpencer00/chattla-20b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EricSpencer00/chattla-20b")
model = AutoModelForCausalLM.from_pretrained("EricSpencer00/chattla-20b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use EricSpencer00/chattla-20b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="EricSpencer00/chattla-20b",
	filename="gguf/chattla-20b-v10-Q8_0.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use EricSpencer00/chattla-20b with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EricSpencer00/chattla-20b:Q8_0
# Run inference directly in the terminal:
llama cli -hf EricSpencer00/chattla-20b:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf EricSpencer00/chattla-20b:Q8_0
# Run inference directly in the terminal:
llama cli -hf EricSpencer00/chattla-20b:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf EricSpencer00/chattla-20b:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf EricSpencer00/chattla-20b:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf EricSpencer00/chattla-20b:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf EricSpencer00/chattla-20b:Q8_0

Use Docker

docker model run hf.co/EricSpencer00/chattla-20b:Q8_0

LM Studio
Jan

vLLM

How to use EricSpencer00/chattla-20b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EricSpencer00/chattla-20b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EricSpencer00/chattla-20b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/EricSpencer00/chattla-20b:Q8_0

SGLang

How to use EricSpencer00/chattla-20b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EricSpencer00/chattla-20b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EricSpencer00/chattla-20b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EricSpencer00/chattla-20b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "EricSpencer00/chattla-20b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use EricSpencer00/chattla-20b with Ollama:
```
ollama run hf.co/EricSpencer00/chattla-20b:Q8_0
```

Unsloth Studio

How to use EricSpencer00/chattla-20b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EricSpencer00/chattla-20b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for EricSpencer00/chattla-20b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for EricSpencer00/chattla-20b to start chatting

How to use EricSpencer00/chattla-20b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf EricSpencer00/chattla-20b:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "EricSpencer00/chattla-20b:Q8_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use EricSpencer00/chattla-20b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf EricSpencer00/chattla-20b:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default EricSpencer00/chattla-20b:Q8_0

Run Hermes

hermes

Atomic Chat new
Docker Model Runner
How to use EricSpencer00/chattla-20b with Docker Model Runner:
```
docker model run hf.co/EricSpencer00/chattla-20b:Q8_0
```

Lemonade

How to use EricSpencer00/chattla-20b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull EricSpencer00/chattla-20b:Q8_0

Run and chat with the model

lemonade run user.chattla-20b-Q8_0

List all available models

lemonade list

ChatTLA-20b (v22)

ChatTLA is a fine-tuned version of openai/gpt-oss-20b specialised in generating TLA+ formal specifications — the language used by AWS, Microsoft, and Intel to mathematically verify distributed systems.

Given a plain-English description of a concurrent or distributed system, ChatTLA outputs a complete, syntactically valid TLA+ module including Init, Next, Spec, TypeOK, and domain invariants, together with a TLC model-checker configuration block. The default prover training corpus is 1330 rows and already includes all 205 canonical FormaLLM benchmark examples.

Benchmark Results (v22, 20-problem full suite)

SANY pass: 14/20 (70%) · TLC pass: 9/20 (45%) · Avg structural: 0.69

Auto-updated from benchmark_results_fc128best_full_20260701_163601.csv (full benchmark suite; mode: self-correct+best-of-5).

Single-shot on the same suite (attempts=1, no self-correction): 4/20 SANY, 1/20 TLC. The 20 benchmark problems are not present in this checkpoint's training corpora, so these are generalization numbers; every TLC-pass row re-verifies from the published benchmark CSV alone via the repo's replay gate. Publish readiness is gated by checked-in manifests that record the exact source CSV, freshness window, and inference mode. Separately, the machine-checked proof artifact chattla-tla-prover-108-108 reproduces TLAPS proofs for 18 known modules (299/299 obligations) — a coverage result over known specs, not a generalization claim.

Archived 30-spec Eval Results (v15, 3-shot self-correct)

Evaluated on a 30-spec held-out suite spanning communication protocols, concurrency primitives, consensus, data structures, memory/caches, mutual exclusion, classical puzzles, scheduling, transactions, and workflow state machines. Each spec gets up to 3 self-correction attempts using TLC error feedback. Tiers are defined by what the spec actually does under SANY and TLC, not just whether it parses:

Tier	Meaning
💎 Diamond	Gold and TLC explores ≥1 distinct state, has a non-trivial invariant, and the invariant catches a mutation
🥇 Gold	SANY parses and TLC model-checks clean
🥈 Silver	SANY parses, TLC finds violation or timeout
Bronze	SANY parse failure

Diamond is the headline metric: it's the only tier that proves the spec is semantically useful rather than just syntactically valid.

Per-spec results (30-spec holdout)

#	Batch	Module	Tier	Diamond
1	communication_protocols	AlternatingBit	Bronze
2	communication_protocols	Arp	Bronze
3	communication_protocols	AtomicRegister	Bronze
4	concurrency_primitives	BinarySemaphore	Bronze
5	concurrency_primitives	Channel	Bronze
6	concurrency_primitives	CountDownLatch	Bronze
7	consensus_election	AtomicCommit	Bronze
8	consensus_election	BullyElection	🥇 Gold	💎
9	consensus_election	ByzantineQuorum	Bronze
10	data_structures	BinaryHeap	Bronze
11	data_structures	BloomCounter	🥇 Gold	💎
12	data_structures	BloomFilter	⏱ Timeout
13	memory_caches	ArenaAllocator	🥇 Gold	💎
14	memory_caches	BuddyAllocator	Bronze
15	memory_caches	CopyingGc	Bronze
16	mutual_exclusion	AdaptiveMutex	🥇 Gold	💎
17	mutual_exclusion	AndersonMutex	🥇 Gold	💎
18	mutual_exclusion	AravindMutex	⏱ Timeout
19	puzzles_classical	BlocksWorld	Bronze
20	puzzles_classical	ChessKingMoves	Bronze
21	puzzles_classical	ColoredHats	Bronze
22	scheduling_resources	AdmissionControl	🥇 Gold	💎
23	scheduling_resources	BackpressureChannel	🥇 Gold	💎
24	scheduling_resources	Bankers	⏱ Timeout
25	transactions_databases	ChainReplication	⏱ Timeout
26	transactions_databases	DistributedLock	Bronze
27	transactions_databases	FencingToken	Bronze
28	workflows_state_machines	ContentModeration	🥇 Gold	💎
29	workflows_state_machines	DocumentApproval	🥇 Gold	💎
30	workflows_state_machines	EmailVerification	Bronze

Diamond: 9/30 (30%) · Gold: 9/30 (30%)

Per-domain breakdown

Domain	Diamond
communication_protocols	0/3
concurrency_primitives	0/3
consensus_election	1/3
data_structures	1/3
memory_caches	1/3
mutual_exclusion	2/3
puzzles_classical	0/3
scheduling_resources	2/3
transactions_databases	0/3
workflows_state_machines	2/3

Version history

Version	Suite	SANY	TLC	Diamond / Notes
v6	20-problem handcraft	4/20 (20%)	1/20 (5%)	—
v7	20-problem handcraft	6/20 (30%)	1/20 (5%)	—
v8	20-problem handcraft	8/20 (40%)	1/20 (5%)	—
v9	20-problem handcraft	6/20 (30%)	3/20 (15%)	—
v9 best-of-5 + self-correct	20-problem handcraft	16/20 (80%)	5/20 (25%)	—
v10	20-problem handcraft	6/20 (30%)	2/20 (10%)	—
v11	20-problem handcraft	6/20 (30%)	2/20 (10%)	—
v13 (SFT + DPO)	20-problem handcraft	9/20 (45%)	5/20 (25%)	not measured (trivial invariants counted as Gold)
v14 (Diamond SFT)	30-spec holdout (single-shot)	16/30 (53%)	5/30 (17%)	4/30 (13%)
v15 (Repair GRPO)	30-spec holdout (3-shot)	9/30 (30%)	9/30 (30%)	9/30 (30%)
v22 (fc128best, penalty-free sampling)	20-problem full suite (best-of-5 + self-correct)	14/20 (70%)	9/20 (45%)	single-shot: 4/20 SANY, 1/20 TLC; all 9 TLC rows replay-verified

v15 applies repair-based GRPO (Group Relative Policy Optimization) on top of v14's Diamond SFT weights. The model learns to fix its own broken specs by training on (broken → repaired) trajectory pairs with TLC-graded improvement reward. v15 eval uses 3-shot self-correction with TLC error feedback, matching realistic usage; v14 was evaluated single-shot, so SANY/TLC rates are not directly comparable. Diamond is the metric to track going forward.

Quick Start

Ollama (recommended)

# Pull and run directly
ollama run EricSpencer00/chattla-20b

# Or use the bundled Modelfile
curl -L https://huggingface.co/EricSpencer00/chattla-20b/resolve/main/gguf/Modelfile -o Modelfile
ollama create chattla:20b -f Modelfile
ollama run chattla:20b "Write a TLA+ spec for a token ring with N nodes."

Python (transformers)

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="EricSpencer00/chattla-20b",
    device_map="auto",
)

prompt = (
    "Write a complete TLA+ specification for a two-phase commit protocol "
    "with one coordinator and N participants."
)
result = pipe([{"role": "user", "content": prompt}], max_new_tokens=1024, return_full_text=False)
print(result[0]["generated_text"])

llama.cpp / GGUF

# Download GGUF
huggingface-cli download EricSpencer00/chattla-20b \
    gguf/chattla-20b-v22-Q8_0.gguf \
    --local-dir ./chattla

# Run with llama.cpp
./llama-cli -m chattla/gguf/chattla-20b-v22-Q8_0.gguf \
    -n 1024 --temp 0.4 \
    -p "Write a TLA+ spec for mutual exclusion with N processes."

Model Details

Property	Value
Base model	openai/gpt-oss-20b
Parameters	20.9B
Architecture	GptOss (sliding + full attention)
Fine-tuning method	Diamond SFT (LoRA) → Repair GRPO (LoRA) → merged
Context length	2048 (trained) / 131072 (base)
GGUF quantisation	Q8_0 (~22 GB)
Training date	April 2026

System prompt

The model is prompted with:

You are ChatTLA, an expert at writing verified TLA+ formal specifications.
When asked to write a TLA+ spec, follow these rules exactly:
1. Start the module with ---- MODULE <ModuleName> ----
2. End with ====
3. Include EXTENDS, VARIABLES, Init, Next, and Spec operators
4. After the TLA+ module, append a TLC configuration block:
   SPECIFICATION Spec
   INVARIANT TypeOK   (if TypeOK is defined)
5. Output only valid TLA+ code. No markdown fences, no explanation outside the spec.

Training

Phase 1: Diamond SFT (v14)

v14 was produced by the Diamond curation pipeline: candidate TLA+ specs are generated by an earlier checkpoint, then graded by a tlc_validator that checks SANY parsing, TLC state-space exploration, non-trivial invariants, and mutation-test sensitivity. Specs that survive grading are LLM-judged for chain-of-thought quality, leaving a curated training pool (209 raw → 73 curated for the v14 SFT round). The model is fine-tuned with LoRA on this pool and merged.

Phase 2: Repair GRPO (v15)

v15 applies repair-based GRPO (Group Relative Policy Optimization) on top of the v14 checkpoint. The key insight: instead of training on gold-standard specs alone, the model learns to fix broken specs using TLC error feedback as reward signal.

Pipeline:

Trajectory collection — the v14 model generates specs for 398 problems with up to 6 repair iterations each, producing (broken, repaired) pairs scored by a multi-stage validator (SANY → TLC → Apalache → TLAPS).
Dataset filtering — pairs are filtered to keep the "learnable middle": min_before_score=0.10 (drop unparseable) and max_before_score=0.80 (drop already-good), yielding ~430 gradable pairs centered on score ≈ 0.45.
GRPO training — 300 steps, 4 generations per prompt, max 384 completion tokens. The reward is the improvement delta: after_score - before_score, normalized by group. Learning rate 3e-6, KL penalty β=0.02, temperature 0.5.
LoRA merge — best checkpoint (around step 140–160 where reward peaked) merged back into full weights.

Reward peaked at steps 140–160 with reward_std ≈ 0.25 (vs 0.0 in prior full-spec GRPO attempts that had zero variance). This was the first successful RL run on TLA+ spec generation.

R2 regression and R3 (in progress). A second flywheel round (R2) continued GRPO from v15's merged weights on a freshly harvested dataset and regressed to 6/30 (20%). Post-mortem: the Phase 2 merge deduped pairs on (nl[:80], round(before_score, 1)), a score-bucket width of 0.1 that collapsed most of the learnable-middle band; combined with a raised min_before_score = 0.10, the usable training set fell from 433 → 179 pairs, shifted hard (mean before_score 0.26 → 0.42), and the model overtrained past its 150-step peak over 300 steps. Regressions concentrated in mutual_exclusion and workflows_state_machines (2/3 lost each). R3 pulls only the data and step-budget levers: dedup key widened to (nl[:120], round(before_score, 2)), score floor restored to 0.02, --max-iters raised 6 → 9 to grow the raw pool, and --max-steps cut to 175 with a checkpoint picker that selects the save closest to step 150. v15 remains the production checkpoint until R3 beats 9/30.

DPO/KTO refinement was used in v11–v13 but was deprecated in the Diamond overhaul: 0/484 specs from those preference-trained checkpoints actually passed Diamond, indicating the model had learned TLA+ syntax without learning semantics.

Training configuration

Setting	Value
SFT method	LoRA (lora_dropout=0)
GRPO method	LoRA, 4 generations, 384 max completion
GRPO learning rate	3e-6
GRPO KL β	0.02
GRPO steps	300 (best checkpoint ~150)
Max sequence length	2048
TRL	0.28.0
Transformers	5.2.0
PyTorch	2.10.0
Hardware	2× Quadro RTX 8000 (48 GB each)

Files

EricSpencer00/chattla-20b
├── config.json              # Model architecture
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
├── chat_template.jinja      # Chat template
├── pytorch_model.bin        # Full BF16 weights (39 GB)
├── generation_config.json
└── gguf/
    ├── chattla-20b-v22-Q8_0.gguf   # Quantised GGUF for Ollama / llama.cpp
    └── Modelfile                    # Ollama Modelfile

Intended Use

ChatTLA is designed for:

Rapid prototyping of TLA+ specifications from natural-language system descriptions
Educational exploration of formal methods
Assisting engineers who are learning TLA+

Not intended for: safety-critical or production verification without human review. Always validate generated specs with SANY and TLC before relying on them.

Citation

@misc{chattla2026,
  title   = {ChatTLA: Fine-tuned LLM for TLA+ Formal Specification Generation},
  author  = {Spencer, Eric},
  year    = {2026},
  url     = {https://huggingface.co/EricSpencer00/chattla-20b},
}

Downloads last month: 831

Safetensors

Model size

21B params

Tensor type

BF16

Model tree for EricSpencer00/chattla-20b

Base model

openai/gpt-oss-20b

Quantized

(213)

this model