metadata
license: mit
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - taonet
  - taotern
  - ssm
  - state-space-model
  - dplr
  - pytorch
  - transformers
  - custom_code
  - text-generation
  - experimental
datasets:
  - TaoData

TaoNet-mini-T2

TaoNet-mini-T2 is an experimental 196M-parameter TaoNet language model that uses a Taotern/Gamma DPLR state-space model (SSM) sequence core instead of attention. The repository includes the full training handoff package, but the recommended inference path is to load the model through Hugging Face transformers with remote code:

AutoModelForCausalLM.from_pretrained("TaoTern/TaoNet-mini-T2", trust_remote_code=True)

The default transformers loader downloads model/pretrain_final_model.pt and applies the RepoBridge chat-quality fix: ssm_finite_tail_correction=True and ssm_kernel_mode="recurrent".

Quick Start

Install runtime dependencies:

pip install torch transformers sentencepiece huggingface_hub pydantic pydantic-settings pyyaml numpy

For the private review repo, log in first:

hf auth login

Run generation from Python:

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "TaoTern/TaoNet-mini-T2"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device)


def generate_text(prompt, max_new_tokens=64, temperature=0.7, top_p=0.85):
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {key: value.to(device) for key, value in inputs.items()}

    start_time = time.time()
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.2,
            do_sample=True,
            use_cache=False,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    elapsed_time = time.time() - start_time

    new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
    tokens_per_second = new_tokens / elapsed_time if elapsed_time > 0 else 0.0
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return completion, tokens_per_second


if __name__ == "__main__":
    text, tps = generate_text("Fruit is now expensive so we should")
    print(text)
    print(f"\nTokens per second: {tps:.2f}")

To load the SFT final checkpoint instead of the default pretrain checkpoint:

model = AutoModelForCausalLM.from_pretrained(
    "TaoTern/TaoNet-mini-T2",
    trust_remote_code=True,
    checkpoint_name="final_model.pt",
)

Model Details

| Field | Value |
| --- | --- |
| Architecture | taonet_ssm |
| Candidate | pure_ssm_196m_branch_rms_only |
| Parameters | 196,573,128 |
| Hidden dimension | 1024 |
| Layers | 18 |
| FFN dimension | 3072 |
| Sequence length | 512 |
| Tokenizer | TaoData pilot SentencePiece 8k |
| SSM core | DPLR |
| SSM hidden dimension | 32 |
| SSM mixer dimension | 256 |
| SSM lanes | 2 split lanes |
| SSM gate | Channel gate |
| Local shift | Enabled, per-channel |
| Branch RMS norm | Enabled |
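
As a rough sizing aid, the parameter count above can be converted into a raw weight footprint for the two dtypes used in the Quick Start snippet. This is a back-of-the-envelope sketch that counts weights only. It ignores activations, recurrent state, and anything else a checkpoint file may carry, and `weight_bytes` is a hypothetical helper, not part of this repository.

```python
PARAMS = 196_573_128  # parameter count from the Model Details table

# Bytes per element for the dtypes chosen in the Quick Start snippet.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the raw weight size in bytes for a dtype (weights only)."""
    return num_params * BYTES_PER_PARAM[dtype]


for dtype in BYTES_PER_PARAM:
    size_mib = weight_bytes(PARAMS, dtype) / 2**20
    print(f"{dtype}: {size_mib:.0f} MiB of weights")
```

Actual memory use during generation will be higher than these figures.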

Repository Layout

config.json
configuration_taonet_mini_t2.py
modeling_taonet_mini_t2.py
tokenization_taonet_mini_t2.py
tokenizer.model
model/
  final_model.pt                  # SFT final checkpoint
  pretrain_final_model.pt         # default checkpoint for HF inference
tokenizer/
  tokenizer.model
  tokenizer.vocab
code/
  TaoTrain/
  Taotern_SSM/
  Taotern_LLM_Experiments/
artifacts/
  configs/
  diagnostics/
chat_ssm_fixed.py                 # legacy local fixed-chat CLI
eval_lm_eval.py                   # local lm-eval harness wrapper

Upload Notes

This repo contains two multi-GB checkpoint files, so use the resumable large-folder uploader rather than the normal single-commit upload command:

hf upload-large-folder TaoTern/TaoNet-mini-T2 . --repo-type model --private

On Windows, from the repo folder:

powershell -ExecutionPolicy Bypass -File .\upload_large_folder.ps1

Inference Notes

The training config used ssm_finite_tail_correction=False and ssm_kernel_mode="conv". That path is fast for full-sequence training/evaluation but produced poor chat samples in the recovered workflow.
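
For readers new to the two kernel modes, the following toy example shows the general idea on a scalar linear SSM: the same model can be evaluated as a step-by-step recurrence or as a causal convolution over the whole sequence. This is a generic illustration, not the Taotern DPLR implementation. The function names and the scalar parameters `a`, `b`, `c` are assumptions made for the sketch, and it does not model the finite-tail correction.

```python
def ssm_recurrent(x, a, b, c):
    """Evaluate h_t = a*h_{t-1} + b*x_t, y_t = c*h_t one token at a time."""
    h, ys = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


def ssm_conv(x, a, b, c):
    """Evaluate the same SSM as a causal convolution with kernel K_k = c * a**k * b."""
    kernel = [c * (a ** k) * b for k in range(len(x))]
    return [
        sum(kernel[k] * x[t - k] for k in range(t + 1))
        for t in range(len(x))
    ]


x = [1.0, 0.5, -2.0, 0.25, 3.0]
y_rec = ssm_recurrent(x, a=0.9, b=0.5, c=2.0)
y_conv = ssm_conv(x, a=0.9, b=0.5, c=2.0)

# Both evaluation orders agree up to floating-point rounding.
print(max(abs(r - v) for r, v in zip(y_rec, y_conv)))
```

In this toy setting the two paths agree exactly in theory. The convolutional form processes a full sequence at once, which suits training and scoring, while the recurrent form advances one token at a time, which matches autoregressive generation.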

The transformers wrapper defaults to:

ssm_finite_tail_correction=True
ssm_kernel_mode=recurrent
checkpoint=model/pretrain_final_model.pt

For fast benchmark scoring, use the included eval_lm_eval.py script with --ssm-kernel-mode conv --finite-tail.

LM Evaluation Harness Benchmark

Settings:

library=lm-eval-harness
checkpoint=model/pretrain_final_model.pt
num_fewshot=0
limit=100
ssm_kernel_mode=conv
ssm_finite_tail_correction=true
eval_batch_size=8

Results:

| Task | Primary score |
| --- | --- |
| HellaSwag | 0.3300 |
| ARC Easy | 0.3400 |
| ARC Challenge | 0.2200 |
| PIQA | 0.4400 |
| Winogrande | 0.5300 |
| Mean primary score | 0.3720 |

These are limit-100 smoke benchmark numbers for review, not full leaderboard results.
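
To give a sense of how noisy a limit-100 run is, the sketch below recomputes the mean and attaches a binomial standard error to each score. It assumes each primary score is an accuracy-like proportion over 100 independent examples, which is an assumption about the harness output rather than something stated in the results. `binomial_stderr` is a hypothetical helper.

```python
import math

LIMIT = 100  # examples per task, from the benchmark settings

scores = {
    "HellaSwag": 0.33,
    "ARC Easy": 0.34,
    "ARC Challenge": 0.22,
    "PIQA": 0.44,
    "Winogrande": 0.53,
}


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


mean_score = sum(scores.values()) / len(scores)

for task, p in scores.items():
    print(f"{task}: {p:.4f} +/- {binomial_stderr(p, LIMIT):.4f}")
print(f"Mean primary score: {mean_score:.4f}")
```

Under that assumption each score carries a standard error of roughly four to five percentage points, so small differences between tasks or runs should not be over-interpreted.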

Training Summary

Run ID:

taotern-200m-branch-only-chat-20260514

| Stage | Value |
| --- | --- |
| Pretrain token positions | 4,000,000,000 |
| Pretrain steps | 976,563 |
| SFT steps | 50,000 |
| Batch size | 8 |
| Sequence length | 512 |
| Pretrain LR | 8e-4 |
| SFT LR | 5e-5 |
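
The pretrain step count is consistent with the token budget, batch size, and sequence length listed above. The check below assumes the batch size counts sequences per optimizer step with no gradient accumulation. That is an assumption, although the listed numbers match it.

```python
BATCH_SIZE = 8                       # sequences per step (assumed, see above)
SEQ_LEN = 512                        # token positions per sequence
TARGET_POSITIONS = 4_000_000_000     # pretrain token positions

tokens_per_step = BATCH_SIZE * SEQ_LEN

# Integer ceiling division: the smallest step count that covers the target.
steps = -(-TARGET_POSITIONS // tokens_per_step)

print(f"tokens per step: {tokens_per_step}")
print(f"steps needed:    {steps}")
print(f"positions seen:  {steps * tokens_per_step}")
```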

Compact post-run statistics:

| Metric | Value |
| --- | --- |
| Pretrain first loss | 9.26 |
| Pretrain last loss | 2.64 |
| Pretrain tail-100 mean | 2.3351 |
| SFT first loss | 3.20 |
| SFT last loss | 1.08 |
| SFT tail-100 mean | 0.9585 |
| Activation probe loss | 2.8460 |
| Final block RMS | 45.97 |
| Final block max abs | 2560.03 |
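
If the reported losses are mean per-token cross-entropy values in nats, they can be read as perplexities by exponentiating. This is an assumption, since the loss definition is not stated here. The sketch below applies that conversion to the tail-100 means. Note that pretrain and SFT losses are computed on different data, so the two perplexities are not directly comparable.

```python
import math

# Tail-100 mean losses from the statistics above.
tail_losses = {
    "pretrain": 2.3351,
    "sft": 0.9585,
}


def perplexity(mean_nll_nats: float) -> float:
    """Convert a mean negative log-likelihood in nats to perplexity."""
    return math.exp(mean_nll_nats)


for stage, loss in tail_losses.items():
    print(f"{stage}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")
```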

Intended Use

This model is intended for:

  • Taotern/TaoNet SSM research
  • checkpoint backup and reproducibility
  • deployment experiments with custom Hugging Face remote code
  • studying recurrent SSM inference behavior

Limitations

  • Experimental model quality; validate outputs before use.
  • Requires trust_remote_code=True because the architecture is not part of upstream transformers.
  • The recommended chat path depends on an inference-time SSM override.
  • CPU inference is expected to be very slow.
  • English-focused pilot data/tokenizer.

Citation

@software{taonet_mini_t2_2026,
  title = {TaoNet-mini-T2: TaoNet SSM Language Model Checkpoint},
  author = {TaoTern},
  year = {2026},
  url = {https://huggingface.co/TaoTern/TaoNet-mini-T2}
}

Related