---
license: mit
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- taonet
- taotern
- ssm
- state-space-model
- dplr
- pytorch
- transformers
- custom_code
- text-generation
- experimental
datasets:
- TaoData
---

# TaoNet-mini-T2

TaoNet-mini-T2 is an experimental 196M-parameter TaoNet language model using a Taotern/Gamma DPLR state-space model (SSM) sequence core instead of attention.

The repository includes the full training handoff package, but the recommended inference path is now Hugging Face `transformers` remote code:

```python
AutoModelForCausalLM.from_pretrained("TaoTern/TaoNet-mini-T2", trust_remote_code=True)
```

The default `transformers` loader downloads `model/pretrain_final_model.pt` and applies the RepoBridge chat-quality fix: `ssm_finite_tail_correction=True` and `ssm_kernel_mode="recurrent"`.

## Quick Start

Install runtime dependencies:

```bash
pip install torch transformers sentencepiece huggingface_hub pydantic pydantic-settings pyyaml numpy
```

For the private review repo, log in first:

```bash
hf auth login
```

Run generation from Python:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "TaoTern/TaoNet-mini-T2"

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    torch_dtype=dtype,
).to(device)


def generate_text(prompt, max_new_tokens=64, temperature=0.7, top_p=0.85):
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {key: value.to(device) for key, value in inputs.items()}

    start_time = time.time()
    with torch.inference_mode():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=top_p,
            repetition_penalty=1.2,
            do_sample=True,
            use_cache=False,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    elapsed_time = time.time() - start_time

    new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
    tokens_per_second = new_tokens / elapsed_time if elapsed_time > 0 else 0.0
    completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return completion, tokens_per_second


if __name__ == "__main__":
    text, tps = generate_text("Fruit is now expensive so we should")
    print(text)
    print(f"\nTokens per second: {tps:.2f}")
```

To load the SFT final checkpoint instead of the default pretrain checkpoint:

```python
model = AutoModelForCausalLM.from_pretrained(
    "TaoTern/TaoNet-mini-T2",
    trust_remote_code=True,
    checkpoint_name="final_model.pt",
)
```

## Model Details

| Field | Value |
|---|---:|
| Architecture | `taonet_ssm` |
| Candidate | `pure_ssm_196m_branch_rms_only` |
| Parameters | 196,573,128 |
| Hidden dimension | 1024 |
| Layers | 18 |
| FFN dimension | 3072 |
| Sequence length | 512 |
| Tokenizer | TaoData pilot SentencePiece 8k |
| SSM core | DPLR |
| SSM hidden dimension | 32 |
| SSM mixer dimension | 256 |
| SSM lanes | 2 split lanes |
| SSM gate | Channel gate |
| Local shift | Enabled, per-channel |
| Branch RMS norm | Enabled |

## Repository Layout

```text
config.json
configuration_taonet_mini_t2.py
modeling_taonet_mini_t2.py
tokenization_taonet_mini_t2.py
tokenizer.model
model/
  final_model.pt              # SFT final checkpoint
  pretrain_final_model.pt     # default checkpoint for HF inference
tokenizer/
  tokenizer.model
  tokenizer.vocab
code/
  TaoTrain/
  Taotern_SSM/
  Taotern_LLM_Experiments/
artifacts/
  configs/
  diagnostics/
chat_ssm_fixed.py             # legacy local fixed-chat CLI
eval_lm_eval.py               # local lm-eval harness wrapper
```

## Upload Notes

This repo contains two multi-GB checkpoint files, so prefer the resumable large-folder uploader instead of the normal single-commit upload command:

```bash
hf upload-large-folder TaoTern/TaoNet-mini-T2 . --repo-type model --private
```

On Windows, from the repo folder:

```powershell
powershell -ExecutionPolicy Bypass -File .\upload_large_folder.ps1
```

## Inference Notes

The training config used `ssm_finite_tail_correction=False` and `ssm_kernel_mode="conv"`. That path is fast for full-sequence training/evaluation but produced poor chat samples in the recovered workflow.

The `transformers` wrapper defaults to:

```text
ssm_finite_tail_correction=True
ssm_kernel_mode=recurrent
checkpoint=model/pretrain_final_model.pt
```

For fast benchmark scoring, use the included `eval_lm_eval.py` script with `--ssm-kernel-mode conv --finite-tail`.

## LM Evaluation Harness Benchmark

Settings:

```text
library=lm-eval-harness
checkpoint=model/pretrain_final_model.pt
num_fewshot=0
limit=100
ssm_kernel_mode=conv
ssm_finite_tail_correction=true
eval_batch_size=8
```

Results:

| Task | Primary score |
|---|---:|
| HellaSwag | 0.3300 |
| ARC Easy | 0.3400 |
| ARC Challenge | 0.2200 |
| PIQA | 0.4400 |
| Winogrande | 0.5300 |
| Mean primary score | 0.3720 |

These are limit-100 smoke benchmark numbers for review, not full leaderboard results.

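
To give a rough sense of how noisy limit-100 scores are, the sketch below recomputes the mean primary score from the table above and estimates a per-task standard error. It assumes each primary score is a plain accuracy over 100 independent examples; that is a simplifying assumption for illustration, not something the evaluation settings confirm, and the helper `binomial_stderr` is a hypothetical name introduced here.

```python
import math

# Primary scores copied from the results table above.
scores = {
    "HellaSwag": 0.33,
    "ARC Easy": 0.34,
    "ARC Challenge": 0.22,
    "PIQA": 0.44,
    "Winogrande": 0.53,
}
LIMIT = 100  # examples per task (limit=100 in the settings above)


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent trials.

    Assumption: the score behaves like a simple accuracy (binomial proportion).
    """
    return math.sqrt(p * (1.0 - p) / n)


# Unweighted mean over the five tasks, as reported in the table.
mean_score = sum(scores.values()) / len(scores)
stderrs = {task: binomial_stderr(p, LIMIT) for task, p in scores.items()}

for task, p in scores.items():
    print(f"{task:14s} {p:.4f} +/- {stderrs[task]:.4f}")
print(f"{'Mean':14s} {mean_score:.4f}")
```

Under these assumptions each per-task standard error comes out at roughly 0.04 to 0.05, so differences of a few points between tasks or checkpoints are within noise at this sample size.
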
## Training Summary

Run ID:

```text
taotern-200m-branch-only-chat-20260514
```

| Stage | Value |
|---|---:|
| Pretrain token positions | 4,000,000,000 |
| Pretrain steps | 976,563 |
| SFT steps | 50,000 |
| Batch size | 8 |
| Sequence length | 512 |
| Pretrain LR | 8e-4 |
| SFT LR | 5e-5 |

Compact post-run statistics:

| Metric | Value |
|---|---:|
| Pretrain first loss | 9.26 |
| Pretrain last loss | 2.64 |
| Pretrain tail-100 mean | 2.3351 |
| SFT first loss | 3.20 |
| SFT last loss | 1.08 |
| SFT tail-100 mean | 0.9585 |
| Activation probe loss | 2.8460 |
| Final block RMS | 45.97 |
| Final block max abs | 2560.03 |

## Intended Use

This model is intended for:

- Taotern/TaoNet SSM research
- checkpoint backup and reproducibility
- deployment experiments with custom Hugging Face remote code
- studying recurrent SSM inference behavior

## Limitations

- Experimental model quality; validate outputs before use.
- Requires `trust_remote_code=True` because the architecture is not part of upstream `transformers`.
- The recommended chat path depends on an inference-time SSM override.
- CPU inference is expected to be very slow.
- English-focused pilot data/tokenizer.

## Citation

```bibtex
@software{taonet_mini_t2_2026,
  title = {TaoNet-mini-T2: TaoNet SSM Language Model Checkpoint},
  author = {TaoTern},
  year = {2026},
  url = {https://huggingface.co/TaoTern/TaoNet-mini-T2}
}
```

## Related

- [TaoTern/TaoNet-pico-T1](https://huggingface.co/TaoTern/TaoNet-pico-T1)