# Taotern LLM Experiments

This repo is the experiment ledger for building a TaoNet-style LLM whose sequence core is the Taotern SSM instead of attention. It keeps compact, reviewable artifacts: run summaries, metric CSVs, exact RepoBridge configs, and the current conclusion. Source code stays in the source repos.

## Current Decision

As of `2026-05-14`, the next expensive chatbot attempt should use the pure SSM branch-only model, not the earlier SSM-first hybrid.

Selected candidate:

```text
architecture=taonet_ssm
candidate=pure_ssm_196m_branch_rms_only
params=196,573,128
hidden_dim=1024
num_layers=18
num_heads=8
hidden_dim_ff=3072
ssm_core=dplr
ssm_hidden_dim=32
ssm_mixer_dim=256
ssm_num_lanes=2
ssm_lane_mode=split
ssm_split_mix=none
ssm_lane_combine=channel
ssm_gate_type=channel
ssm_local_shift=true
ssm_local_shift_per_channel=true
ssm_branch_rms_norm=true
block_residual_rms_norm=false
finite_tail_correction=false
```
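
For scripting against this ledger, the `key=value` block above can be turned into typed values with a few lines of Python. This is only a sketch: `parse_spec` and `_coerce` are hypothetical helpers, not part of TaoTrain or RepoBridge, and the exact RepoBridge config format may differ from this flat listing.

```python
# Hypothetical helper: parse a flat key=value candidate spec into typed values.
# Only a subset of the keys listed above is reproduced here for brevity.
SPEC = """\
architecture=taonet_ssm
candidate=pure_ssm_196m_branch_rms_only
params=196,573,128
hidden_dim=1024
hidden_dim_ff=3072
ssm_core=dplr
ssm_split_mix=none
ssm_branch_rms_norm=true
block_residual_rms_norm=false
"""


def _coerce(raw):
    """Map 'true'/'false' to bool and digit strings (commas allowed) to int."""
    if raw in ("true", "false"):
        return raw == "true"
    digits = raw.replace(",", "")
    if digits.isdigit():
        return int(digits)
    return raw  # anything else stays a string, including 'none'


def parse_spec(text):
    """Return a dict of typed values from key=value lines; other lines are skipped."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, raw = line.split("=", 1)
        cfg[key.strip()] = _coerce(raw.strip())
    return cfg


cfg = parse_spec(SPEC)
print(cfg["params"], cfg["ssm_branch_rms_norm"], cfg["block_residual_rms_norm"])
```

Keeping `none` as a plain string is a deliberate choice in this sketch, so that a disabled option stays distinguishable from a missing key.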
The 100M-token branch-only gate completed successfully:

| Metric | Value |
|---|---:|
| Eval loss | 3.1667 |
| Eval token accuracy | 38.92% |
| Forward+backward throughput | 53.0k tok/s |
| Peak allocated memory | 9.17 GB |
| SFT tiny-overfit | 3.3831 -> 0.0107 |
| Final block RMS | 57.50 |
| Final block max abs | 1605.34 |

Launch scripts for the next full attempt are implemented in TaoTrain:

```text
scripts/remote/run_200m_branch_only_chat.sh
scripts/remote/submit_200m_branch_only_chat.sh
```

The planned run is `4B` pretrain token positions followed by `50k` corrected response-only SFT steps. See:

```text
experiments/runs/2026-05-14_branch_only_100m_gate/summary.md
experiments/runs/2026-05-14_200m_branch_only_4b_sft_ready/summary.md
```
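
As a rough planning aid, the pretrain budget can be converted into a lower bound on wall-clock time. The sketch below assumes the forward+backward throughput measured in the 100M gate (53.0k tok/s) carries over unchanged to the full run, which is an assumption and not a measured result. It ignores optimizer steps, evaluation, checkpointing, and data loading. `estimate_hours` is a hypothetical helper.

```python
def estimate_hours(total_tokens, tokens_per_second):
    """Lower-bound wall-clock hours for processing total_tokens at a fixed rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return total_tokens / tokens_per_second / 3600.0


PRETRAIN_TOKENS = 4_000_000_000  # 4B pretrain token positions from the plan
GATE_THROUGHPUT = 53_000         # tok/s, forward+backward, from the gate table

hours = estimate_hours(PRETRAIN_TOKENS, GATE_THROUGHPUT)
print(f"pretrain lower bound: {hours:.1f} h")
```

Under these assumptions the pretrain stage alone comes to roughly 21 hours. The SFT stage is not covered because its tokens per step are not recorded here.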
## Previous Hybrid Failure And Stabilization Trail

The current SSM-only LLM is built in TaoTrain as `taonet_ssm`. It keeps the TaoNet outer shape and replaces the attention/MLA sequence mixer with a DPLR SSM mixer from `Taotern_SSM`.

The earlier best LLM candidate was `taonet_hybrid`, which keeps the same TaoNet dimensions but alternates attention and SSM blocks. The selected 200M-class deployment candidate for the first long chat run was `hybrid_ssm_first_199m`, which uses 16 layers:

```text
SSM -> attention -> SSM -> attention -> ... -> SSM -> attention
```
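
The alternating pattern above can be written as a small layer-type generator. This is an illustrative sketch: `hybrid_layer_types` is a hypothetical name, and the meaning of `attention_first` as the mirrored pattern is an assumption based on its name only.

```python
def hybrid_layer_types(num_layers, pattern="ssm_first"):
    """Return a list of 'ssm'/'attention' labels that strictly alternate."""
    if pattern == "ssm_first":
        first, second = "ssm", "attention"
    elif pattern == "attention_first":  # assumed to be the mirrored pattern
        first, second = "attention", "ssm"
    else:
        raise ValueError(f"unsupported pattern: {pattern}")
    return [first if i % 2 == 0 else second for i in range(num_layers)]


layers = hybrid_layer_types(16, "ssm_first")
ssm_layers = [i for i, kind in enumerate(layers) if kind == "ssm"]
attention_layers = [i for i, kind in enumerate(layers) if kind == "attention"]
print(ssm_layers)
print(attention_layers)
```

For 16 layers with `ssm_first`, SSM blocks land on the even indices and attention blocks on the odd indices.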
The long run `taotern-200m-hybrid-chat-20260512` on the remote RTX 5090 server completed, but the SFT checkpoint is not yet a good chatbot. RepoBridge Model Chat works, but generation quality is poor and follow-up diagnostics show the issue is in model trainability rather than the GUI.

```text
/home/student/YouZheng/jobs/taotern/taotern-200m-hybrid-chat-20260512/checkpoints/sft/final_model.pt
```

Current diagnosis: the 4B-token pretrain remained high-loss, and the 50k-step SFT stage did not improve fixed SFT response loss. Tiny SFT overfit probes show huge SSM/residual gradients, and activation hooks show the residual stream growing to tens of millions by late layers. The checkpoint should be used as a failure/diagnostic artifact, not as the final deployable chat model. Follow-up code now adds SSM branch RMS normalization, optional SSM branch clamping, block residual RMS normalization, and benchmark gradient telemetry. See:

```text
experiments/runs/2026-05-13_200m_chat_diagnosis/summary.md
```
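
To make the stabilization idea concrete, here is a minimal pure-Python sketch of RMS-normalizing a branch output and optionally clamping it before adding it to the residual stream. It is not the TaoTrain implementation. The function names are hypothetical, the order of normalization and clamping is an assumption, and the sketch has no learnable gain and works on a single feature vector.

```python
import math


def rms_normalize(values, eps=1e-6):
    """Scale a vector so its root-mean-square is approximately 1."""
    mean_sq = sum(v * v for v in values) / len(values)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in values]


def clamp(values, limit):
    """Clip every element to the range [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def stabilized_residual_update(residual, branch_out, clamp_limit=None):
    """Add a normalized (and optionally clamped) branch output to the residual."""
    branch = rms_normalize(branch_out)
    if clamp_limit is not None:
        branch = clamp(branch, clamp_limit)
    return [r + b for r, b in zip(residual, branch)]


# A branch output in the millions no longer dominates the residual stream.
residual = [0.5, 0.5, 0.5, 0.5]
branch_out = [3e6, -4e6, 0.0, 0.0]
updated = stabilized_residual_update(residual, branch_out, clamp_limit=1.0)
print(updated)
```

The point of the sketch is that the size of the update no longer depends on the raw scale of the branch output, which is the property the diagnosis above calls for.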
For the layer-by-layer structure, DPLR equations, matrix shapes, and parameter inventory, see:

```text
docs/CURRENT_SSM_LLM_ARCHITECTURE.md
```

Previous 200M real-token candidate:

- Architecture: `taonet_hybrid`
- SSM core: DPLR
- Parameter count: `199,480,928`
- Layers: 16
- SSM layers: `0,2,4,6,8,10,12,14`
- Attention layers: `1,3,5,7,9,11,13,15`
- Hidden dimension: `1024`
- FFN dimension: `3072`
- Mixer projection: `ssm_mixer_dim=256`
- SSM hidden/state dimension: `ssm_hidden_dim=32`
- DPLR rank: `1`
- Kernel mode: `conv`
- Local memory branch: enabled
- Local shift gain: per-channel
- Hybrid pattern: `ssm_first`
- SSM gate type: channel
- SSM lanes: `2`
- Lane mode: split
- Split mix: none
- Lane combine: channel for full lanes; concatenation for split lanes
- Finite-tail correction: disabled for the current best speed/quality point
- Tokenizer: pilot TaoData SentencePiece 8k
- Last 200M training target: TaoData JSONL next-token prediction, seq 512, batch 8, 4B base token positions, then 50k-step SFT
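
The training target in the last bullet implies a step count that is easy to check. The sketch below assumes one optimizer step consumes `seq_len * batch_size` token positions with no gradient accumulation. The ledger does not state whether accumulation was used, so `grad_accum=1` is an assumption, and `optimizer_steps` is a hypothetical helper.

```python
def optimizer_steps(total_token_positions, seq_len, batch_size, grad_accum=1):
    """Number of optimizer steps needed to cover the token budget (rounded up)."""
    tokens_per_step = seq_len * batch_size * grad_accum
    # Integer ceiling division avoids floating-point rounding on large budgets.
    return -(-total_token_positions // tokens_per_step)


steps = optimizer_steps(4_000_000_000, seq_len=512, batch_size=8)
print(steps)
```

With 4,096 token positions per step, the 4B budget corresponds to just under one million optimizer steps.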
Superseded stabilized pure-SSM candidate before the branch-only 100M gate:

- Architecture: `taonet_ssm`
- SSM core: DPLR
- SSM hidden/state dimension: `ssm_hidden_dim=32`
- Mixer projection: `ssm_mixer_dim=128`
- SSM lanes: `2`
- Lane mode: split
- Gate type: channel
- Local memory branch: enabled, per-channel
- Finite-tail correction: disabled for speed
- SSM branch RMS norm: enabled
- SSM branch clamp: `1.0`
- Block residual RMS norm: enabled
- Gradient clipping: `1.0`
- Learning rate: `8e-4`

Latest stabilized small-token benchmark summary:

| Run | Best SSM-bearing model | Eval loss | Eval accuracy | Forward+backward tok/s | Notes |
|---|---|---:|---:|---:|---|
| scale-control pattern sweep | hybrid single_ssm_middle | 4.6620 | 0.2188 | 1.005M | Pure SSM h16/m128 also beat attention loss on the 500-step smoke test; `ssm_first` hybrid failed badly. |
| stabilized pure-SSM capacity sweep | pure SSM h32/m128 | 4.5311 | 0.2492 | 0.676M | Best pure-SSM accuracy; h32/m256 has fractionally lower loss but worse speed/accuracy. |
| stabilized pure-SSM LR sweep | pure SSM h32/m128 lr8e-4 | 4.5311 | 0.2492 | 0.677M | Higher LR worsens loss; keep lr8e-4. |

Detailed records:

```text
experiments/runs/2026-05-13_scale_control_pattern_sweep/summary.md
experiments/runs/2026-05-13_stabilized_ssm_capacity_sweep/summary.md
experiments/runs/2026-05-13_stabilized_ssm_lr_sweep/summary.md
```

Latest completed large SentencePiece benchmark on `/home/student/Data/TaoData/pretrain.jsonl`:

| Batch | Model | Pattern | Lanes/mode | Params | Eval loss | Eval accuracy | Forward+backward tok/s |
|---:|---|---|---|---:|---:|---:|---:|
| 32 | attention TaoNet | - | - | 8.197M | 3.4164 | 0.3619 | 1.367M |
| 32 | SSM TaoNet h16/m128 | - | 1 full | 7.630M | 3.6565 | 0.3229 | 1.151M |
| 32 | SSM TaoNet h16/m128 | - | 2 full | 7.648M | 3.6342 | 0.3255 | 0.887M |
| 32 | SSM TaoNet h16/m128 | - | 2 split | 7.630M | 3.6409 | 0.3249 | 0.935M |
| 32 | hybrid TaoNet | ssm_first | 1 full | 7.913M | 3.3673 | 0.3665 | 1.234M |
| 32 | hybrid TaoNet | ssm_first | 2 full | 7.922M | 3.3368 | 0.3716 | 1.068M |
| 32 | hybrid TaoNet | ssm_first | 2 split Hadamard | 7.913M | 3.3345 | 0.3719 | 1.118M |
| 32 | hybrid TaoNet | single_ssm_middle | 2 split | 8.055M | 3.3808 | 0.3649 | 1.258M |
| 64 | attention TaoNet | - | - | 8.197M | 3.3946 | 0.3592 | 1.447M |
| 64 | SSM TaoNet h16/m128 | - | 1 full | 7.630M | 3.5722 | 0.3331 | 1.230M |
| 64 | SSM TaoNet h16/m128 | - | 2 full | 7.648M | 3.5446 | 0.3355 | 1.020M |
| 64 | SSM TaoNet h16/m128 | - | 2 split | 7.630M | 3.5515 | 0.3345 | 1.152M |
| 64 | hybrid TaoNet | ssm_first | 1 full | 7.913M | 3.2673 | 0.3793 | 1.325M |
| 64 | hybrid TaoNet | ssm_first | 2 full | 7.922M | 3.2411 | 0.3834 | 1.190M |
| 64 | hybrid TaoNet | ssm_first | 2 split | 7.913M | 3.2368 | 0.3835 | 1.271M |
| 64 | hybrid TaoNet | single_ssm_middle | 2 split | 8.055M | 3.2708 | 0.3785 | 1.365M |

Full per-variant results, including `attention_first`, `single_ssm_middle`, and `single_ssm_late`, are recorded in `experiments/runs/2026-05-10_split_lane_ssm_highscale/summary.md`.

Pure SSM replacement is not solved yet: in the 8000-step high-scale runs, two SSM lanes improved pure SSM loss and accuracy at both batch sizes, but the pure SSM model still trails attention. Split lanes recover throughput and memory compared with full two-lane duplication, but are slightly weaker for pure SSM quality. A fixed Hadamard add/subtract cross-lane mix was tested with mixed results: it helps the batch-32 hybrid but not pure SSM or the batch-64 best point. Channel gates remain the deployment-friendly default. Exact finite-tail correction was checked earlier and was not better overall, so the approximate path remains the current hybrid default.
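
For readers unfamiliar with the fixed Hadamard add/subtract mix mentioned above, here is a minimal sketch for two lanes. The `1/sqrt(2)` scaling and the point at which the mix is applied are assumptions. The ledger only states that a fixed add/subtract cross-lane mix was tested, and `hadamard_mix` is a hypothetical name.

```python
import math


def hadamard_mix(lane_a, lane_b, normalize=True):
    """Mix two equal-length lanes with a fixed 2x2 Hadamard (sum and difference)."""
    if len(lane_a) != len(lane_b):
        raise ValueError("lanes must have the same length")
    scale = 1.0 / math.sqrt(2.0) if normalize else 1.0
    mixed_sum = [(a + b) * scale for a, b in zip(lane_a, lane_b)]
    mixed_diff = [(a - b) * scale for a, b in zip(lane_a, lane_b)]
    return mixed_sum, mixed_diff


lane_a = [1.0, 2.0, -3.0]
lane_b = [0.5, -2.0, 1.0]
mixed_a, mixed_b = hadamard_mix(lane_a, lane_b)
# With the assumed normalization the mix is its own inverse.
restored_a, restored_b = hadamard_mix(mixed_a, mixed_b)
print(restored_a, restored_b)
```

Because the mix has no parameters, it cannot adapt to what each lane has learned.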
Interpretation:

- At batch 32, the best Hadamard split-lane `ssm_first` hybrid improves eval loss over attention by about `0.082` and accuracy by about `0.010`, while retaining about `82%` of attention forward+backward throughput.
- At batch 64, the best plain split-lane `ssm_first` hybrid improves eval loss over attention by about `0.158` and accuracy by about `0.024`, while retaining about `88%` of attention forward+backward throughput.
- Pure SSM two-lane improves over pure SSM one-lane, confirming that extra SSM capacity helps quality.
- Split-lane SSM is cheaper than naive lane duplication; fixed Hadamard mixing is too rigid, so the next improvement direction should use a small learnable but ternary-friendly post-split mixer.
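
The hybrid-versus-attention comparisons in this list can be recomputed directly from the benchmark table rows. The sketch below copies the four relevant rows by hand. The `ROWS` layout and the `compare` helper are illustrative and are not an existing script in this repo.

```python
# (eval_loss, eval_accuracy, forward+backward throughput in M tok/s)
ROWS = {
    (32, "attention"): (3.4164, 0.3619, 1.367),
    (32, "ssm_first 2 split Hadamard"): (3.3345, 0.3719, 1.118),
    (64, "attention"): (3.3946, 0.3592, 1.447),
    (64, "ssm_first 2 split"): (3.2368, 0.3835, 1.271),
}


def compare(baseline, candidate):
    """Return (loss improvement, accuracy gain, throughput retention)."""
    loss_gain = baseline[0] - candidate[0]
    acc_gain = candidate[1] - baseline[1]
    retention = candidate[2] / baseline[2]
    return loss_gain, acc_gain, retention


b32 = compare(ROWS[(32, "attention")], ROWS[(32, "ssm_first 2 split Hadamard")])
b64 = compare(ROWS[(64, "attention")], ROWS[(64, "ssm_first 2 split")])
for batch, (loss_gain, acc_gain, retention) in ((32, b32), (64, b64)):
    print(f"batch {batch}: loss -{loss_gain:.3f}, acc +{acc_gain:.3f}, "
          f"throughput {retention:.0%} of attention")
```

Recomputing these figures from the table keeps the prose summary and the recorded metrics in sync when new rows are added.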
## Important Repos

| Repo | Role |
|---|---|
| `StarMists/gamma_SSM_S4_enhanced` | SSM model, DPLR implementation, SSM-specific records |
| `lobakkang/TaoTrain` | TaoNet, `taonet_ssm`, token benchmarks |
| `lobakkang/TaoData` | Data extraction/preprocessing |
| `RepoBridge` | Remote execution tool only |
| `StarMists/Taotern_LLM_Experiments` | Compact experiment ledger and current conclusions |

## Layout

```text
experiments/
index.csv # searchable run ledger
README.md # artifact rules and workflow
legacy_repobridge_configs/ # exact pre-ledger RepoBridge configs
resources/
tokenizers/ # tokenizer configs, not large tokenizer binaries
runs/
<run_id>/
manifest.yaml # purpose, commits, status, paths
summary.md # human-readable result
metrics.csv # compact metrics snapshot when available
repobridge.config.json # exact remote run config
docs/
WORKFLOW.md # how future runs should be recorded
DIRECTORY_NAVIGATION.md # where code/data/run pieces live across repos
CURRENT_SSM_LLM_ARCHITECTURE.md
showcase/ # R&D showcase report, DOCX, and scaling notes
```
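
A small check like the following can confirm that a run directory carries the artifacts listed in the layout. It is a sketch under assumptions: treating `metrics.csv` as the only optional file follows the "when available" note, while treating the other three as required is a guess. `missing_artifacts` is a hypothetical helper, not an existing tool in this repo.

```python
from pathlib import Path

# Assumed split between required and optional per-run artifacts.
REQUIRED = ("manifest.yaml", "summary.md", "repobridge.config.json")
OPTIONAL = ("metrics.csv",)


def missing_artifacts(run_dir):
    """Return the required artifact names that are absent from run_dir."""
    run_dir = Path(run_dir)
    return [name for name in REQUIRED if not (run_dir / name).is_file()]


def present_optional(run_dir):
    """Return the optional artifact names that exist in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in OPTIONAL if (run_dir / name).is_file()]
```

A call such as `missing_artifacts("experiments/runs/<run_id>")` returns an empty list when a run is fully recorded.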
## Next Action

The 100M branch-only gate passed. The next full run is ready to launch with the pure SSM branch-only model:

```text
4B pretrain token positions
50k corrected response-only SFT steps
final deployable checkpoint expected at checkpoints/sft/final_model.pt
```

Keep activation diagnostics after pretraining, because the residual RMS still grows across depth even though it is far below the previous failure regime.
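
One way to keep that diagnostic lightweight is to reduce each block's output to an RMS and a max-abs value and flag blocks above a chosen threshold. The sketch below works on plain Python lists for illustration. In a real run the activations would more likely be captured with framework hooks. The threshold value and the names `block_stats` and `depth_report` are hypothetical.

```python
import math


def block_stats(activations):
    """Return (rms, max_abs) for a flat list of activation values."""
    mean_sq = sum(v * v for v in activations) / len(activations)
    return math.sqrt(mean_sq), max(abs(v) for v in activations)


def depth_report(per_layer_activations, rms_limit):
    """Summarize each layer and flag those whose RMS exceeds rms_limit."""
    report = []
    for layer, activations in enumerate(per_layer_activations):
        rms, max_abs = block_stats(activations)
        report.append({
            "layer": layer,
            "rms": rms,
            "max_abs": max_abs,
            "over_limit": rms > rms_limit,
        })
    return report


# Toy activations whose scale grows with depth.
toy_layers = [[1.0, -1.0], [3.0, 4.0], [30.0, -40.0]]
report = depth_report(toy_layers, rms_limit=10.0)
for row in report:
    print(row)
```

Recording both values per layer after pretraining makes it easy to compare a new checkpoint against the gate's final-block figures.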