| # Gamma Space Model | |
| Gamma Space Model is a PyTorch codebase for experimenting with a Gamma-structured state space model (SSM) and an S4-inspired enhanced Gamma SSM while preserving a fixed lower-bidiagonal ternary-friendly transition matrix. | |
| This repository is now organized as an installable Python package: | |
| 1. `gamma_space_model/` contains the models we are actively developing. | |
| 2. `csrc/` contains optional pure-Python acceleration helpers used when the runtime supports them. | |
| 3. `output/jupyter-notebook/` contains the benchmark notebooks and saved Colab runs. | |
| Earlier versions of this repository included a copied `s4-main/` reference tree. That vendored copy has been removed so the project can behave like a normal package. S4 remains an external theory/reference dependency, cited below, rather than source code we ship inside this repo. | |
| The current project direction is: | |
| - keep the Gamma / ternary transition structure for deployment compatibility | |
| - borrow the most useful ideas from S4 for stability and full-sequence efficiency | |
| - benchmark both training-time full-sequence behavior and deployment-time recurrent behavior | |
| ## Repository Layout | |
| ```text | |
| gamma_ssm_s4_v2/ | |
| |-- gamma_space_model/ | |
| | |-- modules/ | |
| | | |-- ssm_gamma.py # original Gamma SSM core | |
| | | |-- block.py # original residual Gamma block | |
| | | |-- ssm_gamma_s4.py # S4-inspired enhanced Gamma SSM core | |
| | | |-- block_s4.py # enhanced Gamma blocks | |
| | | `-- normalization.py # LayerNorm, RMSNorm | |
| | |-- ops/ | |
| | | `-- selective_scan_interface.py | |
| | `-- __init__.py | |
| |-- csrc/ | |
| | `-- tilelang/ # optional acceleration helpers | |
| |-- output/jupyter-notebook/ | |
| | |-- gamma-s4-sinewave-benchmark.ipynb | |
| | |-- gamma-s4-research-benchmark.ipynb | |
| | |-- gamma-s4-challenge-benchmark.ipynb | |
| | `-- *_rN.ipynb # saved Colab benchmark runs | |
| |-- scripts/ | |
| | |-- generate_gamma_benchmark_notebook.py | |
| | `-- generate_gamma_challenge_benchmark_notebook.py | |
| |-- tests/ | |
| |-- EXPERIMENT_RECORD.md # run-by-run experiment history | |
| |-- pyproject.toml # package metadata for pip install | |
| |-- setup.py # compatibility shim for editable installs | |
| `-- Gamma Distributed Ternary HiPPO.pdf | |
| ``` | |
| ## What We Are Modeling | |
| At the highest level, all variants in this repo are state space models. The standard continuous-time form is | |
| ```math | |
| \dot{h}(t) = A h(t) + B u(t), \qquad y(t) = C h(t) + D u(t) | |
| ``` | |
| where: | |
| - `u(t)` is the input sequence | |
| - `h(t)` is the latent state | |
| - `y(t)` is the output sequence | |
| - `A` controls the state dynamics | |
| - `B` injects the input into the state | |
| - `C` reads the state back out | |
| - `D` is a direct skip term | |
| The main design choice in this project is to keep `A` structured and deployment-friendly. | |
| ## The Gamma Transition Matrix | |
| The original Gamma SSM in this repo uses a fixed lower-bidiagonal matrix: | |
| ```math | |
| A_{n,n} = -1, \qquad A_{n,n-1} = 1 | |
| ``` | |
| and zero everywhere else. | |
| In matrix form: | |
| ```text | |
| [-1 0 0 ...] | |
| [ 1 -1 0 ...] | |
| [ 0 1 -1 ...] | |
| [ . . . . ] | |
| ``` | |
| This is important for the project because: | |
| - it is sparse | |
| - it only uses the values `-1`, `0`, and `1` | |
| - it maps naturally to the ternary computing direction we care about | |
| That structural constraint is the main reason we do not simply replace Gamma with a standard S4 parameterization. | |
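As a minimal sketch of the structure described above, the matrix can be built directly in PyTorch. The helper name `gamma_transition` is illustrative only and is not part of the package API.

```python
import torch

def gamma_transition(n: int) -> torch.Tensor:
    """Build the n x n lower-bidiagonal Gamma matrix: -1 on the diagonal, +1 just below it."""
    A = -torch.eye(n)
    A += torch.diag(torch.ones(n - 1), diagonal=-1)
    return A

A = gamma_transition(4)
print(A)

# Every entry should be one of the ternary values -1, 0, +1.
is_ternary = bool(torch.isin(A, torch.tensor([-1.0, 0.0, 1.0])).all())
print("ternary:", is_ternary)
```

For state dimension `n` the matrix has only `2n - 1` nonzero entries, which is what makes the sparse, ternary mapping attractive.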
| ## The Original Gamma SSM | |
| The original implementation is in [`gamma_space_model/modules/ssm_gamma.py`](./gamma_space_model/modules/ssm_gamma.py). | |
| It uses Euler discretization: | |
| ```math | |
| h_{t+1} = h_t + \Delta t \, (A h_t + B u_t) | |
| ``` | |
| and output readout: | |
| ```math | |
| y_t = C h_{t+1} | |
| ``` | |
| Key properties of the baseline: | |
| - fixed Gamma `A` | |
| - learned `B` and `C` | |
| - scalar fixed `delta_t` | |
| - recurrent forward pass | |
| - optional TileLang/Triton accelerated scan path when available | |
| This version is the simplest and most deployment-aligned baseline in the repo. | |
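The Euler recurrence above can be sketched in a few lines. This is an illustrative re-implementation under simplifying assumptions (single unbatched sequence, zero initial state), not the code in `ssm_gamma.py`; the names `gamma_matvec` and `euler_gamma_scan` are hypothetical. Because `A` is lower-bidiagonal, `A h` reduces to a shift and a subtraction, so the sketch avoids a dense matmul for the state transition.

```python
import torch

def gamma_matvec(h: torch.Tensor) -> torch.Tensor:
    """Apply the Gamma matrix without a matmul: (A h)_n = h_{n-1} - h_n, with h_{-1} = 0."""
    shifted = torch.cat([torch.zeros_like(h[:1]), h[:-1]])
    return shifted - h

def euler_gamma_scan(u, B, C, delta_t=0.1):
    """u: (T, d_in), B: (N, d_in), C: (d_out, N). Returns outputs (T, d_out) and final state (N,)."""
    h = torch.zeros(B.shape[0], dtype=u.dtype)
    ys = []
    for u_t in u:
        h = h + delta_t * (gamma_matvec(h) + B @ u_t)  # h_{t+1}
        ys.append(C @ h)                                # y_t = C h_{t+1}
    return torch.stack(ys), h

torch.manual_seed(0)
u = torch.randn(16, 3)
B = torch.randn(8, 3)
C = torch.randn(2, 8)
y, h_T = euler_gamma_scan(u, B, C)
print(y.shape, h_T.shape)
```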
| ## The S4-Inspired Enhanced Gamma SSM | |
| The enhanced implementation is in [`gamma_space_model/modules/ssm_gamma_s4.py`](./gamma_space_model/modules/ssm_gamma_s4.py). | |
| The goal is not to become full S4. The goal is to keep the Gamma transition structure while borrowing ideas that made S4 powerful and stable: | |
| - learned positive timestep `dt` | |
| - stable discretization | |
| - optional direct skip term `D` | |
| - full-sequence kernel view for parallel training/inference | |
| - recurrent stepping for deployment | |
| - Mamba-inspired input selection before the Gamma SSM | |
| ### Discretization | |
| The enhanced model supports three discretizations: | |
| #### Euler | |
| ```math | |
| \bar{A} = I + \Delta t A, \qquad \bar{B} = \Delta t B | |
| ``` | |
| #### Bilinear (Tustin) | |
| ```math | |
| \bar{A} = (I - \tfrac{1}{2}\Delta t A)^{-1}(I + \tfrac{1}{2}\Delta t A) | |
| ``` | |
| ```math | |
| \bar{B} = (I - \tfrac{1}{2}\Delta t A)^{-1}\Delta t B | |
| ``` | |
| #### ZOH | |
| ```math | |
| \bar{A} = e^{\Delta t A} | |
| ``` | |
| ```math | |
| \bar{B} = A^{-1}(\bar{A} - I)B | |
| ``` | |
| The default practical choice is bilinear discretization, because it has been the most stable and effective in our current experiments. | |
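The three rules can be compared side by side on the fixed Gamma matrix. The sketch below is a standalone illustration, not the package implementation; `discretize` is a hypothetical helper name. It uses `torch.linalg.solve` and `torch.linalg.matrix_exp` rather than forming explicit inverses.

```python
import torch

def discretize(A, B, dt, method="bilinear"):
    """Return (A_bar, B_bar) for the Euler, bilinear, or ZOH rule."""
    I = torch.eye(A.shape[0], dtype=A.dtype)
    if method == "euler":
        return I + dt * A, dt * B
    if method == "bilinear":
        lhs = I - 0.5 * dt * A
        # Solve linear systems instead of forming the inverse explicitly.
        return torch.linalg.solve(lhs, I + 0.5 * dt * A), torch.linalg.solve(lhs, dt * B)
    if method == "zoh":
        A_bar = torch.linalg.matrix_exp(dt * A)
        return A_bar, torch.linalg.solve(A, (A_bar - I) @ B)
    raise ValueError(f"unknown discretization: {method}")

n, dt = 4, 0.1
A = -torch.eye(n, dtype=torch.float64) + torch.diag(torch.ones(n - 1, dtype=torch.float64), diagonal=-1)
B = torch.ones(n, 1, dtype=torch.float64)

# A_bar stays lower triangular, so its diagonal entries are its eigenvalues.
diag_entries = {m: discretize(A, B, dt, m)[0].diagonal()[0].item() for m in ("euler", "bilinear", "zoh")}
print(diag_entries)
```

For the Gamma matrix the diagonal of `A_bar` is `1 - dt` (Euler), `(1 - dt/2) / (1 + dt/2)` (bilinear), and `exp(-dt)` (ZOH). The Euler value has magnitude below 1 only for `0 < dt < 2`, while the other two do for any positive `dt`.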
| ### Kernel View | |
| For a linear time-invariant discrete SSM, the output can also be written as a causal convolution: | |
| ```math | |
| y_t = \sum_{\ell=0}^{t} K_{\ell} u_{t-\ell} + D u_t | |
| ``` | |
| with kernel | |
| ```math | |
| K_{\ell} = C \bar{A}^{\ell} \bar{B} | |
| ``` | |
| This gives us two useful execution styles for the same model: | |
| - recurrent stepping: useful for deployment and streaming | |
| - full-sequence kernel/convolution view: useful for parallel whole-sequence computation | |
| In this repo, the enhanced model exposes: | |
| - `kernel_mode="recurrent"` | |
| - `kernel_mode="conv"` | |
| - `kernel_mode="auto"` | |
| The `auto` mode switches to the kernel path only when the sequence is long enough to justify it. | |
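To make the equivalence of the two execution styles concrete, the sketch below builds the kernel with a direct loop and checks it against recurrent stepping. It is an illustration under assumptions (unbatched input, zero initial state, bilinear discretization, an explicit O(T^2) convolution for readability); the function names are hypothetical and do not mirror the package's `kernel_mode` implementation.

```python
import torch

def build_kernel(A_bar, B_bar, C, length):
    """K[l] = C @ A_bar^l @ B_bar for l = 0..length-1; shape (length, d_out, d_in)."""
    K, x = [], B_bar
    for _ in range(length):
        K.append(C @ x)
        x = A_bar @ x
    return torch.stack(K)

def run_conv(K, u, D):
    """y_t = sum_{l=0}^{t} K_l u_{t-l} + D u_t, written as an explicit causal convolution."""
    T = u.shape[0]
    y = torch.zeros(T, K.shape[1], dtype=u.dtype)
    for t in range(T):
        for l in range(t + 1):
            y[t] += K[l] @ u[t - l]
        y[t] += D @ u[t]
    return y

def run_recurrent(A_bar, B_bar, C, D, u):
    """h_{t+1} = A_bar h_t + B_bar u_t, y_t = C h_{t+1} + D u_t."""
    h = torch.zeros(A_bar.shape[0], dtype=u.dtype)
    ys = []
    for u_t in u:
        h = A_bar @ h + B_bar @ u_t
        ys.append(C @ h + D @ u_t)
    return torch.stack(ys)

torch.manual_seed(0)
N, d, T, dt = 8, 3, 20, 0.1
f64 = torch.float64
I = torch.eye(N, dtype=f64)
A = -I + torch.diag(torch.ones(N - 1, dtype=f64), diagonal=-1)
lhs = I - 0.5 * dt * A
A_bar = torch.linalg.solve(lhs, I + 0.5 * dt * A)
B_bar = torch.linalg.solve(lhs, dt * torch.randn(N, d, dtype=f64))
C = torch.randn(d, N, dtype=f64)
D = torch.randn(d, d, dtype=f64)
u = torch.randn(T, d, dtype=f64)

y_conv = run_conv(build_kernel(A_bar, B_bar, C, T), u, D)
y_rec = run_recurrent(A_bar, B_bar, C, D, u)
max_gap = (y_conv - y_rec).abs().max().item()
print("max |conv - recurrent| =", max_gap)
```

A production kernel path would typically replace the double loop with an FFT-based convolution; the loop form is kept here only because it matches the summation term by term.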
| ## Why This Is Not "Full S4" | |
| This is the central design tradeoff of the project. | |
| S4 gets its strongest speedups from a structured parameterization of `A` that can be diagonalized or reduced to efficient kernel computations such as Cauchy/Vandermonde operations. | |
| Our Gamma model intentionally keeps: | |
| - a fixed lower-bidiagonal `A` | |
| - ternary-friendly values | |
| - deployment-oriented structure | |
| So we borrow selected S4 ideas, but we do not inherit the entire original S4 kernel machinery. | |
| That means: | |
| - we can improve stability and full-sequence performance substantially | |
| - but the exact original S4 fast-kernel theory does not transfer directly | |
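One way to see why the diagonalization-based machinery does not carry over is to split the Gamma matrix into a scaled identity and a shift. The symbols `S` (the subdiagonal shift matrix) and `d` (the state dimension) are introduced here for the derivation and do not appear in the code.

```math
A = -I + S, \qquad S_{n,n-1} = 1, \qquad S^{d} = 0
```

The only eigenvalue of `A` is `-1`, and `A + I = S` has rank `d - 1`, so `A` is a single Jordan block and cannot be diagonalized for `d >= 2`. Because `-I` and `S` commute, the matrix exponential used by ZOH still has a finite closed form:

```math
e^{\Delta t A} = e^{-\Delta t}\, e^{\Delta t S} = e^{-\Delta t} \sum_{k=0}^{d-1} \frac{(\Delta t)^{k}}{k!} S^{k}
```

```math
\bar{A}_{n,m} = e^{-\Delta t}\, \frac{(\Delta t)^{\,n-m}}{(n-m)!} \quad (n \ge m), \qquad \bar{A}_{n,m} = 0 \quad (n < m)
```

Each subdiagonal of the ZOH matrix therefore has the form `t^k e^{-t} / k!` evaluated at `t = \Delta t`, which is the same expression as a Gamma (Erlang) density with shape `k + 1` and unit rate.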
| ## Block Design | |
| ### Original Block | |
| [`gamma_space_model/modules/block.py`](./gamma_space_model/modules/block.py) defines `GammaSingleBlock`. | |
| Its structure is: | |
| ```text | |
| x | |
| -> LayerNorm (if prenorm) | |
| -> Gamma SSM | |
| -> Dropout | |
| -> Residual add | |
| -> optional postnorm | |
| ``` | |
| ### Enhanced Block | |
| [`gamma_space_model/modules/block_s4.py`](./gamma_space_model/modules/block_s4.py) defines `GammaS4Block`. | |
| Its structure is: | |
| ```text | |
| x | |
| -> LayerNorm | |
| -> optional input-selection gate | |
| -> SSMGammaS4 | |
| -> activation | |
| -> optional gate | |
| -> optional output linear | |
| -> layer scale | |
| -> dropout | |
| -> residual add | |
| ``` | |
| In equations, the enhanced block is roughly: | |
| ```math | |
| \tilde{x} = \mathrm{Norm}(x) | |
| ``` | |
| ```math | |
| u = \tilde{x} \odot \sigma(W_{in\_gate}\tilde{x} + b_{in\_gate}) | |
| ``` | |
| ```math | |
| s = \mathrm{SSMGammaS4}(u) | |
| ``` | |
| ```math | |
| z = \mathrm{OutputLinear}(\sigma(s) \odot \mathrm{Gate}(\tilde{x})) | |
| ``` | |
| ```math | |
| \mathrm{Block}(x) = x + \alpha z | |
| ``` | |
| where: | |
| - `sigma` applied to `s` is the chosen activation; in the input-gate equation it denotes the gate nonlinearity | |
| - the input-selection gate is a per-token channel gate inspired by Mamba's selective input flow | |
| - `Gate` may be absent | |
| - `alpha` is a learnable layer-scale parameter | |
| The input-selection gate does not change the fixed Gamma transition matrix `A`. It modulates the input before it enters the state, so the model can learn to emphasize information-bearing tokens and suppress blanks/noise while keeping the ternary-friendly state transition intact. | |
| There is also a lighter variant: | |
| - `GammaS4MinimalBlock` | |
| which removes the richer input-gating/output-gating/output-linear pathway and keeps only the core S4-inspired stability changes. | |
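The block equations above can be sketched as a small PyTorch module. This is a hypothetical stand-in, not the repository's `GammaS4Block`: the class name `ToyGatedBlock`, the sigmoid gates, the GELU activation, and the per-channel layer scale are all assumptions made for illustration, and the SSM is passed in as any module that maps a tensor to a tensor of the same shape.

```python
import torch
from torch import nn

class ToyGatedBlock(nn.Module):
    """Illustrative version of: norm -> input gate -> SSM -> activation -> gate -> linear -> scale -> residual."""

    def __init__(self, d_model: int, ssm: nn.Module, layer_scale_init: float = 1e-2, dropout: float = 0.0):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_gate = nn.Linear(d_model, d_model)   # W_in_gate, b_in_gate
        self.ssm = ssm                                # stand-in for SSMGammaS4
        self.act = nn.GELU()                          # assumed activation
        self.gate = nn.Linear(d_model, d_model)       # assumed form of Gate(.)
        self.out = nn.Linear(d_model, d_model)        # OutputLinear
        self.alpha = nn.Parameter(torch.full((d_model,), layer_scale_init))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_norm = self.norm(x)
        u = x_norm * torch.sigmoid(self.in_gate(x_norm))   # input selection; A is untouched
        s = self.ssm(u)
        z = self.out(self.act(s) * torch.sigmoid(self.gate(x_norm)))
        return x + self.dropout(self.alpha * z)            # layer scale, dropout, residual add

torch.manual_seed(0)
block = ToyGatedBlock(d_model=16, ssm=nn.Identity())
block.eval()
x = torch.randn(2, 10, 16)
y = block(x)
print(y.shape)
```

With a small initial `alpha`, the block starts close to the identity map, which is the usual motivation for layer scale in deep residual stacks.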
| ## Stacked Model Pattern | |
| The benchmark notebooks use a simple stacked forecaster pattern: | |
| ```math | |
| x_0 = W_{in} u | |
| ``` | |
| ```math | |
| x_{\ell+1} = \mathrm{Block}_{\ell}(x_{\ell}) | |
| ``` | |
| ```math | |
| \hat{y} = W_{out} x_L | |
| ``` | |
| This pattern is implemented inside the notebooks rather than as a dedicated package module because we are still iterating rapidly on benchmark design. | |
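A minimal sketch of the stacked pattern follows. `StackedForecaster` and `StubBlock` are hypothetical names; the stub only imitates the `(output, final_state)` return convention shown in the Quick Start so that the example runs without the package installed.

```python
import torch
from torch import nn

class StubBlock(nn.Module):
    """Placeholder residual block returning (output, final_state), like the Quick Start blocks."""

    def __init__(self, d_model: int):
        super().__init__()
        self.mix = nn.Linear(d_model, d_model)

    def forward(self, x):
        y = x + self.mix(x)
        return y, y[:, -1]

class StackedForecaster(nn.Module):
    """x_0 = W_in u, x_{l+1} = Block_l(x_l), y_hat = W_out x_L."""

    def __init__(self, d_in, d_model, d_out, n_layers, make_block):
        super().__init__()
        self.w_in = nn.Linear(d_in, d_model)
        self.blocks = nn.ModuleList([make_block() for _ in range(n_layers)])
        self.w_out = nn.Linear(d_model, d_out)

    def forward(self, u):
        x = self.w_in(u)
        for block in self.blocks:
            x, _ = block(x)          # discard each block's final state
        return self.w_out(x)

torch.manual_seed(0)
model = StackedForecaster(d_in=1, d_model=32, d_out=1, n_layers=3, make_block=lambda: StubBlock(32))
u = torch.randn(4, 64, 1)
y_hat = model(u)
print(y_hat.shape)
```

Assuming the constructor arguments shown in the Quick Start, the stub could be swapped for a real block with something like `make_block=lambda: GammaS4Block(d_model=32, hidden_dim=64)`.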
| ## Full-Sequence vs Recurrent vs Deployment-Lite | |
| The benchmarks report several distinct execution modes. | |
| ### 1. Full-sequence | |
| The whole sequence is passed to `forward(...)` in one call. | |
| This is the relevant mode for: | |
| - standard training | |
| - offline batch inference | |
| - full-sequence throughput comparisons | |
| Important detail: | |
| - full-sequence does not always mean convolution is being used | |
| - it only means the whole sequence is evaluated in one forward pass | |
| ### 2. Recurrent | |
| The model is stepped token by token using `step(...)` while carrying state. | |
| This is the relevant mode for: | |
| - streaming inference | |
| - autoregressive deployment | |
| - hardware-style stateful execution | |
| ### 3. Deployment-lite | |
| This is an experimental recurrent inference mode for `GammaS4Block`. | |
| It uses the same trained weights but simplifies part of the block-time recurrent computation to reduce runtime. The current lite path is intentionally aggressive: it skips the input-selection gate and post-SSM gate/output branch during recurrent stepping. The benchmark reports both: | |
| - speed improvement | |
| - output mismatch versus the standard full-sequence prediction | |
| For the baseline model, deployment and recurrent are effectively the same path, so those numbers match exactly. | |
| ### 4. Balanced deployment | |
| This is a middle-ground recurrent inference mode for `GammaS4Block`. | |
| It keeps the trained output projection, but replaces the input-selection and post-SSM gates with static gates derived from the learned gate biases. This is meant to test whether we can recover more fidelity than deployment-lite while still avoiding the full gate cost at every token. | |
| In the notebooks, these metrics appear as `balanced_deploy_*`. | |
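The idea of a static gate can be illustrated in isolation. The sketch below assumes the static gate is `sigmoid(bias)`, meaning the learned gate evaluated with its input-dependent term dropped; the README does not state the exact derivation, so treat this as one plausible reading rather than the notebook's implementation. The mismatch is reported as an MSE, in the spirit of the match-MSE metrics.

```python
import torch
from torch import nn

torch.manual_seed(0)
d_model = 8
gate = nn.Linear(d_model, d_model)        # stands in for a trained gate projection
x = torch.randn(4, 16, d_model)

with torch.no_grad():
    dynamic = x * torch.sigmoid(gate(x))   # per-token gate: one projection per token
    static_gate = torch.sigmoid(gate.bias) # assumed: input-independent gate from the bias alone
    static = x * static_gate               # elementwise scale, no projection at step time
    match_mse = torch.mean((dynamic - static) ** 2).item()

print("static gate shape:", tuple(static_gate.shape))
print("match MSE vs dynamic gate:", match_mse)
```

The static path is exact only when the gate weights contribute nothing, so the mismatch grows with how strongly the trained gate depends on its input.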
| ## Current Empirical Status | |
| The benchmark notebooks live in: | |
| - [`output/jupyter-notebook/gamma-s4-sinewave-benchmark.ipynb`](./output/jupyter-notebook/gamma-s4-sinewave-benchmark.ipynb) | |
| - [`output/jupyter-notebook/gamma-s4-research-benchmark.ipynb`](./output/jupyter-notebook/gamma-s4-research-benchmark.ipynb) | |
| - [`output/jupyter-notebook/gamma-s4-challenge-benchmark.ipynb`](./output/jupyter-notebook/gamma-s4-challenge-benchmark.ipynb) | |
| The challenge notebook is separate from the quick and research notebooks. It tracks harder capability-style tests: | |
| - permuted MNIST for long-range image-as-sequence memory | |
| - selective copying for sparse content recall | |
| - induction-style recall for key-value association across a sequence | |
| - token-memory curriculum tiers for easy, moderate, and hard selective/induction diagnostics | |
| The research and challenge notebooks also include inference-oriented sections. These separate full-sequence prefill-style inference from recurrent decode-style inference, report deployment-lite and balanced deployment variants where available, and include quality/fidelity columns so speed is not interpreted independently from accuracy or output mismatch. | |
| Saved Colab runs are also committed in the same folder using `_rN` suffixes. | |
| ### Experiment record guide | |
| The run-by-run history is summarized in [`EXPERIMENT_RECORD.md`](./EXPERIMENT_RECORD.md). | |
| Use that file when you want to answer: | |
| - what changed between `_r1`, `_r2`, ..., `_r10` | |
| - which model variants were tested in each run | |
| - which notebook and task configuration produced each result | |
| - which results are complete, partial, or not recorded | |
| - what we learned from each run | |
| The saved notebooks under `output/jupyter-notebook/` are the raw experiment artifacts. They include the actual Colab outputs, plots, printed metrics, and configuration cells. The experiment record is the human-readable index over those artifacts. | |
| The naming convention is: | |
| - `gamma_s4_sinewave_benchmark_r1.ipynb` and `gamma_s4_sinewave_benchmark_r2.ipynb` | |
|   - early single-notebook experiments before the quick/research split | |
| - `gamma-s4-sinewave-benchmark_rN.ipynb` | |
|   - saved quick benchmark runs | |
|   - use these for fast regression history | |
| - `gamma-s4-research-benchmark_rN.ipynb` | |
|   - saved practical/research benchmark runs | |
|   - use these for presentation and deeper analysis | |
| - `gamma-s4-challenge-benchmark_rN.ipynb` | |
|   - saved challenge benchmark runs | |
|   - use these for permuted MNIST, selective copying, and induction-style recall history | |
| For the current project state, the most useful records are: | |
| - `_r9` research benchmark: best presentation artifact | |
| - `_r10` quick/research benchmarks: latest deployment fidelity comparison with balanced deployment metrics | |
| - `_r8` research benchmark: first mature run with token-lite enabled and strong long-context conv results | |
| - `_r2`: early evidence that `gamma_s4_enhanced` could outperform baseline on harder sequence tasks | |
| When presenting to others, start with the README for theory and architecture, then use `EXPERIMENT_RECORD.md` for the experiment timeline, then open the latest `_rN` research notebook for plots and raw outputs. | |
| ### What the quick benchmark is for | |
| The sinewave/quick notebook is the fast regression loop. It is used to answer: | |
| - does the model still train correctly? | |
| - does full-sequence performance remain strong? | |
| - does recurrent/deployment behavior regress? | |
| ### What the research benchmark is for | |
| The research notebook is the more presentable and more practical benchmark. It currently includes: | |
| - `current_reference`: medium practical forecasting task | |
| - `long_context`: harder long-range forecasting task | |
| - `token-lite`: lightweight character-level next-token benchmark | |
| ### Latest recorded highlights | |
| From the latest committed `_r10` GPU runs: | |
| - Quick benchmark: | |
|   - enhanced val loss is much lower than baseline on both `simple` and `moderate` | |
|   - enhanced full-sequence throughput is higher than baseline | |
|   - enhanced recurrent inference is still slower than baseline | |
| - Research benchmark: | |
|   - `current_reference` | |
|     - enhanced validation loss: `0.019951` | |
|     - baseline validation loss: `0.709749` | |
|   - `long_context` | |
|     - enhanced validation loss: `0.011708` | |
|     - baseline validation loss: `27.229956` | |
|     - enhanced mean epoch time: `15.86s` | |
|     - baseline mean epoch time: `40.35s` | |
|     - enhanced full-sequence throughput: `16957 tokens/s` | |
|     - baseline full-sequence throughput: `2395 tokens/s` | |
|     - enhanced balanced deployment match MSE: `0.001692` | |
|     - enhanced deployment-lite match MSE: `0.200325` | |
|   - `token-lite` | |
|     - enhanced validation CE: `2.4868` | |
|     - baseline validation CE: `3.1322` | |
|     - enhanced perplexity: `12.02` | |
|     - baseline perplexity: `22.92` | |
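The derived quantities in the highlights can be reproduced from the reported numbers. The sketch assumes cross-entropy is measured in nats, so that perplexity is `exp(CE)`; the reported perplexities are consistent with that reading.

```python
import math

# Reported token-lite validation cross-entropy (assumed to be in nats).
val_ce = {"enhanced": 2.4868, "baseline": 3.1322}
perplexity = {name: math.exp(ce) for name, ce in val_ce.items()}

# Reported long_context figures: tokens/s and mean seconds per epoch.
throughput_ratio = 16957 / 2395    # enhanced vs baseline full-sequence throughput
epoch_time_ratio = 40.35 / 15.86   # baseline vs enhanced mean epoch time

for name, ppl in perplexity.items():
    print(f"{name}: perplexity = {ppl:.2f}")
print(f"full-sequence throughput ratio: {throughput_ratio:.2f}x")
print(f"epoch time ratio: {epoch_time_ratio:.2f}x")
```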
| The challenge notebook was added after `_r10`, so its first saved run should be treated as the first challenge-task record. | |
| ### Practical interpretation | |
| The current picture is: | |
| - the enhanced Gamma SSM is now clearly the stronger model for full-sequence training and harder tasks | |
| - the long-context conv/full-sequence path now shows measurable advantages: on `long_context`, roughly 7x the baseline full-sequence throughput and about 2.5x shorter mean epoch time | |
| - recurrent deployment remains the main remaining weakness | |
| - challenge recall tasks remain near random, so the next question is whether the model can solve easier curriculum tiers before investing in harder selective-memory variants | |
| ## Installation | |
| The distribution package is named `gamma-ssm-s4-enhanced`; the Python import package remains `gamma_space_model`. | |
| ### Install From This Checkout | |
| ```bash | |
| pip install -e . | |
| ``` | |
| ### Install From GitHub | |
| ```bash | |
| pip install "git+https://github.com/StarMists/gamma_SSM_S4_enhanced.git" | |
| ``` | |
| ### Install From A Private GitHub Repo | |
| Use a GitHub personal access token with read access to the private repository. In Colab or a shell, prefer keeping the token in an environment variable instead of hard-coding it into notebooks: | |
| ```bash | |
| export GITHUB_TOKEN="ghp_your_token_here" | |
| pip install "git+https://${GITHUB_TOKEN}@github.com/StarMists/gamma_SSM_S4_enhanced.git" | |
| ``` | |
| In Google Colab: | |
| ```python | |
| import os | |
| os.environ["GITHUB_TOKEN"] = "ghp_your_token_here" | |
| !pip install "git+https://${GITHUB_TOKEN}@github.com/StarMists/gamma_SSM_S4_enhanced.git" | |
| ``` | |
| ### Install Optional Extras | |
| For development: | |
| ```bash | |
| pip install -e ".[dev]" | |
| ``` | |
| For benchmark notebooks: | |
| ```bash | |
| pip install -e ".[notebook]" | |
| ``` | |
| For optional performance dependencies: | |
| ```bash | |
| pip install -e ".[performance]" | |
| ``` | |
| For a private GitHub install with extras, use the PEP 508 form: | |
| ```bash | |
| pip install "gamma-ssm-s4-enhanced[notebook,performance] @ git+https://${GITHUB_TOKEN}@github.com/StarMists/gamma_SSM_S4_enhanced.git" | |
| ``` | |
| Notes: | |
| - CUDA / Triton / TileLang acceleration is optional | |
| - the code automatically falls back to pure PyTorch paths when those kernels are unavailable | |
| - installing from GitHub gives downstream LLM code access to the latest committed package without cloning the repo manually | |
| ## Quick Start | |
| ### Original Gamma block | |
| ```python | |
| import torch | |
| from gamma_space_model import GammaSingleBlock | |
| block = GammaSingleBlock( | |
| d_model=64, | |
| hidden_dim=128, | |
| delta_t=0.1, | |
| prenorm=True, | |
| ) | |
| x = torch.randn(2, 128, 64) | |
| y, h_T = block(x) | |
| print(y.shape, h_T.shape) | |
| ``` | |
| ### Enhanced Gamma block | |
| ```python | |
| import torch | |
| from gamma_space_model import GammaS4Block | |
| block = GammaS4Block( | |
| d_model=64, | |
| hidden_dim=128, | |
| discretization="bilinear", | |
| kernel_mode="auto", | |
| kernel_threshold=384, | |
| input_gate=True, | |
| gate=True, | |
| use_D=True, | |
| ) | |
| x = torch.randn(2, 512, 64) | |
| y, h_T = block(x) | |
| print(y.shape, h_T.shape) | |
| ``` | |
| ### Export discretized matrices for deployment | |
| ```python | |
| from gamma_space_model import SSMGammaS4 | |
| ssm = SSMGammaS4(state_dim=64, hidden_dim=128) | |
| deployment_mats = ssm.export_inference_matrices() | |
| print(deployment_mats.keys()) | |
| ``` | |
| ## Tests | |
| The main test files are: | |
| - [`tests/test_ssm_gamma.py`](./tests/test_ssm_gamma.py) | |
| - [`tests/test_block.py`](./tests/test_block.py) | |
| - [`tests/test_ssm_gamma_s4.py`](./tests/test_ssm_gamma_s4.py) | |
| These cover: | |
| - forward and step behavior | |
| - state initialization | |
| - recurrent/full-sequence agreement | |
| - enhanced conv vs recurrent consistency | |
| - deployment cache paths | |
| ## Roadmap / Next Improvement Points | |
| These are the main next-step items identified from the latest benchmark runs. They are intentionally listed as a to-do record, not yet implemented in this README update. | |
| 1. Improve enhanced recurrent inference further. | |
|    - Full-sequence behavior is now strong. | |
|    - Recurrent deployment remains the largest performance gap. | |
| 2. Improve deployment-lite fidelity. | |
|    - It speeds up recurrent inference, but its output mismatch on `long_context` is still too large. | |
| 3. Improve structured kernel generation further. | |
|    - The current conv path is much better than before, but it still uses a direct kernel-building loop rather than the strongest possible structured kernel derivation. | |
| 4. Strengthen the token benchmark. | |
|    - The current token-lite task is useful for relative comparison, but still too small to serve as a final language-model benchmark. | |
| 5. Add more presentation-oriented result summaries. | |
|    - The research notebook now includes task previews and prediction/error plots. | |
|    - A future pass could add automated summary tables or figure exports for slide decks. | |
| ## References | |
| ### Internal project note | |
| - `Gamma Distributed Ternary HiPPO.pdf` in the repository root | |
| ### External references | |
| - Albert Gu, Karan Goel, Christopher Ré. *Efficiently Modeling Long Sequences with Structured State Spaces*. ICLR 2022. | |
|   - [arXiv](https://arxiv.org/abs/2111.00396) | |
|   - [official S4 repository](https://github.com/state-spaces/s4) | |
| - Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré. *HiPPO: Recurrent Memory with Optimal Polynomial Projections*. NeurIPS 2020. | |
|   - [arXiv](https://arxiv.org/abs/2008.07669) | |
| - Albert Gu, Tri Dao. *Mamba: Linear-Time Sequence Modeling with Selective State Spaces*. 2023. | |
|   - [arXiv](https://arxiv.org/abs/2312.00752) | |
| ## Citation | |
| This repo builds directly on ideas from HiPPO and S4. | |
| If you use this repository in academic or technical writing, please cite the upstream S4, HiPPO, and Mamba papers above, and mention that this codebase studies a Gamma-structured SSM with S4-inspired enhancements. | |
| Example BibTeX entries: | |
| ```bibtex | |
| @inproceedings{gu2022s4, | |
| title={Efficiently Modeling Long Sequences with Structured State Spaces}, | |
| author={Gu, Albert and Goel, Karan and R{\'e}, Christopher}, | |
| booktitle={International Conference on Learning Representations}, | |
| year={2022} | |
| } | |
| @inproceedings{gu2020hippo, | |
| title={HiPPO: Recurrent Memory with Optimal Polynomial Projections}, | |
| author={Gu, Albert and Dao, Tri and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher}, | |
| booktitle={Advances in Neural Information Processing Systems}, | |
| year={2020} | |
| } | |
| @article{gu2023mamba, | |
| title={Mamba: Linear-Time Sequence Modeling with Selective State Spaces}, | |
| author={Gu, Albert and Dao, Tri}, | |
| journal={arXiv preprint arXiv:2312.00752}, | |
| year={2023} | |
| } | |
| ``` | |