# Directory Navigation Guide

This guide maps the project by responsibility. Use it when a new thread needs to find the SSM code, LLM wrapper, data pipeline, train/test scripts, or remote-run artifacts quickly.

## Local Workspace

Current local workspace root:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern
```

Main local repos:

| Purpose | Local path | GitHub |
|---|---|---|
| Experiment ledger | `C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\Taotern_LLM_Experiments` | `https://github.com/StarMists/Taotern_LLM_Experiments` |
| SSM model | `C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\Taotern_SSM` | `https://github.com/StarMists/gamma_SSM_S4_enhanced` |
| TaoTrain LLM code | `C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\TaoTrain` | `https://github.com/lobakkang/TaoTrain` |
| Remote run tool | `C:\Users\YouZheng\Documents\LYZ\MyContent\MyComp\RepoBridge` | local tool repo |
| TaoData scripts | not currently cloned under this workspace | `https://github.com/lobakkang/TaoData` |

## Experiment Ledger

Path:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\Taotern_LLM_Experiments
```

Important files:

| File | Purpose |
|---|---|
| `README.md` | Current SSM LLM status and attention TaoNet comparison |
| `experiments/index.csv` | Searchable run ledger |
| `experiments/runs//manifest.yaml` | Run purpose, commits, data, status |
| `experiments/runs//summary.md` | Human-readable result |
| `experiments/runs//metrics.csv` | Compact metric snapshot |
| `experiments/runs//repobridge.config.json` | Exact remote-run config |
| `experiments/resources/tokenizers/` | Small tokenizer configs only |
| `docs/WORKFLOW.md` | How future runs should be recorded |
| `docs/CURRENT_SSM_LLM_ARCHITECTURE.md` | Current TaoNet-SSM layers, equations, matrices, parameters |

Rule: keep this repo compact. Commit summaries/configs/CSV metrics, not raw output trees or checkpoints.

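The per-run files listed in the Experiment Ledger table lend themselves to a quick completeness check before a run folder is committed. The following is a minimal sketch, not part of any of the repos: the function name `missing_run_files` and the command-line usage are hypothetical, and only the four file names are taken from the table.

```python
from pathlib import Path

# Per-run files named in the "Important files" table of the Experiment Ledger.
EXPECTED_RUN_FILES = (
    "manifest.yaml",
    "summary.md",
    "metrics.csv",
    "repobridge.config.json",
)


def missing_run_files(run_dir: Path) -> list[str]:
    """Return the expected ledger files that are absent from run_dir."""
    return [name for name in EXPECTED_RUN_FILES if not (run_dir / name).is_file()]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_run.py path/to/run_folder
    run_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    missing = missing_run_files(run_dir)
    print("complete" if not missing else f"missing: {', '.join(missing)}")
```

The check only looks for the presence of the files. It does not validate their contents or enforce the rule against committing checkpoints.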
## SSM Model Repo

Path:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\Taotern_SSM
```

Main code locations:

| Area | File or directory | Notes |
|---|---|---|
| DPLR SSM core | `gamma_space_model/modules/s4_ternary_dplr_ssm.py` | Current main SSM core for TaoNet-SSM |
| Gamma S4 core | `gamma_space_model/modules/ssm_gamma_s4.py` | Older Gamma/S4-style core |
| Baseline Gamma core | `gamma_space_model/modules/ssm_gamma.py` | Baseline/reference SSM |
| SSM blocks | `gamma_space_model/modules/block*.py` | Standalone SSM block wrappers |
| TileLang/Triton fallback area | `csrc/tilelang/` | Capability detection and fallback code |
| Selective scan op wrapper | `gamma_space_model/ops/selective_scan_interface.py` | SSM op interface |
| DPLR profiler | `scripts/profile_dplr_frequency_path.py` | Profiles DPLR frequency path |
| TileLang diagnosis | `scripts/diagnose_tilelang_acceleration.py` | Reports real vs fallback acceleration |
| SSM variant benchmark | `scripts/benchmark_ssm_variants.py` | Standalone SSM benchmarks |
| SSM tests | `tests/test_s4_ternary_dplr_ssm.py`, `tests/test_ssm_gamma*.py` | Core correctness tests |
| Historical record | `EXPERIMENT_RECORD.md` | Older narrative record; new LLM records should be mirrored into this experiment ledger |

When improving the SSM model itself, start from:

```text
gamma_space_model/modules/s4_ternary_dplr_ssm.py
```

When working on hardware acceleration, start from:

```text
csrc/tilelang/
scripts/profile_dplr_frequency_path.py
scripts/diagnose_tilelang_acceleration.py
```

Remote SSM path used by RepoBridge runs:

```text
/home/student/YouZheng/gamma_ssm_repo
```

## TaoTrain LLM Repo

Path:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\TaoTrain
```

Main code locations:

| Area | File or directory | Notes |
|---|---|---|
| Attention TaoNet baseline | `src/taoTrain/models/taonet.py` | Reference model for comparisons |
| SSM TaoNet wrapper | `src/taoTrain/models/taonet_ssm.py` | Replaces attention core with SSM mixer |
| Model config schema | `src/taoTrain/config.py` | SSM flags live here: hidden dim, mixer dim, shift, kernel mode |
| Model registry | `src/taoTrain/models/registry.py` | Architecture registration |
| Token/data utilities | `src/taoTrain/data/` | JSONL and tokenization data paths |
| Tokenizer trainer | `src/taoTrain/tokenizers/trainer.py` | SentencePiece training path |
| Training loop | `src/taoTrain/training/trainer.py` | Full trainer implementation |
| CLI | `src/taoTrain/cli.py` | TaoTrain command entry |
| Real-token benchmark | `scripts/benchmark_taonet_real_tokens.py` | Main attention vs SSM benchmark for TaoData token tasks |
| Synthetic token benchmark | `scripts/benchmark_taonet_token_variants.py` | Previous/increment/random token probes |
| TaoData pilot tokenizer config | `configs/tokenizer_taodata_pilot.yaml` | Generated pilot 8k SentencePiece tokenizer |
| SSM pretrain config | `configs/ssm_pretrain.yaml` | Config path for SSM pretraining experiments |
| SSM wrapper tests | `tests/test_taonet_ssm.py` | Shape and config behavior tests |

Current real-token benchmark entry point:

```text
scripts/benchmark_taonet_real_tokens.py
```

Current SSM wrapper entry point:

```text
src/taoTrain/models/taonet_ssm.py
```

Current attention baseline:

```text
src/taoTrain/models/taonet.py
```

Remote TaoTrain path used by RepoBridge:

```text
/home/student/YouZheng/repo
```

## TaoData

GitHub:

```text
https://github.com/lobakkang/TaoData
```

Current local status:

```text
No local TaoData checkout was found under C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase as of 2026-04-30.
```

Remote data path used in current benchmarks:

```text
/home/student/Data/TaoData/pretrain.jsonl.fineweb.jsonl
```

Current pilot tokenizer path on remote:

```text
/home/student/YouZheng/tokenizers/taodata_pilot_8k/tokenizer.model
/home/student/YouZheng/tokenizers/taodata_pilot_8k/tokenizer.vocab
```

Tokenizer config snapshot in this ledger:

```text
experiments/resources/tokenizers/taodata_pilot_8k.yaml
```

When TaoData is cloned locally, update this guide with the exact data download/generation scripts and any preprocessing entry points.

## RepoBridge

Path:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyComp\RepoBridge
```

Important files:

| File or directory | Purpose |
|---|---|
| `repobridge/core.py` | Sync, SSH, SFTP, run, download implementation |
| `repobridge/cli.py` | CLI entry point |
| `repobridge/app.py` | GUI |
| `CODEX_OPERATOR_GUIDE.md` | Codex remote-run guide |
| `PRODUCTION_RUNBOOK.md` | Production checklist |
| old `repobridge.*.config.json` files | Historical configs; new experiment configs should live in this ledger |

Preferred future location for experiment configs:

```text
Taotern_LLM_Experiments\experiments\runs\\repobridge.config.json
```

Remote write root:

```text
/home/student/YouZheng
```

Remote output base:

```text
/home/student/YouZheng/outputs-taotrain
```

Important operational note:

Avoid downloading the whole remote output base if it contains many historical runs. Prefer downloading or copying only the specific run folder.

## Current Best SSM LLM Path

To inspect the current best SSM LLM implementation:

1. Open TaoTrain wrapper:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\TaoTrain\src\taoTrain\models\taonet_ssm.py
```

2. Follow the DPLR core import into:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\Taotern_SSM\gamma_space_model\modules\s4_ternary_dplr_ssm.py
```

3. Compare against attention TaoNet:

```text
C:\Users\YouZheng\Documents\LYZ\MyContent\MyLLM\Codebase\Taotern\TaoTrain\src\taoTrain\models\taonet.py
```

4. Reproduce current best benchmark with:

```text
Taotern_LLM_Experiments\experiments\runs\2026-04-29_spm_b32_500step_mixer_sweep\repobridge.config.json
```

5. Read current conclusion in:

```text
Taotern_LLM_Experiments\README.md
```
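
The five inspection steps above can be turned into a quick path check for a fresh checkout. This is a minimal sketch under assumptions: `inspection_paths` and `report` are hypothetical helper names that do not exist in any of the repos, and the workspace root is passed in rather than hard-coded so the check also works when the workspace lives somewhere else.

```python
from pathlib import Path

# Files from the five inspection steps, relative to the Taotern workspace root.
# Forward slashes are used so the same strings resolve on Windows and POSIX.
INSPECTION_STEPS = (
    ("SSM TaoNet wrapper", "TaoTrain/src/taoTrain/models/taonet_ssm.py"),
    ("DPLR SSM core",
     "Taotern_SSM/gamma_space_model/modules/s4_ternary_dplr_ssm.py"),
    ("Attention TaoNet baseline", "TaoTrain/src/taoTrain/models/taonet.py"),
    ("Best benchmark config",
     "Taotern_LLM_Experiments/experiments/runs/"
     "2026-04-29_spm_b32_500step_mixer_sweep/repobridge.config.json"),
    ("Current conclusion", "Taotern_LLM_Experiments/README.md"),
)


def inspection_paths(workspace_root: Path) -> list[tuple[str, Path]]:
    """Resolve each inspection step to a path under workspace_root."""
    return [(label, workspace_root / rel) for label, rel in INSPECTION_STEPS]


def report(workspace_root: Path) -> list[str]:
    """Return one status line per step, flagging files that are missing."""
    lines = []
    for label, path in inspection_paths(workspace_root):
        status = "ok" if path.is_file() else "missing"
        lines.append(f"[{status}] {label}: {path}")
    return lines


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_paths.py path/to/Taotern
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print("\n".join(report(root)))
```

A `missing` line usually means the corresponding repo is not cloned under the workspace root or the guide has drifted from the repo layout.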