BWSK T5-small

T5-small (60M parameters) trained in six variants (3 BWSK modes Γ— 2 experiments) on WikiText-2, each trained to convergence with early stopping.

This repository consolidates all model weights, configs, and training results for the six variants.

What is BWSK?

BWSK is a framework that classifies every neural network operation as S-type (information-preserving, reversible, coordination-free) or K-type (information-erasing, synchronization point) using combinator logic. This classification enables reversible backpropagation through S-phases to save memory, and CALM-based parallelism analysis.
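As an illustration only, this classification can be sketched as a lookup over op names. The op lists and function below are hypothetical examples, not the framework's actual taxonomy or API:

```python
# Hypothetical sketch of BWSK-style S/K classification; op names are
# illustrative, not the framework's real taxonomy.
S_TYPE = {"embedding", "linear", "residual_add"}   # information-preserving
K_TYPE = {"relu", "softmax", "dropout", "argmax"}  # information-erasing

def classify(op_name: str) -> str:
    """Return 'S' for information-preserving ops, 'K' for erasing ops."""
    if op_name in S_TYPE:
        return "S"
    if op_name in K_TYPE:
        return "K"
    raise ValueError(f"unclassified op: {op_name}")

# Share of S-type ops in a toy op sequence
ops = ["embedding", "linear", "relu", "linear", "softmax"]
s_ratio = sum(classify(op) == "S" for op in ops) / len(ops)
```

In this toy sequence, 3 of 5 ops are S-type; the 70.5% figure below is the analogous ratio measured over T5-small's actual operations.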

Model Overview

| Property | Value |
|---|---|
| Base Model | google-t5/t5-small |
| Architecture | Transformer (seq2seq) |
| Parameters | 60M |
| Dataset | WikiText-2 |
| Eval Metric | Perplexity |

S/K Classification

| Type | Ratio |
|---|---|
| S-type (information-preserving) | 70.5% |
| K-type (information-erasing) | 29.5% |

Fine-tune Results

| Mode | Final Loss | Val Perplexity | Test Perplexity | Peak Memory | Time (min) | Epochs |
|---|---|---|---|---|---|---|
| Conventional | 3.6739 | 31.64 | 30.62 | 2.2 GB | 13.7 | 10 |
| BWSK Analyzed | 3.7370 | 31.62 | 30.60 | 2.2 GB | 14.7 | 10 |
| BWSK Reversible | 3.5710 | 31.64 | 30.60 | 1.4 GB | 16.2 | 10 |

Memory savings (reversible vs conventional): 36.4%
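The savings figure follows directly from the peak-memory column:

```python
# Peak memory from the table above: conventional 2.2 GB, reversible 1.4 GB
conventional_gb = 2.2
reversible_gb = 1.4
savings = (conventional_gb - reversible_gb) / conventional_gb
print(f"{savings:.1%}")  # -> 36.4%
```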

From Scratch Results

| Mode | Final Loss | Val Perplexity | Test Perplexity | Peak Memory | Time (min) | Epochs |
|---|---|---|---|---|---|---|
| Conventional | 4.7316 | 236.61 | 234.27 | 2.2 GB | 14.7 | 10 |
| BWSK Analyzed | 4.4787 | 234.60 | 232.10 | 2.2 GB | 14.8 | 10 |
| BWSK Reversible | 5.3226 | 231.85 | 230.42 | 1.4 GB | 16.2 | 10 |

Memory savings (reversible vs conventional): 36.4%

Repository Structure

```
β”œβ”€β”€ README.md
β”œβ”€β”€ results.json
β”œβ”€β”€ finetune-conventional/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ finetune-bwsk-analyzed/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ finetune-bwsk-reversible/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ scratch-conventional/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ scratch-bwsk-analyzed/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
└── scratch-bwsk-reversible/
    β”œβ”€β”€ model.safetensors
    β”œβ”€β”€ config.json
    └── training_results.json
```

Usage

Load a specific variant:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned conventional variant
model = AutoModelForSeq2SeqLM.from_pretrained(
    "tzervas/bwsk-t5-small", subfolder="finetune-conventional"
)
tokenizer = AutoTokenizer.from_pretrained(
    "tzervas/bwsk-t5-small", subfolder="finetune-conventional"
)

# Load the from-scratch BWSK reversible variant
model = AutoModelForSeq2SeqLM.from_pretrained(
    "tzervas/bwsk-t5-small", subfolder="scratch-bwsk-reversible"
)
```
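The perplexities reported above are the exponential of the mean token-level cross-entropy loss (in nats); a minimal helper for converting between the two:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp of the mean cross-entropy loss (in nats)."""
    return math.exp(mean_ce_loss)

# e.g. a val perplexity of 31.64 corresponds to a mean loss of
# ln(31.64) ~ 3.45 on the evaluation set
```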

Training Configuration

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| LR (fine-tune) | 5e-05 |
| LR (from-scratch) | 3e-04 |
| LR Schedule | Cosine with warmup |
| Max Grad Norm | 1.0 |
| Mixed Precision | AMP (float16) |
| Early Stopping Patience | 3 |
| Batch Size | 4 |
| Sequence Length | 512 |
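The cosine-with-warmup schedule can be sketched in plain Python. The warmup and total step counts below are illustrative assumptions; the card does not state them:

```python
import math

def lr_at(step: int, base_lr: float = 5e-5,
          warmup_steps: int = 100, total_steps: int = 10_000) -> float:
    """Linear warmup to base_lr, then cosine decay to zero.

    warmup_steps/total_steps are assumed values for illustration.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```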

Citation

```bibtex
@software{zervas2026bwsk,
  author = {Zervas, Tyler},
  title = {BWSK: Combinator-Typed Neural Network Analysis},
  year = {2026},
  url = {https://github.com/tzervas/ai-s-combinator},
}
```

License

MIT
