BWSK T5-small

T5-small (60M parameters) trained in six variants (3 BWSK modes Γ— 2 experiments) on WikiText-2, each trained to convergence with early stopping.

This repository consolidates all model weights, configs, and training results for the six variants.

What is BWSK?

BWSK is a framework that classifies every neural network operation as S-type (information-preserving, reversible, coordination-free) or K-type (information-erasing, synchronization point) using combinator logic. This classification enables reversible backpropagation through S-phases to save memory, and CALM-based parallelism analysis.
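As an illustration only, this classification can be sketched as a lookup over op names. The op lists and function below are hypothetical examples, not the framework's actual taxonomy or API:

```python
# Hypothetical sketch of BWSK-style S/K classification; op names are
# illustrative, not the framework's real taxonomy.
S_TYPE = {"embedding", "linear", "residual_add"}   # information-preserving
K_TYPE = {"relu", "softmax", "dropout", "argmax"}  # information-erasing

def classify(op_name: str) -> str:
    """Return 'S' for information-preserving ops, 'K' for erasing ops."""
    if op_name in S_TYPE:
        return "S"
    if op_name in K_TYPE:
        return "K"
    raise ValueError(f"unclassified op: {op_name}")

# Share of S-type ops in a toy op sequence
ops = ["embedding", "linear", "relu", "linear", "softmax"]
s_ratio = sum(classify(op) == "S" for op in ops) / len(ops)
```

In this toy sequence, 3 of 5 ops are S-type; the 70.5% figure below is the analogous ratio measured over T5-small's actual operations.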

Model Overview

| Property | Value |
|---|---|
| Base Model | google-t5/t5-small |
| Architecture | Transformer (seq2seq) |
| Parameters | 60M |
| Dataset | WikiText-2 |
| Eval Metric | Perplexity |

S/K Classification

| Type | Ratio |
|---|---|
| S-type (information-preserving) | 70.5% |
| K-type (information-erasing) | 29.5% |

Fine-tune Results

| Mode | Final Loss | Val Perplexity | Test Perplexity | Peak Memory | Time (min) | Epochs |
|---|---|---|---|---|---|---|
| Conventional | 3.6739 | 31.64 | 30.62 | 2.2 GB | 13.7 | 10 |
| BWSK Analyzed | 3.7370 | 31.62 | 30.60 | 2.2 GB | 14.7 | 10 |
| BWSK Reversible | 3.5710 | 31.64 | 30.60 | 1.4 GB | 16.2 | 10 |

Memory savings (reversible vs conventional): 36.4%
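The savings figure follows directly from the peak-memory column:

```python
# Peak memory from the table above: conventional 2.2 GB, reversible 1.4 GB
conventional_gb = 2.2
reversible_gb = 1.4
savings = (conventional_gb - reversible_gb) / conventional_gb
print(f"{savings:.1%}")  # -> 36.4%
```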

From Scratch Results

| Mode | Final Loss | Val Perplexity | Test Perplexity | Peak Memory | Time (min) | Epochs |
|---|---|---|---|---|---|---|
| Conventional | 4.7316 | 236.61 | 234.27 | 2.2 GB | 14.7 | 10 |
| BWSK Analyzed | 4.4787 | 234.60 | 232.10 | 2.2 GB | 14.8 | 10 |
| BWSK Reversible | 5.3226 | 231.85 | 230.42 | 1.4 GB | 16.2 | 10 |

Memory savings (reversible vs conventional): 36.4%

Repository Structure

```
β”œβ”€β”€ README.md
β”œβ”€β”€ results.json
β”œβ”€β”€ finetune-conventional/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ finetune-bwsk-analyzed/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ finetune-bwsk-reversible/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ scratch-conventional/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
β”œβ”€β”€ scratch-bwsk-analyzed/
β”‚   β”œβ”€β”€ model.safetensors
β”‚   β”œβ”€β”€ config.json
β”‚   └── training_results.json
└── scratch-bwsk-reversible/
    β”œβ”€β”€ model.safetensors
    β”œβ”€β”€ config.json
    └── training_results.json
```

Usage

Load a specific variant:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned conventional variant
model = AutoModelForSeq2SeqLM.from_pretrained(
    "tzervas/bwsk-t5-small", subfolder="finetune-conventional"
)
tokenizer = AutoTokenizer.from_pretrained(
    "tzervas/bwsk-t5-small", subfolder="finetune-conventional"
)

# Load the from-scratch BWSK reversible variant
model = AutoModelForSeq2SeqLM.from_pretrained(
    "tzervas/bwsk-t5-small", subfolder="scratch-bwsk-reversible"
)
```
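The perplexities reported above are the exponential of the mean token-level cross-entropy loss (in nats); a minimal helper for converting between the two:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Perplexity is exp of the mean cross-entropy loss (in nats)."""
    return math.exp(mean_ce_loss)

# e.g. a val perplexity of 31.64 corresponds to a mean loss of
# ln(31.64) ~ 3.45 on the evaluation set
```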

Training Configuration

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| LR (fine-tune) | 5e-05 |
| LR (from-scratch) | 3e-04 |
| LR Schedule | Cosine with warmup |
| Max Grad Norm | 1.0 |
| Mixed Precision | AMP (float16) |
| Early Stopping Patience | 3 |
| Batch Size | 4 |
| Sequence Length | 512 |
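The cosine-with-warmup schedule can be sketched in plain Python. The warmup and total step counts below are illustrative assumptions; the card does not state them:

```python
import math

def lr_at(step: int, base_lr: float = 5e-5,
          warmup_steps: int = 100, total_steps: int = 10_000) -> float:
    """Linear warmup to base_lr, then cosine decay to zero.

    warmup_steps/total_steps are assumed values for illustration.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```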

Citation

```bibtex
@software{zervas2026bwsk,
  author = {Zervas, Tyler},
  title = {BWSK: Combinator-Typed Neural Network Analysis},
  year = {2026},
  url = {https://github.com/tzervas/ai-s-combinator},
}
```

License

MIT
