---
license: apache-2.0
tags:
  - build-small-hackathon
  - pgsm
  - exactstate-memory
  - non-transformer
  - language-model
  - surprisal
  - fineweb-edu
  - tiny-model
  - tiny-titan
  - well-tuned
datasets:
  - HuggingFaceFW/fineweb-edu
---

# PGSM Text Surprisal Editor Model

This repository contains the trained model weights used by the Hugging Face Space:

https://huggingface.co/spaces/build-small-hackathon/pgsm-text-surprisal-editor

## Model Summary

PGSM Text Surprisal Editor is powered by a compact non-Transformer language model based on a custom ExactState Memory / PGSM architecture.

The model scores whole-word surprisal: each word is removed in turn, and the model evaluates how predictable that word is from its left and right context.
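As a rough illustration of what a whole-word surprisal score means, the sketch below uses the standard information-theoretic definition, surprisal = -log2(p), and sums sub-token surprisals to get a per-word value. This card does not specify how the model combines left and right context or how sub-tokens are aggregated, so the function names and the probabilities here are hypothetical and not taken from the Space's code.

```python
import math


def token_surprisal_bits(prob: float) -> float:
    """Surprisal of a single token in bits: -log2(p)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def word_surprisal_bits(token_probs: list[float]) -> float:
    """Whole-word surprisal as the sum of its sub-token surprisals.

    Summing assumes each probability is conditioned on the preceding
    sub-tokens of the same word (chain rule). This is an assumption,
    not the documented behavior of the PGSM model.
    """
    return sum(token_surprisal_bits(p) for p in token_probs)


# Hypothetical per-token probabilities, for illustration only.
example_words = {
    "the": [0.5],        # one token, fairly predictable
    "cat": [0.25, 0.5],  # two sub-tokens
}
scores = {word: word_surprisal_bits(probs) for word, probs in example_words.items()}

for word, bits in scores.items():
    print(f"{word}: {bits:.2f} bits")
```

Higher values mark words the model found less predictable, which is the signal a surprisal heatmap visualizes.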

## Architecture

- Architecture: PGSM / ExactState Memory
- Transformer blocks: 0
- Self-attention layers: 0
- Parameters: approximately 4 million
- Vocabulary: approximately 2k tokens
- Model file: `final_infer.pt`

This model does not use Transformer self-attention. Context is propagated through learned state transitions rather than pairwise attention computations.
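To make the contrast with attention concrete, here is a minimal sketch of a generic recurrent state update. It is not the PGSM / ExactState Memory architecture, whose internals are not described in this card. The update rule, `decay`, and `w_in` are hypothetical placeholders chosen only to show how a fixed-size state can carry context forward one token at a time.

```python
import math


def step(state: list[float], x: float, decay: list[float], w_in: list[float]) -> list[float]:
    """One generic recurrent update: tanh(decay[i] * state[i] + w_in[i] * x)."""
    return [math.tanh(d * s + w * x) for s, d, w in zip(state, decay, w_in)]


def run(inputs: list[float], decay: list[float], w_in: list[float]) -> list[float]:
    """Fold a sequence into a fixed-size state, one step per input."""
    state = [0.0] * len(decay)
    for x in inputs:
        state = step(state, x, decay, w_in)
    return state


def attention_pairs(n: int) -> int:
    """Number of (query, key) pairs in causal self-attention over n tokens."""
    return n * (n + 1) // 2


# Hypothetical parameters and inputs, for illustration only.
decay = [0.9, 0.5]
w_in = [1.0, -1.0]
final_state = run([0.2, -0.1, 0.4], decay, w_in)

print("state size:", len(final_state))               # stays fixed as the sequence grows
print("recurrent steps for 1000 tokens:", 1000)
print("causal attention pairs for 1000 tokens:", attention_pairs(1000))
```

The point of the comparison is that the recurrent form does one update per token on a state of constant size, while causal attention compares each token with every earlier one.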

## Training

The author trained the model from scratch on approximately 19 billion tokens from FineWeb-Edu; no pretrained checkpoint was used.

Training details:

- Training source: FineWeb-Edu
- Training scale: approximately 19B tokens
- Training type: full custom training by the author
- Base architecture: PGSM / ExactState Memory
- Off-the-shelf Transformer checkpoint used: none
- Final inference weights: `final_infer.pt`
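The figures above imply a very high ratio of training tokens to parameters. The short calculation below works it out from the approximate numbers in this card. The weight-size estimate assumes 32-bit floats, which is an assumption and not something this card states about `final_infer.pt`.

```python
# Approximate figures taken from this model card.
TRAIN_TOKENS = 19_000_000_000  # ~19B tokens of FineWeb-Edu
PARAMS = 4_000_000             # ~4M parameters

tokens_per_param = TRAIN_TOKENS / PARAMS

# Assumption: 4 bytes per parameter (float32). The stored dtype is not documented here.
BYTES_PER_PARAM = 4
approx_weight_mb = PARAMS * BYTES_PER_PARAM / 1_000_000

print(f"tokens per parameter: {tokens_per_param:,.0f}")
print(f"approx. weight size at float32: {approx_weight_mb:.0f} MB")
```

Both results are only as precise as the rounded inputs, so they should be read as order-of-magnitude estimates.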

## Intended Use

This model is intended for the PGSM Text Surprisal Editor Space, where it powers whole-word surprisal heatmaps for pasted text.

The model is designed for experimentation, visualization, and language-analysis demos rather than production writing assistance or factual generation.

## Limitations

- Very small model size compared with mainstream LLMs
- Compact vocabulary
- Designed for surprisal visualization, not general-purpose chat
- Outputs should be treated as model-analysis signals, not factual judgments
- Training details are only summarized here for hackathon review; this card does not include detailed evaluation results

## Hackathon Context

This model supports the Hugging Face Build Small Hackathon submission:

- Track: Thousand Token Wood
- Badges: Tiny Titan, Well-Tuned, Off the Grid, Field Notes

The key goal is to demonstrate a very small, fully trained, non-Transformer language model running locally inside a Hugging Face Space.