nilmeruo/SurpriseLensModel

nilmeruo committed 16 days ago

Commit 4e84da6 · verified · 1 Parent(s): ae33cb4

Upload README.md

Files changed (1)

README.md +73 -3

@@ -1,5 +1,75 @@
 ---
 license: apache-2.0
-language:
-- en
----
+tags:
+  - build-small-hackathon
+  - pgsm
+  - exactstate-memory
+  - non-transformer
+  - language-model
+  - surprisal
+  - fineweb-edu
+  - tiny-model
+  - tiny-titan
+  - well-tuned
+datasets:
+  - HuggingFaceFW/fineweb-edu
+---
+# PGSM Text Surprisal Editor Model
+This repository contains the trained model weights used by the Hugging Face Space:
+https://huggingface.co/spaces/build-small-hackathon/pgsm-text-surprisal-editor
+## Model Summary
+PGSM Text Surprisal Editor is powered by a compact non-Transformer language model based on a custom ExactState Memory / PGSM architecture.
+The model is used to score whole-word surprisal by evaluating how predictable each removed word is from its left and right context.
+## Architecture
+- Architecture: PGSM / ExactState Memory
+- Transformer blocks: 0
+- Self-attention layers: 0
+- Parameters: approximately 4 million
+- Vocabulary: approximately 2k tokens
+- Model file: `final_infer.pt`
+This model does not use Transformer self-attention. Context is propagated through learned state transitions rather than pairwise attention computations.
+## Training
+The model was fully trained by the author on approximately 19 billion tokens from FineWeb-Edu.
+Training details:
+- Training source: FineWeb-Edu
+- Training scale: approximately 19B tokens
+- Training type: full custom training by the author
+- Base architecture: PGSM / ExactState Memory
+- Off-the-shelf Transformer checkpoint used: none
+- Final inference weights: `final_infer.pt`
+## Intended Use
+This model is intended for the PGSM Text Surprisal Editor Space, where it powers whole-word surprisal heatmaps for pasted text.
+The model is designed for experimentation, visualization, and language-analysis demos rather than production writing assistance or factual generation.
+## Limitations
+- Very small model size compared with mainstream LLMs
+- Compact vocabulary
+- Designed for surprisal visualization, not general-purpose chat
+- Outputs should be treated as model-analysis signals, not factual judgments
+- Training details are given only in summary form, for hackathon review; no evaluation results are reported
+## Hackathon Context
+This model supports the Hugging Face Build Small Hackathon submission:
+- Track: Thousand Token Wood
+- Badges: Tiny Titan, Well-Tuned, Off the Grid, Field Notes
+The key goal is to demonstrate a very small, fully trained, non-Transformer language model running locally inside a Hugging Face Space.
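
The Model Summary in the README above says the model scores whole-word surprisal, but it does not give a formula. A minimal sketch follows, assuming the standard definition (surprisal in bits is `-log2(p)`) and assuming a word split into several vocabulary tokens is scored by summing its per-token surprisals, with each token probability conditioned on the earlier tokens of the same word. The function names and the example probabilities are hypothetical; they are not taken from the Space or from `final_infer.pt`.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def token_surprisal(prob: float) -> float:
    """Surprisal in bits of one token with model probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError(f"probability must be in (0, 1], got {prob}")
    return -math.log2(prob)


def word_surprisal(token_probs: Iterable[float]) -> float:
    """Whole-word surprisal: sum of the surprisals of the word's tokens.

    Assumption: each probability is conditioned on the word's context and
    on the earlier tokens of the same word, so the sum follows the chain rule.
    """
    return sum(token_surprisal(p) for p in token_probs)


def score_words(
    words: Sequence[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Map (word, per-token probabilities) pairs to (word, bits) pairs."""
    return [(word, word_surprisal(probs)) for word, probs in words]


if __name__ == "__main__":
    # Hypothetical probabilities, standing in for real model outputs.
    example = [("the", [0.5]), ("surprisal", [0.5, 0.25])]
    for word, bits in score_words(example):
        print(f"{word}: {bits:.2f} bits")
```

With these made-up inputs, "the" scores 1 bit and "surprisal" scores 3 bits; a heatmap would then colour each word by its score. How the actual model obtains the probabilities from left and right context is not described in the README and is left out of this sketch.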