nilmeruo committed on
Commit 4e84da6 · verified · 1 parent: ae33cb4

Upload README.md

Files changed (1):
  1. README.md +73 -3
README.md CHANGED
@@ -1,5 +1,75 @@
  ---
  license: apache-2.0
- language:
- - en
- ---

---
license: apache-2.0
tags:
- build-small-hackathon
- pgsm
- exactstate-memory
- non-transformer
- language-model
- surprisal
- fineweb-edu
- tiny-model
- tiny-titan
- well-tuned
datasets:
- HuggingFaceFW/fineweb-edu
---

# PGSM Text Surprisal Editor Model

This repository contains the trained model weights used by the Hugging Face Space:

https://huggingface.co/spaces/build-small-hackathon/pgsm-text-surprisal-editor

## Model Summary

PGSM Text Surprisal Editor is powered by a compact non-Transformer language model built on a custom ExactState Memory / PGSM architecture.

The model scores whole-word surprisal: it removes each word in turn and evaluates how predictable that word is from its left and right context.
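
Surprisal is the negative log-probability of an outcome; in base 2 it is measured in bits. The sketch below shows one way a whole-word score could be assembled from sub-word token probabilities. It assumes a hypothetical `scorer(left, word, right)` callable that returns the chain-rule natural-log probability of each of the word's tokens given both contexts, so the word's surprisal is the sum of its token surprisals. The Space's real scoring code and interface are not described in this card; `Scorer`, `word_surprisal_bits`, and `score_words` are illustrative names.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer interface (an assumption, not the Space's real API):
# (left_context, word, right_context) -> natural-log probabilities of the
# word's sub-word tokens, each conditioned on both contexts (chain rule).
Scorer = Callable[[str, str, str], Sequence[float]]


def word_surprisal_bits(token_logprobs: Sequence[float]) -> float:
    """Whole-word surprisal in bits: -log2 P(word) = -sum(ln p_i) / ln 2."""
    return -sum(token_logprobs) / math.log(2)


def score_words(words: List[str], scorer: Scorer) -> List[Tuple[str, float]]:
    """Remove each word in turn and score it against its left and right context."""
    scored = []
    for i, word in enumerate(words):
        left = " ".join(words[:i])
        right = " ".join(words[i + 1:])
        scored.append((word, word_surprisal_bits(scorer(left, word, right))))
    return scored


if __name__ == "__main__":
    # Toy scorer for illustration only: every character is treated as one
    # token with probability 0.5, so each word costs about len(word) bits.
    def toy_scorer(left: str, word: str, right: str) -> List[float]:
        return [math.log(0.5)] * len(word)

    for word, bits in score_words("the cat sat".split(), toy_scorer):
        print(f"{word}: {bits:.2f} bits")
```

Summing token log-probabilities, rather than averaging them, keeps the score a true word-level surprisal, so long words split into many tokens are not artificially discounted.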

## Architecture

- Architecture: PGSM / ExactState Memory
- Transformer blocks: 0
- Self-attention layers: 0
- Parameters: approximately 4 million
- Vocabulary: approximately 2k tokens
- Model file: `final_infer.pt`

This model does not use Transformer self-attention. Context is propagated through learned state transitions rather than pairwise attention computations.
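
The PGSM / ExactState Memory update rule itself is not documented in this card. To illustrate the general distinction, the sketch below uses a generic diagonal linear recurrence, a stand-in technique and not the author's method: each step updates a fixed-size state from the previous state and the current input, whereas causal self-attention forms one score per (query, key) pair. `state_scan`, `decay`, `gain`, and `causal_attention_score_count` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def state_scan(
    xs: Sequence[Sequence[float]],
    decay: Sequence[float],
    gain: Sequence[float],
) -> List[Vector]:
    """Generic diagonal linear recurrence: h_t = decay * h_{t-1} + gain * x_t.

    Each step reads only the previous fixed-size state and the current input,
    so total work grows linearly with sequence length and no pairwise
    (position, position) scores are ever formed.
    """
    if len(decay) != len(gain):
        raise ValueError("decay and gain must have the same dimension")
    h: Vector = [0.0] * len(decay)
    states: List[Vector] = []
    for x in xs:
        if len(x) != len(h):
            raise ValueError("input vector has the wrong dimension")
        # Elementwise update: old state is decayed, new input is mixed in.
        h = [d * h_i + g * x_i for d, g, h_i, x_i in zip(decay, gain, h, x)]
        states.append(h)
    return states


def causal_attention_score_count(seq_len: int) -> int:
    """Query-key scores one causal self-attention head computes: 1 + 2 + ... + T."""
    return seq_len * (seq_len + 1) // 2
```

For a 1,000-token input, this recurrence performs 1,000 state updates, while a single causal attention head would compute `causal_attention_score_count(1000)`, or 500,500, scores.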

## Training

The model was fully trained by the author on approximately 19 billion tokens from FineWeb-Edu.
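
For scale, dividing the approximate token count by the approximate parameter count from the Architecture section gives the number of training tokens per parameter. Both inputs are rounded, so the result is only an order-of-magnitude figure:

$$
\frac{19 \times 10^{9}\ \text{tokens}}{4 \times 10^{6}\ \text{parameters}} \approx 4{,}750\ \text{tokens per parameter}
$$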

Training details:

- Training source: FineWeb-Edu
- Training scale: approximately 19B tokens
- Training type: full custom training by the author
- Base architecture: PGSM / ExactState Memory
- Off-the-shelf Transformer checkpoint used: none
- Final inference weights: `final_infer.pt`
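
The internal layout of `final_infer.pt` is not described here. Assuming it is a standard PyTorch checkpoint saved as a (possibly nested) dictionary of tensors, a sketch like the following could load it on CPU and compare its element count with the approximately 4 million parameters listed above. `load_checkpoint` and `count_tensor_elements` are hypothetical helpers, and the count also includes any saved buffers, so treat it as an approximation.

```python
from collections.abc import Mapping
from typing import Any


def count_tensor_elements(obj: Any) -> int:
    """Recursively sum .numel() over every tensor in a nested checkpoint."""
    if callable(getattr(obj, "numel", None)):
        return int(obj.numel())
    if isinstance(obj, Mapping):
        return sum(count_tensor_elements(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_tensor_elements(v) for v in obj)
    return 0  # strings, numbers, None, and other config values hold no weights


def load_checkpoint(path: str = "final_infer.pt") -> Any:
    """Load the weights on CPU, assuming a tensors-only checkpoint layout."""
    import torch  # imported lazily so count_tensor_elements works without PyTorch

    # weights_only=True restricts unpickling to tensors and plain containers;
    # it raises if the file instead pickles a custom nn.Module object.
    return torch.load(path, map_location="cpu", weights_only=True)
```

Usage, from a directory containing the downloaded weights: `count_tensor_elements(load_checkpoint("final_infer.pt"))`.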

## Intended Use

This model is intended for the PGSM Text Surprisal Editor Space, where it powers whole-word surprisal heatmaps for pasted text.

The model is designed for experimentation, visualization, and language-analysis demos rather than production writing assistance or factual generation.
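
How the Space maps surprisal scores to heatmap colors is not specified in this card. One simple possibility, shown here as an assumption, is to min-max normalize the per-word surprisal values of a passage and bucket them into a few heat levels. Splitting words on whitespace, the default of five levels, and the names `heat_levels` and `heatmap_words` are all hypothetical choices for illustration.

```python
import re
from typing import List, Sequence, Tuple

# Assumption: a "word" is a maximal run of non-whitespace characters.
WORD_RE = re.compile(r"\S+")


def heat_levels(surprisals: Sequence[float], levels: int = 5) -> List[int]:
    """Min-max normalize surprisal values into integer levels 0..levels-1."""
    if levels < 1:
        raise ValueError("levels must be at least 1")
    if not surprisals:
        return []
    lo, hi = min(surprisals), max(surprisals)
    if hi == lo:
        return [0] * len(surprisals)  # uniform passage: nothing stands out
    # The maximum value maps to `levels`, so clamp it into the top bucket.
    return [min(levels - 1, int((s - lo) / (hi - lo) * levels)) for s in surprisals]


def heatmap_words(
    text: str, surprisals: Sequence[float], levels: int = 5
) -> List[Tuple[str, int]]:
    """Pair each whitespace-delimited word of `text` with its heat level."""
    words = WORD_RE.findall(text)
    if len(words) != len(surprisals):
        raise ValueError(f"expected {len(words)} scores, got {len(surprisals)}")
    return list(zip(words, heat_levels(surprisals, levels)))
```

For example, `heatmap_words("the cat sat", [1.0, 9.0, 5.0], levels=4)` returns `[("the", 0), ("cat", 3), ("sat", 2)]`. Normalizing per passage highlights the most surprising words in that text, at the cost of making levels incomparable across different passages.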

## Limitations

- Very small model compared with mainstream LLMs
- Compact vocabulary
- Designed for surprisal visualization, not general-purpose chat
- Outputs should be treated as model-analysis signals, not factual judgments
- Training and evaluation details are only summarized here for hackathon review

## Hackathon Context

This model supports the Hugging Face Build Small Hackathon submission:

- Track: Thousand Token Wood
- Badges: Tiny Titan, Well-Tuned, Off the Grid, Field Notes

The key goal is to demonstrate a very small, fully trained, non-Transformer language model running locally inside a Hugging Face Space.