Peter Organisciak committed on
Commit 053a22c · 1 Parent(s): 719b027

Initial deploy: OCS Semantic Scoring HF Space


- MOTES 100k model as default, auto-downloaded from HF Hub
- GloVe 840B noted as available for local/self-hosted use (too large to host)
- Single and batch CSV scoring with configurable options
- Fix Gradio 6 theme deprecation (move to launch())

Files changed (7)
  1. .gitattributes +0 -1
  2. .gitignore +9 -0
  3. README.md +85 -6
  4. app.py +270 -0
  5. assets/idf-vals.parquet +3 -0
  6. requirements.txt +9 -0
  7. scoring.py +336 -0
.gitattributes CHANGED
@@ -25,7 +25,6 @@
  *.safetensors filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,9 @@
+ __pycache__/
+ *.pyc
+ .venv/
+ *.egg-info/
+ .beads/
+ hf-models/
+ models/
+ push_to_hf.sh
+ AGENTS.md
README.md CHANGED
@@ -1,12 +1,91 @@
  ---
- title: Ocs Semantic Scoring
- emoji: 🏃
- colorFrom: indigo
- colorTo: indigo
  sdk: gradio
- sdk_version: 6.9.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: OCS Semantic Scoring
+ emoji: "\U0001F9E0"
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: "6.9.0"
  app_file: app.py
  pinned: false
+ license: mit
  ---

+ # OCS Semantic Scoring
+
+ Score the creativity of divergent thinking responses using **semantic distance** in word embedding space. The tool measures how original a response is by computing the cosine distance between the prompt and the response in a selectable embedding model. The current default is MOTES 100k; GloVe 840B support is available for local/self-hosted deployments.
+
+ This is part of the [Open Creativity Scoring](https://openscoring.du.edu) project.
+
+ ## How It Works
+
+ 1. **Word embeddings**: Each word in the prompt and response is looked up in the selected pre-trained word vectors (default: MOTES 100k children's embeddings; optional: GloVe 840B when available).
+ 2. **Cosine similarity**: The cosine similarity between the prompt vector and each response word vector is computed.
+ 3. **Distance = originality**: The score is `1 - similarity`, so higher values indicate more semantically distant (more original) responses.
+ 4. **Aggregation**: Word-level scores are averaged (optionally IDF-weighted) into a single originality score.
+
+ For example, for the prompt "brick":
+ - "doorstop" -> lower originality (semantically close to brick)
+ - "modern art sculpture" -> higher originality (semantically distant from brick)
+
+ ## Options
+
+ | Option | Description |
+ |--------|-------------|
+ | **Stopword filtering** | Skip common function words (the, and, is, etc.) |
+ | **Term weighting (IDF)** | Weight words by inverse document frequency; rarer words matter more |
+ | **Exclude target words** | Don't count words from the prompt itself in the response |
+ | **Normalize (1-7)** | Map raw scores to a 1-7 scale based on norms from Dumas, Organisciak, & Doherty (2020) |
+ | **Elaboration** | Measure response length/complexity (whitespace, stoplist, IDF, or POS-based) |
+
+ ## Programmatic API
+
+ This Space provides an API via the Gradio Client. Example usage:
+
+ ```python
+ from gradio_client import Client
+
+ client = Client("massivetexts/ocs-semantic-scoring")
+
+ # Score a single response
+ result = client.predict(
+     prompt="brick",
+     response="modern art sculpture",
+     stopword=True,
+     term_weighting=True,
+     exclude_target=True,
+     normalize=False,
+     elab_method="none",
+     api_name="/score_single"
+ )
+ print(result)
+ ```
+
+ ## LLM-Based Scoring (Recommended)
+
+ For most use cases, we recommend the newer **LLM-based scoring** approach (OCSAI), which uses fine-tuned language models trained on human creativity judgments. It provides more accurate and nuanced scoring than semantic distance.
+
+ - **Web interface**: [openscoring.du.edu](https://openscoring.du.edu)
+ - **Python library**: [github.com/massivetexts/ocsai](https://github.com/massivetexts/ocsai)
+ - **API documentation**: [openscoring.du.edu/docs](https://openscoring.du.edu/docs)
+
+ ## Models
+
+ This Space currently defaults to **MOTES 100k** word vectors, hosted on Hugging Face at [`massivetexts/motes-embeddings-100k`](https://huggingface.co/massivetexts/motes-embeddings-100k). The model is downloaded automatically on first use.
+
+ Support for **GloVe 840B 300d** is included as an option in the app for local/self-hosted deployments. Due to the 5.4 GB model size, it is not hosted on this Space. Download vectors from [Stanford NLP](https://nlp.stanford.edu/projects/glove/) and see [`massivetexts/glove-840b-gensim`](https://huggingface.co/massivetexts/glove-840b-gensim) for Gensim conversion instructions.
+
+ IDF term weights are from: Organisciak, P. (2016). *Term Frequencies for 235k Language and Literature Texts*. http://hdl.handle.net/2142/89515
+
+ ## Citations
+
+ If you use this tool in your research, please cite:
+
+ > Dumas, D., Organisciak, P., & Doherty, M. D. (2020). Measuring divergent thinking originality with human raters and text-mining models: A psychometric comparison of methods. *Psychology of Aesthetics, Creativity, and the Arts*. https://doi.org/10/ghcsqq
+
+ > Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. *Thinking Skills and Creativity*, 49, 101356.
+
+ ## Source Code
+
+ - This Space: [github.com/massivetexts/ocs-semantic-hf](https://github.com/massivetexts/ocs-semantic-hf)
+ - Original library: [github.com/massivetexts/open-scoring](https://github.com/massivetexts/open-scoring)
+ - OCSAI (LLM scoring): [github.com/massivetexts/ocsai](https://github.com/massivetexts/ocsai)
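The scoring recipe in the README above is small enough to sketch end-to-end. Below is a minimal, self-contained illustration using toy 2-d vectors (the Space itself uses 300-d MOTES/GloVe vectors via gensim); the helper names are illustrative, though `originality` mirrors the method of the same name in `scoring.py`:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def originality(prompt_vec, response_word_vecs, weights=None):
    """1 - similarity per response word, then a (optionally weighted) average."""
    distances = [1 - cosine_similarity(prompt_vec, v) for v in response_word_vecs]
    if weights is None:
        return sum(distances) / len(distances)
    return sum(d * w for d, w in zip(distances, weights)) / sum(weights)

# Toy 2-d "embeddings" (real models are 300-d)
vecs = {
    "brick": [1.0, 0.0],
    "doorstop": [0.9, 0.1],   # points in nearly the same direction as "brick"
    "sculpture": [0.1, 0.9],  # points in a distant direction
}
close = originality(vecs["brick"], [vecs["doorstop"]])
far = originality(vecs["brick"], [vecs["sculpture"]])
assert close < far  # "doorstop" scores less original than "sculpture"
```

The IDF option in the table above corresponds to passing per-word `weights`, so rare words pull the average harder than common ones.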
app.py ADDED
@@ -0,0 +1,270 @@
+ """
+ OCS Semantic Scoring - Hugging Face Space
+
+ Scores creativity of divergent thinking responses using semantic distance
+ in word embedding space. Part of the Open Creativity Scoring project.
+
+ See: https://openscoring.du.edu
+ """
+
+ import gradio as gr
+ import pandas as pd
+ import tempfile
+ import os
+ from scoring import SemanticScorer, download_model, ensure_spacy_model, MODELS, DEFAULT_MODEL
+
+ # Global scorer instances keyed by model name
+ scorers = {}
+ current_model = DEFAULT_MODEL
+
+
+ def get_scorer(model_name=None):
+     """Get or create a scorer for the given model."""
+     if model_name is None:
+         model_name = current_model
+     return scorers.get(model_name)
+
+
+ def load_model(model_name=None, progress=gr.Progress()):
+     """Download and load a model."""
+     global current_model
+     if model_name is None:
+         model_name = DEFAULT_MODEL
+
+     if model_name in scorers:
+         current_model = model_name
+         return f"{model_name} already loaded."
+
+     progress(0, desc="Ensuring spaCy model is available...")
+     ensure_spacy_model()
+
+     progress(0.1, desc=f"Downloading {model_name} from Hugging Face Hub...")
+     model_path = download_model(model_name)
+
+     progress(0.5, desc="Loading model into memory (this may take a moment)...")
+     scorer = SemanticScorer(model_name=model_name)
+     scorer.load_model(model_path)
+     scorers[model_name] = scorer
+     current_model = model_name
+
+     progress(1.0, desc="Ready!")
+     return f"{model_name} loaded successfully."
+
+
+ def score_single(prompt, response, model_name, stopword, term_weighting, exclude_target,
+                  normalize, elab_method, progress=gr.Progress()):
+     """Score a single prompt-response pair."""
+     scorer = get_scorer(model_name)
+     if scorer is None:
+         load_model(model_name, progress)
+         scorer = get_scorer(model_name)
+
+     if not prompt or not response:
+         return "Please provide both a prompt and a response."
+
+     orig = scorer.originality(
+         prompt.strip(), response.strip(),
+         stopword=stopword,
+         term_weighting=term_weighting,
+         exclude_target=exclude_target,
+     )
+
+     if orig is None:
+         result = "Could not score - no recognized words found in response."
+     else:
+         if normalize:
+             import numpy as np
+             orig = scorer._scaler.transform(np.array([[orig]]))[0, 0]
+             result = f"Originality: {orig:.1f} (on 1-7 scale)"
+         else:
+             result = f"Originality: {orig:.4f} (cosine distance, 0-1 scale)"
+
+     if elab_method and elab_method != "none":
+         elab = scorer.elaboration(response.strip(), method=elab_method)
+         result += f"\nElaboration ({elab_method}): {elab}"
+
+     return result
+
+
+ def score_batch(file, model_name, stopword, term_weighting, exclude_target, normalize,
+                 elab_method, progress=gr.Progress()):
+     """Score a CSV file of prompt-response pairs."""
+     scorer = get_scorer(model_name)
+     if scorer is None:
+         load_model(model_name, progress)
+         scorer = get_scorer(model_name)
+
+     if file is None:
+         return None, "Please upload a CSV file."
+
+     try:
+         df = pd.read_csv(file.name)
+     except Exception as e:
+         return None, f"Error reading CSV: {e}"
+
+     # Normalize column names
+     df.columns = [c.strip().lower() for c in df.columns]
+
+     if "prompt" not in df.columns or "response" not in df.columns:
+         # Try to use first two columns
+         if len(df.columns) >= 2:
+             df.columns = ["prompt", "response"] + list(df.columns[2:])
+         else:
+             return None, "CSV must have at least two columns (prompt, response)."
+
+     elab = elab_method if elab_method != "none" else None
+
+     progress(0.2, desc=f"Scoring {len(df)} responses...")
+     scored = scorer.score_batch(
+         df, stopword=stopword, term_weighting=term_weighting,
+         exclude_target=exclude_target, normalize=normalize,
+         elab_method=elab,
+     )
+     progress(0.9, desc="Preparing output...")
+
+     # Save to temp file for download
+     output_path = os.path.join(tempfile.gettempdir(), "scored_output.csv")
+     scored.to_csv(output_path, index=False)
+
+     return output_path, scored.head(20).to_string(index=False)
+
+
+ # Citation text
+ CITATION_TEXT = """
+ **Citations:**
+
+ Dumas, D., Organisciak, P., & Doherty, M. D. (2020). Measuring divergent thinking
+ originality with human raters and text-mining models: A psychometric comparison of
+ methods. *Psychology of Aesthetics, Creativity, and the Arts*.
+ https://doi.org/10/ghcsqq
+
+ Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic
+ distance: Automated scoring of divergent thinking greatly improves with large
+ language models. *Thinking Skills and Creativity*, 49, 101356.
+
+ **Note:** For LLM-based scoring (the newer, recommended approach), see
+ [openscoring.du.edu](https://openscoring.du.edu) and the
+ [ocsai library](https://github.com/massivetexts/ocsai).
+ """
+
+ ABOUT_TEXT = """
+ # OCS Semantic Scoring
+
+ Scores creativity of divergent thinking responses (e.g., Alternate Uses Task)
+ by measuring **semantic distance** between a prompt and response in word
+ embedding space.
+
+ **How it works:**
+ 1. Looks up word vectors for the prompt and response in the selected embedding model
+ 2. Computes cosine similarity between them
+ 3. Subtracts from 1 to get a distance score (higher = more original)
+
+ **Available models:**
+ - **MOTES 100k** (default): Children's writing embeddings (ages 10-12) from the MOTES study
+ - **GloVe 840B** (available for local use): General-purpose embeddings from Common Crawl (Pennington et al. 2014). Due to the 5.4 GB model size, GloVe is not hosted on this Space. For self-hosted deployments, download vectors from [Stanford NLP](https://nlp.stanford.edu/projects/glove/) and see [massivetexts/glove-840b-gensim](https://huggingface.co/massivetexts/glove-840b-gensim) for Gensim conversion instructions.
+
+ **Options:**
+ - **Stopword filtering**: Skip common function words (the, and, etc.)
+ - **Term weighting**: Weight words by IDF (rarer words matter more)
+ - **Exclude target**: Don't count prompt words in the response
+ - **Normalize**: Map scores to a 1-7 scale (model-specific calibration)
+ - **Elaboration**: Measure response length/complexity
+ """
+
+
+ # Build UI
+ with gr.Blocks(title="OCS Semantic Scoring") as demo:
+     gr.Markdown("# OCS Semantic Scoring")
+     gr.Markdown("Score divergent thinking originality using semantic distance in word embedding space.")
+
+     # Model choices for dropdowns
+     model_choices = [(MODELS[k]["description"], k) for k in MODELS]
+
+     # Load model controls
+     with gr.Row():
+         model_selector = gr.Dropdown(
+             label="Model",
+             choices=model_choices,
+             value=DEFAULT_MODEL,
+         )
+         load_btn = gr.Button("Load Model", variant="primary")
+         load_status = gr.Textbox(label="Model Status", value="Model not loaded yet. Click 'Load Model' or score something to auto-load.", interactive=False)
+     load_btn.click(fn=load_model, inputs=model_selector, outputs=load_status)
+
+     with gr.Tabs():
+         with gr.TabItem("Single Score"):
+             with gr.Row():
+                 with gr.Column():
+                     prompt_input = gr.Textbox(label="Prompt (object)", placeholder="e.g., brick", lines=1)
+                     response_input = gr.Textbox(label="Response", placeholder="e.g., modern art sculpture", lines=2)
+
+                     with gr.Row():
+                         stopword = gr.Checkbox(label="Stopword filtering", value=True)
+                         term_weight = gr.Checkbox(label="Term weighting (IDF)", value=True)
+                         exclude_tgt = gr.Checkbox(label="Exclude target words", value=True)
+                         norm = gr.Checkbox(label="Normalize (1-7)", value=False)
+
+                     elab = gr.Dropdown(
+                         label="Elaboration method",
+                         choices=["none", "whitespace", "stoplist", "idf", "pos"],
+                         value="none",
+                     )
+                     score_btn = gr.Button("Score", variant="primary")
+
+                 with gr.Column():
+                     result_output = gr.Textbox(label="Result", lines=4, interactive=False)
+
+             score_btn.click(
+                 fn=score_single,
+                 inputs=[prompt_input, response_input, model_selector, stopword, term_weight, exclude_tgt, norm, elab],
+                 outputs=result_output,
+             )
+
+             gr.Examples(
+                 examples=[
+                     ["brick", "doorstop"],
+                     ["brick", "modern art sculpture displayed in a gallery"],
+                     ["paperclip", "emergency lockpick for escaping a submarine"],
+                     ["shoe", "flower pot for a tiny cactus"],
+                 ],
+                 inputs=[prompt_input, response_input],
+             )
+
+         with gr.TabItem("Batch Score (CSV)"):
+             gr.Markdown(
+                 "Upload a CSV with `prompt` and `response` columns. "
+                 "If no headers, the first two columns are used."
+             )
+             with gr.Row():
+                 with gr.Column():
+                     file_input = gr.File(label="Upload CSV", file_types=[".csv"])
+
+                     with gr.Row():
+                         b_stopword = gr.Checkbox(label="Stopword filtering", value=True)
+                         b_term_weight = gr.Checkbox(label="Term weighting (IDF)", value=True)
+                         b_exclude_tgt = gr.Checkbox(label="Exclude target words", value=True)
+                         b_norm = gr.Checkbox(label="Normalize (1-7)", value=False)
+
+                     b_elab = gr.Dropdown(
+                         label="Elaboration method",
+                         choices=["none", "whitespace", "stoplist", "idf", "pos"],
+                         value="none",
+                     )
+                     batch_btn = gr.Button("Score File", variant="primary")
+
+                 with gr.Column():
+                     file_output = gr.File(label="Download scored CSV")
+                     preview = gr.Textbox(label="Preview (first 20 rows)", lines=10, interactive=False)
+
+             batch_btn.click(
+                 fn=score_batch,
+                 inputs=[file_input, model_selector, b_stopword, b_term_weight, b_exclude_tgt, b_norm, b_elab],
+                 outputs=[file_output, preview],
+             )
+
+         with gr.TabItem("About"):
+             gr.Markdown(ABOUT_TEXT)
+             gr.Markdown(CITATION_TEXT)
+
+ if __name__ == "__main__":
+     demo.launch(theme=gr.themes.Soft())
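One detail of `score_batch` above worth calling out: uploaded CSVs don't need exact headers. Column names are trimmed and lowercased, and if `prompt`/`response` aren't found, the first two columns are assumed to be prompt and response. A standalone sketch of that fallback (`normalize_columns` is an illustrative helper, not part of the app):

```python
def normalize_columns(columns):
    """Mimic the app's header fallback: lowercase/trim names; if 'prompt' and
    'response' are missing, treat the first two columns as prompt/response."""
    cols = [c.strip().lower() for c in columns]
    if "prompt" not in cols or "response" not in cols:
        if len(cols) < 2:
            raise ValueError("CSV must have at least two columns (prompt, response).")
        cols = ["prompt", "response"] + cols[2:]
    return cols

print(normalize_columns([" Prompt ", "Response", "id"]))  # ['prompt', 'response', 'id']
print(normalize_columns(["item", "answer", "id"]))        # ['prompt', 'response', 'id']
```

In the app itself this logic runs on `df.columns` of the uploaded pandas DataFrame; extra columns beyond the first two are passed through untouched.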
assets/idf-vals.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e3d88e039748b57097a1f3f246433d5b5196e2935181a1f2360c1b9273077ec
+ size 57465430
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ gradio>=5.0
+ gensim>=4.0
+ spacy>=3.7.2
+ numpy
+ pandas
+ scikit-learn
+ inflect
+ huggingface_hub
+ pyarrow
scoring.py ADDED
@@ -0,0 +1,336 @@
+ """
+ Semantic distance scoring for creativity research.
+
+ Ported from the open-creativity-scoring library (https://github.com/massivetexts/open-scoring).
+ Computes originality scores by measuring cosine distance between word embeddings
+ of a prompt and response in embedding space.
+ """
+
+ import os
+ import subprocess
+ import logging
+
+ import numpy as np
+ import pandas as pd
+ from gensim.models import KeyedVectors
+ from sklearn.preprocessing import MinMaxScaler
+ from huggingface_hub import hf_hub_download
+
+ logger = logging.getLogger(__name__)
+
+ # Available models with their HF repos and scaling parameters
+ MODELS = {
+     "glove_840B": {
+         "repo": "massivetexts/glove-840b-gensim",
+         "files": ["glove.840B-300d.wv", "glove.840B-300d.wv.vectors.npy"],
+         "main_file": "glove.840B-300d.wv",
+         "description": "GloVe 840B 300d (Pennington et al. 2014) - general-purpose, large vocabulary",
+         "scaling": {"min": 0.6456, "max": 0.9610},
+     },
+     "motes_100k": {
+         "repo": "massivetexts/motes-embeddings-100k",
+         "files": ["all_weighted_10-12_100k.kv", "all_weighted_10-12_100k.kv.vectors.npy"],
+         "main_file": "all_weighted_10-12_100k.kv",
+         "description": "MOTES children's embeddings (ages 10-12, 100k vocab)",
+         "scaling": {"min": 0.5033, "max": 0.8955},
+     },
+ }
+
+ DEFAULT_MODEL = "motes_100k"
+
+ # Default scaling (used when no model-specific scaling is set)
+ DEFAULT_SCALING = MODELS[DEFAULT_MODEL]["scaling"]
+
+ # Path to IDF values
+ IDF_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "idf-vals.parquet")
+
+
+ def ensure_spacy_model():
+     """Download spaCy en_core_web_sm if not already installed."""
+     try:
+         import spacy
+         spacy.load("en_core_web_sm")
+     except OSError:
+         subprocess.run(
+             ["python", "-m", "spacy", "download", "en_core_web_sm"],
+             check=True,
+             capture_output=True,
+         )
+
+
+ def download_model(model_name=None, progress_callback=None):
+     """Download model files from Hugging Face Hub. Returns path to main .wv/.kv file.
+
+     Args:
+         model_name: Key from MODELS dict (e.g., 'glove_840B', 'motes_100k').
+             Defaults to DEFAULT_MODEL.
+         progress_callback: Optional callback(progress, message) for UI updates.
+     """
+     if model_name is None:
+         model_name = DEFAULT_MODEL
+
+     if model_name not in MODELS:
+         raise ValueError(f"Unknown model: {model_name}. Available: {list(MODELS.keys())}")
+
+     model_info = MODELS[model_name]
+
+     if progress_callback:
+         progress_callback(0, f"Downloading {model_name} from Hugging Face Hub...")
+
+     paths = {}
+     for i, filename in enumerate(model_info["files"]):
+         path = hf_hub_download(
+             repo_id=model_info["repo"],
+             filename=filename,
+             repo_type="model",
+         )
+         paths[filename] = path
+         if progress_callback:
+             progress_callback((i + 1) / len(model_info["files"]), f"Downloaded {filename}")
+
+     return paths[model_info["main_file"]]
+
+
+ class SemanticScorer:
+     """Scores originality of divergent thinking responses using semantic distance.
+
+     Measures cosine similarity between word embeddings of the prompt object
+     and the response, then subtracts from 1 to get a distance score.
+     Higher scores = more original (more distant in semantic space).
+     """
+
+     def __init__(self, model_name=None):
+         self._model = None
+         self._idf_ref = None
+         self._default_idf = None
+         self._nlp = None
+         self._inflect_engine = None
+         self._scaler = None
+         self._model_name = model_name or DEFAULT_MODEL
+
+         # Set up normalization scaler using model-specific scaling
+         scaling = MODELS.get(self._model_name, MODELS[DEFAULT_MODEL])["scaling"]
+         self._scaler = MinMaxScaler(feature_range=(1.0, 7.0), clip=True)
+         self._scaler.fit(np.array([[scaling["min"]], [scaling["max"]]]))
+
+     def _ensure_nlp(self):
+         """Lazy-load spaCy model."""
+         if self._nlp is None:
+             import spacy
+             import inflect
+             ensure_spacy_model()
+             self._nlp = spacy.load("en_core_web_sm")
+             self._inflect_engine = inflect.engine()
+
+     @property
+     def nlp(self):
+         self._ensure_nlp()
+         return self._nlp
+
+     @property
+     def p(self):
+         self._ensure_nlp()
+         return self._inflect_engine
+
+     @property
+     def idf(self):
+         """Load IDF scores from parquet file.
+
+         Uses page-level scores from:
+         Organisciak, P. 2016. Term Frequencies for 235k Language and Literature Texts.
+         http://hdl.handle.net/2142/89515.
+         """
+         if self._idf_ref is None:
+             idf_df = pd.read_parquet(IDF_PATH)
+             self._idf_ref = idf_df["IPF"].to_dict()
+             self._default_idf = idf_df.iloc[10000]["IPF"]
+         return self._idf_ref
+
+     @property
+     def default_idf(self):
+         if self._default_idf is None:
+             _ = self.idf  # triggers load
+         return self._default_idf
+
+     def load_model(self, model_path, mmap="r"):
+         """Load a gensim KeyedVectors model."""
+         self._model = KeyedVectors.load(model_path, mmap=mmap)
+
+     def _get_phrase_vecs(self, phrase, stopword=False, term_weighting=False, exclude=None):
+         """Return stacked array of model vectors for words in phrase.
+
+         Args:
+             phrase: Text string or spaCy Doc
+             stopword: If True, skip stopwords
+             term_weighting: If True, compute IDF weights
+             exclude: List of words to skip (lowercased)
+
+         Returns:
+             Tuple of (vectors array, weights list)
+         """
+         import spacy
+
+         if exclude is None:
+             exclude = []
+
+         arrlist = []
+         weights = []
+
+         if not isinstance(phrase, spacy.tokens.doc.Doc):
+             phrase = self.nlp(phrase[: self.nlp.max_length], disable=["parser", "ner", "lemmatizer"])
+
+         exclude_lower = [x.lower() for x in exclude]
+         for word in phrase:
+             if stopword and word.is_stop:
+                 continue
+             elif word.lower_ in exclude_lower:
+                 continue
+             else:
+                 try:
+                     vec = self._model[word.lower_]
+                     arrlist.append(vec)
+                 except KeyError:
+                     continue
+
+                 if term_weighting:
+                     weight = self.idf.get(word.lower_, self.default_idf)
+                     weights.append(weight)
+
+         if len(arrlist):
+             vecs = np.vstack(arrlist)
+             return vecs, weights
+         else:
+             return [], []
+
+     def originality(self, target, response, stopword=False, term_weighting=False,
+                     flip=True, exclude_target=False):
+         """Score originality as semantic distance between target prompt and response.
+
+         Args:
+             target: The prompt/object (e.g., "brick")
+             response: The creative response (e.g., "modern art sculpture")
+             stopword: Remove stopwords before scoring
+             term_weighting: Weight words by IDF
+             flip: If True, return 1 - similarity (higher = more original)
+             exclude_target: If True, exclude prompt words from response
+
+         Returns:
+             Float originality score, or None if scoring fails
+         """
+         if self._model is None:
+             raise RuntimeError("No model loaded. Call load_model() first.")
+
+         exclude_words = []
+         if exclude_target:
+             exclude_words = target.split()
+             for word in list(exclude_words):
+                 try:
+                     sense = self.p.plural(word.lower())
+                     if isinstance(sense, str) and len(sense) and sense not in exclude_words:
+                         exclude_words.append(sense)
+                 except Exception:
+                     pass
+
+         vecs, weights = self._get_phrase_vecs(
+             response, stopword, term_weighting, exclude=exclude_words
+         )
+
+         if len(vecs) == 0:
+             return None
+
+         if " " in target:
+             target_vecs = self._get_phrase_vecs(target, stopword, term_weighting)[0]
+             if len(target_vecs) == 0:
+                 return None
+             targetvec = target_vecs.sum(0)
+         else:
+             try:
+                 targetvec = self._model[target.lower()]
+             except KeyError:
+                 return None
+
+         scores = self._model.cosine_similarities(targetvec, vecs)
+
+         if len(scores) and not term_weighting:
+             s = np.mean(scores)
+         elif len(scores):
+             s = np.average(scores, weights=weights)
+         else:
+             return None
+
+         if flip:
+             s = 1 - s
+         return float(s)
+
+     def elaboration(self, phrase, method="whitespace"):
+         """Score elaboration (response length/complexity).
+
+         Args:
+             phrase: The response text
+             method: One of 'whitespace', 'stoplist', 'idf', 'pos'
+
+         Returns:
+             Numeric elaboration score
+         """
+         if method == "whitespace":
+             return len(phrase.split())
+
+         doc = self.nlp(phrase[: self.nlp.max_length], disable=["parser", "ner", "lemmatizer"])
+
+         if method == "stoplist":
+             return len([w for w in doc if not (w.is_stop or w.is_punct)])
+         elif method == "idf":
+             weights = []
+             for word in doc:
+                 if word.is_punct:
+                     continue
+                 weights.append(self.idf.get(word.lower_, self.default_idf))
+             return sum(weights)
+         elif method == "pos":
+             doc = self.nlp(phrase[: self.nlp.max_length], disable=["ner", "lemmatizer"])
+             return len([w for w in doc if w.pos_ in ["NOUN", "VERB", "ADJ", "ADV", "PROPN"] and not w.is_punct])
+         else:
+             raise ValueError(f"Unknown elaboration method: {method}")
+
+     def score_batch(self, df, stopword=False, term_weighting=False,
+                     exclude_target=False, normalize=False, elab_method=None):
+         """Score a DataFrame of prompt-response pairs.
+
+         Args:
+             df: DataFrame with 'prompt' and 'response' columns
+             stopword: Remove stopwords
+             term_weighting: Weight by IDF
+             exclude_target: Exclude prompt words from response
+             normalize: Scale to 1-7 range
+             elab_method: Elaboration method or None
+
+         Returns:
+             DataFrame with 'originality' (and optionally 'elaboration') columns added
+         """
+         df = df.copy()
+         df["originality"] = df.apply(
+             lambda x: self.originality(
+                 x["prompt"], x["response"],
+                 stopword=stopword,
+                 term_weighting=term_weighting,
+                 exclude_target=exclude_target,
+             ),
+             axis=1,
+         )
+
+         if normalize:
+             valid_mask = df["originality"].notna()
+             if valid_mask.any():
+                 df.loc[valid_mask, "originality"] = self._scaler.transform(
+                     df.loc[valid_mask, "originality"].values.reshape(-1, 1)
+                 )[:, 0]
+             df["originality"] = df["originality"].round(1)
+         else:
+             df["originality"] = df["originality"].round(4)
+
+         if elab_method and elab_method != "none":
+             df["elaboration"] = df["response"].apply(
+                 lambda x: self.elaboration(x, method=elab_method)
+             )
+
+         return df
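The `normalize` option in `scoring.py` relies on a `MinMaxScaler(feature_range=(1.0, 7.0), clip=True)` fitted to each model's `scaling` min/max. For the default `motes_100k` model, that calibration reduces to a simple linear map with clipping, sketched here without scikit-learn (`to_1_7` is an illustrative name, not part of the library):

```python
RAW_MIN, RAW_MAX = 0.5033, 0.8955  # "scaling" values for motes_100k in MODELS

def to_1_7(raw):
    """Linearly map a raw cosine-distance score onto [1, 7], clipping
    values that fall outside the calibration range."""
    scaled = 1.0 + 6.0 * (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return min(7.0, max(1.0, scaled))

print(to_1_7(RAW_MIN))  # 1.0
print(to_1_7(RAW_MAX))  # 7.0
print(to_1_7(0.2))      # 1.0 (clipped: below the calibration range)
```

The `clip=True` behavior matters in practice: raw distances outside the calibrated min/max (possible for unusual prompts or vocabularies) are pinned to the ends of the 1-7 scale rather than extrapolated.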