Peter Organisciak committed on
Commit 053a22c · 1 Parent(s): 719b027

Initial deploy: OCS Semantic Scoring HF Space


- MOTES 100k model as default, auto-downloaded from HF Hub
- GloVe 840B noted as available for local/self-hosted use (too large to host)
- Single and batch CSV scoring with configurable options
- Fix Gradio 6 theme deprecation (move to launch())

Files changed (7)
  1. .gitattributes +0 -1
  2. .gitignore +9 -0
  3. README.md +85 -6
  4. app.py +270 -0
  5. assets/idf-vals.parquet +3 -0
  6. requirements.txt +9 -0
  7. scoring.py +336 -0
.gitattributes CHANGED
@@ -25,7 +25,6 @@
  *.safetensors filter=lfs diff=lfs merge=lfs -text
  saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
  *.tgz filter=lfs diff=lfs merge=lfs -text
  *.wasm filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,9 @@
+ __pycache__/
+ *.pyc
+ .venv/
+ *.egg-info/
+ .beads/
+ hf-models/
+ models/
+ push_to_hf.sh
+ AGENTS.md
README.md CHANGED
@@ -1,12 +1,91 @@
  ---
- title: Ocs Semantic Scoring
- emoji: 🏃
- colorFrom: indigo
- colorTo: indigo
  sdk: gradio
- sdk_version: 6.9.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: OCS Semantic Scoring
+ emoji: "\U0001F9E0"
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: "6.9.0"
  app_file: app.py
  pinned: false
+ license: mit
  ---

+ # OCS Semantic Scoring
+
+ Score the creativity of divergent thinking responses using **semantic distance** in word embedding space. The tool measures how original a response is by computing the cosine distance between the prompt and the response in a selectable embedding model. The current default is MOTES 100k; GloVe 840B support is available for local/self-hosted deployments.
+
+ This is part of the [Open Creativity Scoring](https://openscoring.du.edu) project.
+
+ ## How It Works
+
+ 1. **Word embeddings**: Each word in the prompt and response is looked up in the selected pre-trained word vectors (default: MOTES 100k children's embeddings; optional: GloVe 840B when available).
+ 2. **Cosine similarity**: The cosine similarity between the prompt vector and each response word vector is computed.
+ 3. **Distance = originality**: The score is `1 - similarity`, so higher values indicate more semantically distant (more original) responses.
+ 4. **Aggregation**: Word-level scores are averaged (optionally IDF-weighted) into a single originality score.
+
+ For example, for the prompt "brick":
+ - "doorstop" -> lower originality (semantically close to brick)
+ - "modern art sculpture" -> higher originality (semantically distant from brick)
+
+ ## Options
+
+ | Option | Description |
+ |--------|-------------|
+ | **Stopword filtering** | Skip common function words (the, and, is, etc.) |
+ | **Term weighting (IDF)** | Weight words by inverse document frequency; rarer words matter more |
+ | **Exclude target words** | Don't count words from the prompt itself in the response |
+ | **Normalize (1-7)** | Map raw scores to a 1-7 scale based on norms from Dumas, Organisciak, & Doherty (2020) |
+ | **Elaboration** | Measure response length/complexity (whitespace, stoplist, IDF, or POS-based) |
+
+ ## Programmatic API
+
+ This Space provides an API via the Gradio Client. Example usage:
+
+ ```python
+ from gradio_client import Client
+
+ client = Client("massivetexts/ocs-semantic-scoring")
+
+ # Score a single response
+ result = client.predict(
+     prompt="brick",
+     response="modern art sculpture",
+     stopword=True,
+     term_weighting=True,
+     exclude_target=True,
+     normalize=False,
+     elab_method="none",
+     api_name="/score_single"
+ )
+ print(result)
+ ```
+
+ ## LLM-Based Scoring (Recommended)
+
+ For most use cases, we recommend the newer **LLM-based scoring** approach (OCSAI), which uses fine-tuned language models trained on human creativity judgments. It provides more accurate and nuanced scoring than semantic distance.
+
+ - **Web interface**: [openscoring.du.edu](https://openscoring.du.edu)
+ - **Python library**: [github.com/massivetexts/ocsai](https://github.com/massivetexts/ocsai)
+ - **API documentation**: [openscoring.du.edu/docs](https://openscoring.du.edu/docs)
+
+ ## Models
+
+ This Space currently defaults to **MOTES 100k** word vectors, hosted on Hugging Face at [`massivetexts/motes-embeddings-100k`](https://huggingface.co/massivetexts/motes-embeddings-100k). The model is downloaded automatically on first use.
+
+ Support for **GloVe 840B 300d** is included as an option in the app for local/self-hosted deployments. Due to the 5.4 GB model size, it is not hosted on this Space. Download vectors from [Stanford NLP](https://nlp.stanford.edu/projects/glove/) and see [`massivetexts/glove-840b-gensim`](https://huggingface.co/massivetexts/glove-840b-gensim) for Gensim conversion instructions.
+
+ IDF term weights are from: Organisciak, P. (2016). *Term Frequencies for 235k Language and Literature Texts*. http://hdl.handle.net/2142/89515
+
+ ## Citations
+
+ If you use this tool in your research, please cite:
+
+ > Dumas, D., Organisciak, P., & Doherty, M. D. (2020). Measuring divergent thinking originality with human raters and text-mining models: A psychometric comparison of methods. *Psychology of Aesthetics, Creativity, and the Arts*. https://doi.org/10/ghcsqq
+
+ > Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. *Thinking Skills and Creativity*, 49, 101356.
+
+ ## Source Code
+
+ - This Space: [github.com/massivetexts/ocs-semantic-hf](https://github.com/massivetexts/ocs-semantic-hf)
+ - Original library: [github.com/massivetexts/open-scoring](https://github.com/massivetexts/open-scoring)
+ - OCSAI (LLM scoring): [github.com/massivetexts/ocsai](https://github.com/massivetexts/ocsai)
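The scoring recipe in the README above is small enough to sketch end-to-end. Below is a minimal, self-contained illustration using toy 2-d vectors (the Space itself uses 300-d MOTES/GloVe vectors via gensim); the helper names are illustrative, though `originality` mirrors the method of the same name in `scoring.py`:

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def originality(prompt_vec, response_word_vecs, weights=None):
    """1 - similarity per response word, then a (optionally weighted) average."""
    distances = [1 - cosine_similarity(prompt_vec, v) for v in response_word_vecs]
    if weights is None:
        return sum(distances) / len(distances)
    return sum(d * w for d, w in zip(distances, weights)) / sum(weights)

# Toy 2-d "embeddings" (real models are 300-d)
vecs = {
    "brick": [1.0, 0.0],
    "doorstop": [0.9, 0.1],   # points in nearly the same direction as "brick"
    "sculpture": [0.1, 0.9],  # points in a distant direction
}
close = originality(vecs["brick"], [vecs["doorstop"]])
far = originality(vecs["brick"], [vecs["sculpture"]])
assert close < far  # "doorstop" scores less original than "sculpture"
```

The IDF option in the table above corresponds to passing per-word `weights`, so rare words pull the average harder than common ones.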
app.py ADDED
@@ -0,0 +1,270 @@
+ """
+ OCS Semantic Scoring - Hugging Face Space
+
+ Scores creativity of divergent thinking responses using semantic distance
+ in word embedding space. Part of the Open Creativity Scoring project.
+
+ See: https://openscoring.du.edu
+ """
+
+ import gradio as gr
+ import pandas as pd
+ import tempfile
+ import os
+ from scoring import SemanticScorer, download_model, ensure_spacy_model, MODELS, DEFAULT_MODEL
+
+ # Global scorer instances keyed by model name
+ scorers = {}
+ current_model = DEFAULT_MODEL
+
+
+ def get_scorer(model_name=None):
+     """Get or create a scorer for the given model."""
+     if model_name is None:
+         model_name = current_model
+     return scorers.get(model_name)
+
+
+ def load_model(model_name=None, progress=gr.Progress()):
+     """Download and load a model."""
+     global current_model
+     if model_name is None:
+         model_name = DEFAULT_MODEL
+
+     if model_name in scorers:
+         current_model = model_name
+         return f"{model_name} already loaded."
+
+     progress(0, desc="Ensuring spaCy model is available...")
+     ensure_spacy_model()
+
+     progress(0.1, desc=f"Downloading {model_name} from Hugging Face Hub...")
+     model_path = download_model(model_name)
+
+     progress(0.5, desc="Loading model into memory (this may take a moment)...")
+     scorer = SemanticScorer(model_name=model_name)
+     scorer.load_model(model_path)
+     scorers[model_name] = scorer
+     current_model = model_name
+
+     progress(1.0, desc="Ready!")
+     return f"{model_name} loaded successfully."
+
+
+ def score_single(prompt, response, model_name, stopword, term_weighting, exclude_target,
+                  normalize, elab_method, progress=gr.Progress()):
+     """Score a single prompt-response pair."""
+     scorer = get_scorer(model_name)
+     if scorer is None:
+         load_model(model_name, progress)
+         scorer = get_scorer(model_name)
+
+     if not prompt or not response:
+         return "Please provide both a prompt and a response."
+
+     orig = scorer.originality(
+         prompt.strip(), response.strip(),
+         stopword=stopword,
+         term_weighting=term_weighting,
+         exclude_target=exclude_target,
+     )
+
+     if orig is None:
+         result = "Could not score - no recognized words found in response."
+     else:
+         if normalize:
+             import numpy as np
+             orig = scorer._scaler.transform(np.array([[orig]]))[0, 0]
+             result = f"Originality: {orig:.1f} (on 1-7 scale)"
+         else:
+             result = f"Originality: {orig:.4f} (cosine distance, 0-1 scale)"
+
+     if elab_method and elab_method != "none":
+         elab = scorer.elaboration(response.strip(), method=elab_method)
+         result += f"\nElaboration ({elab_method}): {elab}"
+
+     return result
+
+
+ def score_batch(file, model_name, stopword, term_weighting, exclude_target, normalize,
+                 elab_method, progress=gr.Progress()):
+     """Score a CSV file of prompt-response pairs."""
+     scorer = get_scorer(model_name)
+     if scorer is None:
+         load_model(model_name, progress)
+         scorer = get_scorer(model_name)
+
+     if file is None:
+         return None, "Please upload a CSV file."
+
+     try:
+         df = pd.read_csv(file.name)
+     except Exception as e:
+         return None, f"Error reading CSV: {e}"
+
+     # Normalize column names
+     df.columns = [c.strip().lower() for c in df.columns]
+
+     if "prompt" not in df.columns or "response" not in df.columns:
+         # Try to use first two columns
+         if len(df.columns) >= 2:
+             df.columns = ["prompt", "response"] + list(df.columns[2:])
+         else:
+             return None, "CSV must have at least two columns (prompt, response)."
+
+     elab = elab_method if elab_method != "none" else None
+
+     progress(0.2, desc=f"Scoring {len(df)} responses...")
+     scored = scorer.score_batch(
+         df, stopword=stopword, term_weighting=term_weighting,
+         exclude_target=exclude_target, normalize=normalize,
+         elab_method=elab,
+     )
+     progress(0.9, desc="Preparing output...")
+
+     # Save to temp file for download
+     output_path = os.path.join(tempfile.gettempdir(), "scored_output.csv")
+     scored.to_csv(output_path, index=False)
+
+     return output_path, scored.head(20).to_string(index=False)
+
+
+ # Citation text
+ CITATION_TEXT = """
+ **Citations:**
+
+ Dumas, D., Organisciak, P., & Doherty, M. D. (2020). Measuring divergent thinking
+ originality with human raters and text-mining models: A psychometric comparison of
+ methods. *Psychology of Aesthetics, Creativity, and the Arts*.
+ https://doi.org/10/ghcsqq
+
+ Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic
+ distance: Automated scoring of divergent thinking greatly improves with large
+ language models. *Thinking Skills and Creativity*, 49, 101356.
+
+ **Note:** For LLM-based scoring (the newer, recommended approach), see
+ [openscoring.du.edu](https://openscoring.du.edu) and the
+ [ocsai library](https://github.com/massivetexts/ocsai).
+ """
+
+ ABOUT_TEXT = """
+ # OCS Semantic Scoring
+
+ Scores creativity of divergent thinking responses (e.g., Alternate Uses Task)
+ by measuring **semantic distance** between a prompt and response in word
+ embedding space.
+
+ **How it works:**
+ 1. Looks up word vectors for the prompt and response in the selected embedding model
+ 2. Computes cosine similarity between them
+ 3. Subtracts from 1 to get a distance score (higher = more original)
+
+ **Available models:**
+ - **MOTES 100k** (default): Children's writing embeddings (ages 10-12) from the MOTES study
+ - **GloVe 840B** (available for local use): General-purpose embeddings from Common Crawl (Pennington et al. 2014). Due to the 5.4 GB model size, GloVe is not hosted on this Space. For self-hosted deployments, download vectors from [Stanford NLP](https://nlp.stanford.edu/projects/glove/) and see [massivetexts/glove-840b-gensim](https://huggingface.co/massivetexts/glove-840b-gensim) for Gensim conversion instructions.
+
+ **Options:**
+ - **Stopword filtering**: Skip common function words (the, and, etc.)
+ - **Term weighting**: Weight words by IDF (rarer words matter more)
+ - **Exclude target**: Don't count prompt words in the response
+ - **Normalize**: Map scores to a 1-7 scale (model-specific calibration)
+ - **Elaboration**: Measure response length/complexity
+ """
+
+
+ # Build UI
+ with gr.Blocks(title="OCS Semantic Scoring") as demo:
+     gr.Markdown("# OCS Semantic Scoring")
+     gr.Markdown("Score divergent thinking originality using semantic distance in word embedding space.")
+
+     # Model choices for dropdowns
+     model_choices = [(MODELS[k]["description"], k) for k in MODELS]
+
+     # Load model controls
+     with gr.Row():
+         model_selector = gr.Dropdown(
+             label="Model",
+             choices=model_choices,
+             value=DEFAULT_MODEL,
+         )
+         load_btn = gr.Button("Load Model", variant="primary")
+         load_status = gr.Textbox(label="Model Status", value="Model not loaded yet. Click 'Load Model' or score something to auto-load.", interactive=False)
+     load_btn.click(fn=load_model, inputs=model_selector, outputs=load_status)
+
+     with gr.Tabs():
+         with gr.TabItem("Single Score"):
+             with gr.Row():
+                 with gr.Column():
+                     prompt_input = gr.Textbox(label="Prompt (object)", placeholder="e.g., brick", lines=1)
+                     response_input = gr.Textbox(label="Response", placeholder="e.g., modern art sculpture", lines=2)
+
+                     with gr.Row():
+                         stopword = gr.Checkbox(label="Stopword filtering", value=True)
+                         term_weight = gr.Checkbox(label="Term weighting (IDF)", value=True)
+                         exclude_tgt = gr.Checkbox(label="Exclude target words", value=True)
+                         norm = gr.Checkbox(label="Normalize (1-7)", value=False)
+
+                     elab = gr.Dropdown(
+                         label="Elaboration method",
+                         choices=["none", "whitespace", "stoplist", "idf", "pos"],
+                         value="none",
+                     )
+                     score_btn = gr.Button("Score", variant="primary")
+
+                 with gr.Column():
+                     result_output = gr.Textbox(label="Result", lines=4, interactive=False)
+
+             score_btn.click(
+                 fn=score_single,
+                 inputs=[prompt_input, response_input, model_selector, stopword, term_weight, exclude_tgt, norm, elab],
+                 outputs=result_output,
+             )
+
+             gr.Examples(
+                 examples=[
+                     ["brick", "doorstop"],
+                     ["brick", "modern art sculpture displayed in a gallery"],
+                     ["paperclip", "emergency lockpick for escaping a submarine"],
+                     ["shoe", "flower pot for a tiny cactus"],
+                 ],
+                 inputs=[prompt_input, response_input],
+             )
+
+         with gr.TabItem("Batch Score (CSV)"):
+             gr.Markdown(
+                 "Upload a CSV with `prompt` and `response` columns. "
+                 "If no headers, the first two columns are used."
+             )
+             with gr.Row():
+                 with gr.Column():
+                     file_input = gr.File(label="Upload CSV", file_types=[".csv"])
+
+                     with gr.Row():
+                         b_stopword = gr.Checkbox(label="Stopword filtering", value=True)
+                         b_term_weight = gr.Checkbox(label="Term weighting (IDF)", value=True)
+                         b_exclude_tgt = gr.Checkbox(label="Exclude target words", value=True)
+                         b_norm = gr.Checkbox(label="Normalize (1-7)", value=False)
+
+                     b_elab = gr.Dropdown(
+                         label="Elaboration method",
+                         choices=["none", "whitespace", "stoplist", "idf", "pos"],
+                         value="none",
+                     )
+                     batch_btn = gr.Button("Score File", variant="primary")
+
+                 with gr.Column():
+                     file_output = gr.File(label="Download scored CSV")
+                     preview = gr.Textbox(label="Preview (first 20 rows)", lines=10, interactive=False)
+
+             batch_btn.click(
+                 fn=score_batch,
+                 inputs=[file_input, model_selector, b_stopword, b_term_weight, b_exclude_tgt, b_norm, b_elab],
+                 outputs=[file_output, preview],
+             )
+
+         with gr.TabItem("About"):
+             gr.Markdown(ABOUT_TEXT)
+             gr.Markdown(CITATION_TEXT)
+
+ if __name__ == "__main__":
+     demo.launch(theme=gr.themes.Soft())
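One detail of `score_batch` above worth calling out: uploaded CSVs don't need exact headers. Column names are trimmed and lowercased, and if `prompt`/`response` aren't found, the first two columns are assumed to be prompt and response. A standalone sketch of that fallback (`normalize_columns` is an illustrative helper, not part of the app):

```python
def normalize_columns(columns):
    """Mimic the app's header fallback: lowercase/trim names; if 'prompt' and
    'response' are missing, treat the first two columns as prompt/response."""
    cols = [c.strip().lower() for c in columns]
    if "prompt" not in cols or "response" not in cols:
        if len(cols) < 2:
            raise ValueError("CSV must have at least two columns (prompt, response).")
        cols = ["prompt", "response"] + cols[2:]
    return cols

print(normalize_columns([" Prompt ", "Response", "id"]))  # ['prompt', 'response', 'id']
print(normalize_columns(["item", "answer", "id"]))        # ['prompt', 'response', 'id']
```

In the app itself this logic runs on `df.columns` of the uploaded pandas DataFrame; extra columns beyond the first two are passed through untouched.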
assets/idf-vals.parquet ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e3d88e039748b57097a1f3f246433d5b5196e2935181a1f2360c1b9273077ec
+ size 57465430
requirements.txt ADDED
@@ -0,0 +1,9 @@
+ gradio>=5.0
+ gensim>=4.0
+ spacy>=3.7.2
+ numpy
+ pandas
+ scikit-learn
+ inflect
+ huggingface_hub
+ pyarrow
scoring.py ADDED
@@ -0,0 +1,336 @@
+ """
+ Semantic distance scoring for creativity research.
+
+ Ported from the open-creativity-scoring library (https://github.com/massivetexts/open-scoring).
+ Computes originality scores by measuring cosine distance between word embeddings
+ of a prompt and response in embedding space.
+ """
+
+ import os
+ import subprocess
+ import logging
+
+ import numpy as np
+ import pandas as pd
+ from gensim.models import KeyedVectors
+ from sklearn.preprocessing import MinMaxScaler
+ from huggingface_hub import hf_hub_download
+
+ logger = logging.getLogger(__name__)
+
+ # Available models with their HF repos and scaling parameters
+ MODELS = {
+     "glove_840B": {
+         "repo": "massivetexts/glove-840b-gensim",
+         "files": ["glove.840B-300d.wv", "glove.840B-300d.wv.vectors.npy"],
+         "main_file": "glove.840B-300d.wv",
+         "description": "GloVe 840B 300d (Pennington et al. 2014) - general-purpose, large vocabulary",
+         "scaling": {"min": 0.6456, "max": 0.9610},
+     },
+     "motes_100k": {
+         "repo": "massivetexts/motes-embeddings-100k",
+         "files": ["all_weighted_10-12_100k.kv", "all_weighted_10-12_100k.kv.vectors.npy"],
+         "main_file": "all_weighted_10-12_100k.kv",
+         "description": "MOTES children's embeddings (ages 10-12, 100k vocab)",
+         "scaling": {"min": 0.5033, "max": 0.8955},
+     },
+ }
+
+ DEFAULT_MODEL = "motes_100k"
+
+ # Default scaling (used when no model-specific scaling is set)
+ DEFAULT_SCALING = MODELS[DEFAULT_MODEL]["scaling"]
+
+ # Path to IDF values
+ IDF_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "idf-vals.parquet")
+
+
+ def ensure_spacy_model():
+     """Download spaCy en_core_web_sm if not already installed."""
+     try:
+         import spacy
+         spacy.load("en_core_web_sm")
+     except OSError:
+         subprocess.run(
+             ["python", "-m", "spacy", "download", "en_core_web_sm"],
+             check=True,
+             capture_output=True,
+         )
+
+
+ def download_model(model_name=None, progress_callback=None):
+     """Download model files from Hugging Face Hub. Returns path to main .wv/.kv file.
+
+     Args:
+         model_name: Key from MODELS dict (e.g., 'glove_840B', 'motes_100k').
+             Defaults to DEFAULT_MODEL.
+         progress_callback: Optional callback(progress, message) for UI updates.
+     """
+     if model_name is None:
+         model_name = DEFAULT_MODEL
+
+     if model_name not in MODELS:
+         raise ValueError(f"Unknown model: {model_name}. Available: {list(MODELS.keys())}")
+
+     model_info = MODELS[model_name]
+
+     if progress_callback:
+         progress_callback(0, f"Downloading {model_name} from Hugging Face Hub...")
+
+     paths = {}
+     for i, filename in enumerate(model_info["files"]):
+         path = hf_hub_download(
+             repo_id=model_info["repo"],
+             filename=filename,
+             repo_type="model",
+         )
+         paths[filename] = path
+         if progress_callback:
+             progress_callback((i + 1) / len(model_info["files"]), f"Downloaded {filename}")
+
+     return paths[model_info["main_file"]]
+
+
+ class SemanticScorer:
+     """Scores originality of divergent thinking responses using semantic distance.
+
+     Measures cosine similarity between word embeddings of the prompt object
+     and the response, then subtracts from 1 to get a distance score.
+     Higher scores = more original (more distant in semantic space).
+     """
+
+     def __init__(self, model_name=None):
+         self._model = None
+         self._idf_ref = None
+         self._default_idf = None
+         self._nlp = None
+         self._inflect_engine = None
+         self._scaler = None
+         self._model_name = model_name or DEFAULT_MODEL
+
+         # Set up normalization scaler using model-specific scaling
+         scaling = MODELS.get(self._model_name, MODELS[DEFAULT_MODEL])["scaling"]
+         self._scaler = MinMaxScaler(feature_range=(1.0, 7.0), clip=True)
+         self._scaler.fit(np.array([[scaling["min"]], [scaling["max"]]]))
+
+     def _ensure_nlp(self):
+         """Lazy-load spaCy model."""
+         if self._nlp is None:
+             import spacy
+             import inflect
+             ensure_spacy_model()
+             self._nlp = spacy.load("en_core_web_sm")
+             self._inflect_engine = inflect.engine()
+
+     @property
+     def nlp(self):
+         self._ensure_nlp()
+         return self._nlp
+
+     @property
+     def p(self):
+         self._ensure_nlp()
+         return self._inflect_engine
+
+     @property
+     def idf(self):
+         """Load IDF scores from parquet file.
+
+         Uses page-level scores from:
+         Organisciak, P. 2016. Term Frequencies for 235k Language and Literature Texts.
+         http://hdl.handle.net/2142/89515.
+         """
+         if self._idf_ref is None:
+             idf_df = pd.read_parquet(IDF_PATH)
+             self._idf_ref = idf_df["IPF"].to_dict()
+             self._default_idf = idf_df.iloc[10000]["IPF"]
+         return self._idf_ref
+
+     @property
+     def default_idf(self):
+         if self._default_idf is None:
+             _ = self.idf  # triggers load
+         return self._default_idf
+
+     def load_model(self, model_path, mmap="r"):
+         """Load a gensim KeyedVectors model."""
+         self._model = KeyedVectors.load(model_path, mmap=mmap)
+
+     def _get_phrase_vecs(self, phrase, stopword=False, term_weighting=False, exclude=None):
+         """Return stacked array of model vectors for words in phrase.
+
+         Args:
+             phrase: Text string or spaCy Doc
+             stopword: If True, skip stopwords
+             term_weighting: If True, compute IDF weights
+             exclude: List of words to skip (lowercased)
+
+         Returns:
+             Tuple of (vectors array, weights list)
+         """
+         import spacy
+
+         if exclude is None:
+             exclude = []
+
+         arrlist = []
+         weights = []
+
+         if not isinstance(phrase, spacy.tokens.doc.Doc):
+             phrase = self.nlp(phrase[: self.nlp.max_length], disable=["parser", "ner", "lemmatizer"])
+
+         exclude_lower = [x.lower() for x in exclude]
+         for word in phrase:
+             if stopword and word.is_stop:
+                 continue
+             elif word.lower_ in exclude_lower:
+                 continue
+             else:
+                 try:
+                     vec = self._model[word.lower_]
+                     arrlist.append(vec)
+                 except KeyError:
+                     continue
+
+                 if term_weighting:
+                     weight = self.idf.get(word.lower_, self.default_idf)
+                     weights.append(weight)
+
+         if len(arrlist):
+             vecs = np.vstack(arrlist)
+             return vecs, weights
+         else:
+             return [], []
+
+     def originality(self, target, response, stopword=False, term_weighting=False,
+                     flip=True, exclude_target=False):
+         """Score originality as semantic distance between target prompt and response.
+
+         Args:
+             target: The prompt/object (e.g., "brick")
+             response: The creative response (e.g., "modern art sculpture")
+             stopword: Remove stopwords before scoring
+             term_weighting: Weight words by IDF
+             flip: If True, return 1 - similarity (higher = more original)
+             exclude_target: If True, exclude prompt words from response
+
+         Returns:
+             Float originality score, or None if scoring fails
+         """
+         if self._model is None:
+             raise RuntimeError("No model loaded. Call load_model() first.")
+
+         exclude_words = []
+         if exclude_target:
+             exclude_words = target.split()
+             for word in list(exclude_words):
+                 try:
+                     sense = self.p.plural(word.lower())
+                     if isinstance(sense, str) and len(sense) and sense not in exclude_words:
+                         exclude_words.append(sense)
+                 except Exception:
+                     pass
+
+         vecs, weights = self._get_phrase_vecs(
+             response, stopword, term_weighting, exclude=exclude_words
+         )
+
+         if len(vecs) == 0:
+             return None
+
+         if " " in target:
+             target_vecs = self._get_phrase_vecs(target, stopword, term_weighting)[0]
+             if len(target_vecs) == 0:
+                 return None
+             targetvec = target_vecs.sum(0)
+         else:
+             try:
+                 targetvec = self._model[target.lower()]
+             except KeyError:
+                 return None
+
+         scores = self._model.cosine_similarities(targetvec, vecs)
+
+         if len(scores) and not term_weighting:
+             s = np.mean(scores)
+         elif len(scores):
+             s = np.average(scores, weights=weights)
+         else:
+             return None
+
+         if flip:
+             s = 1 - s
+         return float(s)
+
+     def elaboration(self, phrase, method="whitespace"):
+         """Score elaboration (response length/complexity).
+
+         Args:
+             phrase: The response text
+             method: One of 'whitespace', 'stoplist', 'idf', 'pos'
+
+         Returns:
+             Numeric elaboration score
+         """
+         if method == "whitespace":
+             return len(phrase.split())
+
+         doc = self.nlp(phrase[: self.nlp.max_length], disable=["parser", "ner", "lemmatizer"])
+
+         if method == "stoplist":
+             return len([w for w in doc if not (w.is_stop or w.is_punct)])
+         elif method == "idf":
+             weights = []
+             for word in doc:
+                 if word.is_punct:
+                     continue
+                 weights.append(self.idf.get(word.lower_, self.default_idf))
+             return sum(weights)
+         elif method == "pos":
+             doc = self.nlp(phrase[: self.nlp.max_length], disable=["ner", "lemmatizer"])
+             return len([w for w in doc if w.pos_ in ["NOUN", "VERB", "ADJ", "ADV", "PROPN"] and not w.is_punct])
+         else:
+             raise ValueError(f"Unknown elaboration method: {method}")
+
+     def score_batch(self, df, stopword=False, term_weighting=False,
+                     exclude_target=False, normalize=False, elab_method=None):
+         """Score a DataFrame of prompt-response pairs.
+
+         Args:
+             df: DataFrame with 'prompt' and 'response' columns
+             stopword: Remove stopwords
+             term_weighting: Weight by IDF
+             exclude_target: Exclude prompt words from response
+             normalize: Scale to 1-7 range
+             elab_method: Elaboration method or None
+
+         Returns:
+             DataFrame with 'originality' (and optionally 'elaboration') columns added
+         """
+         df = df.copy()
+         df["originality"] = df.apply(
+             lambda x: self.originality(
+                 x["prompt"], x["response"],
+                 stopword=stopword,
+                 term_weighting=term_weighting,
+                 exclude_target=exclude_target,
+             ),
+             axis=1,
+         )
+
+         if normalize:
+             valid_mask = df["originality"].notna()
+             if valid_mask.any():
+                 df.loc[valid_mask, "originality"] = self._scaler.transform(
+                     df.loc[valid_mask, "originality"].values.reshape(-1, 1)
+                 )[:, 0]
+             df["originality"] = df["originality"].round(1)
+         else:
+             df["originality"] = df["originality"].round(4)
+
+         if elab_method and elab_method != "none":
+             df["elaboration"] = df["response"].apply(
+                 lambda x: self.elaboration(x, method=elab_method)
+             )
+
+         return df
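The `normalize` option in `scoring.py` relies on a `MinMaxScaler(feature_range=(1.0, 7.0), clip=True)` fitted to each model's `scaling` min/max. For the default `motes_100k` model, that calibration reduces to a simple linear map with clipping, sketched here without scikit-learn (`to_1_7` is an illustrative name, not part of the library):

```python
RAW_MIN, RAW_MAX = 0.5033, 0.8955  # "scaling" values for motes_100k in MODELS

def to_1_7(raw):
    """Linearly map a raw cosine-distance score onto [1, 7], clipping
    values that fall outside the calibration range."""
    scaled = 1.0 + 6.0 * (raw - RAW_MIN) / (RAW_MAX - RAW_MIN)
    return min(7.0, max(1.0, scaled))

print(to_1_7(RAW_MIN))  # 1.0
print(to_1_7(RAW_MAX))  # 7.0
print(to_1_7(0.2))      # 1.0 (clipped: below the calibration range)
```

The `clip=True` behavior matters in practice: raw distances outside the calibrated min/max (possible for unusual prompts or vocabularies) are pinned to the ends of the 1-7 scale rather than extrapolated.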