Summarization Bias: Why Language Models Re-Label the Emotions You Tried to Hide

Community Article Published May 30, 2026

A data-engineering note from the Objective Projection project
Author: Levent Bulut (ORCID 0009-0007-7500-2261)
License: CC BY-NC-ND 4.0
Dataset: leventbulut/objective-projection, DOI 10.57967/hf/8960 (Zenodo archive 10.5281/zenodo.19511369)

The observation

Take a passage written so that no emotion word appears in it. The light thinning at the edge of a room, a glass left at a precise angle, the temperature of a surface a character keeps not touching. The feeling is meant to arrive in the reader without ever being named. This is the core constraint of Objective Projection, what the project calls the Output Layer Rule: the physical parameters drive the writing but never surface in the text, and the Emotion Embargo forbids naming the emotion outright.

Now hand that passage to a language model and ask it to summarize. The summary comes back as: "The character feels anxious and abandoned."

The model did not preserve what the text did. It performed the one move the text was built to avoid. It re-attached the label. We call this Summarization Bias: the systematic tendency of language models, when compressing or paraphrasing narrative, to collapse sensory and physical encoding back into abstract emotional labels, telling what the source text deliberately showed.

This note is not a complaint about AI. It is a description of a measurable bias and an account of how this dataset is structured to work around it.

Why this is structural, not stylistic

Summarization rewards abstraction. A faithful summary is supposed to drop surface detail and keep the gist, and for most text the "gist" of a scene genuinely is its emotional content. So a model trained to summarize well has been trained, in effect, to discard exactly the layer Objective Projection treats as load-bearing.

This matters in two concrete places in any pipeline built on physically-encoded narrative:

Generation. Ask a model to write in this style and it drifts back toward labels under almost any pressure: length, ambiguity, a vague prompt. The embargoed word reappears as soon as the model is left to "clarify."

Evaluation. This is the more dangerous one. If you use an LLM as a judge to score whether a passage respected the embargo, the judge is running the same summarization step internally. It reads the physical encoding, silently re-labels it as "sadness," and then reports that the emotion is "present," penalizing text that in fact did its job perfectly. The bias contaminates the measurement of the bias.

The neurobiological framing the project leans on (LeDoux's dual-pathway account, 1992) is useful here only as an analogy, not a claim: the fast, pre-cortical route to an affective response is precisely what gets bypassed when a label is supplied up front. A summary that hands you "anxious" has done the high-road work for you and removed the low-road experience. That is the thing the methodology is trying to protect, and the thing summarization erases.

The data-engineering response

The response is not to argue with the model. It is to stop relying on the model's judgment where that judgment is biased, and to give the dataset enough structure that the bias has nothing to hide behind. Three concrete moves, all already in the dataset:

  1. Annotate the physical layer explicitly, not the emotion. Each scene in the corpus (sft_complete_500_annotated.jsonl, 500 scenes) is annotated against the six physical variables and the six rules — never against an emotion label. There is no "this scene = fear" field for a summarizer to latch onto. The supervised signal is the encoding, not the gist.
  2. Hard negatives. The dataset includes hand-built hard negatives — passages that look correct but break a rule, including the failure mode where the embargo is violated and the emotion is stated. These are labeled as negatives precisely so that a model trained or evaluated on the set has explicit examples of the re-labeling move marked wrong. Summarization Bias is, in dataset terms, the production of a hard negative.
  3. Rule-based detection instead of LLM-as-judge. The repository ships apply_rules.py, a transparent, bilingual, rule-based detector for the six golden rules. It does not summarize. It checks for the surface signatures of each rule directly, so its verdict cannot be poisoned by an internal re-labeling step the way an LLM judge's can.

The point of (3) is not that rules are better than models in general; they are not. The point is that for this specific bias, a method that never abstracts is structurally immune to an error that comes from abstracting.

Honest limits

A data-engineering note that hides its detector's weak spots would be useless. On the annotated corpus, apply_rules.py reliability varies sharply by rule:

Simile Prohibition: ~99.6%
Emotion Embargo: ~97.2%
Temporal Anchor: ~76%
Materialized Metaphors: ~70%
Micro-Focus: ~63.6%
Atmosphere Contradiction: ~9.8%

Atmosphere Contradiction detection is, frankly, barely working — a rule that depends on a relationship between two parts of a scene resists flat pattern-matching, and the rule-based approach has not solved it. That number is in the table on purpose. It marks the boundary where the "don't trust the LLM, use rules" argument runs out, and where a future detector will need something more than surface signatures without sliding back into the biased summarization it was meant to avoid. Two further caveats:

The reliability figures are detector reliability against human annotation on this corpus. They are not a claim that the underlying methodology is validated. The behavioral validation (the pre-registered OPCT v2.0 protocol, n=80, ECG + GSR + pupillometry) is a separate, ongoing piece of work, not a result reported here.

"Summarization Bias" is described here as a measurable engineering phenomenon. It is not offered as a moral or aesthetic verdict on summarization, which is the right tool for almost everything that is not this.
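On the reliability figures above: one plausible reading of "detector reliability against human annotation" is a per-rule agreement rate, meaning the share of scenes where the detector's verdict matches the annotator's. The note does not spell out the exact metric, so the sketch below is an assumption, and the record layout (`human` and `detector` dictionaries keyed by rule name) is hypothetical rather than the schema of sft_complete_500_annotated.jsonl.

```python
from collections import defaultdict

def per_rule_agreement(records):
    """Share of records, per rule, where detector and human verdicts match."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        for rule, human_verdict in rec["human"].items():
            if rule not in rec["detector"]:
                continue  # skip rules the detector did not score
            totals[rule] += 1
            hits[rule] += int(rec["detector"][rule] == human_verdict)
    return {rule: hits[rule] / totals[rule] for rule in totals}

# Toy records, invented for illustration only (not taken from the corpus).
records = [
    {"human": {"emotion_embargo": True, "atmosphere_contradiction": True},
     "detector": {"emotion_embargo": True, "atmosphere_contradiction": False}},
    {"human": {"emotion_embargo": False, "atmosphere_contradiction": True},
     "detector": {"emotion_embargo": False, "atmosphere_contradiction": False}},
]

print(per_rule_agreement(records))
```

A plain agreement rate like this says nothing about class balance, so anyone re-deriving the figures may also want precision and recall per rule.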

Reproducing it

The fastest way to see the bias for yourself: take any positive example from the corpus, summarize it with your model of choice, and check whether an embargoed emotion word reappears in the summary. Then run the same passage and its summary through apply_rules.py and compare what the detector flags against what the summary asserts. The gap between the two is the bias, quantified.

Dataset, detector, and annotation schema: leventbulut/objective-projection.
Canonical definition of the narrative-entropy measure referenced throughout the corpus (Sₙ): DOI 10.5281/zenodo.20459351.
Operational pilot: DOI 10.5281/zenodo.20362901.
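The first half of the reproduction check (does an embargoed word reappear in the summary?) can be sketched in a few lines. This is a minimal sketch, not the project's apply_rules.py: the word list `EMOTION_WORDS`, the function names, and the example strings are hypothetical stand-ins chosen for illustration. A real run would use a passage from the corpus, a summary produced by your model, and the repository's own detector.

```python
import re

# Hypothetical, deliberately tiny lexicon of embargoed emotion words.
# The project's actual bilingual rules live in apply_rules.py and are not reproduced here.
EMOTION_WORDS = {"anxious", "abandoned", "afraid", "sad", "sadness", "lonely", "angry", "fear"}

def emotion_words_in(text: str) -> set[str]:
    """Return the lexicon words that literally appear in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return EMOTION_WORDS.intersection(tokens)

def relabeled_words(source: str, summary: str) -> set[str]:
    """Emotion words present in the summary but absent from the source."""
    return emotion_words_in(summary) - emotion_words_in(source)

# Illustrative strings only; the source passage is invented, not a corpus entry.
source = "The light thinned at the edge of the room. The glass stayed where she had left it."
summary = "The character feels anxious and abandoned."

print(sorted(relabeled_words(source, summary)))  # ['abandoned', 'anxious']
```

A non-empty result means the summary introduced a label the source never used. A literal word match like this only covers the lexical side of the Emotion Embargo and will miss paraphrased labels.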

Objective Projection encodes emotion as measurable physical parameters that govern the writing while remaining invisible in the text. Summarization Bias is one of the failure modes the dataset is built to make visible and checkable rather than to argue about.

