Objective Projection: Engineering Emotion in Text with Physical Parameters Instead of Emotion Labels
Dataset: leventbulut/objective-projection · DOI: 10.57967/hf/8960 · License: CC BY-NC-ND 4.0
TL;DR for the HF community
If you fine-tune models for creative or affective generation, you have probably noticed a stubborn failure mode: the prose is grammatically clean, the characters are consistent, and nothing about it moves the reader. This dataset is an attempt to attack that problem at the representation level rather than the style level.
Objective Projection (OP) is a narrative-engineering methodology that encodes emotional states as measurable physical parameters (light, temperature, sound, motion, pressure, geometry) instead of emotion labels (sad, afraid, lonely) or similes (like a cage). The dataset operationalizes that methodology as training and evaluation material:
- 500-scene SFT corpus, bilingual (TR + EN), each scene a bad_output/target_output pair across 45 emotional, thematic, and genre categories.
- Four structured annotation fields per scene, plus a per-scene applied_rules block, produced by a deterministic, rule-based, open-source pipeline: no model in the loop, fully reproducible.
- Hard negatives that are deliberately close to compliant: outputs that pass a naïve regex check but still route the reader through the wrong neural pathway.
- A 60-scene isolation / ablation set that varies one physical variable at a time to address the confounding problem.
- A pre-registered neuroscience protocol (OPCT v2.0), n=80, ECG + GSR + pupillometry, for independent physiological validation.
Everything below is reproducible from the repository. The core empirical claim is falsifiable, and the dataset is structured so that the claim can be falsified.
1. The problem: information present, body inert
Read this:
"She was very sad. Her heart was broken. She sat in the kitchen and couldn't drink her coffee."
You understood that the character is sad. But understanding is not feeling. The sentence delivers a label to the interpretive layer of your brain; your body stays inert. The overwhelming majority of model-generated prose lives in exactly this register — it names affect rather than inducing it.
Now read this:
"A kitchen table. 14:00. A glass half full. Its steam stopped nine minutes ago. The woman's right hand beside the glass, fingers open. The overhead light falls evenly on every surface; no shadow behind the glass. The water doesn't move."
No emotion word. No simile. No abstract metaphor. Yet many readers report a small somatic shift — a tightening in the shoulders, slowed breath. The hypothesis behind this dataset is that the difference between these two passages is mechanistic, not stylistic, and that the mechanism can be specified, taught, and tested.
2. The mechanism: two neural pathways
The methodology rests on the dual-pathway account of emotional processing (LeDoux, 1992). A stimulus reaches the amygdala by two routes:
| Pathway | Route | Latency | Function |
|---|---|---|---|
| High road (cortical) | thalamus → cortex → amygdala | ~250–400 ms | interpretation, categorization, cultural filtering |
| Low road (subcortical) | thalamus → amygdala | ~12 ms | direct autonomic response, before conscious labeling |
Emotion labels are high-road inputs: they require interpretation and pass through a cultural filter. Physical stressors — a dimming light, a temperature gradient on the forearm, an echo that reports the size of a room — are low-road inputs: the autonomic response begins before the cortex assigns a category. OP is the engineering discipline of writing prose whose load is carried by low-road inputs.
Formally, the core operator is written E(r) = projS(M, T, V, Δ, Ω, Ng) — the projection of a scene onto a six-dimensional physical matrix. In practice that matrix decomposes into six variables.
3. The six physical variables
Each variable is a survival-relevant channel that the human autonomic system tracks pre-attentively:
| Variable | What it encodes | Why it is low-road |
|---|---|---|
| Luminous Decay | type, direction, rate of light loss | circadian/threat alignment; dimming triggers an autonomic response |
| Thermal Gradient | temperature distribution and change | thermoregulation; gradients as small as ~0.4 °C are perceptible |
| Acoustic Impedance | propagation, absorption, reflection of sound | echo structure reports spatial volume without interpretation |
| Kinetic Momentum | motion, acceleration, vibration, balance | vestibular/proprioceptive safety assessment in milliseconds |
| Atmospheric Pressure | air density, enclosure effects | the "ear-pressure" channel |
| Spatial Geometry | proportion, ceiling height, openness vs. enclosure | inherited trap/exposure heuristics |
4. The Output Layer Rule (the most common AI failure)
This is the single rule models get wrong most often, so it deserves its own section:
Parameters govern the writing. They do not appear in it.
❌ Exposing the parameter:
"The figure's center of mass transferred laterally at a 0.2 Hz oscillation frequency, a motor correlate of anxiety."
✅ Governing through the parameter:
"He shifted from his right foot to his left. Then back."
The physics is identical (a 0.2 Hz postural oscillation). The reader never sees a number — only the motion. The most frequent model error, when told to "write with parameters," is to write the number into the text: "The temperature was 14 °C. She was cold." That is not using a physical parameter; it has merely swapped one label for two. The compliant move is to make the parameter do work without surfacing: "She leaned her hand against the side of the glass, pulled it back, leaned it again. Frost had gathered on the rim." No "14 °C" — but 14 °C is in the sentence.
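One way to make the Output Layer Rule mechanically checkable is to look for matrix quantities that resurface verbatim in the prose. The sketch below is illustrative only and is not part of apply_rules.py; the helper names (quantities, leaked_parameters) and the unit list are assumptions, and the matrix strings are taken from the en200_002 record shown later in this post.

```python
import re

# Assumed unit list, for illustration only; extend it for other matrices.
_QUANTITY = re.compile(r'(\d+(?:\.\d+)?)\s*(°\s*C|dB|Hz|cm|m|W)\b', re.IGNORECASE)


def quantities(text):
    """Return the set of (value, unit) pairs found in text, normalised."""
    return {
        (float(value), re.sub(r'\s+', '', unit).lower())
        for value, unit in _QUANTITY.findall(text)
    }


def leaked_parameters(physical_matrix, prose):
    """Quantities that appear in the matrix and again, verbatim, in the prose."""
    governed = set()
    for spec in physical_matrix.values():
        governed |= quantities(spec)
    return sorted(governed & quantities(prose))


matrix = {
    "thermal_gradient": "23°C room to 17°C doorway",
    "acoustic_impedance": "30dB ambient room noise cut to near-silence at threshold",
    "spatial_geometry": "doorway 0.9m wide, darkness extends beyond visible range",
}
compliant = "He had pushed it open 40 centimetres. The air from the other side was cooler."
exposed = "The room was 23 °C and the doorway 17°C. He felt the difference."

print(leaked_parameters(matrix, compliant))  # nothing from the matrix surfaces
print(leaked_parameters(matrix, exposed))    # both temperatures surface
```

A check like this only catches literal leakage. It says nothing about whether the parameter is actually doing work in the sentence, which remains a judgment call.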
5. The six constitutional rules
These are the compliance criteria — applied after writing, as a checklist:
- Emotion Embargo: no labels (sad, afraid, felt lonely). Labels activate the high road.
- Simile Prohibition: no like/as if. Comparison is cortical work.
- Materialized Metaphors: abstractions reduce to measurable objects ("her loneliness" → "a half-glass whose steam stopped nine minutes ago").
- Micro-Focus (Ng): each scene carries a single object holding measurable detail (a glass, a keyring, a coin).
- Temporal Anchor: hour:minute precision (14:00, not "afternoon").
- Atmosphere Contradiction: the emotional weight of the moment must conflict with the environment's tone, not echo it.
Rule 6 is subtler than it looks, and §7.3 shows how its inversion produces one of the hardest negatives in the set.
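Two of the six rules lend themselves to simple lexical checks. The following is a minimal sketch of what such checks could look like. The function names and patterns are assumptions for illustration, not the repository's detect_simile_prohibition or detect_temporal_anchor.

```python
import re

# Assumed marker list; "like" as a verb ("I like tea") would be a false positive.
SIMILE_MARKERS = re.compile(r'\b(?:like|as if|as though)\b', re.IGNORECASE)

# 24-hour clock times such as 14:00 or 9:05.
CLOCK_TIME = re.compile(r'\b(?:[01]?\d|2[0-3]):[0-5]\d\b')


def respects_simile_prohibition(text):
    """True when no comparative marker is found."""
    return SIMILE_MARKERS.search(text) is None


def has_temporal_anchor(text):
    """True when the text contains an hour:minute clock time."""
    return CLOCK_TIME.search(text) is not None


print(has_temporal_anchor("A kitchen table. 14:00. A glass half full."))
print(has_temporal_anchor("She sat there all afternoon."))
print(respects_simile_prohibition("The room closed in like a cage."))
```

A digit-based pattern of this kind would miss a spelled-out time such as "Twenty-three ten", so any real detector needs more structural patterns than this sketch carries.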
6. Inside the corpus: a real annotated record
Each scene is a JSON object with a physical matrix, a non-compliant bad_output, a compliant target_output, an engineering note, structured annotations, and a rule-detection block. A complete example (en200_002, category fear):
{
  "id": "en200_002",
  "category": "fear",
  "language": "en",
  "physical_matrix_schema": "six_parameter",
  "physical_matrix": {
    "thermal_gradient": "23°C room to 17°C doorway",
    "luminous_decay": "bright room behind subject, zero light source ahead",
    "acoustic_impedance": "30dB ambient room noise cut to near-silence at threshold",
    "spatial_geometry": "doorway 0.9m wide, darkness extends beyond visible range",
    "kinetic_momentum": "hand on door handle, forward lean arrested mid-motion",
    "atmospheric_pressure": "slight cool draft from dark space"
  },
  "instruction": "A child opens a door and sees something unexpected in the dark. Write the scene using only physical observation.",
  "bad_output": "Tommy was scared as he opened the door. The darkness terrified him and he felt a cold chill run down his spine.",
  "target_output": "Tommy's hand was still on the door handle. He had pushed it open 40 centimetres. The air from the other side was cooler — he felt it on his forearm first, then his face. He did not push the door further. He did not let go of the handle.",
  "annotation": {
    "tension_level": { "label": "medium", "value": 0.65 },
    "dominant_pathway": "low_road",
    "dominant_parameter": "spatial_geometry",
    "entropy_density": {
      "label": "medium",
      "value": 0.31,
      "measurable_tokens": 3,
      "word_count": 49
    }
  },
  "applied_rules": {
    "duygu_ambargosu": true,
    "benzetme_yasagi": true,
    "maddelesen_metaforlar": false,
    "mikro_odak": false,
    "zamansal_capa": true,
    "atmosfer_celiskisi": false,
    "active_count": 3,
    "primary_rule": "duygu_ambargosu",
    "detection_method": "rule_based_v2_bilingual_heuristic"
  }
}
The bad_output names the affect three times (scared, terrified, cold chill). The target_output carries the same fear through a 40 cm aperture, a 6 °C thermal differential delivered to specific body parts, and an arrested motion. Note the honesty of the applied_rules block: it reports active_count: 3, not 6. The pipeline does not flatter the data.
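Because active_count and primary_rule are derived from the six boolean flags, a record can be checked for internal consistency before it enters a training run. The sketch below assumes the key order shown in the record above. The helper name check_applied_rules is hypothetical.

```python
# Rule keys in the order they appear in the applied_rules block above.
RULE_KEYS = [
    "duygu_ambargosu", "benzetme_yasagi", "maddelesen_metaforlar",
    "mikro_odak", "zamansal_capa", "atmosfer_celiskisi",
]


def check_applied_rules(block):
    """Return a list of inconsistencies between the flags and the summary fields."""
    active = [key for key in RULE_KEYS if block.get(key) is True]
    problems = []
    if block.get("active_count") != len(active):
        problems.append(
            f"active_count is {block.get('active_count')}, flags give {len(active)}"
        )
    expected_primary = active[0] if active else None
    if block.get("primary_rule") != expected_primary:
        problems.append(
            f"primary_rule is {block.get('primary_rule')}, expected {expected_primary}"
        )
    return problems


applied_rules = {
    "duygu_ambargosu": True, "benzetme_yasagi": True,
    "maddelesen_metaforlar": False, "mikro_odak": False,
    "zamansal_capa": True, "atmosfer_celiskisi": False,
    "active_count": 3, "primary_rule": "duygu_ambargosu",
}

print(check_applied_rules(applied_rules))  # empty list: the block is consistent
```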
7. What makes this dataset different
7.1 A transparent, deterministic annotation pipeline
The four annotation fields and the applied_rules block are not produced by an LLM judge. They come from a rule-based bilingual script (apply_rules.py) using word-boundary lexical matching. Every label is reproducible: clone the repo, run the script, get the same output. The Emotion Embargo detector, for example:
import re

# English emotion-label lexicon (excerpt). Patterns without a trailing \b
# match on the stem, so r'\bfear' also catches "fearful" and "feared".
EMOTION_LABELS_EN = [
    r'\bsad\b', r'\bafraid\b', r'\bfear', r'\bterrif', r'\bscared\b',
    r'\blonel(y|iness)\b', r'\bgrief\b', r'\bdespair', r'\bregret',
    r'\bfelt (sad|happy|afraid|lonely|empty|broken|relieved|guilty)\b',
    r'\bheart (broke|pounded|raced|sank)',
    r'\bworld (collapsed|shattered|ended)',
    # ...
]
# EMOTION_LABELS_TR (the Turkish lexicon) is not shown in this excerpt.


def detect_emotion_embargo(text, lang):
    """Returns True if NO direct emotion labels are found (rule respected)."""
    patterns = EMOTION_LABELS_TR if lang == 'tr' else EMOTION_LABELS_EN
    for p in patterns:
        if re.search(p, text, re.IGNORECASE):
            return False
    return True


# The other detect_* helpers called below are likewise not shown here.
def compute_applied_rules(scene):
    target = scene.get('target_output', '')
    lang = scene.get('language', 'en')
    pm = scene.get('physical_matrix', {})
    rules = {
        'duygu_ambargosu': detect_emotion_embargo(target, lang),
        'benzetme_yasagi': detect_simile_prohibition(target, lang),
        'maddelesen_metaforlar': detect_materialized_metaphor(target, pm),
        'mikro_odak': detect_micro_focus(target, lang, pm),
        'zamansal_capa': detect_temporal_anchor(target),
        'atmosfer_celiskisi': detect_atmosphere_contradiction(target, lang, pm),
    }
    # Collect the active rules before adding the summary fields to the dict.
    active = [k for k, v in rules.items() if v]
    rules['active_count'] = len(active)
    rules['primary_rule'] = active[0] if active else None
    return rules
A rule-based checker is a blunt instrument by design, and the repository says so per rule rather than advertising a single accuracy number. Detection rate on the compliant target outputs, by tier:
| Rule | Detection rate | Tier |
|---|---|---|
| benzetme_yasagi (Simile Prohibition) | 99.6% | high (deterministic lexicon) |
| duygu_ambargosu (Emotion Embargo) | 97.2% | high (deterministic lexicon) |
| zamansal_capa (Temporal Anchor) | 76.0% (EN 67 / TR 82) | moderate (structural patterns) |
| maddelesen_metaforlar (Materialized Metaphor) | 70.0% | moderate (matrix-grounded heuristic) |
| mikro_odak (Micro-Focus / Ng) | 63.6% (EN 41 / TR 79) | moderate (needs explicit Ng field) |
| atmosfer_celiskisi (Atmosphere Contradiction) | 9.8% | conservative by design |
The last row is the honest one. Atmosphere Contradiction encodes a semantic authorial choice that regex cannot reliably see, so the pipeline is tuned to favour false negatives over false positives — it would rather miss the rule than wrongly claim it, in order to keep the dataset's positive labels trustworthy. The EN/TR micro-focus gap is a pipeline asymmetry (the TR schema carries an explicit Ng field), not a quality gap between sub-corpora. Every one of these numbers is reproducible from apply_rules.py, and any researcher can contest a specific call. This is the deliberate trade-off of choosing an auditable rule-based annotator over a black-box learned scorer — and it is precisely why the next section exists.
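Per-language detection rates of this kind can be recomputed from the applied_rules blocks alone. The sketch below shows the aggregation on a handful of invented toy records. The scenes list is not corpus data, and detection_rates is a hypothetical helper, not a function from apply_rules.py.

```python
from collections import defaultdict


def detection_rates(scenes, rule):
    """Fraction of scenes per language whose applied_rules[rule] is True."""
    counts = defaultdict(lambda: [0, 0])  # language -> [hits, total]
    for scene in scenes:
        tally = counts[scene["language"]]
        tally[0] += bool(scene["applied_rules"][rule])
        tally[1] += 1
    return {lang: hits / total for lang, (hits, total) in counts.items()}


# Toy records, invented purely to exercise the function.
scenes = (
    [{"language": "en", "applied_rules": {"mikro_odak": flag}}
     for flag in (True, False, False, False)]
    + [{"language": "tr", "applied_rules": {"mikro_odak": flag}}
       for flag in (True, True, True, False)]
)

print(detection_rates(scenes, "mikro_odak"))
```

As a sanity check on the table, and assuming the 200 EN / 300 TR split implied by the corpus description, the rounded per-language figures for zamansal_capa combine to (0.67 × 200 + 0.82 × 300) / 500 = 76.0%, which matches the reported overall rate.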
7.2 Hard negatives: outputs that fool the checker
The most valuable supervision signal here is not the gap between obviously-bad and good. It is the gap between almost-right and right. The dataset includes a hard_negatives set across five violation types, each engineered to pass a naïve compliance check while still failing the methodology:
| Type | What it does | Why a regex misses it |
|---|---|---|
| implicit_emotion_adjective | loads affect onto modifiers, not nouns | "afraid" never appears; "menacingly" does |
| hidden_simile | metaphor without like/as if | no comparative marker to match |
| pseudo_objective | numbers + emotional metaphor mixed | the digits pass; the metaphor doesn't |
| inferred_emotion_cliché | a run of physical clichés | each token is "physical," the stack is a label |
| atmospheric_anchoring | atmosphere confirms the emotion (Rule 6 inverted) | nothing illegal at token level |
A worked example (hn_en_001, fear, implicit_emotion_adjective):
Hard negative: "Twenty-three ten. Elif stopped at the corridor's edge. The fluorescent tube flickered nervously. The darkness stretched menacingly ahead. The far door waited ominously. She moved her foot cautiously — the corridor was six metres long."
On the surface this is compliant: a timestamp, measurements, physical lighting. But four adverbs (nervously, menacingly, ominously, cautiously) carry the affect, and three of them pin it onto inanimate objects: the objects are afraid for the character. A word-boundary check for afraid/scared/terrified finds nothing; the load is in the modifiers.
Target: "Twenty-three ten. Elif stopped at the corridor's edge. The fluorescent flickered twice per second. The corridor was six metres long; the door at the far end was closed. She took the first step with her right foot. The second step came seven seconds later. The fluorescent went dark once, then back on. Elif did not stop. She had stopped already."
Same scene, measurable instead of evaluative: a 2 Hz flicker, a seven-second inter-step interval, and a closing motion-state paradox that does the work without naming it. Each hard negative carries a violation_explanation, a list of violation_markers, and an engineering_note explaining the teaching value.
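The gap between the two passages can be shown in a few lines. In the sketch below, EMBARGO is a subset of the lexicon quoted in §7.1, while AFFECT_MODIFIERS is a hypothetical hand-made list and not a field or lexicon from the repository. Both strings are excerpts of the hn_en_001 texts above.

```python
import re

# Subset of the emotion-label patterns quoted from apply_rules.py.
EMBARGO = [r'\bsad\b', r'\bafraid\b', r'\bfear', r'\bterrif', r'\bscared\b']

# Hypothetical modifier lexicon, assumed here for illustration only.
AFFECT_MODIFIERS = {"nervously", "menacingly", "ominously", "cautiously"}


def passes_embargo(text):
    """True when no emotion-label pattern matches."""
    return not any(re.search(p, text, re.IGNORECASE) for p in EMBARGO)


def affect_modifiers(text):
    """Evaluative modifiers found in the text, in order of appearance."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in AFFECT_MODIFIERS]


hard_negative = (
    "The fluorescent tube flickered nervously. "
    "The darkness stretched menacingly ahead. "
    "The far door waited ominously."
)
target = (
    "The fluorescent flickered twice per second. "
    "The corridor was six metres long; the door at the far end was closed."
)

print(passes_embargo(hard_negative))    # the label check is satisfied
print(affect_modifiers(hard_negative))  # but the modifiers carry the affect
print(affect_modifiers(target))         # none in the measurable version
```

A closed lexicon like this would be trivially evaded by new adverbs. It is meant to illustrate why the violation_markers field exists, not to replace it.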
7.3 Pattern F and a declared data gap
The hardest case is atmospheric_anchoring. Consider a morning-after-a-breakup scene where grey sky = grief, rain = grief, cold half of the bed = grief — five atmospheric elements all pulling the same emotional vector. That is Rule 6 inverted: atmosphere as amplifier instead of contradiction. The compliant repair introduces what the methodology calls Pattern F (Mundane Parallel Life) — an indifferent fragment of someone else's ordinary world:
"From the apartment to the street someone stepped out; he hopped sideways to keep his slippers dry. Ayça saw him through the window, then could not see him."
A neighbour continuing his ordinary life breaks the absoluteness of the grief. Pattern F is one cell of a six-pattern typology (A–F) derived by manual review of the 49 corpus scenes that apply_rules.py flagged for atmosphere contradiction:
| Pattern | Definition | Pure corpus examples |
|---|---|---|
| A — Living Indifferent Witness | animal/insect/biological reflex unaware of the drama | 3 (ant, butterfly, sleeping child) |
| B — Professional Routine | a role-bound human whose profession is indifferent to the private moment | 1 (waiter) |
| C — Small Physical Reality | a micro physical event obeying physics independent of the scene | 2 (single raindrop, warm duvet) |
| D — Protocol / Expectation Break | a missing or dimensionless response where one was expected | 3 (silence after reunion, AI's flat reply) |
| E — Threat Absence (borderline) | expected reference points become functionally meaningless | 1 |
| F — Mundane Parallel Life (new) | a non-role-bound human in everyday context (the slipper-hopping neighbour) | 0 |
Pattern F has a precise structural signature — the dataset specifies it as testable criteria, not vibes:
| Criterion | Requirement | In the example |
|---|---|---|
| subject_independence | the fragment is not caused by the protagonist | the hop happens while she lies motionless |
| mundane_category | a reflex/role, not a symbol | keeping slippers dry is biology, not meaning |
| metaphor_load | 0.0 | slippers are slippers; no symbolic charge |
| scale_mismatch_ratio | high | breakup (life-stake) vs. wet slippers (near-zero) |
| transience | no interaction with the protagonist | he enters the frame, exists, leaves |
Here is the part that matters for honest dataset design: of the six patterns, five have corpus examples and Pattern F has zero — and the repository says so explicitly. Rather than back-fill synthetic instances to hide the one empty cell, the gap is declared (closing it with ten pure examples is a stated v8 priority). The hard-negative records for this type even include a target_output_mechanical_F variant — a scene that wears Pattern F's surface (a distant dog barking) but fails the signature (the protagonist listens to it, breaking transience) — so a model can learn the difference between the real structure and its imitation.
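The five criteria of the Pattern F signature can be read as a conjunction over a hand-annotated fragment. The sketch below assumes one possible encoding (booleans, a float, and a "low"/"high" string). The dataset's actual field types are not specified here, and the annotation values for both fragments are illustrative assumptions apart from the dog fragment's broken transience, which the text above states.

```python
from dataclasses import dataclass


@dataclass
class FragmentAnnotation:
    """Assumed encoding of the Pattern F criteria for one fragment."""
    subject_independence: bool   # not caused by the protagonist
    mundane_category: bool       # a reflex or role, not a symbol
    metaphor_load: float         # must be exactly 0.0
    scale_mismatch_ratio: str    # "low" or "high"
    transience: bool             # no interaction with the protagonist


def pattern_f_failures(a):
    """Return the names of the criteria the fragment fails."""
    checks = {
        "subject_independence": a.subject_independence,
        "mundane_category": a.mundane_category,
        "metaphor_load": a.metaphor_load == 0.0,
        "scale_mismatch_ratio": a.scale_mismatch_ratio == "high",
        "transience": a.transience,
    }
    return [name for name, ok in checks.items() if not ok]


neighbour = FragmentAnnotation(True, True, 0.0, "high", True)
# The protagonist listens to the dog, so transience is broken.
distant_dog = FragmentAnnotation(True, True, 0.0, "high", False)

print(pattern_f_failures(neighbour))
print(pattern_f_failures(distant_dog))
```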
There is also a defense against a specific failure of edit/rewrite tasks. Each load-bearing span is tagged with a load_bearing_elements field marking do_not_remove: true and the consequence of removal. This is a guard against summarization bias: models tend to cut exactly the "off-topic" Pattern F detail that carries the Rule 6 dimension, quietly collapsing the scene back into pure atmospheric anchoring.
7.4 An isolation set for the confounding problem
A fair reviewer objection: if all six variables move at once, which one produces the effect? The answer is a 60-scene controlled ablation set — ten scenes per variable, five held constant and one varied, with the constancy marked inside the prose ("The room remained at 20 °C. The engine sound stayed level."). Each block contains two baselines (a neutral reference and a sub-threshold boundary control), four low-intensity and four high-intensity variations, including a reverse-direction control. This lets a researcher attribute an effect to a single channel.
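The isolation property itself (exactly one channel differs from the baseline) is easy to verify programmatically. The sketch below uses the six channel keys from the record schema. The matrix values are invented placeholders, and the helper names are hypothetical.

```python
CHANNELS = (
    "luminous_decay", "thermal_gradient", "acoustic_impedance",
    "kinetic_momentum", "atmospheric_pressure", "spatial_geometry",
)


def varied_channels(baseline, scene):
    """Channels whose specification differs from the baseline."""
    return [c for c in CHANNELS if baseline[c] != scene[c]]


def is_isolated(baseline, scene, target_channel):
    """True when the target channel, and only that channel, was varied."""
    return varied_channels(baseline, scene) == [target_channel]


# Placeholder values, not taken from the ablation set.
baseline = {
    "luminous_decay": "steady overhead light",
    "thermal_gradient": "room at 20°C",
    "acoustic_impedance": "engine sound level",
    "kinetic_momentum": "subject seated, still",
    "atmospheric_pressure": "window closed",
    "spatial_geometry": "cabin 2m wide",
}
dimmed = dict(baseline, luminous_decay="overhead light fading over 30 seconds")
confounded = dict(dimmed, thermal_gradient="room cooling to 16°C")

print(is_isolated(baseline, dimmed, "luminous_decay"))
print(varied_channels(baseline, confounded))
```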
7.5 Full TR ↔ EN parallelisation
The 300 originally-Turkish scenes have English parallels — reconstructions, not translations, because temporal anchors, word economy, and atmosphere contradictions resolve differently per language. Coverage is bijective (300/300), enabling cross-linguistic compliance testing and bilingual fine-tuning.
7.6 A null result, shipped as a null result
The repository's academic/ folder includes a registered pilot report on Narrative Entropy (Sₙ), the methodology's candidate measure of measurable-detail density (DOI: 10.5281/zenodo.20362901). The first attempt to compute the candidate Sₙ formula on real texts — the opening of Reservoir Dogs and the opening block of Carver's Cathedral — produced a result contrary to the intuitive expectation: the single-voice monologue scored higher (Sₙ = 30.0) than the nine-character dialogue scene (Sₙ = 18.8).
The report does not adjust the formula to rescue the intuition. It names three competing interpretations of the divergence, refuses post-hoc tuning, and pre-registers the next validation stage (extension to four scenes, inter-rater agreement, biophysical validation). It is, by the author's own framing, negative/inconclusive by design. Most dataset cards quietly omit the experiments that did not behave; this one ships one as a worked example of the project's empirical commitments. For a fine-tuning audience that is the relevant signal: the metric labels in this corpus are not back-fitted to look good.
8. Using it
Minimal prompt:
Write using Levent Bulut's Objective Projection method.
Parameterized prompt:
Temperature 28.4°C, single exit 4.7 metres behind, 40-watt bulb, no sound.
Write a scene in this environment.
Do not use abstract emotion names. Do not use similes.
Encode through physical observation only.
The bad_output / target_output pairs are directly usable as SFT data; the applied_rules and annotation fields support filtering, curriculum design, and reward shaping. If you redistribute outputs, the license asks for attribution: "Write using Levent Bulut's Objective Projection method. DOI: 10.57967/hf/8960 | leventbulut.com | ORCID: 0009-0007-7500-2261."
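As one possible way to use the pairs, the sketch below turns a scene record into a prompt/chosen/rejected triple and filters on active_count. The output key names, the prompt layout, and the min_active_rules threshold are assumptions, not a format prescribed by the dataset. The record is abbreviated from en200_002.

```python
def to_preference_pair(scene, min_active_rules=3):
    """Build a preference pair from a scene, or return None if it is filtered out."""
    if scene["applied_rules"]["active_count"] < min_active_rules:
        return None
    matrix_lines = "\n".join(
        f"- {name}: {spec}" for name, spec in scene["physical_matrix"].items()
    )
    prompt = f"{scene['instruction']}\n\nPhysical matrix:\n{matrix_lines}"
    return {
        "prompt": prompt,
        "chosen": scene["target_output"],   # compliant output
        "rejected": scene["bad_output"],    # label-driven output
    }


scene = {
    "instruction": "A child opens a door and sees something unexpected in the dark. "
                   "Write the scene using only physical observation.",
    "physical_matrix": {"thermal_gradient": "23°C room to 17°C doorway"},
    "bad_output": "Tommy was scared as he opened the door.",
    "target_output": "Tommy's hand was still on the door handle.",
    "applied_rules": {"active_count": 3},
}

pair = to_preference_pair(scene)
print(pair["prompt"])
```

Raising min_active_rules trades corpus size for stricter compliance, which is one simple form of the curriculum design mentioned above.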
9. Empirical validation: OPCT v2.0
The central claim — scenes built from the same physical parameters produce statistically convergent autonomic responses across readers from different cultural backgrounds — is probabilistic, not deterministic, and therefore testable. The pre-registered protocol (OPCT v2.0) is in the repository in machine-readable form:
| Parameter | Specification |
|---|---|
| Pre-registration | OSF: osf.io/us8bw · DOI 10.5281/zenodo.19415236 |
| Sample size | n = 80 (power ≥ 0.80 at medium effect size) |
| Measurements | ECG (HRV), galvanic skin response, pupillometry |
| Primary analysis | mixed-effects model: author variance vs. OP effect |
| Success criterion | p < 0.05 convergent ANS activation across authors and cultural groups |
| Falsification | author effect p < 0.05, or Cohen's d < 0.3, or blind replication fails |
Anyone with the equipment can run an independent trial. The falsification criteria are stated in advance.
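The falsification row of the table translates directly into a decision rule. The sketch below encodes only the three stated conditions. The function name and the example inputs are hypothetical, and no OPCT results exist yet.

```python
def opct_falsification_reasons(author_effect_p, cohens_d, blind_replication_ok,
                               alpha=0.05, min_effect=0.3):
    """Return the falsification conditions that are met (empty list if none)."""
    reasons = []
    if author_effect_p < alpha:
        reasons.append("author effect is significant")
    if cohens_d < min_effect:
        reasons.append("effect size below threshold")
    if not blind_replication_ok:
        reasons.append("blind replication failed")
    return reasons


# Hypothetical inputs, chosen only to exercise both branches.
print(opct_falsification_reasons(0.40, 0.45, True))
print(opct_falsification_reasons(0.01, 0.20, False))
```

Any single condition is sufficient to falsify, which is why the function returns a list of reasons rather than a score.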
10. Limitations (stated, not hidden)
This is a research-prototype dataset, and treating it as anything more would be dishonest:
- No physiological data yet. The autonomic-convergence claim is the hypothesis OPCT v2.0 is designed to test, and the protocol has not yet been run. Until then, the low-road account is a well-motivated framework, not a measured result.
- The annotation pipeline is a blunt rule-based instrument. It catches surface violations and will both miss subtle ones and, occasionally, over-flag. The hard-negative set exists precisely because the checker is not the methodology.
- Pattern F has zero pure corpus examples. Declared, not patched.
- Single authorship of the target outputs. Cross-author generation under one Physical Matrix is part of the OPCT design but not yet part of the corpus.
A roadmap toward physiological mapping tables (post-OPCT) and corpus expansion is in the README.
A short note on where this came from
The methodology did not start at a desk. It started while I was reading to my daughter, Ayça. I kept noticing that the books told her what to feel ("Ali was very sad," "Ayşe trembled with fear"), delivering emotion in pre-packaged form and building a wall around her imagination. I wanted prose that let the body arrive at the feeling on its own. Six variables and a dataset later, that instinct is now something I can hand to a fine-tuning run and, eventually, to an ECG.
Links and citation
- Dataset: https://huggingface.co/datasets/leventbulut/objective-projection
- DOI (primary): 10.57967/hf/8960 · Zenodo archive: 10.5281/zenodo.19511369
- ORCID: 0009-0007-7500-2261 · Archive: leventbulut.com
- License: CC BY-NC-ND 4.0
- Companion papers (in repo academic/): Beyond the Cortical Label, methodology framework (10.5281/zenodo.18689179) · Operationalizing Narrative Entropy (Sₙ), registered pilot (10.5281/zenodo.20362901) · OPCT v2.0 protocol (10.5281/zenodo.19415236).
- A CITATION.cff file is included, so the repository surfaces a one-click "Cite this dataset" affordance.
@dataset{bulut2026op,
  author    = {Bulut, Levent},
  title     = {Objective Projection Dataset: The Bulut Doctrine Narrative Engineering Corpus},
  year      = {2026},
  publisher = {Hugging Face Datasets},
  url       = {https://huggingface.co/datasets/leventbulut/objective-projection},
  doi       = {10.57967/hf/8960},
  orcid     = {0009-0007-7500-2261}
}
References. LeDoux, J. E. (1992). Emotion and the Limbic System Concept. Concepts in Neuroscience. · Eliot, T. S. (1919). Hamlet and His Problems. · Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal.
A closing question for the comments: was there a sentence you read this year that made you aware of your own pulse changing? What physical parameter was in it — a temperature, a distance, a light? Good candidates may end up in the next version of the dataset.
