Paper: arXiv:2606.19468
Collection: Narratives in LLM Pretraining Data
Test it out for yourself: NarraBERT space

narrative-likert-roberta

RoBERTa-base fine-tuned to rate text on nine narrative dimensions via Likert-scale regression. The dimensions fall into two groups: agency (focalization, emotion, cognition, change_of_state, conflict) and setting (concreteness, temporal_grounding, spatial_grounding, sensory). The model was trained on LLM pseudo-labels from Gemma-4-31B and evaluated against held-out human gold annotations.
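The model emits a flat vector of nine scores, while the description above groups the dimensions into agency and setting. The sketch below shows one way to regroup a score vector. It assumes that the output heads follow the order in which the dimensions are listed above, which this card does not state. DIMENSION_GROUPS and group_scores are hypothetical names.

```python
# Hypothetical helper. Group and dimension names come from the model description.
# ASSUMPTION: output head i corresponds to the i-th dimension in listing order.
DIMENSION_GROUPS = {
    "agency": ["focalization", "emotion", "cognition", "change_of_state", "conflict"],
    "setting": ["concreteness", "temporal_grounding", "spatial_grounding", "sensory"],
}

# Flatten in listing order: 5 agency + 4 setting = 9 output dimensions.
DIMS = [dim for dims in DIMENSION_GROUPS.values() for dim in dims]


def group_scores(scores):
    """Map one 9-value score vector onto {group: {dimension: score}}."""
    if len(scores) != len(DIMS):
        raise ValueError(f"expected {len(DIMS)} scores, got {len(scores)}")
    named = dict(zip(DIMS, scores))
    return {group: {d: named[d] for d in dims} for group, dims in DIMENSION_GROUPS.items()}
```

For example, group_scores(list(range(9)))["setting"]["sensory"] is 8, the last column.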

Note: Full model card with training details coming soon.

Loading

Download model.pt and tokenizer/ from this repo, then:

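import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

class NarrativeRoBERTa(nn.Module):
    """RoBERTa encoder with one linear regression head per narrative dimension."""

    def __init__(self, model_name, n_dims):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        hidden = self.backbone.config.hidden_size
        # One scalar output head per dimension.
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_dims)])

    def forward(self, input_ids, attention_mask):
        # Use the final hidden state of the first token (<s>, RoBERTa's CLS equivalent) as the text representation.
        cls = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0, :]
        # Concatenate the per-head scores into a (batch, n_dims) tensor.
        return torch.cat([h(cls) for h in self.heads], dim=1)

tokenizer = AutoTokenizer.from_pretrained("tokenizer/")
# Build the architecture (this fetches the base roberta-base weights), then overwrite them with the fine-tuned checkpoint.
model = NarrativeRoBERTa("roberta-base", n_dims=9)
model.load_state_dict(torch.load("model.pt", map_location="cpu", weights_only=True))
model.eval()  # disable dropout for inference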
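A minimal inference sketch for the model loaded above. It truncates inputs to 256 tokens, matching max_len in the Config section, and labels each output column with a dimension name. Two points are assumptions this card does not confirm: that the heads follow the order of the dimensions listed in the description, and anything about the output scale (the Likert range is not stated here). The predict helper is hypothetical.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: output column i corresponds to the i-th dimension listed in the
# model description; the card does not state the head order explicitly.
DIMS = [
    "focalization", "emotion", "cognition", "change_of_state", "conflict",
    "concreteness", "temporal_grounding", "spatial_grounding", "sensory",
]


def predict(texts: Sequence[str], model, tokenizer, max_len: int = 256) -> List[Dict[str, float]]:
    """Hypothetical helper: score a batch of texts and name each output column.

    Expects the model and tokenizer from the loading snippet, with model.eval()
    already called.
    """
    import torch  # imported here so that defining the helper does not require PyTorch

    # Pad to the longest text in the batch and truncate to max_len tokens
    # (256 matches max_len in the training config).
    enc = tokenizer(
        list(texts),
        padding=True,
        truncation=True,
        max_length=max_len,
        return_tensors="pt",
    )
    # Gradients are not needed at inference time.
    with torch.no_grad():
        scores = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
    # scores has shape (batch, 9); convert each row to {dimension: score}.
    return [dict(zip(DIMS, row)) for row in scores.tolist()]
```

Usage: predict(["She opened the door and stepped into the rain."], model, tokenizer) returns one dict per input text, keyed by dimension name.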

Config

{
  "model_name": "roberta-base",
  "max_len": 256,
  "dims": [
    "temporal_sequential",
    "causal"
  ],
  "data_source": "/projects/tejo9855/Projects/llm-narrative-annotations/event_relation/outputs/google_gemma-4-31B-it/20260518_143249",
  "n_train": 6219,
  "n_val": 690,
  "val_frac": 0.1,
  "best_epoch": 4,
  "seed": 42,
  "test_f1_gold": 0.805
}
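The config does not say how the 10% validation split was drawn. As a hedged illustration only, one rule consistent with the reported counts is a seeded shuffle followed by holding out int(N * val_frac) examples. With N = 6219 + 690 = 6909 and val_frac = 0.1, this yields 690 validation and 6219 training examples. The function name train_val_split, the use of Python's random module, and the use of seed 42 for splitting are all assumptions, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(items: Sequence[T], val_frac: float = 0.1, seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hypothetical split: seeded shuffle, then hold out int(N * val_frac) examples."""
    shuffled = list(items)  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # private RNG: reproducible, no global state
    n_val = int(len(shuffled) * val_frac)  # floor, e.g. int(6909 * 0.1) = 690
    return shuffled[n_val:], shuffled[:n_val]
```

Flooring rather than rounding matters for matching the counts: rounding 690.9 would give 691 validation examples and 6218 training examples.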

Base model: FacebookAI/roberta-base

This model is part of the Narratives in LLM Pretraining Data collection: models and datasets from Characterizing Narrative Content in Web-Scale LLM Pretraining Data (NarraDolma & NarraBERT).

Paper: Characterizing Narrative Content in Web-scale LLM Pretraining Data (arXiv:2606.19468)