RoBERTa-based Eye-Tracking (ET) Feature Generator

Overview

This repository contains the weights, tokenizer, and architecture for a custom regression model based on roberta-base. It is designed to predict 5 distinct eye-tracking (ET) features directly from text inputs. This model was trained to serve as the ET generator component required to replicate and extend the GazeReward framework.

Reference

This model replicates the ET generator mentioned in the following work:

"Through ablation studies we test our framework with different integration methods, LLMs, and ET generator models..." (Lopez-Cardona et al., "Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models")

Model Architecture

Base Model: roberta-base
Custom Head: A linear layer that outputs 5 continuous ET features.
Implementation: The exact architecture is defined in the accompanying model.py file.
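To make the description above concrete, a roberta-base encoder with a linear regression head could look roughly like the sketch below. It is a hypothetical stand-in, not the contents of model.py. The class name RobertaRegressionSketch, the per-token output of shape (batch, seq_len, 5), and the way predict_mask is applied (zeroing masked positions) are all assumptions. The encoder is randomly initialized from a config so the sketch runs offline. A real setup would start from the pretrained roberta-base weights.

```python
from typing import Optional

import torch
from torch import nn
from transformers import RobertaConfig, RobertaModel


class RobertaRegressionSketch(nn.Module):
    """Hypothetical roberta-base encoder + linear head predicting ET features per token.

    This is NOT the repository's model.py; it only illustrates the described design.
    """

    def __init__(self, config: Optional[RobertaConfig] = None, num_features: int = 5):
        super().__init__()
        # Randomly initialized encoder. For real training you would instead use
        # RobertaModel.from_pretrained("roberta-base", add_pooling_layer=False).
        self.encoder = RobertaModel(config or RobertaConfig(), add_pooling_layer=False)
        # Assumed head: one linear layer mapping each token's hidden state to num_features values.
        self.head = nn.Linear(self.encoder.config.hidden_size, num_features)

    def forward(self, input_ids, attention_mask, predict_mask=None):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        preds = self.head(hidden)  # shape: (batch, seq_len, num_features)
        if predict_mask is not None:
            # Assumption: positions with predict_mask == 0 (special tokens, padding) are zeroed out.
            preds = preds * predict_mask.unsqueeze(-1).to(preds.dtype)
        return preds
```

With a tiny config such as RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64), a forward pass on a (2, 6) batch of token IDs returns a tensor of shape (2, 6, 5). The actual model.py may instead use predict_mask only when computing the loss.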

Training Data

The model was fine-tuned using eye-tracking data from:

ZuCo 2.0 Dataset (CC BY-NC 4.0)
Provo Corpus

How to Use & Test

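To use this model, download the weights (.safetensors file) and the custom architecture script (model.py). The inference code imports RobertaRegressionModel from model.py, so the script must be importable, for example by placing it in the same directory as your own script. The weights must be loaded with the safetensors library. The tokenizer is included in this repository and can be loaded directly from the Hub.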
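If you would rather fetch model.py from the Hub than copy it by hand, one option is to download it with hf_hub_download and import it with Python's importlib, as sketched below. This assumes model.py is stored in the same repository as the weights; the helper name load_module_from_path is hypothetical.

```python
import importlib.util
import sys
from types import ModuleType


def load_module_from_path(path: str, name: str = "model") -> ModuleType:
    """Import a Python source file from an arbitrary path and register it under `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so that a later `from model import ...` finds this module.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module
```

For example, `model_module = load_module_from_path(hf_hub_download(repo_id="skboy/et_prediction_2", filename="model.py"))` would make `model_module.RobertaRegressionModel` available, and a subsequent `from model import RobertaRegressionModel` would also succeed because the module is registered under the name "model".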

The script below downloads the weights, loads them together with the tokenizer, and runs a quick inference test to verify that everything works. It also applies the required masking for the special tokens <s> and </s>, which occupy the first and last positions of the tokenized sentence.

```python
# File: test_inference.py
# Downloads the custom ET generator weights and tokenizer from the Hugging Face Hub,
# loads the weights with safetensors, and runs a test inference.
# Requires model.py (which defines RobertaRegressionModel) to be importable,
# e.g. placed in the same directory as this script.

import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from model import RobertaRegressionModel


def run_quick_test(repo_id="skboy/et_prediction_2", filename="et_predictor2_seed123.safetensors"):
    """Download the weights, build the custom model, tokenize a sample sentence, and print the output tensor."""
    # Fetch the .safetensors file (cached locally after the first download).
    weights_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Build the architecture from model.py and load the trained weights into it.
    model = RobertaRegressionModel()
    state_dict = load_file(weights_path, device="cpu")
    model.load_state_dict(state_dict)
    model.eval()

    # The tokenizer is stored in the same Hub repository as the weights.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)

    sample_text = "This is a test sentence for eye-tracking feature generation."
    inputs = tokenizer(sample_text, return_tensors="pt")

    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    # Exclude the special tokens from prediction: <s> is the first position and,
    # for a single unpadded sequence, </s> is the last position.
    predict_mask = attention_mask.clone()
    predict_mask[0, 0] = 0
    predict_mask[0, -1] = 0

    with torch.no_grad():
        output = model(input_ids=input_ids, attention_mask=attention_mask, predict_mask=predict_mask)

    print(f"Output shape: {output.shape}")
    print(f"Output tensor:\n{output}")


if __name__ == "__main__":
    run_quick_test()
```
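The masking in the test script relies on there being exactly one unpadded sentence, so </s> is always the last position. For a padded batch, a small helper along the following lines could zero the first token and the last real token of each row instead. The function name build_predict_mask is hypothetical. It assumes right padding (the default for the RoBERTa tokenizer) and single-sentence inputs, where <s> and </s> are the only special tokens besides padding.

```python
import torch


def build_predict_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Copy attention_mask and zero <s> (first position) and </s> (last real token) in each row.

    Assumes right padding, so the last real token sits at index length - 1.
    """
    predict_mask = attention_mask.clone()
    lengths = attention_mask.sum(dim=1)  # number of non-padding tokens per row
    rows = torch.arange(attention_mask.size(0))
    predict_mask[:, 0] = 0  # <s> is always the first token
    predict_mask[rows, lengths - 1] = 0  # </s> is the last non-padding token
    return predict_mask
```

For example, after `inputs = tokenizer(texts, padding=True, return_tensors="pt")`, the mask would be `build_predict_mask(inputs["attention_mask"])`. Padding positions stay 0 because they are already 0 in the attention mask. Passing `return_special_tokens_mask=True` to the tokenizer is an alternative way to locate the special tokens.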
