RoBERTa-based Eye-Tracking (ET) Feature Generator
Overview
This repository contains the weights, tokenizer, and architecture for a custom regression model based on roberta-base. It is designed to predict 5 distinct eye-tracking (ET) features directly from text inputs. This model was trained to serve as the ET generator component required to replicate and extend the GazeReward framework.
Reference
This model replicates the ET generator mentioned in the following work:
"Through ablation studies we test our framework with different integration methods, LLMs, and ET generator models..." (Lopez-Cardona et al., "SEEING EYE TO AI: HUMAN ALIGNMENT VIA GAZE-BASED RESPONSE REWARDS FOR LARGE LANGUAGE MODELS")
Model Architecture
- Base Model: roberta-base
- Custom Head: A linear layer that outputs 5 continuous ET features.
- Implementation: The exact architecture is defined in the accompanying model.py file.
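To make the encoder-plus-head layout concrete, here is a minimal sketch of how such a model could be wired. It is not the code in model.py: the class name `SketchETRegressor`, the per-token output shape, and the way `predict_mask` zeroes out masked positions are all assumptions made for illustration. A tiny randomly initialised RoBERTa encoder is used so the sketch runs without downloading anything.

```python
import torch
from torch import nn
from transformers import RobertaConfig, RobertaModel


class SketchETRegressor(nn.Module):
    """Hypothetical stand-in for the class in model.py; not the released code."""

    def __init__(self, encoder: RobertaModel, num_features: int = 5):
        super().__init__()
        self.encoder = encoder
        # Linear head mapping each token's hidden state to the ET features.
        self.head = nn.Linear(encoder.config.hidden_size, num_features)

    def forward(self, input_ids, attention_mask, predict_mask=None):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        preds = self.head(hidden)  # (batch, seq_len, num_features)
        if predict_mask is not None:
            # Assumption: positions with predict_mask == 0 are zeroed out.
            preds = preds * predict_mask.unsqueeze(-1).to(preds.dtype)
        return preds


# Tiny random encoder so the sketch runs offline; a real setup would
# presumably wrap RobertaModel.from_pretrained("roberta-base") instead.
tiny_config = RobertaConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=1,
    num_attention_heads=2, intermediate_size=64, max_position_embeddings=40,
)
sketch_model = SketchETRegressor(RobertaModel(tiny_config)).eval()

input_ids = torch.tensor([[0, 10, 11, 12, 2]])  # <s> ... </s>
attention_mask = torch.ones_like(input_ids)
predict_mask = attention_mask.clone()
predict_mask[0, 0] = 0   # mask <s>
predict_mask[0, -1] = 0  # mask </s>

with torch.no_grad():
    out = sketch_model(input_ids, attention_mask, predict_mask)
print(out.shape)
```

Under these assumptions the output has one 5-dimensional vector per token, with the rows for the two special tokens set to zero. The released model may treat `predict_mask` differently, so check model.py before relying on this behaviour.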
Training Data
The model was fine-tuned using eye-tracking data from:
- ZuCo 2.0 Dataset (CC BY-NC 4.0)
- Provo Corpus
How to Use & Test
To use this model, download the weights (.safetensors) and the custom architecture script (model.py). Place model.py next to your own script so that the import of RobertaRegressionModel resolves. The weights must be loaded with the safetensors library. The tokenizer is included in this repository and can be loaded directly from the Hub.
Below is a complete script to download the model, load the weights and tokenizer, and run a quick inference test to verify everything works correctly. The code includes the required masking for special tokens.
# File: test_inference.py
# Downloads the custom ET generator model and tokenizer from the Hugging Face Hub,
# loads them using safetensors, and runs a test inference.
import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from model import RobertaRegressionModel  # custom architecture from model.py


def run_quick_test(repo_id="skboy/et_prediction_2", filename="et_predictor2_seed123.safetensors"):
    # Downloads weights, initializes the custom model, applies the tokenizer
    # from the same repo, and prints the output tensor.
    weights_path = hf_hub_download(repo_id=repo_id, filename=filename)

    model = RobertaRegressionModel()
    state_dict = load_file(weights_path, device="cpu")
    model.load_state_dict(state_dict)
    model.eval()

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    sample_text = "This is a test sentence for eye-tracking feature generation."
    inputs = tokenizer(sample_text, return_tensors="pt")
    input_ids = inputs["input_ids"]
    attention_mask = inputs["attention_mask"]

    # Mask out the special tokens: <s> at the first position, </s> at the last.
    predict_mask = attention_mask.clone()
    predict_mask[0, 0] = 0
    predict_mask[0, -1] = 0

    with torch.no_grad():
        output = model(input_ids=input_ids, attention_mask=attention_mask, predict_mask=predict_mask)

    print(f"Output shape: {output.shape}")
    print(f"Output tensor:\n{output}")


if __name__ == "__main__":
    run_quick_test()
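The test script masks positions 0 and -1 by index, which covers a single unpadded sentence. For a padded batch, the closing `</s>` token is no longer at the last position of every row, so the mask has to be derived from the token ids instead. The helper below is a sketch of one way to do that; `build_predict_mask` is a hypothetical name, and the special-token ids (`<s>`=0, `<pad>`=1, `</s>`=2) are assumed to match the roberta-base vocabulary used by this repository's tokenizer.

```python
import torch

# Assumed special-token ids for the roberta-base vocabulary: <s>=0, <pad>=1, </s>=2.
ROBERTA_SPECIAL_IDS = (0, 1, 2)


def build_predict_mask(input_ids: torch.Tensor,
                       attention_mask: torch.Tensor,
                       special_ids=ROBERTA_SPECIAL_IDS) -> torch.Tensor:
    """Return a mask that is 1 for real tokens and 0 for special or padded positions."""
    is_special = torch.zeros_like(input_ids, dtype=torch.bool)
    for token_id in special_ids:
        is_special |= input_ids == token_id
    return attention_mask * (~is_special).to(attention_mask.dtype)


# Two sequences of different lengths; the second is padded with <pad> (id 1).
input_ids = torch.tensor([[0, 713, 16, 10, 2],
                          [0, 713, 2, 1, 1]])
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 0, 0]])

mask = build_predict_mask(input_ids, attention_mask)
print(mask.tolist())
```

For the example batch this keeps three tokens in the first row and one in the second. An alternative that avoids hard-coding ids is to call the tokenizer with `return_special_tokens_mask=True` and invert the returned mask.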