BiTimeBERT
BiTimeBERT is pretrained on the New York Times Annotated Corpus using two temporal objectives: TAMLM (Time-Aware Masked Language Modeling) and DD (Document Dating). Note that the DD task uses month-level temporal granularity, classifying documents into 246 month labels spanning the corpus timeline, so the `seq_relationship` head outputs 246-class temporal predictions instead of the standard 2-class NSP predictions.
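A minimal sketch of how a month-level DD label index could be derived from a date, assuming the 246 labels enumerate the corpus months in chronological order (the NYT Annotated Corpus spans January 1987 through June 2007, which is exactly 246 months; the exact label ordering used by the released model is an assumption here):

```python
def month_to_label(year: int, month: int, start_year: int = 1987, start_month: int = 1) -> int:
    """Map a (year, month) pair to a 0-based month index on the corpus timeline.

    Assumes labels run chronologically from January 1987 (index 0)
    to June 2007 (index 245).
    """
    return (year - start_year) * 12 + (month - start_month)


print(month_to_label(1987, 1))  # first corpus month -> 0
print(month_to_label(2007, 6))  # last corpus month -> 245
```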
## Official Paper Citation & Links

**BiTimeBERT: Extending Pre-Trained Language Representations with Bi-Temporal Information**
Jiexin Wang, Adam Jatowt, Masatoshi Yoshikawa, Yi Cai
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, July 2023

- ACM Digital Library (official publication): https://dl.acm.org/doi/10.1145/3539618.3591686
- Code repository (GitHub): https://github.com/WangJiexin/BiTimeBERT
## Model Details
| Property | Value |
|---|---|
| Base Model | bert-base-cased |
| Pretraining Tasks | TAMLM + DD |
| Temporal Granularity | Month-level |
| DD Labels | 246 month classes |
| Training Corpus | NYT Annotated Corpus |
| Framework | PyTorch / Transformers |
| Language | English |
## How to Load This Model
⚠️ **Important: Custom Loading Required**

Because of the modified `seq_relationship` head (246 temporal classes instead of the standard 2-class NSP head), you cannot load this model with the default `from_pretrained()` alone. Use the helper function below to load BiTimeBERT:
```python
import torch
import torch.nn as nn
from transformers import BertForPreTraining, BertTokenizer, BertConfig
from huggingface_hub import hf_hub_download
import safetensors.torch as safetensors_lib


def load_bitembert(model_id="JasonWang1/BiTimeBERT", device=None, num_temporal_labels=246):
    if device is None:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Load config and tokenizer
    config = BertConfig.from_pretrained(model_id)
    tokenizer = BertTokenizer.from_pretrained(model_id)

    # Load the backbone; the checkpoint's 246-class DD head does not match
    # the standard 2-class NSP head, so size mismatches are ignored here
    model = BertForPreTraining.from_pretrained(
        model_id,
        config=config,
        ignore_mismatched_sizes=True,
    )

    # Replace the NSP head with a DD head of the correct dimension
    model.cls.seq_relationship = nn.Linear(config.hidden_size, num_temporal_labels)

    # Download the checkpoint and copy in the DD head weights
    weights_path = hf_hub_download(repo_id=model_id, filename="model.safetensors")
    state_dict = safetensors_lib.load_file(weights_path, device="cpu")
    if "cls.seq_relationship.weight" in state_dict:
        model.cls.seq_relationship.weight.data = state_dict["cls.seq_relationship.weight"]
        model.cls.seq_relationship.bias.data = state_dict["cls.seq_relationship.bias"]

    model.eval()
    return model.to(device), tokenizer


# ================= Usage =================
model, tokenizer = load_bitembert("JasonWang1/BiTimeBERT")
```
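Once loaded, the model can be used for document dating: run a forward pass and take the argmax over the 246 `seq_relationship_logits`. A minimal sketch, where `predict_month` is a hypothetical helper and the January-1987 label origin is an assumption about the corpus timeline, not something confirmed by this card:

```python
import torch


def label_to_month(label: int, start_year: int = 1987):
    """Convert a 0-based DD class index back to a (year, month) pair.

    Assumes labels run chronologically from January 1987 (index 0).
    """
    years, months = divmod(label, 12)
    return start_year + years, months + 1


@torch.no_grad()
def predict_month(model, tokenizer, text, device="cpu"):
    # Tokenize, run the model, and argmax the 246-way DD logits
    inputs = tokenizer(text, return_tensors="pt", truncation=True).to(device)
    outputs = model(**inputs)
    label = outputs.seq_relationship_logits.argmax(dim=-1).item()
    return label_to_month(label)
```

For example, `predict_month(model, tokenizer, "The city council met yesterday ...")` would return a `(year, month)` tuple for the predicted publication month.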