---

# NoLBERT: A Time-Stamped Pre-Trained LLM

**NoLBERT** (No Lookahead(back) bias Bidirectional Encoder Representations from Transformers) is a foundational transformer-based language model trained specifically to avoid both lookahead and lookback bias. It is also pre-trained on a small, time-restricted dataset, and the model itself is small.

**<span style='color: blue;'>Lookahead</span> bias** is a fundamental challenge when researchers and practitioners use inferences from language models for forecasting. For example, when we ask a language model to infer the short-term return of a stock from a set of news articles, a concern is that the model may have been trained on data that include future information beyond the point in time when the articles were released. The task then changes from drawing return-related inference from the text to retrieving the date of the news articles and the realized returns of the particular stock shortly after that date. Consequently, such models become invalid in practice when used to predict stock returns beyond their training data's coverage period. To frame the task as one of natural language inference, we pre-train a new text encoder on data strictly from 1976 to 1995. Our model therefore exhibits no lookahead bias when backtesting trading strategies on data from 1996 onward, or when performing other time-series forecasting tasks with text data.