ojhfklsjhl committed
Commit edb3859 · verified · 1 parent: e5492d3

Update README.md

Files changed (1): README.md +3 −6

README.md CHANGED
@@ -101,14 +101,12 @@ print(f"Text: {text}")
 print(f"Embedding (first 10 dimensions): {cls_embedding[:10].tolist()}")
 ```
 
-### A discussion about model choice
+### A note about model choice
 
 Even though NoLBERT has the advantage of no lookahead and lookback bias, researchers should carefully consider their model choice on a case-by-case basis, especially for long texts.
 
 In particular, there is a bias–performance trade-off between NoLBERT or other custom small models (or simpler NLP methods, e.g., BoW, Word2Vec, etc.) versus large industrial-grade language models. On one hand, a BERT-like custom information-leakage-free model avoids temporal inconsistencies by design. On the other hand, these models lack the ability to process long texts due to limited context windows, and their output text representations are often of lower quality compared to large models trained on unconstrained data.
 
-% and is not verifiable from the input text,
-
 The advantage of avoiding temporal biases is pronounced in tasks where models must predict outcomes that go beyond the information explicitly stated in the text, such as forecasting stock price reactions from earnings call transcripts, despite the trade-off of having less precise text representations. However, for in-context information retrieval tasks such as summarization, classification, and other NLP tasks based on given precise guidelines, the risk of information leakage from the model’s out-of-context knowledge base is limited (with careful prompting and verification, or by using methods like RAG). Therefore, large, highly performant models may be preferable.
 
 
@@ -119,10 +117,9 @@ If you use this model in your research, please cite:
 ```
 @misc{nolbert,
   author = {Ali Kakhbod, Peiyao Li},
-  title = {NoLBert: A Time-Stamped Pre-Trained LLM},
+  title = {NoLBERT: A No Lookahead(back) Foundational Language Model},
   year = {2025},
-  publisher = {Hugging Face},
-  journal = {Hugging Face Model Hub},
+  journal = {NeurIPS 2025 (GenAI in Finance)},
   howpublished = {\url{https://huggingface.co/alikLab/NoLBERT}},
 }
 ```
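The updated section cites limited context windows as the main drawback of BERT-style models for long texts such as earnings call transcripts. A common workaround is to split a long document into overlapping windows that each fit the model's context, embed each window, and pool the results. The sketch below shows only the chunking step; the whitespace "tokenizer" and the 512/128 window sizes are placeholder assumptions for illustration, not NoLBERT specifics — a real pipeline would use the model's own tokenizer.

```python
def chunk_tokens(tokens, max_len=512, stride=128):
    """Split a token list into overlapping windows of at most `max_len`.

    Consecutive windows overlap by `stride` tokens so that text straddling
    a chunk boundary appears whole in at least one window.
    """
    if max_len <= stride:
        raise ValueError("max_len must exceed stride")
    step = max_len - stride  # how far each window advances
    chunks = []
    for start in range(0, max(len(tokens) - stride, 1), step):
        chunks.append(tokens[start:start + max_len])
    return chunks


# Example: a hypothetical 1200-token transcript (whitespace tokens stand in
# for real subword tokens) split into 512-token windows with 128 overlap.
transcript = [f"tok{i}" for i in range(1200)]
chunks = chunk_tokens(transcript, max_len=512, stride=128)
print(len(chunks), [len(c) for c in chunks])  # → 3 [512, 512, 432]
```

Each chunk would then be embedded separately (e.g. via the CLS vector shown earlier in the README), and the per-chunk vectors averaged or otherwise pooled into one document representation.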