Earnings-Call Word2Vec
Word2Vec word embeddings (gensim, CBOW, 300 dimensions, window 5) trained on the
question-and-answer portions of quarterly earnings-call transcripts. The repository
holds one model per year (2022-2025); each year continues training from the previous
year's model. The 2025 bigram and trigram phrase models are included so that new text
can be preprocessed into the same multi-word tokens that appear in the vocabulary
(multi-word terms are joined with an underscore, e.g. supply_chain).
These embeddings are general purpose. Measuring corporate culture (Li, Mai, Shen, and Yan, 2021) is one downstream use; the vectors can be applied to any task over earnings-call or similar business text.
Files
| File | Contents |
|---|---|
| word2vec_YYYY.kv (+ .kv.vectors.npy) | KeyedVectors for year YYYY (2022-2025): word vectors + vocabulary |
| phrases_bigram_2025.mod | gensim Phrases model that joins two-word phrases |
| phrases_trigram_2025.mod | gensim Phrases model that joins three-word phrases |
Usage
from huggingface_hub import hf_hub_download
from gensim.models import KeyedVectors
# KeyedVectors.load needs the .vectors.npy file alongside the .kv file
kv_path = hf_hub_download("maifeng/earnings-call-word2vec", "word2vec_2025.kv")
hf_hub_download("maifeng/earnings-call-word2vec", "word2vec_2025.kv.vectors.npy")
kv = KeyedVectors.load(kv_path)
kv.most_similar("innovation")
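Because the repository holds one model per year, it can be useful to check how a word's neighbourhood differs between two years. The following is a minimal sketch, not part of the published pipeline: `neighbor_overlap` is a hypothetical helper that computes the Jaccard overlap of the top-n neighbours of a word in two loaded KeyedVectors objects. It relies only on `key_to_index` and `most_similar`.

```python
def neighbor_overlap(kv_a, kv_b, word, topn=20):
    """Jaccard overlap between the top-n neighbours of `word` in two models.

    Hypothetical helper for illustration. Returns None if `word` is missing
    from either vocabulary.
    """
    if word not in kv_a.key_to_index or word not in kv_b.key_to_index:
        return None
    # most_similar returns a list of (token, cosine_similarity) pairs
    near_a = {token for token, _ in kv_a.most_similar(word, topn=topn)}
    near_b = {token for token, _ in kv_b.most_similar(word, topn=topn)}
    union = near_a | near_b
    return len(near_a & near_b) / len(union) if union else 0.0
```

For example, after loading word2vec_2022.kv into `kv_2022` in the same way as `kv` above, `neighbor_overlap(kv_2022, kv, "innovation")` gives a value between 0 and 1. A low value suggests the word's usage context shifted, though it is only a rough indicator.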
To tokenize new text the same way before looking up vectors, apply the phrase models:
from huggingface_hub import hf_hub_download
from gensim.models.phrases import Phrases
bigram = Phrases.load(hf_hub_download("maifeng/earnings-call-word2vec", "phrases_bigram_2025.mod"))
trigram = Phrases.load(hf_hub_download("maifeng/earnings-call-word2vec", "phrases_trigram_2025.mod"))
# Apply the bigram model first, then run the trigram model on its output
tokens = trigram[bigram["the supply chain was disrupted".split()]]
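Not every token produced this way is guaranteed to be in the vocabulary, and KeyedVectors raises a KeyError for unknown keys. The sketch below shows one way to handle that: keep the in-vocabulary tokens, average their vectors, and report the rest. `embed_tokens` is a hypothetical helper, and mean pooling is an assumption chosen for simplicity, not a method described in the reference. It also assumes the input has been preprocessed consistently with the training text.

```python
import numpy as np

def embed_tokens(tokens, kv):
    """Average the vectors of the tokens found in the vocabulary.

    Hypothetical helper for illustration. Returns (mean_vector, oov_tokens);
    mean_vector is None when none of the tokens is known.
    """
    known = [t for t in tokens if t in kv.key_to_index]
    oov = [t for t in tokens if t not in kv.key_to_index]
    if not known:
        return None, oov
    # kv[token] returns the vector for that token as a NumPy array
    return np.mean([kv[t] for t in known], axis=0), oov
```

With `kv` and `tokens` from the snippets above, `vec, oov = embed_tokens(tokens, kv)` yields a single vector for the sentence plus the list of tokens that were skipped.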
Method and reference
The training pipeline (preprocessing, phrase detection, Word2Vec, seed-word dictionary
expansion) is available as the lmsy_w2v_rfs package.
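The idea of seed-word dictionary expansion can be illustrated directly on the published vectors. The following is a rough sketch under stated assumptions and not necessarily the procedure implemented in lmsy_w2v_rfs: `expand_seed_words` is a hypothetical helper that ranks vocabulary terms by similarity to a set of seed words using gensim's `most_similar` with a list of positive keys.

```python
def expand_seed_words(kv, seeds, topn=50):
    """Return up to `topn` vocabulary terms closest to a set of seed words.

    Hypothetical helper for illustration. Seeds that are missing from the
    vocabulary are ignored.
    """
    known = [s for s in seeds if s in kv.key_to_index]
    if not known:
        return []
    # With several positive keys, candidates are ranked by cosine similarity
    # to the combined (averaged) seed vectors
    candidates = kv.most_similar(positive=known, topn=topn)
    return [token for token, _ in candidates]
```

For instance, `expand_seed_words(kv, ["innovation", "creativity"], topn=30)` returns candidate terms for an innovation-related dictionary. The seed words here are placeholders, and the resulting list would normally need manual review before use.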
Kai Li, Feng Mai, Rui Shen, and Xinyan Yan, "Measuring Corporate Culture Using Machine Learning," The Review of Financial Studies, 2021. DOI: 10.1093/rfs/hhaa079.
License
Released under CC-BY-NC-4.0: non-commercial reuse and redistribution are permitted with attribution to the reference above.