Earnings-Call Word2Vec

Word2Vec word embeddings (gensim, CBOW, 300 dimensions, window 5) trained on the question-and-answer portions of quarterly earnings-call transcripts. The repository holds one model per year (2022-2025); each year continues training from the previous year's model. The 2025 bigram and trigram phrase models are included so that new text can be preprocessed into the same multi-word tokens that appear in the vocabulary (multi-word terms are joined with an underscore, e.g. supply_chain).

These embeddings are general purpose. Measuring corporate culture (Li, Mai, Shen, and Yan, 2021) is one downstream use; the vectors can be applied to any task over earnings-call or similar business text.

Files

| File | Contents |
| --- | --- |
| word2vec_YYYY.kv (+ .kv.vectors.npy) | KeyedVectors for year YYYY (2022-2025): word vectors + vocabulary |
| phrases_bigram_2025.mod | gensim Phrases model that joins two-word phrases |
| phrases_trigram_2025.mod | gensim Phrases model that joins three-word phrases |

Usage

```python
from huggingface_hub import hf_hub_download
from gensim.models import KeyedVectors

REPO_ID = "maifeng/earnings-call-word2vec"

# KeyedVectors.load needs the .vectors.npy file alongside the .kv file,
# so download both; they land in the same cached snapshot directory.
kv_path = hf_hub_download(REPO_ID, "word2vec_2025.kv")
hf_hub_download(REPO_ID, "word2vec_2025.kv.vectors.npy")

kv = KeyedVectors.load(kv_path)
# Nearest neighbors as (word, cosine similarity) pairs
print(kv.most_similar("innovation"))
```
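
Because the repository holds one model per year, a reader may want a rough check of how a word's neighborhood shifts between years. The sketch below is an illustration only and not part of the repository: `neighbor_overlap` is a hypothetical helper that takes two `most_similar`-style result lists (pairs of word and similarity score) and returns the Jaccard overlap of the words, ignoring the scores.

```python
from typing import Iterable, Tuple


def neighbor_overlap(
    neighbors_a: Iterable[Tuple[str, float]],
    neighbors_b: Iterable[Tuple[str, float]],
) -> float:
    """Jaccard overlap between two neighbor lists; similarity scores are ignored."""
    words_a = {word for word, _ in neighbors_a}
    words_b = {word for word, _ in neighbors_b}
    union = words_a | words_b
    if not union:
        # Two empty lists share nothing; report zero overlap
        return 0.0
    return len(words_a & words_b) / len(union)
```

With two loaded models, say `kv_2022` and `kv_2025` (hypothetical variable names, loaded from `word2vec_2022.kv` and `word2vec_2025.kv`), a call could look like `neighbor_overlap(kv_2022.most_similar("innovation", topn=20), kv_2025.most_similar("innovation", topn=20))`. A value near 1 suggests a stable neighborhood; a value near 0 suggests the word's closest terms changed.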

To tokenize new text the same way before looking up vectors, apply the phrase models:

```python
from huggingface_hub import hf_hub_download
from gensim.models.phrases import Phrases

REPO_ID = "maifeng/earnings-call-word2vec"

bigram = Phrases.load(hf_hub_download(REPO_ID, "phrases_bigram_2025.mod"))
trigram = Phrases.load(hf_hub_download(REPO_ID, "phrases_trigram_2025.mod"))

# Apply the bigram model first, then the trigram model on its output
tokens = trigram[bigram["the supply chain was disrupted".split()]]
print(tokens)
```
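
Once text is tokenized, a common next step is to keep only the tokens that are in the vocabulary and combine their vectors. The following is a minimal sketch, assuming an unweighted mean is an acceptable representation of a passage; it is an illustration, not something the repository prescribes. `average_vector` is a hypothetical helper, and `vectors` can be any object that supports `token in vectors` and `vectors[token]`, which includes a loaded `KeyedVectors` object or a plain dict.

```python
from typing import Iterable, List, Optional


def average_vector(tokens: Iterable[str], vectors) -> Optional[List[float]]:
    """Mean of the vectors of in-vocabulary tokens, or None if no token is known."""
    # Skip tokens that have no vector instead of raising KeyError
    found = [vectors[token] for token in tokens if token in vectors]
    if not found:
        return None
    dim = len(found[0])
    return [sum(float(vec[i]) for vec in found) / len(found) for i in range(dim)]
```

With the objects from the snippets above, the call would be `average_vector(tokens, kv)`. Returning `None` for a passage with no known tokens leaves it to the caller to decide how to handle empty results.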

Method and reference

The training pipeline (preprocessing, phrase detection, Word2Vec, seed-word dictionary expansion) is available as the lmsy_w2v_rfs package.
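
To give a feel for what seed-word dictionary expansion can look like, here is a minimal sketch under stated assumptions: candidate words are ranked by cosine similarity to the mean of the seed vectors. This is one common approach and may differ from what the lmsy_w2v_rfs package actually does; `expand_seed_words` is a hypothetical helper, and `vectors` is assumed to be a plain dict mapping words to lists of floats.

```python
import math
from typing import Dict, List, Sequence, Tuple


def expand_seed_words(
    seeds: Sequence[str],
    vectors: Dict[str, Sequence[float]],
    topn: int = 10,
) -> List[Tuple[str, float]]:
    """Rank non-seed words by cosine similarity to the mean seed vector."""
    present = [seed for seed in seeds if seed in vectors]
    if not present:
        return []
    dim = len(vectors[present[0]])
    centroid = [sum(vectors[s][i] for s in present) / len(present) for i in range(dim)]

    def cosine(u: Sequence[float], v: Sequence[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    # Score every word that is not itself a seed, then keep the closest ones
    scored = [(w, cosine(centroid, vec)) for w, vec in vectors.items() if w not in seeds]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

On a loaded `KeyedVectors` object, gensim's `most_similar(positive=seeds, topn=topn)` performs a closely related ranking directly, so the pure-Python version above is mainly useful for seeing the idea spelled out.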

Kai Li, Feng Mai, Rui Shen, and Xinyan Yan, "Measuring Corporate Culture Using Machine Learning," The Review of Financial Studies, 2021. DOI: 10.1093/rfs/hhaa079.

License

Released under CC-BY-NC-4.0: non-commercial reuse and redistribution are permitted with attribution to the reference above.
