GloVe 840B 300d — Gensim KeyedVectors Format

This is a Gensim KeyedVectors conversion of the standard GloVe 840B 300d embeddings by Pennington, Socher, & Manning (2014).

Original Source

The original GloVe model is available from Stanford NLP:

Download: https://nlp.stanford.edu/data/glove.840B.300d.zip
Project page: https://nlp.stanford.edu/projects/glove/
Paper: Pennington, J., Socher, R., & Manning, C.D. (2014). GloVe: Global Vectors for Word Representation. EMNLP 2014. https://doi.org/10.3115/v1/D14-1162

Model Details

Training corpus: Common Crawl (840 billion tokens)
Vocabulary: 2.2 million entries (cased)
Dimensions: 300
Format: Gensim KeyedVectors (.wv + .wv.vectors.npy)

Conversion

Converted from the original GloVe text format using:

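from gensim.scripts.glove2word2vec import glove2word2vec
from gensim.models import KeyedVectors

# GloVe text files lack the "<vocab_size> <dimensions>" header line that the
# word2vec text format expects; glove2word2vec writes a copy with that header.
glove2word2vec("glove.840B.300d.txt", "glove.840B.300d.w2v.txt")
model = KeyedVectors.load_word2vec_format("glove.840B.300d.w2v.txt", binary=False)
# Writes glove.840B-300d.wv plus glove.840B-300d.wv.vectors.npy
model.save("glove.840B-300d.wv")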

Usage in OCS Semantic Scoring

This is the default model for the semantic distance approach in Open Creativity Scoring (OCS). The following normalization values rescale raw cosine distances to a 1–7 range:

min: 0.6456
max: 0.9610

Calibrated in Dumas, D., Organisciak, P., & Doherty, M. (2021). Measuring divergent thinking originality with human raters and text-mining models. Psychology of Aesthetics, Creativity, and the Arts, 15(4), 645–663.
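The exact rescaling formula is not given here. Assuming plain linear min-max scaling of a cosine distance (1 minus cosine similarity) onto 1–7, with out-of-range values clamped, a minimal sketch looks like this. The function names and the clamping are assumptions, and how OCS builds the prompt and response vectors (for example, averaging word vectors or removing stopwords) is not covered.

```python
# Hypothetical sketch: cosine distance and linear min-max rescaling onto 1-7.
# Assumptions (not confirmed OCS behaviour): distance = 1 - cosine similarity,
# linear scaling between the calibrated min/max, clamping outside that range.
import math
from typing import Sequence

OCS_MIN = 0.6456  # calibrated minimum raw cosine distance (from this card)
OCS_MAX = 0.9610  # calibrated maximum raw cosine distance (from this card)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def scale_distance(dist: float, lo: float = OCS_MIN, hi: float = OCS_MAX,
                   out_lo: float = 1.0, out_hi: float = 7.0) -> float:
    """Linearly map dist from [lo, hi] onto [out_lo, out_hi]."""
    frac = (dist - lo) / (hi - lo)
    frac = min(max(frac, 0.0), 1.0)  # assumption: clamp to the calibrated range
    return out_lo + frac * (out_hi - out_lo)


if __name__ == "__main__":
    print(scale_distance(0.6456))  # 1.0
    print(scale_distance(0.9610))  # 7.0
    print(round(scale_distance(0.8033), 6))  # 4.0 (midpoint of the range)
```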

Note

Due to the large file size (~5.4 GB), the Gensim-converted model files are not hosted here. To use this model:

Download the original from Stanford NLP (link above) and convert it with the script above.
Alternatively, use the OCS Semantic Scoring Hugging Face Space, which handles model loading automatically.

For LLM-based creativity scoring (recommended for new research), see the ocsai Python package.

Hugging Face repository: massivetexts/glove-840b-gensim