Sentimentizer ENCODER Sentiment Model
Description
A Transformer encoder for sentiment classification built on pre-trained GloVe embeddings. The model combines multi-head self-attention, positional encodings, and a classification ([CLS]) token to produce probabilities over sentiment labels (positive, neutral, and negative in the usage example below).
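The architecture described above can be sketched in plain Python as a toy illustration. This is not the package's implementation, and it rests on several assumptions. It uses sinusoidal positional encodings, although the model may learn its own. It also uses a single attention head with identity query/key/value projections in place of learned multi-head projections. The helper names (`sinusoidal_positional_encoding`, `self_attention`, `encode_with_cls`) are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    # Assumption: fixed sinusoidal encodings; the real model may learn its own.
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    # Simplification: one head, identity Q/K/V projections (no learned weights).
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def encode_with_cls(token_embeddings, cls_embedding):
    # Prepend the CLS vector, add positional encodings, attend over the whole
    # sequence, and return the CLS position's output for a classifier head.
    seq = [cls_embedding] + token_embeddings
    pe = sinusoidal_positional_encoding(len(seq), len(cls_embedding))
    seq = [[e + p for e, p in zip(row, pos)] for row, pos in zip(seq, pe)]
    return self_attention(seq)[0]


# Toy 4-dimensional "embeddings" standing in for 100-d GloVe vectors.
tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.4, 0.1, 0.0]]
cls_vec = encode_with_cls(tokens, [0.0, 0.0, 0.0, 0.0])
print([round(v, 3) for v in cls_vec])
```

In a trained model, a linear layer over the CLS output would map it to one score per sentiment label. Multi-head attention splits the embedding into several smaller heads, each with its own learned projections, and concatenates their outputs.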
Training Data
Trained on reviews from the Yelp Open Dataset, using the GloVe Wiki-Gigaword-100 pre-trained embeddings. Reviews are tokenized with a custom dictionary (20k-token vocabulary, minimum frequency 3) and padded or truncated to 200 tokens.
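The preprocessing described above can be sketched as follows. This version assumes a frequency-ranked vocabulary with hypothetical padding (0) and unknown-token (1) ids. It counts total token occurrences. Gensim's `Dictionary.filter_extremes(no_below=...)` instead counts the number of documents that contain a token, so the package's actual dictionary may differ. `build_vocab` and `encode` are illustrative names and are not part of `sentimentizer`.

```python
from collections import Counter

PAD_ID = 0  # hypothetical id reserved for padding
UNK_ID = 1  # hypothetical id for out-of-vocabulary tokens


def build_vocab(docs, max_size=20_000, min_freq=3):
    # Count every token occurrence across all tokenized reviews.
    counts = Counter(tok for doc in docs for tok in doc)
    # Keep the most frequent tokens that meet the minimum frequency, capped at max_size.
    kept = [tok for tok, c in counts.most_common() if c >= min_freq][:max_size]
    # Offset ids by 2 so PAD_ID and UNK_ID stay reserved.
    return {tok: i + 2 for i, tok in enumerate(kept)}


def encode(tokens, vocab, max_len=200):
    # Map tokens to ids, truncate to max_len, then right-pad with PAD_ID.
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


docs = [["great", "food"], ["great", "service"], ["great", "food", "food"]]
vocab = build_vocab(docs)
print(vocab)
print(encode(["great", "food", "terrible"], vocab, max_len=6))
```

Fixing every review at 200 ids lets reviews of different lengths be stacked into one batch tensor. The reserved padding id lets the model ignore padded positions.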
Usage
```python
from sentimentizer.hf import download_weights
from sentimentizer.config import DriverConfig, weights_path_for
from sentimentizer.models.encoder import get_trained_model
from sentimentizer.tokenizer import get_trained_tokenizer

# Download weights + dictionary from Hugging Face Hub
weights_path = weights_path_for("encoder")
download_weights(
    "encoder",
    weights_path,
    repo_id="ryeyoo/sentimentizer-encoder",
    dict_path=DriverConfig.files.dictionary_file_path,
)

# Load and run inference
model = get_trained_model(device="cpu")
tokenizer = get_trained_tokenizer()
probs = model.predict_text('amazing food great service')

# Print labels from most to least probable
for label, prob in sorted(probs.items(), key=lambda x: -x[1]):
    print(f'{label}: {prob:.4f}')
# e.g. positive: 0.8300, neutral: 0.1200, negative: 0.0500
```
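In the example output, the probabilities sum to 1, which is consistent with a softmax over per-class scores. The sketch below shows that final step. It assumes three logits in a hypothetical `negative, neutral, positive` order; the real label order and mapping live in the `sentimentizer` code. `logits_to_probs` is an illustrative name.

```python
import math

# Assumption: label order of the classifier's output logits.
LABELS = ("negative", "neutral", "positive")


def logits_to_probs(logits, labels=LABELS):
    # Numerically stable softmax, returned as a {label: probability} dict.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


probs = logits_to_probs([-1.0, 0.5, 2.0])
for label, prob in sorted(probs.items(), key=lambda x: -x[1]):
    print(f"{label}: {prob:.4f}")
```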
Files
encoder_weights.pth: model state dictionary
yelp.dictionary: Gensim dictionary used for tokenization