ryeyoo/sentimentizer-encoder

	---
	language: en
	license: mit
	tags:
	- sentiment-analysis
	- text-classification
	- encoder
	library_name: sentimentizer
	task: text-classification
	---
	# Sentimentizer ENCODER Sentiment Model
	## Description

	A Transformer encoder for sentiment classification built on pre-trained GloVe embeddings. The model uses multi-head self-attention with positional encodings and a classification (CLS) token to produce a sentiment score; scores above 0.5 indicate positive sentiment and scores below 0.5 indicate negative sentiment.
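To make the data flow concrete, here is a toy, pure-Python sketch of how a CLS token, positional encodings, and multi-head attention can combine into a single score. It is not the model's actual code: the sinusoidal encodings, the identity query/key/value projections, the missing feed-forward layers, and the names `sinusoidal_positions` and `cls_attention_score` are all assumptions made for illustration. The published weights are a `.pth` state dictionary, so the real implementation is presumably a PyTorch module with learned projections.

```python
import math
import random


def sinusoidal_positions(seq_len, dim):
    """Fixed sinusoidal positional encodings (an assumed choice, not confirmed by the model card)."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cls_attention_score(embeddings, cls_vector, out_weights, num_heads=4):
    """Toy forward pass: prepend CLS, add positions, let CLS attend per head, apply a sigmoid.

    Simplifications (assumptions): identity Q/K/V projections, one layer,
    no feed-forward block, no layer normalization.
    """
    dim = len(cls_vector)
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads

    # Sequence is [CLS, token_1, ..., token_n] with positional encodings added.
    seq = [cls_vector] + embeddings
    positions = sinusoidal_positions(len(seq), dim)
    seq = [[x + p for x, p in zip(row, prow)] for row, prow in zip(seq, positions)]

    attended = []
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        query = seq[0][lo:hi]  # only the CLS position is used as the query
        logits = [
            sum(q * k for q, k in zip(query, row[lo:hi])) / math.sqrt(head_dim)
            for row in seq
        ]
        weights = softmax(logits)
        for d in range(lo, hi):
            attended.append(sum(w * row[d] for w, row in zip(weights, seq)))

    # Linear classification head on the attended CLS vector, then a sigmoid.
    logit = sum(w * a for w, a in zip(out_weights, attended))
    return 1.0 / (1.0 + math.exp(-logit))


# Random stand-ins for GloVe vectors (8 dimensions here instead of 100).
random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
score = cls_attention_score(tokens, cls_vector=[0.0] * 8, out_weights=[0.1] * 8)
print(f"toy score: {score:.4f}")
```

The sigmoid at the end is what keeps the toy score in the open interval (0, 1), which is consistent with the 0.5 decision threshold used in the usage example.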

	## Training Data

	Trained on reviews from the [Yelp Open Dataset](https://www.yelp.com/dataset), using pre-trained GloVe Wiki-Gigaword-100 embeddings (100 dimensions). Reviews are tokenized with a custom dictionary (20k-token vocabulary, minimum token frequency of 3) and padded or truncated to 200 tokens.
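The preprocessing described above can be sketched with the standard library alone. This is a stand-in for illustration, not the package's Gensim-based tokenizer: the helper names `build_vocab` and `encode`, the reserved ids for padding and unknown tokens, and the choice of right-padding are all assumptions.

```python
from collections import Counter

PAD_ID = 0  # assumed id used for padding
UNK_ID = 1  # assumed id for tokens outside the vocabulary


def build_vocab(token_lists, max_size=20_000, min_freq=3):
    """Keep the most frequent tokens that appear at least `min_freq` times."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    kept = [tok for tok, n in counts.most_common() if n >= min_freq][:max_size]
    # Ids 0 and 1 are reserved, so real tokens start at 2.
    return {tok: i + 2 for i, tok in enumerate(kept)}


def encode(tokens, vocab, max_len=200):
    """Map tokens to ids, truncate to `max_len`, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


docs = [["good", "food"], ["good", "service"], ["good", "food"], ["bad", "food"]]
vocab = build_vocab(docs, max_size=10, min_freq=3)
print(encode(["good", "bad", "food"], vocab, max_len=5))
```

Every encoded review has exactly `max_len` ids, which is what lets reviews of different lengths be stacked into one fixed-shape batch.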

	## Usage

	```python
	from sentimentizer.config import DriverConfig, weights_path_for
	from sentimentizer.hf import download_weights
	from sentimentizer.models.encoder import get_trained_model
	from sentimentizer.tokenizer import get_trained_tokenizer

	# Download weights + dictionary from Hugging Face Hub
	weights_path = weights_path_for("encoder")
	download_weights(
	    "encoder",
	    weights_path,
	    dict_path=DriverConfig.files.dictionary_file_path,
	)

	# Load the trained model and tokenizer
	model = get_trained_model(device="cpu")
	tokenizer = get_trained_tokenizer()

	# Tokenize a review and run inference
	token_ids = tokenizer.tokenize_text("amazing food great service")
	score = model.predict(token_ids)
	print(f"Sentiment score: {score.item():.4f}")  # >0.5 = positive, <0.5 = negative
	```
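If a label is more convenient than a raw score, the threshold from the comment above can be wrapped in a small helper. The function `label_from_score` is hypothetical and not part of `sentimentizer`; the model card does not say how a score of exactly 0.5 should be treated, so this sketch arbitrarily labels it negative.

```python
def label_from_score(score, threshold=0.5):
    """Map a sentiment score to a label; scores at the threshold count as negative (an assumption)."""
    return "positive" if score > threshold else "negative"


print(label_from_score(0.91))
print(label_from_score(0.12))
```

With the usage example above, this would be called as `label_from_score(score.item())`.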

	## Files

	- `encoder_weights.pth` — Model state dictionary
	- `yelp.dictionary` — Gensim dictionary for tokenization