marcovise
/

TextEmbedding3SmallSentimentHead

marcovise committed on Aug 20, 2025

Commit 297586a · verified · 1 Parent(s): cae8c4c

Upload README.md with huggingface_hub

Files changed (1)

README.md +75 -0

README.md ADDED

	@@ -0,0 +1,75 @@

+---
+license: mit
+tags:
+- sentiment-analysis
+- text-classification
+- openai-embeddings
+- pytorch
+pipeline_tag: text-classification
+library_name: transformers
+---
+# TextEmbedding3SmallSentimentHead
+A sentiment classifier head that runs on top of embeddings from OpenAI's `text-embedding-3-small` model.
+## Model Description
+- **What this is**: A compact PyTorch classifier head trained on top of `text-embedding-3-small` (1536-dim) to predict sentiment: negative, neutral, positive.
+- **Data**: Preprocessed from the [Kaggle Sentiment Analysis Dataset](https://www.kaggle.com/datasets/abhi8923shriv/sentiment-analysis-dataset).
+- **Metrics (val)**: **F1 macro ≈ 0.89**, **Accuracy ≈ 0.89** on a held-out validation split.
+- **Architecture**: Simple MLP head (256 hidden units, dropout 0.2), trained for 5 epochs with Adam.
+## Input/Output
+- **Input**: Float32 tensor of shape `[batch, 1536]` (OpenAI text-embedding-3-small embeddings).
+- **Output**: Logits over 3 classes. Argmax → {0: negative, 1: neutral, 2: positive}.
+## Usage
+```python
+from transformers import AutoModel
+import torch
+# Load model
+model = AutoModel.from_pretrained(
+    "marcovise/TextEmbedding3SmallSentimentHead",
+    trust_remote_code=True,  # executes the custom model code shipped in this repo
+).eval()
+# Placeholder input: replace with real 1536-dim text-embedding-3-small embeddings
+embeddings = torch.randn(4, 1536)  # batch of 4 examples
+# Predict sentiment
+with torch.no_grad():
+    logits = model(inputs_embeds=embeddings)["logits"]  # [batch, 3]
+    predictions = logits.argmax(dim=1)  # [batch]
+    # 0=negative, 1=neutral, 2=positive
+print(predictions)  # e.g. tensor([1, 0, 2, 1]); values vary because the input above is random
+```
+## Training Details
+- **Training data**: Kaggle Sentiment Analysis Dataset
+- **Preprocessing**: Text → OpenAI embeddings → 3-class labels (stored as {negative: 0.0, neutral: 0.5, positive: 1.0}; the model predicts the class indices 0, 1, 2 listed under Input/Output)
+- **Architecture**: Linear(1536 → 256) → ReLU → Dropout(0.2) → Linear(256 → 3 classes)
+- **Optimizer**: Adam (lr=1e-3, weight_decay=1e-4)
+- **Loss**: CrossEntropyLoss with label smoothing (0.05)
+- **Epochs**: 5
+## Intended Use
+- Quick, lightweight sentiment classification for short text once embeddings are available.
+- Best suited to text similar to the training distribution; the reported validation metrics were measured on a held-out split of the same dataset.
+## Limitations
+- Trained on a specific sentiment dataset; may have domain bias.
+- Requires OpenAI text-embedding-3-small embeddings as input.
+- Not intended for safety-critical applications; evaluate on your own data before production use.
+- May reflect biases present in the training data.
+## License
+MIT
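
The architecture and training settings listed under Training Details in the README above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the class name `SentimentHead`, the constants, and the random stand-in data are all hypothetical, and the sketch assumes the head is a single hidden `Linear` layer trained on integer class labels.

```python
import torch
from torch import nn

EMBED_DIM = 1536    # text-embedding-3-small size, per the README
HIDDEN_DIM = 256    # hidden units, per the README
NUM_CLASSES = 3
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


class SentimentHead(nn.Module):
    """Hypothetical re-creation of the head; the repo's real class may differ."""

    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, HIDDEN_DIM),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(HIDDEN_DIM, NUM_CLASSES),
        )

    def forward(self, inputs_embeds: torch.Tensor) -> torch.Tensor:
        return self.net(inputs_embeds)  # logits, shape [batch, 3]


torch.manual_seed(0)
head = SentimentHead()

# Optimizer and loss with the hyperparameters stated in the README.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss(label_smoothing=0.05)

# Random stand-ins for a batch of embeddings and integer class labels.
embeddings = torch.randn(8, EMBED_DIM)
labels = torch.randint(0, NUM_CLASSES, (8,))

# One training step.
head.train()
optimizer.zero_grad()
logits = head(embeddings)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()

# Inference: switch off dropout, then map argmax indices to label names.
head.eval()
with torch.no_grad():
    pred_ids = head(embeddings).argmax(dim=1)
pred_labels = [ID2LABEL[i] for i in pred_ids.tolist()]
```

With real data, `embeddings` would hold precomputed `text-embedding-3-small` vectors and the training step would run inside a loop over batches for the 5 epochs the README mentions.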