---
title: Entity Sentiment Classification
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# Entity Sentiment Classification
Classify sentiment (positive, neutral, negative) for named entities in news article text using fine-tuned DistilBERT models.
Three classification modes:
- **marker**: the entity is wrapped in `[E]...[/E]` special tokens and the text is fed as a single sequence
- **qa_m**: question answering, multi-class; the text is paired with the question "What do you think of the sentiment of {entity}?"
- **qa_b**: question answering, binary; three yes/no hypotheses per entity (one per sentiment label), and the predicted label is the argmax of P(yes)

Plus a **fastText** baseline trained on marker-mode text.
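
The three input formats can be sketched in a few lines of bash. This is an illustration under stated assumptions, not the training code: the helper names (`marker_text`, `qa_m_question`, `qa_b_hypotheses`, `qa_b_argmax`) are hypothetical, the markers are assumed to be inserted with no extra spaces, and the qa_b hypothesis wording is made up for the example (only the qa_m question comes from the description above).

```bash
#!/usr/bin/env bash
# Hypothetical helpers sketching how one entity mention could become model input.
# Templates other than the qa_m question are assumptions, not the training code.

# marker: wrap the characters at [offset, offset + length) in [E]...[/E].
marker_text() {
  local text=$1 offset=$2 length=$3
  printf '%s[E]%s[/E]%s\n' \
    "${text:0:offset}" "${text:offset:length}" "${text:offset+length}"
}

# qa_m: the question that is paired with the article text.
qa_m_question() {
  printf 'What do you think of the sentiment of %s?\n' "$1"
}

# qa_b: one yes/no hypothesis per label (the wording here is an assumption).
qa_b_hypotheses() {
  local label
  for label in positive neutral negative; do
    printf 'The sentiment of %s is %s.\n' "$1" "$label"
  done
}

# qa_b decision: read "label p_yes" lines on stdin, print the label with the largest P(yes).
qa_b_argmax() {
  awk 'NR == 1 || $2 + 0 > best { best = $2 + 0; label = $1 } END { print label }'
}

text="Google had solid Q4 2025 earnings but Microsoft's were not great."
marker_text "$text" 0 6
# prints: [E]Google[/E] had solid Q4 2025 earnings but Microsoft's were not great.
qa_m_question "Google"
# prints: What do you think of the sentiment of Google?
qa_b_hypotheses "Google"   # one hypothesis per label
printf 'positive 0.12\nneutral 0.30\nnegative 0.81\n' | qa_b_argmax   # made-up scores
# prints: negative
```

Substring expansion keeps marker insertion purely positional, so it relies on the `offset`/`length` values from the request rather than searching for `entity_text`.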

**Report:** [report.pdf](report.pdf)

**Live demo:** https://huggingface.co/spaces/lamossta/entity-sentiment-classification

**Logs:** https://telemetry.betterstack.com/team/t529434/tail?s=2383648
## Setup
### 1. Environment Variables
Create a `.env` file in the project root:
```
# Optional: not needed for inference
HF_TOKEN=<your huggingface token>
BETTERSTACK_SOURCE_TOKEN=<your betterstack token>
```
### 2. Run
```bash
docker compose up
```
This will install dependencies, download models from HuggingFace, and start the backend and frontend. Everything is accessible on port **7860**:
- Frontend: http://localhost:7860
- API: http://localhost:7860/api/
## API Endpoints
Paths below are relative to the API base URL (`http://localhost:7860/api/` when running locally).

| Endpoint | Method | Description |
|---|---|---|
| `/predict` | POST | Classify entities using the marker model |
| `/predict-all-models` | POST | Classify entities using all available models |
| `/health` | GET | Health check |
| `/docs` | GET | Interactive Swagger UI API docs |
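
On first start the container downloads the models, so the API may not respond right away. One way to wait for it is to poll the `/health` endpoint. The sketch assumes the endpoint is reachable at `http://localhost:7860/api/health` (the `/api/` prefix from the Run section) and that any 2xx response means the API is ready; `wait_for_api` and its retry defaults are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the health endpoint until it answers or we give up.
# Assumes the endpoint lives under the /api/ prefix when running locally.
wait_for_api() {
  local url=${1:-http://localhost:7860/api/health}
  local tries=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= tries; i++)); do
    # -f: treat HTTP errors as failures; -sS: quiet, but still report errors
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "API is up: $url"
      return 0
    fi
    if ((i < tries)); then
      sleep "$delay"   # wait only between attempts, not after the last one
    fi
  done
  echo "API did not become healthy after $tries attempts" >&2
  return 1
}
```

Typical use: start the stack in the background with `docker compose up -d`, then call `wait_for_api` before sending requests.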
## Sending Requests
### Request Format
```json
[
  {
    "id": 0,
    "text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
    "entities": [
      {
        "entity_id": 0,
        "entity_text": "Google",
        "entity_type": "company",
        "positions": [
          {"position_text": "Google", "length": 6, "offset": 0}
        ]
      },
      {
        "entity_id": 1,
        "entity_text": "Microsoft",
        "entity_type": "company",
        "positions": [
          {"position_text": "Microsoft", "length": 9, "offset": 38}
        ]
      }
    ]
  }
]
```
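
`offset` and `length` appear to be the 0-based character index of the mention in `text` and the mention's character count (a reading inferred from the `Google` entry, which starts at offset 0). A minimal bash sketch for deriving them for the first occurrence of a mention; `mention_position` is a hypothetical helper, and character counts assume a UTF-8 locale so that `${#var}` counts characters rather than bytes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<mention> <offset> <length>" for the first
# occurrence of a mention, using 0-based character offsets.
mention_position() {
  local text=$1 mention=$2
  local prefix=${text%%"$mention"*}   # text before the first occurrence
  if [[ $prefix == "$text" ]]; then   # nothing removed: mention not found
    return 1
  fi
  printf '%s %d %d\n' "$mention" "${#prefix}" "${#mention}"
}

text="Google had solid Q4 2025 earnings but Microsoft's were not great."
mention_position "$text" "Google"      # prints: Google 0 6
mention_position "$text" "Microsoft"   # prints: Microsoft 38 9
```

Matching only the first occurrence is a simplification: since `positions` is a list, an entity mentioned several times presumably needs one entry per occurrence.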
### Response Format
```json
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
```
### Local
```bash
curl -X POST http://localhost:7860/api/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
```bash
curl -X POST http://localhost:7860/api/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
### HuggingFace Spaces
```bash
curl -X POST https://<your-space>.hf.space/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
```bash
curl -X POST https://<your-space>.hf.space/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
## Notebooks
- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb): data hygiene checks on `data/data_raw.json`
- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb): article length and label distribution analysis
- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb): train/val/test splitting strategy
- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb): fine-tunes the marker-mode model
- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb): fine-tunes the QA-M model
- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb): fine-tunes the QA-B model
- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb): trains the fastText baseline