---
title: Entity Sentiment Classification
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# Entity Sentiment Classification

Classify sentiment (positive, neutral, negative) for named entities in news article text using fine-tuned DistilBERT models.

Three classification modes:
- **marker**: entity wrapped in `[E]...[/E]` special tokens, single-sequence input
- **qa_m** (question-answering, multi-class): "What do you think of the sentiment of {entity}?"
- **qa_b** (question-answering, binary): three hypotheses per entity, argmax of P(yes)

Plus a **fastText** baseline using marker-mode text.
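
The sketch below illustrates how the three input formats could be built from one sentence. It is not the code used in the notebooks: the function names are hypothetical, the `offset`/`length` fields follow the request format used by the API, and the hypothesis wording for **qa_b** is an assumption (only the **qa_m** question is taken from this README).

```python
LABELS = ("positive", "neutral", "negative")


def mark_entity(text, positions):
    """Wrap every mention in [E]...[/E].

    Mentions are processed right to left so that inserting the markers
    does not shift the offsets of mentions that are still to be wrapped.
    """
    for pos in sorted(positions, key=lambda p: p["offset"], reverse=True):
        start, end = pos["offset"], pos["offset"] + pos["length"]
        text = text[:start] + "[E]" + text[start:end] + "[/E]" + text[end:]
    return text


def qa_m_input(text, entity):
    """One (question, context) pair; the model chooses among three classes."""
    return (f"What do you think of the sentiment of {entity}?", text)


def qa_b_inputs(text, entity):
    """Three (hypothesis, context) pairs, one per label.

    The hypothesis template is assumed for illustration only.
    """
    return [(f"The sentiment of {entity} is {label}.", text) for label in LABELS]


def qa_b_decide(p_yes):
    """Return the label whose hypothesis received the highest P(yes)."""
    best = max(range(len(LABELS)), key=lambda i: p_yes[i])
    return LABELS[best]


text = "Google had solid Q4 2025 earnings but Microsoft's were not great."
marked = mark_entity(text, [{"offset": 0, "length": 6}])
print(marked)
print(qa_m_input(text, "Google")[0])
print(qa_b_decide([0.81, 0.12, 0.07]))  # made-up probabilities
```
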

**Report:** [report.pdf](report.pdf).

**Live demo:** https://huggingface.co/spaces/lamossta/entity-sentiment-classification

**Logs:** https://telemetry.betterstack.com/team/t529434/tail?s=2383648

## Setup

### 1. Environment Variables

Create a `.env` file in the project root:

```
# HF_TOKEN is not needed for inference
HF_TOKEN=<your huggingface token>
BETTERSTACK_SOURCE_TOKEN=<your betterstack token>
```

### 2. Run

```bash
docker compose up
```

This will install dependencies, download models from HuggingFace, and start the backend and frontend. Everything is accessible on port **7860**:
- Frontend: http://localhost:7860
- API: http://localhost:7860/api/

## API Endpoints

| Endpoint           | Method | Description |
|--------------------|---|---|
| `/predict`         | POST | Classify entities using the marker model |
| `/predict-all-models` | POST | Classify entities using all available models |
| `/health`          | GET | Health check |
| `/docs`            | GET | Interactive Swagger UI API docs |


## Sending Requests

### Request Format

```json
[
  {
    "id": 0,
    "text": "Google had solid Q4 2025 earnings but Microsoft's were not great.",
    "entities": [
      {
        "entity_id": 0,
        "entity_text": "Google",
        "entity_type": "company",
        "positions": [
          {"position_text": "Google", "length": 6, "offset": 0}
        ]
      },
      {
        "entity_id": 1,
        "entity_text": "Microsoft",
        "entity_type": "company",
        "positions": [
          {"position_text": "Microsoft", "length": 9, "offset": 38}
        ]
      }
    ]
  }
]
```
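
Writing offsets by hand is error-prone, so it can help to compute them. The following is a minimal sketch that builds a request item by searching the text for each entity string. It assumes `offset` is a 0-based character index and `length` is a character count; `find_positions` and `build_item` are hypothetical helper names, not part of this project.

```python
import json


def find_positions(text, mention):
    """Return every occurrence of `mention` in `text` as a position dict."""
    positions = []
    start = text.find(mention)
    while start != -1:
        positions.append(
            {"position_text": mention, "length": len(mention), "offset": start}
        )
        start = text.find(mention, start + len(mention))
    return positions


def build_item(item_id, text, entities):
    """Build one request item; `entities` is a list of (text, type) tuples."""
    return {
        "id": item_id,
        "text": text,
        "entities": [
            {
                "entity_id": i,
                "entity_text": name,
                "entity_type": etype,
                "positions": find_positions(text, name),
            }
            for i, (name, etype) in enumerate(entities)
        ],
    }


text = "Google had solid Q4 2025 earnings but Microsoft's were not great."
payload = [build_item(0, text, [("Google", "company"), ("Microsoft", "company")])]

# Sanity check: each position must slice back to the entity string.
for entity in payload[0]["entities"]:
    for pos in entity["positions"]:
        end = pos["offset"] + pos["length"]
        assert text[pos["offset"]:end] == pos["position_text"]

print(json.dumps(payload, indent=2))
```

Plain substring search also matches mentions inside longer words, so real input may need tokenization or an NER step to find positions.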

### Response Format

```json
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
```
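
A response in this shape can be flattened into a lookup keyed by item and entity. This is a sketch only: the body is the example above pasted as a string, and `by_entity` and `counts` are illustrative names.

```python
import json
from collections import Counter

# Example response body, copied from the format shown above.
response_body = """
[
  {
    "id": 0,
    "entities": [
      {"entity_id": 0, "entity_text": "Google", "classification": "positive"},
      {"entity_id": 1, "entity_text": "Microsoft", "classification": "negative"}
    ]
  }
]
"""

results = json.loads(response_body)

# Map (item id, entity id) -> predicted label for easy joins with the request.
by_entity = {
    (item["id"], ent["entity_id"]): ent["classification"]
    for item in results
    for ent in item["entities"]
}

# Label distribution over all entities in the response.
counts = Counter(by_entity.values())

for (item_id, entity_id), label in sorted(by_entity.items()):
    print(f"item {item_id}, entity {entity_id}: {label}")
```
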

### Local

```bash
curl -X POST http://localhost:7860/api/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST http://localhost:7860/api/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```
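
The same request can be sent from Python using only the standard library. This is a sketch under stated assumptions: it targets the local URL from the `curl` example above, `build_request` and `classify` are hypothetical helpers, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "http://localhost:7860/api/predict"


def build_request(payload, url=API_URL):
    """Create a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(payload, url=API_URL, timeout=30):
    """Send the payload and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the stack started with `docker compose up`):
#
#   with open("sample_input.json", encoding="utf-8") as f:
#       print(classify(json.load(f)))
```

Passing `url="http://localhost:7860/api/predict-all-models"` to `classify` targets the all-models endpoint instead.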

### HuggingFace Spaces

```bash
curl -X POST https://<your-space>.hf.space/predict \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

```bash
curl -X POST https://<your-space>.hf.space/predict-all-models \
  -H "Content-Type: application/json" \
  -d @sample_input.json
```

## Notebooks

- [`notebooks/data_preprocessing_analysis.ipynb`](notebooks/data_preprocessing_analysis.ipynb): data hygiene checks on `data/data_raw.json`
- [`notebooks/data_augmentation_analysis.ipynb`](notebooks/data_augmentation_analysis.ipynb): article length and label distribution analysis
- [`notebooks/data_splits_analysis.ipynb`](notebooks/data_splits_analysis.ipynb): train/val/test splitting strategy
- [`notebooks/train_marker.ipynb`](notebooks/train_marker.ipynb): fine-tunes marker mode
- [`notebooks/train_qa_m.ipynb`](notebooks/train_qa_m.ipynb): fine-tunes QA-M mode
- [`notebooks/train_qa_b.ipynb`](notebooks/train_qa_b.ipynb): fine-tunes QA-B mode
- [`notebooks/train_fasttext.ipynb`](notebooks/train_fasttext.ipynb): trains fastText baseline