Entity Sentiment
Your task is to create a FastAPI application which will classify sentiment with respect to specific entities in a text.
For example, given the text "Google had solid Q4 2025 earnings but Microsoft's were below expectations", the system should report positive sentiment for Google and negative sentiment for Microsoft.
You will use positive, neutral, and negative as sentiment values.
Data
The data is in data.json. It is an array of samples with the following format:
[
    {
        "id": int, sample ID,
        "text": str, article text,
        "entities": [
            {
                "entity_id": int, entity ID,
                "entity_text": str, text of the entity,
                "entity_type": str, one of ["company", "location"],
                "positions": [
                    {
                        "position_text": str, text of the occurrence,
                        "length": int, length of the occurrence,
                        "offset": int, offset from the start of text
                    },
                    ... other positions
                ],
                "label": str, one of ["positive", "negative", "neutral"]
            },
            ... other entities
        ]
    },
    ... other samples
]
Each sample can have multiple entities, each with its own label, and multiple positions per entity (an entity can occur multiple times in the text).
Assignment
Create a FastAPI application that exposes a /predict endpoint. The endpoint accepts an array of samples in the same format as the example above, but without the label key:
[
    {
        "id": int, sample ID,
        "text": str, article text,
        "entities": [
            {
                "entity_id": int, entity ID,
                "entity_text": str, text of the entity,
                "entity_type": str, one of ["company", "location"],
                "positions": [
                    {
                        "position_text": str, text of the occurrence,
                        "length": int, length of the occurrence,
                        "offset": int, offset from the start of text
                    },
                    ... other positions
                ]
            },
            ... other entities
        ]
    },
    ... other inputs
]
The endpoint classifies the sentiment for every entity in every sample and returns an array with one object per sample, in the following shape:
[
    {
        "id": int, sample ID,
        "entities": [
            {
                "entity_id": int, entity ID,
                "entity_text": str, text of the entity,
                "classification": str, one of ["positive", "negative", "neutral"]
            },
            ...
        ]
    },
    ... other outputs
]
Data Preparation
Examine the data and create an understanding of the dataset. We are interested in data hygiene practices and general good data science practices.
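One possible starting point for the hygiene checks is to verify that each annotated position actually matches the text and to look at the label distribution. The sketch below assumes that `offset` and `length` count Python string characters (the brief does not say whether they are characters or bytes); the function names `find_issues` and `label_counts` are hypothetical.

```python
from collections import Counter

ALLOWED_LABELS = {"positive", "negative", "neutral"}


def find_issues(sample):
    """Return a list of human-readable problems found in one sample."""
    issues = []
    text = sample["text"]
    for entity in sample["entities"]:
        prefix = f"sample {sample['id']} entity {entity['entity_id']}"
        label = entity.get("label")
        if label not in ALLOWED_LABELS:
            issues.append(f"{prefix}: unexpected label {label!r}")
        if not entity["positions"]:
            issues.append(f"{prefix}: no positions")
        for pos in entity["positions"]:
            start = pos["offset"]
            end = start + pos["length"]
            # Assumption: offsets index characters of the Python string.
            if text[start:end] != pos["position_text"]:
                issues.append(
                    f"{prefix}: offset {start} gives {text[start:end]!r}, "
                    f"expected {pos['position_text']!r}"
                )
    return issues


def label_counts(samples):
    """Count labels over all entities to reveal class imbalance."""
    return Counter(
        entity["label"] for sample in samples for entity in sample["entities"]
    )
```

After loading `data.json` with `json.load`, calling `find_issues` on every sample and `label_counts` on the full list gives a quick picture of misaligned annotations and class imbalance.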
Classification
Create a system which will perform the classification. We do not expect perfect metrics or extensive experiments—the point of this assignment is not to spend days training—but we do expect a well-reasoned approach which can deal with all the different edge cases this task has. Make sure to note the best metrics you achieve. Feel free to use any framework, architecture, and approach.
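Two small building blocks may be useful here, whatever model is chosen. The first wraps every occurrence of a target entity in marker tokens so that a text classifier knows which entity is being asked about; this is one common option, not the required approach, and the marker strings `[TGT]` and `[/TGT]` are arbitrary assumptions. The second computes macro-averaged F1 over the three labels, one reasonable metric to report when classes are imbalanced. Both function names are hypothetical.

```python
LABELS = ("positive", "neutral", "negative")


def mark_entity(text, positions, open_tag="[TGT]", close_tag="[/TGT]"):
    """Wrap each occurrence of one entity in marker tokens."""
    marked = text
    # Go right to left so insertions do not shift offsets still to be used.
    for pos in sorted(positions, key=lambda p: p["offset"], reverse=True):
        start = pos["offset"]
        end = start + pos["length"]
        marked = (
            marked[:start] + open_tag + marked[start:end] + close_tag + marked[end:]
        )
    return marked


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1; a class with no support scores 0."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)
```

Because each entity is marked separately, a sample with several entities yields one model input per entity, which keeps the per-entity labels from interfering with each other.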
API and Docker
Create the FastAPI application, a Dockerfile for the application, and a docker-compose.yml file so that we can simply run the application with docker compose up.
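Once the container is up, a small client script can confirm that the endpoint responds with the expected shape. The sketch below uses only the standard library; the URL `http://localhost:8000/predict` is an assumption about the published port, and `build_payload` and `post_predict` are hypothetical helper names.

```python
import json
import urllib.request


def build_payload():
    """Build a one-sample request body from the example sentence in the brief."""
    text = "Google had solid Q4 2025 earnings but Microsoft's were below expectations"

    def entity(entity_id, name):
        return {
            "entity_id": entity_id,
            "entity_text": name,
            "entity_type": "company",
            "positions": [
                {
                    "position_text": name,
                    "length": len(name),
                    "offset": text.index(name),
                }
            ],
        }

    return [
        {"id": 1, "text": text, "entities": [entity(1, "Google"), entity(2, "Microsoft")]}
    ]


def post_predict(payload, url="http://localhost:8000/predict"):
    """POST the payload as JSON and return the decoded response.

    The default URL is an assumption; adjust it to the port mapped in
    docker-compose.yml.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the stack running, `post_predict(build_payload())` should return a list containing one object with `id` 1 and two classified entities.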
Deliverables and Documentation
Share a link to your GitHub repository or a zipped folder containing:
- The Python code (the application itself, any data analysis, preprocessing, training, and evaluation scripts)
- The Dockerfile and docker-compose.yml
- Short documentation
Documentation
The documentation should not be a formal report; it can be just structured notes. The goal is for us to understand your process, the decisions you made, why you made them, and what worked and what did not. We want to see how you approach a relatively open-ended problem and what solution you come up with. If you know your solution has shortcomings or edge cases it cannot deal with, note them.