Entity Sentiment

Your task is to create a FastAPI application which will classify sentiment with respect to specific entities in a text.

For example, given the text "Google had solid Q4 2025 earnings but Microsoft's were below expectations", the system should report that the sentiment is positive for Google but negative for Microsoft.

You will use positive, neutral, and negative as sentiment values.

Data

The data is in data.json. It is an array of samples with the following format:

[
    {
        "id": int, sample ID,
        "text": str, article text,
        "entities": [
            {
                "entity_id": int, entity ID,
                "entity_text": str, text of the entity,
                "entity_type": str, one of ["company", "location"],
                "positions": [
                    {
                        "position_text": str, text of the occurrence,
                        "length": int, length of the occurrence,
                        "offset": int, offset from the start of text
                    },
                    ... other positions
                ],
                "label": str, one of ["positive", "negative", "neutral"]
            },
            ... other entities
        ]
    },
    ... other samples
]

Each sample can have multiple entities, each with its own label, and multiple positions per entity (an entity can occur multiple times in the text).
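Because every position carries its own offset, length, and position_text, the three fields can be checked against each other before any modelling. The following is a minimal sketch of such a check, not a required part of the solution. It assumes that offset and length count characters of the Python string (not bytes or UTF-16 units), which the format description does not state; the function name find_position_mismatches is hypothetical.

```python
from typing import Any, Dict, List


def find_position_mismatches(samples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one record per position whose offset/length does not reproduce position_text.

    Assumption: offsets and lengths are measured in characters of the decoded string.
    """
    mismatches = []
    for sample in samples:
        text = sample["text"]
        for entity in sample["entities"]:
            for pos in entity["positions"]:
                start = pos["offset"]
                end = start + pos["length"]
                found = text[start:end]
                if found != pos["position_text"]:
                    mismatches.append({
                        "sample_id": sample["id"],
                        "entity_id": entity["entity_id"],
                        "expected": pos["position_text"],
                        "found": found,
                    })
    return mismatches


# Hand-made sample in the documented format (IDs and offsets are illustrative).
example = [{
    "id": 1,
    "text": "Google had solid Q4 2025 earnings but Microsoft's were below expectations",
    "entities": [
        {"entity_id": 10, "entity_text": "Google", "entity_type": "company",
         "positions": [{"position_text": "Google", "length": 6, "offset": 0}],
         "label": "positive"},
        {"entity_id": 11, "entity_text": "Microsoft", "entity_type": "company",
         "positions": [{"position_text": "Microsoft", "length": 9, "offset": 38}],
         "label": "negative"},
    ],
}]

print(find_position_mismatches(example))
```

On the real data, the same function could be called on the result of json.load applied to data.json. A non-empty result would suggest that either the offsets use a different unit or some samples need cleaning.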

Assignment

Create a FastAPI application that exposes a /predict endpoint. The endpoint accepts an array of samples in the same format as the data described above, but without the label key:

[
    {
        "id": int, sample ID,
        "text": str, article text,
        "entities": [
            {
                "entity_id": int, entity ID,
                "entity_text": str, text of the entity,
                "entity_type": str, one of ["company", "location"],
                "positions": [
                    {
                        "position_text": str, text of the occurrence,
                        "length": int, length of the occurrence,
                        "offset": int, offset from the start of text
                    },
                    ... other positions
                ]
            },
            ... other entities
        ]
    },
    ... other inputs
]

For each sample, the endpoint classifies the sentiment of every entity and returns an array with one object per sample, in the following shape:

[
    {
        "id": int, sample ID,
        "entities": [
            {
                "entity_id": int, entity ID,
                "entity_text": str, text of the entity,
                "classification": str, one of ["positive", "negative", "neutral"]
            },
            ...
        ]
    },
    ... other outputs
]

Data Preparation

Examine the data and build an understanding of the dataset. We are interested in your data hygiene and general data science practices.
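Two checks that often come first in this kind of exploration are the label distribution and a leakage-free train/test split. The sketch below is one way to do both with the standard library. It assumes the documented data.json structure; the function names and the 80/20 split are illustrative choices, not requirements. Splitting by sample rather than by entity keeps all entities of one article on the same side of the split.

```python
import random
from collections import Counter
from typing import Any, Dict, List, Tuple

Sample = Dict[str, Any]


def label_counts(samples: List[Sample]) -> Counter:
    """Count entities per (entity_type, label) pair to expose class imbalance."""
    counts: Counter = Counter()
    for sample in samples:
        for entity in sample["entities"]:
            counts[(entity["entity_type"], entity["label"])] += 1
    return counts


def split_by_sample(
    samples: List[Sample], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle whole samples and split them, so no article text appears in both parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Tiny synthetic dataset in the documented format (values are illustrative).
toy = [
    {
        "id": i,
        "text": f"Article {i}",
        "entities": [
            {"entity_id": i * 10, "entity_text": "Acme", "entity_type": "company",
             "positions": [], "label": "positive" if i % 2 == 0 else "neutral"},
        ],
    }
    for i in range(10)
]

train, test = split_by_sample(toy)
print(label_counts(toy))
print(len(train), len(test))
```

If the label distribution turns out to be heavily skewed, a stratified variant of the split may be more appropriate; this sketch does not stratify.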

Classification

Create a system which will perform the classification. We do not expect perfect metrics or extensive experiments (the point of this assignment is not to spend days training), but we do expect a well-reasoned approach that can deal with the different edge cases of this task. Make sure to note the best metrics you achieve. Feel free to use any framework, architecture, and approach.
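As an illustration only, here are two small helpers that many approaches to this task end up needing. The first marks every occurrence of one target entity in the text, so that a classifier can tell which entity the prediction is about when several appear in the same article. The second computes macro-averaged F1 over the three labels. The marker strings [TGT] and [/TGT], the function names, and the choice of macro F1 as the metric are assumptions made for this sketch; the assignment does not prescribe them.

```python
from typing import Dict, List, Sequence

LABELS = ("positive", "neutral", "negative")


def mark_entity(
    text: str,
    positions: List[Dict],
    open_tag: str = "[TGT]",
    close_tag: str = "[/TGT]",
) -> str:
    """Wrap each occurrence of the target entity in marker tags.

    Positions are processed from right to left so that inserting tags
    does not shift the offsets of occurrences that are still to be handled.
    """
    marked = text
    for pos in sorted(positions, key=lambda p: p["offset"], reverse=True):
        start = pos["offset"]
        end = start + pos["length"]
        marked = marked[:start] + open_tag + marked[start:end] + close_tag + marked[end:]
    return marked


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of per-class F1; a class with no support and no predictions scores 0."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


text = "Google beat estimates. Google shares rose."
positions = [
    {"position_text": "Google", "length": 6, "offset": 0},
    {"position_text": "Google", "length": 6, "offset": 23},
]
print(mark_entity(text, positions))
# prints: [TGT]Google[/TGT] beat estimates. [TGT]Google[/TGT] shares rose.
```

Macro F1 weights the three classes equally, which can be more informative than accuracy when one label dominates. Overlapping positions are not handled by mark_entity and would need separate treatment.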

API and Docker

Create the FastAPI application, a Dockerfile for the application, and a docker-compose.yml file so that we can simply run the application with docker compose up.

Deliverables and Documentation

Share a link to your GitHub repository or a zipped folder containing:

- The Python code (the application itself, any data analysis, preprocessing, training, and evaluation scripts)
- The Dockerfile and docker-compose.yml
- A short documentation

Documentation

The documentation should not be a formal report; it can be just structured notes. The goal is for us to understand your process, the decisions you made, why you made them, and what worked and what did not. We want to see how you approach a relatively open-ended problem and what solution you come up with. If you know your solution has shortcomings or edge cases it cannot deal with, note them.