Huzaifa367 committed
Commit d0ca212 · verified · 1 Parent(s): f8f0c53

Upload 8 files

Files changed (8)
  1. Dockerfile +61 -0
  2. README (1).md +11 -0
  3. README.Docker.md +65 -0
  4. compose.yaml +49 -0
  5. dockerignore +34 -0
  6. main.py +106 -0
  7. requirements.txt +8 -0
  8. test_main.py +70 -0
Dockerfile ADDED
@@ -0,0 +1,61 @@
+ # Comments are provided throughout this file to help you get started.
+ # If you need more help, visit the Dockerfile reference guide at
+ # https://docs.docker.com/go/dockerfile-reference/
+
+ # Want to help us make this template better? Share your feedback here: https://forms.gle/ybq9Krt8jtBL3iCk7
+
+ ARG PYTHON_VERSION=3.11.9
+ FROM python:${PYTHON_VERSION}-slim AS base
+
+ # Prevents Python from writing .pyc files.
+ ENV PYTHONDONTWRITEBYTECODE=1
+
+ # Keeps Python from buffering stdout and stderr to avoid situations where
+ # the application crashes without emitting any logs due to buffering.
+ ENV PYTHONUNBUFFERED=1
+
+ WORKDIR /app
+
+ # Create a non-privileged user that the app will run under.
+ # See https://docs.docker.com/go/dockerfile-user-best-practices/
+ ARG UID=10001
+ RUN adduser \
+     --disabled-password \
+     --gecos "" \
+     --home "/nonexistent" \
+     --shell "/sbin/nologin" \
+     --no-create-home \
+     --uid "${UID}" \
+     appuser
+
+ # Download dependencies as a separate step to take advantage of Docker's caching.
+ # Leverage a cache mount to /root/.cache/pip to speed up subsequent builds.
+ # Leverage a bind mount to requirements.txt to avoid having to copy it
+ # into this layer.
+ RUN --mount=type=cache,target=/root/.cache/pip \
+     --mount=type=bind,source=requirements.txt,target=requirements.txt \
+     python -m pip install -r requirements.txt
+
+ # Switch to the non-privileged user to run the application.
+ USER appuser
+
+ # Set the TRANSFORMERS_CACHE environment variable.
+ ENV TRANSFORMERS_CACHE=/tmp/.cache/huggingface
+
+ # Create the cache folder with appropriate permissions.
+ RUN mkdir -p $TRANSFORMERS_CACHE && chmod -R 777 $TRANSFORMERS_CACHE
+
+ # Set the NLTK data directory.
+ ENV NLTK_DATA=/tmp/nltk_data
+
+ # Create the NLTK data directory with appropriate permissions.
+ RUN mkdir -p $NLTK_DATA && chmod -R 777 $NLTK_DATA
+
+ # Copy the source code into the container.
+ COPY . .
+
+ # Expose the port that the application listens on.
+ EXPOSE 7860
+
+ # Run the application.
+ CMD uvicorn main:app --host 0.0.0.0 --port 7860
README (1).md ADDED
@@ -0,0 +1,11 @@
+ ---
+ title: Text Classification API
+ emoji: 🐢
+ colorFrom: red
+ colorTo: gray
+ sdk: docker
+ pinned: false
+ license: apache-2.0
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
README.Docker.md ADDED
@@ -0,0 +1,65 @@
+ # Text Classification API
+ **FastAPI, Docker, and Hugging Face Transformers**\
+ This API provides text classification capabilities using a pre-trained model for sentiment analysis. It allows users to analyze the sentiment of text inputs and obtain the corresponding sentiment labels.
+ - The API is built using the Hugging Face `transformers` library.
+ - It uses the following pre-trained transformer model from Hugging Face:
+   - `cardiffnlp/twitter-roberta-base-sentiment-latest`
+ - It classifies text as `positive`, `negative`, or `neutral`.
+
+ ## Table of Contents
+ - [Text Classification API](#text-classification-api)
+   - [Table of Contents](#table-of-contents)
+   - [Introduction](#introduction)
+   - [Installation](#installation)
+   - [Usage](#usage)
+   - [Documentation](#documentation)
+   - [Building and Running the Docker Container](#building-and-running-the-docker-container)
+   - [Interacting with the API](#interacting-with-the-api)
+   - [Acknowledgments](#acknowledgments)
+   - [License](#license)
+
+ ## Introduction
+ This API is built using FastAPI and leverages a pre-trained sentiment analysis model from the Hugging Face model hub. It preprocesses the input text and passes it through the model to classify the sentiment as positive, negative, or neutral.
+
+ ## Installation
+ To install and run the API locally, follow these steps:
+
+ 1. Clone this repository to your local machine.
+ 2. Ensure you have Docker installed.
+ 3. Change the port to 8000 in the Dockerfile.
+ 4. Build the Docker container using the provided Dockerfile.
+ 5. Run the Docker container.
+
+ ## Usage
+ To use the API, send HTTP requests to the appropriate endpoints. The API provides the following endpoints:
+
+ - `GET /`: Welcome endpoint; returns a greeting message.
+ - `POST /analyze/{text}`: Analyze endpoint; classifies the sentiment of the provided text.
+
+ ## Documentation
+ The API is documented using FastAPI's automatic documentation features. You can access the documentation through the Swagger UI or ReDoc interface; simply navigate to the appropriate URL after starting the API server.
+
+ - **Swagger UI**: `http://localhost:8000/docs`
+ - **ReDoc**: `http://localhost:8000/redoc`
+
+ ## Building and Running the Docker Container
+ To build and run the Docker container, follow these steps:
+ 1. Navigate to the folder in which your FastAPI app resides.
+ 2. Build a Docker image using the following command:
+ ```
+ docker build -t text-classification-api .
+ ```
+ 3. Create a Docker container from the built image:
+ ```
+ docker run -d -p 8000:8000 text-classification-api
+ ```
+ 4. The API will be available at `http://localhost:8000`.
+ 5. The API documentation will be available at `http://localhost:8000/docs` or `http://localhost:8000/redoc`.
+
+ ## Interacting with the API
+ Once the API is running, you can interact with it using HTTP requests.
+
+ ## Acknowledgments
+ This API was built with inspiration from various open-source projects and libraries. Special thanks to the developers and contributors of FastAPI, Hugging Face Transformers, and NLTK.
+
+ ## License
+ This project is licensed under the [Apache license version 2.0](LICENSE).
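Note that the analyze endpoint expects its input as a JSON body matching the app's `TextInput` model, even though the route path literally contains `{text}`. A minimal client sketch using only the standard library; the helper name, base URL, and sample text are illustrative, not part of this repository:

```python
import json
import urllib.request

def build_analyze_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST request for the /analyze/{text} endpoint.

    The route keeps the literal "{text}" path segment; the actual input
    travels in the JSON body, matching the TextInput model.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/analyze/{{text}}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_analyze_request("http://localhost:8000", "I love this product!")
print(req.get_method())  # POST
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) returns the pipeline's JSON output once the server is running.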
compose.yaml ADDED
@@ -0,0 +1,49 @@
+ # Comments are provided throughout this file to help you get started.
+ # If you need more help, visit the Docker Compose reference guide at
+ # https://docs.docker.com/go/compose-spec-reference/
+
+ # Here the instructions define your application as a service called "server".
+ # This service is built from the Dockerfile in the current directory.
+ # You can add other services your application may depend on here, such as a
+ # database or a cache. For examples, see the Awesome Compose repository:
+ # https://github.com/docker/awesome-compose
+ services:
+   server:
+     build:
+       context: .
+     ports:
+       - 8000:8000
+
+ # The commented out section below is an example of how to define a PostgreSQL
+ # database that your application can use. `depends_on` tells Docker Compose to
+ # start the database before your application. The `db-data` volume persists the
+ # database data between container restarts. The `db-password` secret is used
+ # to set the database password. You must create `db/password.txt` and add
+ # a password of your choosing to it before running `docker compose up`.
+ #     depends_on:
+ #       db:
+ #         condition: service_healthy
+ #   db:
+ #     image: postgres
+ #     restart: always
+ #     user: postgres
+ #     secrets:
+ #       - db-password
+ #     volumes:
+ #       - db-data:/var/lib/postgresql/data
+ #     environment:
+ #       - POSTGRES_DB=example
+ #       - POSTGRES_PASSWORD_FILE=/run/secrets/db-password
+ #     expose:
+ #       - 5432
+ #     healthcheck:
+ #       test: [ "CMD", "pg_isready" ]
+ #       interval: 10s
+ #       timeout: 5s
+ #       retries: 5
+ # volumes:
+ #   db-data:
+ # secrets:
+ #   db-password:
+ #     file: db/password.txt
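As committed, the compose file publishes host port 8000 to container port 8000, while the Dockerfile's CMD starts uvicorn on 7860 (the README's installation steps have you change that to 8000). If you instead keep the Dockerfile as-is, a ports mapping along these lines, shown here as an illustrative tweak rather than part of this commit, would connect the two:

```yaml
services:
  server:
    build:
      context: .
    ports:
      # host:container; the container side must match uvicorn's --port
      - 8000:7860
```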
dockerignore ADDED
@@ -0,0 +1,34 @@
+ # Include any files or directories that you don't want to be copied to your
+ # container here (e.g., local build artifacts, temporary files, etc.).
+ #
+ # For more help, visit the .dockerignore file reference guide at
+ # https://docs.docker.com/go/build-context-dockerignore/
+
+ **/.DS_Store
+ **/__pycache__
+ **/.venv
+ **/.classpath
+ **/.dockerignore
+ **/.env
+ **/.git
+ **/.gitignore
+ **/.project
+ **/.settings
+ **/.toolstarget
+ **/.vs
+ **/.vscode
+ **/*.*proj.user
+ **/*.dbmdl
+ **/*.jfm
+ **/bin
+ **/charts
+ **/docker-compose*
+ **/compose*
+ **/Dockerfile*
+ **/node_modules
+ **/npm-debug.log
+ **/obj
+ **/secrets.dev.yaml
+ **/values.dev.yaml
+ LICENSE
+ README.md
main.py ADDED
@@ -0,0 +1,106 @@
+ from contextlib import asynccontextmanager
+ from fastapi import FastAPI, HTTPException
+ from pydantic import BaseModel, ValidationError
+ from fastapi.encoders import jsonable_encoder
+
+ # TEXT PREPROCESSING
+ # --------------------------------------------------------------------
+ import re
+ import string
+ import nltk
+ nltk.download('punkt')
+ nltk.download('wordnet')
+ nltk.download('omw-1.4')
+ from nltk.stem import WordNetLemmatizer
+
+ # Function to remove URLs from text
+ def remove_urls(text):
+     return re.sub(r'http[s]?://\S+', '', text)
+
+ # Function to remove punctuation from text
+ def remove_punctuation(text):
+     regular_punct = string.punctuation
+     return str(re.sub(r'['+regular_punct+']', '', str(text)))
+
+ # Function to convert the text into lower case
+ def lower_case(text):
+     return text.lower()
+
+ # Function to lemmatize text
+ def lemmatize(text):
+     wordnet_lemmatizer = WordNetLemmatizer()
+     tokens = nltk.word_tokenize(text)
+     lemma_txt = ''
+     for w in tokens:
+         lemma_txt = lemma_txt + wordnet_lemmatizer.lemmatize(w) + ' '
+     return lemma_txt
+
+ def preprocess_text(text):
+     # Preprocess the input text
+     text = remove_urls(text)
+     text = remove_punctuation(text)
+     text = lower_case(text)
+     text = lemmatize(text)
+     return text
+
+ # Load the model using the FastAPI lifespan event so that the model is loaded once at startup for efficiency
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     # Load the model from the Hugging Face transformers library
+     from transformers import pipeline
+     global sentiment_task
+     sentiment_task = pipeline("sentiment-analysis", model="cardiffnlp/twitter-roberta-base-sentiment-latest", tokenizer="cardiffnlp/twitter-roberta-base-sentiment-latest")
+     yield
+     # Clean up the model and release the resources
+     del sentiment_task
+
+ description = """
+ ## Text Classification API
+ This app shows the sentiment of the text (positive, negative, or neutral).
+ Check out the docs for the `/analyze/{text}` endpoint below to try it out!
+ """
+
+ # Initialize the FastAPI app
+ app = FastAPI(lifespan=lifespan, docs_url="/", description=description)
+
+ # Define the input data model
+ class TextInput(BaseModel):
+     text: str
+
+ # Define the welcome endpoint
+ @app.get('/')
+ async def welcome():
+     return "Welcome to our Text Classification API"
+
+ # Maximum allowed input text length
+ MAX_TEXT_LENGTH = 1000
+
+ # Define the sentiment analysis endpoint
+ @app.post('/analyze/{text}')
+ async def classify_text(text_input: TextInput):
+     try:
+         # Convert input data to a JSON-serializable dictionary
+         text_input_dict = jsonable_encoder(text_input)
+         # Validate input data using the Pydantic model
+         text_data = TextInput(**text_input_dict)  # Convert to Pydantic model
+
+         # Validate input text length
+         if len(text_input.text) > MAX_TEXT_LENGTH:
+             raise HTTPException(status_code=400, detail="Text length exceeds maximum allowed length")
+         elif len(text_input.text) == 0:
+             raise HTTPException(status_code=400, detail="Text cannot be empty")
+     except ValidationError as e:
+         # Handle validation errors
+         raise HTTPException(status_code=422, detail=str(e))
+
+     try:
+         # Perform text classification
+         return sentiment_task(preprocess_text(text_input.text))
+     except ValueError as ve:
+         # Handle value errors
+         raise HTTPException(status_code=400, detail=str(ve))
+     except Exception as e:
+         # Handle other server errors
+         raise HTTPException(status_code=500, detail=str(e))
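The preprocessing chain in main.py can be exercised in isolation. Below is a stripped-down sketch of the same regex-based cleanup steps (URL removal, punctuation removal, lower-casing) that runs without NLTK; the lemmatization step is deliberately omitted because it needs the downloaded WordNet data, and the function name and sample sentence are illustrative:

```python
import re
import string

def preprocess_text_lite(text: str) -> str:
    """Mirror main.py's cleanup steps: strip URLs, punctuation, lower-case.

    main.py additionally runs WordNet lemmatization via NLTK, which is
    skipped here to keep the sketch dependency-free.
    """
    text = re.sub(r'http[s]?://\S+', '', text)                # remove URLs
    text = re.sub(r'[' + string.punctuation + ']', '', text)  # remove punctuation
    return text.lower()                                       # lower-case

print(preprocess_text_lite("I LOVE this product! See https://example.com"))
```

Note the punctuation regex works only because `string.punctuation` happens to contain the sequence `\]`, which escapes the closing bracket inside the character class; `re.escape(string.punctuation)` would be the more defensive spelling.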
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ fastapi==0.110.1
+ nest_asyncio==1.6.0
+ nltk==3.8.1
+ pydantic==2.7.0
+ transformers==4.38.2
+ uvicorn==0.29.0
+ torch==2.2.2
+ pytest
test_main.py ADDED
@@ -0,0 +1,70 @@
+ from fastapi.testclient import TestClient
+ from main import app
+ from main import TextInput
+ from fastapi.encoders import jsonable_encoder
+
+ client = TestClient(app)
+
+ # Test the welcome endpoint
+ def test_welcome():
+     response = client.get("/")
+     assert response.status_code == 200
+     assert response.json() == "Welcome to our Text Classification API"
+
+ # Test the sentiment analysis endpoint for positive sentiment
+ def test_positive_sentiment():
+     with client:
+         # Initialize the payload as a TextInput object
+         payload = TextInput(text="I love this product! It's amazing!")
+
+         # Convert the TextInput object to a JSON-serializable dictionary
+         payload_dict = jsonable_encoder(payload)
+
+         # Send a POST request to the sentiment analysis endpoint
+         response = client.post("/analyze/{text}", json=payload_dict)
+
+         # Assert that the response status code is 200 OK
+         assert response.status_code == 200
+
+         # Assert that the sentiment returned is positive
+         assert response.json()[0]['label'] == "positive"
+
+ # Test the sentiment analysis endpoint for negative sentiment
+ def test_negative_sentiment():
+     with client:
+         # Initialize the payload as a TextInput object
+         payload = TextInput(text="I'm really disappointed with this service. It's terrible.")
+
+         # Convert the TextInput object to a JSON-serializable dictionary
+         payload_dict = jsonable_encoder(payload)
+
+         # Send a POST request to the sentiment analysis endpoint
+         response = client.post("/analyze/{text}", json=payload_dict)
+
+         # Assert that the response status code is 200 OK
+         assert response.status_code == 200
+
+         # Assert that the sentiment returned is negative
+         assert response.json()[0]['label'] == "negative"
+
+ # Test the sentiment analysis endpoint for neutral sentiment
+ def test_neutral_sentiment():
+     with client:
+         # Initialize the payload as a TextInput object
+         payload = TextInput(text="This is a neutral statement.")
+
+         # Convert the TextInput object to a JSON-serializable dictionary
+         payload_dict = jsonable_encoder(payload)
+
+         # Send a POST request to the sentiment analysis endpoint
+         response = client.post("/analyze/{text}", json=payload_dict)
+
+         # Assert that the response status code is 200 OK
+         assert response.status_code == 200
+
+         # Assert that the sentiment returned is neutral
+         assert response.json()[0]['label'] == "neutral"
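The assertions above index into the endpoint's JSON, which mirrors what the `transformers` sentiment pipeline returns: a list with one `{'label': ..., 'score': ...}` dict per input. A small helper making that shape explicit; the function name is mine and the score value is made up for illustration:

```python
def top_label(pipeline_output):
    """Return the label of the first result, as the tests do with
    response.json()[0]['label']."""
    return pipeline_output[0]['label']

# Shape returned by the sentiment pipeline; the score here is illustrative.
sample = [{'label': 'positive', 'score': 0.99}]
print(top_label(sample))  # positive
```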