notbeekay committed on
Commit 047a1eb · verified · 1 Parent(s): 408de47

Upload 9 files

This is a Support Vector Regression (SVR) model whose hyperparameters were tuned with Random Search. It was selected as the best model, having the lowest errors and the highest predictive power of the models compared. It helps users predict the IMDb score of a movie they are planning to watch, so that they can decide whether to watch it.
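The training notebook (P1M2_brenda_kwan.ipynb) is too large to render here, so its exact preprocessing and search space are not visible. Because the inference notebook calls `model_svr.predict` on a raw DataFrame, the pickled object is presumably a pipeline that includes preprocessing. The sketch below shows, under stated assumptions, what tuning an SVR with random search typically looks like in scikit-learn: a `Pipeline` that scales numeric columns and one-hot encodes categorical ones, wrapped in `RandomizedSearchCV`, which samples a fixed number of hyperparameter combinations and keeps the one with the best cross-validated MAE. The column split (`NUM_COLS`, `CAT_COLS`), the parameter ranges and the helper `make_synthetic_movies` are hypothetical and not taken from the author's notebook.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Hypothetical feature split; the real notebook's feature selection is not shown in this commit.
NUM_COLS = ['year', 'votes', 'budget', 'gross', 'runtime']
CAT_COLS = ['rating', 'genre']


def make_synthetic_movies(n=200, seed=0):
    """Hypothetical stand-in for movies.csv so the sketch runs without the real data."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'year': rng.integers(1980, 2021, n),
        'votes': rng.integers(1_000, 1_000_000, n),
        'budget': rng.integers(1_000_000, 200_000_000, n),
        'gross': rng.integers(1_000_000, 900_000_000, n),
        'runtime': rng.integers(80, 200, n),
        'rating': rng.choice(['PG', 'PG-13', 'R'], n),
        'genre': rng.choice(['Action', 'Comedy', 'Drama'], n),
    })
    # Synthetic target loosely tied to runtime, plus noise
    df['score'] = 5 + 3 * (df['runtime'] - 80) / 120 + rng.normal(0, 0.3, n)
    return df


def build_search(random_state=42):
    """Randomized hyperparameter search over a preprocessing + SVR pipeline."""
    preprocess = ColumnTransformer([
        ('num', StandardScaler(), NUM_COLS),
        # handle_unknown='ignore' lets unseen categories through at inference time
        ('cat', OneHotEncoder(handle_unknown='ignore'), CAT_COLS),
    ])
    pipe = Pipeline([('prep', preprocess), ('svr', SVR())])
    param_distributions = {
        'svr__C': loguniform(1e-1, 1e2),
        'svr__gamma': loguniform(1e-3, 1e0),
        'svr__epsilon': [0.05, 0.1, 0.2],
        'svr__kernel': ['rbf'],
    }
    return RandomizedSearchCV(
        pipe,
        param_distributions=param_distributions,
        n_iter=10,                          # number of sampled candidates
        scoring='neg_mean_absolute_error',  # higher (less negative) is better
        cv=3,
        random_state=random_state,
    )


if __name__ == '__main__':
    movies = make_synthetic_movies()
    X_train, X_test, y_train, y_test = train_test_split(
        movies.drop(columns='score'), movies['score'], random_state=0)
    search = build_search().fit(X_train, y_train)
    print('best parameters:', search.best_params_)
    print('test MAE:', mean_absolute_error(y_test, search.predict(X_test)))
```

Random search evaluates `n_iter` sampled candidates instead of every grid point, which keeps the cost predictable when parameters such as `C` and `gamma` span several orders of magnitude; that is why log-uniform distributions are used for them in this sketch.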

Files changed (9)
  1. P1M2_brenda_kwan.ipynb +0 -0
  2. P1M2_brenda_kwan_inf.ipynb +213 -0
  3. app.py +10 -0
  4. eda.py +205 -0
  5. imdb.jpeg +0 -0
  6. model_svr.pkl +3 -0
  7. movies.csv +0 -0
  8. prediction.py +70 -0
  9. requirements.txt +9 -0
P1M2_brenda_kwan.ipynb ADDED
The diff for this file is too large to render.
 
P1M2_brenda_kwan_inf.ipynb ADDED
@@ -0,0 +1,213 @@
```json
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Support Vector Regression Model Inference\n",
    "--- "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import Libraries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Data manipulation\n",
    "import pandas as pd\n",
    "\n",
    "# Load model\n",
    "import pickle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Open Model\n",
    "with open('model_svr.pkl', 'rb') as file_1:\n",
    "    model_svr = pickle.load(file_1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Inference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>name</th>\n",
       " <th>rating</th>\n",
       " <th>genre</th>\n",
       " <th>year</th>\n",
       " <th>released</th>\n",
       " <th>votes</th>\n",
       " <th>director</th>\n",
       " <th>writer</th>\n",
       " <th>star</th>\n",
       " <th>country</th>\n",
       " <th>budget</th>\n",
       " <th>gross</th>\n",
       " <th>company</th>\n",
       " <th>runtime</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>Oppenheimer</td>\n",
       " <td>R</td>\n",
       " <td>History</td>\n",
       " <td>2023</td>\n",
       " <td>July 19, 2023 (United States)</td>\n",
       " <td>787446</td>\n",
       " <td>Christopher Nolan</td>\n",
       " <td>Christopher Nolan</td>\n",
       " <td>Cillian Murphy</td>\n",
       " <td>United States</td>\n",
       " <td>100000000</td>\n",
       " <td>958000000</td>\n",
       " <td>Universal Pictures</td>\n",
       " <td>189</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " name rating genre year released votes \\\n",
       "0 Oppenheimer R History 2023 July 19, 2023 (United States) 787446 \n",
       "\n",
       " director writer star country \\\n",
       "0 Christopher Nolan Christopher Nolan Cillian Murphy United States \n",
       "\n",
       " budget gross company runtime \n",
       "0 100000000 958000000 Universal Pictures 189 "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create dataframe\n",
    "data_inf = pd.DataFrame([{\n",
    "    'name': 'Oppenheimer',\n",
    "    'rating': 'R',\n",
    "    'genre': 'History',\n",
    "    'year': 2023,\n",
    "    'released': 'July 19, 2023 (United States)',\n",
    "    'votes': 787446,\n",
    "    'director': 'Christopher Nolan',\n",
    "    'writer': 'Christopher Nolan',\n",
    "    'star': 'Cillian Murphy',\n",
    "    'country': 'United States',\n",
    "    'budget': 100000000,\n",
    "    'gross': 958000000,\n",
    "    'company': 'Universal Pictures',\n",
    "    'runtime': 189\n",
    "}])\n",
    "data_inf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make IMDb score prediction using the loaded pipeline\n",
    "prediction = model_svr.predict(data_inf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[8.07123459]\n"
     ]
    }
   ],
   "source": [
    "print(prediction)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Support Vector Regression model predicts an IMDb score of 8.07/10 for Oppenheimer. This is close to the movie's actual IMDb score of 8.3/10: the absolute error on this single unseen example is about 0.23, smaller than the SVR model's calculated MAE of 0.541. One example cannot establish generalisation on its own, but the result is consistent with the model generalising to unseen movie data."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "phase1",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
```
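The 0.23 quoted in the notebook's conclusion comes from a single movie, so strictly it is an absolute error; a mean absolute error (MAE) averages absolute errors over many predictions, which is what the 0.541 figure for the SVR model summarises. A minimal stdlib sketch of that arithmetic, using the notebook's Oppenheimer numbers (the function is illustrative, not the author's code):

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred| over all pairs; for one pair it is just the absolute error."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError('inputs must be non-empty and of the same length')
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Single Oppenheimer example from the notebook: actual 8.3, predicted 8.07123459
single_error = mean_absolute_error([8.3], [8.07123459])
print(round(single_error, 2))  # 0.23
```

Comparing one absolute error with a model-level MAE says little on its own: another movie's error could land well above or below 0.541.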
app.py ADDED
@@ -0,0 +1,10 @@
```python
import streamlit as st
import eda
import prediction

page = st.sidebar.selectbox('Choose page: ', ('EDA', 'Prediction'))

if page == 'EDA':
    eda.run()
else:
    prediction.run()
```
eda.py ADDED
@@ -0,0 +1,205 @@
```python
import streamlit as st
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from PIL import Image

def run():

    # Create title
    st.title('IMDb Movie Score Prediction')

    # Create subheader
    st.subheader('Exploratory Data Analysis (EDA) to Analyse IMDb Scores of Previous Movies')

    # Insert image
    image = Image.open('imdb.jpeg')
    st.image(image, caption='This web application analyses IMDb scores of past movies and predicts IMDb scores for future/upcoming movies')

    # Create text
    st.write('This page is written by Brenda')

    # Make a straight line
    st.markdown('---')
    st.write('')  # Adds spacing line

    # Load and show dataframe
    df = pd.read_csv('movies.csv')
    st.write('### This is our dataset of previous movies:')
    st.dataframe(df)
    st.write('')
    st.write('')
    st.write('')

    # Make a barplot based on user input to view data
    st.write('### Top N Movies With Highest Scores Based on User Input')
    option = st.selectbox('Choose a column to view the top N highest mean scores', ('name', 'director', 'writer', 'genre', 'star', 'country', 'company'))
    # Select top N
    top_n = st.selectbox('Select Top N', (10, 20, 30, 40))
    # Calculate mean score based on selected column
    mean_scores = df.groupby(option)['score'].mean().sort_values(ascending=False)
    top_n_df = mean_scores.head(top_n).reset_index()
    top_n_df.columns = [option, 'mean_score']
    # Plot a barplot of top N mean movie scores based on option
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.barplot(x=option, y='mean_score', data=top_n_df, palette='Blues_d', ax=ax)
    ax.set_title(f'Top {top_n} {option.capitalize()} with Highest Mean Movie Scores')
    ax.set_xlabel(option.capitalize())
    ax.set_ylabel('Mean Score')
    ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha='right')
    plt.tight_layout()
    st.pyplot(fig)
    # Additional information: name, director, writer, genre, star, country, company vs IMDb score
    if option == 'name':
        max_score = df['score'].max()
        movie_with_max_score = df[df['score'] == max_score]['name'].iloc[0]
        min_score = df['score'].min()
        movie_with_min_score = df[df['score'] == min_score]['name'].iloc[0]
        st.write(f"The movie with the highest IMDb score is: **{movie_with_max_score}** with a score of **{max_score}**.")
        st.write(f"The movie with the lowest IMDb score is: **{movie_with_min_score}** with a score of **{min_score}**.")
    else:
        # For director, writer, genre, star, country and company, take the labels of the
        # highest and lowest group means straight from the grouped series; matching a
        # mean against individual movie scores can return the wrong name or no row at all
        max_label = mean_scores.idxmax()
        max_score = mean_scores.max()
        min_label = mean_scores.idxmin()
        min_score = mean_scores.min()
        st.write(f"The {option} with the highest mean IMDb score is: **{max_label}** with a score of **{max_score:.2f}**.")
        st.write(f"The {option} with the lowest mean IMDb score is: **{min_label}** with a score of **{min_score:.2f}**.")
    st.write('')
    st.write('')
    st.write('')

    # Make a scatterplot with regression line to display IMDb Score vs Gross Revenue
    st.write('### IMDb Score vs Gross Revenue')
    # Plot scatterplot with regression line (score vs gross)
    fig = px.scatter(
        df,
        x='gross',
        y='score',
        hover_data=['name', 'score', 'gross'],  # hover over data point
        labels={'gross': 'Gross Revenue', 'score': 'IMDb Score'},
        title='IMDb Score vs Gross Revenue',
        trendline='ols',  # add regression line
        trendline_color_override='red'
    )
    st.plotly_chart(fig)

    # Additional information: gross revenue vs IMDb score
    max_score = df['score'].max()
    movie_with_max_score = df[df['score'] == max_score]['name'].iloc[0]
    movie_with_max_score_gross = df[df['score'] == max_score]['gross'].iloc[0]
    max_gross = df['gross'].max()
    movie_with_max_gross = df[df['gross'] == max_gross]['name'].iloc[0]
    movie_with_max_gross_score = df[df['gross'] == max_gross]['score'].iloc[0]
    st.write(f"The movie with the highest IMDb score is: **{movie_with_max_score}** with a score of **{max_score}** and gross revenue of **${movie_with_max_score_gross}**.")
    st.write(f"The movie with the highest gross revenue is: **{movie_with_max_gross}** with a score of **{movie_with_max_gross_score}** and gross revenue of **${max_gross}**.")
    st.write('')
    st.write('')
    st.write('')


    # Make a scatterplot with regression line to display IMDb Score vs Runtime
    st.write('### IMDb Score vs Movie Runtime')
    # Plot scatterplot with regression line (score vs runtime)
    fig = px.scatter(
        df,
        x='runtime',
        y='score',
        hover_data=['name', 'score', 'runtime'],  # hover over data point
        labels={'runtime': 'Runtime', 'score': 'IMDb Score'},
        title='IMDb Score vs Runtime',
        trendline='ols',  # add regression line
        trendline_color_override='red'
    )
    st.plotly_chart(fig)

    # Additional information: runtime vs IMDb score
    max_score = df['score'].max()
    movie_with_max_score = df[df['score'] == max_score]['name'].iloc[0]
    movie_with_max_score_runtime = df[df['score'] == max_score]['runtime'].iloc[0]
    max_runtime = df['runtime'].max()
    movie_with_max_runtime = df[df['runtime'] == max_runtime]['name'].iloc[0]
    movie_with_max_runtime_score = df[df['runtime'] == max_runtime]['score'].iloc[0]
    st.write(f"The movie with the highest IMDb score is: **{movie_with_max_score}** with a score of **{max_score}** and runtime of **{movie_with_max_score_runtime} minutes**.")
    st.write(f"The movie with the longest runtime is: **{movie_with_max_runtime}** with a score of **{movie_with_max_runtime_score}** and runtime of **{max_runtime} minutes**.")
    st.write('')
    st.write('')
    st.write('')


    # Scatterplot of Budget vs IMDb score with Regression Line
    st.write('### IMDb Score vs Budget')
    # Minimum and maximum budget calculated to determine the range of the slider for the budget
    min_budget = int(df['budget'].min())
    max_budget = int(df['budget'].max())
    selected_budget = st.slider('Select Budget Range', min_budget, max_budget, (min_budget, max_budget))
    # Filter dataframe based on budget range selected by the user
    df_filtered = df[(df['budget'] >= selected_budget[0]) & (df['budget'] <= selected_budget[1])]

    # Plot a scatterplot with regression line of budget vs score
    fig = px.scatter(
        df_filtered,
        x='budget',
        y='score',
        hover_data=['name', 'score', 'budget'],  # hover over data point
        labels={'budget': 'Budget', 'score': 'IMDb Score'},
        title='IMDb Score vs Budget',
        trendline='ols',  # add regression line
        trendline_color_override='red'
    )
    st.plotly_chart(fig)


if __name__ == '__main__':
    run()
```
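When reporting which director, writer, star, country or company has the highest or lowest mean score, the name has to come from the grouped series itself. Matching a group mean back against individual movie scores, as in `df[df['score'] == max_score]`, returns whichever movie happens to have that exact score (possibly from another group), and raises `IndexError` when no single movie has it. The toy example below uses made-up data to show the difference; `idxmax`/`idxmin` are standard pandas `Series` methods, while `best_and_worst_by_mean` is a hypothetical helper.

```python
import pandas as pd


def best_and_worst_by_mean(df, column, target='score'):
    """Return (best_label, best_mean, worst_label, worst_mean) for groups of `column`."""
    means = df.groupby(column)[target].mean()
    return means.idxmax(), means.max(), means.idxmin(), means.min()


# Made-up data: group means are A = 8.0, B = 7.0, C = 7.5
toy = pd.DataFrame({
    'director': ['A', 'A', 'B', 'B', 'C'],
    'score': [9.0, 7.0, 8.0, 6.0, 7.5],
})

best, best_mean, worst, worst_mean = best_and_worst_by_mean(toy, 'director')
print(best, best_mean, worst, worst_mean)  # A 8.0 B 7.0

# Looking rows up by score value picks the wrong directors here
naive_best = toy.loc[toy['score'] == best_mean, 'director'].iloc[0]
naive_worst = toy.loc[toy['score'] == worst_mean, 'director'].iloc[0]
print(naive_best, naive_worst)  # B A
```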
imdb.jpeg ADDED
model_svr.pkl ADDED
@@ -0,0 +1,3 @@
```text
version https://git-lfs.github.com/spec/v1
oid sha256:4ed32c463ffb26e78a853995187069f3825027ce2bb4b15a54b8c48c42f31c66
size 322357
```
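`model_svr.pkl` is committed through Git LFS, so what Git itself stores is the three-line pointer above (`version`, `oid`, `size`), not the pickle. If the repository is cloned without LFS, or the LFS objects are never fetched, `pickle.load` in `prediction.py` receives this text and fails with an unhelpful error. A small stdlib sketch that detects the pointer layout shown here; the function names are hypothetical.

```python
LFS_VERSION_LINE = 'version https://git-lfs.github.com/spec/v1'


def parse_lfs_pointer(text):
    """Return {'oid': ..., 'size': ...} if text looks like a Git LFS pointer, else None."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != LFS_VERSION_LINE:
        return None
    # Remaining lines are "key value" pairs, e.g. "size 322357"
    fields = dict(line.split(' ', 1) for line in lines[1:] if ' ' in line)
    if 'oid' not in fields or not fields.get('size', '').isdigit():
        return None
    return {'oid': fields['oid'], 'size': int(fields['size'])}


def is_lfs_pointer_file(path):
    """True if the file at `path` is an LFS pointer rather than the real binary object."""
    with open(path, 'rb') as f:
        head = f.read(1024)  # pointer files are tiny text files
    try:
        return parse_lfs_pointer(head.decode('utf-8')) is not None
    except UnicodeDecodeError:
        return False  # binary content, e.g. an actual pickle


if __name__ == '__main__':
    pointer = (
        'version https://git-lfs.github.com/spec/v1\n'
        'oid sha256:4ed32c463ffb26e78a853995187069f3825027ce2bb4b15a54b8c48c42f31c66\n'
        'size 322357\n'
    )
    print(parse_lfs_pointer(pointer)['size'])  # 322357
```

Calling `is_lfs_pointer_file('model_svr.pkl')` before unpickling would let the app report a clear message (for example, suggesting `git lfs pull`) instead of a pickle error.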
movies.csv ADDED
The diff for this file is too large to render.
 
prediction.py ADDED
@@ -0,0 +1,70 @@
```python
import streamlit as st
import pandas as pd
import pickle

# Load the pre-trained model
with open('model_svr.pkl', 'rb') as file_1:
    model_svr = pickle.load(file_1)

def run():
    # Create title
    st.title('IMDb Movie Score Prediction')

    # Create subheader
    st.subheader('Calculate IMDb Score of Movies')

    # Create a form for input
    with st.form('form_movie_prediction'):
        # Text inputs
        name = st.text_input('Movie Name: ', value='')
        director = st.text_input('Director: ', value='')
        writer = st.text_input('Writer: ', value='')
        star = st.text_input('Star: ', value='')
        country = st.text_input('Country: ', value='')
        company = st.text_input('Production Company: ', value='')
        released = st.text_input('Date Released: ', value='')

        # Number inputs
        year = st.number_input('Release Year: ', value=2022, min_value=1900, max_value=2100)
        budget = st.number_input('Budget ($): ', value=500000000, min_value=0)
        gross = st.number_input('Gross Revenue ($): ', value=958000000, min_value=0)
        runtime = st.number_input('Runtime (minutes): ', value=189, min_value=1)
        votes = st.number_input('Votes: ', value=500000, min_value=0)

        # Categorical inputs
        rating = st.selectbox('Rating: ', ('G', 'PG', 'PG-13', 'R', 'NC-17'), index=3)
        genre = st.selectbox('Genre: ', ('Action', 'Adventure', 'Comedy', 'Drama', 'History', 'Sci-Fi', 'Thriller'), index=4)

        # Submit button
        submitted = st.form_submit_button('Predict IMDb Score')

    # Prepare the data for prediction
    data_inf = {
        'name': name,
        'rating': rating,
        'genre': genre,
        'year': year,
        'released': released,
        'votes': votes,
        'director': director,
        'writer': writer,
        'star': star,
        'country': country,
        'budget': budget,
        'gross': gross,
        'company': company,
        'runtime': runtime
    }

    data_inf = pd.DataFrame([data_inf])
    st.dataframe(data_inf)

    if submitted:
        # Predict the IMDb score of the movie entered in the form using the SVR model
        prediction = model_svr.predict(data_inf)

        st.write(f'## Predicted IMDb Score: {prediction[0]:.2f}')


if __name__ == '__main__':
    run()
```
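`model_svr.predict` is called on a one-row DataFrame assembled from user input, and a missing field or a number passed as text (for example a vote count entered as `'787446'`) is easier to catch before `predict` than inside the pipeline. The helper below is a hypothetical sketch of that validation: it checks that every expected field is present, fixes the column order to the one used by the notebook and `prediction.py` in this commit, and coerces numeric columns. Which columns count as numeric is an assumption based on the example Oppenheimer row.

```python
import pandas as pd

# Column order used when building the inference row in this commit
INFERENCE_COLUMNS = ['name', 'rating', 'genre', 'year', 'released', 'votes',
                     'director', 'writer', 'star', 'country', 'budget',
                     'gross', 'company', 'runtime']
# Assumed numeric features, inferred from the example Oppenheimer row
NUMERIC_COLUMNS = ['year', 'votes', 'budget', 'gross', 'runtime']


def build_inference_frame(record):
    """Build a one-row DataFrame in the expected column order with numeric columns coerced."""
    missing = [col for col in INFERENCE_COLUMNS if col not in record]
    if missing:
        raise KeyError(f'missing fields: {missing}')
    # Passing `columns` fixes the order and drops any unexpected keys
    frame = pd.DataFrame([record], columns=INFERENCE_COLUMNS)
    for col in NUMERIC_COLUMNS:
        # errors='raise' surfaces bad input such as 'abc' instead of passing it to the model
        frame[col] = pd.to_numeric(frame[col], errors='raise')
    return frame
```

In `prediction.py` this would take the place of the `pd.DataFrame([data_inf])` line, so malformed input fails with a clear message before the model is called.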
requirements.txt ADDED
@@ -0,0 +1,9 @@
```text
streamlit
pandas
seaborn
matplotlib
numpy
scikit-learn==1.5.1
Pillow
plotly
statsmodels
```
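`scikit-learn` is the only pinned dependency, which matters because a pickled estimator is tied to the library version that created it; recent scikit-learn releases warn when unpickling an estimator saved with a different version, and its behaviour is not guaranteed to match. A minimal stdlib sketch that compares the installed version with the pin before loading `model_svr.pkl` (the helper name is hypothetical; `importlib.metadata.version` is part of the standard library):

```python
from importlib.metadata import PackageNotFoundError, version


def check_pinned_version(package, expected):
    """Return (ok, message) comparing the installed version of `package` with the pin."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False, f'{package} is not installed (expected {expected})'
    if installed != expected:
        return False, f'{package} {installed} is installed, but the pin is {expected}'
    return True, f'{package} {installed} matches the pin'


if __name__ == '__main__':
    ok, message = check_pinned_version('scikit-learn', '1.5.1')
    print(message)
```

`prediction.py` could call this at start-up and show `st.warning(message)` when `ok` is False.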