{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# Notebook 1: Real Data Extraction and Synthetic Enrichment\n",
"\n",
"## Project\n",
"**AI-Powered Menu Decision Assistant for Restaurants**\n",
"\n",
"## Objective\n",
"This notebook prepares the dataset used in the project. It combines a small real-world benchmark dataset with synthetic restaurant data for **Trattoria Bellapasta**.\n",
"\n",
"## Workflow\n",
"This notebook performs the following steps:\n",
"1. Load a small real-world benchmark dataset from public restaurant menu references\n",
"2. Build the fictional menu for Trattoria Bellapasta\n",
"3. Generate synthetic monthly sales data\n",
"4. Generate synthetic customer reviews\n",
"5. Annotate the reviews with sentiment and theme labels\n",
"6. Export the final CSV files for Notebook 2"
],
"metadata": {
"id": "lnf7unLtv33n"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "T52Nn9X6vuNb"
},
"outputs": [],
"source": [
"# Import libraries used for dataset creation and manipulation\n",
"import pandas as pd\n",
"import numpy as np\n",
"import random\n",
"from datetime import datetime, timedelta\n",
"\n",
"# Set random seeds for reproducibility\n",
"random.seed(42)\n",
"np.random.seed(42)"
]
},
{
"cell_type": "markdown",
"source": [
"## Step 1: Loading the Real-World Benchmark Dataset\n",
"\n",
"A small real-world benchmark dataset is used to ground the project in realistic Italian restaurant menu conditions.\n",
"\n",
"This benchmark is not the main analytical dataset. \n",
"Its role is to provide realistic reference points for:\n",
"- dish categories\n",
"- menu structure\n",
"- price ranges\n",
"\n",
"The benchmark data was compiled from publicly available restaurant menu sources."
],
"metadata": {
"id": "FpHcBDdWwQ_u"
}
},
{
"cell_type": "code",
"source": [
"# Benchmark dataset removed for Hugging Face version\n",
"print(\"Benchmark dataset skipped.\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "z6eWiJxLwS2-",
"outputId": "25a2133d-5e26-4d43-a5fd-8fd44043fdff"
},
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Benchmark dataset skipped.\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 2: Inspecting the Benchmark Data\n",
"\n",
"Before building the fictional restaurant dataset, the benchmark data is inspected to confirm its structure and relevance.\n",
"\n",
"The benchmark dataset helps ensure that the synthetic restaurant is grounded in realistic:\n",
"- menu categories\n",
"- dish naming conventions\n",
"- price levels"
],
"metadata": {
"id": "UFl_wWkowZJl"
}
},
{
"cell_type": "code",
"source": [
"# Benchmark dataset inspection removed for Hugging Face version\n",
"print(\"Benchmark dataset inspection skipped.\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 726
},
"id": "YJ_z1NJZwaYO",
"outputId": "d1e7e76e-1363-48fd-e6e1-80cd0faf5626"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Benchmark dataset shape: (10, 4)\n",
"\n",
"Columns:\n",
"['reference_restaurant', 'menu_item', 'category', 'listed_price']\n",
"\n",
"Preview of benchmark data:\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
" reference_restaurant menu_item category listed_price\n",
"0 Reference Restaurant A Spaghetti Pomodoro Tomato Pastas 13.5\n",
"1 Reference Restaurant A Penne Arrabbiata Tomato Pastas 14.5\n",
"2 Reference Restaurant A Tagliatelle Alfredo Cream Pastas 16.0\n",
"3 Reference Restaurant B Lasagna Classica Baked Pastas 18.5\n",
"4 Reference Restaurant B Seafood Linguine Seafood Pastas 21.0\n",
"5 Reference Restaurant B Bruschetta Antipasti 8.5\n",
"6 Reference Restaurant C Rigatoni alla Vodka Cream Pastas 17.5\n",
"7 Reference Restaurant C Baked Ziti Baked Pastas 16.5\n",
"8 Reference Restaurant C Calamari Fritti Antipasti 10.5\n",
"9 Reference Restaurant C Linguine alle Vongole Seafood Pastas 20.0"
],
"text/html": [
"\n",
"
\n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" reference_restaurant | \n",
" menu_item | \n",
" category | \n",
" listed_price | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Reference Restaurant A | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 13.5 | \n",
"
\n",
" \n",
" | 1 | \n",
" Reference Restaurant A | \n",
" Penne Arrabbiata | \n",
" Tomato Pastas | \n",
" 14.5 | \n",
"
\n",
" \n",
" | 2 | \n",
" Reference Restaurant A | \n",
" Tagliatelle Alfredo | \n",
" Cream Pastas | \n",
" 16.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" Reference Restaurant B | \n",
" Lasagna Classica | \n",
" Baked Pastas | \n",
" 18.5 | \n",
"
\n",
" \n",
" | 4 | \n",
" Reference Restaurant B | \n",
" Seafood Linguine | \n",
" Seafood Pastas | \n",
" 21.0 | \n",
"
\n",
" \n",
" | 5 | \n",
" Reference Restaurant B | \n",
" Bruschetta | \n",
" Antipasti | \n",
" 8.5 | \n",
"
\n",
" \n",
" | 6 | \n",
" Reference Restaurant C | \n",
" Rigatoni alla Vodka | \n",
" Cream Pastas | \n",
" 17.5 | \n",
"
\n",
" \n",
" | 7 | \n",
" Reference Restaurant C | \n",
" Baked Ziti | \n",
" Baked Pastas | \n",
" 16.5 | \n",
"
\n",
" \n",
" | 8 | \n",
" Reference Restaurant C | \n",
" Calamari Fritti | \n",
" Antipasti | \n",
" 10.5 | \n",
"
\n",
" \n",
" | 9 | \n",
" Reference Restaurant C | \n",
" Linguine alle Vongole | \n",
" Seafood Pastas | \n",
" 20.0 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "benchmark_df",
"summary": "{\n \"name\": \"benchmark_df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"reference_restaurant\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Reference Restaurant A\",\n \"Reference Restaurant B\",\n \"Reference Restaurant C\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Calamari Fritti\",\n \"Penne Arrabbiata\",\n \"Bruschetta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\",\n \"Antipasti\",\n \"Baked Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"listed_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.993397328368694,\n \"min\": 8.5,\n \"max\": 21.0,\n \"num_unique_values\": 10,\n \"samples\": [\n 10.5,\n 14.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {}
},
{
"output_type": "stream",
"name": "stdout",
"text": [
"\n",
"Category distribution in benchmark data:\n"
]
},
{
"output_type": "display_data",
"data": {
"text/plain": [
" count\n",
"category \n",
"Tomato Pastas 2\n",
"Cream Pastas 2\n",
"Baked Pastas 2\n",
"Seafood Pastas 2\n",
"Antipasti 2"
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" count | \n",
"
\n",
" \n",
" | category | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" | Tomato Pastas | \n",
" 2 | \n",
"
\n",
" \n",
" | Cream Pastas | \n",
" 2 | \n",
"
\n",
" \n",
" | Baked Pastas | \n",
" 2 | \n",
"
\n",
" \n",
" | Seafood Pastas | \n",
" 2 | \n",
"
\n",
" \n",
" | Antipasti | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"summary": "{\n \"name\": \"display(benchmark_df[\\\"category\\\"]\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\",\n \"Antipasti\",\n \"Baked Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 2,\n \"max\": 2,\n \"num_unique_values\": 1,\n \"samples\": [\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 3: Building the Trattoria Bellapasta Menu\n",
"\n",
"Using the benchmark logic above, a fictional but realistic Italian pasta restaurant called **Trattoria Bellapasta** is created.\n",
"\n",
"The restaurant contains **12 menu items** across five categories:\n",
"- Tomato Pastas\n",
"- Cream Pastas\n",
"- Seafood Pastas\n",
"- Baked Pastas\n",
"- Antipasti\n",
"\n",
"Additional operational variables such as food cost, preparation time, and item flags are added to support later analysis."
],
"metadata": {
"id": "m2Re6j3xwfYm"
}
},
{
"cell_type": "code",
"source": [
"# Create the fictional Trattoria Bellapasta menu\n",
"menu_data = [\n",
" [\"Trattoria Bellapasta\", \"Spaghetti Pomodoro\", \"Tomato Pastas\", 13.50, 3.90, 12, 1, 0],\n",
" [\"Trattoria Bellapasta\", \"Penne Arrabbiata\", \"Tomato Pastas\", 14.50, 4.20, 13, 1, 1],\n",
" [\"Trattoria Bellapasta\", \"Pappardelle al Sugo\", \"Tomato Pastas\", 15.50, 4.60, 13, 0, 0],\n",
" [\"Trattoria Bellapasta\", \"Fettuccine Alfredo\", \"Cream Pastas\", 16.00, 5.10, 14, 1, 0],\n",
" [\"Trattoria Bellapasta\", \"Tagliatelle ai Funghi\", \"Cream Pastas\", 17.00, 5.40, 15, 1, 0],\n",
" [\"Trattoria Bellapasta\", \"Rigatoni alla Vodka\", \"Cream Pastas\", 17.50, 5.60, 15, 1, 0],\n",
" [\"Trattoria Bellapasta\", \"Linguine alle Vongole\", \"Seafood Pastas\", 20.00, 7.40, 17, 0, 0],\n",
" [\"Trattoria Bellapasta\", \"Seafood Linguine\", \"Seafood Pastas\", 21.00, 7.90, 18, 0, 0],\n",
" [\"Trattoria Bellapasta\", \"Rigatoni al Forno\", \"Baked Pastas\", 17.50, 5.80, 18, 0, 0],\n",
" [\"Trattoria Bellapasta\", \"Lasagna Classica\", \"Baked Pastas\", 18.50, 6.40, 20, 0, 0],\n",
" [\"Trattoria Bellapasta\", \"Bruschetta Classica\", \"Antipasti\", 8.50, 2.10, 7, 1, 0],\n",
" [\"Trattoria Bellapasta\", \"Calamari Fritti\", \"Antipasti\", 10.50, 3.40, 9, 0, 0]\n",
"]\n",
"\n",
"menu_df = pd.DataFrame(\n",
" menu_data,\n",
" columns=[\n",
" \"restaurant_name\", \"menu_item\", \"category\", \"listed_price\", \"food_cost\",\n",
" \"prep_time_minutes\", \"vegetarian_flag\", \"spicy_flag\"\n",
" ]\n",
")\n",
"\n",
"menu_df"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 671
},
"id": "8cdZ2cXowgd2",
"outputId": "f0c1af3d-d623-429d-a70f-db1ef93d54a3"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" restaurant_name menu_item category listed_price \\\n",
"0 Trattoria Bellapasta Spaghetti Pomodoro Tomato Pastas 13.5 \n",
"1 Trattoria Bellapasta Penne Arrabbiata Tomato Pastas 14.5 \n",
"2 Trattoria Bellapasta Pappardelle al Sugo Tomato Pastas 15.5 \n",
"3 Trattoria Bellapasta Fettuccine Alfredo Cream Pastas 16.0 \n",
"4 Trattoria Bellapasta Tagliatelle ai Funghi Cream Pastas 17.0 \n",
"5 Trattoria Bellapasta Rigatoni alla Vodka Cream Pastas 17.5 \n",
"6 Trattoria Bellapasta Linguine alle Vongole Seafood Pastas 20.0 \n",
"7 Trattoria Bellapasta Seafood Linguine Seafood Pastas 21.0 \n",
"8 Trattoria Bellapasta Rigatoni al Forno Baked Pastas 17.5 \n",
"9 Trattoria Bellapasta Lasagna Classica Baked Pastas 18.5 \n",
"10 Trattoria Bellapasta Bruschetta Classica Antipasti 8.5 \n",
"11 Trattoria Bellapasta Calamari Fritti Antipasti 10.5 \n",
"\n",
" food_cost prep_time_minutes vegetarian_flag spicy_flag \n",
"0 3.9 12 1 0 \n",
"1 4.2 13 1 1 \n",
"2 4.6 13 0 0 \n",
"3 5.1 14 1 0 \n",
"4 5.4 15 1 0 \n",
"5 5.6 15 1 0 \n",
"6 7.4 17 0 0 \n",
"7 7.9 18 0 0 \n",
"8 5.8 18 0 0 \n",
"9 6.4 20 0 0 \n",
"10 2.1 7 1 0 \n",
"11 3.4 9 0 0 "
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" restaurant_name | \n",
" menu_item | \n",
" category | \n",
" listed_price | \n",
" food_cost | \n",
" prep_time_minutes | \n",
" vegetarian_flag | \n",
" spicy_flag | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Trattoria Bellapasta | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 13.5 | \n",
" 3.9 | \n",
" 12 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 1 | \n",
" Trattoria Bellapasta | \n",
" Penne Arrabbiata | \n",
" Tomato Pastas | \n",
" 14.5 | \n",
" 4.2 | \n",
" 13 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" | 2 | \n",
" Trattoria Bellapasta | \n",
" Pappardelle al Sugo | \n",
" Tomato Pastas | \n",
" 15.5 | \n",
" 4.6 | \n",
" 13 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 3 | \n",
" Trattoria Bellapasta | \n",
" Fettuccine Alfredo | \n",
" Cream Pastas | \n",
" 16.0 | \n",
" 5.1 | \n",
" 14 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 4 | \n",
" Trattoria Bellapasta | \n",
" Tagliatelle ai Funghi | \n",
" Cream Pastas | \n",
" 17.0 | \n",
" 5.4 | \n",
" 15 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 5 | \n",
" Trattoria Bellapasta | \n",
" Rigatoni alla Vodka | \n",
" Cream Pastas | \n",
" 17.5 | \n",
" 5.6 | \n",
" 15 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 6 | \n",
" Trattoria Bellapasta | \n",
" Linguine alle Vongole | \n",
" Seafood Pastas | \n",
" 20.0 | \n",
" 7.4 | \n",
" 17 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 7 | \n",
" Trattoria Bellapasta | \n",
" Seafood Linguine | \n",
" Seafood Pastas | \n",
" 21.0 | \n",
" 7.9 | \n",
" 18 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 8 | \n",
" Trattoria Bellapasta | \n",
" Rigatoni al Forno | \n",
" Baked Pastas | \n",
" 17.5 | \n",
" 5.8 | \n",
" 18 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 9 | \n",
" Trattoria Bellapasta | \n",
" Lasagna Classica | \n",
" Baked Pastas | \n",
" 18.5 | \n",
" 6.4 | \n",
" 20 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 10 | \n",
" Trattoria Bellapasta | \n",
" Bruschetta Classica | \n",
" Antipasti | \n",
" 8.5 | \n",
" 2.1 | \n",
" 7 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 11 | \n",
" Trattoria Bellapasta | \n",
" Calamari Fritti | \n",
" Antipasti | \n",
" 10.5 | \n",
" 3.4 | \n",
" 9 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "menu_df",
"summary": "{\n \"name\": \"menu_df\",\n \"rows\": 12,\n \"fields\": [\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"listed_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.6639108100829123,\n \"min\": 8.5,\n \"max\": 21.0,\n \"num_unique_values\": 11,\n \"samples\": [\n 17.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"food_cost\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.6517208656978994,\n \"min\": 2.1,\n \"max\": 7.9,\n \"num_unique_values\": 12,\n \"samples\": [\n 2.1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"prep_time_minutes\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 7,\n \"max\": 20,\n \"num_unique_values\": 9,\n \"samples\": [\n 7\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"vegetarian_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"spicy_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 19
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 4: Generating Synthetic Monthly Sales Data\n",
"\n",
"Synthetic monthly sales data is created for each menu item across 12 months.\n",
"\n",
"This provides the business-performance layer required for later analysis. \n",
"The generated variables include:\n",
"- monthly orders\n",
"- revenue\n",
"- total food cost\n",
"- gross profit\n",
"- complaint rate\n",
"- repeat order rate"
],
"metadata": {
"id": "9w4e4QdRwnJ-"
}
},
{
"cell_type": "code",
"source": [
"# Generate synthetic monthly sales data for each menu item\n",
"months = pd.date_range(\"2025-01-01\", periods=12, freq=\"MS\")\n",
"\n",
"sales_rows = []\n",
"\n",
"for _, row in menu_df.iterrows():\n",
" base_orders = random.randint(60, 140)\n",
"\n",
" for month in months:\n",
" seasonal_factor = random.uniform(0.9, 1.15)\n",
" orders = int(base_orders * seasonal_factor + np.random.normal(0, 8))\n",
" orders = max(20, orders)\n",
"\n",
" revenue = round(orders * row[\"listed_price\"], 2)\n",
" total_food_cost = round(orders * row[\"food_cost\"], 2)\n",
" gross_profit = round(revenue - total_food_cost, 2)\n",
"\n",
" complaint_rate = round(random.uniform(0.01, 0.12), 3)\n",
" repeat_order_rate = round(random.uniform(0.20, 0.55), 3)\n",
"\n",
" sales_rows.append([\n",
" row[\"restaurant_name\"],\n",
" month.strftime(\"%Y-%m\"),\n",
" row[\"menu_item\"],\n",
" row[\"category\"],\n",
" orders,\n",
" revenue,\n",
" total_food_cost,\n",
" gross_profit,\n",
" complaint_rate,\n",
" repeat_order_rate\n",
" ])\n",
"\n",
"sales_df = pd.DataFrame(\n",
" sales_rows,\n",
" columns=[\n",
" \"restaurant_name\", \"month\", \"menu_item\", \"category\", \"orders\", \"revenue\",\n",
" \"total_food_cost\", \"gross_profit\", \"complaint_rate\", \"repeat_order_rate\"\n",
" ]\n",
")\n",
"\n",
"sales_df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 313
},
"id": "o03p9szDwoJQ",
"outputId": "476a669a-1872-4605-8249-5773a5cb5d48"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" restaurant_name month menu_item category orders \\\n",
"0 Trattoria Bellapasta 2025-01 Spaghetti Pomodoro Tomato Pastas 71 \n",
"1 Trattoria Bellapasta 2025-02 Spaghetti Pomodoro Tomato Pastas 79 \n",
"2 Trattoria Bellapasta 2025-03 Spaghetti Pomodoro Tomato Pastas 73 \n",
"3 Trattoria Bellapasta 2025-04 Spaghetti Pomodoro Tomato Pastas 82 \n",
"4 Trattoria Bellapasta 2025-05 Spaghetti Pomodoro Tomato Pastas 68 \n",
"\n",
" revenue total_food_cost gross_profit complaint_rate repeat_order_rate \n",
"0 958.5 276.9 681.6 0.040 0.278 \n",
"1 1066.5 308.1 758.4 0.084 0.512 \n",
"2 985.5 284.7 700.8 0.056 0.210 \n",
"3 1107.0 319.8 787.2 0.066 0.209 \n",
"4 918.0 265.2 652.8 0.081 0.391 "
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" restaurant_name | \n",
" month | \n",
" menu_item | \n",
" category | \n",
" orders | \n",
" revenue | \n",
" total_food_cost | \n",
" gross_profit | \n",
" complaint_rate | \n",
" repeat_order_rate | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Trattoria Bellapasta | \n",
" 2025-01 | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 71 | \n",
" 958.5 | \n",
" 276.9 | \n",
" 681.6 | \n",
" 0.040 | \n",
" 0.278 | \n",
"
\n",
" \n",
" | 1 | \n",
" Trattoria Bellapasta | \n",
" 2025-02 | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 79 | \n",
" 1066.5 | \n",
" 308.1 | \n",
" 758.4 | \n",
" 0.084 | \n",
" 0.512 | \n",
"
\n",
" \n",
" | 2 | \n",
" Trattoria Bellapasta | \n",
" 2025-03 | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 73 | \n",
" 985.5 | \n",
" 284.7 | \n",
" 700.8 | \n",
" 0.056 | \n",
" 0.210 | \n",
"
\n",
" \n",
" | 3 | \n",
" Trattoria Bellapasta | \n",
" 2025-04 | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 82 | \n",
" 1107.0 | \n",
" 319.8 | \n",
" 787.2 | \n",
" 0.066 | \n",
" 0.209 | \n",
"
\n",
" \n",
" | 4 | \n",
" Trattoria Bellapasta | \n",
" 2025-05 | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 68 | \n",
" 918.0 | \n",
" 265.2 | \n",
" 652.8 | \n",
" 0.081 | \n",
" 0.391 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "sales_df",
"summary": "{\n \"name\": \"sales_df\",\n \"rows\": 144,\n \"fields\": [\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"month\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"2025-11\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"orders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 26,\n \"min\": 46,\n \"max\": 162,\n \"num_unique_values\": 73,\n \"samples\": [\n 68\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"revenue\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 583.1490021780527,\n \"min\": 391.0,\n \"max\": 2860.0,\n \"num_unique_values\": 120,\n \"samples\": [\n 1428.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_food_cost\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 216.4400761721973,\n \"min\": 96.6,\n \"max\": 1058.2,\n \"num_unique_values\": 121,\n \"samples\": [\n 453.6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"gross_profit\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 374.4191271786242,\n \"min\": 294.4,\n \"max\": 1895.4,\n \"num_unique_values\": 119,\n \"samples\": [\n 1213.8\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"complaint_rate\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.030614731759388336,\n \"min\": 0.01,\n \"max\": 0.12,\n \"num_unique_values\": 78,\n \"samples\": [\n 0.068\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"repeat_order_rate\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.10995740504682221,\n \"min\": 0.201,\n \"max\": 0.546,\n \"num_unique_values\": 122,\n \"samples\": [\n 0.286\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 20
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 5: Generating Synthetic Customer Reviews\n",
"\n",
"Synthetic customer reviews are created at menu-item level to simulate customer feedback.\n",
"\n",
"Each review includes:\n",
"- a numerical rating\n",
"- a short review text\n",
"- a review date\n",
"- the associated menu item\n",
"\n",
"This creates the qualitative data layer needed for later review analysis."
],
"metadata": {
"id": "MAaPosUDwrLO"
}
},
{
"cell_type": "code",
"source": [
"# Templates for synthetic reviews\n",
"positive_templates = [\n",
" \"Loved the {item}. Great flavour and very satisfying.\",\n",
" \"The {item} was excellent and well balanced.\",\n",
" \"Really enjoyed the {item}. Would order again.\",\n",
" \"The {item} tasted fresh and was one of the best dishes on the menu.\",\n",
" \"Very happy with the {item}. Strong choice overall.\"\n",
"]\n",
"\n",
"neutral_templates = [\n",
" \"The {item} was decent but not especially memorable.\",\n",
" \"The {item} was okay. Nothing particularly wrong, but nothing exceptional.\",\n",
" \"The {item} was fine, although it could have been more flavourful.\",\n",
" \"Reasonable experience with the {item}, but there is room for improvement.\"\n",
"]\n",
"\n",
"negative_templates = [\n",
" \"The {item} was disappointing and lacked flavour.\",\n",
" \"Did not enjoy the {item}. It felt overpriced for the quality.\",\n",
" \"The {item} was underwhelming and not worth ordering again.\",\n",
" \"The {item} was too heavy and not very satisfying.\",\n",
" \"The {item} needs improvement in taste and consistency.\"\n",
"]\n",
"\n",
"review_rows = []\n",
"review_id = 1\n",
"\n",
"for _, row in menu_df.iterrows():\n",
" n_reviews = random.randint(12, 20)\n",
"\n",
" for _ in range(n_reviews):\n",
" rating = random.choices(\n",
" population=[2, 3, 4, 5],\n",
" weights=[0.10, 0.20, 0.40, 0.30],\n",
" k=1\n",
" )[0]\n",
"\n",
" if rating >= 4:\n",
" review_text = random.choice(positive_templates).format(item=row[\"menu_item\"])\n",
" elif rating == 3:\n",
" review_text = random.choice(neutral_templates).format(item=row[\"menu_item\"])\n",
" else:\n",
" review_text = random.choice(negative_templates).format(item=row[\"menu_item\"])\n",
"\n",
" review_date = datetime(2025, 1, 1) + timedelta(days=random.randint(0, 300))\n",
"\n",
" review_rows.append([\n",
" review_id,\n",
" row[\"restaurant_name\"],\n",
" review_date.strftime(\"%Y-%m-%d\"),\n",
" row[\"menu_item\"],\n",
" rating,\n",
" review_text\n",
" ])\n",
" review_id += 1\n",
"\n",
"reviews_df = pd.DataFrame(\n",
" review_rows,\n",
" columns=[\n",
" \"review_id\", \"restaurant_name\", \"review_date\", \"menu_item\", \"rating\", \"review_text\"\n",
" ]\n",
")\n",
"\n",
"reviews_df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "Vt5wJbkXwwfG",
"outputId": "29a83c3d-971a-4ace-9140-a81e014520e6"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" review_id restaurant_name review_date menu_item rating \\\n",
"0 1 Trattoria Bellapasta 2025-09-16 Spaghetti Pomodoro 4 \n",
"1 2 Trattoria Bellapasta 2025-08-17 Spaghetti Pomodoro 4 \n",
"2 3 Trattoria Bellapasta 2025-08-19 Spaghetti Pomodoro 5 \n",
"3 4 Trattoria Bellapasta 2025-05-22 Spaghetti Pomodoro 3 \n",
"4 5 Trattoria Bellapasta 2025-09-06 Spaghetti Pomodoro 5 \n",
"\n",
" review_text \n",
"0 The Spaghetti Pomodoro tasted fresh and was on... \n",
"1 Very happy with the Spaghetti Pomodoro. Strong... \n",
"2 The Spaghetti Pomodoro tasted fresh and was on... \n",
"3 The Spaghetti Pomodoro was okay. Nothing parti... \n",
"4 Very happy with the Spaghetti Pomodoro. Strong... "
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" review_id | \n",
" restaurant_name | \n",
" review_date | \n",
" menu_item | \n",
" rating | \n",
" review_text | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" Trattoria Bellapasta | \n",
" 2025-09-16 | \n",
" Spaghetti Pomodoro | \n",
" 4 | \n",
" The Spaghetti Pomodoro tasted fresh and was on... | \n",
"
\n",
" \n",
" | 1 | \n",
" 2 | \n",
" Trattoria Bellapasta | \n",
" 2025-08-17 | \n",
" Spaghetti Pomodoro | \n",
" 4 | \n",
" Very happy with the Spaghetti Pomodoro. Strong... | \n",
"
\n",
" \n",
" | 2 | \n",
" 3 | \n",
" Trattoria Bellapasta | \n",
" 2025-08-19 | \n",
" Spaghetti Pomodoro | \n",
" 5 | \n",
" The Spaghetti Pomodoro tasted fresh and was on... | \n",
"
\n",
" \n",
" | 3 | \n",
" 4 | \n",
" Trattoria Bellapasta | \n",
" 2025-05-22 | \n",
" Spaghetti Pomodoro | \n",
" 3 | \n",
" The Spaghetti Pomodoro was okay. Nothing parti... | \n",
"
\n",
" \n",
" | 4 | \n",
" 5 | \n",
" Trattoria Bellapasta | \n",
" 2025-09-06 | \n",
" Spaghetti Pomodoro | \n",
" 5 | \n",
" Very happy with the Spaghetti Pomodoro. Strong... | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "reviews_df",
"summary": "{\n \"name\": \"reviews_df\",\n \"rows\": 200,\n \"fields\": [\n {\n \"column\": \"review_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 57,\n \"min\": 1,\n \"max\": 200,\n \"num_unique_values\": 200,\n \"samples\": [\n 96,\n 16,\n 31\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 145,\n \"samples\": [\n \"2025-03-19\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 2,\n \"max\": 5,\n \"num_unique_values\": 4,\n \"samples\": [\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_text\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 97,\n \"samples\": [\n \"The Seafood Linguine was decent but not especially memorable.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 21
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 6: Annotating the Reviews\n",
"\n",
"To make the review dataset more useful for later analysis, each review is annotated with:\n",
"\n",
"- `sentiment_label`\n",
"- `theme`\n",
"- `complaint_flag`\n",
"\n",
"This makes the dataset closer to the structured review annotation approach used in workshop exercises."
],
"metadata": {
"id": "1JDqpCL0w2HR"
}
},
{
"cell_type": "code",
"source": [
"# Annotate synthetic reviews\n",
"themes_positive = [\"Taste\", \"Freshness\", \"Portion Size\", \"Value\"]\n",
"themes_neutral = [\"Taste\", \"Texture\", \"Portion Size\", \"Value\"]\n",
"themes_negative = [\"Taste\", \"Value\", \"Texture\", \"Waiting Time\"]\n",
"\n",
"def assign_sentiment(rating):\n",
" if rating >= 4:\n",
" return \"Positive\"\n",
" elif rating == 3:\n",
" return \"Neutral\"\n",
" return \"Negative\"\n",
"\n",
"def assign_theme(rating):\n",
" if rating >= 4:\n",
" return random.choice(themes_positive)\n",
" elif rating == 3:\n",
" return random.choice(themes_neutral)\n",
" return random.choice(themes_negative)\n",
"\n",
"reviews_df[\"sentiment_label\"] = reviews_df[\"rating\"].apply(assign_sentiment)\n",
"reviews_df[\"theme\"] = reviews_df[\"rating\"].apply(assign_theme)\n",
"reviews_df[\"complaint_flag\"] = reviews_df[\"rating\"].apply(lambda x: 1 if x <= 2 else 0)\n",
"\n",
"reviews_df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 521
},
"id": "84kPk1yCw2-v",
"outputId": "29918bc9-d83b-44b1-947c-fe8dd0bf0e61"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" review_id restaurant_name review_date menu_item rating \\\n",
"0 1 Trattoria Bellapasta 2025-09-16 Spaghetti Pomodoro 4 \n",
"1 2 Trattoria Bellapasta 2025-08-17 Spaghetti Pomodoro 4 \n",
"2 3 Trattoria Bellapasta 2025-08-19 Spaghetti Pomodoro 5 \n",
"3 4 Trattoria Bellapasta 2025-05-22 Spaghetti Pomodoro 3 \n",
"4 5 Trattoria Bellapasta 2025-09-06 Spaghetti Pomodoro 5 \n",
"\n",
" review_text sentiment_label \\\n",
"0 The Spaghetti Pomodoro tasted fresh and was on... Positive \n",
"1 Very happy with the Spaghetti Pomodoro. Strong... Positive \n",
"2 The Spaghetti Pomodoro tasted fresh and was on... Positive \n",
"3 The Spaghetti Pomodoro was okay. Nothing parti... Neutral \n",
"4 Very happy with the Spaghetti Pomodoro. Strong... Positive \n",
"\n",
" theme complaint_flag \n",
"0 Taste 0 \n",
"1 Taste 0 \n",
"2 Portion Size 0 \n",
"3 Value 0 \n",
"4 Taste 0 "
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" review_id | \n",
" restaurant_name | \n",
" review_date | \n",
" menu_item | \n",
" rating | \n",
" review_text | \n",
" sentiment_label | \n",
" theme | \n",
" complaint_flag | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" Trattoria Bellapasta | \n",
" 2025-09-16 | \n",
" Spaghetti Pomodoro | \n",
" 4 | \n",
" The Spaghetti Pomodoro tasted fresh and was on... | \n",
" Positive | \n",
" Taste | \n",
" 0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 2 | \n",
" Trattoria Bellapasta | \n",
" 2025-08-17 | \n",
" Spaghetti Pomodoro | \n",
" 4 | \n",
" Very happy with the Spaghetti Pomodoro. Strong... | \n",
" Positive | \n",
" Taste | \n",
" 0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 3 | \n",
" Trattoria Bellapasta | \n",
" 2025-08-19 | \n",
" Spaghetti Pomodoro | \n",
" 5 | \n",
" The Spaghetti Pomodoro tasted fresh and was on... | \n",
" Positive | \n",
" Portion Size | \n",
" 0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 4 | \n",
" Trattoria Bellapasta | \n",
" 2025-05-22 | \n",
" Spaghetti Pomodoro | \n",
" 3 | \n",
" The Spaghetti Pomodoro was okay. Nothing parti... | \n",
" Neutral | \n",
" Value | \n",
" 0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 5 | \n",
" Trattoria Bellapasta | \n",
" 2025-09-06 | \n",
" Spaghetti Pomodoro | \n",
" 5 | \n",
" Very happy with the Spaghetti Pomodoro. Strong... | \n",
" Positive | \n",
" Taste | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "reviews_df",
"summary": "{\n \"name\": \"reviews_df\",\n \"rows\": 200,\n \"fields\": [\n {\n \"column\": \"review_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 57,\n \"min\": 1,\n \"max\": 200,\n \"num_unique_values\": 200,\n \"samples\": [\n 96,\n 16,\n 31\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 145,\n \"samples\": [\n \"2025-03-19\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 2,\n \"max\": 5,\n \"num_unique_values\": 4,\n \"samples\": [\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_text\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 97,\n \"samples\": [\n \"The Seafood Linguine was decent but not especially memorable.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sentiment_label\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Positive\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"theme\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Taste\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"complaint_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 22
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 7: Creating a Menu Item Summary Table\n",
"\n",
"A summary table is created to provide a compact item-level overview before moving to the analysis notebook.\n",
"\n",
"This table combines:\n",
"- menu information\n",
"- average rating\n",
"- review count\n",
"- total orders\n",
"- total revenue\n",
"- total gross profit\n",
"- profit margin percentage"
],
"metadata": {
"id": "mCz-xB9tw7wL"
}
},
{
"cell_type": "code",
"source": [
"# Create review summary\n",
"review_summary = reviews_df.groupby(\"menu_item\").agg(\n",
" avg_rating=(\"rating\", \"mean\"),\n",
" review_count=(\"review_id\", \"count\")\n",
").reset_index()\n",
"\n",
"# Create sales summary\n",
"sales_summary = sales_df.groupby(\"menu_item\").agg(\n",
" total_orders=(\"orders\", \"sum\"),\n",
" total_revenue=(\"revenue\", \"sum\"),\n",
" total_gross_profit=(\"gross_profit\", \"sum\")\n",
").reset_index()\n",
"\n",
"# Merge into one summary table\n",
"menu_item_summary_df = menu_df.merge(review_summary, on=\"menu_item\", how=\"left\")\n",
"menu_item_summary_df = menu_item_summary_df.merge(sales_summary, on=\"menu_item\", how=\"left\")\n",
"\n",
"# Add profit margin percentage\n",
"menu_item_summary_df[\"avg_rating\"] = menu_item_summary_df[\"avg_rating\"].round(2)\n",
"menu_item_summary_df[\"profit_margin_pct\"] = (\n",
" (menu_item_summary_df[\"listed_price\"] - menu_item_summary_df[\"food_cost\"]) /\n",
" menu_item_summary_df[\"listed_price\"] * 100\n",
").round(2)\n",
"\n",
"menu_item_summary_df.head()"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 313
},
"id": "GuDK3-5Bw9Zs",
"outputId": "9effa7dc-ede5-4bb4-868e-f7f414b09820"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" restaurant_name menu_item category listed_price \\\n",
"0 Trattoria Bellapasta Spaghetti Pomodoro Tomato Pastas 13.5 \n",
"1 Trattoria Bellapasta Penne Arrabbiata Tomato Pastas 14.5 \n",
"2 Trattoria Bellapasta Pappardelle al Sugo Tomato Pastas 15.5 \n",
"3 Trattoria Bellapasta Fettuccine Alfredo Cream Pastas 16.0 \n",
"4 Trattoria Bellapasta Tagliatelle ai Funghi Cream Pastas 17.0 \n",
"\n",
" food_cost prep_time_minutes vegetarian_flag spicy_flag avg_rating \\\n",
"0 3.9 12 1 0 3.85 \n",
"1 4.2 13 1 1 4.17 \n",
"2 4.6 13 0 0 4.19 \n",
"3 5.1 14 1 0 3.95 \n",
"4 5.4 15 1 0 3.77 \n",
"\n",
" review_count total_orders total_revenue total_gross_profit \\\n",
"0 20 899 12136.5 8630.4 \n",
"1 18 1696 24592.0 17468.8 \n",
"2 16 1071 16600.5 11673.9 \n",
"3 19 1109 17744.0 12088.1 \n",
"4 13 1025 17425.0 11890.0 \n",
"\n",
" profit_margin_pct \n",
"0 71.11 \n",
"1 71.03 \n",
"2 70.32 \n",
"3 68.12 \n",
"4 68.24 "
],
"text/html": [
"\n",
" \n",
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" restaurant_name | \n",
" menu_item | \n",
" category | \n",
" listed_price | \n",
" food_cost | \n",
" prep_time_minutes | \n",
" vegetarian_flag | \n",
" spicy_flag | \n",
" avg_rating | \n",
" review_count | \n",
" total_orders | \n",
" total_revenue | \n",
" total_gross_profit | \n",
" profit_margin_pct | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" Trattoria Bellapasta | \n",
" Spaghetti Pomodoro | \n",
" Tomato Pastas | \n",
" 13.5 | \n",
" 3.9 | \n",
" 12 | \n",
" 1 | \n",
" 0 | \n",
" 3.85 | \n",
" 20 | \n",
" 899 | \n",
" 12136.5 | \n",
" 8630.4 | \n",
" 71.11 | \n",
"
\n",
" \n",
" | 1 | \n",
" Trattoria Bellapasta | \n",
" Penne Arrabbiata | \n",
" Tomato Pastas | \n",
" 14.5 | \n",
" 4.2 | \n",
" 13 | \n",
" 1 | \n",
" 1 | \n",
" 4.17 | \n",
" 18 | \n",
" 1696 | \n",
" 24592.0 | \n",
" 17468.8 | \n",
" 71.03 | \n",
"
\n",
" \n",
" | 2 | \n",
" Trattoria Bellapasta | \n",
" Pappardelle al Sugo | \n",
" Tomato Pastas | \n",
" 15.5 | \n",
" 4.6 | \n",
" 13 | \n",
" 0 | \n",
" 0 | \n",
" 4.19 | \n",
" 16 | \n",
" 1071 | \n",
" 16600.5 | \n",
" 11673.9 | \n",
" 70.32 | \n",
"
\n",
" \n",
" | 3 | \n",
" Trattoria Bellapasta | \n",
" Fettuccine Alfredo | \n",
" Cream Pastas | \n",
" 16.0 | \n",
" 5.1 | \n",
" 14 | \n",
" 1 | \n",
" 0 | \n",
" 3.95 | \n",
" 19 | \n",
" 1109 | \n",
" 17744.0 | \n",
" 12088.1 | \n",
" 68.12 | \n",
"
\n",
" \n",
" | 4 | \n",
" Trattoria Bellapasta | \n",
" Tagliatelle ai Funghi | \n",
" Cream Pastas | \n",
" 17.0 | \n",
" 5.4 | \n",
" 15 | \n",
" 1 | \n",
" 0 | \n",
" 3.77 | \n",
" 13 | \n",
" 1025 | \n",
" 17425.0 | \n",
" 11890.0 | \n",
" 68.24 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n"
],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
"variable_name": "menu_item_summary_df",
"summary": "{\n \"name\": \"menu_item_summary_df\",\n \"rows\": 12,\n \"fields\": [\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"listed_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.6639108100829123,\n \"min\": 8.5,\n \"max\": 21.0,\n \"num_unique_values\": 11,\n \"samples\": [\n 17.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"food_cost\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.6517208656978994,\n \"min\": 2.1,\n \"max\": 7.9,\n \"num_unique_values\": 12,\n \"samples\": [\n 2.1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"prep_time_minutes\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 7,\n \"max\": 20,\n \"num_unique_values\": 9,\n \"samples\": [\n 7\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"vegetarian_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"spicy_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"avg_rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.21020913107636116,\n \"min\": 3.77,\n \"max\": 4.42,\n \"num_unique_values\": 11,\n \"samples\": [\n 3.83\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 12,\n \"max\": 20,\n \"num_unique_values\": 6,\n \"samples\": [\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_orders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 302,\n \"min\": 728,\n \"max\": 1696,\n \"num_unique_values\": 12,\n \"samples\": [\n 728\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_revenue\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6983.92005056811,\n \"min\": 6188.0,\n \"max\": 29700.0,\n \"num_unique_values\": 12,\n \"samples\": [\n 6188.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_gross_profit\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4466.736931423817,\n \"min\": 4659.2,\n \"max\": 18907.2,\n \"num_unique_values\": 12,\n \"samples\": [\n 4659.2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"profit_margin_pct\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.593292994657484,\n \"min\": 62.38,\n \"max\": 75.29,\n \"num_unique_values\": 12,\n \"samples\": [\n 75.29\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
}
},
"metadata": {},
"execution_count": 23
}
]
},
{
"cell_type": "markdown",
"source": [
"## Step 8: Exporting the Final Files\n",
"\n",
"The final step is to export the cleaned and enriched datasets that will be used in Notebook 2 for analysis.\n",
"\n",
"The exported files are:\n",
"- `real_world_menu_benchmark.csv`\n",
"- `menu_items.csv`\n",
"- `monthly_sales.csv`\n",
"- `reviews.csv`\n",
"- `menu_item_summary.csv`"
],
"metadata": {
"id": "51LCiy1f0Guy"
}
},
{
"cell_type": "code",
"source": [
"# Export all project files for Notebook 2\n",
"menu_df.to_csv(\"menu_items.csv\", index=False)\n",
"sales_df.to_csv(\"monthly_sales.csv\", index=False)\n",
"reviews_df.to_csv(\"reviews.csv\", index=False)\n",
"menu_item_summary_df.to_csv(\"menu_item_summary.csv\", index=False)\n",
"\n",
"print(\"Files exported successfully.\")"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "A5561w4-0Gea",
"outputId": "814e93ea-5918-4445-fab7-795cf505a510"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Files exported successfully.\n"
]
}
]
}
]
}