{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Notebook 1: Real Data Extraction and Synthetic Enrichment\n", "\n", "## Project\n", "**AI-Powered Menu Decision Assistant for Restaurants**\n", "\n", "## Objective\n", "This notebook prepares the dataset used in the project. It combines a small real-world benchmark dataset with synthetic restaurant data for **Trattoria Bellapasta**.\n", "\n", "## Workflow\n", "This notebook performs the following steps:\n", "1. Load a small real-world benchmark dataset from public restaurant menu references\n", "2. Build the fictional menu for Trattoria Bellapasta\n", "3. Generate synthetic monthly sales data\n", "4. Generate synthetic customer reviews\n", "5. Annotate the reviews with sentiment and theme labels\n", "6. Export the final CSV files for Notebook 2" ], "metadata": { "id": "lnf7unLtv33n" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "T52Nn9X6vuNb" }, "outputs": [], "source": [ "# Import libraries used for dataset creation and manipulation\n", "import pandas as pd\n", "import numpy as np\n", "import random\n", "from datetime import datetime, timedelta\n", "\n", "# Set random seeds for reproducibility\n", "random.seed(42)\n", "np.random.seed(42)" ] }, { "cell_type": "markdown", "source": [ "## Step 1: Loading the Real-World Benchmark Dataset\n", "\n", "A small real-world benchmark dataset is used to ground the project in realistic Italian restaurant menu conditions.\n", "\n", "This benchmark is not the main analytical dataset. \n", "Its role is to provide realistic reference points for:\n", "- dish categories\n", "- menu structure\n", "- price ranges\n", "\n", "The benchmark data was compiled from publicly available restaurant menu sources." ], "metadata": { "id": "FpHcBDdWwQ_u" } }, { "cell_type": "code", "source": [ "# Benchmark dataset removed for Hugging Face version\n", "print(\"Benchmark dataset skipped.\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "z6eWiJxLwS2-", "outputId": "25a2133d-5e26-4d43-a5fd-8fd44043fdff" }, "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Benchmark dataset skipped.\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Step 2: Inspecting the Benchmark Data\n", "\n", "Before building the fictional restaurant dataset, the benchmark data is inspected to confirm its structure and relevance.\n", "\n", "The benchmark dataset helps ensure that the synthetic restaurant is grounded in realistic:\n", "- menu categories\n", "- dish naming conventions\n", "- price levels" ], "metadata": { "id": "UFl_wWkowZJl" } }, { "cell_type": "code", "source": [ "# Benchmark dataset inspection removed for Hugging Face version\n", "print(\"Benchmark dataset inspection skipped.\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 726 }, "id": "YJ_z1NJZwaYO", "outputId": "d1e7e76e-1363-48fd-e6e1-80cd0faf5626" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Benchmark dataset shape: (10, 4)\n", "\n", "Columns:\n", "['reference_restaurant', 'menu_item', 'category', 'listed_price']\n", "\n", "Preview of benchmark data:\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ " reference_restaurant menu_item category listed_price\n", "0 Reference Restaurant A Spaghetti Pomodoro Tomato Pastas 13.5\n", "1 Reference Restaurant A Penne Arrabbiata Tomato Pastas 14.5\n", "2 Reference Restaurant A Tagliatelle Alfredo Cream Pastas 16.0\n", "3 Reference Restaurant B Lasagna Classica Baked Pastas 18.5\n", "4 Reference Restaurant B Seafood Linguine Seafood Pastas 21.0\n", "5 Reference Restaurant B Bruschetta Antipasti 8.5\n", "6 Reference Restaurant C Rigatoni alla Vodka Cream Pastas 17.5\n", "7 Reference Restaurant C Baked Ziti Baked Pastas 16.5\n", "8 Reference Restaurant C Calamari Fritti Antipasti 10.5\n", "9 Reference Restaurant C Linguine alle Vongole Seafood Pastas 20.0" ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
reference_restaurantmenu_itemcategorylisted_price
0Reference Restaurant ASpaghetti PomodoroTomato Pastas13.5
1Reference Restaurant APenne ArrabbiataTomato Pastas14.5
2Reference Restaurant ATagliatelle AlfredoCream Pastas16.0
3Reference Restaurant BLasagna ClassicaBaked Pastas18.5
4Reference Restaurant BSeafood LinguineSeafood Pastas21.0
5Reference Restaurant BBruschettaAntipasti8.5
6Reference Restaurant CRigatoni alla VodkaCream Pastas17.5
7Reference Restaurant CBaked ZitiBaked Pastas16.5
8Reference Restaurant CCalamari FrittiAntipasti10.5
9Reference Restaurant CLinguine alle VongoleSeafood Pastas20.0
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "benchmark_df", "summary": "{\n \"name\": \"benchmark_df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"reference_restaurant\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Reference Restaurant A\",\n \"Reference Restaurant B\",\n \"Reference Restaurant C\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Calamari Fritti\",\n \"Penne Arrabbiata\",\n \"Bruschetta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\",\n \"Antipasti\",\n \"Baked Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"listed_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.993397328368694,\n \"min\": 8.5,\n \"max\": 21.0,\n \"num_unique_values\": 10,\n \"samples\": [\n 10.5,\n 14.5,\n 8.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {} }, { "output_type": "stream", "name": "stdout", "text": [ "\n", "Category distribution in benchmark data:\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ " count\n", "category \n", "Tomato Pastas 2\n", "Cream Pastas 2\n", "Baked Pastas 2\n", "Seafood Pastas 2\n", "Antipasti 2" ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
count
category
Tomato Pastas2
Cream Pastas2
Baked Pastas2
Seafood Pastas2
Antipasti2
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "summary": "{\n \"name\": \"display(benchmark_df[\\\"category\\\"]\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\",\n \"Antipasti\",\n \"Baked Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 2,\n \"max\": 2,\n \"num_unique_values\": 1,\n \"samples\": [\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "## Step 3: Building the Trattoria Bellapasta Menu\n", "\n", "Using the benchmark logic above, a fictional but realistic Italian pasta restaurant called **Trattoria Bellapasta** is created.\n", "\n", "The restaurant contains **12 menu items** across five categories:\n", "- Tomato Pastas\n", "- Cream Pastas\n", "- Seafood Pastas\n", "- Baked Pastas\n", "- Antipasti\n", "\n", "Additional operational variables such as food cost, preparation time, and item flags are added to support later analysis." ], "metadata": { "id": "m2Re6j3xwfYm" } }, { "cell_type": "code", "source": [ "# Create the fictional Trattoria Bellapasta menu\n", "menu_data = [\n", " [\"Trattoria Bellapasta\", \"Spaghetti Pomodoro\", \"Tomato Pastas\", 13.50, 3.90, 12, 1, 0],\n", " [\"Trattoria Bellapasta\", \"Penne Arrabbiata\", \"Tomato Pastas\", 14.50, 4.20, 13, 1, 1],\n", " [\"Trattoria Bellapasta\", \"Pappardelle al Sugo\", \"Tomato Pastas\", 15.50, 4.60, 13, 0, 0],\n", " [\"Trattoria Bellapasta\", \"Fettuccine Alfredo\", \"Cream Pastas\", 16.00, 5.10, 14, 1, 0],\n", " [\"Trattoria Bellapasta\", \"Tagliatelle ai Funghi\", \"Cream Pastas\", 17.00, 5.40, 15, 1, 0],\n", " [\"Trattoria Bellapasta\", \"Rigatoni alla Vodka\", \"Cream Pastas\", 17.50, 5.60, 15, 1, 0],\n", " [\"Trattoria Bellapasta\", \"Linguine alle Vongole\", \"Seafood Pastas\", 20.00, 7.40, 17, 0, 0],\n", " [\"Trattoria Bellapasta\", \"Seafood Linguine\", \"Seafood Pastas\", 21.00, 7.90, 18, 0, 0],\n", " [\"Trattoria Bellapasta\", \"Rigatoni al Forno\", \"Baked Pastas\", 17.50, 5.80, 18, 0, 0],\n", " [\"Trattoria Bellapasta\", \"Lasagna Classica\", \"Baked Pastas\", 18.50, 6.40, 20, 0, 0],\n", " [\"Trattoria Bellapasta\", \"Bruschetta Classica\", \"Antipasti\", 8.50, 2.10, 7, 1, 0],\n", " [\"Trattoria Bellapasta\", \"Calamari Fritti\", \"Antipasti\", 10.50, 3.40, 9, 0, 0]\n", "]\n", "\n", "menu_df = pd.DataFrame(\n", " menu_data,\n", " columns=[\n", " \"restaurant_name\", \"menu_item\", \"category\", \"listed_price\", \"food_cost\",\n", " \"prep_time_minutes\", \"vegetarian_flag\", \"spicy_flag\"\n", " ]\n", ")\n", "\n", "menu_df" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 671 }, "id": "8cdZ2cXowgd2", "outputId": "f0c1af3d-d623-429d-a70f-db1ef93d54a3" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " restaurant_name menu_item category listed_price \\\n", "0 Trattoria Bellapasta Spaghetti Pomodoro Tomato Pastas 13.5 \n", "1 Trattoria Bellapasta Penne Arrabbiata Tomato Pastas 14.5 \n", "2 Trattoria Bellapasta Pappardelle al Sugo Tomato Pastas 15.5 \n", "3 Trattoria Bellapasta Fettuccine Alfredo Cream Pastas 16.0 \n", "4 Trattoria Bellapasta Tagliatelle ai Funghi Cream Pastas 17.0 \n", "5 Trattoria Bellapasta Rigatoni alla Vodka Cream Pastas 17.5 \n", "6 Trattoria Bellapasta Linguine alle Vongole Seafood Pastas 20.0 \n", "7 Trattoria Bellapasta Seafood Linguine Seafood Pastas 21.0 \n", "8 Trattoria Bellapasta Rigatoni al Forno Baked Pastas 17.5 \n", "9 Trattoria Bellapasta Lasagna Classica Baked Pastas 18.5 \n", "10 Trattoria Bellapasta Bruschetta Classica Antipasti 8.5 \n", "11 Trattoria Bellapasta Calamari Fritti Antipasti 10.5 \n", "\n", " food_cost prep_time_minutes vegetarian_flag spicy_flag \n", "0 3.9 12 1 0 \n", "1 4.2 13 1 1 \n", "2 4.6 13 0 0 \n", "3 5.1 14 1 0 \n", "4 5.4 15 1 0 \n", "5 5.6 15 1 0 \n", "6 7.4 17 0 0 \n", "7 7.9 18 0 0 \n", "8 5.8 18 0 0 \n", "9 6.4 20 0 0 \n", "10 2.1 7 1 0 \n", "11 3.4 9 0 0 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
restaurant_namemenu_itemcategorylisted_pricefood_costprep_time_minutesvegetarian_flagspicy_flag
0Trattoria BellapastaSpaghetti PomodoroTomato Pastas13.53.91210
1Trattoria BellapastaPenne ArrabbiataTomato Pastas14.54.21311
2Trattoria BellapastaPappardelle al SugoTomato Pastas15.54.61300
3Trattoria BellapastaFettuccine AlfredoCream Pastas16.05.11410
4Trattoria BellapastaTagliatelle ai FunghiCream Pastas17.05.41510
5Trattoria BellapastaRigatoni alla VodkaCream Pastas17.55.61510
6Trattoria BellapastaLinguine alle VongoleSeafood Pastas20.07.41700
7Trattoria BellapastaSeafood LinguineSeafood Pastas21.07.91800
8Trattoria BellapastaRigatoni al FornoBaked Pastas17.55.81800
9Trattoria BellapastaLasagna ClassicaBaked Pastas18.56.42000
10Trattoria BellapastaBruschetta ClassicaAntipasti8.52.1710
11Trattoria BellapastaCalamari FrittiAntipasti10.53.4900
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", " \n", " \n", " \n", "
\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "menu_df", "summary": "{\n \"name\": \"menu_df\",\n \"rows\": 12,\n \"fields\": [\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"listed_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.6639108100829123,\n \"min\": 8.5,\n \"max\": 21.0,\n \"num_unique_values\": 11,\n \"samples\": [\n 17.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"food_cost\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.6517208656978994,\n \"min\": 2.1,\n \"max\": 7.9,\n \"num_unique_values\": 12,\n \"samples\": [\n 2.1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"prep_time_minutes\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 7,\n \"max\": 20,\n \"num_unique_values\": 9,\n \"samples\": [\n 7\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"vegetarian_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"spicy_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 19 } ] }, { "cell_type": "markdown", "source": [ "## Step 4: Generating Synthetic Monthly Sales Data\n", "\n", "Synthetic monthly sales data is created for each menu item across 12 months.\n", "\n", "This provides the business-performance layer required for later analysis. \n", "The generated variables include:\n", "- monthly orders\n", "- revenue\n", "- total food cost\n", "- gross profit\n", "- complaint rate\n", "- repeat order rate" ], "metadata": { "id": "9w4e4QdRwnJ-" } }, { "cell_type": "code", "source": [ "# Generate synthetic monthly sales data for each menu item\n", "months = pd.date_range(\"2025-01-01\", periods=12, freq=\"MS\")\n", "\n", "sales_rows = []\n", "\n", "for _, row in menu_df.iterrows():\n", " base_orders = random.randint(60, 140)\n", "\n", " for month in months:\n", " seasonal_factor = random.uniform(0.9, 1.15)\n", " orders = int(base_orders * seasonal_factor + np.random.normal(0, 8))\n", " orders = max(20, orders)\n", "\n", " revenue = round(orders * row[\"listed_price\"], 2)\n", " total_food_cost = round(orders * row[\"food_cost\"], 2)\n", " gross_profit = round(revenue - total_food_cost, 2)\n", "\n", " complaint_rate = round(random.uniform(0.01, 0.12), 3)\n", " repeat_order_rate = round(random.uniform(0.20, 0.55), 3)\n", "\n", " sales_rows.append([\n", " row[\"restaurant_name\"],\n", " month.strftime(\"%Y-%m\"),\n", " row[\"menu_item\"],\n", " row[\"category\"],\n", " orders,\n", " revenue,\n", " total_food_cost,\n", " gross_profit,\n", " complaint_rate,\n", " repeat_order_rate\n", " ])\n", "\n", "sales_df = pd.DataFrame(\n", " sales_rows,\n", " columns=[\n", " \"restaurant_name\", \"month\", \"menu_item\", \"category\", \"orders\", \"revenue\",\n", " \"total_food_cost\", \"gross_profit\", \"complaint_rate\", \"repeat_order_rate\"\n", " ]\n", ")\n", "\n", "sales_df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 313 }, "id": "o03p9szDwoJQ", "outputId": "476a669a-1872-4605-8249-5773a5cb5d48" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " restaurant_name month menu_item category orders \\\n", "0 Trattoria Bellapasta 2025-01 Spaghetti Pomodoro Tomato Pastas 71 \n", "1 Trattoria Bellapasta 2025-02 Spaghetti Pomodoro Tomato Pastas 79 \n", "2 Trattoria Bellapasta 2025-03 Spaghetti Pomodoro Tomato Pastas 73 \n", "3 Trattoria Bellapasta 2025-04 Spaghetti Pomodoro Tomato Pastas 82 \n", "4 Trattoria Bellapasta 2025-05 Spaghetti Pomodoro Tomato Pastas 68 \n", "\n", " revenue total_food_cost gross_profit complaint_rate repeat_order_rate \n", "0 958.5 276.9 681.6 0.040 0.278 \n", "1 1066.5 308.1 758.4 0.084 0.512 \n", "2 985.5 284.7 700.8 0.056 0.210 \n", "3 1107.0 319.8 787.2 0.066 0.209 \n", "4 918.0 265.2 652.8 0.081 0.391 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
restaurant_namemonthmenu_itemcategoryordersrevenuetotal_food_costgross_profitcomplaint_raterepeat_order_rate
0Trattoria Bellapasta2025-01Spaghetti PomodoroTomato Pastas71958.5276.9681.60.0400.278
1Trattoria Bellapasta2025-02Spaghetti PomodoroTomato Pastas791066.5308.1758.40.0840.512
2Trattoria Bellapasta2025-03Spaghetti PomodoroTomato Pastas73985.5284.7700.80.0560.210
3Trattoria Bellapasta2025-04Spaghetti PomodoroTomato Pastas821107.0319.8787.20.0660.209
4Trattoria Bellapasta2025-05Spaghetti PomodoroTomato Pastas68918.0265.2652.80.0810.391
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "sales_df", "summary": "{\n \"name\": \"sales_df\",\n \"rows\": 144,\n \"fields\": [\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"month\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"2025-11\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"orders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 26,\n \"min\": 46,\n \"max\": 162,\n \"num_unique_values\": 73,\n \"samples\": [\n 68\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"revenue\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 583.1490021780527,\n \"min\": 391.0,\n \"max\": 2860.0,\n \"num_unique_values\": 120,\n \"samples\": [\n 1428.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_food_cost\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 216.4400761721973,\n \"min\": 96.6,\n \"max\": 1058.2,\n \"num_unique_values\": 121,\n \"samples\": [\n 453.6\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"gross_profit\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 374.4191271786242,\n \"min\": 294.4,\n \"max\": 1895.4,\n \"num_unique_values\": 119,\n \"samples\": [\n 1213.8\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"complaint_rate\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.030614731759388336,\n \"min\": 0.01,\n \"max\": 0.12,\n \"num_unique_values\": 78,\n \"samples\": [\n 0.068\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"repeat_order_rate\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.10995740504682221,\n \"min\": 0.201,\n \"max\": 0.546,\n \"num_unique_values\": 122,\n \"samples\": [\n 0.286\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 20 } ] }, { "cell_type": "markdown", "source": [ "## Step 5: Generating Synthetic Customer Reviews\n", "\n", "Synthetic customer reviews are created at menu-item level to simulate customer feedback.\n", "\n", "Each review includes:\n", "- a numerical rating\n", "- a short review text\n", "- a review date\n", "- the associated menu item\n", "\n", "This creates the qualitative data layer needed for later review analysis." ], "metadata": { "id": "MAaPosUDwrLO" } }, { "cell_type": "code", "source": [ "# Templates for synthetic reviews\n", "positive_templates = [\n", " \"Loved the {item}. Great flavour and very satisfying.\",\n", " \"The {item} was excellent and well balanced.\",\n", " \"Really enjoyed the {item}. Would order again.\",\n", " \"The {item} tasted fresh and was one of the best dishes on the menu.\",\n", " \"Very happy with the {item}. Strong choice overall.\"\n", "]\n", "\n", "neutral_templates = [\n", " \"The {item} was decent but not especially memorable.\",\n", " \"The {item} was okay. Nothing particularly wrong, but nothing exceptional.\",\n", " \"The {item} was fine, although it could have been more flavourful.\",\n", " \"Reasonable experience with the {item}, but there is room for improvement.\"\n", "]\n", "\n", "negative_templates = [\n", " \"The {item} was disappointing and lacked flavour.\",\n", " \"Did not enjoy the {item}. It felt overpriced for the quality.\",\n", " \"The {item} was underwhelming and not worth ordering again.\",\n", " \"The {item} was too heavy and not very satisfying.\",\n", " \"The {item} needs improvement in taste and consistency.\"\n", "]\n", "\n", "review_rows = []\n", "review_id = 1\n", "\n", "for _, row in menu_df.iterrows():\n", " n_reviews = random.randint(12, 20)\n", "\n", " for _ in range(n_reviews):\n", " rating = random.choices(\n", " population=[2, 3, 4, 5],\n", " weights=[0.10, 0.20, 0.40, 0.30],\n", " k=1\n", " )[0]\n", "\n", " if rating >= 4:\n", " review_text = random.choice(positive_templates).format(item=row[\"menu_item\"])\n", " elif rating == 3:\n", " review_text = random.choice(neutral_templates).format(item=row[\"menu_item\"])\n", " else:\n", " review_text = random.choice(negative_templates).format(item=row[\"menu_item\"])\n", "\n", " review_date = datetime(2025, 1, 1) + timedelta(days=random.randint(0, 300))\n", "\n", " review_rows.append([\n", " review_id,\n", " row[\"restaurant_name\"],\n", " review_date.strftime(\"%Y-%m-%d\"),\n", " row[\"menu_item\"],\n", " rating,\n", " review_text\n", " ])\n", " review_id += 1\n", "\n", "reviews_df = pd.DataFrame(\n", " review_rows,\n", " columns=[\n", " \"review_id\", \"restaurant_name\", \"review_date\", \"menu_item\", \"rating\", \"review_text\"\n", " ]\n", ")\n", "\n", "reviews_df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "Vt5wJbkXwwfG", "outputId": "29a83c3d-971a-4ace-9140-a81e014520e6" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " review_id restaurant_name review_date menu_item rating \\\n", "0 1 Trattoria Bellapasta 2025-09-16 Spaghetti Pomodoro 4 \n", "1 2 Trattoria Bellapasta 2025-08-17 Spaghetti Pomodoro 4 \n", "2 3 Trattoria Bellapasta 2025-08-19 Spaghetti Pomodoro 5 \n", "3 4 Trattoria Bellapasta 2025-05-22 Spaghetti Pomodoro 3 \n", "4 5 Trattoria Bellapasta 2025-09-06 Spaghetti Pomodoro 5 \n", "\n", " review_text \n", "0 The Spaghetti Pomodoro tasted fresh and was on... \n", "1 Very happy with the Spaghetti Pomodoro. Strong... \n", "2 The Spaghetti Pomodoro tasted fresh and was on... \n", "3 The Spaghetti Pomodoro was okay. Nothing parti... \n", "4 Very happy with the Spaghetti Pomodoro. Strong... " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
review_idrestaurant_namereview_datemenu_itemratingreview_text
01Trattoria Bellapasta2025-09-16Spaghetti Pomodoro4The Spaghetti Pomodoro tasted fresh and was on...
12Trattoria Bellapasta2025-08-17Spaghetti Pomodoro4Very happy with the Spaghetti Pomodoro. Strong...
23Trattoria Bellapasta2025-08-19Spaghetti Pomodoro5The Spaghetti Pomodoro tasted fresh and was on...
34Trattoria Bellapasta2025-05-22Spaghetti Pomodoro3The Spaghetti Pomodoro was okay. Nothing parti...
45Trattoria Bellapasta2025-09-06Spaghetti Pomodoro5Very happy with the Spaghetti Pomodoro. Strong...
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "reviews_df", "summary": "{\n \"name\": \"reviews_df\",\n \"rows\": 200,\n \"fields\": [\n {\n \"column\": \"review_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 57,\n \"min\": 1,\n \"max\": 200,\n \"num_unique_values\": 200,\n \"samples\": [\n 96,\n 16,\n 31\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 145,\n \"samples\": [\n \"2025-03-19\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 2,\n \"max\": 5,\n \"num_unique_values\": 4,\n \"samples\": [\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_text\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 97,\n \"samples\": [\n \"The Seafood Linguine was decent but not especially memorable.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 21 } ] }, { "cell_type": "markdown", "source": [ "## Step 6: Annotating the Reviews\n", "\n", "To make the review dataset more useful for later analysis, each review is annotated with:\n", "\n", "- `sentiment_label`\n", "- `theme`\n", "- `complaint_flag`\n", "\n", "This makes the dataset closer to the structured review annotation approach used in workshop exercises." ], "metadata": { "id": "1JDqpCL0w2HR" } }, { "cell_type": "code", "source": [ "# Annotate synthetic reviews\n", "themes_positive = [\"Taste\", \"Freshness\", \"Portion Size\", \"Value\"]\n", "themes_neutral = [\"Taste\", \"Texture\", \"Portion Size\", \"Value\"]\n", "themes_negative = [\"Taste\", \"Value\", \"Texture\", \"Waiting Time\"]\n", "\n", "def assign_sentiment(rating):\n", " if rating >= 4:\n", " return \"Positive\"\n", " elif rating == 3:\n", " return \"Neutral\"\n", " return \"Negative\"\n", "\n", "def assign_theme(rating):\n", " if rating >= 4:\n", " return random.choice(themes_positive)\n", " elif rating == 3:\n", " return random.choice(themes_neutral)\n", " return random.choice(themes_negative)\n", "\n", "reviews_df[\"sentiment_label\"] = reviews_df[\"rating\"].apply(assign_sentiment)\n", "reviews_df[\"theme\"] = reviews_df[\"rating\"].apply(assign_theme)\n", "reviews_df[\"complaint_flag\"] = reviews_df[\"rating\"].apply(lambda x: 1 if x <= 2 else 0)\n", "\n", "reviews_df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 521 }, "id": "84kPk1yCw2-v", "outputId": "29918bc9-d83b-44b1-947c-fe8dd0bf0e61" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " review_id restaurant_name review_date menu_item rating \\\n", "0 1 Trattoria Bellapasta 2025-09-16 Spaghetti Pomodoro 4 \n", "1 2 Trattoria Bellapasta 2025-08-17 Spaghetti Pomodoro 4 \n", "2 3 Trattoria Bellapasta 2025-08-19 Spaghetti Pomodoro 5 \n", "3 4 Trattoria Bellapasta 2025-05-22 Spaghetti Pomodoro 3 \n", "4 5 Trattoria Bellapasta 2025-09-06 Spaghetti Pomodoro 5 \n", "\n", " review_text sentiment_label \\\n", "0 The Spaghetti Pomodoro tasted fresh and was on... Positive \n", "1 Very happy with the Spaghetti Pomodoro. Strong... Positive \n", "2 The Spaghetti Pomodoro tasted fresh and was on... Positive \n", "3 The Spaghetti Pomodoro was okay. Nothing parti... Neutral \n", "4 Very happy with the Spaghetti Pomodoro. Strong... Positive \n", "\n", " theme complaint_flag \n", "0 Taste 0 \n", "1 Taste 0 \n", "2 Portion Size 0 \n", "3 Value 0 \n", "4 Taste 0 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
review_idrestaurant_namereview_datemenu_itemratingreview_textsentiment_labelthemecomplaint_flag
01Trattoria Bellapasta2025-09-16Spaghetti Pomodoro4The Spaghetti Pomodoro tasted fresh and was on...PositiveTaste0
12Trattoria Bellapasta2025-08-17Spaghetti Pomodoro4Very happy with the Spaghetti Pomodoro. Strong...PositiveTaste0
23Trattoria Bellapasta2025-08-19Spaghetti Pomodoro5The Spaghetti Pomodoro tasted fresh and was on...PositivePortion Size0
34Trattoria Bellapasta2025-05-22Spaghetti Pomodoro3The Spaghetti Pomodoro was okay. Nothing parti...NeutralValue0
45Trattoria Bellapasta2025-09-06Spaghetti Pomodoro5Very happy with the Spaghetti Pomodoro. Strong...PositiveTaste0
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "reviews_df", "summary": "{\n \"name\": \"reviews_df\",\n \"rows\": 200,\n \"fields\": [\n {\n \"column\": \"review_id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 57,\n \"min\": 1,\n \"max\": 200,\n \"num_unique_values\": 200,\n \"samples\": [\n 96,\n 16,\n 31\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 145,\n \"samples\": [\n \"2025-03-19\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 2,\n \"max\": 5,\n \"num_unique_values\": 4,\n \"samples\": [\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_text\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 97,\n \"samples\": [\n \"The Seafood Linguine was decent but not especially memorable.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"sentiment_label\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Positive\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"theme\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Taste\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"complaint_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 22 } ] }, { "cell_type": "markdown", "source": [ "## Step 7: Creating a Menu Item Summary Table\n", "\n", "A summary table is created to provide a compact item-level overview before moving to the analysis notebook.\n", "\n", "This table combines:\n", "- menu information\n", "- average rating\n", "- review count\n", "- total orders\n", "- total revenue\n", "- total gross profit\n", "- profit margin percentage" ], "metadata": { "id": "mCz-xB9tw7wL" } }, { "cell_type": "code", "source": [ "# Create review summary\n", "review_summary = reviews_df.groupby(\"menu_item\").agg(\n", " avg_rating=(\"rating\", \"mean\"),\n", " review_count=(\"review_id\", \"count\")\n", ").reset_index()\n", "\n", "# Create sales summary\n", "sales_summary = sales_df.groupby(\"menu_item\").agg(\n", " total_orders=(\"orders\", \"sum\"),\n", " total_revenue=(\"revenue\", \"sum\"),\n", " total_gross_profit=(\"gross_profit\", \"sum\")\n", ").reset_index()\n", "\n", "# Merge into one summary table\n", "menu_item_summary_df = menu_df.merge(review_summary, on=\"menu_item\", how=\"left\")\n", "menu_item_summary_df = menu_item_summary_df.merge(sales_summary, on=\"menu_item\", how=\"left\")\n", "\n", "# Add profit margin percentage\n", "menu_item_summary_df[\"avg_rating\"] = menu_item_summary_df[\"avg_rating\"].round(2)\n", "menu_item_summary_df[\"profit_margin_pct\"] = (\n", " (menu_item_summary_df[\"listed_price\"] - menu_item_summary_df[\"food_cost\"]) /\n", " menu_item_summary_df[\"listed_price\"] * 100\n", ").round(2)\n", "\n", "menu_item_summary_df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 313 }, "id": "GuDK3-5Bw9Zs", "outputId": "9effa7dc-ede5-4bb4-868e-f7f414b09820" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " restaurant_name menu_item category listed_price \\\n", "0 Trattoria Bellapasta Spaghetti Pomodoro Tomato Pastas 13.5 \n", "1 Trattoria Bellapasta Penne Arrabbiata Tomato Pastas 14.5 \n", "2 Trattoria Bellapasta Pappardelle al Sugo Tomato Pastas 15.5 \n", "3 Trattoria Bellapasta Fettuccine Alfredo Cream Pastas 16.0 \n", "4 Trattoria Bellapasta Tagliatelle ai Funghi Cream Pastas 17.0 \n", "\n", " food_cost prep_time_minutes vegetarian_flag spicy_flag avg_rating \\\n", "0 3.9 12 1 0 3.85 \n", "1 4.2 13 1 1 4.17 \n", "2 4.6 13 0 0 4.19 \n", "3 5.1 14 1 0 3.95 \n", "4 5.4 15 1 0 3.77 \n", "\n", " review_count total_orders total_revenue total_gross_profit \\\n", "0 20 899 12136.5 8630.4 \n", "1 18 1696 24592.0 17468.8 \n", "2 16 1071 16600.5 11673.9 \n", "3 19 1109 17744.0 12088.1 \n", "4 13 1025 17425.0 11890.0 \n", "\n", " profit_margin_pct \n", "0 71.11 \n", "1 71.03 \n", "2 70.32 \n", "3 68.12 \n", "4 68.24 " ], "text/html": [ "\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
restaurant_namemenu_itemcategorylisted_pricefood_costprep_time_minutesvegetarian_flagspicy_flagavg_ratingreview_counttotal_orderstotal_revenuetotal_gross_profitprofit_margin_pct
0Trattoria BellapastaSpaghetti PomodoroTomato Pastas13.53.912103.852089912136.58630.471.11
1Trattoria BellapastaPenne ArrabbiataTomato Pastas14.54.213114.1718169624592.017468.871.03
2Trattoria BellapastaPappardelle al SugoTomato Pastas15.54.613004.1916107116600.511673.970.32
3Trattoria BellapastaFettuccine AlfredoCream Pastas16.05.114103.9519110917744.012088.168.12
4Trattoria BellapastaTagliatelle ai FunghiCream Pastas17.05.415103.7713102517425.011890.068.24
\n", "
\n", "
\n", "\n", "
\n", " \n", "\n", " \n", "\n", " \n", "
\n", "\n", "\n", "
\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "menu_item_summary_df", "summary": "{\n \"name\": \"menu_item_summary_df\",\n \"rows\": 12,\n \"fields\": [\n {\n \"column\": \"restaurant_name\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"Trattoria Bellapasta\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"menu_item\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 12,\n \"samples\": [\n \"Bruschetta Classica\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"Cream Pastas\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"listed_price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.6639108100829123,\n \"min\": 8.5,\n \"max\": 21.0,\n \"num_unique_values\": 11,\n \"samples\": [\n 17.5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"food_cost\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1.6517208656978994,\n \"min\": 2.1,\n \"max\": 7.9,\n \"num_unique_values\": 12,\n \"samples\": [\n 2.1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"prep_time_minutes\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 7,\n \"max\": 20,\n \"num_unique_values\": 9,\n \"samples\": [\n 7\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"vegetarian_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"spicy_flag\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0,\n \"min\": 0,\n \"max\": 1,\n \"num_unique_values\": 2,\n \"samples\": [\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"avg_rating\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.21020913107636116,\n \"min\": 3.77,\n \"max\": 4.42,\n \"num_unique_values\": 11,\n \"samples\": [\n 3.83\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"review_count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3,\n \"min\": 12,\n \"max\": 20,\n \"num_unique_values\": 6,\n \"samples\": [\n 20\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_orders\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 302,\n \"min\": 728,\n \"max\": 1696,\n \"num_unique_values\": 12,\n \"samples\": [\n 728\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_revenue\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6983.92005056811,\n \"min\": 6188.0,\n \"max\": 29700.0,\n \"num_unique_values\": 12,\n \"samples\": [\n 6188.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"total_gross_profit\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4466.736931423817,\n \"min\": 4659.2,\n \"max\": 18907.2,\n \"num_unique_values\": 12,\n \"samples\": [\n 4659.2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"profit_margin_pct\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 3.593292994657484,\n \"min\": 62.38,\n \"max\": 75.29,\n \"num_unique_values\": 12,\n \"samples\": [\n 75.29\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 23 } ] }, { "cell_type": "markdown", "source": [ "## Step 8: Exporting the Final Files\n", "\n", "The final step is to export the cleaned and enriched datasets that will be used in Notebook 2 for analysis.\n", "\n", "The exported files are:\n", "- `real_world_menu_benchmark.csv`\n", "- `menu_items.csv`\n", "- `monthly_sales.csv`\n", "- `reviews.csv`\n", "- `menu_item_summary.csv`" ], "metadata": { "id": "51LCiy1f0Guy" } }, { "cell_type": "code", "source": [ "# Export all project files for Notebook 2\n", "menu_df.to_csv(\"menu_items.csv\", index=False)\n", "sales_df.to_csv(\"monthly_sales.csv\", index=False)\n", "reviews_df.to_csv(\"reviews.csv\", index=False)\n", "menu_item_summary_df.to_csv(\"menu_item_summary.csv\", index=False)\n", "\n", "print(\"Files exported successfully.\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "A5561w4-0Gea", "outputId": "814e93ea-5918-4445-fab7-795cf505a510" }, "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Files exported successfully.\n" ] } ] } ] }