{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Assignment 1: EDA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Intro\n", "The Global Findex database is the most comprehensive dataset on adult financial behaviors worldwide, capturing how individuals save, borrow, make payments, and manage financial risks. Initiated by the World Bank in 2011, the dataset is based on nationally representative surveys of over 150,000 adults across more than 140 economies. The 2021 edition provides updated indicators on the use of both formal and informal financial services.\n", "\n", "In this analysis, we conduct an exploratory data analysis (EDA) to uncover key patterns and trends in individuals' financial behaviors across regions and genders. As a group of finance students with a strong interest in banking and personal finance, we focus on how different demographic groups access and use financial services across economies." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Data Cleaning and Manipulation" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Importing the libraries used in this analysis\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from scipy.stats import chi2_contingency\n", "\n", "\n", "# Initially, the data couldn’t be read with UTF-8 due to special characters. 
Using ChatGPT, we found that latin-1 supports these characters, allowing for proper file decoding.\n", "# data = pd.read_csv('https://github.com/aaubs/ds-master/raw/main/data/assignments_datasets/FINDEX/WLD_2021_FINDEX_v03_M_csv.zip')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Reading the data using the raw URL on GitHub\n", "data = pd.read_csv('https://github.com/aaubs/ds-master/raw/main/data/assignments_datasets/FINDEX/WLD_2021_FINDEX_v03_M_csv.zip', encoding='latin-1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Initial Data Structure Overview: Head, Info, Shape, Index, and Column Names" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | economy | economycode | regionwb | pop_adult | wpid_random | wgt | female | age | educ | inc_q | ... | receive_transfers | receive_pension | receive_agriculture | pay_utilities | remittances | mobileowner | internetaccess | anydigpayment | merchantpay_dig | year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | South Asia | 22647496.0 | 144274031 | 0.716416 | 2 | 43.0 | 2 | 4 | ... | 4 | 4 | 4.0 | 1 | 5.0 | 1 | 2 | 1 | 0.0 | 2021 |
| 1 | Afghanistan | AFG | South Asia | 22647496.0 | 180724554 | 0.497408 | 2 | 55.0 | 1 | 3 | ... | 4 | 4 | 2.0 | 4 | 5.0 | 1 | 2 | 0 | 0.0 | 2021 |
| 2 | Afghanistan | AFG | South Asia | 22647496.0 | 130686682 | 0.650431 | 1 | 15.0 | 1 | 2 | ... | 4 | 4 | 4.0 | 4 | 3.0 | 2 | 2 | 0 | 0.0 | 2021 |
| 3 | Afghanistan | AFG | South Asia | 22647496.0 | 142646649 | 0.991862 | 2 | 23.0 | 1 | 4 | ... | 4 | 4 | 2.0 | 4 | 5.0 | 1 | 2 | 0 | 0.0 | 2021 |
| 4 | Afghanistan | AFG | South Asia | 22647496.0 | 199055310 | 0.554940 | 1 | 46.0 | 1 | 1 | ... | 4 | 4 | 4.0 | 4 | 5.0 | 2 | 2 | 0 | 0.0 | 2021 |

5 rows × 128 columns
\n", "