Pandas Profiling Report

Dataset statistics

Number of variables	5
Number of observations	210
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	0
Duplicate rows (%)	0.0%
Total size in memory	8.3 KiB
Average record size in memory	40.6 B

Variable types

Numeric	1
Categorical	4

Alerts

`AGE` is highly correlated with `AMPUTATION`	High correlation
`AMPUTATION` is highly correlated with `AGE`	High correlation
`AMPUTATION` is uniformly distributed	Uniform
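
The `Uniform` alert can be illustrated with a chi-square goodness-of-fit check against equal expected counts. The sketch below is an assumption about how such a check could work, not a confirmed description of the profiler's implementation. The helper names `chi_square_uniform` and `p_value_one_dof` are hypothetical, and the closed-form p-value holds only for two categories (one degree of freedom). The counts are taken from the AMPUTATION and GENDER sections of this report.

```python
import math

def chi_square_uniform(counts):
    """Chi-square statistic of observed counts against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def p_value_one_dof(statistic):
    """Survival function of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

amputation_counts = [105, 105]  # AMPUTATION: 0 and 1
gender_counts = [112, 98]       # GENDER: F and M, shown for contrast

amputation_stat = chi_square_uniform(amputation_counts)
gender_stat = chi_square_uniform(gender_counts)
amputation_p = p_value_one_dof(amputation_stat)
gender_p = p_value_one_dof(gender_stat)

print(f"AMPUTATION: chi2={amputation_stat:.4f}, p={amputation_p:.4f}")
print(f"GENDER:     chi2={gender_stat:.4f}, p={gender_p:.4f}")
```

A perfectly balanced column gives a statistic of 0 and a p-value of 1, so any test of this kind cannot reject uniformity for `AMPUTATION`. The mildly unbalanced `GENDER` column gives a clearly lower p-value.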

Reproduction

Analysis started	2021-11-16 20:48:41.142486
Analysis finished	2021-11-16 20:48:42.048926
Duration	0.91 seconds
Software version	pandas-profiling v3.1.0

AGE
Real number (ℝ_≥0)

HIGH CORRELATION

Distinct	71
Distinct (%)	33.8%
Missing	0
Missing (%)	0.0%
Infinite	0
Infinite (%)	0.0%
Mean	55.0952381

Minimum	4
Maximum	89
Zeros	0
Zeros (%)	0.0%
Negative	0
Negative (%)	0.0%
Memory size	1.8 KiB

Quantile statistics

Minimum	4
5-th percentile	16.9
Q1	47.25
median	59
Q3	68
95-th percentile	80
Maximum	89
Range	85
Interquartile range (IQR)	20.75

Descriptive statistics

Standard deviation	18.58024047
Coefficient of variation (CV)	0.3372385911
Kurtosis	0.4244667576
Mean	55.0952381
Median Absolute Deviation (MAD)	10
Skewness	-0.8567484108
Sum	11570
Variance	345.2253361
Monotonicity	Not monotonic
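
Several of the AGE statistics above are simple functions of one another. The short sketch below recomputes the derived values from the reported ones, as a consistency check. The variable names are illustrative, and the inputs are copied from this report.

```python
n_observations = 210        # "Number of observations"
total = 11570               # "Sum"
std = 18.58024047           # "Standard deviation"
q1, q3 = 47.25, 68          # "Q1" and "Q3"
minimum, maximum = 4, 89    # "Minimum" and "Maximum"

mean = total / n_observations    # Mean = Sum / n
variance = std ** 2              # Variance = (standard deviation)^2
cv = std / mean                  # Coefficient of variation = std / mean
iqr = q3 - q1                    # Interquartile range = Q3 - Q1
value_range = maximum - minimum  # Range = Maximum - Minimum

print(f"mean={mean:.7f} variance={variance:.4f} cv={cv:.7f}")
print(f"iqr={iqr} range={value_range}")
```

The recomputed values agree with the reported Mean, Variance, CV, IQR and Range to within the rounding of the inputs.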

Histogram with fixed size bins (bins=50)

Common values

Value	Count	Frequency (%)
61	10	4.8%
62	9	4.3%
69	8	3.8%
60	8	3.8%
52	7	3.3%
54	7	3.3%
56	7	3.3%
73	6	2.9%
50	6	2.9%
65	6	2.9%
Other values (61)	136	64.8%
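
The percentages and the "Other values" row in the table above follow directly from the counts. As a minimal sketch (the helper `frequency_pct` is a hypothetical name), each frequency is the count divided by the 210 observations, and the remainder row is whatever the ten listed values leave over:

```python
n_observations = 210  # "Number of observations"
n_distinct = 71       # "Distinct" for AGE

# The ten most common AGE values and their counts, copied from the table.
top_counts = {61: 10, 62: 9, 69: 8, 60: 8, 52: 7,
              54: 7, 56: 7, 73: 6, 50: 6, 65: 6}

def frequency_pct(count, total=n_observations):
    """Share of observations, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)

other_count = n_observations - sum(top_counts.values())
other_distinct = n_distinct - len(top_counts)

print(frequency_pct(top_counts[61]))                            # share of AGE == 61
print(other_distinct, other_count, frequency_pct(other_count))  # remainder row
```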

Minimum 5 values

Value	Count	Frequency (%)
4	2	1.0%
5	2	1.0%
7	1	0.5%
8	1	0.5%
9	1	0.5%
11	1	0.5%
12	1	0.5%
15	1	0.5%
16	1	0.5%
18	2	1.0%

Maximum 5 values

Value	Count	Frequency (%)
89	1	0.5%
88	2	1.0%
85	1	0.5%
84	1	0.5%
83	1	0.5%
81	1	0.5%
80	5	2.4%
79	2	1.0%
78	2	1.0%
77	3	1.4%

GENDER
Categorical

Distinct	2
Distinct (%)	1.0%
Missing	0
Missing (%)	0.0%
Memory size	1.8 KiB

F	112
M	98

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Characters and Unicode

Total characters	0
Distinct characters	0
Distinct categories	0
Distinct scripts	0
Distinct blocks	0

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	M
2nd row	M
3rd row	F
4th row	F
5th row	F

Common Values

Value	Count	Frequency (%)
F	112	53.3%
M	98	46.7%

Length

Histogram of lengths of the category

Pie chart

Value	Count	Frequency (%)
f	112	53.3%
m	98	46.7%

RACE
Categorical

Distinct	5
Distinct (%)	2.4%
Missing	0
Missing (%)	0.0%
Memory size	1.8 KiB

Asian	89
Black	57
White	29
Coloured	25
Other	10

Length

Max length	8
Median length	5
Mean length	5.628571429
Min length	5

Characters and Unicode

Total characters	0
Distinct characters	0
Distinct categories	0
Distinct scripts	0
Distinct blocks	0

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	Black
2nd row	Black
3rd row	Asian
4th row	Black
5th row	White

Common Values

Value	Count	Frequency (%)
Asian	89	42.4%
Black	57	27.1%
White	29	13.8%
Coloured	25	11.9%
Other	10	4.8%

Length

Histogram of lengths of the category

Pie chart

Value	Count	Frequency (%)
asian	89	42.4%
black	57	27.1%
white	29	13.8%
coloured	25	11.9%
other	10	4.8%

DIABETES_CLASS
Categorical

Distinct	2
Distinct (%)	1.0%
Missing	0
Missing (%)	0.0%
Memory size	1.8 KiB

Type 2 diabetes	135
Type 1 diabetes	75

Length

Max length	15
Median length	15
Mean length	15
Min length	15

Characters and Unicode

Total characters	0
Distinct characters	0
Distinct categories	0
Distinct scripts	0
Distinct blocks	0

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	Type 2 diabetes
2nd row	Type 2 diabetes
3rd row	Type 2 diabetes
4th row	Type 2 diabetes
5th row	Type 2 diabetes

Common Values

Value	Count	Frequency (%)
Type 2 diabetes	135	64.3%
Type 1 diabetes	75	35.7%

Length

Histogram of lengths of the category

Pie chart

Value	Count	Frequency (%)
diabetes	210	33.3%
type	210	33.3%
2	135	21.4%
1	75	11.9%
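
The table above does not list categories. Its counts are consistent with a tally of lowercased, whitespace-separated words across all 210 values. That reading is an inference from the numbers, not something the report states. A minimal sketch reproduces the counts and percentages from the two category counts:

```python
from collections import Counter

# Category counts copied from the "Common Values" table for DIABETES_CLASS.
class_counts = {"Type 2 diabetes": 135, "Type 1 diabetes": 75}

word_counts = Counter()
for label, count in class_counts.items():
    # Each row contributes every lowercased word of its label once.
    for word in label.lower().split():
        word_counts[word] += count

total_words = sum(word_counts.values())
word_pct = {w: round(100 * c / total_words, 1) for w, c in word_counts.items()}

for word, count in word_counts.most_common():
    print(word, count, f"{word_pct[word]}%")
```

Every value has three words, so the percentages are taken over 630 words and not over 210 rows, which is why "diabetes" appears in every row but shows 33.3%.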

AMPUTATION
Categorical

HIGH CORRELATION
UNIFORM

Distinct	2
Distinct (%)	1.0%
Missing	0
Missing (%)	0.0%
Memory size	1.8 KiB

0	105
1	105

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Characters and Unicode

Total characters	0
Distinct characters	0
Distinct categories	0
Distinct scripts	0
Distinct blocks	0

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique	0
Unique (%)	0.0%

Sample

1st row	1
2nd row	1
3rd row	1
4th row	1
5th row	1

Common Values

Value	Count	Frequency (%)
0	105	50.0%
1	105	50.0%

Length

Histogram of lengths of the category

Pie chart

Value	Count	Frequency (%)
1	105	50.0%
0	105	50.0%

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
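
The calculation described above can be sketched in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. The helper names (`average_ranks`, `pearson`, `spearman`) and the toy data are illustrative assumptions, and giving tied values their average rank is one common convention.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance divided by the product of standard deviations (1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear

print(spearman(x, y), pearson(x, y))
```

On this toy data the ranks of `x` and `y` coincide, so ρ reaches its maximum, while Pearson's r on the raw values stays below 1 because the relationship is not linear.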

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the slope does not affect r; only its sign does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
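
A minimal sketch of this formula, with a check of the invariance property mentioned above. The function name `pearson_r` and the toy data are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Shifting and rescaling each variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# Negating one variable flips the sign of r.
r_flipped = pearson_r(x, [-b for b in y])

print(r_original, r_rescaled, r_flipped)
```

The same 1/n (or 1/(n-1)) factor appears in the covariance and in both standard deviations, so the choice between population and sample estimates does not change r.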

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
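
The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition. The function name and toy data are illustrative, and libraries often default to tie-adjusted variants such as τ-b, which can give different values when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 5]

print(kendall_tau_a(x, y))  # 7 concordant and 3 discordant pairs out of 10
```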

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
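
As a sketch of what a bias-corrected Cramér's V looks like, the function below applies the commonly cited form of Bergsma's correction to a contingency table of counts. It is an illustration under stated assumptions, not the profiler's code. The function name `cramers_v_corrected` is hypothetical, the two tables are made-up examples and not data from this report, and every row and column is assumed to have a non-zero total.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Correction: subtract the expected bias and shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

perfect = [[50, 0], [0, 50]]        # hypothetical: rows determine columns
independent = [[25, 25], [25, 25]]  # hypothetical: no association

print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The two toy tables sit at the ends of the range: perfect association gives a value of 1 (up to floating-point rounding) and an exactly independent table gives 0.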

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Count

A simple visualization of nullity by column.

Matrix

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

First rows

	AGE	GENDER	RACE	DIABETES_CLASS	AMPUTATION
0	50	M	Black	Type 2 diabetes	1
1	47	M	Black	Type 2 diabetes	1
2	76	F	Asian	Type 2 diabetes	1
3	57	F	Black	Type 2 diabetes	1
4	67	F	White	Type 2 diabetes	1
5	56	F	White	Type 2 diabetes	1
6	66	F	Asian	Type 2 diabetes	1
7	62	F	Coloured	Type 1 diabetes	1
8	65	F	Black	Type 2 diabetes	1
9	80	F	Asian	Type 1 diabetes	1

Last rows

	AGE	GENDER	RACE	DIABETES_CLASS
200	60	M	Coloured	Type 2 diabetes
201	69	M	White	Type 2 diabetes
202	73	F	Other	Type 2 diabetes
203	59	F	Asian	Type 2 diabetes
204	75	F	Asian	Type 2 diabetes
205	48	F	Coloured	Type 1 diabetes
206	50	M	Coloured	Type 2 diabetes
207	19	F	White	Type 1 diabetes
208	88	F	Black	Type 2 diabetes
209	65	F	Other	Type 2 diabetes
