diff --git "a/code/eda_pandas.html" "b/code/eda_pandas.html"
new file mode 100644
--- /dev/null
+++ "b/code/eda_pandas.html"
@@ -0,0 +1,12300 @@
Profiling Report

Overview

Dataset statistics

Number of variables: 5
Number of observations: 100
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.0 KiB
Average record size in memory: 41.3 B
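
The memory figures above are consistent with a simple accounting, sketched here under assumptions the report does not state: that each of the 5 columns stores 100 values as 8-byte floats, and that the row index costs 128 bytes (a size seen for a pandas `RangeIndex` in some versions; it can differ). All names in the snippet are illustrative.

```python
# Assumptions (not stated in the report): float64 columns and a 128-byte index.
N_ROWS = 100
N_COLS = 5
BYTES_PER_FLOAT64 = 8
ASSUMED_INDEX_BYTES = 128

column_bytes = N_ROWS * BYTES_PER_FLOAT64                   # data held by one column
total_bytes = N_COLS * column_bytes + ASSUMED_INDEX_BYTES   # all columns plus the index
total_kib = total_bytes / 1024
avg_record_bytes = total_bytes / N_ROWS
column_with_index_bytes = column_bytes + ASSUMED_INDEX_BYTES

print(f"total size: {total_kib:.1f} KiB")
print(f"average record size: {avg_record_bytes:.1f} B")
print(f"one column including index: {column_with_index_bytes:.1f} B")
```

Under these assumptions the totals round to the 4.0 KiB and 41.3 B shown above, and the per-column figure agrees with the 928.0 B memory size listed for each variable.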

Variable types

Numeric: 5

Alerts

a has unique values (Unique)
b has unique values (Unique)
c has unique values (Unique)
d has unique values (Unique)
e has unique values (Unique)

Reproduction

Analysis started: 2023-11-09 22:44:51.152491
Analysis finished: 2023-11-09 22:44:52.582947
Duration: 1.43 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

a
Real number (ℝ)

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.49957568
Minimum: 0.024516893
Maximum: 0.99068183
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.0 B

Quantile statistics

Minimum: 0.024516893
5th percentile: 0.056429077
Q1: 0.28251422
Median: 0.45642179
Q3: 0.74517201
95th percentile: 0.96516627
Maximum: 0.99068183
Range: 0.96616494
Interquartile range (IQR): 0.46265778

Descriptive statistics

Standard deviation: 0.29892558
Coefficient of variation (CV): 0.59835896
Kurtosis: -1.2342932
Mean: 0.49957568
Median Absolute Deviation (MAD): 0.24013465
Skewness: 0.12337419
Sum: 49.957568
Variance: 0.089356503
Monotonicity: Not monotonic
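
Several of the statistics for `a` follow from others by simple arithmetic (range = max - min, IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean x number of observations). The sketch below recomputes them from the values printed in this report and compares them with the reported figures; the tolerance and variable names are illustrative choices, not part of the report.

```python
import math

# Values copied from the report for variable `a`.
mean = 0.49957568
std = 0.29892558
minimum, maximum = 0.024516893, 0.99068183
q1, q3 = 0.28251422, 0.74517201
n = 100  # number of observations

# Quantities derived from the values above.
derived = {
    "range": maximum - minimum,
    "iqr": q3 - q1,
    "cv": std / mean,
    "variance": std ** 2,
    "sum": mean * n,
}

# Corresponding figures as printed in the report.
reported = {
    "range": 0.96616494,
    "iqr": 0.46265778,
    "cv": 0.59835896,
    "variance": 0.089356503,
    "sum": 49.957568,
}

# The report rounds to about 8 significant digits, so compare with a small tolerance.
matches = {
    name: math.isclose(value, reported[name], abs_tol=1e-6)
    for name, value in derived.items()
}
for name, value in derived.items():
    print(f"{name:<9}{value:.8f}  matches report: {matches[name]}")
```

The same checks can be repeated for `b` through `e` by substituting their values.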
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0.692727662        1      1.0%
0.9588277215       1      1.0%
0.8575828919       1      1.0%
0.8755208091       1      1.0%
0.9301186847       1      1.0%
0.1113302786       1      1.0%
0.9431724377       1      1.0%
0.4932534482       1      1.0%
0.4855559533       1      1.0%
0.2742635884       1      1.0%
Other values (90)  90     90.0%

Value              Count  Frequency (%)
0.02451689344      1      1.0%
0.03678303327      1      1.0%
0.0529701128       1      1.0%
0.05603817851      1      1.0%
0.05635858511      1      1.0%
0.05643278679      1      1.0%
0.05938700409      1      1.0%
0.0799270728       1      1.0%
0.08669468261      1      1.0%
0.09159729141      1      1.0%

Value              Count  Frequency (%)
0.9906818285       1      1.0%
0.9796626472       1      1.0%
0.9713465976       1      1.0%
0.9699754527       1      1.0%
0.9653541198       1      1.0%
0.9651563784       1      1.0%
0.9636081111       1      1.0%
0.9588277215       1      1.0%
0.9431724377       1      1.0%
0.9382258124       1      1.0%

b
Real number (ℝ)

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.505266
Minimum: 0.0062097007
Maximum: 0.99417296
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.0 B

Quantile statistics

Minimum: 0.0062097007
5th percentile: 0.091460039
Q1: 0.27123906
Median: 0.51447781
Q3: 0.69061454
95th percentile: 0.94536935
Maximum: 0.99417296
Range: 0.98796326
Interquartile range (IQR): 0.41937548

Descriptive statistics

Standard deviation: 0.27211408
Coefficient of variation (CV): 0.53855608
Kurtosis: -1.0654893
Mean: 0.505266
Median Absolute Deviation (MAD): 0.20940444
Skewness: 0.025068695
Sum: 50.5266
Variance: 0.07404607
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0.128987448        1      1.0%
0.6507482223       1      1.0%
0.2227952323       1      1.0%
0.09167554304      1      1.0%
0.1115540147       1      1.0%
0.9941729637       1      1.0%
0.3476079507       1      1.0%
0.5700943864       1      1.0%
0.6508226272       1      1.0%
0.1839457447       1      1.0%
Other values (90)  90     90.0%

Value              Count  Frequency (%)
0.006209700668     1      1.0%
0.0257783899       1      1.0%
0.05447883258      1      1.0%
0.07700319387      1      1.0%
0.0873654595       1      1.0%
0.09167554304      1      1.0%
0.09506882661      1      1.0%
0.1115540147       1      1.0%
0.1148774826       1      1.0%
0.1280873131       1      1.0%

Value              Count  Frequency (%)
0.9941729637       1      1.0%
0.9817906877       1      1.0%
0.9780868341       1      1.0%
0.958209005        1      1.0%
0.9515261241       1      1.0%
0.9450453103       1      1.0%
0.9430485729       1      1.0%
0.9076302735       1      1.0%
0.9025179627       1      1.0%
0.8996143151       1      1.0%

c
Real number (ℝ)

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.49043766
Minimum: 0.032613158
Maximum: 0.95219118
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.0 B

Quantile statistics

Minimum: 0.032613158
5th percentile: 0.088030713
Q1: 0.22815811
Median: 0.4966044
Q3: 0.70855991
95th percentile: 0.91987776
Maximum: 0.95219118
Range: 0.91957802
Interquartile range (IQR): 0.48040179

Descriptive statistics

Standard deviation: 0.27909369
Coefficient of variation (CV): 0.56907069
Kurtosis: -1.3246281
Mean: 0.49043766
Median Absolute Deviation (MAD): 0.24133101
Skewness: 0.066394797
Sum: 49.043766
Variance: 0.07789329
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0.8080291118       1      1.0%
0.6055218878       1      1.0%
0.247604652        1      1.0%
0.1889918353       1      1.0%
0.5539613155       1      1.0%
0.2029389504       1      1.0%
0.6742758564       1      1.0%
0.9009345819       1      1.0%
0.08829236624      1      1.0%
0.5794832799       1      1.0%
Other values (90)  90     90.0%

Value              Count  Frequency (%)
0.03261315782      1      1.0%
0.06668776888      1      1.0%
0.06959057161      1      1.0%
0.08255066855      1      1.0%
0.08305930832      1      1.0%
0.08829236624      1      1.0%
0.09468349966      1      1.0%
0.1132428092       1      1.0%
0.1304513868       1      1.0%
0.135235747        1      1.0%

Value              Count  Frequency (%)
0.9521911795       1      1.0%
0.9478424968       1      1.0%
0.9351804261       1      1.0%
0.9329653628       1      1.0%
0.9235117979       1      1.0%
0.9196864911       1      1.0%
0.9181320545       1      1.0%
0.9153990512       1      1.0%
0.9057096456       1      1.0%
0.903777777        1      1.0%

d
Real number (ℝ)

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.50789841
Minimum: 0.015718867
Maximum: 0.97805547
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.0 B

Quantile statistics

Minimum: 0.015718867
5th percentile: 0.061207831
Q1: 0.21544568
Median: 0.52796698
Q3: 0.75222717
95th percentile: 0.96436383
Maximum: 0.97805547
Range: 0.9623366
Interquartile range (IQR): 0.5367815

Descriptive statistics

Standard deviation: 0.29987035
Coefficient of variation (CV): 0.59041403
Kurtosis: -1.2775209
Mean: 0.50789841
Median Absolute Deviation (MAD): 0.28207039
Skewness: -0.058003053
Sum: 50.789841
Variance: 0.089922227
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0.2555656275       1      1.0%
0.9584065195       1      1.0%
0.2367542838       1      1.0%
0.8578074436       1      1.0%
0.2030326253       1      1.0%
0.103727807        1      1.0%
0.6113663366       1      1.0%
0.684923487        1      1.0%
0.4159343424       1      1.0%
0.5241488975       1      1.0%
Other values (90)  90     90.0%

Value              Count  Frequency (%)
0.01571886665      1      1.0%
0.02034576539      1      1.0%
0.02159492807      1      1.0%
0.03096533792      1      1.0%
0.05520795333      1      1.0%
0.06152361388      1      1.0%
0.07065053284      1      1.0%
0.08738141679      1      1.0%
0.09585290149      1      1.0%
0.103727807        1      1.0%

Value              Count  Frequency (%)
0.9780554701       1      1.0%
0.9768563315       1      1.0%
0.9768076143       1      1.0%
0.9765575299       1      1.0%
0.9684752556       1      1.0%
0.9641474393       1      1.0%
0.9587455368       1      1.0%
0.9584065195       1      1.0%
0.9361862209       1      1.0%
0.915946169        1      1.0%

e
Real number (ℝ)

Distinct: 100
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.51927403
Minimum: 0.0075182289
Maximum: 0.99588817
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 928.0 B

Quantile statistics

Minimum: 0.0075182289
5th percentile: 0.059937707
Q1: 0.31246052
Median: 0.51282968
Q3: 0.75658275
95th percentile: 0.9462708
Maximum: 0.99588817
Range: 0.98836994
Interquartile range (IQR): 0.44412223

Descriptive statistics

Standard deviation: 0.27661502
Coefficient of variation (CV): 0.53269566
Kurtosis: -1.0870807
Mean: 0.51927403
Median Absolute Deviation (MAD): 0.21710688
Skewness: -0.0092921597
Sum: 51.927403
Variance: 0.07651587
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
0.3871996414       1      1.0%
0.5432698786       1      1.0%
0.664230381        1      1.0%
0.339504853        1      1.0%
0.9548257011       1      1.0%
0.0598363319       1      1.0%
0.5058581683       1      1.0%
0.3713983015       1      1.0%
0.184923381        1      1.0%
0.4081663892       1      1.0%
Other values (90)  90     90.0%

Value              Count  Frequency (%)
0.007518228888     1      1.0%
0.02395584274      1      1.0%
0.02606281113      1      1.0%
0.05097207848      1      1.0%
0.0598363319       1      1.0%
0.05994304291      1      1.0%
0.1134238708       1      1.0%
0.1171189808       1      1.0%
0.1174560911       1      1.0%
0.1280367266       1      1.0%

Value              Count  Frequency (%)
0.9958881697       1      1.0%
0.9898059367       1      1.0%
0.9574475418       1      1.0%
0.9548257011       1      1.0%
0.9523822617       1      1.0%
0.9459491449       1      1.0%
0.932505397        1      1.0%
0.9187539656       1      1.0%
0.9106651935       1      1.0%
0.9096051544       1      1.0%

Interactions

Pairwise interaction plots for variables a, b, c, d, e (25 panels).

Correlations

        a       b       c       d       e
a   1.000  -0.190  -0.077   0.038  -0.007
b  -0.190   1.000  -0.008   0.028  -0.115
c  -0.077  -0.008   1.000  -0.023  -0.022
d   0.038   0.028  -0.023   1.000  -0.012
e  -0.007  -0.115  -0.022  -0.012   1.000
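
One way to read the matrix above is to confirm it is symmetric with a unit diagonal and then pick out the strongest off-diagonal pair. The sketch below does this with the values copied from the report; it makes no assumption about which correlation coefficient the report used, and the variable names are illustrative.

```python
# Correlation values copied from the report, in row/column order a..e.
labels = ["a", "b", "c", "d", "e"]
corr = [
    [1.000, -0.190, -0.077, 0.038, -0.007],
    [-0.190, 1.000, -0.008, 0.028, -0.115],
    [-0.077, -0.008, 1.000, -0.023, -0.022],
    [0.038, 0.028, -0.023, 1.000, -0.012],
    [-0.007, -0.115, -0.022, -0.012, 1.000],
]
n = len(labels)

# A correlation matrix should equal its transpose and have 1.0 on the diagonal.
is_symmetric = all(corr[i][j] == corr[j][i] for i in range(n) for j in range(n))
has_unit_diagonal = all(corr[i][i] == 1.0 for i in range(n))

# Rank each distinct pair (upper triangle only) by absolute correlation.
pairs = [
    (abs(corr[i][j]), labels[i], labels[j])
    for i in range(n)
    for j in range(i + 1, n)
]
strength, first, second = max(pairs)

print(f"symmetric: {is_symmetric}, unit diagonal: {has_unit_diagonal}")
print(f"strongest pair: {first}-{second} with |r| = {strength:.3f}")
```

Every off-diagonal value here has magnitude below 0.2, so no pair of variables shows more than a weak linear or monotonic association.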

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

           a         b         c         d         e
0   0.692728  0.128987  0.808029  0.255566  0.387200
1   0.585674  0.355609  0.176789  0.755023  0.836276
2   0.091597  0.880260  0.204424  0.449932  0.932505
3   0.573852  0.852087  0.268319  0.603409  0.625983
4   0.310915  0.899614  0.597281  0.477727  0.864602
5   0.937270  0.209843  0.150858  0.864684  0.910665
6   0.348602  0.344620  0.508842  0.968475  0.568849
7   0.317874  0.627413  0.069591  0.055208  0.894718
8   0.330687  0.565535  0.310726  0.976808  0.995888
9   0.154088  0.676263  0.250755  0.978055  0.117119

           a         b         c         d         e
90  0.359040  0.675886  0.814423  0.976856  0.320967
91  0.734883  0.420127  0.767862  0.637759  0.795147
92  0.327385  0.095069  0.454402  0.160721  0.784028
93  0.415717  0.704654  0.835378  0.450970  0.846659
94  0.417997  0.852187  0.903778  0.661458  0.820408
95  0.349571  0.054479  0.846283  0.481468  0.638080
96  0.634186  0.306311  0.429997  0.020346  0.918754
97  0.101500  0.644602  0.787099  0.378777  0.945949
98  0.971347  0.540891  0.709030  0.659904  0.665010
99  0.867169  0.907630  0.609768  0.536729  0.128037
\ No newline at end of file