| import streamlit as st | |
| st.title(":blue[DESCRIPTIVE STATISTICS]") | |
| st.caption("***Elevate Your Insights: The Allure of Descriptive Statistics***") | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/U57NME2YT_De7TsnyFPht.png",width=400) | |
| multi = '''Descriptive statistics is the branch of statistics that summarizes and describes collected data. | |
| It focuses on presenting data in a meaningful way through numerical measures and visualizations. | |
| Descriptive statistics are used in many fields, including business and economics, to condense large amounts of data into a clear and concise account of their key characteristics. | |
| ''' | |
| st.markdown(multi) | |
| st.markdown(''' | |
| Descriptive statistics are divided into 3 types based on measures: | |
| :violet[1. Measures of Central Tendency] | |
| :violet[2. Measures of Variability (Dispersion)] | |
| :violet[3. Distribution] ''') | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/3yOxh1hYH8zxksyrYo5R7.png") | |
| st.header("1.Measures of Central Tendency",divider="red") | |
| st.markdown('''Measures of central tendency describe the central or average value of the data. | |
| They help summarize a large data set with a single value that represents the "middle" or "central" point of the data distribution.''') | |
| multi='''There are mainly 3 types of Measures of central tendency:''' | |
| st.markdown(multi) | |
| multi=''':red[1. Mean]''' | |
| st.markdown(multi) | |
| multi=''':red[2. Median]''' | |
| st.markdown(multi) | |
| multi=''':red[3. Mode]''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/uFpmO9i5sXkOK4gwl3By-.png",width=400) | |
| st.subheader("1.Mean",divider="green") | |
| multi = '''The mean is the most widely used measure of central tendency. | |
| It uses every observation in the data, which makes it representative of the whole data set but also sensitive to extreme values. | |
| The mean is calculated by summing all the data points and dividing by the number of data points.''' | |
| st.markdown(multi) | |
| st.latex(r''' | |
| \text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} | |
| ''') | |
| multi = ''' | |
| There are 3 types of mean: | |
| :violet[1. Arithmetic mean] | |
| :violet[2. Geometric mean] | |
| :violet[3. Harmonic mean]''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/UZDbaPifn2kfsYA5NlSNw.png",width=300) | |
| st.subheader("a. Arithmetic Mean",divider="red") | |
| multi = '''The arithmetic mean is the sum of all the observations divided by the total number of observations. It suits data that change additively, such as the terms of an arithmetic sequence.''' | |
| st.markdown(multi) | |
| st.latex(r"\text{Arithmetic Mean} = \frac{\sum_{i=1}^{n} x_i}{n}") | |
| multi = '''--->xi represents each data point | |
| --->n is the total number of data points.''' | |
| st.markdown(multi) | |
| st.subheader("b. Geometric Mean",divider="red") | |
| multi = '''The geometric mean of a series of n positive observations is the nth root of their product. It suits data that change multiplicatively, such as growth rates or the terms of a geometric sequence.''' | |
| st.markdown(multi) | |
| st.latex(r"\text{Geometric Mean} = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}}") | |
| multi = '''--->xi represents each data point. | |
| --->n is the total number of data points. | |
| --->∏ denotes the product of all data points.''' | |
| st.markdown(multi) | |
| st.subheader("c. Harmonic Mean",divider="red") | |
| multi = '''The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations.''' | |
| st.markdown(multi) | |
| st.latex(r"\text{Harmonic Mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}") | |
| multi = '''--->xi represents each data point. | |
| --->n is the total number of data points. | |
| ---> harmonic mean is particularly used for rates or ratios.''' | |
| st.markdown(multi) | |
| st.markdown('''For positive data, the three types of mean are related as follows: | |
| ''') | |
| st.latex(r"\text{Harmonic Mean} \leq \text{Geometric Mean} \leq \text{Arithmetic Mean}") | |
| st.markdown('''If every observation in the collected data is the same, the three means are equal: | |
| ''') | |
| st.latex(r"\text{Harmonic Mean} = \text{Geometric Mean} = \text{Arithmetic Mean}") | |
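The ordering of the three means can be checked numerically. The following is a minimal sketch, assuming positive data and Python 3.8 or later. It uses the standard library's `statistics` module rather than anything defined in this app, and `three_means` is a hypothetical helper name chosen for illustration.

```python
import statistics


def three_means(data):
    """Return (harmonic, geometric, arithmetic) means of positive data."""
    hm = statistics.harmonic_mean(data)
    gm = statistics.geometric_mean(data)
    am = statistics.mean(data)
    return hm, gm, am


# Unequal observations: the means are strictly ordered HM <= GM <= AM.
hm, gm, am = three_means([2, 4, 8])
print(hm, gm, am)

# Identical observations: all three means coincide.
print(three_means([5, 5, 5]))
```

In the app these values could be shown with `st.write`, but the calculation itself does not depend on Streamlit.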
| multi = '''Depending on where the data come from, there are 2 types of mean: | |
| population mean ("**parameter**") | |
| sample mean ("**statistic**") | |
| ''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/fNxt5Geo3GZp5kZP0pHiV.png") | |
| st.subheader("Population Mean",divider="violet") | |
| st.markdown('''A measure computed from the whole population, such as the population mean, is known as a parameter.''') | |
| st.latex(r"\text{Population Arithmetic Mean} (\mu) = \frac{1}{N} \sum_{i=1}^{N} x_i") | |
| st.latex(r"\text{Population Geometric Mean} = \left( \prod_{i=1}^{N} x_i \right)^{\frac{1}{N}}") | |
| st.latex(r"\text{Population Harmonic Mean} = \frac{N}{\sum_{i=1}^{N} \frac{1}{x_i}}") | |
| st.subheader("Sample Mean",divider="violet") | |
| st.markdown('''A measure computed from a sample, such as the sample mean, is known as a statistic.''') | |
| st.latex(r"\text{Sample Arithmetic Mean} (\bar{x}) = \frac{1}{n} \sum_{i=1}^{n} x_i") | |
| st.latex(r"\text{Sample Geometric Mean} = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}}") | |
| st.latex(r"\text{Sample Harmonic Mean} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}") | |
| multi = '''--->N represents the total number of observations in the population - size of population data | |
| --->n represents the total number of observations in the sample - subset of population data | |
| --->xi represents each individual observation.''' | |
| st.markdown(multi) | |
| st.subheader("2.Median",divider="green") | |
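| multi = '''The median is a measure of central tendency that gives the middle value of the data once the observations are sorted. | |
| Half of the observations lie at or below the median and half lie at or above it. | |
| Because it uses only the middle one or two values, it ignores the size of the other observations, which makes it less sensitive to extreme values than the mean. It requires data that can be ordered.''' | |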
| st.markdown(multi) | |
| multi=''':red[Odd number of observations]''' | |
| st.markdown(multi) | |
| st.latex(r"\text{Median} = x_{\left(\frac{n+1}{2}\right)}") | |
| multi=''':red[Even number of observations]''' | |
| st.markdown(multi) | |
| st.latex(r"\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2} + 1\right)}}{2}") | |
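The two median formulas translate directly into code. This is an illustrative sketch, not part of the app: `median_of` is a hypothetical helper, and it assumes the subscripts in the formulas are 1-based positions in the sorted data.

```python
def median_of(values):
    """Median via the odd/even formulas, using positions in the sorted data."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value; subtract 1 for 0-based indexing.
        return x[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th values.
    return (x[n // 2 - 1] + x[n // 2]) / 2


print(median_of([5, 1, 3]))     # odd number of observations
print(median_of([4, 1, 3, 2]))  # even number of observations
```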
| st.subheader("3.Mode",divider="green") | |
| multi = '''The mode is a measure of central tendency that gives the most typical value in the data. | |
| The mode is the data point or value that occurs most frequently. It is mostly used for categorical data. | |
| Types of Mode: | |
| A data set is described by its number of modes: no mode, unimodal, bimodal, trimodal, or multimodal. | |
| :violet[No mode] - a data set in which no value occurs more often than the others has no mode. | |
| :violet[Unimodal] - a data set with one mode (one most frequent value). | |
| :violet[Bimodal] - a data set with two modes (two values tied for most frequent). | |
| :violet[Trimodal] - a data set with three modes (three values tied for most frequent). | |
| :violet[Multimodal] - a data set with more than three modes (more than three values tied for most frequent).''' | |
| st.markdown(multi) | |
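The mode types listed above can be sketched with a frequency count. The helper names `find_modes` and `describe_modes` are hypothetical, and the sketch assumes the convention that data in which every value occurs exactly once has no mode.

```python
from collections import Counter


def find_modes(data):
    """Return the most frequent values, or [] when no value repeats."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    # Assumed convention: several distinct values, none repeated -> no mode.
    if top == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == top]


def describe_modes(data):
    """Name the data set by its number of modes."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return labels.get(len(find_modes(data)), "multimodal")


print(find_modes([1, 2, 2, 3]), describe_modes([1, 2, 2, 3]))
print(find_modes([1, 1, 2, 2, 3]), describe_modes([1, 1, 2, 2, 3]))
print(find_modes(["red", "blue", "red"]), describe_modes(["red", "blue", "red"]))
```

Because it only counts occurrences, the same code works for categorical values such as strings.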
| st.subheader("Outliers",divider="red") | |
| multi = '''Outliers are data points that differ significantly from the majority of the data in a dataset. | |
| Identifying and managing outliers is crucial for accurate data analysis as they can lead to misleading conclusions about central tendency and variability.''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/mwVV5w-qU0LjiBiVnr-H3.png") | |
| multi = '''Measures of central tendency can be distorted by outliers. If the data set contains an outlier, the mean is pulled toward it. | |
| --->If the data set contains outliers, use the median, since it is not affected by them. The median is affected only when outliers make up around half of the data or more. In that case the data are separated into groups and the statistical measures are calculated for each group. | |
| ''' | |
| st.markdown(multi) | |
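A small numerical comparison illustrates how an outlier pulls the mean while leaving the median in place. The data values below are made up for illustration only.

```python
import statistics

clean = [10, 11, 12, 12, 13]
with_outlier = clean + [1000]  # one extreme value added

# Mean and median of the clean data are close to each other.
print(statistics.mean(clean), statistics.median(clean))

# The outlier drags the mean far upward, but the median stays at 12.
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```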
| st.header("2.Measures of Dispersion(Variability)",divider="red") | |
| multi = '''Measures of dispersion describe how the collected data are dispersed, or spread, around the central value.''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/20BD0t7MzQ9dmV3FKmjUK.png",width = 300) | |
| multi = '''Measures of dispersion are divided into 2 categories: | |
| :blue[1.Absolute Measure]: expressed in the same unit as the data, so it is used to compare the spread of data sets measured in the same unit. | |
| :blue[2.Relative Measure]: a unit-free ratio, so it is used to compare the spread of data sets measured in different units.''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/oJgs2OuROEi-w-xFLY6es.jpeg",width=300) | |
| st.subheader("1.Absolute Measure",divider="blue") | |
| multi='''There are 4 types of absolute measures: | |
| :red[a. Range] | |
| :red[b. Quartile deviation] | |
| :red[c. Variance] | |
| :red[d. Standard deviation]''' | |
| st.markdown(multi) | |
| st.subheader("2.Relative Measure",divider="blue") | |
| multi='''There are 4 types of relative measures: | |
| :red[1. Co-efficient of Range] | |
| :red[2. Co-efficient of Quartile Deviation] | |
| :red[3. Co-efficient of Variation] | |
| :red[4. Co-efficient of Standard Deviation]''' | |
| st.markdown(multi) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/66bde9bf3c885d04498227a0/WbyJIpIHhQQwwwVSLzQ77.png",width = 300) | |
| st.subheader("Range",divider="violet") | |
| multi = '''The range is a measure of dispersion that gives the difference between the maximum and minimum values in the data. | |
| It is an absolute measure, so it is expressed in the unit of the data. | |
| **Range = maximum - minimum** | |
| ''' | |
| st.markdown(multi) | |
| st.subheader("Co-efficient of range",divider="violet") | |
| st.image('https://cdn-uploads.huggingface.co/production/uploads/66c760d3f1a4cc6587be7790/n-dJg048bp3IADdAYCH_K.png',width=300) | |
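The range and its unit-free counterpart can be sketched as below. The coefficient formula used here, (max - min) / (max + min), is the commonly taught definition and is an assumption about what the image above shows. The helper names are hypothetical.

```python
def value_range(data):
    """Absolute measure: maximum minus minimum, in the unit of the data."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative measure: (max - min) / (max + min), assumed definition."""
    hi, lo = max(data), min(data)
    if hi + lo == 0:
        raise ValueError("coefficient of range is undefined when max + min is 0")
    return (hi - lo) / (hi + lo)


heights_cm = [150, 160, 170, 180]
print(value_range(heights_cm))           # in centimetres
print(coefficient_of_range(heights_cm))  # unit-free ratio
```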
| st.subheader("Quartile Deviation",divider="violet") | |
| multi = '''The quartile deviation is the interquartile range divided by two. | |
| The interquartile range (IQR) is the difference between the upper quartile and the lower quartile of the distribution. | |
| The IQR measures the spread of the central 50% of the data. | |
| Interquartile Range = Upper Quartile (Q3) - Lower Quartile (Q1) | |
| The quartile deviation is also known as the semi-interquartile range, i.e. half of the difference between the upper quartile and the lower quartile.''' | |
| st.markdown(multi) | |
| st.latex(r''' | |
| \text{Quartile Deviation} = \frac{Q_3 - Q_1}{2} | |
| ''') | |
| st.write("Where:") | |
| st.latex(r''' | |
| Q_3 \text{ is the third quartile (75th percentile)} | |
| ''') | |
| st.latex(r''' | |
| Q_1 \text{ is the first quartile (25th percentile)} | |
| ''') | |
| multi = '''The quartile deviation shows how closely the central 50% of the values cluster around the median (it describes the behaviour of the central data). | |
| --->**The greater the spread, the higher the deviation** | |
| --->**Spread is directly proportional to deviation** | |
| Quartile deviation relies on some basic terms: | |
| :red[Quantile]: a value that divides the sorted data into equal parts. Quantiles are used to summarize central tendency or dispersion. | |
| There are 3 common types of quantile: | |
| :violet[1.Quartile]: divides the data into 4 equal parts | |
| :violet[2.Percentile]: divides the data into 100 equal parts | |
| :violet[3.Decile]: divides the data into 10 equal parts | |
| ''' | |
| st.markdown(multi) | |
| st.subheader("Percentile Formula") | |
| st.latex(r''' | |
| L_p = \frac{p(n + 1)}{100} | |
| ''') | |
| st.write("Where:") | |
| st.latex(r''' | |
| L_p \text{ is the position of the } p \text{-th percentile in the sorted data,} | |
| ''') | |
| st.latex(r''' | |
| n \text{ is the number of observations.} | |
| ''') | |
| st.subheader("Decile Formula") | |
| st.latex(r''' | |
| D_p = \frac{p(n + 1)}{10} | |
| ''') | |
| st.write("Where:") | |
| st.latex(r''' | |
| D_p \text{ is the position of the } p \text{-th decile in the sorted data,} | |
| ''') | |
| st.latex(r''' | |
| n \text{ is the number of observations.} | |
| ''') | |
| st.subheader("Co-efficient of Quartile Deviation",divider="violet") | |
| st.latex(r''' | |
| \text{Coefficient of Quartile Deviation} = \left( \frac{Q_3 - Q_1}{Q_3 + Q_1} \right) \times 100''') | |
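The quartile-based measures can be computed with the standard library. This is a sketch under stated assumptions: it requires Python 3.8 or later, `quartile_summary` is a hypothetical helper, and it relies on `statistics.quantiles` with `method="exclusive"`, which places quantiles using n + 1 positions in the same spirit as the p(n + 1) position formulas above.

```python
import statistics


def quartile_summary(data):
    """Return Q1, Q3, IQR, quartile deviation and its coefficient (in %)."""
    # n=4 yields the three cut points Q1, Q2 (the median) and Q3.
    q1, _q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return {
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "quartile_deviation": iqr / 2,
        "coefficient_of_qd": (q3 - q1) / (q3 + q1) * 100,
    }


summary = quartile_summary([1, 2, 3, 4, 5, 6, 7])
print(summary)
```

For these seven values the position formula gives 25 × (7 + 1) / 100 = 2, so Q1 is the 2nd sorted value, which is what the library call returns. Other quantile conventions exist and can give slightly different quartiles on small data sets.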
| st.subheader("Variance",divider="violet") | |
| multi = '''Variance measures the dispersion, or spread, of the data. It is the average of the squared deviations from the mean. | |
| -->To check the consistency of data, the coefficient of variation is used.''' | |
| st.markdown(multi) | |
| st.latex(r''' | |
| \sigma^2 = \frac{\sum (X_i - \mu)^2}{N}''') | |
| multi = '''There are two types of variance: | |
| :red[population variance] | |
| :red[sample variance]''' | |
| st.markdown(multi) | |
| st.subheader("Population Variance") | |
| st.latex(r''' | |
| \sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}''') | |
| st.subheader("Sample Variance") | |
| st.latex(r''' | |
| s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} | |
| ''') | |
| st.subheader("Co-efficient of Variation",divider="violet") | |
| st.subheader("Coefficient of Variation (Population)") | |
| st.latex(r''' | |
| \text{CV} = \left( \frac{\sigma}{\mu} \right) \times 100''') | |
| st.subheader("Coefficient of Variation (Sample)") | |
| st.latex(r''' | |
| \text{CV} = \left( \frac{s}{\bar{X}} \right) \times 100''') | |
| st.subheader("Standard Deviation",divider="violet") | |
| multi = '''Variance can't be easily interpreted because the deviations are squared, so it is expressed in squared units. To overcome this, the standard deviation, the square root of the variance, is used. | |
| --->**The greater the spread, the larger the standard deviation** | |
| --->**Spread is directly proportional to the standard deviation**''' | |
| st.markdown(multi) | |
| st.latex(r''' | |
| \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}''') | |
| multi = '''Outliers can be detected using the standard deviation. | |
| Points that lie more than 3 standard deviations away from the mean are commonly considered outliers. | |
| 3 standard deviations is used as the threshold to check for outliers.''' | |
| st.markdown(multi) | |
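The 3 standard deviation rule can be sketched as a small filter. `three_sigma_outliers` is a hypothetical helper, and the sketch assumes the population standard deviation and Python 3.8 or later (for `statistics.fmean`). The sample data are made up.

```python
import statistics


def three_sigma_outliers(data, threshold=3.0):
    """Return the points lying more than `threshold` std devs from the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [x for x in data if abs(x - mu) > threshold * sigma]


readings = [10] * 19 + [100]
print(three_sigma_outliers(readings))
print(three_sigma_outliers([1, 2, 3, 4, 5]))
```

Note that an outlier inflates the mean and standard deviation used to detect it, so on very small data sets this rule may flag nothing at all.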
| st.subheader("Co-efficient of Standard deviation",divider="violet") | |
| st.latex(r''' | |
| \text{Coefficient of Standard Deviation} = \frac{\sigma}{\mu}''') | |
| multi = '''Because the sample is a subset of the population, sample statistics are subject to sampling error. To reduce the resulting bias in the sample variance and standard deviation, the degrees of freedom (n - 1) are used as the divisor instead of n. | |
| When the data contain outliers, a more robust measure of dispersion is the **MAD (Median Absolute Deviation)**.''' | |
| st.markdown(multi) | |
| st.subheader("Population standard deviation") | |
| st.latex(r''' | |
| \sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}''') | |
| st.subheader("Sample standard deviation") | |
| st.latex(r''' | |
| s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}''') | |
| st.latex(r''' | |
| \text{Population Coefficient of Standard Deviation} = \frac{\sigma}{\mu}''') | |
| st.latex(r''' | |
| \text{Sample Coefficient of Standard Deviation} = \frac{s}{\bar{X}}''') | |
| multi = '''--->σ is the population standard deviation | |
| --->s is the sample standard deviation | |
| --->Xi represents each data point | |
| --->μ is the population mean | |
| --->Xˉ is the sample mean | |
| --->N is the total number of data points in the population | |
| --->n is the number of data points in the sample.''' | |
| st.markdown(multi) | |
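The population and sample formulas differ only in the divisor, which the standard library exposes as separate functions. A minimal sketch with made-up data, where the variable names are chosen for illustration:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # divides by N
samp_var = statistics.variance(data)   # divides by n - 1
pop_sd = statistics.pstdev(data)       # square root of the population variance
samp_sd = statistics.stdev(data)       # square root of the sample variance

# Coefficient of variation, expressed as a percentage of the mean.
mean = statistics.mean(data)
cv_pop = pop_sd / mean * 100
cv_samp = samp_sd / mean * 100

print(pop_var, samp_var)
print(pop_sd, samp_sd)
print(cv_pop, cv_samp)
```

The sample versions are always slightly larger than the population versions on the same data, because the sum of squared deviations is divided by n - 1 rather than N.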
| st.subheader("MAD(Median Absolute Deviation)",divider="blue") | |
| multi = '''When the data contain outliers, the **MAD (Median Absolute Deviation)** is a more robust measure of dispersion than the standard deviation. | |
| --->It measures the variability of a dataset as the median of the absolute deviations from the median.''' | |
| st.markdown(multi) | |
| st.latex(r''' | |
| \text{MAD} = \text{median}(|X_i - \text{median}(X)|)''') | |
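The MAD formula maps onto two calls to the median. `median_absolute_deviation` is a hypothetical helper written for illustration, and the data are made up.

```python
import statistics


def median_absolute_deviation(data):
    """MAD = median(|x_i - median(x)|)."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)


values = [1, 2, 3, 4, 100]
print(median_absolute_deviation(values))  # barely influenced by the 100
print(statistics.pstdev(values))          # strongly inflated by the 100
```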
| st.header("3.Measures of Distribution",divider="red") | |
| multi = '''Measures of distribution describe the shape of the data: how the data look and what pattern they follow. | |
| They also show which values occur most frequently.''' | |
| st.markdown(multi) |