| import streamlit as st | |
| import math | |
| from functools import reduce | |
| st.title(":red[**1 : INTRODUCTION TO DATA ANALYSIS**]") | |
| st.markdown("""_In this field we work with data using the Python programming language. As the term DATA | |
| ANALYSIS suggests, the work is about data: we collect the data, | |
| clean it, and then analyze it to extract insights. First, | |
| let us understand the term data._""") | |
| st.header("*What does the term data refer to?*") | |
| st.subheader(":blue[DATA]") | |
| st.markdown("""Data is a collection of information gathered from observation. There is a wide range of | |
| sources of data. Some common examples are given below. \n * IMAGE \n * TEXT | |
| \n * VIDEO \n * AUDIO | |
| """) | |
| st.header("DATA is classified into 3 types.") | |
| st.subheader("Structured Data", divider=True) | |
| st.subheader("Unstructured Data", divider=True) | |
| st.subheader("Semi Structured Data", divider=True) | |
| st.subheader("**Structured Data**") | |
| st.markdown("""This type of data has a well-organized | |
| format.\nIt is arranged in rows and columns. Some common examples of | |
| structured data are given below.\n * EXCEL DOCUMENT \n * STRUCTURED QUERY LANGUAGE (SQL) DATABASE | |
| """) | |
| st.image('https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/dSbyOXaQ6N_Kg2TLxgEyt.png', width=400) | |
| st.subheader("**Unstructured Data**") | |
| st.markdown("""This type of data does not have a | |
| well-organized format: it has no rows and columns. Some common | |
| examples of unstructured data are given below.\n * IMAGE\n * VIDEO\n * TEXT\n * SOCIAL MEDIA FEEDS | |
| """) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/xhaNBRanDaj8esumqo9hl.png", width=400) | |
| st.subheader("**Semi Structured Data**") | |
| st.markdown("""This type of data can be described as a combination of | |
| structured and unstructured data. Some common examples of semi-structured | |
| data are given below.\n * COMMA-SEPARATED VALUES (CSV)\n * JSON FILES\n * E-MAILS\n * HTML | |
| """) | |
| st.image("https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/Nupc6BePInRVo9gJwLfWH.png", width=400) | |
| st.title("2 : INTRODUCTION TO STATISTICS") | |
| st.markdown("""_Statistics is a branch of mathematics, and a broad field in its own right, | |
| that deals with collecting, analyzing, interpreting, and | |
| structuring data. Statistics is classified into two types._""") | |
| st.subheader("2.1 Descriptive Statistics",divider=True) | |
| st.subheader("2.2 Inferential Statistics",divider=True) | |
| st.subheader("**2.1 Descriptive Statistics**") | |
| st.markdown("""Descriptive statistics describe the main features of data. They | |
| can be computed on sample data as well as population data. Some of | |
| the key concepts of descriptive statistics are stated below.\n KEY CONCEPTS\n * Measures of Central Tendency: Mean, Median, and Mode.\n * Measures of Dispersion: Range, Variance, and Standard Deviation.\n * Distribution, which describes how frequently each value occurs; examples include the Normal (Gaussian) and Uniform distributions.""") | |
| st.subheader("Measure Of Central Tendency",divider=True) | |
| st.markdown("""A measure of central tendency gives a single central (typical) value of the data. Central tendency can be computed | |
| in three ways \n * Mode \n * Median \n * Mean""") | |
| st.subheader("MODE",divider=True) | |
| st.markdown("""Mode gives the central tendency as the most frequently occurring value. Its major drawback is that it is frequency-based: it | |
| considers only the values that occur most often. When computing the mode we may come across the following situations.""") | |
| st.markdown(''':violet[No_Mode] \n Take the list of numbers [1,2,3,4,5]. No number | |
| repeats, so every value has the same frequency and we get the No_Mode situation. | |
| ''') | |
| st.markdown(''':violet[Uni_Mode] \n Take the list of numbers [1,1,1,2,3,4,5]. Checking | |
| the list shows that the number 1 has the highest frequency, so the value 1 is returned as the output. | |
| ''') | |
| st.markdown(''':violet[Bi_Mode] \n Take the list of numbers [1,1,2,2,3,4,5]. Checking | |
| the frequencies shows that two values share the maximum frequency, so the output is Bi_Mode. | |
| ''') | |
| st.markdown(''':violet[Tri_Mode] \n Take the list of numbers [1,1,2,2,3,3,4,5]. Checking | |
| the frequencies shows that three values share the maximum frequency, so the output is Tri_Mode. | |
| ''') | |
| st.markdown(''':violet[Multi_Mode] \n Take the list of numbers [1,1,2,2,3,3,4,4,5]. Checking | |
| the frequencies shows that more than three values share the maximum frequency, so the output is Multi_Mode. | |
| ''') | |
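The five situations above can be reproduced with the standard library. This is a minimal sketch, assuming Python 3.8+ for `statistics.multimode`; `classify_mode` is a hypothetical helper introduced only for illustration and is not part of the app.

```python
from statistics import multimode


def classify_mode(values):
    """Label a list as no/uni/bi/tri/multi mode (hypothetical helper)."""
    modes = multimode(values)  # every value that shares the highest frequency
    if len(modes) == len(set(values)):
        # All distinct values are equally frequent, so no value stands out
        return "no mode"
    labels = {1: "uni mode", 2: "bi mode", 3: "tri mode"}
    return labels.get(len(modes), "multi mode")


samples = [
    [1, 2, 3, 4, 5],
    [1, 1, 1, 2, 3, 4, 5],
    [1, 1, 2, 2, 3, 4, 5],
    [1, 1, 2, 2, 3, 3, 4, 5],
    [1, 1, 2, 2, 3, 3, 4, 4, 5],
]
for sample in samples:
    print(sample, "->", classify_mode(sample))
```

The lists are the same ones used in the descriptions, so each printed label should match the case it illustrates.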
| st.title("Calculate Mode") | |
| def mode(*args): | |
|     list1 = list(args) | |
|     dict1 = {}  # value -> how many times it occurs | |
|     dict2 = {} | |
|     set1 = set(list1) | |
|     for j in set1: | |
|         dict1[j] = list1.count(j) | |
|     max_value = max(dict1.values()) | |
|     # All values that share the highest frequency | |
|     count = [key for key, value in dict1.items() if value == max_value] | |
|     if max_value == 1: | |
|         return 'no mode' | |
|     elif len(count) == len(set1): | |
|         # Every distinct value occurs equally often | |
|         return 'no mode' | |
|     elif len(count) == 1: | |
|         # Single mode: return it together with its frequency | |
|         dict2[count[0]] = dict1.get(count[0]) | |
|         return dict2 | |
|     elif len(count) == 2: | |
|         return 'bi mode' | |
|     elif len(count) == 3: | |
|         return 'tri mode' | |
|     else: | |
|         return 'multimode' | |
| numbers_input = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 2, 3, 4):") | |
| if numbers_input: | |
|     try: | |
|         list1 = list(map(int, numbers_input.split(','))) | |
|         result = mode(*list1) | |
|         st.write("Mode result:", result) | |
|     except ValueError: | |
|         st.write("Please enter a valid list of numbers separated by commas.") | |
| st.subheader("Median",divider=True) | |
| st.markdown("""Median also gives the central tendency. Its major drawback is that it focuses only on the middle value(s) of the sorted data. | |
| To find the median, first sort the given list; the formula then depends on whether the length of the list is odd or even.""") | |
| st.subheader("Median Formula for Odd Number of Observations") | |
| st.latex(r''' | |
| \text{Median} = X_{\left(\frac{n+1}{2}\right)} | |
| ''') | |
| st.subheader("Median Formula for Even Number of Observations") | |
| st.latex(r''' | |
| \text{Median} = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} | |
| ''') | |
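| def median(list1): | |
|     list1 = sorted(list1)  # sort a copy so the caller's list is not modified | |
|     length = len(list1) | |
|     if length % 2 == 0: | |
|         # Even length: average the two middle values (0-based indices) | |
|         mid1 = length // 2 - 1 | |
|         mid2 = length // 2 | |
|         return (list1[mid1] + list1[mid2]) / 2 | |
|     else: | |
|         # Odd length: the single middle value | |
|         mid = length // 2 | |
|         return list1[mid] | |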
| st.title("Calculate Median") | |
| numbers_input_1 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_1") | |
| if numbers_input_1: | |
|     parts = numbers_input_1.split(',') | |
|     list1 = [] | |
|     for num in parts: | |
|         num = num.strip() | |
|         if num.isdigit(): | |
|             list1.append(int(num)) | |
|     if list1: | |
|         result = median(list1) | |
|         st.write("Median result:", result) | |
|     else: | |
|         st.write("No valid numbers provided.") | |
| st.subheader("Mean",divider=True) | |
| st.markdown(""" | |
| Mean is one of the most widely used measures of central tendency because it involves every value in the data. Its main drawback is that it is | |
| affected by outliers. Depending on the data, we compute one of three types of mean.""") | |
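The sensitivity of the mean to outliers can be seen with a small example. This is a minimal sketch using the standard `statistics` module; the numbers are made up for illustration.

```python
from statistics import mean, median

data = [10, 12, 11, 13, 12]
with_outlier = data + [100]  # the same data plus one extreme value

# The mean moves a long way when the outlier is added; the median does not
print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```

Here the mean rises from 11.6 to roughly 26.3, while the median stays at 12.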
| st.subheader("Arithmetic Mean",divider=True) | |
| st.markdown("""Arithmetic Mean is suited to \n * Interval and Ratio Data \n * Symmetric Distributions \n * Data Without Outliers | |
| """) | |
| st.subheader("Population Mean Formula") | |
| st.latex(r''' | |
| \mu = \frac{1}{N} \sum_{i=1}^{N} x_i | |
| ''') | |
| st.subheader("Sample Mean Formula") | |
| st.latex(r''' | |
| \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i | |
| ''') | |
| def arithmetic_mean(list1): | |
|     # Add up all values, then divide by how many there are | |
|     total = reduce(lambda x, y: x + y, list1) | |
|     return total / len(list1) | |
| st.title("Calculate Arithmetic_Mean") | |
| numbers_input_2 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_2") | |
| if numbers_input_2: | |
|     parts = numbers_input_2.split(",") | |
|     list1 = [] | |
|     for i in parts: | |
|         i = i.strip() | |
|         if i.isdigit(): | |
|             list1.append(int(i)) | |
|     if list1: | |
|         result = arithmetic_mean(list1) | |
|         st.write("Arithmetic_Mean", result) | |
|     else: | |
|         st.write("No valid numbers provided.") | |
| st.subheader("Geometric Mean",divider=True) | |
| st.markdown("""Geometric Mean is suited to \n * Multiplicative Data \n * Percentages and Rates \n * Normalized Data | |
| """) | |
| st.subheader("Geometric Mean for Population") | |
| st.latex(r''' | |
| \text{GM}_{\text{population}} = \left( \prod_{i=1}^{N} x_i \right)^{\frac{1}{N}} | |
| ''') | |
| st.subheader("Geometric Mean for Sample") | |
| st.latex(r''' | |
| \text{GM}_{\text{sample}} = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}} | |
| ''') | |
| def geometric_mean(list1): | |
|     # Multiply all values, then take the n-th root of the product | |
|     mul = reduce(lambda x, y: x * y, list1) | |
|     return round(mul ** (1 / len(list1)), 2) | |
| st.title("Calculate Geometric_Mean") | |
| numbers_input_3 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_3") | |
| if numbers_input_3: | |
|     parts = numbers_input_3.split(",") | |
|     list1 = [] | |
|     for i in parts: | |
|         i = i.strip() | |
|         if i.isdigit(): | |
|             list1.append(int(i)) | |
|     if list1: | |
|         result = geometric_mean(list1) | |
|         st.write("Geometric_Mean", result) | |
|     else: | |
|         st.write("No valid numbers provided.") | |
| st.subheader("Harmonic Mean",divider=True) | |
| st.markdown("""Harmonic Mean is suited to \n * Rates and Ratios \n * Data with Reciprocal Relationships | |
| """) | |
| st.subheader("Harmonic Mean for Population") | |
| st.latex(r''' | |
| \text{HM}_{\text{population}} = \frac{N}{\sum_{i=1}^{N} \frac{1}{x_i}} | |
| ''') | |
| st.subheader("Harmonic Mean for Sample") | |
| st.latex(r''' | |
| \text{HM}_{\text{sample}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} | |
| ''') | |
| def harmonic_mean(list1): | |
|     # Sum of reciprocals; the initial value 0 ensures the first element is inverted too | |
|     reciprocal_sum = reduce(lambda acc, y: acc + 1 / y, list1, 0) | |
|     return round(len(list1) / reciprocal_sum, 2) | |
| st.title("Calculate Harmonic_Mean") | |
| numbers_input_4 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_4") | |
| if numbers_input_4: | |
|     parts = numbers_input_4.split(",") | |
|     list1 = [] | |
|     for i in parts: | |
|         i = i.strip() | |
|         # Skip 0 as well: the harmonic mean is undefined when a value is 0 | |
|         if i.isdigit() and int(i) != 0: | |
|             list1.append(int(i)) | |
|     if list1: | |
|         result = harmonic_mean(list1) | |
|         st.write("Harmonic_Mean", result) | |
|     else: | |
|         st.write("No valid numbers provided.") | |
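The three means can be cross-checked against Python's `statistics` module. This is a minimal sketch, assuming Python 3.8+ (for `fmean` and `geometric_mean`); the data values are illustrative, and the module is referenced by its full name so it does not clash with the functions defined in this file.

```python
import statistics

data = [1, 2, 4]  # illustrative positive values

am = statistics.fmean(data)           # (1 + 2 + 4) / 3
gm = statistics.geometric_mean(data)  # (1 * 2 * 4) ** (1 / 3)
hm = statistics.harmonic_mean(data)   # 3 / (1/1 + 1/2 + 1/4)

print(f"AM={am:.3f}  GM={gm:.3f}  HM={hm:.3f}")

# For positive data the three means are always ordered AM >= GM >= HM
print("AM >= GM >= HM:", am >= gm >= hm)
```

For this data the arithmetic mean is about 2.333, the geometric mean is 2, and the harmonic mean is about 1.714.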
| st.subheader("Measure Of Dispersion",divider=True) | |
| st.markdown("""A measure of dispersion gives the spread of the collected data around the central value. It is classified into two types. | |
| """) | |
| st.markdown(''':violet[Absolute Measure] \n An absolute measure expresses the spread in the unit of the data. For example, if the given data is in 'cm', | |
| the output will be in cm.''') | |
| st.markdown(''':violet[Relative Measure] \n A relative measure is free from units.''') | |
| st.header("**Absolute Measure**") | |
| st.subheader("Range",divider=True) | |
| st.subheader("Quartile Deviation",divider=True) | |
| st.subheader("Variance",divider=True) | |
| st.subheader("Standard Deviation",divider=True) | |
| st.header("**Relative Measure**") | |
| st.subheader("Coefficient Of Range",divider=True) | |
| st.subheader("Coefficient Of Quartile Deviation",divider=True) | |
| st.subheader("Coefficient Of Variance",divider=True) | |
| st.subheader("Coefficient Of Standard Deviation",divider=True) | |
| st.markdown(''':orange[**Range**] is one of the measures of dispersion. It is rarely used on its own because it depends only on the two extreme values and ignores the rest of the data. | |
| ''') | |
| st.subheader("Absolute Range") | |
| st.latex(r''' | |
| \text{Absolute Range} = \text{Maximum Value} - \text{Minimum Value} | |
| ''') | |
| st.subheader("Relative Range") | |
| st.latex(r''' | |
| \text{Relative Range} = \frac{\text{Absolute Range}}{\text{Mean}} \times 100 | |
| ''') | |
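The two range formulas above can be sketched in a few lines. This is an illustrative example only: `absolute_range` and `relative_range` are hypothetical helper names, and the heights are made-up data.

```python
def absolute_range(values):
    """Maximum value minus minimum value, in the unit of the data."""
    return max(values) - min(values)


def relative_range(values):
    """Absolute range divided by the mean, as a percentage (unit-free)."""
    mean = sum(values) / len(values)
    return absolute_range(values) / mean * 100


heights_cm = [150, 160, 170, 180, 190]  # made-up data
print("absolute range (cm):", absolute_range(heights_cm))
print("relative range (%): ", round(relative_range(heights_cm), 2))
```

The absolute range here is 40 cm, while the relative range (about 23.53) carries no unit.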
| st.markdown(''':orange[**Quartile Deviation**] is one of the measures of dispersion. The sorted data is divided into 4 equal parts by the quartiles Q1, Q2, and Q3. | |
| It focuses on the central half of the data, between Q1 and Q3. | |
| ''') | |
| st.subheader("Absolute Quartile Deviation") | |
| st.latex(r''' | |
| QD = \frac{Q3 - Q1}{2} | |
| ''') | |
| st.subheader("Relative Quartile Deviation") | |
| st.latex(r''' | |
| \text{Relative QD} = \frac{Q3 - Q1}{Q3 + Q1} \times 100 | |
| ''') | |
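The quartile deviation formulas above can be tried with `statistics.quantiles`. This is a minimal sketch, assuming Python 3.8+ and the function's default "exclusive" method; other quartile conventions can give slightly different Q1 and Q3 for the same data.

```python
from statistics import quantiles

data = [2, 4, 6, 8, 10, 12, 14]  # made-up data

# n=4 splits the data into four parts and returns the three cut points
q1, q2, q3 = quantiles(data, n=4)

quartile_deviation = (q3 - q1) / 2
relative_qd = (q3 - q1) / (q3 + q1) * 100

print("Q1, Q2, Q3:", q1, q2, q3)
print("quartile deviation:", quartile_deviation)
print("relative quartile deviation (%):", relative_qd)
```

With this data the quartiles are 4, 8, and 12, so the quartile deviation is 4 and the relative quartile deviation is 50.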
| st.markdown(''':orange[**Variance**] is one of the measures of dispersion, and one of the most widely used. Its main | |
| drawback is that the deviations are squared to avoid negative values, so the result is expressed in squared units rather than in the units of the data. | |
| ''') | |
| st.subheader("Absolute Variance") | |
| st.latex(r''' | |
| \text{Var} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 | |
| ''') | |
| st.subheader("Relative Variance") | |
| st.latex(r''' | |
| \text{Relative Var} = \frac{\text{Var}}{\bar{x}} \times 100 | |
| ''') | |
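The variance formula above, written out step by step and checked against `statistics.pvariance`. This is a sketch on made-up data; `population_variance` is a hypothetical helper name.

```python
from statistics import pvariance


def population_variance(values):
    """Average of the squared deviations from the mean (divides by N)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5

var = population_variance(data)
relative_var = var / (sum(data) / len(data)) * 100  # formula above

print("variance:", var)
print("matches statistics.pvariance:", var == pvariance(data))
print("relative variance:", relative_var)
```

The squared deviations sum to 32, so the variance is 32 / 8 = 4.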
| st.markdown(''':orange[**Standard Deviation**] is one of the measures of dispersion, and one of the most widely used. It overcomes the | |
| squared-unit disadvantage of variance by taking the square root, which returns the result to the units of the data. | |
| ''') | |
| st.subheader("Absolute Standard Deviation") | |
| st.latex(r''' | |
| \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2} | |
| ''') | |
| st.subheader("Relative Standard Deviation") | |
| st.latex(r''' | |
| \text{Relative SD} = \frac{\sigma}{\bar{x}} | |
| ''') | |
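The standard deviation formulas above can be sketched the same way. This is an illustrative example on made-up data, cross-checked against `statistics.pstdev`; the variable names are assumptions.

```python
import math
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5

mean = sum(data) / len(data)
variance = sum((x - mean) ** 2 for x in data) / len(data)

sigma = math.sqrt(variance)  # back in the units of the data
relative_sd = sigma / mean   # unit-free ratio

print("standard deviation:", sigma)
print("agrees with statistics.pstdev:", math.isclose(sigma, pstdev(data)))
print("relative standard deviation:", relative_sd)
```

Here the variance is 4, so the standard deviation is 2 and the relative standard deviation is 2 / 5 = 0.4.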
| st.subheader("Distribution",divider=True) | |
| st.markdown(''':blue[**Distribution**] describes the shape of the data, that is, how the values are spread. It helps in | |
| analysis. There are several types of distribution \n * Normal Distribution \n * Uniform Distribution \n * Binomial Distribution \n * Poisson Distribution | |
| \n * Exponential Distribution \n * Chi-Square Distribution \n * T-Distribution | |
| ''') | |
| st.subheader("**2.2 Inferential Statistics**") | |
| st.markdown("""Inferential statistics describe a population based | |
| on sample data: they make predictions about the population from a sample. | |
| """) |
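The idea of describing a population from a sample can be illustrated with a small simulation. This is a sketch under stated assumptions: the population (the integers 1 to 1000), the sample size of 50, and the seed are all made up for illustration.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the sketch is repeatable

population = list(range(1, 1001))       # hypothetical population: 1..1000
sample = random.sample(population, 50)  # draw 50 values without replacement

population_mean = fmean(population)  # usually unknown in practice
sample_mean = fmean(sample)          # estimate computed from the sample only

print("population mean:", population_mean)
print("sample mean:    ", sample_mean)
```

The sample mean will generally differ from the population mean, but it tends to land closer as the sample size grows.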