Spaces:
Build error
Build error
File size: 14,352 Bytes
5cca6da b37abe9 952f37e f07ab19 650d02a 87938e7 650d02a 6d2c420 72c4533 6d2c420 72c4533 6d2c420 7a56fd8 e0044bf 4f41aef e0044bf 3352063 e0044bf 3352063 f07ab19 165dead f07ab19 2cf24ac 7577268 f07ab19 aea7dab f6411d7 612242a 33c0b62 fea40fa 0996a69 33c0b62 fea40fa 6ba9e3b d359731 6ba9e3b e596332 fb64e36 f6411d7 fb64e36 cffd70d fb64e36 cffd70d a6d7122 2e9fb74 b19884c 7d72bf0 2e9fb74 7d72bf0 5b199ba 7d72bf0 5b199ba 7d72bf0 b19884c 4912b7a b19884c 952f37e 4912b7a 952f37e d746236 952f37e 4912b7a 952f37e b19884c 34e4796 4912b7a 34e4796 c38d0c6 34e4796 b19884c 1e49b33 34e4796 c38d0c6 34e4796 c38d0c6 34e4796 7ffb313 49c6fee bfbdd5e 49c6fee 7ffb313 cfeebb5 8a0a14f 476b86a ed92f5d 2a17b40 43354f9 2cf24ac 43354f9 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 | import streamlit as st
import math
from functools import reduce
st.title(":red[**1 : INTRODUCTION TO STATISTICS**]")
st.markdown("""_In this field we will be dealing with data by using programing language python. The term DATA
ANALYSIS itself say’s that it will be dealing with data. In this we will be collecting the data and
will be cleaning the data and then we will be analyzing the to get the insights from them. Now
let us understand the term data._""")
st.header("*What does term data refers to?*")
st.subheader(":blue[DATA]")
st.markdown("""Data is collection of information which is gathered from observation. There are wide
sources of information. Some of the best examples of data are given below. \n * IMAGE is one of the best source of data. \n * TEXT is one of the best source of data.
\n * VIDEO is one of the best source of data. \n * AUDIO is one of the best source of data.
""")
st.header("DATA is classified into 3-types.")
st.subheader("Structured Data", divider=True)
st.subheader("Unstructured Data", divider=True)
st.subheader("Semi Structured Data", divider=True)
st.subheader("**Structured Data**")
st.markdown("""This type of data will be having a effective or well organized
format.\nThis type of data is aligned in terms of row’s and column’s. Some of the best example’s of
structured data are given below.\n * EXCEL DOCUMENT \n * STRUCTURED QUERY LANGUAGE DATABASE
""")
st.image('https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/dSbyOXaQ6N_Kg2TLxgEyt.png', width=400)
st.subheader("**Unstructured Data**")
st.markdown("""This type of data will not be having any effective or well
organized format. This type of data doesn’t have any row’s and column’s. Some of the best
example’s of unstructured data are given below.\n * IMAGE\n * VIDEO\n * TEXT\n *Social Media Feeds
""")
st.image("https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/xhaNBRanDaj8esumqo9hl.png", width=400)
st.subheader("**Semi Structured Data**")
st.markdown("""This type of data can be called as combination of
structured data as well as unstructured data. Some of the best examples of semi structured
data are given below.\n * COMMA SEPERATED VARIABLE\n *JSON FILES\n * E-MAILS\n * HTML
""")
st.image("https://cdn-uploads.huggingface.co/production/uploads/64c972774515835c4dadd754/Nupc6BePInRVo9gJwLfWH.png", width=400)
st.title("2 : INTRODUCTION TO STATISTICS")
st.markdown("""_The term statistics is a branch of mathematics and also can be called as a huge field in which
we are going to deal with data which involves collecting, analyzing, interpreting, and
structuring the data. Statistics is classified into two types.
_""")
st.subheader("2.1 Descriptive Statistics",divider=True)
st.subheader("2.2 Inferential Statistics",divider=True)
st.subheader("**2.1 Descriptive Statistics**")
st.markdown("""This Descriptive Statistics describe the main feature of data. This
descriptive statistics can be performed on sample data as well as population data. Some of
the key points of descriptive statistics are stated below.\n KEY COCEPTS\n * Measurement of Central Tendency which involves finding Mean, Median, and Mode.\n * Measurement of Dispersion which involves finding Range, Variance and Standard Deviation.\n * Distribution which gives how frequently the data is occurring some of examples of distribution are Gaussian, Random, and Normal distribution""")
st.subheader("Measure Of Central Tendency",divider=True)
st.markdown("""The measure of central tendency is used to find the central average value of the data.The central tendency can be computed by
useing three ways \n * Mode \n * Median \n * Mean""")
st.subheader("MODE",divider=True)
st.markdown("""Mode will be giving the centeral tendency based on most frequently occuring data.The major drawback of mode is its frequecy baised it
mostly focus on the data which is occuring most times.Here in this mode we might come across some situation's like """)
st.markdown(''':violet[No_Mode] \n Let's understand why this situation raises for example let's take list of numbers [1,2,3,4,5] here we don't have
frequency of numbers repeating in this senario we will come accross No_Mode situaton.
''')
st.markdown(''':violet[Uni_Mode] \n Let's understand why this situation raises for example let's take list of numbers [1,1,1,2,3,4,5]. here by
checking the list it will tend to know that the frequency of number 1 is more and it returns the value 1 as output.
''')
st.markdown(''':violet[Bi_Mode] \n Let's understand why this situation raises for example let's take list of numbers [1,1,2,2,3,4,5]. here by
checking the frequency in list we come across a situtaion where we will find two maximun frequecy repeated value hence the output will be Bi_Mode.
''')
st.markdown(''':violet[Tri_Mode] \n Let's understand why this situation raises for example let's take list of numbers [1,1,2,2,3,3,4,5]. here by
checking the frequency in list we come across a situtaion where we will find three maximun frequecy repeated value hence the output will be Tri_Mode.
''')
st.markdown(''':violet[Multi_Mode] \n Let's understand why this situation raises for example let's take list of numbers [1,1,2,2,3,3,4,4,5]. here by
checking the frequency in list we come across a situtaion where we will find more than three maximun frequecy repeated value hence the output will be Multi_Mode.
''')
st.title("Calculate Mode")
def mode(*args):
list1 = list(args)
dict1 = {}
dict2 = {}
set1 = set(list1)
for j in set1:
dict1[j] = list1.count(j)
max_value = max(dict1.values())
count = [key for key, value in dict1.items() if value == max_value]
if max_value == 1:
return 'no mode'
elif len(count) == len(set1):
return 'no mode'
elif len(count) == 1:
dict2[count[0]] = dict1.get(count[0])
return dict2
elif len(count) == 2:
return 'bi mode'
elif len(count) == 3:
return 'tri mode'
else:
return 'multimode'
numbers_input = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 2, 3, 4):")
if numbers_input:
try:
list1 = list(map(int, numbers_input.split(',')))
result = mode(*list1)
st.write("Mode result:", result)
except ValueError:
st.write("Please enter a valid list of numbers separated by commas.")
st.subheader("Median",divider=True)
st.markdown("""Median will also be giving the central tendency.But the major drawback of median is it prior foucus will be on the central value.
In order to find the mean first we have to sort the give list and based on the length of the list the formula are changed""")
st.subheader("Median Formula for Odd Number of Observations")
st.latex(r'''
\text{Median} = X_{\left(\frac{n+1}{2}\right)}
''')
st.subheader("Median Formula for Even Number of Observations")
st.latex(r'''
\text{Median} = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}
''')
def median(list1):
list1.sort()
length = len(list1)
if length % 2 == 0:
mid1 = length // 2 - 1
mid2 = length // 2
return (list1[mid1] + list1[mid2]) / 2
else:
mid = length // 2
return list1[mid]
st.title("Calculate Median")
numbers_input_1 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_1")
if numbers_input_1:
parts = numbers_input_1.split(',')
list1 = []
for num in parts:
num = num.strip()
if num.isdigit():
list1.append(int(num))
if list1:
result = median(list1)
st.write("Median result:", result)
else:
st.write("No valid numbers provided.")
st.subheader("Mean",divider=True)
st.markdown("""
Mean is one of the beautiful measurement of central tendency it invovles all the data present in it.The only drawback of mean is it is
effected by outliers.Based on the data we will compute the mean in three types""")
st.subheader("Arthmetic Mean",divider=True)
st.markdown("""Arthmetic Mean is used on data which have \n * Interval and Ratio Data \n * Symmetric Distributions \n * Data Without Outliers
""")
st.subheader("Population Mean Formula")
st.latex(r'''
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
''')
st.subheader("Sample Mean Formula")
st.latex(r'''
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
''')
def arthamatic_mean(list1):
sum=reduce(lambda x,y: x+y,list1)
return sum/len(list1)
st.title("Calculate Arthmetic_Mean")
numbers_input_2 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_2")
if numbers_input_2:
parts=numbers_input_2.split(",")
list1=[]
for i in parts:
i = i.strip()
if i.isdigit():
list1.append(int(i))
if list1:
result=arthamatic_mean(list1)
st.write("Arthmetic_Mean",result)
else:
st.write("No valid numbers provided.")
st.subheader("Geometric Mean",divider=True)
st.markdown("""Geometric Mean is used on data which have \n * Multiplicative Data \n * Percentages and Rates \n * Normalized Data
""")
st.subheader("Geometric Mean for Population")
st.latex(r'''
\text{GM}_{\text{population}} = \left( \prod_{i=1}^{N} x_i \right)^{\frac{1}{N}}
''')
st.subheader("Geometric Mean for Sample")
st.latex(r'''
\text{GM}_{\text{sample}} = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}}
''')
def geometric_mean(list1):
mul=reduce(lambda x,y: x*y,list1)
return round(mul**(1/len(list1)),2)
st.title("Calculate Geometric_Mean")
numbers_input_3 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_3")
if numbers_input_3:
parts=numbers_input_3.split(",")
list1=[]
for i in parts:
i = i.strip()
if i.isdigit():
list1.append(int(i))
if list1:
result=geometric_mean(list1)
st.write("Geometric_Mean",result)
else:
st.write("No valid numbers provided.")
st.subheader("Harmonic Mean",divider=True)
st.markdown("""Harmonic Mean is used on data which have \n * Rates and Ratios \n * Data with Reciprocal Relationships
""")
st.subheader("Harmonic Mean for Population")
st.latex(r'''
\text{HM}_{\text{population}} = \frac{N}{\sum_{i=1}^{N} \frac{1}{x_i}}
''')
st.subheader("Harmonic Mean for Sample")
st.latex(r'''
\text{HM}_{\text{sample}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}
''')
def harmonic_mean(list1):
sum=reduce(lambda x,y: x+1/y,list1)
return round(len(list1)/sum,2)
st.title("Calculate Harmonic_Mean")
numbers_input_4 = st.text_input("Enter a list of numbers separated by commas (e.g., 1, 2, 3, 4, 5):", key="numbers_input_4")
if numbers_input_4:
parts=numbers_input_4.split(",")
list1=[]
for i in parts:
i = i.strip()
if i.isdigit():
list1.append(int(i))
if list1:
result=harmonic_mean(list1)
st.write("Geometric_Mean",result)
else:
st.write("No valid numbers provided.")
st.subheader("Measure Of Disperssion ",divider=True)
st.markdown("""Measure Of Disperssion will give spread of our collected data around the central value.It's classifed into two types
""")
st.markdown(''':violet[Absolute Measure] \n absolute will give the spread of data in one unit.for example if the given data is in 'cm'
the output will be in cm''')
st.markdown(''':violet[Relative Measure] \n Relative will be free from unit's''')
st.header("**Absolute Measure**")
st.subheader("Range",divider=True)
st.subheader("Quartile Deviation",divider=True)
st.subheader("Varience",divider=True)
st.subheader("Standard Deviation",divider=True)
st.header("**Relative Measure**")
st.subheader("Coefficent Of Range",divider=True)
st.subheader("Coefficent Of Quartile Deviation",divider=True)
st.subheader("Coefficent Of Varience",divider=True)
st.subheader("Coefficent Of Standard Deviation",divider=True)
st.markdown(''':orange[**Range**] is one of the measure to find the disperssion.But is not at all mostly used beause it don't focus on the entire data.
''')
st.subheader("Absolute Range")
st.latex(r'''
\text{Absolute Range} = \text{Maximum Value} - \text{Minimum Value}
''')
st.subheader("Relative Range")
st.latex(r'''
\text{Relative Range} = \frac{\text{Absolute Range}}{\text{Mean}} \times 100
''')
st.markdown(''':orange[**Quartile Deviation**] is one of the measure to find the disperssion.In this type the data is divided into 4 equal parts.
It will mostly focus on the central data.
''')
st.subheader("Absolute Quartile Deviation")
st.latex(r'''
QD = \frac{Q3 - Q1}{2}
''')
st.subheader("Relative Quartile Deviation")
st.latex(r'''
\text{Relative QD} = \frac{Q3 - Q1}{Q3 + Q1} \times 100
''')
st.markdown(''':orange[**Varience**] is one of the measure to find the disperssion.It is one of the best measure to find the disperssion.The only
drawback is when in Varience is in order to overcome negitive value we square them thus the distance is doubled
''')
st.subheader("Absolute Variance")
st.latex(r'''
\text{Var} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2
''')
st.subheader("Relative Variance")
st.latex(r'''
\text{Relative Var} = \frac{\text{Var}}{\bar{x}} \times 100
''')
st.markdown(''':orange[**Standard Deviation**] is one of the measure to find the disperssion.It is one of the best measure to find the disperssion.It over comes the
disadvantage occured in varience by square rooting it.
''')
st.subheader("Absolute Standard Deviation")
st.latex(r'''
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2}
''')
st.subheader("Relative Standard Deviation")
st.latex(r'''
\text{Relative SD} = \frac{\sigma}{\bar{x}}
''')
st.subheader("Distribution",divider=True)
st.markdown(''':blue[**Distribution**] is a measure will will tell how the shape of data or in which shape the data is spread.It will help in
analysis.There are few types of distribution \n * Normal Distribution \n * Uniform Distribution \n * Binomial Distribution \n * Poisson Distribution
\n * Exponential Distribution \n * Chi-Square Distribution \n * T-Distribution
''')
st.subheader("**2.2 Inferential Statistics**")
st.markdown("""This Inferential Statistics will describe the population based
on a sample data. This statistics will give predictions about a population based on sample.
""") |