Why do we need knowledge of stats? These are the number skills we should have. Till here we have covered the number skills up to probability distributions, and today we have the central limit theorem and the normal distribution to cover. We looked at the different types of numeric variables; the measures of central tendency: mean, median, mode; and the measures of dispersion: standard deviation, variance, range, interquartile range. So what is the objective of dispersion? Why do we need dispersion? To see the spread of the data. More spread means more heterogeneity; less spread means the data is squeezed together, the data is homogeneous. There is a very nice application from a marketing standpoint: if the customers are more spread out, I need to cater to different target segments, and accordingly I need different offers and different product designs. Then, how to show data: there is one slide where I tried to put that together.
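
To make the spread idea concrete, here is a minimal sketch in Python. The two customer groups and their spend figures are made up for illustration; they are not from the session. Both groups have the same mean, so only the dispersion measures tell them apart.

```python
import statistics

# Hypothetical monthly spend for two customer groups (made-up numbers).
homogeneous = [48, 49, 50, 50, 51, 52]
heterogeneous = [10, 25, 50, 50, 75, 90]


def spread_summary(values):
    """Return the mean plus three dispersion measures for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),      # sample standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,                         # interquartile range
    }


summary_a = spread_summary(homogeneous)
summary_b = spread_summary(heterogeneous)
print("homogeneous:  ", summary_a)
print("heterogeneous:", summary_b)
```

The second group has the larger standard deviation, range, and interquartile range, which is the signal that it may need more than one target segment.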

How should we see data? If it is a categorical variable: frequency distribution, proportions, cross-tabulations. If it is a continuous variable: measures of central tendency and measures of dispersion. And if it is a discrete variable, we look at the frequency distribution, proportions, and the mean and mode. I just saw the frequency distribution of the ratings you gave me for my last session: the mean came out very good, above 4.5, and most of you rated me 5. So that is analysis, right? The various codes I'm not going through. Then we covered the concepts of probability: what probability is, the various terminologies, what a mutually exclusive event, a dependent event, and an independent event are, joint probability, and conditional probability. To do those things we have the contingency table, and we have Bayes' theorem.
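
For a discrete variable like a session rating, the summary described above can be sketched as follows. The ten ratings are hypothetical, chosen only so that the mean lands above 4.5 and most ratings are 5; the real ratings are not in the transcript.

```python
from collections import Counter
import statistics

# Hypothetical session ratings on a 1-to-5 scale (not the real data).
ratings = [5, 5, 5, 5, 5, 5, 5, 4, 4, 3]

# Frequency distribution: how many times each rating occurs.
frequency = Counter(ratings)

# Proportions: the share of responses at each rating.
proportions = {rating: count / len(ratings) for rating, count in frequency.items()}

mean_rating = statistics.mean(ratings)
mode_rating = statistics.mode(ratings)

for rating in sorted(frequency, reverse=True):
    print(rating, frequency[rating], proportions[rating])
print("mean:", mean_rating, "mode:", mode_rating)
```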

So all of those things we covered in our last session. I'm just going through the slides so that it gets recorded in the Panoply and also gets registered in your memory. Then I said there is an application of conditional probability: market basket analysis. I hope you got time to go through this slide; it will probably get covered, or might already have been covered, in BA, in your marketing course. Bayes' theorem is an extension of conditional probability. I covered the disease example, one of the very good examples for understanding Bayes' theorem. Association of attributes: I had mentioned Yule's coefficient of association. Did anyone get time to do the calculation of Yule's coefficient of association? Okay. Distributions I covered; I did the lucky seven example. Yes, I did that. So, probability distributions: the binomial and Poisson distributions, and this is where I had stopped. Now we get into sampling.
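
As a quick reminder of how the disease example works, here is a minimal sketch of Bayes' theorem in Python. The prevalence, sensitivity, and false positive rate below are assumed values for illustration only; the numbers used in the earlier session are not in this transcript.

```python
# Hypothetical inputs (assumptions, not the figures from the session).
p_disease = 0.01             # P(disease): prevalence in the population
p_pos_given_disease = 0.99   # P(positive | disease): test sensitivity
p_pos_given_healthy = 0.05   # P(positive | no disease): false positive rate

# Total probability of a positive test, over sick and healthy people.
p_positive = (
    p_pos_given_disease * p_disease
    + p_pos_given_healthy * (1 - p_disease)
)

# Bayes' theorem: P(disease | positive).
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(round(p_disease_given_pos, 4))
```

With these assumed inputs, only about one in six people who test positive actually has the disease, because the disease is rare and the false positives come from a much larger healthy group.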

Sampling distribution. From a large dataset I take a very small sample, and from the sample I get a feel for the population. What I mean by feel is that I'm making an inference about the population. So, sampling from a population: we take a sample, and from the sample we infer about the population. Very simple. The reason we take many decisions based on sampling is that many times it is not practical to analyze the whole data. Now let's understand a term which is called a random variable. But before that: the objective of sampling is to derive inference from a sample about the population.

When I take a sample, it has to be a random sample. If it is a biased sample, you will not get a proper inference about the population. Now let's understand what I mean by random and what I mean by biased. Assume I have the data, I sort it by gender, and I take the top 10 percent of observations. What will happen? Only male or only female will come, right? Assume the split is 50-50 male-female and the data is large: if I take the initial 10 percent of observations, only males or only females will come. Will any inference I draw from that make sense? No. So the sample should be random; only then will the inference you derive from the sample relate to the population. Now the question comes: how will a random sample give me inference about the population?
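
The sorted-by-gender example can be sketched in Python like this. The population of 10,000 records with a 50-50 split is an assumption made for the illustration, and the seed is fixed only so the run is repeatable.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50 percent male, 50 percent female.
population = ["M"] * 5000 + ["F"] * 5000
random.shuffle(population)

sample_size = len(population) // 10  # 10 percent of observations

# Biased sample: sort by gender, then take the top 10 percent.
biased_sample = sorted(population)[:sample_size]

# Random sample: every observation has the same chance of being picked.
random_sample = random.sample(population, sample_size)


def share_female(sample):
    """Proportion of the sample that is female."""
    return sample.count("F") / len(sample)


biased_share = share_female(biased_sample)
random_share = share_female(random_sample)
print("biased sample, share female:", biased_share)
print("random sample, share female:", random_share)
```

The sorted sample contains only one gender, so it says nothing useful about a 50-50 population, while the random sample lands close to one half.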

So typically what we say is: assume you have taken a sufficiently large random sample from the population; in that case the mean of the sample will be very close to the population mean. The assumption is a sufficiently large random sample. Now, the moment the word random comes up, many people think: how will a random sample give me the right inference about the population? So let's rephrase the same statement. Random in statistics means not biased. So rephrase it as: assume we take a sufficiently large, quote-unquote unbiased, sample from the population; in that case the mean of the sample will be representative of the population mean. So random means unbiased. And depending on the use case and on the population, there is a sample size calculation to do.
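
Here is a minimal sketch of that statement in Python, assuming a made-up population of 100,000 values with a mean near 50. It draws random samples of increasing size and measures how far each sample mean is from the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population (assumption: roughly normal, mean 50, sd 10).
population = [random.gauss(50, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Gap between sample mean and population mean for different sample sizes.
errors = {}
for n in (10, 100, 10_000):
    sample = random.sample(population, n)
    errors[n] = abs(statistics.fmean(sample) - population_mean)
    print(f"n={n:>6}  gap from population mean = {errors[n]:.3f}")
```

On any single run the gap is not guaranteed to shrink at every step, since each sample is random, but for the large sample it is reliably small.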

Now let us understand the point estimator of a population parameter. When you talk of the mean, does it give you one value? When you compute the mean for data, it gives one value; that's why it is called a point estimate. A range of values is not a point estimate. The two main point estimates for a population are mean and proportion. What is proportion? If I have a variable gender, can I compute a mean? No. In that case the parameter estimate becomes a proportion: what proportion of observations are male or female. These are the two important population parameter estimates: mean and proportion. The mean for a sample is represented by x-bar. Notations are important. Why? Because if you know the notations, then when you study some literature, for example when you are trying to understand some machine learning algorithm, you will come across x-bar, you will come across x-hat, you will come across mu. What happens is that when we go through those documents, half the time we simply ignore the document because there is lots of math written in it. The thing is, we don't understand many of those notations; that's why we feel it's too complex.
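
The two point estimates can be sketched in Python as follows. The five records are hypothetical; the variable names x_bar and p_hat simply mirror the notation discussed above.

```python
import statistics

# Hypothetical sample of (gender, age) records, made up for illustration.
sample = [("M", 34), ("F", 29), ("F", 41), ("M", 38), ("F", 28)]

# Numeric variable: the point estimate is the sample mean, x-bar.
ages = [age for _, age in sample]
x_bar = statistics.mean(ages)

# Categorical variable: a mean makes no sense, so the point
# estimate is the sample proportion, p-hat (here, share of females).
p_hat = sum(1 for gender, _ in sample if gender == "F") / len(sample)

print("x_bar:", x_bar)
print("p_hat:", p_hat)
```

Each estimate is a single number, which is what makes it a point estimate.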

So notations are important. x-bar is used to represent the sample mean, and mu is used to represent the population mean. The sample proportion is represented by p-hat, and the population proportion is represented by pi. The sample mean x-bar is considered an unbiased estimator of the population mean.
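
One way to see what unbiased means here is a small simulation, sketched below in Python under assumed settings: a made-up population of 10,000 integers, samples of size 30, and 2,000 repetitions. Individual sample means miss mu in both directions, but their average sits very close to mu.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 values between 1 and 100.
population = [random.randint(1, 100) for _ in range(10_000)]
mu = statistics.fmean(population)  # population mean

# Draw many random samples and record x-bar for each one.
sample_means = [
    statistics.fmean(random.sample(population, 30))
    for _ in range(2_000)
]

# Unbiased: on average, x-bar neither overshoots nor undershoots mu.
average_of_means = statistics.fmean(sample_means)

print("mu:               ", round(mu, 2))
print("average of x-bars:", round(average_of_means, 2))
```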