When it's a numeric variable, okay. Why is the sample mean called an unbiased estimator of the population mean? Coming to your point: if we take many samples from a population, compute the mean of each sample, and then compute the mean of those means, that mean of means will be equal to the population mean. For example, I take 100 samples from the population. For each of the 100 samples I compute a mean. How many means do I get? 100 means. Can I now take an average of those 100 means? I can. The mean of those means will come out equal to the population mean, or very close to it when the number of samples is finite. This is usually discussed together with the central limit theorem. Strictly, the theorem describes the shape of the distribution of those means, while the mean of means matching the population mean is the unbiasedness property. Okay, some people are wondering how this will happen. We will validate it with simulation.

Okay, the sample mean is called an unbiased estimator, as opposed to an estimate, and this unbiased estimator is called a sample statistic, not sample statistics.
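The "mean of means" idea above can be checked with a small simulation. This is only a sketch under stated assumptions: the population of 1,000 ages is made up for illustration (it is not the lecture's data), and the names `population`, `sample_means` and `mean_of_means` are introduced here.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 ages between 22 and 38 (illustrative only).
population = [random.randint(22, 38) for _ in range(1000)]
population_mean = statistics.mean(population)

# Take 100 random samples of 50 and record the mean of each sample.
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(100)
]

# The mean of the 100 sample means should land close to the population mean.
mean_of_means = statistics.mean(sample_means)
print(f"population mean: {population_mean:.2f}")
print(f"mean of means:   {mean_of_means:.2f}")
```

With only 100 samples the two numbers will not match exactly, but the gap should be small compared with the spread of the individual sample means.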

You should know this jargon, okay. But what was E of X-bar, the expected value of all the X-bars? Okay, yeah: if you compute the expected value of those sample means, it will be equal to the population mean, okay. That's why it is called an unbiased estimator: it gives you a value which represents the population value, okay.

Sampling distribution, conceptual framework. The probability distribution of all possible values a sample statistic can take is called the sampling distribution.
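The reason the expected value of the sample mean equals the population mean can be written out in one line. The notation is introduced here, not taken from the lecture: assume $X_1, \dots, X_n$ are drawn at random from a population with mean $\mu$, and $\bar{X}$ is their average.

```latex
E[\bar{X}]
  = E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
  = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
  = \frac{1}{n}\cdot n\mu
  = \mu
```

The only property used is that expectation is linear, so the result does not depend on the shape of the population distribution.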

Let's see what this means at some length. Over the last two years, probably some 20 batches have happened, okay. We collect data on all the people who participated. Assume the average batch size is 50, so 20 into 50 is 1,000 observations. So I've got data of a thousand observations. From these thousand observations I take a random sample of 50 students; that is sample 1, 50 students, and I compute the mean. Say the mean age comes out to be 28, okay, the mean age in years. Then I take another sample, S2, again 50 observations. This time around it may come to 30. I keep taking samples; I repeat it a hundred times.

Now I plot these observations. Basically this is a continuous number; for the time being, for the discussion, let's assume we break it into buckets of rounded years. In that case the plot would come out something like this, okay. What is this graph trying to say? Overall, if I had taken all thousand observations, the mean would have come to, say, 28.

But now, when I take a sample, will the mean come out exactly 28? No. Sometimes it will come above 28 and sometimes below, and it can sometimes even take a value like 22, and there is a very thin chance it may take a value like 38. A thin chance, but still there, okay.

If I make a distribution of all the sample means, then out of 100 samples I would have got values very close to 27, 28, 29 the greatest number of times; the probability around 28 would definitely be the highest. As I go away from this mean, I might get a value like 22 in probably one out of 100 samples, so its probability will be one percent, and similarly probably one out of 100 at the other end. If you compute a probability distribution of this continuous number, that is called the sampling distribution. Are you getting it? I take samples from the population, I compute the mean of each sample, and I make a distribution plot of those mean values. That is called the sampling distribution, okay.
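The procedure just described (1,000 observations, 100 samples of 50, means bucketed into rounded years) can be sketched roughly as follows. The ages are simulated, not real batch data, and the choice of a bell-shaped population with mean 28 and standard deviation 5 is an assumption made only for illustration.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 20 batches x 50 students = 1,000 ages.
population = [random.gauss(28, 5) for _ in range(1000)]

# 100 samples of 50 students each; keep the mean age of every sample.
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(100)
]

# Bucket the means into rounded years and turn counts into proportions.
buckets = Counter(round(m) for m in sample_means)
sampling_distribution = {
    age: count / len(sample_means)
    for age, count in sorted(buckets.items())
}

# Crude text histogram: one '#' per sample that fell in the bucket.
for age, prob in sampling_distribution.items():
    print(f"{age}: {prob:.2f} {'#' * buckets[age]}")
```

How far the tails reach (whether values like 22 or 38 ever show up) depends on the spread of the population and on the sample size, so the exact picture will differ from the one drawn in class.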

Am I clear? Yes, we are going to prove this point with simulation. Your understanding is right: the distribution of the means of the samples is the sampling distribution, okay. This is called the sampling distribution. Is everyone okay till here? Yes, sir, that is the question you are asking, and it will be validated. You can note down that it will be validated through simulation.

Next question: the population. What he is trying to say is this: if I make a distribution of the population, which means I take the distribution of this whole data, it need not look like this. The distribution of the population might look like this instead. If I make a frequency plot in rounded years, I may get this kind of graph, which is not a normal-distribution kind of thing, okay. This is called the population distribution.
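To make the difference between a population distribution and a sampling distribution concrete, here is a minimal sketch with a deliberately lopsided population. Everything in it is an assumption for illustration: the exponential population, the sample size of 50, and the helper `skewness`, which is a hypothetical function written here to measure how asymmetric a set of values is (close to 0 for a symmetric, bell-like shape).

```python
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

def skewness(values):
    """Skewness: roughly 0 for a symmetric, bell-shaped distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical right-skewed population (exponential), clearly not normal.
population = [random.expovariate(1 / 10) for _ in range(10_000)]

# Means of 1,000 random samples of 50 observations each.
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(1000)
]

population_skew = skewness(population)
means_skew = skewness(sample_means)
print(f"skewness of population:   {population_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
```

The sample means should come out much less skewed than the population itself. They are closer to a bell shape, though not perfectly normal at this sample size.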

Now, irrespective of how the population is distributed, the sampling distribution of the mean will be approximately a normal distribution, provided the samples are reasonably large. That is also what we are going to prove through simulation, okay. So this is what I explained: take many samples from the population, compute the mean x-bar of each sample, and then plot a distribution of the means taken over many samples. That is called the sampling distribution. This is exactly what I have explained here in the three bullet points.

Sampling error. Now, typically when you do your work, are you going to take a hundred samples from the population and then build a regression model on each sample? In that case you will end up with a hundred regression models. Then the question will be: of the hundred regression models, which one should I use in future to predict? So do we do this kind of thing when we actually go into real-life implementation? Think logically and answer.

No, we cannot. So what we are going to do is take one sample, build a model, and say the sample is representative of the population, so this model should hold for the whole population. But suppose the population mean is 28, I take one sample, the mean of that sample comes out to be 23, and I build a regression model on that. In that case the model may not hold for the population. This gap is called sampling error. We will take these one by one. Practically, you will all agree that we cannot take hundreds of samples and build hundreds of models, because that again leads to the question of which of the hundred equations to take. So practically it's very clear: we have to take one sample, build the model on that sample, and use that equation for future purposes. There are no two opinions on this. Agreed?

So if I take one sample and my value comes to 23, the difference between 23 and 28 will be called the sampling error. Now the question is: am I willing to tolerate this much sampling error? If the answer is no, then the answer to that is to increase the sample size. With 50 I may get more dispersion, more range, in my sampling distribution graph. If I increase it to a hundred, the graph will become slightly narrower, okay. So by increasing the sample size, my chance of a large sampling error reduces.

If I take a sample through some systematic process rather than at random, it is biased.
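The effect of sample size on the spread of the sampling distribution can be sketched as below. The population is simulated (mean 28, standard deviation 5 are assumptions for illustration), and `spread_of_sample_means` is a hypothetical helper written for this example.

```python
import random
import statistics

random.seed(4)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 ages.
population = [random.gauss(28, 5) for _ in range(1000)]

def spread_of_sample_means(sample_size, repeats=500):
    """Standard deviation of the sample means for a given sample size."""
    means = [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

spread_50 = spread_of_sample_means(50)
spread_100 = spread_of_sample_means(100)
print(f"spread of sample means, n=50:  {spread_50:.3f}")
print(f"spread of sample means, n=100: {spread_100:.3f}")
```

The spread at 100 should come out smaller than at 50, which is the "narrower graph" described above. A narrower sampling distribution means a single sample is less likely to land far from the population mean.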
