I did not snip the image properly. Yes, so why n minus k? That's right. The formula looks like that because you are doing the summation across all of them; in one shot you are doing the summation across all of them. The degrees of freedom change the divisor, but in the numerator you have to take all of them. Okay, when you do the sample standard deviation formula, what do you do? You have 30 observations. You take X minus X bar, the whole square, in the numerator, right? Do you eliminate any observation? No. So now assume that data for only four data points is available.

Now I ask you to compute the variability. What will you do? You will take X minus X bar, the whole square, in the numerator. Are you going to eliminate any of them? No. That's what I am doing in the numerator. Now, what is the formula for the standard deviation? The sum of X minus X bar, the whole square, divided by n minus 1, under the square root, right? So there we use n minus 1, and here it is k minus 1. Is it clearer so far? In simpler English: the more we go into these calculations, the more complexity there is, but I am trying to detail it out as much as possible. In one snapshot all of that is not going to sink in, I understand. It is a conceptual thing. If you don't understand the calculation in the first go, I'll say ignore it. But there is one thing that I say you should not ignore: you should know when to apply ANOVA. ANOVA is a test of means, of whether the means of the groups are the same or not. Why is it called ANOVA, analysis of variance, and not "ANOM"? Because you are not just simply testing the mean. If we were simply testing the mean,
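The degrees-of-freedom point above can be sketched in a few lines of Python. This is a minimal illustration and not the lecture's own code: the three groups are made-up numbers and the function name one_way_f is hypothetical. Every term stays in the numerators; only the divisors change (k minus 1 between groups and n minus k within groups, which are the standard one-way ANOVA degrees of freedom).

```python
def one_way_f(groups):
    """Compute the one-way ANOVA F statistic by hand (illustrative sketch)."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: every group mean is used, none is dropped.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: every observation is used, none is dropped.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    msb = ssb / (k - 1)   # divisor k - 1, like n - 1 in a sample variance
    msw = ssw / (n - k)   # divisor n - k, because one mean is estimated per group
    return msb / msw, k - 1, n - k


# Made-up data for three groups (an assumption, not from the lecture).
groups = [[8, 9, 10], [9, 10, 11], [12, 13, 14]]
f_value, df_between, df_within = one_way_f(groups)
print(f_value, df_between, df_within)
```

A large F means the spread between the group means is large relative to the spread inside the groups.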

I would simply go and see the mean values: eight, nine, ten, eleven. The four numbers are different, so conclude. But here, with the diagram, I am trying to explain that the means may be the same but the segments are different, right? So what we are really trying to prove is whether the segments are different; we are not just trying to prove whether the means are equal. Okay, so "all means are the same" is the null hypothesis, and the alternative is that at least one of the means is different. And this is an application where there are more than two groups. The real application is something like this: I am trying out something, I split it into multiple parts, and then I compare them. That is the real application, okay.
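Written out for k groups, the two hypotheses look like this. The symbols for the group means are notation added here for illustration; they are not from the lecture.

```latex
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)
```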

Now, for the sake of the discussion, I'm going to take my hypothesis test data. Here the application is in the banking domain. I want to see whether the balances maintained by a salaried account holder versus a professional account holder versus a businessman account holder versus an illiterate account holder are significantly different. Logically they should be different; I don't even need to do a test. But I want to prove it to myself. So I can take the entire bank data segment-wise, right? Segment-wise I can take the mean and standard deviation and apply ANOVA to that. Or: is banking behavior, that is, the usage of cash deposits, the usage of digital payments, or the usage of internet banking, significantly different for salaried versus self-employed customers? I want to validate it. How can I validate it? I can apply ANOVA, right?
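The segment-wise mean and standard deviation mentioned above could be computed along these lines. This is only a sketch: the records, segment names, and balances are invented for illustration and do not come from the bank data used in the session.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records of (segment, balance); all values are invented.
records = [
    ("salaried", 52000), ("salaried", 48000), ("salaried", 50000),
    ("professional", 90000), ("professional", 110000), ("professional", 100000),
    ("business", 150000), ("business", 250000), ("business", 200000),
]

# Collect the balances of each segment into its own list.
balances_by_segment = defaultdict(list)
for segment, balance in records:
    balances_by_segment[segment].append(balance)

# Segment-wise mean and sample standard deviation (divisor n - 1).
summary = {
    segment: (mean(values), stdev(values))
    for segment, values in balances_by_segment.items()
}
for segment, (avg, sd) in summary.items():
    print(f"{segment}: mean={avg:.0f}, sd={sd:.0f}")
```

These per-segment summaries are only the descriptive step; whether the differences are significant is what the ANOVA test decides.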

So, for the application, what do we do first? We import the file. Again, ANOVA makes assumptions. It is only an extension of the t-test, so it makes the assumption that the samples are normally distributed. That is the first assumption. The second assumption is that the data you are trying to analyze should have equal variance. In the example which I tried to explain here, the variances are unequal, but the object there was to highlight the concept of why variance matters. So the second assumption is that the data you are analyzing has equal variance. I can have a scenario where I have this distribution, I can have a scenario where I have this other one, and I can have a scenario where the variances may be more or less the same but this one is skewed to one side and this one is not skewed. Because of that, the means may still overlap. So the second question is: are the variances equal? ANOVA makes the underlying assumption that the data is normally distributed and that the variance is more or less equal across groups, okay.

I check the normality using a Q-Q plot. What did I say? A Q-Q plot should give a straight line at 45 degrees. Is it giving a 45-degree line? No. Which means the balance variable, this is the balance variable from the hypothesis test data for which I have made the plot, is not normally distributed. Can I apply ANOVA? No, I cannot apply ANOVA. But for training purposes I'm going to apply it anyhow; I don't have any other data, right? I need not get surgically correct data for each example and only then say I apply one-way ANOVA. So I'm going to apply ANOVA irrespective of whether the data is normal or not. How do I test for equal variance? We saw earlier that a test of homogeneity of variance was Levene's test. Now there is another test which I am giving you, Bartlett's test. Okay, I can use a Bartlett test.
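A Q-Q check like the one described above can be sketched numerically with scipy.stats.probplot, which returns the Q-Q points together with a least-squares line and its correlation coefficient r. The data below is synthetic (an assumption made for illustration, not the balance variable from the session), and using r as a rough straightness score is a convenience of this sketch, not a formal normality test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic samples: one roughly normal, one strongly right-skewed.
normal_like = rng.normal(loc=50_000, scale=10_000, size=500)
skewed = rng.lognormal(mean=10, sigma=1.0, size=500)


def qq_correlation(values):
    """Return the correlation r of the normal Q-Q plot points."""
    # probplot returns ((theoretical, ordered), (slope, intercept, r)).
    (_, _), (_, _, r) = stats.probplot(values, dist="norm")
    return r


r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print(f"normal-like r = {r_normal:.4f}, skewed r = {r_skewed:.4f}")
```

An r close to 1 means the points hug a straight line; a skewed variable such as a bank balance typically bends away from it.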

So here, for Bartlett's test, the null hypothesis is that the variances are homogeneous, and the alternate hypothesis is that they are not homogeneous. I see the p-value; it is very small, something like e minus 21 in scientific format, which means the alternate hypothesis has to be accepted, and the alternate hypothesis is non-homogeneous. That means the variance criterion is also failing, so again I cannot apply the ANOVA test. There are two important checks that we have to apply for ANOVA: one is that the variances should be equal, and the other is that the variable, the parameter that you are measuring, should be normally distributed. Both of them are failing. But as I said, for training I am again going to apply it, right? Visually inferencing the data: I made a box plot. Again, from visual inference it is very clear that there is unequal variance. I need not even have gone to a Bartlett test. With visual inferencing also you see there is more scatter here, more scatter there, and again there are outliers.
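The homogeneity-of-variance check described above can be sketched with scipy.stats.bartlett, with scipy.stats.levene alongside it for comparison. The three groups are synthetic and their names are hypothetical; they are deliberately generated with very different spreads so that both tests reject the null hypothesis of equal variances.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic balances per segment with clearly unequal spreads (assumed data).
salaried = rng.normal(50_000, 5_000, size=200)
professional = rng.normal(60_000, 20_000, size=200)
business = rng.normal(80_000, 60_000, size=200)

# Null hypothesis for both tests: all groups have the same variance.
bartlett_stat, bartlett_p = stats.bartlett(salaried, professional, business)
levene_stat, levene_p = stats.levene(salaried, professional, business)

alpha = 0.05  # conventional significance level, chosen for illustration
variances_look_equal = bartlett_p >= alpha
print(f"Bartlett p = {bartlett_p:.3g}, Levene p = {levene_p:.3g}")
print("variances look equal:", variances_look_equal)
```

Bartlett's test is sensitive to non-normal data, so when normality has already failed, Levene's test is usually the safer of the two readings.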

So typically, when there are lots of outliers, what you should do is outlier treatment, because outliers are going to impact the mean. So I am doing outlier treatment as well. I'm saying, okay, take any value which goes above some four lakh; here I see the limit is more or less four lakh, so any values which are going above four lakh I am bringing down to four lakh. What will be the benefit of that? It will reduce the variability; it will reduce the variance, okay. So I'm doing outlier treatment also. Having done all of that, I'm now applying one-way ANOVA. See, all the criteria have failed, but despite all the criteria having failed, I am applying it just for the sake of the training.
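The capping step described above can be sketched as follows. The balances are invented, and the ceiling of 400,000 is an illustrative assumption; in practice the cap would be read off your own box plot or a chosen percentile.

```python
import numpy as np

# Invented balances with two extreme values at the top end.
balances = np.array(
    [20_000, 75_000, 150_000, 390_000, 650_000, 1_200_000], dtype=float
)
cap = 400_000  # assumed ceiling; choose it from your own data

# Any value above the cap is pulled down to the cap; the rest are unchanged.
capped = np.minimum(balances, cap)

print("variance before:", balances.var(ddof=1))
print("variance after: ", capped.var(ddof=1))
```

Capping keeps every observation in the data set, which is why it reduces the variance without changing the group sizes.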

I am now applying ANOVA. I apply ANOVA and I get the output: this printed value is F, the F-test value. The function is f_oneway; "oneway" is nothing but one-way ANOVA, and "f" is for the F-test, okay. So F is coming as 123 and p is coming as e minus 79, that is, zero point, 78 zeroes, and then something. What can we conclude? The null hypothesis is rejected and the alternate hypothesis is accepted. What is the alternate hypothesis? At least one of the means is significantly different.
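A one-way ANOVA call of the kind described above can be sketched with scipy.stats.f_oneway. The three segments are synthetic, with means set apart on purpose, so the F and p values printed here will not match the numbers quoted in the session.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic balances per segment (assumed data, not the session's file).
salaried = rng.normal(50_000, 10_000, size=300)
professional = rng.normal(60_000, 10_000, size=300)
business = rng.normal(75_000, 10_000, size=300)

# Null hypothesis: all group means are equal.
f_value, p_value = stats.f_oneway(salaried, professional, business)

alpha = 0.05  # conventional significance level, chosen for illustration
reject_null = p_value < alpha
print(f"F = {f_value:.1f}, p = {p_value:.3g}, reject H0: {reject_null}")
```

Rejecting the null hypothesis only says that at least one mean differs; it does not say which segment is the different one.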