If you want to use stop words such as "is", "the", "of", and "for", that is, words you don't want counted as features, you can remove them from the corpus. You can store all of those words in a text document and then load that document here. So you can do a lot of customization when creating a feature vector.

Once I have executed this TF-IDF function, let us now convert our train and test data into feature vectors. I'm just going to use this function directly on my train and test data. After doing this, if I look at the TF-IDF train features, there are 664 records in my train data and 1,148 unique words that have been chosen by the TF-IDF model, so it has created a 664 by 1,148 sparse matrix. Suppose I want to look at the data inside it; this is how the data will look.
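The step described above can be sketched roughly as follows, assuming scikit-learn's `TfidfVectorizer` is the TF-IDF function in question. The toy corpora, the stop-word list, and the variable names are illustrative assumptions, not the session's actual data or code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the session's train and test reviews (assumed data).
train_corpus = [
    "the food is great and the service is great",
    "the room is dirty",
    "great location for the price",
]
test_corpus = ["the service is slow", "great food"]

# Custom stop words. In practice these could be loaded from a text file,
# for example one word per line: open(path).read().split()
custom_stop_words = ["is", "the", "of", "for", "and"]

vectorizer = TfidfVectorizer(stop_words=custom_stop_words)

# Learn the vocabulary and IDF weights on the train data only...
tfidf_train_features = vectorizer.fit_transform(train_corpus)
# ...then reuse that same vocabulary for the test data.
tfidf_test_features = vectorizer.transform(test_corpus)

# Shape is (number of records, number of unique words kept as features).
print(tfidf_train_features.shape)
print(tfidf_test_features.shape)

# A sparse matrix can be converted to a dense array to inspect small examples.
print(tfidf_train_features.toarray().round(2))
```

Fitting on the train data and only transforming the test data is what keeps the number of columns identical in both matrices.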

So on each and every record and each and every word, this formula, tf × (log((n + 1) / (n_w + 1)) + 1), has been applied, where n is the number of records and n_w is the number of records containing the word, and we get floating-point values as the output. Similarly, we have done the same thing on the TF-IDF test features as well. We have 328 records in total and 1,148 features, so we get a 328 by 1,148 sparse matrix as the output.

Okay, so someone is asking me how voice recognition is done in NLP. There are many libraries already available in Python that can convert your audio data, for example in WAV format, into text. For instance, if we take Google speech recognition, there is a library called SpeechRecognition. You can import that library, and it has functions that convert audio data in many languages into text, so you can make use of such built-in libraries.
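Returning to the weighting formula mentioned above, here is a small sketch of it in plain Python, assuming the natural logarithm and a raw term count for tf. The function names are hypothetical and only for illustration.

```python
import math


def smoothed_idf(n_docs, doc_freq):
    """IDF with add-one smoothing: log((n + 1) / (n_w + 1)) + 1."""
    return math.log((n_docs + 1) / (doc_freq + 1)) + 1


def tfidf_weight(term_count, n_docs, doc_freq):
    """Raw term count in the record multiplied by the smoothed IDF."""
    return term_count * smoothed_idf(n_docs, doc_freq)


# A word that appears in every one of the 664 records gets the minimum IDF.
print(smoothed_idf(664, 664))
# A word that appears in only 3 records is weighted more heavily.
print(smoothed_idf(664, 3))
# Weight for a word that occurs twice in a record and in 3 of 664 records.
print(tfidf_weight(2, 664, 3))
```

Note that scikit-learn's vectorizer, by default, also normalizes each row to unit length, so the values stored in the sparse matrix are scaled versions of these weights.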

Okay, so now that we have generated our TF-IDF train and test features, we will also add a few more functions here. The first one calculates the metrics. What is meant by metrics? I am just running this metrics function right now; I am not going to explain what it does yet. Once we apply our machine learning model, we'll be able to understand what this function does.

Here we have also written a function called train_predict_evaluate_model. So what is meant by machine learning models? We had only a one-line explanation earlier in this session. Machine learning models can be either supervised or unsupervised, and they learn from the data and identify patterns. There are various supervised learning algorithms, including many regression and classification algorithms, such as multinomial naive Bayes, decision trees, random forests, etc. All these algorithms work on a background of mathematical calculations, probability, and statistics. Let us not go into the details of those algorithms right now.

We will see how to use those algorithms here. This is the train, predict, and evaluate function. It takes a classifier, the training features and the labels of the training data, and the test features and the labels of the test data as input. Let us see what it does. First we build the model with classifier.fit: you take a classifier as input and fit it on the training features and the training labels. Once you fit the classifier, the model starts learning. It looks at the patterns, that is, what kind of records have been classified as positive sentiment and what kind as negative sentiment, and it learns from that data. Once this learning is done, we can predict by passing only the test features; we don't have to pass the test labels as input. Your test data has 328 reviews, and there are 328 sentiments in it, each either positive or negative.
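The fit-then-predict flow described above can be sketched as follows. This is a minimal illustration, not the session's actual function: it assumes only that the classifier exposes `fit` and `predict` in the scikit-learn style, it prints a plain accuracy instead of the full set of metrics, and `MajorityClassifier` is a hypothetical stand-in so that the example runs on its own.

```python
from collections import Counter


def train_predict_evaluate_model(classifier, train_features, train_labels,
                                 test_features, test_labels):
    # Build the model: learn from the training features and their labels.
    classifier.fit(train_features, train_labels)
    # Predict using only the test features; the test labels are held back.
    predictions = classifier.predict(test_features)
    # Evaluate by comparing the actual labels with the predictions.
    correct = sum(1 for actual, predicted in zip(test_labels, predictions)
                  if actual == predicted)
    print("Accuracy:", round(correct / len(test_labels), 2))
    return predictions


class MajorityClassifier:
    """Hypothetical stand-in: always predicts the most common training label."""

    def fit(self, features, labels):
        self.most_common_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, features):
        return [self.most_common_ for _ in features]


predictions = train_predict_evaluate_model(
    MajorityClassifier(),
    train_features=[[1, 0], [0, 1], [1, 1]],
    train_labels=["positive", "positive", "negative"],
    test_features=[[0, 1], [1, 0]],
    test_labels=["positive", "negative"],
)
print(predictions)
```

Because the function only relies on `fit` and `predict`, any scikit-learn classifier could be passed in place of the stand-in.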

So the sentiment column is omitted. We just pass the reviews, and based on what the classifier has learned, the predictions are made on the data. Then comes get_metrics, the function we created previously, where we print the accuracy, precision, recall, and F1 score on the data. In simple terms, what we are doing here is comparing the actual labels versus the predictions and reporting the result as metrics. So we calculate accuracy based on how many records are predicted correctly.

Now we'll train a model called multinomial naive Bayes. This is one of the machine learning models that can take your training data as input and classify based on the probability of occurrence of any particular label.
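A metrics function like the one described above might look like the sketch below. It assumes scikit-learn's `sklearn.metrics` module and weighted averaging across the two classes; both the averaging choice and the toy labels are assumptions, since the session does not show the function body.

```python
from sklearn import metrics


def get_metrics(true_labels, predicted_labels):
    """Compare actual and predicted labels and report four scores."""
    scores = {
        "accuracy": metrics.accuracy_score(true_labels, predicted_labels),
        # average="weighted" is an assumption: each class's score is
        # weighted by how many actual records belong to that class.
        "precision": metrics.precision_score(
            true_labels, predicted_labels, average="weighted"),
        "recall": metrics.recall_score(
            true_labels, predicted_labels, average="weighted"),
        "f1": metrics.f1_score(
            true_labels, predicted_labels, average="weighted"),
    }
    for name, value in scores.items():
        print(f"{name}: {value:.2f}")
    return scores


# Toy labels for illustration only.
actual = ["positive", "positive", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive"]
scores = get_metrics(actual, predicted)
```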

I am not going into the actual methodology of multinomial naive Bayes, but here is how you can use the algorithm: go to scikit-learn and, from naive_bayes, import MultinomialNB, which is the name of the classifier. So I'm going to run this. Okay, before that I need to run the previous cell. Yeah, so let us import the function. Then we create multinomial naive Bayes as a model and store it inside mnb. If I run this and look at it, you can see the description: the multinomial naive Bayes classifier is suitable for classification with discrete features, for example word counts for text classification. So for this kind of NLP technique, this is one of the suitable algorithms. You can look at the detailed explanation and the parameters used inside the classifier here; you just need to press Shift+Tab on the function to see the details.

Okay, now we are going to run our train_predict_evaluate_model function on this data. We have given the training features and training labels, the test features and test labels, everything as input, and we have an accuracy of around 78%, with a precision of 73%, a recall of 78%, and an F1 score of 74%.
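The import and model creation described above can be sketched like this. The tiny review set is made up for illustration and has nothing to do with the 664 train and 328 test records used in the session, so it will not reproduce the reported scores.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up reviews and sentiments, for illustration only.
train_reviews = [
    "great food great service",
    "loved the room",
    "terrible food rude staff",
    "dirty room awful smell",
]
train_sentiments = ["positive", "positive", "negative", "negative"]
test_reviews = ["great service", "awful rude staff"]

# Turn the text into TF-IDF feature vectors.
vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform(train_reviews)
test_features = vectorizer.transform(test_reviews)

# Create the classifier and store it in mnb, as in the session.
mnb = MultinomialNB()
mnb.fit(train_features, train_sentiments)

# Predict from the test features only; no test labels are passed in.
predictions = mnb.predict(test_features)
print(list(predictions))
```

In a notebook, `help(MultinomialNB)` shows the same description and parameters as the Shift+Tab tooltip.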

Accuracy is the simplest one, logically: it is the cases where actual equals predicted, taken as a percentage. What is meant by precision? Precision is the ratio between the true positives and the sum of the true positives and false positives. What does this mean? Let us run the confusion matrix. On each axis we have 0 and 1. The x-axis holds the actual values of 0 and 1, and the y-axis holds the predicted values of 0 and 1. If we look at it, actual negative and predicted negative is 12; actual negative and predicted positive is 56; actual positive and predicted negative is 16; and actual positive and predicted positive is 244. So what is calculated under precision and recall is this: for precision you take the ratio of true positives to true positives plus false positives, and for recall you take the ratio of true positives to true positives plus false negatives.

So why am I using naive Bayes, and why am I not using another model?
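As a worked check of these definitions, the arithmetic can be run on the four counts read out from the confusion matrix (12, 56, 16, and 244). Which cell each count belongs to is an interpretation of the spoken description, and the weighted averaging over both classes is an assumption; under those assumptions the results line up with the accuracy, precision, and recall reported in the session.

```python
# Counts from the confusion matrix, under the assumed cell assignment.
tn, fp = 12, 56    # actual negative: predicted negative, predicted positive
fn, tp = 16, 244   # actual positive: predicted negative, predicted positive
total = tn + fp + fn + tp

# Accuracy: share of records where actual equals predicted.
accuracy = (tp + tn) / total

# Precision and recall with "positive" as the class of interest.
precision_pos = tp / (tp + fp)
recall_pos = tp / (tp + fn)

# The same ratios with "negative" treated as the class of interest.
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)

# Weighted average (assumption): weight each class by its actual record count.
support_pos, support_neg = tp + fn, tn + fp
weighted_precision = (precision_pos * support_pos
                      + precision_neg * support_neg) / total
weighted_recall = (recall_pos * support_pos
                   + recall_neg * support_neg) / total

print("records:", total)
print("accuracy:", round(accuracy, 2))
print("weighted precision:", round(weighted_precision, 2))
print("weighted recall:", round(weighted_recall, 2))
```

The low recall on the negative class shows why accuracy alone can look better than the model really is when one class dominates the data.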

I just wanted to show one example, so I picked naive Bayes here. Naive Bayes is not the only model you can use; you can use any other model as well. Since this is a classification problem, you can also try it with a decision tree, or with an SVM, etc. You can use any classification algorithm here.

Okay, so we are seeing numbers all over the place here, and I assume you now have a bit of clarity on how we can handle text using machine learning algorithms, I mean, how to handle text using natural language processing. Now let us see what is actually in the output, looking at the actual text instead of at numbers. I have written a small function here that takes in our test data, the test labels, and the prediction algorithm.
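On the earlier point that any classification algorithm can be used, here is a rough sketch of swapping classifiers over the same TF-IDF features. The choice of `DecisionTreeClassifier` and `LinearSVC` from scikit-learn, and the made-up reviews, are assumptions for illustration; the session itself only runs naive Bayes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Made-up reviews and sentiments, for illustration only.
train_reviews = [
    "great food great service",
    "loved the room",
    "terrible food rude staff",
    "dirty room awful smell",
]
train_sentiments = ["positive", "positive", "negative", "negative"]
test_reviews = ["great service", "awful rude staff"]

vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform(train_reviews)
test_features = vectorizer.transform(test_reviews)

# Every classifier exposes the same fit/predict interface,
# so the surrounding code does not change when the model does.
candidates = {
    "naive Bayes": MultinomialNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "linear SVM": LinearSVC(),
}

results = {}
for name, classifier in candidates.items():
    classifier.fit(train_features, train_sentiments)
    results[name] = list(classifier.predict(test_features))
    print(name, results[name])
```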
