When you convert this data into a bag-of-words form, it becomes a vector. For instance, suppose we have 20 reviews on a product in our data set and a total of 20,000 unique words across the complete data set. If you create an N by M sparse matrix, with the reviews as N and the unique words as M, and populate each cell with the number of occurrences of each word, the result is called a feature vector. This concept is also known as the word-to-vector featurization technique, where you convert your text into numbers, and numbers are what any model can understand. The next concept is POS tagging. Please keep posting your questions; I'll answer them once we get to the Q&A section. So the next concept we are going to cover is POS tagging, and I'll also explain how we implement it using Python, so we'll look at the Python code for those who have questions about the concepts we are covering now. Coming back to our English grammar classes, it is the same thing, nothing new. For example, let us consider the sentence "I am satisfied with the customer service."
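The bag-of-words idea can be sketched in plain Python. The two review strings and the resulting vocabulary below are invented toy data, just to show the N-by-M counting described above:

```python
# Toy reviews standing in for the review data set described above.
reviews = ["good service good staff", "bad service"]

# The vocabulary: all unique words across the corpus (the M columns).
vocab = sorted({word for review in reviews for word in review.split()})

def featurize(review, vocab):
    """Count occurrences of each vocabulary word in one review."""
    counts = {}
    for word in review.split():
        counts[word] = counts.get(word, 0) + 1
    return [counts.get(word, 0) for word in vocab]

# N x M matrix: one feature vector (row) per review.
matrix = [featurize(review, vocab) for review in reviews]
print(vocab)   # ['bad', 'good', 'service', 'staff']
print(matrix)  # [[0, 2, 1, 1], [1, 0, 1, 0]]
```

Most of the cells are zero even on this tiny corpus, which is why the transcript calls the real 20×20,000 matrix sparse.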

So this is the review which a customer has posted. If you do POS tagging on it, "I" would be a pronoun, "satisfied" a verb, "with" a preposition, "the" a helping word or determiner, and "customer service" a noun. So we are tagging text with its part of speech, and this can help the machine learning model determine the weight of each word and handle the feature vectors accordingly. The last concept here is classification. Now that we have done a lot of processing on the text, we have processed the raw data, then performed tokenization, then normalization, then stemming, then conversion into a bag of words, then tagging with parts of speech, and so on. Once we do all these activities on the text, the final step is to feed it into a machine learning model and train the algorithm. This classification part is where we choose the correct class label for a given input. For example:
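The tagging of that example sentence can be sketched with a hand-written lexicon. This is only a toy: the `LEXICON` table below is an assumption made for illustration, whereas a real workflow would use a trained tagger such as NLTK's `pos_tag` or spaCy:

```python
# Toy lexicon mapping words to coarse part-of-speech tags. A real tagger
# learns these mappings from annotated corpora, not a hard-coded table.
LEXICON = {
    "i": "PRON", "am": "VERB", "satisfied": "VERB",
    "with": "ADP", "the": "DET", "customer": "NOUN", "service": "NOUN",
}

def toy_pos_tag(sentence):
    """Tag each word; default unknown words to NOUN (a common baseline)."""
    return [(w, LEXICON.get(w.lower(), "NOUN")) for w in sentence.split()]

tags = toy_pos_tag("I am satisfied with the customer service")
print(tags)
```

The output pairs each token with its tag, matching the breakdown given in the transcript ("I" as pronoun, "satisfied" as verb, "the" as determiner, and so on).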

Let us consider a new customer who gives a review stating "I am not happy with the customer service." By looking at the statement, we can understand that this is a negative sentiment from the customer's side. So what does the machine learning model do? A classification algorithm, such as a supervised machine learning algorithm, will look at the patterns of similar statements from different customers in the past, and it will classify this particular raw statement, "I am not happy with the customer service," as a negative sentiment. The application of all these concepts together becomes a natural language processing algorithm. Now that we have seen what each of these concepts means, we'll go on to a case study; looking at a case study will help us understand the concepts better. Let us take a simple case study where we'll cover a subset of these concepts. We will not be covering all the concepts in detail, because at a beginner level we'll cover how we can handle text data using NLP. Okay, so to demonstrate a case study of NLP on banking product reviews, I have downloaded BankBazaar reviews on personal loans across all the banks.
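A heavily simplified sketch of that classification step is shown below. The four labelled training sentences are invented, and the word-frequency scoring is a toy stand-in for the supervised learning algorithm the transcript refers to (a real pipeline would train something like Naive Bayes or an SVM on many labelled reviews):

```python
from collections import Counter

# Tiny invented training set of labelled reviews (toy data).
train = [
    ("excellent service very helpful", "positive"),
    ("quick loan approval great experience", "positive"),
    ("not happy with the customer service", "negative"),
    ("bad service not satisfied", "negative"),
]

# Count how often each word appears under each label.
word_counts = {"positive": Counter(), "negative": Counter()}
for text, label in train:
    word_counts[label].update(text.lower().split())

def classify(text):
    """Score each class by the relative frequency of the query's words."""
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        scores[label] = sum(counts[w] / total for w in text.lower().split())
    return max(scores, key=scores.get)

print(classify("I am not happy with the customer service"))  # negative
```

Because words like "not" and "happy" occurred in past negative reviews, the new statement scores higher under the negative class, which is exactly the pattern-matching intuition described above.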

I am going to apply some NLP techniques to predict customer sentiment on the personal loans offered by the different banks. First, let us see what kind of data we are going to deal with. I have gone to the BankBazaar.com website, chosen "Ratings and Reviews," and set the product type to personal loan. If you look at this page, you can see a lot of reviews posted by different customers, and there are many pages here, pages 1 to 63 of reviews. What I have done is store each of these review pages as HTML pages; if you see here, I have stored all the reviews from page 1 to page 50. We are now going to access these reviews and apply NLP using Python. The interface I am currently using for this exercise is called a Jupyter notebook, and before going through the exercise, let us see what steps we are going to follow.

Okay, so these are the simple steps we are going to use to implement this case study. First, we load the downloaded HTML files which we stored in our local directory. Then we use a library called Beautiful Soup to parse the HTML files, and we convert the data into data frames; for this there is a library called pandas in Python, and we are going to use it. Then we write a simple sentiment analysis function just to define the sentiment based on a ratings criterion, and then we split the data into train and test sets at a 70:30 ratio. Then we'll use one of the feature extraction techniques, called tf-idf. In the NLP concepts earlier, I was explaining bag of words, which is one technique for feature extraction; similarly, we are going to use tf-idf here, and while we are running the code you'll be able to understand what it is. Then we build a simple machine learning model; we'll pick one of the machine learning models.
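Before running it in the notebook, tf-idf can be computed by hand on a toy corpus. The three documents below are invented, and the formula shown is the plain tf × log(N/df) variant; library implementations such as scikit-learn's `TfidfVectorizer` use smoothed variants, but the intuition is the same:

```python
import math

# Toy corpus of three tiny "documents".
docs = ["good service", "bad service", "good staff"]
n_docs = len(docs)

# Document frequency: in how many documents does each word appear?
df = {}
for doc in docs:
    for word in set(doc.split()):
        df[word] = df.get(word, 0) + 1

def tfidf(doc):
    """Weight per word: term frequency x inverse document frequency."""
    words = doc.split()
    weights = {}
    for word in set(words):
        tf = words.count(word) / len(words)
        idf = math.log(n_docs / df[word])
        weights[word] = tf * idf
    return weights

weights = tfidf("bad service")
# "bad" is rarer across the corpus than "service", so it scores higher.
print(weights["bad"] > weights["service"])  # True
```

This is why tf-idf often beats raw bag-of-words counts: words that appear in every review carry little signal and get down-weighted, while distinctive words are boosted.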

We'll build it, then we'll create a confusion matrix to check whether our model has performed as expected, then we will see what was not correctly predicted, and then we'll think of alternative approaches. Going back to the Jupyter notebook, first we load the reviews in the form of HTML files. I am going to use a library named glob. What glob does is parse all the directories and subdirectories and pick up the HTML files; we have only one directory here, the BankBazaar data directory, and we are going to match all the HTML files in it using the pattern "*.html". Once we run this, all the HTML file paths get collected; that is, we first get the paths of all the HTML files, and then we need to read the files themselves. If I execute this, we can see that all the HTML file paths have been read into this list.
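The glob step can be sketched as follows. Since the actual BankBazaar download folder is not available here, the sketch creates a throwaway directory with dummy `.html` files; the directory name and file names are stand-ins, not the notebook's real paths:

```python
import glob
import os
import tempfile

# Stand-in for the downloaded BankBazaar folder: a throwaway directory
# with a few dummy .html files so the glob call has something to find
# (the real notebook points at its own local data directory).
data_dir = tempfile.mkdtemp()
for i in range(1, 4):
    with open(os.path.join(data_dir, f"page_{i}.html"), "w") as f:
        f.write("<html></html>")

# "*.html" matches every HTML file in the directory, as in the walkthrough.
paths = sorted(glob.glob(os.path.join(data_dir, "*.html")))
print(len(paths))  # 3 file paths collected into a list
```

Note that `glob.glob` returns only the file *paths*; reading the file contents is a separate step, which is why the transcript moves on to `codecs` next.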

Next we'll import codecs. We are importing codecs because we need to read the files, parse them, and store them as an HTML list. What I have done here is add a separator, a space character, and I am reading all the HTML files and appending them one after the other, joined by that space. So now we have a single HTML document which is a combination of all 50 HTML files. To look at what is inside it: if you render the whole HTML document here, it will be a huge document and might hang, so instead I will count the reviews inside it. There is a class called ellipsis_text in the HTML which is used for storing the reviews, so I am going to count how many reviews it contains; there are 996 reviews inside this HTML document. Now let us parse them using Beautiful Soup.
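The read-and-count step can be sketched end to end. The two dummy pages below stand in for the 50 downloaded files (their contents are invented), while the `ellipsis_text` class name and the `codecs` / Beautiful Soup calls follow the walkthrough; this assumes the `beautifulsoup4` package is installed:

```python
import codecs
import os
import tempfile

from bs4 import BeautifulSoup

# Dummy pages standing in for the downloaded files; "ellipsis_text" is
# the review container class mentioned in the walkthrough.
pages = [
    '<div class="ellipsis_text">Great loan process</div>',
    '<div class="ellipsis_text">Slow disbursal</div>'
    '<div class="ellipsis_text">Helpful staff</div>',
]
data_dir = tempfile.mkdtemp()
paths = []
for i, body in enumerate(pages):
    path = os.path.join(data_dir, f"page_{i}.html")
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write(body)
    paths.append(path)

# Read every file and join the contents with a space separator,
# producing one combined HTML document.
combined = " ".join(
    codecs.open(p, "r", encoding="utf-8").read() for p in paths
)

# Count the review blocks by their CSS class, as in the walkthrough.
soup = BeautifulSoup(combined, "html.parser")
reviews = soup.find_all(class_="ellipsis_text")
print(len(reviews))  # 3 review blocks across the dummy pages
```

On the real 50-page download, this same `find_all` count is what yields the 996 reviews mentioned above.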
