So: revenue per category in the last minute. And again, this keeps moving, right? Every minute you have different data. Maybe this minute books are very hot; over the next minute, maybe a footwear promotion went live, so people start ordering footwear. So one type of real-time data analytics is getting aggregated stats from recent events: the "recent events" is the real-time part, and the aggregated stats are the analytics part. The next type of real-time data analytics is applying a machine learning model to this stream of events. So let us say you are Amazon, or any e-commerce website.
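A minimal sketch of the first type, in plain Python rather than Spark. It assumes a hypothetical event shape `(timestamp_seconds, category, amount)`, buckets events into one-minute windows, and sums revenue per category within each window.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (timestamp_seconds, category, amount)
Event = Tuple[float, str, float]

def revenue_per_category_per_minute(events: Iterable[Event]) -> Dict[int, Dict[str, float]]:
    """Group events into one-minute windows and sum revenue per category in each."""
    windows: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for ts, category, amount in events:
        minute = int(ts // 60)  # index of the one-minute window this event falls into
        windows[minute][category] += amount
    # Convert the nested defaultdicts to plain dicts, ordered by window index
    return {m: dict(cats) for m, cats in sorted(windows.items())}
```

With a few made-up events, books dominate minute 0 and footwear takes over in minute 1, which is exactly the "what is hot right now" question from the talk.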

Let us say you have an offline-built model which, based on what the user is doing on the website right now, predicts whether they will actually go ahead and make a purchase in this session, in this interaction with the website. Let us say a model like this is built by all the machine learning gurus at that company. Now what you essentially want to do is, at any point in time, take the events that users are generating on your website, that stream of events, put this machine learning model on top of it, and see how many of those people are just window-shopping versus how many of them are actually likely to make a purchase soon. So that is one other way in which real-time data analytics systems are used: you build offline machine learning models for certain business problems.

As these events happen, you run them through the model to predict how many users, in this case, will actually make a purchase. Another use case where machine learning models are applied this way is high-frequency stock trading. Say there is a machine learning model which says that if a stock belongs to this type of company and its price changes by this much, then you should go ahead and buy this much quantity of that stock. As stock prices keep changing in the market, those price changes are the raw events, the stream, and the model sitting in between decides whether you should buy the stock, sell it, or not trade it at all.
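To make the second type concrete, here is a hedged plain-Python sketch. The weights in `OFFLINE_WEIGHTS`, the `view`/`add_to_cart` actions, and the 0.5 threshold are all hypothetical stand-ins for whatever a batch-trained purchase-prediction model would really use. The point is only that the model stays fixed while the stream keeps updating its inputs.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical logistic-regression weights, standing in for a model trained offline in a batch job.
OFFLINE_WEIGHTS = {"bias": -3.0, "views": 0.2, "cart_adds": 1.5}

def purchase_probability(features: Dict[str, float]) -> float:
    """Score a session's features with the fixed, pre-trained weights."""
    z = OFFLINE_WEIGHTS["bias"]
    z += OFFLINE_WEIGHTS["views"] * features.get("views", 0)
    z += OFFLINE_WEIGHTS["cart_adds"] * features.get("cart_adds", 0)
    return 1.0 / (1.0 + math.exp(-z))

def score_stream(events: Iterable[Tuple[str, str]], threshold: float = 0.5) -> Dict[str, bool]:
    """Apply the offline model to each (user_id, action) event as it arrives.

    Returns, per user, whether the latest score says 'likely to buy'."""
    features: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    likely: Dict[str, bool] = {}
    for user, action in events:
        if action == "view":
            features[user]["views"] += 1
        elif action == "add_to_cart":
            features[user]["cart_adds"] += 1
        # The model never changes here; only its inputs are updated by the stream.
        likely[user] = purchase_probability(features[user]) >= threshold
    return likely
```

The stock-trading case has the same shape: swap the session features for price changes and the boolean for a buy/sell/hold decision.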

So the second type of real-time data analytics is building offline models using the traditional batch-based analytics that we spoke about, taking those models, putting them on this real-time stream of data, and using them to make business decisions. The third, and very interesting, type of real-time data analytics is to build your model from the stream of events as those events are happening. Why would you do that?

In the second type we discussed, models are built offline and then put on top of this river, or stream, of data to make some business decisions. But what if your model has to be very recent? Your model cannot be a day old; it cannot even be an hour old. Maybe in your use case the data you are dealing with is very, very sensitive to time, and the model has to be built on the fly and used on the data coming immediately after. So let's say you build a model from five minutes of gathered data and use it to predict for the next half an hour. Then, as the stream keeps flowing, you gather data again.

You build a new model from the next five minutes of data and use it for the data flowing in over the following half an hour. So these are, in my head, the three types of real-time data analytics that I have encountered: one is getting aggregate stats from recent events, the second is applying offline machine learning models to the stream of events, and the third, if your use case demands it, is building the model from recent data and using it to predict on the data that comes next. Cool. So now we have covered the basics: what real-time data analytics is, how it differs from traditional batch processing, and how we can model data as a stream of events so that we can think about doing real-time data analytics. Now let's discuss how Spark Streaming enables this.
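The third type can be sketched as follows, again in plain Python and under simplifying assumptions: events are hypothetical `(timestamp_seconds, x, y)` tuples sorted by time, the "model" is a one-variable least-squares line, and the model built from one window is used on every event that arrives until the next window closes (rather than for a fixed half hour as in the talk's example). `fit_line` and `rolling_predictions` are illustrative names, not a library API.

```python
from typing import Iterable, List, Optional, Tuple

Event = Tuple[float, float, float]  # hypothetical shape: (timestamp_seconds, feature x, target y)

def fit_line(points: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Ordinary least squares for y = a + b*x; returns None if the line cannot be fit."""
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        return None
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / var_x
    return mean_y - b * mean_x, b

def rolling_predictions(events: Iterable[Event], window_secs: float = 300.0) -> List[Optional[float]]:
    """Predict each event's y with the model built from the most recent completed window.

    Events seen before any window has closed get None."""
    model: Optional[Tuple[float, float]] = None
    current_window: Optional[int] = None
    window_points: List[Tuple[float, float]] = []
    predictions: List[Optional[float]] = []
    for ts, x, y in events:
        w = int(ts // window_secs)
        if current_window is None:
            current_window = w
        elif w != current_window:
            # The window just closed: rebuild the model from only the data it contained,
            # keeping the previous model if this window had too little data to fit.
            new_model = fit_line(window_points)
            if new_model is not None:
                model = new_model
            window_points = []
            current_window = w
        predictions.append(None if model is None else model[0] + model[1] * x)
        window_points.append((x, y))
    return predictions
```

The design choice here is that a model is never trained on the data it predicts: each window's events are scored by the previous window's model and only then added to the training set for the next one.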

How can you do real-time analytics with Spark Streaming? Essentially, Spark Streaming does the exact modeling that we spoke about. It asks you to point it to a stream of data, and it models that stream as a sequence of datasets. It starts receiving events from the stream you pointed it to, and what it gives out is a sequence of datasets, where each dataset corresponds to a time frame. In other words, each dataset is basically the events from the last eight seconds, or the last five seconds, last ten seconds, last five minutes, last one hour; again, what counts as "real-time" depends on the use case. So you tell it: okay, for me, this is the stream of data, and I want you to give me a sequence of datasets where each dataset contains data from the last X minutes, or the last five seconds in this case.

So now we have taken a continuous stream and, in a way, discretized it: we have made logical chunks out of it, one after the other. The stream of data has been converted into a sequence of datasets, and how much data goes into each dataset is defined by you, in terms of how long you want Spark to monitor the stream for each chunk; in this case, events in the last five seconds.

If you say that, then it will collect data for five seconds, and that will be your first dataset; then it collects data for the next five seconds, and that will be your second dataset; then it collects data for the next five seconds, and that is your third dataset; and so on. So essentially we have taken a continuous stream and discretized it into a sequence of datasets. Once we do this, you will see how each of the three types of analytics that we wanted becomes easy. We'll discuss that in the slides, but moving on.
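The discretization described above is what Spark Streaming calls a DStream (discretized stream): a sequence of RDDs, one per batch interval, where the interval is fixed when you create the `StreamingContext`. The sketch below is not Spark code; it is a plain-Python illustration of the same idea, using a hypothetical `discretize` function that chops a time-ordered stream of `(timestamp, event)` pairs into consecutive five-second batches.

```python
from typing import Any, Iterable, Iterator, List, Optional, Tuple

def discretize(stream: Iterable[Tuple[float, Any]], batch_secs: float = 5.0) -> Iterator[Tuple[int, List[Any]]]:
    """Chop a time-ordered stream of (timestamp, event) pairs into consecutive batches.

    Yields (batch_index, events) for every interval, including empty ones, so the
    output is a gap-free sequence of datasets, one per `batch_secs` interval."""
    current: Optional[int] = None
    batch: List[Any] = []
    for ts, event in stream:
        idx = int(ts // batch_secs)
        if current is None:
            current = idx
        while idx > current:
            # The interval is over: emit it (possibly empty) and move on to the next one.
            yield current, batch
            batch = []
            current += 1
        batch.append(event)
    if current is not None:
        yield current, batch  # flush the final, possibly partial, interval
```

Each yielded batch is an ordinary finite dataset, so any of the three kinds of analytics can be run on it with batch-style code, one interval at a time.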
