Dear creatives, big data enthusiasts and AI scientists, in my previous article I discussed key concerns in AI science and the need for a mechanical analogue in the area of simulating intelligence, unified with market and software perspectives based on my own paradigm. Continuing in that spirit, I will go on with the two basic methods of machine learning, the necessity of understanding Bayes’ rule, and some paradigms from the H2O.ai platform!
 
Supervised Learning and Unsupervised Learning are two distinct machine learning methods. Supervised Learning is where we take input data and produce an output: we either predict a value or we classify, and we classify by attributing a label. For example, predict the price of stocks, predict clicks on an area of a website, or classify in the sense of, say, distinguishing fairies from elves.
Unsupervised Learning is where there are no labels or correct outputs; the task is to discover structure or patterns in large sets of data, such as grouping the data into clusters or reducing the data to a small number of important dimensions.
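
To make the distinction concrete, here is a minimal sketch, assuming scikit-learn and its bundled iris dataset (an illustrative choice of mine, not tied to any platform discussed here): the same data handled with supervision through labels, and without labels through clustering and dimensionality reduction.

```python
# Minimal sketch of supervised vs unsupervised learning, assuming scikit-learn;
# the iris dataset and the specific estimators are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Supervised: the labels y are given, and we learn to predict them (classification).
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted label of the first flower:", clf.predict(X[:1]))

# Unsupervised: no labels, so we look for structure instead.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)  # grouping into clusters
X_2d = PCA(n_components=2).fit_transform(X)                                # reducing to 2 important dimensions
print("cluster of the first flower:", clusters[0])
print("the first flower in 2 dimensions:", X_2d[0])
```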
 
Although the unsupervised learning story sounds more fantastic, the majority of practical problems, including those supported by the H2O.ai software, come with supervision, meaning there is a ‘right or wrong’ answer at the end, a reminder … not to do your own thing in your fiefdom. Many of you will ask about the importance of math and calculations. If all of us were programmers, we would have to reinvent many, many wheels from scratch. With visual development tools, the mathematical party happens at an independent sub-level of the platform, as an inner mechanism, while the interfaces treat math as a concept during the model selection phase; even so, we need the experience to know where the sub-symbolic data processing ends and where our own mathematical judgment has to come in. That is exactly the experience of the H2O.ai solution as well!
 
Bayes’ rule, which covers a great many cases and builds on what came before, determines how strongly we should believe an event, a mathematical event or a law, by relating a before and an after: the before of a-priori odds and the after of a-posteriori odds! A-priori odds are the odds of something happening against it not happening. When new information becomes available, we update the a-priori odds into a-posteriori odds through two calculations. First, the Likelihood Ratio is the probability of the observation in case the event holds, divided by the probability of the observation in case it does not. Second, multiplying this Likelihood Ratio by the a-priori odds gives the a-posteriori odds. The importance and necessity of all this, which may seem very formalistic, lies in the need to calculate how likely it is that a phenomenon, an event or a mathematical law “actually” holds… To the point: decide on an algebra of events…!
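
For those who like to see the arithmetic, here is a minimal sketch of the odds form of Bayes’ rule described above; the numbers are invented purely for illustration.

```python
# Odds form of Bayes' rule: posterior odds = likelihood ratio * prior odds.
# All numbers below are made up purely for illustration.

def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Update prior (a-priori) odds with new evidence via the likelihood ratio."""
    return likelihood_ratio * prior_odds

def odds_to_probability(odds: float) -> float:
    """Convert odds of x:1 into a probability."""
    return odds / (1.0 + odds)

# Hypothetical event with prior odds of 1:9 (probability 0.1).
prior = 1 / 9

# The new observation is 8 times more probable if the event is true than if it is not.
lr = 8.0

post = posterior_odds(prior, lr)
print(f"posterior odds       : {post:.3f}")                        # about 0.889, i.e. roughly 8:9
print(f"posterior probability: {odds_to_probability(post):.3f}")   # about 0.471
```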
 
One way or another, you can grasp the above by picking up methods and functions. From a mathematical point of view it all sounds fine. But when the underlying field is AI, this is where big data starts to play a role in the game. General AI, which remains a field of science fiction, tempts our subconscious with belief in a holistic, generalized version of data, the idea that big data is everywhere. In practical AI, because we need training data and that data must be of sufficient quality to work well, it has to be sharply focused and targeted, and it can often be sourced from public datasets. From the very moment we browse catalogues of such sets, we understand how small or big we can be. Again, training data are focused.
 
For example, “sentiment analysis in Tweets”, “identifying influential bloggers”, “a tailor-made targeted prediction” (you have to specify where), or “a clicks prediction” for a specific area of a website. Without such targeted research we would endlessly attribute an idealistic dimension to a science that is not what modern AI is about. It means we eventually bypass the grand questions on intelligence and focus our efforts on picking the (big, for our sake) ‘right problems’… For example, we should have millions of Tweets before we predict sentiment, and if we use public datasets, we enter the arena of big data from day one. Then you could feed your own Tweet into the machine and get an emotion back…!
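
As a toy illustration of that supervised setup, here is a minimal sketch assuming scikit-learn; the handful of hand-labelled “Tweets” are invented, whereas a real project would use a public dataset with millions of labelled examples.

```python
# Toy sketch of supervised sentiment classification on Tweets, assuming scikit-learn;
# the tiny hand-labelled examples below are invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "I love this new phone, it is amazing",
    "What a wonderful day, feeling great",
    "This update is terrible and slow",
    "I hate waiting in traffic, awful morning",
]
labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words features feeding a logistic regression classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(tweets, labels)

# Put your own "Tweet" in the machine and come up with an emotion.
print(model.predict(["such a great and wonderful idea"]))  # likely 'positive'
print(model.predict(["this is awful and terrible"]))        # likely 'negative'
```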
 
Furthermore, at the University of Helsinki, as well as from general mathematical experience, we got entangled with linear and logistic regression. There we learned, for example, that the activation function in logistic regression is the sigmoid. And that matters, because when you bring math and science into business, Silicon Valley technology and AI, you have to know for sure the equations are the same and not something else. The H2O.ai platform, meanwhile, uses the ‘Generalized Linear Model’ (GLM). GLMs estimate regression models for outcomes following distributions from the exponential family: in addition to the Gaussian distribution, these include the Poisson, Binomial and Gamma distributions and many more in the GLM suite of distributions. So now you see that math at university and math in technology can look different…!
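
To connect the two views, here is a minimal sketch assuming H2O’s Python API (the h2o package and H2OGeneralizedLinearEstimator) and a tiny made-up frame: with family="binomial" the GLM is precisely logistic regression with a sigmoid (logit) link, while the other families cover the rest of the suite.

```python
# Minimal sketch: a binomial GLM (logistic regression) with H2O's Python API.
# Assumes the h2o package is installed and a local H2O cluster can be started;
# the tiny frame below is made up purely for illustration.
import h2o
from h2o.estimators import H2OGeneralizedLinearEstimator

h2o.init()

# One numeric feature and a binary target.
frame = h2o.H2OFrame({
    "x": [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 2.0, 2.3],
    "y": ["no", "no", "no", "no", "yes", "yes", "yes", "yes"],
})
frame["y"] = frame["y"].asfactor()  # mark the target as categorical

# family="binomial" gives logistic regression; gaussian, poisson, gamma and
# others cover the rest of the GLM suite of distributions mentioned above.
glm = H2OGeneralizedLinearEstimator(family="binomial")
glm.train(x=["x"], y="y", training_frame=frame)

print(glm.coef())  # fitted coefficients on the log-odds (logit) scale
```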
 
So far I have not researched whether such a catalogue of methods is freely open for end users to pick from, because there must be a rationale for choosing one function over another. Or perhaps, by doing the data work rather than the model work, we take delight in comparing and contrasting data across many mathematical behaviours, and maybe that is how AI should work in the end! That depends on what is technically allowed, of course. When working with data there are two other concerns: Overfitting and Data Quality.
 
Overfitting occurs when we try to be too clever about fitting the data perfectly: very flexible methods adapt to almost any pattern in the data, including patterns that are really just noise, and avoiding this is practically impossible unless we bring in enormous amounts of data, such as the millions of Tweets in the example above. It applies to supervised learning: take the data, label them, and train an AI method to automatically recognize the correct labels. Or, as we’ll discuss in the future, try to somehow separate training data from test data and compare the predictions to the actual outputs.
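
Here is a minimal sketch of that train/test idea, assuming scikit-learn and its bundled breast_cancer dataset; the model choices are illustrative, not a recommendation.

```python
# Minimal sketch of spotting overfitting with a train/test split, assuming scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A very flexible model (unrestricted tree depth) can memorize the training set.
flexible = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# A constrained model (shallow tree) is forced to capture the broader structure.
constrained = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [("flexible", flexible), ("constrained", constrained)]:
    print(name,
          "train accuracy:", round(model.score(X_train, y_train), 3),
          "test accuracy:", round(model.score(X_test, y_test), 3))
# A large gap between training and test accuracy is the signature of overfitting.
```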
 
Second, data quality matters. To build a model that generalizes well to data outside the training set, the training data need to contain enough information relevant to the problem at hand, and there is “no single best method for all problems”. In any case, by getting to know the science, we will later know where to expand, where to override and where to alternate. Have a great time folks!