Hello folks, creatives and science fans! Today I would like to open a discussion about what it means to create AI science solutions built on the tools and platforms freely offered to us: how it all begins and what it takes to grow a solution to the next level. To do that, I will draw on my experience studying AI at the University of Helsinki, as well as on the platforms I met through my IBM AI classes, including IBM's own! My own example can serve as a guide to where to begin and where to stop with different tools and science ideas.
A scientific AI solution begins with a first concept, meaning that in the first steps the AI idea has to be explained in a scientific paper that is subject to evaluation. So far so good, you may think. Scientific papers are protected by copyright, which can be registered with the US Copyright Office, and they establish the intellectual property of the owner. The same goes for an AI solution. Like all other software, an AI solution can be protected as a literary work, again through the US Copyright Office, but that protection sits closer to the final algorithm than to the initial concept.
Hearing the above, you may start to fear exposing your paper. How am I protected, you may ask? Let me begin with the example of Excel. Dozens of books about Excel are on the market, but none of them gives you the slightest detail of how these cell-based databases are built or how different sheets are linked together, forming what I like to call multi-generational data, written in default code. If you try to build multiple sheets of complex calculations, ten pages are not enough to write down the model... You read about the end-user utilities, widgets and multiple applications, but that is not the subject of protection. What is protected is the unification of the end product with its inner mechanism. All of the above means that it is our own responsibility to find the common denominator of discussion in papers, e.g. pure science. Sooner or later I will face an analogous case when I submit my idea to the University of Helsinki, so in this post I would like to share the rationale of an AI idea as framed by the standards of my school.
The topics we’ll ask you to elaborate are:
Your idea in a nutshell: Name your project and prepare to describe it briefly.
Background: What is the problem your idea will solve? How common or frequent is this problem? What is your personal motivation? Why is this topic important or interesting?
Data and AI techniques: What data sources does your project depend on? Almost all AI solutions depend on some data. The availability and quality of the data are essential. Which AI techniques do you think will be helpful? Depending on whether you've been doing the programming exercises or not, you may choose to include a concrete demo implemented by coding, using some actual data!
How is it used: What is the context in which your solution is used, and by whom? Who are the people affected by it? It’s important to appreciate the viewpoints of all those affected.
Challenges: What does your project not solve? It’s important to understand that any technological solution will have its limitations.
What next: How could your project grow and become something even more?
Acknowledgments: If you’re using open source code or documents in your project, make sure you give credit to the creators. Mention your sources of inspiration, too.
Be aware, folks, before we proceed with further analysis: programmers who write default code by hand may have to reinvent the wheel many more times than analysts who use tailor-made visual development platforms without writing a single line of code. You can see this when you compare people who program websites and cloud apps by hand with those who use the backend of WordPress or Google's Blogspot. The backend is a serious consideration when it comes to what a developer will deliver to you…
OK, so now we say: I have to develop an idea, respect copyrights and explain what my solution does. Reality, though, is much more advanced once you build this idea on visual development platforms such as IBM's or H2O.ai, or get to know what technology already exists out there. I have already started using the latter and it's all coming back to me now. What happens, exactly? For example, AI computations with basic functions rely heavily on input and output variables, as in all math, but it is the developer's mindset to structure them on multiple layers if he ever wishes to deliver them to end users. Dealing with visual development from day one, we know for example that H2O.ai accepts as inputs e.g. transformed Word files and transformed Excel files, or any combination of those in clusters.
So now you start saying: OK, I have the input data, I already have an exotic material structure in my hands, so this could possibly mean that anyone can become a high-profile funded business by dragging and dropping 20 widgets in a matter of weeks. But AI solutions, those of H2O.ai included, depend on training data. In my case, that data is scripts or numbers organized as files. H2O.ai offers a catalogue of algorithms that scientists know as supervised learning algorithms and unsupervised learning algorithms, as well as miscellaneous algorithms such as TF-IDF (Term Frequency-Inverse Document Frequency) and Word2vec. TF-IDF is taught at the University of Helsinki as well, and both of these miscellaneous algorithms can transform scripts into numeric vector representations, which form the core and the fundamentals of computational linguistics. Furthermore, if we could study the TF-IDF functions and embed number files to cover the vocabulary gaps we will confront, then, you may think, we would have reliable ranges that could be flexible for many patterns of data.
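To make the idea of turning scripts into numbers a bit more concrete, here is a minimal sketch of TF-IDF in plain Python. It is only an illustration under my own assumptions: the function name tf_idf, the three toy sentences, and the particular formula (term count divided by document length, times the natural log of total documents over documents containing the term) are mine, and platforms such as H2O.ai may use a different variant of the formula.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant (hypothetical, for illustration only):
    tf = term count / tokens in the document,
    idf = ln(number of documents / documents containing the term).
    """
    # Very naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy "scripts" invented for this example.
docs = ["the cat sat", "the dog barked", "the cat purred"]
vectors = tf_idf(docs)

for doc, vector in zip(docs, vectors):
    print(doc, "->", vector)
```

A word like "the" that appears in every script gets a weight of zero, while a word that appears in only one script, like "sat", gets the highest weight. That is the sense in which the numbers capture what makes each script distinctive.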
Getting to the other side of mere drag-and-drop of widgets means exactly that: to be productionized and marketed, your solutions have to fit your customers' pattern of data. Let me explain how this works, which is equivalent to the cloud structures of Blogspot and WordPress when developing a website. Google, for example, offers you the cloud environment and the cloud engine to create your web presence. I have created an e-Disneyland model. Google just offers the tools and functions to do it on your own; it is not responsible for your website's final design, subject matter, or industry verticals if it is developed as a business. The content, daily work and future growth opportunities are completely our responsibility. The same goes for open-source AI environments.
Scientists can reach higher or lower than big data, I say again, big data, depending on how flexible they want their solution to be…! An AI product succeeds when the businesses, institutions and people it addresses find it flexible and adjustable enough to analyze their data. This is not offered by the drag and drop of the mother company. Unless AI scientists create a combination of structures, in my case on H2O.ai, the platform gives you no clue of how well it fits. You will drag and drop a mathematical widget, but it's the prediction accuracy that makes it relevant, and that is a matter of data quality…! Transforming between scripts and numbers, if we take into consideration that the quality of script samples is limited and that math affects everything nowadays, may also raise questions about the best combination in terms of big data or smaller data optimization. Possibly we now have a few concepts for solutions, so are we done? Of course not. Bearing in mind that we have to rely heavily on the data work and not the model work, additional structures like those described in the project statement of the University of Helsinki exist and have to be answered by all of us as well.
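The point that prediction accuracy is a matter of data quality can be shown with a tiny toy experiment. This is a hedged sketch, not anything taken from H2O.ai or IBM: the nearest-neighbor model, the function names and all the numbers are made up for illustration. The same model is trained once on clean labels and once on a copy where two labels were flipped by mistake.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training point closest to x (1-nearest-neighbor)."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def accuracy(train, test_points):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        1 for x, label in test_points
        if nearest_neighbor_predict(train, x) == label
    )
    return correct / len(test_points)


# Hypothetical toy data: (value, label). Low values are class 0, high values class 1.
clean_train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 1)]
# Same points, but the labels of 2 and 8 were recorded wrongly.
noisy_train = [(1, 0), (2, 1), (3, 0), (7, 1), (8, 0), (9, 1)]
test_points = [(1.2, 0), (2.1, 0), (7.1, 1), (8.2, 1)]

clean_accuracy = accuracy(clean_train, test_points)
noisy_accuracy = accuracy(noisy_train, test_points)

print("accuracy with clean labels:", clean_accuracy)
print("accuracy with noisy labels:", noisy_accuracy)
```

The model, the widget if you like, is identical in both runs. Only the quality of the training data changes, and with it the accuracy drops from 1.0 to 0.5 on this toy example.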
Those of you who rely on visual development may find common issues with my case, speaking of IBM's AI potential. IBM offers dozens of AI solutions that could possibly be embedded in a cloud environment known as Start Up with IBM. Visual development nowadays overtakes many developers who try to reinvent the wheel. That's because there are hybrid solutions that combine visual development with default code, and if you still crave the flexibility of manual code, why not combine it with products of Silicon Valley potential?! Start Up with IBM is available on the website. Many of the embedded products can be studied independently. A scientist who is starting out can ask, for example: if there is a chance to download one product and not get tangled in the mess and whirling vortex of the entire IBM, why not focus on the solutions that can be used in later stages as well? Even IBM's quantum computing applications, folks, can be embedded in your Start Up. It all depends on your research and what tools you'd like to activate.
All along, the purpose of this discussion was to shed light on the AI opportunities of the 21st century (we don't deal with hardware robotics in this article), so that you know there are ready-made products that combine science with pre-existing technology. These are impressive applications that will embrace your dreams without your having to reinvent the wheel from day one. Imagine an entire NASA in your room alone, and know where to begin and where to stop when it comes to science that can't function properly unless it is supported by technological magic. Have a great time, folks!