Ready for Artificial Intelligence? (Part 1)
Part 1 – I, Robot
Sometimes, when writing a Word document, I would spell a word so badly that the MS Office spell check would just give up on me. It would say, “Lady, I know all the words in the English dictionary, and I have no clue what you want to say.” I would then copy the same misspelled word into the Google search engine, and Google would kindly and gently say, “Did you mean this?” And I would say, “Yes! This is what I meant.”
This is the difference between the old world and the new world of Artificial Intelligence. Artificial Intelligence technology can deduce what the user needs and act on it as necessary.
Whether an AI technology succeeds and penetrates human and business processes depends largely on two dimensions:
- The human ability to accept the presence of AI
- The readiness of the technology
Let’s discuss the readiness of the technology first. This requires three conditions:
- Enough data exists on the topic that AI is solving
- Enough processing power exists to be able to process the data in a meaningful timeframe
- Enough mathematical and scientific algorithmic tools exist to analyze the data in interesting ways
My Position on the Issue of Artificial Intelligence Technology Readiness
My colleagues and I believe that AI technology has reached the point of maturity, where it is ready for use in data management. Let’s check it using the three criteria above:
Do We Have Enough Data?
We at Informatica collect metadata (data about the data – not the data itself) across a broad spectrum. This metadata is used as training sets to train the AI engine. The metadata collected includes:
- Technical metadata – such as database tables, column information and data profiling statistics.
- Business metadata – such as the meaning and business context around the data as well as who owns the data.
- Operational metadata – such as information about systems and process execution. Examples include when the data was last updated, when the load process was last run and when the data was most recently accessed.
- Usage metadata – such as metadata about user activity including data sets accessed, ratings, search results clicked on or comments provided.
All of this data is used to train the different algorithms and scientific tools to create a tool that can identify, recommend and automatically process steps for the user without the user being actively involved.
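To make the four categories concrete, here is a minimal sketch of how one metadata record might be modeled. It is an illustration only, not Informatica's actual schema: every class name, field name and the `average_rating` helper is a hypothetical chosen to mirror the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TechnicalMetadata:
    table: str
    column: str
    null_fraction: float  # hypothetical example of a data profiling statistic


@dataclass
class BusinessMetadata:
    meaning: str  # business context around the data
    owner: str    # who owns the data


@dataclass
class OperationalMetadata:
    last_updated: Optional[datetime] = None
    last_load_run: Optional[datetime] = None
    last_accessed: Optional[datetime] = None


@dataclass
class UsageMetadata:
    datasets_accessed: List[str] = field(default_factory=list)
    ratings: List[int] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


@dataclass
class MetadataRecord:
    """One record combining the four metadata categories (hypothetical shape)."""
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalMetadata = field(default_factory=OperationalMetadata)
    usage: UsageMetadata = field(default_factory=UsageMetadata)


def average_rating(record: MetadataRecord) -> Optional[float]:
    """Return the mean user rating, or None when nobody has rated the data."""
    ratings = record.usage.ratings
    return sum(ratings) / len(ratings) if ratings else None


record = MetadataRecord(
    technical=TechnicalMetadata(table="customers", column="email", null_fraction=0.02),
    business=BusinessMetadata(meaning="Primary contact email", owner="Sales Ops"),
)
record.usage.ratings.extend([4, 5])
print(average_rating(record))  # 4.5
```

Note that the record describes the data (where it lives, who owns it, how it is used) without holding any of the data itself.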
Informatica Intelligent Cloud Services (our iPaaS offering) serves approximately 7,000 separate organizations, each running an average of 500 mappings per day, and each of those mappings includes at least 20 data transformations. This adds up to a terabyte of metadata about how users manage their data processes. This is more than enough data to train our AI engine to provide meaningful recommendations and automation for our customers.
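The scale behind those figures is easy to check with a few lines of arithmetic. The sketch below uses only the three numbers quoted above. The function and constant names are hypothetical, and it does not try to derive the terabyte figure, since the size of each metadata record is not stated.

```python
ORGANIZATIONS = 7_000                 # approximate number of organizations
MAPPINGS_PER_ORG_PER_DAY = 500        # average mappings run per organization per day
MIN_TRANSFORMATIONS_PER_MAPPING = 20  # lower bound quoted above


def daily_volume(orgs: int, mappings_per_org: int, transformations_per_mapping: int):
    """Return (mappings per day, minimum transformations per day)."""
    mappings = orgs * mappings_per_org
    transformations = mappings * transformations_per_mapping
    return mappings, transformations


mappings, transformations = daily_volume(
    ORGANIZATIONS, MAPPINGS_PER_ORG_PER_DAY, MIN_TRANSFORMATIONS_PER_MAPPING
)
print(f"{mappings:,} mappings per day")                        # 3,500,000 mappings per day
print(f"{transformations:,} transformations per day at least")  # 70,000,000 transformations per day at least
```

In other words, the training set grows by roughly 3.5 million mapping runs, and at least 70 million transformation executions, every day.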
Do We Have Enough Processing Power?
Thanks to Moore’s Law, we have more than enough processing power available today. Many of the early AI startups back in the 1980s failed because they simply did not have enough horsepower to process all the data they collected. This has changed thanks to PC gamers, who have paid for the progress of Graphics Processing Unit (GPU) chips. While CPU power continues to advance, we are seeing even greater growth in the processing power of GPU, FPGA and ASIC chips for deep learning and mathematical computations.
Do We Have Enough Algorithmic Tools?
Informatica takes advantage of a variety of mathematical and algorithmic tools. The growth in data and in machine power has enabled the development and use of applied mathematics, and algorithms and other machine learning mechanisms (such as NLP and neural networks) have flourished. In recent years, these new techniques have achieved state-of-the-art results in many natural language tasks, for example language modeling and parsing [i], and they are available for commercial use.
So, Why Do You Need AI for Data Management?
Data is the foundation for digital transformation initiatives. But the challenges have never been greater for IT leaders who are looking to deliver the data the organization requires, in the timeframe and at the quality levels required. Clearly, a different approach is needed: the old tools will not be enough in the new world. You need tools that thrive on the benefits of these technology advancements.
An integrated, end-to-end data management platform like Informatica’s Intelligent Data Platform embeds machine learning and other AI technologies. It provides the foundation of technical, business, operational and usage metadata that AI and machine learning tools, such as our CLAIRE™ engine, can use to provide intelligent recommendations, suggestions and automation of data management tasks.
To find out more, download the Informatica CLAIRE™ white paper.
[i] Goldberg, Yoav (2016). A Primer on Neural Network Models for Natural Language Processing. Journal of Artificial Intelligence Research 57, 345–420.