The Magic of Artificial Intelligence
“Every great magic trick consists of three acts. The first part is called ‘The Pledge’. The magician shows you something ordinary: a deck of cards, a bird or a man. He shows you an object. The second act is called ‘The Turn’. The magician takes the ordinary something and makes it do something extraordinary. Now you’re looking for the secret… but you won’t find it. You wouldn’t clap yet. Because making something disappear isn’t enough; you have to bring it back. That’s why every magic trick has a third act, the hardest part, the part we call ‘The Prestige’.”
This is the opening of one of my favorite Christopher Nolan movies, “The Prestige”. The quote appeals to the logical person in me because it offers a simple yet accurate recipe for the awe we feel when we see magic, so I thought I could apply it to Artificial Intelligence.
Here is the object: a typical weblog file. It is generated by the browser and contains information about user activity on a given website. It is hard to read, and it takes manual labor and developer skill to turn it into a format that can drive real value for the business.
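To give a sense of that manual work, here is a minimal sketch of hand-parsing a single line into a table row. It assumes the file follows the Apache-style Common Log Format, which this post does not actually specify; the sample line, the pattern and the column names are illustrative only.

```python
import re

# Assumption: Apache-style Common Log Format. The pattern and the
# sample line below are illustrative, not taken from a real file.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return one table row (a dict) for a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    # A "-" in the size column means no body was returned.
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    return row


sample = '203.0.113.7 - alice [10/Oct/2016:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
row = parse_line(sample)
```

A hand-written pattern like this only works while every line keeps exactly this layout. Any line that deviates comes back as None and is lost unless a developer updates the pattern.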
With machine learning, a business user or an analyst can transform the unreadable format into a simple, familiar table format.
But we all know that there is not just one file, and that it doesn’t always have the exact same format. In Data Management terms, we call this “data drifting”, a term commonly used to describe fluctuation in the format, the pace, and the content of data in these new data types. Many variables affect the content of the data: the machine, the OS version, the date, the geographic location, the browser used and many more. This poses a serious challenge to organizations that are trying to collect and make sense of the new data. In a 2016 survey, 25 percent of respondents said they discard data when seeking analytic insights because they cannot scale to process all the data being collected. The CLAIRE™ engine from Informatica provides artificial intelligence that can dynamically and automatically transform the file into the familiar table format.
Math is not Magic
Unlike the magician in the quote from “The Prestige”, CLAIRE uses mathematical algorithms, not magic (as far as you know!), to earn its Prestige.
The approach is simple. If the data is a machine-generated file, then a machine should be able to “learn” it and recognize the repeatable patterns in it. For this purpose, CLAIRE uses a mathematical methodology named “Genetic Programming”, a family of algorithms built on the concept of “evolution”. The machine looks for repeatable patterns inside the file and uses them to construct a candidate structure for the file format. The structure is then scored on several factors, such as input coverage or derived domains. Next comes a “mutation” phase, in which several changes are made to the structure, for example combining substructures, to see whether the score improves. This is the evolution phase, and the process terminates once the structure fits the data well enough. The process neither requires user input to define the structure of the file nor is it specific to a set of industry file formats.
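The score, mutate and terminate loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not CLAIRE's actual implementation: a candidate "structure" is reduced to a set of delimiter characters, the score uses input coverage only, mutation toggles a single delimiter, and every name here (DELIMITERS, LINES, fitness, mutate, evolve) is hypothetical. Strictly speaking it is closer to a plain genetic algorithm than to full genetic programming, which evolves richer program-like structures.

```python
import random
from collections import Counter

# Hypothetical delimiter alphabet; a real engine would search a far richer space.
DELIMITERS = " |,;:\t"

# Made-up sample lines: pipe-delimited, with free text in the last field.
LINES = [
    "10.0.0.1|GET|/index.html|200|Mozilla Firefox",
    "10.0.0.2|POST|/login|302|Chrome",
    "10.0.0.3|GET|/img/logo.png|200|Safari on iOS",
    "10.0.0.4|GET|/about|404|Edge",
]


def split_fields(line, structure):
    """Split a line on every delimiter character in the candidate structure."""
    fields, current = [], []
    for ch in line:
        if ch in structure:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def fitness(structure, lines):
    """Input coverage: the share of lines that yield the most common field count."""
    if not structure:
        return 0.0
    counts = Counter(len(split_fields(line, structure)) for line in lines)
    width, hits = counts.most_common(1)[0]
    if width < 2:  # a structure that never splits anything explains nothing
        return 0.0
    return hits / len(lines)


def mutate(structure, rng):
    """Toggle one randomly chosen delimiter in or out of the structure."""
    return structure ^ frozenset(rng.choice(DELIMITERS))


def evolve(lines, generations=30, population_size=8, target=1.0, seed=0):
    """Evolve delimiter sets until one fits the data or generations run out."""
    rng = random.Random(seed)
    population = [frozenset(rng.choice(DELIMITERS)) for _ in range(population_size)]
    best = max(population, key=lambda s: fitness(s, lines))
    for _ in range(generations):
        if fitness(best, lines) >= target:
            break  # the structure fits the data well enough
        # Keep the better half unchanged and refill with mutated copies of it.
        ranked = sorted(population, key=lambda s: fitness(s, lines), reverse=True)
        survivors = ranked[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
        best = max(population, key=lambda s: fitness(s, lines))
    return best, fitness(best, lines)


best_structure, best_score = evolve(LINES)
```

On this sample, splitting on the pipe alone covers every line, while splitting on both pipe and space covers only half of them because the free-text field contains a varying number of spaces. That difference in score is what steers the search toward the better structure without anyone declaring the format up front.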
Learn more about genetic algorithms here.