Most big data discussions have focused on the technology or on the same few, often-repeated business discoveries used as examples of big data’s power. Many companies are still in the experimental stages of big data and are asking for guidance on what the benefits would be, how they can realign themselves to take advantage of them, and what new processes might help them succeed with these powerful new capabilities.
One company in the social media space advanced quickly in all of these areas and discussed their key learnings at a recent TDWI breakout session. At this standing-room-only talk, the company explained how they were analyzing vast amounts of web-log data and learning a tremendous amount about their customers, the relationships between those customers, and how they could improve their success rate with customers by improving their algorithms and their website. This highly rated breakout session had two key takeaways:
- First, this company made their data warehousing Software Development Lifecycle (SDLC) a two-stage process. In the first stage, a tiny IT team did quick-and-dirty processing of JSON files and turned them over to sophisticated data analysts in a query-able form, so the analysts could determine whether the data held analytic interest. If it did not, the data was thrown away. But if the analysts found potentially interesting information, they fed requirements to the second-stage IT data warehousing team, who cleansed and conformed this data and set up ongoing loads into a properly modeled data warehouse. This two-stage approach let the business get involved early in the lifecycle and give feedback within hours of the data becoming available, while the second stage kept the data warehouse architecture well designed and maintainable.
- Second, when an attendee asked, “Did you need to hire any new skills to accomplish what you just described?” the presenter answered “No,” and you could sense the shock and excitement in the room. To elaborate, the presenter stated that they were using Informatica HParser to parse JSON files into query-able data objects. It was so simple that their already-trained Informatica developers became proficient with it within a day and were able to turn around this big data with the people and skills they already had. This caused quite a buzz in the room, because one of the biggest concerns in big data is the lack of appropriate skills in the industry.
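To make the first stage concrete, here is a minimal sketch of what "quick-and-dirty processing of JSON files into a query-able form" can look like. It is not HParser and not the presenter's actual pipeline; it uses only the Python standard library as a stand-in, and the field names (`user`, `page`, `ms`), the `weblog` table name and the sample records are all made up for illustration.

```python
import json
import sqlite3


def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining key names with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            # Keep lists as JSON text; a real pipeline might split them out.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def load_json_lines(lines, table="weblog"):
    """Load JSON-lines records into an in-memory SQLite table for ad hoc queries."""
    rows = [flatten(json.loads(line)) for line in lines if line.strip()]
    if not rows:
        raise ValueError("no records to load")
    # The column set is the union of all keys seen; missing values become NULL.
    columns = sorted({col for row in rows for col in row})
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )
    return conn


# Hypothetical web-log records, one JSON object per line.
SAMPLE_LINES = [
    '{"user": {"id": 1, "country": "US"}, "page": "/home", "ms": 120}',
    '{"user": {"id": 2, "country": "DE"}, "page": "/search", "ms": 340}',
    '{"user": {"id": 1, "country": "US"}, "page": "/search", "ms": 95}',
]

conn = load_json_lines(SAMPLE_LINES)
# An analyst can now explore the data with plain SQL.
hits = conn.execute(
    "SELECT page, COUNT(*) FROM weblog GROUP BY page ORDER BY page"
).fetchall()
print(hits)
```

Nothing here is cleansed, conformed or modeled, which is the point of the first stage: the data only has to be good enough for analysts to decide whether it deserves the second stage.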
So big data can be explored today with the skills you already have on hand. Get the business involved as soon as possible, and use technology like HParser so that the people you already have can get this data to the business much more quickly.