I was at an IT conference a few years ago. The speaker was talking about application testing. At the beginning of his talk, he asked the audience:
“Please raise your hand if you flew here from out of town.”
Most of the audience raised their hands. The speaker then said:
“OK, now if you knew that the airplane you flew on had been tested the same way your company tests its applications, would you still have flown on that plane?”
After some uneasy chuckling, every hand went down. Not a great affirmation of the state of application testing in most IT shops.
Thank goodness airplanes, and in fact virtually all the systems and infrastructure we use in the physical world, are not tested the way we test our applications. And yet we depend on those applications in much the same way as we depend on machines and infrastructure in the physical world.
If we look at any physical system we depend on, such as our water supply, the electrical grid, the phone system, subways and traffic lights, we see that all of them rely on the same two things to manage their operations:
- Extensive testing before any change to the system, to make sure the change has no negative impact
- Ongoing monitoring of the system, to catch issues before they become problems
We’ve all seen pictures of control rooms like the one below with people monitoring activities and catching issues before they become costly problems.
But problems still do happen. For example, in 2003 a software bug contributed to a widespread power outage across the northeastern and north-central United States and parts of Canada, affecting more than 40 million people.
Imagine what would happen if these monitoring systems didn’t exist. And yet, when it comes to critical IT applications, we rarely see the same diligence applied to ensuring that the applications are working as expected.
Most companies have system or network monitoring tools to ensure that their hardware, operating systems, databases, applications, etc. are up and running. But few companies, if any, have automation in place that focuses on the completeness and accuracy of the data those systems process and produce.
Just because an application is running doesn’t mean it’s running correctly. And if the data an application produces is incorrect, whether it appears in a report, a data warehouse extract, or a business- or customer-facing application, IT will hear about it: sooner, hopefully, or, much worse, later.
Why? Because the later incorrect data is noticed, the more harm it can do and the more expensive that harm is to repair.
So let’s take a lesson from the physical world:
- First, let’s put sufficient testing procedures in place so that, at a minimum, we are truly confident in the robustness, scalability, etc. of the applications we put into production.
- Second, let’s put monitoring in place that catches data issues in production applications as soon as they occur, and well before they metastasize into something that materially affects the business.
This is not only a matter of the company’s trust in its data; it is also a matter of the company’s trust in the IT staff whose job is to deliver complete and correct data to the business.
In subsequent posts, I will drill down into these topics and discuss strategies and actions for ensuring the completeness and accuracy of data in enterprise applications.