Gonzo QA IV: Mistakes, I’ve made a few

Dateline: London, September 2007

My biggest mistake was to delete the UK master invoice file of the major chemical company I worked for at the time, shortly before going home for the evening. I had been promoted to Database Administrator (DBA) a few weeks earlier and I was carrying out routine house-keeping activities – or so I thought. It turned out that my recently-departed predecessor had not been naming database objects logically, had not been carrying out routine house-keeping activities and, further, that the database management system (DBMS) was quite capable of deleting files in use without warning or protest. The result of all this was that I went home unaware. Worse, the overnight batch job, which normally wrote the day's invoices to the master file and then deleted itself, wrote the day's invoices to null and then deleted itself.

When I came in the next day, Accounts Receivable staff had just been told that as well as inputting the day's invoices, they would have to re-input the previous day's too, essentially doing two days' work in one, and nobody was to go home until it was done. They gave me the cold shoulder, the Finance Director gave me an earful which included the full cost of my error rounded to the nearest five thousand pounds, and the IT Director sent two of his people down to give me a kicking on his behalf. The Senior Systems Programmer beat me up himself; he always was a hands-on kind of guy.

The previous night's dump had been restored in my absence, but it turned out that transaction logging had never been enabled, so rolling forward to a few minutes before I had accidentally deleted the file was not possible. At the time I did not know that you could run a DBMS without transaction logging enabled. My response of "how about that?" was not appreciated by the sysadmin staff at all.

This incident was my first true insight into the importance of recognising, assessing and managing risk, whether or not it is part of your job description.

I failed to recognise the risks I was accepting during the hand-over from my predecessor. I had assumed that he had been approaching his work logically and by the book. He had not. He had an idiosyncratic approach which worked for him but, rather spectacularly, did not work for me. I had assumed that the DBMS contained a series of checks and balances which would aid me in my work. It did not. It did what it was told, immediately and without question. I had assumed that transaction logging was turned on. It was not. An investigation the previous year had concluded that transaction logging required too many valuable CPU cycles and too much valuable disk space, i.e. it was too costly to implement when compared with the potential return.
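For anyone who has never worked without transaction logging, the principle at stake is simple: a nightly dump only gets you back to last night, while a transaction log lets you replay the day's changes up to a chosen moment. The DBMS in this story is long gone and I won't pretend to remember its commands, so the following is only a toy sketch of the idea, with made-up names and figures:

```python
# Toy model of dump-plus-roll-forward recovery. Nothing here reflects a real
# product; the names and amounts are invented purely to show the principle.

def restore_dump(dump):
    """A nightly dump can only give you last night's state back."""
    return dict(dump)

def roll_forward(state, transaction_log, stop_at):
    """With a transaction log, replay changes up to just before the mistake."""
    for hour, key, value in transaction_log:
        if hour >= stop_at:
            break  # stop a few minutes before the accidental delete
        state[key] = value
    return state

last_nights_dump = {"invoice_1001": 4000}
todays_log = [  # (hour of day, invoice, amount) -- purely illustrative
    (10, "invoice_1002", 7500),
    (16, "invoice_1003", 2250),
]

# Logging enabled: recover everything written before the late-afternoon mistake.
recovered = roll_forward(restore_dump(last_nights_dump), todays_log, stop_at=17)
assert len(recovered) == 3  # all three invoices survive

# Logging disabled (my situation): restore_dump() is the best you can do,
# and the whole day's work is simply gone.
```

Without the log, the second half of that sketch is all you have; that is exactly the position Accounts Receivable and I found ourselves in.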

Having made a whole series of false assumptions, I had failed to identify and assess the risks inherent in my new job. What was I required to do that might be risky? How was I going to do it in a way more likely to succeed than fail? What would be the impact of failure? What would I do if things failed? What could others do if things failed in my absence? What alerts would signal failure and who would receive them?

Having failed to recognise and assess the risks, I could hardly have been managing them in any meaningful fashion. All of which changed after this incident, of course. Everyone involved now recognised that a risk existed. Senior management recognised that they might have been complacent in vetoing the cost of transaction logging: the revenue lost through my error was approximately twice the cost of installing and running transaction logging for a year. I was encouraged to buy and build a suite of tools to minimise the risk of human error when carrying out my tasks (something along the lines of the sketch below). I was also given permission to build a test installation; I no longer had to do everything straight onto the live system and could safely practise somewhere else first.
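The tools I actually built, and the DBMS they drove, are long dead, so purely as a hypothetical illustration, their flavour was roughly this: put a protected list, a backup and a confirmation prompt between a tired DBA and anything destructive.

```python
# Hypothetical sketch of the kind of guard-rail tool I mean -- not the suite
# I actually built. The idea is simply that the default action is the safe one.

import shutil
import sys
from pathlib import Path

PROTECTED = {"uk_master_invoices"}  # objects no housekeeping job may touch

def guarded_delete(path_str: str) -> None:
    path = Path(path_str)
    if path.stem in PROTECTED:
        sys.exit(f"Refusing: {path.name} is on the protected list.")
    if not path.exists():
        sys.exit(f"Refusing: {path} does not exist.")
    # Take a copy first, so there is always a way back.
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    # Make the operator type exactly what they are deleting.
    answer = input(f"Type the name '{path.name}' to confirm deletion: ")
    if answer != path.name:
        sys.exit("Name mismatch -- nothing deleted.")
    path.unlink()
    print(f"Deleted {path.name}; backup kept at {backup.name}.")

if __name__ == "__main__":
    guarded_delete(sys.argv[1])
```

None of it is clever, and that is the point: a protected list, a forced backup and a typed confirmation would each, on their own, have saved me that evening.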

Although this story is twenty years old, it is still relevant today. I work in a fast-paced, high-pressure environment and it is not unusual for a project to go through a complete change of personnel on both the agency and client side as it races from a bright idea to a shiny finished product. A new team member, especially one subbing for another, will typically assume that all is well with the project so far, and want to build their contribution on top of the sound foundations built by their predecessors. I encourage staff to recognise that this is not necessarily so and to assess the situation in as much detail as possible, given the circumstances. This then allows them to consider mitigating, eliminating or otherwise insuring against the major risks they have identified. Should the only reasonable course of action turn out to be to tolerate the risk, then at least this is done in an informed way rather than by default (which is what I did). I consider this to be part of on-going quality planning. As a software quality assurance professional, I have seen many deep issues identified during quality control that can be traced back to new project team members unknowingly building their structures on sand. What a waste.

For further articles on how I apply lessons learned from my mistakes in the past to my current position, please keep visiting.

P.S. Anyone who thinks that hand-over is only a risky business in the world of IT should speak to members of a 400-metre relay team.

First posted in: The Tester, September 2007, page 14.
