I’m hoping that you are not familiar with the U.S. food stamp program. This is a government-funded program that provides people who are living below the poverty line with money that can only be spent on food. Clearly it’s a critical program: the people who are enrolled in it desperately need it, and it demonstrates just how important information technology can be. That’s why it’s unacceptable when the IT systems that support the program stop working. Clearly the IT managers who are in charge of this project are the ones to blame…
What Went Wrong With The Food Stamp System
The core of the problem lies with Xerox, the company responsible for providing the back-office IT systems that run the U.S. government’s food stamp program. The Electronic Benefits Transfer (EBT) system allows recipients of government food stamps to purchase goods using a digital card with a set spending limit. The other day, a power outage during a routine maintenance test caused a temporary glitch in the food stamp program.
One of the results of this glitch was that shoppers were able to sweep through the aisles at stores and buy as much as they could carry because their preset spending limit had been removed. This caused a great deal of concern at Walmart stores when shoppers started to show up at the checkout with fully loaded carts.
However, another side effect of the glitch was that other food stamp shoppers were unable to purchase any food. The glitch caused food stamp recipients in 17 states to lose access for much of a Saturday to the electronic system used by stores to verify their benefits. This left many unable to buy any groceries.
What Should Have Been Done
Clearly this situation should never have been allowed to happen. The Xerox team that designed the food stamp system did not do the required amount of testing. It appears as though they got themselves caught in the IT equivalent of a perfect storm: during a routine test of a backup system, a power glitch hit and placed the system into a previously unknown state.
The reason that I’m holding Xerox and its IT managers responsible for this is that we all know that events like this can happen. No, we can’t predict exactly what they’ll look like, but we can almost certainly predict that they’ll happen. That’s why it’s the IT manager’s responsibility to make sure that the IT systems they are in charge of can deal with unplanned circumstances.
There were two problems associated with this outage: the granting of unlimited spending to food stamp program participants and the inability of people to access the system. The removal of spending limits is a simple programming bug, and effective code reviews would have detected it long ago. Far less acceptable is the extended outage that a brief power interruption caused. This is a fundamental system design problem that should never have occurred. Xerox needs to go back and fix things. Improving its code review procedures would be a good start, but redesigning the food stamp system to improve its reliability is a must.
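To make the spending limit point concrete, here is a minimal sketch of the kind of check a code review should insist on. It is not Xerox’s code, and how the real EBT system handles balances has not been made public; the function name, the Decision type, and the idea that a failed balance lookup shows up as None are all assumptions made for illustration. The point is that when the balance cannot be read, the purchase is declined instead of being treated as unlimited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    """Result of a purchase check (hypothetical type for this sketch)."""
    approved: bool
    reason: str


def authorize_purchase(amount_cents: int, balance_cents: Optional[int]) -> Decision:
    """Approve a purchase only when the card balance is known and sufficient.

    balance_cents is None when the balance lookup failed (an assumption
    about how an outage would surface in this sketch).
    """
    if amount_cents <= 0:
        return Decision(False, "invalid amount")
    if balance_cents is None:
        # Fail closed: an unknown balance must never mean "no limit".
        return Decision(False, "balance unavailable")
    if amount_cents > balance_cents:
        return Decision(False, "insufficient balance")
    return Decision(True, "approved")


if __name__ == "__main__":
    print(authorize_purchase(500, 1000))   # balance known and sufficient
    print(authorize_purchase(1500, 1000))  # over the limit
    print(authorize_purchase(500, None))   # lookup failed during an outage
```

Note that a check like this only addresses the unlimited spending bug. On its own it would still leave people unable to buy groceries during an outage, which is why the reliability redesign matters more.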
What All Of This Means For You
The U.S. food stamp program is a critical system that allows people who could not otherwise afford food to buy it. This makes it a mission-critical system that always has to be there to support people who really can’t speak for themselves. However, the system recently experienced an outage that prevented people from purchasing food for a period of time.
The outage is reported to have been caused by a routine test of the system’s backup capabilities. As IT professionals, we can all understand how this type of testing can set off a ripple effect that shuts a system down. However, when a system is mission-critical, its design has to take events like this into account and needs to have ways to prevent them from impacting the vulnerable end users. Clearly this was not the case.
The IT manager in charge of the program has some answering to do. As the IT manager, it is their responsibility to evaluate the level of risk associated with all of their systems, and clearly this has not been done for the food stamp application. Let us hope that they now realize the importance of this system and that design changes will be made to prevent an outage like this from ever happening again.
– Dr. Jim Anderson
Blue Elephant Consulting –
Your Source For Real World IT Management Skills™
Question For You: What do you think Xerox’s first step should be to prevent this from happening again?
What We’ll Be Talking About Next Time
There is one universal rule to being a successful IT manager: your team has to be working on the right things if you ever want to be successful. This is one of the IT manager skills that we all need to have. You would think that this would be pretty elementary: of course you’re working on the right things. However, you’d be surprised at just how often this turns out not to be the case. There are a lot of reasons why you might find yourself in this situation; however, let’s spend some time talking about how you can avoid this problem altogether.