The Need for Continuity through Software Disaster

The National Preparedness Commission have been working for several years to bring attention to the damage done by software failures, and to the preparation needed to reduce that damage. They call this work ‘resilience’, which includes the need for business continuity in the case of failure.

Recently I attended one of their events in central London. There, Ed Steinmueller presented nine challenging statements based on focus groups and round table discussions. Although some of these are captured in their ‘Elephant in the Room’ report, I was particularly impressed by his simple summary of the problem. The following section explores each statement in turn, looking at the evidence.


1. Information Technology (IT) is now a utility

We no longer regard computer systems as enhancements to our life and organisations; nowadays we simply expect them and the services they supply to be there, to deliver. Without IT, the services will not happen. In essence, then, our IT systems are a utility, a basic service on which we rely, and without which society is unable to survive in its current form.

2. IT failures are significant to the economy at the GDP level

For the United Kingdom, with a GDP of £2,600 Bn, the cost of software failures has been estimated at £35 Bn, a massive 1.3% of GDP. If we regard GDP as the value added by our human work, then that represents roughly 30 to 40 minutes lost per working week due to IT failures, depending on the length of working week assumed.
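
The arithmetic behind that estimate can be checked with a short sketch. The GDP and cost figures are the ones quoted above; the working-week lengths (37.5, 40 and 48 hours) are assumptions for illustration, since no week length is stated, and the function name is hypothetical.

```python
# Figures quoted in the text, in billions of pounds.
UK_GDP_BN = 2600
FAILURE_COST_BN = 35


def minutes_lost_per_week(hours_per_week: float) -> float:
    """Minutes of a working week equivalent to the failure cost's share of GDP.

    Assumes GDP is spread evenly over working time, as the text suggests.
    """
    share_of_gdp = FAILURE_COST_BN / UK_GDP_BN
    return share_of_gdp * hours_per_week * 60


share_percent = 100 * FAILURE_COST_BN / UK_GDP_BN
print(f"Share of GDP: {share_percent:.2f}%")

# Assumed working-week lengths; these are not taken from the source.
for hours in (37.5, 40, 48):
    print(f"{hours} h week: {minutes_lost_per_week(hours):.0f} minutes lost")
```

On these assumptions the loss comes out at about half an hour for a 40-hour week, rising towards 40 minutes for a week approaching 50 hours.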

3. Software is inherently fallible: it fails

As virtually any software professional will confirm, it is never possible to get all the ‘bugs’ out of software. In addition, there will always be situations where the ‘correct’ programmed response is not appropriate, where functionality that seemed logical does not have the desired effect. Most concerning are events, such as failed software upgrades and ransomware attacks, that can stop a whole system working.

4. Software has a long life and most new software is not developed in house

Modern software has been under development since around 1970; some of the systems still in use have components going back that far. Almost all modern software uses software ‘components’ created, and software services hosted, by external organisations and communities.

5. Most companies are not addressing resilience issues

This is an observation from the professionals involved with the round tables, drawn from industry, government and the voluntary sector. Most companies have not designed for the consequences of catastrophic failure in their software systems. Our own future study supported this: cybersecurity thinkers said that their biggest concern for Critical National Infrastructure (CNI) is inappropriate human responses to disasters.

6. Why IT professionals don't build in resilience

IT professionals are trained to ‘engineer’: to deliver appropriate solutions for appropriate cost. Good IT professional conduct says not to devote effort or resources to aspects that do not benefit the client or employer. Building in resilience has costs; if companies do not value it, IT professionals will not deliver it.

7. Cyber-attack is not the principal threat for resilience

Recent work we did supporting risk assessment for software developers found that, of the many cyber-related risks faced by UK organisations, the only one in the top six by likelihood that involved a hostile adversary was untargeted ransomware.

8. AI won't fix resilience problems

AI can no doubt be used to help engineers and management improve organisational resilience, though only if the organisation prioritises it as a requirement. As discussed above, this is not happening yet. Meanwhile, it seems highly unlikely that AI-based systems will be inherently more resilient—as anyone familiar with ChatGPT’s outages will tell you. 

9. This is an organisational not an IT department issue

In our CNI cybersecurity future study, the experts confirmed that the solutions, which they called ‘sociotechnical resilience’, involve a combination of software and organisational design and planning.


So there’s a definite need to support and encourage organisations to address the very real and immediate problem of continuity through software failures!

 

- Charles