Reflections on New Year’s predictions

Source: Nikkei Asian Review

Haiti on Jan. 12 marked the fourth anniversary of the magnitude-7.0 earthquake that left between 100,000 and 200,000 dead. The nation’s government estimated that 250,000 residences and 30,000 commercial buildings collapsed or were severely damaged. The Haitian people have barely recovered.

Four years and one day after that disaster — on Jan. 13, 2014 — a magnitude-6.4 earthquake occurred about 40 km north of Puerto Rico. Although some buildings were damaged, no one was killed.

Neither of these events should have been too surprising given the history of quakes in those areas. Yet it seems both Puerto Rico and Haiti were unprepared.

This brings us to efforts to predict and prepare for Japan’s next big shake. Just before New Year’s, the Japanese government’s Central Disaster Prevention Council said there is a 70% likelihood that an earthquake of magnitude 7.3 will hit below south-central Tokyo during the next 30 years. The council said the worst property damage and loss of life would occur if the quake comes on a winter evening with winds blowing at 8 meters per second.

What, exactly, does a 30-year period mean? And should we believe such predictions?

To start with the first question, a “return period” of 30 years is often used in the insurance industry, mostly because the lifetime of certain insurable properties is 30 years. In any given 30-year span, an event may occur once, twice, more often or not at all.

To annualize a 30-year return period, you can, to a first approximation, simply divide the likelihood by 30. In this case, a 70% chance in 30 years means a likelihood of about 2% per year. Certainly, 70% in 30 years sounds scarier than 2% per year.

Moreover, given odds of 70% in 30 years, what is the likelihood of the worst-case scenario? Since earthquakes happen independently of winter evenings and wind speed, we must factor these in. When you do that, the result is a likelihood of about 0.06% per year, or roughly 2% in the next 30 years. That is a far cry from 70%.
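As a rough back-of-the-envelope check of those numbers (my own arithmetic, not the council's; the fractions used for "winter evening" and for wind of 8 meters per second are purely illustrative assumptions), the sketch below shows how conditioning on season, time of day and wind shrinks the headline figure:

```python
# Back-of-the-envelope check of the column's figures.
# The conditioning fractions below are illustrative assumptions, not values
# taken from the Central Disaster Prevention Council's report.

p_30yr = 0.70                  # quoted probability of a M7.3 quake within 30 years
p_annual = p_30yr / 30         # simple division, as in the text: about 2.3% per year

# A slightly more careful conversion treats years as independent and solves
# 1 - (1 - p)**30 = 0.70 for p; it gives about 3.9% per year, the same order.
p_annual_exact = 1 - (1 - p_30yr) ** (1 / 30)

# Worst case: the quake must also fall on a winter evening with 8 m/s winds.
f_winter = 0.25    # assume roughly 3 months of the year count as winter
f_evening = 0.25   # assume roughly 6 hours of the day count as evening
f_windy = 0.40     # assumed share of winter evenings with winds near 8 m/s

p_worst_annual = p_annual * f_winter * f_evening * f_windy
p_worst_30yr = p_worst_annual * 30

print(f"annual likelihood:            {p_annual:.1%}")        # ~2.3%
print(f"worst-case annual likelihood: {p_worst_annual:.2%}")  # ~0.06%
print(f"worst-case over 30 years:     {p_worst_30yr:.0%}")    # ~2%
```

Reasonable people could pick different fractions, but any plausible choice leaves the worst-case scenario far below the headline 70%.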

In maps we trust?

Tokyo sits on a very complex system of tectonic plates. Therefore, it is difficult to predict which part of the Tokyo metropolitan area will be most affected by the next major earthquake.

The CDPC task force created hazard maps for 19 separate quakes of three different types, all in the magnitude-7 class. The 7.3 earthquake under south-central Tokyo is a worst-case scenario because the areas near the epicenter are packed with industrial facilities and districts full of wooden houses. Massive fires could erupt.

But how accurate are the maps? Dr. Robert Geller of the University of Tokyo’s Department of Earth and Planetary Science points out that the good news is that we have methods for making earthquake hazard maps. The bad news is that the methods have not been verified. And the worst news is that the hazard maps do not agree with the data.

“Science works by formulating hypotheses and testing them against observed or experimental data,” Geller goes on to say. “Most fail and are rejected. The few that appear to succeed are further tested. Most of these, too, are eventually rejected. … Problems can arise when unvalidated hypotheses are adopted as the basis for public policy without the recognition that they may be on shaky ground.”

Does giving predictions, which may or may not be based on solid science, help or hinder the public?

Naoshi Hirata, a professor at the University of Tokyo’s Earthquake Research Institute, seems to think it helps. “I want people to take disaster management measures,” he said, “aware that any part of Tokyo could be directly hit by an earthquake.” Hirata took part in selecting quakes for the task force’s projections.

As the years pass without a major earthquake, the public’s vigilance will lessen. This is sometimes called ‘safety drift’. At the same time, imagining only the huge disaster can dangerously lower our guard against smaller disasters that can have cascading consequences.

Geller again: “It is time to tell the public frankly that earthquakes cannot be predicted. … All of Japan is at risk from earthquakes, and the present state of seismological science does not allow us to reliably differentiate the risk level in particular geographic areas. We should instead tell the public and the government to ‘prepare for the unexpected’ and do our best to communicate both what we know and what we do not.”

Nikkei Asian Review: Regular Columnist

Since November 2013, Woody has been a regular columnist for the Nikkei Asian Review, a news publication that draws on an extensive network of contributors, including academics, government leaders and captains of industry. The Nikkei Asian Review works to give the world a fuller picture of business in Asia, from every angle.

To date, Woody’s published articles are listed below, linked to their original source:

Death by software

Source: Nikkei Asian Review

On Dec. 7, along with hundreds of others at Paris Orly Airport, I spent the day waiting for a flight to London. As we found out after several hours, all flights to the London area had been canceled because of a communications glitch. The cause: a software problem in the internal phone system of the U.K.’s National Air Traffic Services.

As the BBC reported, the software failure happened when controllers working overnight were due to make the handoff to the day team at around 6 a.m. “To be clear, this is a very complex and sophisticated system with more than a million lines of software,” an NATS spokesman was quoted as saying. “This is not simply internal telephones, it is the system that controllers use to speak to other (air traffic control) agencies both in the U.K. and Europe and is the biggest system of its kind in Europe.”

Eurocontrol, which coordinates air traffic management across Europe, said around 1,300 flights, or 8% of all air traffic on the Continent, had been severely delayed. Almost 10% of flights at London Heathrow Airport were canceled.

A gentleman in the queue with me turned and said, “Well, at least no one dies from software problems.”

If only that were true.

Computers are increasingly being introduced into safety-critical systems and, as a consequence, have been involved in accidents. Two of the most widely cited software-related accidents involved computerized radiation therapy machines.

The first involved the Therac-25 radiation machine. From June 1985 to January 1987, there were six known cases in which the machine gave massive overdoses, resulting in deaths and serious injuries.

The Therac-25 tragedy partly stemmed from a simple update to the user interface. The update allowed operators to edit individual data fields, where before all fields had to be re-entered if any error occurred. Because of timing problems, the software sometimes failed to register those edits.

Years later, in January 2006, then 15-year-old Lisa Norris was a patient at the Beatson Oncology Centre in Glasgow, Scotland. While she was undergoing radiation therapy for a relatively rare and complex brain tumor, it was discovered that the 17 dose fractions she received were some 58% higher than prescribed. Norris died in October 2006, her death hastened by the overexposure.

Dangerous secrets

While much can be learned from such accidents, fears of potential liability or loss of business make it difficult to find out the details behind serious engineering mistakes.

“Placing barriers in the way of widespread dissemination of relevant details of adverse events is a way of preventing learning in any organization,” said Dr. John Wreathall of the Resilience Engineering group. “Bear in mind that one hallmark of a resilient organization is that it is prepared not only for its own failures, but those which it can learn from others. The more resilient an organization is, the larger are the lessons it has learned from others.”

This can manifest itself in several ways. One is recognizing a broader set of challenges that the organization can face, including those it creates for itself as a result of its own activities. This helps the organization better understand “what went wrong” and calibrate itself against the experiences of others.

Of course, just having the data available will not in itself ensure safety. But cutting off the public dissemination of data will ensure that accidents can be repeated.

As Dr. Nancy Leveson wrote in her Therac-25 investigation report: “Most accidents are system accidents; that is, they stem from complex interactions between various components and activities. To attribute a single cause to an accident is usually a serious mistake. We want to emphasize the complex nature of accidents and the need to investigate all aspects of system development and operation to understand what has happened and to prevent future accidents.”

These problems are not limited to the medical sector. It is still a common belief that any good engineer can build software, regardless of whether he or she is trained in state-of-the-art software-engineering procedures.

Software engineering is a young discipline. A liberal estimate puts its age at 63 years. We should not be surprised that software and its practitioners have few construction standards, procedures, guilds, review boards, licensing regimes, acceptable manufacturing practices or continuing education requirements such as we may find in structural engineering, law, architecture and even massage therapy.

After all, when civil engineering was the same age as software engineering is now, the wedge had not yet been invented.

Woody Epstein serves as manager of risk consulting at Lloyd’s Register Consulting Japan. From 2011 to 2012 he was also a visiting scientist at the Ninokata Laboratory of the Tokyo Institute of Technology, where he was involved in analyzing the Fukushima disaster. The opinions expressed in this regular column do not reflect those of his employers, their affiliates or clients.

How to take safety to the next level: Expect the unexpected

Source: Nikkei Asian Review

Risk assessment is conceptually very simple. We are looking for the answers to three questions: What can go wrong, how likely is it, and what are the consequences?

But can risk analysis, and standard safeguards based upon such assessments, protect us from the unexpected?

Imagine a complex and potentially dangerous facility, such as a nuclear power station or an offshore oil rig. Assume that the plant’s equipment is highly reliable, its workers and managers are vigilant in testing and other procedures, and training is thorough. If an unforeseen accident does occur, will these high standards lower the likelihood that it will be severe? Surprisingly, the answer is no.

Origins of failure

In 1991, I was doing an assessment of the software for the main engines of NASA’s space shuttle. I came upon an article by Herb Hecht of SoHaR, a U.S. provider of reliability software. In the article, titled “Rare Conditions and Their Effect on Software Failures,” Hecht makes four interesting points:

  1. In well-tested systems, rarely executed code has a higher failure rate than frequently executed code;
  2. Consequences of rare failures in well-tested systems are more severe than those of other failures;
  3. When there is a failure in a well-tested system, it is significantly more likely to be caused by a rare event;
  4. The inability to handle multiple rare conditions is a prominent cause of failure in well-tested systems.

To put it plainly, we have tested out all of the easily found errors. What we are left with are rare errors with severe consequences.

Do Hecht’s observations about software apply to other technological systems? I believe they do.

Nuclear plants, for example, regularly assess the risk of unwanted events with the aim of eliminating them entirely. Through exceptional planning, maintenance and organizational development, the foreseeable problems are vanquished.

But they have to draw the line somewhere. Some events are of such low likelihood — say, one in a million — that they are considered acceptable. These are the unexpected, rare events. If there is a failure, chances are it will start with such an event.

Snowball effect

Hecht makes another noteworthy observation in his study: All software that failed from three rare events also failed, perhaps less severely, from two. And three-quarters of the software that failed as a result of two rare events also failed, again perhaps less dramatically, from one.

Think of it this way: If all proper procedures and conditions are in place, and if symptoms of an unwanted event begin to take us onto a failure path, it could very well be the start of a severe accident. Perhaps more failures will occur to compound the situation and form a scenario that may have never been imagined, or was previously dismissed as improbable. There will be no procedures, experience or training to aid in recovery.

Basically, the first rare failure has a good likelihood of being a harbinger of a much worse situation. The three-stage accident at Japan’s Fukushima Daiichi nuclear power plant was exactly this type: earthquake, tsunami and hydrogen explosions.

Risk assessments focus almost entirely on known dangers. As a result, procedures, training, regulations and methods of operation are all designed to guard against these same threats. Rarely does an organization explore novel possibilities for failure — scenarios that change critical assumptions, have slightly different symptoms, or include multiple failures. The myth of safety only reinforces this attitude.

To be sure, without this focus on checklists and protocol, controllable situations could easily escalate out of control, undermining day-to-day safety. Still, a second culture is also needed — a culture of expecting the unexpected.

This requires playing “what if” with the risk model, questioning assumptions and looking at possible (if unlikely) scenarios. When there are initial indications that a system may be going astray, the second culture should kick in. This is called having “requisite imagination.”

Safety is connected not only to risk but also to expectation. In operations like nuclear power plants, oil refineries or chemical production facilities — all of which are in the well-tested category of engineering enterprises — we must be ready for the rare events in order to defend against them.


Post-Fukushima: Time to refocus Asia’s nuclear debate

Source: Nikkei Asian Review

October was an extraordinary month in Japan. A record five typhoons hit, with a near miss by a sixth. A magnitude-7.1 earthquake struck off the coast of Fukushima Prefecture on Oct. 26, just as one of the storms was arriving.

A worried world focused on the possible effects of all this on the Fukushima Daiichi nuclear power station. And indeed, although the earthquake did no further damage to the stricken plant, the heavy rainfall contributed to more contaminated water flowing into the facility’s port area.

According to the Japan Meteorological Agency, only one or two typhoons approach Japan in an average October. And since 1996, Japan has averaged one earthquake greater than magnitude 7.0 every five months, or more than two per year.

We would be correct in saying that a month with five typhoons, one near miss and a 7.1 quake was an unexpected confluence of unlikely events. But are all unlikely events really unlikely? Is there any way to expect — and guard against — the unexpected?

Unlikely yet likely?

Let us look at nuclear power generation and the possibility of a core damage accident. Over the years, the International Atomic Energy Agency and most regulators have endorsed several safety goals. Two such goals stipulate that, for each unit, reactor operators should ensure that the likelihood of a core damage accident is no greater than once in 10,000 years, and that the likelihood of a large release of radiation is no greater than once in 100,000 years.

These seem like pretty unlikely events, right?

Let me put this in perspective. As of March 10, 2011, there were 438 commercial nuclear power generating units around the globe. If each unit were operating 70% of the time, and each one met the safety goals I have just described, the likelihood of a core damage accident somewhere on the planet was about three times every 100 years.

So a core damage accident, although unlikely in a single nuclear reactor, does not seem so improbable if we look worldwide. In fact, in my lifetime there have been such accidents at three commercial reactor sites: Three Mile Island in the U.S., Chernobyl in Ukraine and Fukushima Daiichi.

Now, core damage accidents do not necessarily result in large radiation releases. At Three Mile Island, essentially no radiation escaped. If we use the same reasoning for large releases of radiation, then we should expect a Fukushima-like accident about once every 330 years.
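For readers who want to check that arithmetic, here is a minimal sketch using only the figures quoted above (438 units, a 70% operating assumption, and the two safety-goal frequencies):

```python
# Fleet-wide frequency check using the figures quoted in the column.

units = 438                  # commercial reactors worldwide as of March 10, 2011
capacity_factor = 0.70       # assume each unit operates 70% of the time

core_damage_per_reactor_year = 1 / 10_000     # safety goal for core damage
large_release_per_reactor_year = 1 / 100_000  # safety goal for a large release

fleet_core_damage_per_year = units * capacity_factor * core_damage_per_reactor_year
fleet_large_release_per_year = units * capacity_factor * large_release_per_reactor_year

print(f"core damage accidents per century: {fleet_core_damage_per_year * 100:.1f}")  # about 3
print(f"years between large releases:      {1 / fleet_large_release_per_year:.0f}")  # about 330
```

The point is not the precise decimals but the order of magnitude: goals that sound vanishingly small for a single reactor add up across a fleet of hundreds.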

Please remember, too, that I am only talking about the likelihood of an accident, not the consequence that people will get sick or die from one. The contamination surrounding Fukushima Daiichi is measurable, but according to the United Nations Scientific Committee on the Effects of Atomic Radiation, the disaster is unlikely to cause any discernible health effects in the future among the general public and the vast majority of workers.

Still, consider this: Over the next 15 years, the number of nuclear units worldwide will increase significantly. We should therefore expect the worldwide odds of a core damage accident to increase as well. In Asia alone, China has 29 reactors under construction, adding to 15 operational units.

So here is my question to you: Can you accept this risk?

Keep in mind that energy-related risks will not disappear if we abandon nuclear power. Look at all of the gas, oil, and liquefied natural gas tanks on the coasts of almost every Asian country. We have done studies on how large earthquakes, tsunamis, and typhoons could impact the chemical, oil and gas industries. Believe me, there are accident scenarios with environmental, social and economic consequences that could approach those of Fukushima.

The Fukushima accident has clearly shown that nuclear power inevitably carries risks. It is time to change the terms of the debate from the oversimplified safe/unsafe dichotomy to an honest and open discussion of what the risks are and what is being done to mitigate them.

In this context, the risks of nuclear power have to be considered in a balanced comparison with other risks. At the end of the discussion, the public and the leaders they have elected — rather than technical experts — should make the final decision.
