Scandpower Risk Management

It is a great pleasure for me to announce that on February 1st, 2011, I will join Scandpower Risk Management, Inc. as a Senior Principal Consultant and Manager of Risk Consulting in Japan.

Scandpower is a leading independent risk management company with 40 years of experience in providing consulting services and software to the international market. Scandpower offers services in the following areas:

  • Health & Safety
  • Risk Analysis
  • Quality
  • Environment
  • Reliability & Maintenance
  • Flow Assurance

Over the past 40 years Scandpower has successfully completed a large number of projects for oil & gas companies, nuclear power entities, regulatory authorities, companies in the process, shipping and transport industries, service companies and research institutions.

On 31 December 2009, Scandpower became part of the Lloyd’s Register Group. Together we form a world-leading independent risk management organization that works to help improve our clients’ quality, safety, environmental, and business performance throughout the world, because life matters.

Woody’s Projects

Download as PDF

MRJ70 / MRJ90 Preliminary System Safety Assessment (PSSA)

In 2009, ABSC performed a PSSA for Mitsubishi Aircraft (MJET). The PSSA will be used to qualify the safety of the new regional jet for worldwide service and for FAA type certification.

RISKMAN for Nuclear Safety

Woody is the lead developer and architect of RISKMAN®, the most widely used probabilistic safety assessment (PSA) software for nuclear power and fuel facilities in Japan.

Terrorist Risk – MIDAS-AT Software

For over 20 years, MIDAS-AT software has been used in the nuclear power and chemical industries, by the US Marine Corps, and in US Embassies to manage, in real time, chemical, biological, and radiological attacks by terrorists. Installed at over 70 sites worldwide, MIDAS-AT was chosen this year by the Japanese Ministry of Defense for use by the Ground Self Defense Force.

NASA Challenger Disaster

After the Challenger disaster, ABSC was asked by the US Government to institute a risk assessment program for the space shuttle and to perform a pilot study to understand and quantify the chances that a fatal accident might occur again. Our study, finished in 1988, predicted that the chances of a loss of vehicle (LOV) accident were 1/75. The current LOV accident rate is 1/64.5.

Chemical Weapons Disposal

At the end of WW2, the Japanese Imperial Army abandoned more than 300,000 chemical weapons in Northern China. The weapons were buried in two large burial pits near Haeberling. Now, the Japanese government must excavate and dispose of these weapons. From 2003 to 2007, ABSC advised the Japanese government on the risks to the public, the workers, and the environment, should an accident occur.

Software Safety Assessment of the Main Engines of the NASA Space Shuttle

ABS created analysis methods to assess the risk of real-time software by integrating the analysis techniques of software engineering, software reliability, software system safety, and classical risk analysis. We used this methodology for NASA to help understand the risk of failure of the main engines of the Space Shuttle.


THE MARS EXPLORATION ROVER PSA for JPL

Purpose of the PSA work (continued)
PSA training
Deepen JPL’s understanding of what a PSA is and what it can do
Focus on the EDL (Entry, Descent, and Landing) portion of the MER mission
Approach
Deployment

VERN: The Virtual Emergency Response Network

VERN is used in Japan to provide real-time disaster information to aid emergency response after seismic events.

Woody’s Perspective

  • Risk Analysis is a formal discipline, and only that: it gives the analyst a rigorous form into which he casts insights, expertise, measurements, and scientific and engineering knowledge.
  • By providing a formal structure, it leads and guides the analyst to approach problems in a reviewable and transparent way, which is just good science.
  • To solve large problems, software tools are necessary, but they are no substitute for intimate domain knowledge, historical knowledge of risk science, and a general facility with mathematics.
  • Quantifying, or measuring, the risk/safety of a situation is not the goal of a PRA. And to believe the numbers is folly.
  • The act of trying to measure the risk involved is the source of knowledge. The acts of trying to assign values, combining them, questioning their accuracy, and building the risk model are the great treasure of PRA: the key to the treasure is the treasure itself.
  • Uncertainty is not some noisy variation around a mean value that represents the true situation. Variation itself is nature’s only irreducible essence. Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions.
  • Too often risk is defined as risk = likelihood × consequence, and safety = 1 − risk. This can misinform: acceptable risk is a consideration of likelihood AND consequence, not a simple multiplication with safety as the additive inverse of risk. Acceptable risk and safety are normative notions, changing with situations and expectations, and must be assessed accordingly.
  • Safety cannot be measured by an absence of accidents, a result which is largely dependent on luck. Safety is the heir of constant, active identification and elimination of hazards. Near misses are NOT testimonials to safe practices.
  • I often say that when you can measure what you are speaking about, and express it in numbers, then you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced to the stage of science, whatever that may be. — Lord Kelvin, 1891
  • If you think you can measure it, then something is probably wrong, anyway. — Woody Epstein, 2010

Can We Trust PSA?

Download the full document as PDF

Qui a vist Paris et noun Cassis, ren a vist.
If one has seen Paris, but not Cassis, one has seen nothing.
— an old Provençal expression

W. Epstein [ref]ABS Consulting, Koraku Mori Building, 1-4-14 Koraku Chome, Bunkyo-ku, Tokyo, 112-0004, Japan, sepstein@absconsulting.com[/ref] and A. Rauzy [ref]IML/CNRS, 163, Avenue de Luminy, Case 907, Marseille, 13288 Cedex 9, France, arauzy@iml.univ-mrs.fr[/ref]


Abstract: The Fault Trees/Event Trees method is widely used in industry as the underlying formalism of Probabilistic Risk Assessment. Almost all of the tools available to assess event tree models implement the “classical” assessment technique based on minimal cutsets and the rare event approximation. Binary Decision Diagrams are an alternative approach, but they were up to now limited to medium-size models because of the exponential explosion of the memory requirements. We have designed a set of heuristics which make it possible to quantify, by means of BDDs, all of the sequences of a large event tree model coming from the nuclear industry. For one of the first times, it was possible to compare results of the classical approach with those of the BDD approach, i.e. with exact results. This article reports this comparison and shows that the minimal cutsets technique gives wrong results in a significant proportion of cases. Hence, our question in the title of this article.


1 Introduction

Katatsuburi
soro-soro nobore
fuji no yama
oh snail
climb Mount Fuji,
but slowly, slowly
— Issa

The Fault Trees/Event Trees method is widely used in industry. Probabilistic Risk Assessment in the nuclear industry relies worldwide almost exclusively on this technique. Several tools are available to assess event tree models. Almost all of them implement what we call the “classical” approach: first, event tree sequences are transformed into Boolean formulae. Then, after possibly applying some rewriting rules, minimal cutsets of these formulae are determined. Finally, various probabilistic measures are assessed from the cutsets (including probabilities and/or frequencies of sequences, importance factors, sensitivity analyses, …). This approach is broadly accepted. However, it comes with several approximations:

  • In order to assess probabilistic quantities from the cutsets, the rare event approximation is applied. Under certain conditions the min-cut upper bound approximation can be used, but only when the Boolean equation contains no negations and all basic event probabilities are quite low, at least smaller than 10⁻² (see the sketch after this list).
  • In order to limit the number of cutsets, and thereby avoid combinatorial explosion, probability truncation (hereafter referred to simply as truncation) is applied.
  • Finally, in order to handle success branches, various recipes more or less mathematically justified are applied.
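To make the first bullet concrete, here is a minimal Python sketch (my own illustration, not taken from any PSA tool) of the two first-order quantifications, for a hypothetical set of minimal cutsets over independent basic events:

    # Toy illustration of the two first-order quantifications, for a
    # hypothetical set of minimal cutsets over independent basic events.
    from math import prod

    # Hypothetical basic event probabilities.
    p = {"a": 0.05, "b": 0.05, "c": 0.05}

    # Minimal cutsets of some coherent top event, as sets of basic events.
    cutsets = [{"a", "b"}, {"b", "c"}, {"a", "c"}]

    def p_cutset(cs):
        """Probability of a cutset, assuming independent basic events."""
        return prod(p[e] for e in cs)

    # Rare event approximation: the sum of the cutset probabilities.
    rare_event = sum(p_cutset(cs) for cs in cutsets)

    # Min-cut upper bound: valid only without negations and for small
    # basic event probabilities, as noted above.
    mcub = 1.0 - prod(1.0 - p_cutset(cs) for cs in cutsets)

    print(f"rare event approximation: {rare_event:.6f}")  # 0.007500
    print(f"min-cut upper bound:      {mcub:.6f}")        # 0.007481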

Since, up to now, all of the assessment tools rely on the same technology (with some variations indeed), it was not possible to verify whether the above approximations are accurate for large real-life models, especially since to compute error bounds, the exact solution is necessary.

In the beginning of the nineties, a new technology was introduced to handle Boolean models: Bryant’s Binary Decision Diagrams (BDD for short) [Bry86,Bry92]. One of the major advantages of the BDD technology is that it provides exact values for probabilistic measures [Rau93,DR00]. It does not need any kind of truncation or approximation. BDDs are, however, highly memory-consuming. Very large models, such as event trees of the nuclear industry, were beyond their reach. Nevertheless, the methodology can be improved by means of suitable variable ordering heuristics and formula rewritings.
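To fix ideas, here is a toy Python sketch (mine, and far simpler than any real BDD package) of the Shannon decomposition that underlies BDD quantification. It computes exact probabilities, negations included; actual implementations add node sharing, caching, and the variable ordering heuristics just mentioned:

    # Toy sketch of exact quantification by Shannon decomposition over a
    # fixed variable order. Real BDD packages add node sharing, caching,
    # and careful variable ordering heuristics.

    # Formulae as nested tuples:
    # ('var', x), ('not', f), ('and', f, g), ('or', f, g).

    def cofactor(f, var, value):
        """Substitute var := value (True/False) in f and simplify."""
        op = f[0]
        if op == 'var':
            return value if f[1] == var else f
        if op == 'not':
            g = cofactor(f[1], var, value)
            return (not g) if isinstance(g, bool) else ('not', g)
        g = cofactor(f[1], var, value)
        h = cofactor(f[2], var, value)
        if op == 'and':
            if g is False or h is False: return False
            if g is True: return h
            if h is True: return g
            return ('and', g, h)
        if g is True or h is True: return True      # op == 'or'
        if g is False: return h
        if h is False: return g
        return ('or', g, h)

    def variables(f):
        if isinstance(f, bool): return set()
        if f[0] == 'var': return {f[1]}
        return set().union(*(variables(g) for g in f[1:]))

    def probability(f, p, order):
        """Exact p(F) = p(x).p(F[x:=1]) + (1-p(x)).p(F[x:=0])."""
        if f is True: return 1.0
        if f is False: return 0.0
        x = next(v for v in order if v in variables(f))
        return (p[x] * probability(cofactor(f, x, True), p, order)
                + (1.0 - p[x]) * probability(cofactor(f, x, False), p, order))

    # F = a.b + a.(-c): exact even though the model is non-coherent.
    F = ('or', ('and', ('var', 'a'), ('var', 'b')),
               ('and', ('var', 'a'), ('not', ('var', 'c'))))
    print(probability(F, {'a': 0.1, 'b': 0.2, 'c': 0.3}, ['a', 'b', 'c']))  # 0.076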

Recently, we were given a rather large event tree model (coming from the nuclear industry). We designed a strategy, i.e. a sequence of rewritings, that made it possible to handle all of the 181 sequences of the model within reasonable running time and memory consumption. For one of the first times, it was possible to compare results of the classical approach with those of the BDD approach, i.e. with exact results. As the epigram to this section intimates, we should not draw definitive conclusions from a single test case. But a single example suffices to ring the alarm bell: the classical approach gives wrong results in a significant proportion of cases. This is true for sequence frequencies and, although to a lesser extent in the problem under study, for component ranking via importance factors.

The remainder of this article is organized as follows. Section 2 is devoted to terminology (Boolean formulae, event trees, …). Sections 3 and 4 present respectively the classical and the BDD approaches. Section 5 gives some insights into the test case we used for this study. Section 6 reports comparative results for the computation of sequence frequencies. Section 7 extends the comparative analysis to importance factors. Section 8 briefly considers the complexity, runtime, and space issues that arise when solving large problems. Finally, section 9 presents our preliminary conclusions.

2 Terminology

Proper notation, the basting that holds the fabric of mathematics in shape, is both the sign and the cause of clear thinking.
— to paraphrase Lynn Truss in EATS, SHOOTS & LEAVES

2.1 Boolean Formulae

Throughout this article we consider Boolean formulae. Boolean formulae are built over a denumerable set of variables and the connectives and, or, not, k-out-of-n, and so on. Their semantics is defined, as usual, by means of the truth tables of the connectives. We denote by var(F) the set of variables that occur in the formula F. In the example to be studied, F represents a top event and the variables represent component failures, or basic events. We use the arithmetic notation for connectives: F.G denotes the formula “F and G” and F+G denotes the formula “F or G”. The formula “not F” is denoted either by -F or by F̄.

A formula is coherent if it does not contain negations. From a strict mathematical viewpoint, this definition is too restrictive, e.g. --F is coherent (assuming F is). However, it is sufficient for our purpose.

A literal is either a variable or its negation. A product is a conjunct of literals. It is sometimes convenient to see products as sets of literals. A minterm of a formula F is a product that contains, either positively or negatively, each variable of var(F). If n variables occur in F, 2ⁿ minterms can be built over var(F). In other words, minterms correspond one-to-one with truth assignments of the variables of F. By abuse of notation, we shall write π(F) = 1 (resp. 0) if the truth assignment that corresponds to the minterm π satisfies (resp. falsifies) F. We shall say that π belongs to F when π(F) = 1. A formula is always equivalent to the disjunction of its minterms.

Let π be a (positive) product and F be a formula. We denote by π^c_F the minterm of F built by adding to π all the negative literals built over the variables of F that do not occur already in π. For instance, if var(F) = {a,b,c} and π = a, then π^c_F = a.b̄.c̄. We shall omit the subscript when the formula F is clear from the context.

Let π be a positive product and F be a formula. π is a cutset of F if π^c_F satisfies F. A cutset π is minimal if no proper subset of π is a cutset. We shall denote by MCS[F] the set of minimal cutsets of F. The reader interested in a more thorough treatment of minimal cutsets should refer to [Rau01].
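The definitions above translate directly into a brute-force enumeration. The following Python sketch (purely illustrative; real tools use the far more efficient top-down or bottom-up algorithms cited in Section 3) computes MCS[F] for a small coherent formula:

    # Brute-force enumeration of MCS[F] for a small coherent formula,
    # straight from the definitions above.
    from itertools import combinations

    def evaluate(f, true_vars):
        """Truth value of a coherent formula under the minterm that sets
        exactly the variables in true_vars to true."""
        op = f[0]
        if op == 'var': return f[1] in true_vars
        if op == 'and': return all(evaluate(g, true_vars) for g in f[1:])
        if op == 'or':  return any(evaluate(g, true_vars) for g in f[1:])
        raise ValueError(op)

    def minimal_cutsets(f, varlist):
        mcs = []
        for k in range(1, len(varlist) + 1):
            for pi in map(frozenset, combinations(varlist, k)):
                # pi is a cutset if the minterm pi^c_F satisfies F; it is
                # minimal if no smaller cutset found so far is included in it.
                if evaluate(f, pi) and not any(m <= pi for m in mcs):
                    mcs.append(pi)
        return mcs

    # F = a.(b + c)  -->  MCS[F] = {{a,b}, {a,c}}
    F = ('and', ('var', 'a'), ('or', ('var', 'b'), ('var', 'c')))
    print([sorted(m) for m in minimal_cutsets(F, ['a', 'b', 'c'])])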

2.2 Event Trees

The Fault Tree/Event Tree method is probably the most widely used for risk assessment, especially in the nuclear industry. We assume the reader is familiar with this method (see [KH96] for a good introduction).

Fig. 1 (left) represents an event tree. As usual, upper branches represent successes of the corresponding safety systems, lower branches represent failures. In the fault tree linking approach (the one we consider here), each sequence is compiled into the conjunct of the top events (for failure branches) or negation of top events (for success branches) encountered along the sequence. The Boolean formulae associated with the sequences of the above event tree are given on the same figure (right), assuming that the failures of each safety system are described by means of a fault tree whose top event has the same name as the system.
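As a sketch of this compilation (my own illustration; Fig. 1 itself is not reproduced here), a sequence formula can be built in Python as the conjunction of the initiating event with each system’s top event or its negation:

    # Sketch of fault tree linking: a sequence is compiled into the
    # conjunction of the initiator and, for each safety system along the
    # path, its top event (failure branch) or its negation (success branch).

    def compile_sequence(initiator, branches):
        """branches: ordered list of (top_event, failed?) along the sequence."""
        f = ('var', initiator)
        for top, failed in branches:
            term = ('var', top) if failed else ('not', ('var', top))
            f = ('and', f, term)
        return f

    # Hypothetical sequence of a three-system tree: F succeeds, G and H fail,
    # giving I.(-F).G.H
    print(compile_sequence('I', [('F', False), ('G', True), ('H', True)]))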

It is worth noticing that the above compilation is an approximation. In our example, safety systems F, G and H probably don’t work simultaneously, but are rather called in sequence. We shall not consider this issue here. The reader interested in the mathematical foundations of event trees should refer to Papazoglou’s important article [Pap98].

3 The classical approach to assess event trees

Da Vinci was so steeped in his own tradition that each step he took transcended it.
— Scott Buchanan, EMBERS of the WORLD

3.1 Principle

By construction, sequences of event trees are mutually exclusive. Therefore, they can be treated separately, at least for what concerns the computation of their probabilities.

The classical approach to assess event trees works as follows.

  • First, sequences are compiled as explained above.
  • Second, some rewriting is performed on the formula associated with each sequence (e.g. modularization) in order to facilitate their treatment.
  • Third, minimal cutsets of each sequence (or group of sequences) are determined. Classical algorithms to compute the minimal cutsets work either top-down (e.g. [FV72, Rau03]) or bottom-up (e.g. [JK98,JHH04]).
  • Fourth, probabilities/frequencies of sequences are assessed from the cutsets. More generally, cutsets are used to get various measures of interest such as importance factors of components, sensitivity to variations in basic event probabilities, …

In this process, three kinds of approximations are used:

  • Sequences, including success branches, are quantified by means of minimal cutsets (which, by definition, do not embed negations).
  • Truncation is applied to limit the process, and therefore reduce the possibility of combinatorial explosion.
  • Probabilities are evaluated using one of two first order approximations: the rare event approximation or min-cut upper bound.

In the remainder of this section, we shall discuss the consequences of these three kinds of approximations.

3.2 The rare event approximation

Let us assume, for the moment, that the minimal cutsets represent the sequence exactly. The rare event approximation is used to assess the probability of the sequence. Namely, for a sequence S (or, more exactly, the Boolean formula S that represents the sequence), the probability of S is assessed as:

p(S) ≈ Σ_{π ∈ MCS[S]} p(π)        (1)

The rare event approximation is actually the first term of the Sylvester-Poincaré development to compute the probability of a union of events:

p(C₁ + C₂ + … + Cₙ) = Σᵢ p(Cᵢ) − Σ_{i<j} p(Cᵢ.Cⱼ) + … + (−1)ⁿ⁺¹ p(C₁.C₂.….Cₙ)        (2)

The rare event approximation gives an upper bound of the probability; it is therefore conservative. By computing the second term of the development, one gets a lower bound of the probability (these two values constitute the first pair of so-called Boole-Bonferroni bounds):

Σᵢ p(Cᵢ) − Σ_{i<j} p(Cᵢ.Cⱼ) ≤ p(S) ≤ Σᵢ p(Cᵢ)        (3)

When the number of cutsets is large, the computation of more terms is intractable. The rare event approximation gives accurate results when the probabilities of basic events are low. In the presence of relatively high probabilities (say > 10⁻²) and/or many minimal cutsets, the approximation is no longer valid. Consider for instance a 3-out-of-6 system S, with p(e) = 0.1 for each basic event e. The exact probability of S is 0.01585. The Boole-Bonferroni bounds given by equation (3) are respectively 0.01009 and 0.02, a rather rough approximation in both cases. The min-cut upper bound approximation is likewise no longer valid when relatively high probabilities are present, whether from negations or from alignment frequencies embedded in the fault trees.
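For the record, the numbers in this example can be checked in a few lines of Python (my verification sketch, not the authors’ code):

    # Verifying the 3-out-of-6 example: exact probability vs the first
    # pair of Boole-Bonferroni bounds.
    from itertools import combinations
    from math import comb

    p = 0.1  # probability of each of the 6 basic events

    # Exact: at least 3 of 6 independent components fail (binomial tail).
    exact = sum(comb(6, k) * p**k * (1 - p)**(6 - k) for k in range(3, 7))

    # The minimal cutsets are the C(6,3) = 20 triples of components.
    cutsets = list(map(frozenset, combinations(range(6), 3)))

    upper = sum(p**len(c) for c in cutsets)   # rare event approximation
    lower = upper - sum(p**len(c | d)         # minus the second term of
                        for c, d in combinations(cutsets, 2))  # equation (2)

    print(f"exact:  {exact:.5f}")                 # 0.01585
    print(f"bounds: [{lower:.5f}, {upper:.5f}]")  # [0.01009, 0.02000]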

3.3 Truncation in minimal cutsets determination

In general, sequences of large event trees admit huge numbers of minimal cutsets. Therefore, only a subset of them can be considered (the most important ones in terms of probability, one hopes). Algorithms to compute minimal cutsets apply truncation to keep only a few thousand cutsets (beyond that, computations are intractable). The choice of the right truncation value is the result of a trade-off between the accuracy of the computation and resource (time and memory) consumption. Expert knowledge about the expected probability of the sequence also plays an important role in that choice.

It remains that, by applying truncation, one gets an optimistic approximation. Moreover, there is no way to ensure that this approximation is accurate. For instance, if we keep a thousand cutsets of probability 10⁻⁹ and along the way ignore a million cutsets of order 10⁻¹¹, then we underestimate the risk by a factor of 10. This problem is largely ignored by most practitioners.
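In rare event terms, the arithmetic of that example works out as follows (a two-line check of my own):

    kept    = 1_000     * 1e-9   # contribution of the cutsets we keep
    ignored = 1_000_000 * 1e-11  # contribution of the cutsets truncated away

    print(kept, kept + ignored)     # 1e-06 vs 1.1e-05
    print((kept + ignored) / kept)  # ~11: the risk is underestimated ~10x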


A Short History of Open PSA

Download as PDF

This document was initially presented as a slide show and is available for download in PDF.

It was in Osaka, eight years ago, on the first evening of the late autumn rains. After a hard day at PSAM 6, my head ached.

I sat in a small yakitori joint in Kita Shinchi, nursing a sake, when a guy wandered in and ordered a shochu straight up. From the look of his shoes, I knew he was French. I soon found out that we had more in common than just Japanese booze and chopsticks.