FFORT Fault Trees Input Format

Fault trees in FFORT are provided in an extended version of the Galileo textual format.
The original format is described in the Galileo manual. The version used in FFORT is extended with additional gates and attributes for repairable fault (maintenance) trees.

We assume here a familiarity with fault trees and their structure. An overview of fault tree analysis can be found in this paper.

Basic structure

A fault tree is a directed acyclic graph, in which the leaves describe basic events and the internal nodes describe gates. One of the root nodes is called the top level event, describing the system failure being analysed.

Unlike standard Galileo, we do not require an FT to have a unique root. This has two purposes: First, it more easily allows the analysis of multiple failure types in one FT, as one only needs to change the top level event. Second, it allows gates that do not have meaningful outputs (such as FDEPs) to be included without cluttering the tree with dummy outputs.

This structure is encoded as follows: Each non-empty line of a Galileo file describes one node in the FT. The top level event is marked by the special toplevel <name>; line. Gates are described using <name> <gate> <child1> <child2> <...>;. Basic events are described as <name> <attr1>=<val1> <attr2>=<val2> <...>;.

Syntax details

Every non-empty line ends with a semicolon. A name is a sequence of letters, numbers, underscores, and/or hyphens. Optionally, a name can be enclosed in double quote marks (FTs in FFORT always do so).
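The line structure described above can be illustrated with a minimal Python sketch. It is not the parser used by FFORT: the function name `parse_line`, the tuple layout of its results, and the choice to store attribute values as floats are assumptions made for illustration only.

```python
import re

# Names are letters, digits, underscores, and/or hyphens (per the text above).
NAME_RE = re.compile(r'[A-Za-z0-9_-]+')
# A token is either a double-quoted string or a run of non-whitespace characters.
TOKEN_RE = re.compile(r'"[^"]*"|\S+')

def parse_line(line):
    """Classify one Galileo line as a toplevel marker, a gate, or a basic event.

    Hypothetical helper: returns None for empty lines, otherwise a tuple.
    """
    line = line.strip()
    if not line:
        return None
    if not line.endswith(";"):
        raise ValueError(f"line does not end with a semicolon: {line!r}")
    raw = TOKEN_RE.findall(line[:-1])
    if len(raw) < 2:
        raise ValueError(f"incomplete line: {line!r}")
    if raw[0] == "toplevel":
        return ("toplevel", raw[1].strip('"'))
    name = raw[0].strip('"')
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid name: {name!r}")
    rest = [tok.strip('"') for tok in raw[1:]]
    # Assumption: a line whose remaining tokens are all attr=val pairs is a basic event.
    if all("=" in tok for tok in rest):
        attrs = {key: float(val) for key, val in (tok.split("=", 1) for tok in rest)}
        return ("basic_event", name, attrs)
    # Otherwise the first remaining token is the gate type, followed by the children.
    return ("gate", name, rest[0], rest[1:])
```

For instance, `parse_line('"Bus" and "B1" "B2";')` yields a gate named `Bus` of type `and` with children `B1` and `B2`.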

We support the following types of gates (extensions beyond standard Galileo are denoted in blue):

Gate Type	Symbol	Description
OR	`or`	Fails when any child fails.
AND	`and`	Fails when all children fail.
K of M (voting)	`KofM`	Fails when any K children fail.
Spare	`csp`/`wsp`/`hsp`	See section 'Spare gates'.
Priority-AND	`pand`	Fails if and when all children fail in left-to-right order.
Sequence enforcer	`seq`	Fails when all children fail, enforces failures in left-to-right order.
Functional dependency	`fdep`	Never fails. When the leftmost (trigger) child fails, all other children fail at the same time.
Stochastic inspection module	`NinspR`	Performs periodic inspections. See section 'Maintenance'.
Exact inspection module	`inspT`	Performs periodic inspections. See section 'Maintenance'.
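For the simpler gates in the table, the failure behaviour can be expressed in terms of the failure times of the children. The Python sketch below covers only OR, AND, voting, and priority-AND gates in a tree without spares, dependencies, or repairs. The function name `gate_failure_time` is hypothetical, and treating simultaneous failures as being "in order" for the PAND gate is an assumption that the text above does not settle.

```python
import math

def gate_failure_time(gate, child_times, k=None):
    """Failure time of a gate given its children's failure times.

    math.inf means "never fails". Hypothetical helper for illustration.
    """
    if gate == "or":
        # Fails when any child fails: the earliest child failure.
        return min(child_times)
    if gate == "and":
        # Fails when all children have failed: the latest child failure.
        return max(child_times)
    if gate == "kofm":
        # Fails when any K children have failed: the K-th smallest failure time.
        return sorted(child_times)[k - 1]
    if gate == "pand":
        # Fails only if the children fail in left-to-right order
        # (assumption: ties count as being in order).
        in_order = all(a <= b for a, b in zip(child_times, child_times[1:]))
        return max(child_times) if in_order else math.inf
    raise ValueError(f"unsupported gate type: {gate!r}")
```

With child failure times `[2, 1, 3]`, for example, an OR gate fails at time 1 and an AND gate at time 3, while a PAND gate never fails because the second child fails before the first.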

For basic events, we support the following attributes listed in the table below (extensions beyond standard Galileo denoted in blue). We denote natural numbers by N, positive real values by R, and probabilities (real values between 0 and 1, inclusive) by P.

Attribute	Syntax	Description
Failure rate	`lambda=R`	Rate of the exponential or Erlang distribution governing failure times.
Failure probability	`prob=P`	Failure probability of the event.
Dormancy factor	`dorm=R`	See section 'Spare gates'.
Restoration factor	`res=P`	Probability that a component failure has no effect and is immediately repaired.
Phase count	`phases=N`	Number of phases in the Erlang distribution governing the failure times.
Inspection threshold	`interval=N`	Phase of the Erlang distribution at which inspection observes degradation. See section 'Maintenance'.
Repair rate	`repair=R`	Rate of the exponential distribution governing repair times. See section 'Maintenance'.

Example

An example can be seen below (the HECS-1-1 FT from FFORT), with the graphical representation on the left and the Galileo description on the right.

toplevel "System";
"System" or "Processor" "Memory" "Bus" "Interface";
"Processor" and "PG1" "PG2";
"PG1" wsp "P1" "Ps";
"PG2" wsp "P2" "Ps";
"Memory" 3of5 "M1" "M2" "M3" "M4" "M5";
"Bus" and "B1" "B2";
"Interface" or "HW" "SW";
"P1" lambda=1.0e-4;
"P2" lambda=1.0e-4;
"Ps" lambda=1.0e-4 dorm=0.4;
"M1" lambda=6.0e-5;
"M2" lambda=6.0e-5;
"M3" lambda=6.0e-5;
"M4" lambda=6.0e-5;
"M5" lambda=6.0e-5;
"B1" lambda=1.0e-6;
"B2" lambda=1.0e-6;
"HW" lambda=5.0e-5;
"SW" lambda=6.0e-5;

Failure time distributions

The failure times of basic events can be governed by several probability distributions. We currently support discrete probabilities, exponential distributions, and Erlang distributions, as well as combinations of the discrete distribution with the others.

A discrete distribution is governed by a single failure probability p. If the event fails, it is failed for the entire time being analysed (i.e., its failure time is 0, and it cannot be repaired).

An exponential distribution is governed by a failure rate λ: the failure time T of the basic event satisfies P(T ≤ t) = 1 - e^(-λt) for every t ≥ 0.

An Erlang distribution is governed by a number of phases N and a failure rate λ, specifying that failure of the event occurs after N successive exponential distributions have expired, each with rate λ.

A combined distribution is formed when both a failure probability p and a time distribution D are specified. In this case, with probability 1-p, the BE never fails. With probability p, the BE fails at a time drawn from the distribution D.
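The four cases above can be summarised in a short Python sketch that draws one failure time for a basic event. The function names and the parameter names (`lam`, `phases`, `prob`) are illustrative assumptions that mirror the `lambda`, `phases`, and `prob` attributes. `math.inf` stands for "never fails".

```python
import math
import random

def exponential_cdf(t, lam):
    """P(T <= t) = 1 - e^(-lam * t) for an exponential failure time."""
    return 1.0 - math.exp(-lam * t)

def sample_failure_time(rng, lam=None, phases=1, prob=None):
    """Draw one failure time for a basic event (hypothetical helper)."""
    if lam is None:
        if prob is None:
            raise ValueError("need a failure rate, a failure probability, or both")
        # Discrete distribution: failed from time 0 with probability prob.
        return 0.0 if rng.random() < prob else math.inf
    if prob is not None and rng.random() >= prob:
        # Combined distribution: with probability 1 - prob the BE never fails.
        return math.inf
    # Erlang: sum of `phases` exponential phases with rate lam
    # (phases == 1 is the plain exponential case).
    return sum(rng.expovariate(lam) for _ in range(phases))
```

A call such as `sample_failure_time(random.Random(1), lam=1.0e-4, phases=3)` draws from an Erlang distribution with three phases. Passing `prob` as well turns it into the combined case.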

Spare gates

Spare gates describe cases where spare components may be used to replace primary components if these primary components fail. Initially, the spare gate uses its primary (i.e., first) child. When this child fails, it attempts to use its second child. If this child is either failed, or already being used by some other spare gate, it attempts to use the third child, and so on. If none of its children can be used, the gate fails.

Dynamic fault trees distinguish three types of spare gates: cold spares (csp), signifying that spare components do not fail when not in use; hot spares (hsp), signifying that unused spare components fail at the same rate as when they are in use; and warm spares (wsp), signifying that unused spare components fail at a lower rate than those in use (specified by the dormancy factor).

There is a potential for ambiguity if a component is a child of different types of spare gates. We specify the behaviour of a component as follows:

Every FT element is either active or dormant at any point in time.
An active basic event follows its normal failure distribution. A dormant basic event follows the failure distribution where its failure rate is multiplied by the dormancy factor.
A (direct or indirect) child of a cold spare has a default dormancy factor of 0, unless the BE has an explicit dormancy factor. Similarly, a child of a hot spare has a default dormancy factor of 1, and a child of a warm spare a default dormancy factor of 0.5. A basic event that is a child of different types of spare gates must have an explicit dormancy factor.
The top level event is always active.
Gates propagate activity as follows:
- A direct child of a spare gate is active if the spare gate is active and using that child.
- The first direct child of an FDEP gate is always active.
- Sequence enforcers and inspection modules do not affect activity of their children.
- The direct children of a gate of any other type are active if the gate is active.
- If none of the above specify that a BE should be active, it is dormant.
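The default dormancy factors and their effect on the failure rate can be sketched in Python as follows. The function names are hypothetical. Returning `None` for a basic event that is not below any spare gate is an assumption, since the rules above do not define a default for that case.

```python
# Default dormancy factors per spare gate type, as listed above.
DEFAULT_DORMANCY = {"csp": 0.0, "wsp": 0.5, "hsp": 1.0}

def dormancy_factor(spare_types, explicit=None):
    """Dormancy factor of a BE given the types of spare gates it is below.

    spare_types: iterable of "csp"/"wsp"/"hsp" for all spare gates that have
    the BE as a direct or indirect child. Hypothetical helper.
    """
    if explicit is not None:
        # An explicit dorm= attribute always takes precedence.
        return explicit
    kinds = set(spare_types)
    if len(kinds) > 1:
        raise ValueError("child of different spare gate types needs an explicit dorm")
    if not kinds:
        # Assumption: no default is defined when the BE is not below a spare gate.
        return None
    return DEFAULT_DORMANCY[kinds.pop()]

def effective_rate(lam, active, dorm):
    """Failure rate of a BE: lam when active, lam * dorm when dormant."""
    return lam if active else lam * dorm
```

In the HECS example, `Ps` has `lambda=1.0e-4` and `dorm=0.4`, so under these rules it fails at rate 4.0e-5 while dormant and at rate 1.0e-4 once a spare gate starts using it.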

Maintenance

Repairs and maintenance are specified using repair rates and inspection modules. These behave as follows:

If a BE with a discrete probability distribution fails, it is not affected by repairs.

If a BE with a continuous probability distribution fails, and the BE is not the child of any inspection module, and the BE has a repair time distribution specified, then the BE will be repaired after the time governed by the repair time distribution.

An inspection module performs inspections at times governed by its type: an NinspR module has inspection times governed by an Erlang distribution with N phases and rate R per phase, whereas an inspT module performs inspections at the integer multiples of T. At the time of an inspection, the module checks whether any of its children has failed, or is governed by an Erlang distribution and has degraded to or past its threshold phase (specified by the interval attribute). In such a case, all child BEs begin repairs immediately. If a child does not have a repair time distribution, it is immediately repaired to as-good-as-new condition. If a child does have a repair time distribution, it will return to as-good-as-new condition after the repair time elapses (it may degrade further or even fail during this repair time, but will still be repaired when the repair time elapses).
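The two inspection schedules can be sketched in Python as below. The function names and the `horizon` parameter (the end of the analysed time window) are hypothetical. The sketch also makes two assumptions that the text does not confirm: an inspT module performs its first inspection at time T rather than at time 0, and the Erlang-distributed gaps between successive NinspR inspections are independent.

```python
import random

def exact_inspection_times(T, horizon):
    """Inspection times of an inspT module up to `horizon`.

    Assumption: inspections happen at T, 2T, 3T, ... (not at time 0).
    """
    times = []
    k = 1
    while k * T <= horizon:
        times.append(k * T)
        k += 1
    return times

def stochastic_inspection_times(rng, n, rate, horizon):
    """Inspection times of an NinspR module up to `horizon`.

    Assumption: the gaps between inspections are independent Erlang
    samples with n phases and the given rate per phase.
    """
    times = []
    t = 0.0
    while True:
        # One Erlang sample is the sum of n exponential phases.
        t += sum(rng.expovariate(rate) for _ in range(n))
        if t > horizon:
            return times
        times.append(t)
```

For example, `exact_inspection_times(2.0, 7.0)` gives inspections at times 2.0, 4.0, and 6.0.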