Making systems

Volume 2: Life cycles
Richard Golding

Copyright ©2025 by Richard Golding

Release: 0.4-review

Table of contents

Part V: Development methodology and life cycles

This part explains what life cycles and development methodologies are. It covers what goes into a life cycle pattern and how the patterns relate to other parts of how a project operates. It defines development methodologies and how these relate to life cycles. The last chapter lists several example life cycle patterns, which lead into a comprehensive reference life cycle in Part VI.

Chapter 21: Introduction to life cycle patterns

2 October 2024

21.1 Introduction

System building in general follows a common story.

A project to develop a new system begins when someone has an idea that people should make the system. At this initial moment, the system is largely undefined. There is a vague concept in a few minds, but all the details are uncertain.

The project then moves the system from this initial concept through to an operational system, and through the system’s operational life and eventual retirement. During development, the team will need to ensure steps are taken in order to produce a correct, safe system. Designs will be checked. Implementations will be tested. The system as a whole will be verified before being deployed into service. At the same time, the resources spent on building the system must be used efficiently, doing the work that needs to be done and avoiding the work that doesn’t need to be done.

Many projects continue system development beyond the first operational version, with ongoing development or problem fixes. Some projects include the steps to shut down and dispose of the system once it has completed its functions.

The life cycle is how a project organizes the way the team moves through this story. It is a pattern that defines the phases and steps in the work: what will come first, what will be done before something else, when checks will happen. It provides checklists to know when some step is ready to be done, and when it should wait for prerequisites. It provides checkpoints and milestones for reviewing the work, so that problems are found and dealt with in a timely way. It provides an overall checklist to ensure that all the work that needs to be done is in fact done.

Section 20.3 introduced the basic ideas for life cycle patterns. These include:

Each project will use its own life cycle patterns. The patterns may incorporate a framework that is standard for the industry or the parent organization. Selecting and documenting the patterns is an essential part of starting up a project, and people in the project should review how well the patterns are working for them from time to time and may want to improve the patterns.

In this part, I discuss life cycles in general. In Part VI, I present a reference life cycle pattern.

21.2 Life cycle and development methodology

Life cycle patterns are related to, but separate from, the development methodology that a team chooses to use, such as waterfall, spiral, or agile methodologies. I address these methodologies in the next chapter (Chapter 22).

Speaking broadly, the development methodology determines how the work is organized in time: whether it proceeds in a single sequence or iteratively, whether tasks are synchronized or run separately, and how far ahead to plan. The life cycle patterns reflect some of those methodology decisions and encode how to do different tasks.

Put another way, the life cycle patterns help organize what work the project has to do, and what dependencies there are among different steps in the work. The development methodology organizes how that work is planned and scheduled. As a result, the two go hand-in-hand but are distinct from each other.

21.3 Key ideas

Almost all project life cycle patterns, for both whole systems and for components, follow a similar overall flow. Abstracting from the story in the introduction, there are phases:

  1. Working out how the project will operate
  2. Identifying purpose
  3. Developing a concept
  4. Refining concept into specification and design
  5. Implementation
  6. Verifying the result
  7. Operating the system or component
  8. Evolving it over time
  9. Retiring the system or component at end of life
  10. Shutting down the project

For a whole system, this looks like:

undisplayed image
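
To make the flow concrete, here is a minimal sketch (in Python, with names of my own choosing rather than anything standard) of the phases as an ordered sequence, with a check that a phase starts only once its predecessor is complete. A real project would tailor both the phases and their dependencies.

```python
# A minimal sketch of the overall flow: each phase may start only when the
# phase before it is complete. Phase names follow the list above.

PHASES = [
    "work out how the project will operate",
    "identify purpose",
    "develop a concept",
    "specify and design",
    "implement",
    "verify",
    "operate",
    "evolve",
    "retire",
    "shut down the project",
]

# In this simplified linear flow, each phase depends on the previous one.
PREREQUISITES = {PHASES[i]: {PHASES[i - 1]} for i in range(1, len(PHASES))}

def ready_to_start(phase: str, completed: set) -> bool:
    """A phase is ready once all of its prerequisites have been completed."""
    return PREREQUISITES.get(phase, set()).issubset(completed)

done = {"work out how the project will operate", "identify purpose"}
print(ready_to_start("develop a concept", done))    # True
print(ready_to_start("specify and design", done))   # False: no concept yet
```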

Note that this flow starts with the system or component’s purpose. Good engineering always begins with having a clear understanding of what a thing is for. I have watched many engineers rush into designing and building a component without putting time into understanding what the component is going to be used for. Occasionally their design has happened, by chance, to match what the component actually needed to do, but only rarely.

Understanding a system’s purpose or a component’s purpose also provides a way to bound the work. If one doesn’t know what a component is for, it is easy to keep working on a design without stopping because there isn’t a clear way to know when the design is good enough to be called done.

There are many points in this flow where one might add checks. At these times one can check on the correctness of the work. These checks improve system quality by building in the opportunity to discover and correct flaws before other work builds on the flawed work. Finding minor problems quickly usually means the cost of correction remains low.

There are also points where a project might have project-wide decisions: go/no-go decisions or key decision points. These provide opportunities to check the entire project's progress, sometimes occurring in the middle of other work, or at times when irrevocable actions are to be taken, such as funding, launch, or public announcements.

This general pattern applies recursively. One can start by creating a specification and design for the system. The system design will decompose the system into high-level components (Section 6.4). The act of defining a set of components implies identifying a purpose for each one, then specifying and designing each high-level component. The design of a high-level component might in turn decompose into a set of lower-level components, which in turn need a purpose, then specification and design.

The overall flow shows a move from high uncertainty at the beginning to lower uncertainty as the work proceeds. I will address managing work under uncertainty in a later chapter.

Finally, a project’s life cycle patterns will reflect the development methodology that the team has selected. Waterfall, spiral, and agile development all affect the contents of the patterns. I discuss this more in Chapter 22.

The life cycle provides a general set of patterns for how work should proceed, but it should not define exactly how each work step should be done. That is left to procedures (Section 20.4), which should provide step-by-step instructions for how to do key parts of the life cycle. For example, if a life cycle phase indicates that a design review and approval should occur before the end of a design phase, then there should be a corresponding procedure for design reviews. That procedure should indicate who should be involved in a review, what they should look for, how those people will communicate about the results, who is responsible for approving the design, and how they indicate approval.
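
For example, such a design-review procedure could be captured as structured data that the life cycle phase simply references. The fields below are a hypothetical sketch mirroring the questions just listed (who reviews, what they look for, how results are communicated, who approves); they are not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class Procedure:
    """A step-by-step procedure that a life cycle phase can reference."""
    name: str
    reviewers: list          # roles, not specific individuals
    checklist: list          # what the reviewers should look for
    communication: str       # how review results are reported back
    approver: str            # role responsible for approving the design

design_review = Procedure(
    name="component design review",
    reviewers=["developers of components that interact with this one"],
    checklist=[
        "design meets the component specification",
        "interfaces match the agreed interface specifications",
        "rationale for major design decisions is recorded",
    ],
    communication="written comments attached to the design artifact",
    approver="systems engineer for the parent component",
)
```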

The life cycle patterns are the basis for the project’s plan (Section 20.5). The patterns are a set of building blocks that people in the project can use to develop the plan. The plan, in turn, guides tasking: the selection of which tasks (as defined in the plan) people should be working on next.

21.4 Purpose of life cycle patterns

Life cycle patterns address problems that projects have. They can help the team have a predictable and reproducible flow to how work should be done, so that everyone shares the same understanding of how the team works.

There are six ways that life cycle patterns help a project.

  1. Quality of work. The team must build a system that addresses the customer’s purpose, and in doing so must meet quality, safety, security, and reliability objectives.
  2. Efficiency. The project will be expected to deliver the final system as quickly as possible, at the lowest reasonable cost, while meeting the quality objectives. This means that the team needs to be kept busy doing useful work.
  3. Team effectiveness. People on the team need to know how to work together. Building trust depends, in part, on having shared expectations of how each person will do their work.
  4. Management support. Project management will need to plan and track the work in order to ensure the team meets deadlines and that they have sufficient resources to do the work.
  5. Customer and regulatory support. The customer may have specific milestones they expect the project to meet in support of the customer’s acquisition processes. Regulators often have similar expectations if a system must be certified or licensed for operation.
  6. Auditing support. The project’s work may be audited to check that the processes followed meet regulatory requirements, certification requirements, or as part of a legal review.

Gaining these benefits is not a result of using life cycle patterns per se; rather, it comes from using patterns that are designed to provide the benefits. For example, if the customer has an acquisition process that specifies certain milestones, then the top-level life cycle pattern for the project should incorporate those milestones. If the project is likely to have auditing requirements, then the patterns should include tasks to generate and maintain auditing records.

Quality of work. The purpose of a project’s approach to operations is, in the end, to produce a system for the customer that meets their objectives. This means it should do what they need, meet safety and security needs, and support future system evolution. In other words, the team’s work needs to produce a system with good quality.

Neither the life cycle patterns by themselves nor the plan that derives from them directly result in good product quality. System quality comes from all of the detailed work steps that everyone on the team performs. If they do their work well, and if mistakes they make are caught and corrected, then the system can turn out well. If some work is not done well, nothing in the life cycle patterns can prevent that.

However, the life cycle patterns can create an environment that will more likely lead to good quality. They can proactively make flaws less likely by ensuring that steps happen in order: identifying purpose and concept before design and implementation, for example. They can insert points in the work that encourage people to think through what they should design or implement. They can also avoid problems by providing a checklist for what should be complete at the end of a work step. They can ensure that when a system is delivered, all the work needed to put it into operation is complete. They can build in checkpoints for reviews and verification to catch problems early. They also help project management organize the work so that it is complete, that is, so that no part of the system and no work step is overlooked.

Sometimes the value of a life cycle pattern will come from slowing down work. Most of the work done on a project is done by people who are focused on a particular part of the system; it is not their job to manage how the project goes as a whole. Their job is to get that one part designed and built, according to the specifications they have been given. If the specialists start building before the context for their work has been established, they are likely to design or implement something that does not meet system needs. I have been part of more than one project where the resulting rework caused the project to be canceled or required a company to get additional funding rounds to make up for the resources spent on the mistakes.

Efficiency. Most systems projects will be resource-bound, with more tasks than there are people on the team to do them. In this kind of project, it is important to keep each person busy with useful work. This means that nobody on the team is blocked with no tasks they can usefully perform. It also means that almost all the tasks that people perform contribute to the final system—that there is little work that has to be thrown out and redone because it had flaws that made it unusable.[1]

As project management builds the project’s plan, using the life cycle patterns as building blocks, they must detect where there are dependencies between work steps and plan the work steps so that later steps are unlikely to get blocked. For example, if some part will require an unusually long time to specify and acquire from an outside vendor, then the management will need to ensure that work on that part starts early. The life cycle patterns provide part of the structure on which the plan is based and a template for some of the dependencies.
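
A small sketch of that kind of dependency check: given rough lead times and dependencies between work steps, compute the earliest week each step can start, so that long-lead items (such as the outside purchase below) become visible early. The steps and numbers are invented for illustration.

```python
# Hypothetical work steps: name -> (lead time in weeks, prerequisite steps)
steps = {
    "specify sensor": (2, set()),
    "procure sensor": (16, {"specify sensor"}),   # long-lead outside purchase
    "design mount": (3, {"specify sensor"}),
    "integrate sensor": (1, {"procure sensor", "design mount"}),
}

earliest = {}

def earliest_start(name: str) -> int:
    """Earliest start = latest finish time among the step's prerequisites."""
    if name not in earliest:
        prereqs = steps[name][1]
        earliest[name] = max(
            (earliest_start(p) + steps[p][0] for p in prereqs), default=0)
    return earliest[name]

for name in steps:
    print(f"{name}: can start at week {earliest_start(name)}")
```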

Life cycle patterns can also help avoid unnecessary rework. This comes partly from the ways that the patterns help improve the quality of work. In particular, a good life cycle pattern can lead people to take the time to think through the purpose and specification of something before they jump into design and implementation unprepared, and then build something that does not meet the system’s needs.

Finally, the patterns can help bound the work to be done. When a project does not define the scope of work to be done, it is likely that someone will start working on something beyond or unrelated to the customer's needs. Good patterns help avoid this by defining an orderly and thoughtful process for identifying what work needs to be done.

Team effectiveness. Members of an effective team respect and trust each other. Having shared norms and understandings for how work is done and how people communicate is important as part of the environment that allows the team to develop respect and trust.

A defined life cycle for a project addresses part of this by defining a common understanding of how work should be done. Good patterns define expectations of what will be done in different work steps. Everyone on the team can agree when a work step has been completed. Good patterns also create times when people know they are expected to communicate about some work step. This makes it easier for someone to trust that they will be consulted at appropriate points about work that might affect what they are doing, so that they do not need to create separate, ad hoc communication channels or try to micromanage something that is not their direct responsibility.

As I have noted elsewhere (Section 20.8.3), the life cycle patterns can only have this benefit if the team actually follows them.

Management support. The team, or designated parts of it, will be responsible for making a plan (Section 20.5) for the project’s work, then coordinating and tracking the resulting tasks. The life cycle patterns provide templates for the tasks that will go into the plan, and the key milestones that anchor the work. The life cycle sets the pattern for phases that the project will go through, such as initial conception, initial customer acceptance, concept exploration, implementation, and verification. The cycle also sets the pattern for milestones that gate the progression from one phase to another, such as a concept review, a design review (and approval), or an operational readiness review.

The plan will change from time to time, both in response to external change requests and as the project progresses and the team learns more about the work ahead. Sometimes the need for change emerges gradually, with an issue slowly manifesting itself without causing an acute problem that would prompt people to recognize the need for change. A good life cycle will build in times for people to step back to get perspective and detect when there is a slow-building problem to address. Review milestones are often a good time to plan for this.

Having life cycle patterns and corresponding procedures that apply when these changes occur will help the team adjust their work in an orderly way. It will help them ensure that steps don’t get missed as they work out how to change the plan (and the system being built).

Good life cycle patterns can help a project steadily decrease its uncertainty and risk as work proceeds. Most of the time, a project will start with high uncertainty about what the system will look like, and early project phases result in increasing understanding of what the system will need to be. This process will repeat at smaller scales: once the general breakdown of the system into major components is decided on, each of those components will start with high uncertainty about how it will be structured. The uncertainty about the major components will then gradually resolve, and so on. However, this happens only when the project is guided so that uncertainty is addressed systematically, not haphazardly.

Customer and regulatory support. Many customers will have a process they go through to decide whether to build a system and to track its development process. For US governmental customers, much of the process is encoded in law or regulation, such as the Federal Acquisition Regulation (FAR) [FAR] or Defense Federal Acquisition Regulation Supplement (DFARS) [DFARS]. The process governs matters like which design proposal is selected for contract, providing evidence of good progress, providing information that determines periodic contract payments, accepting the finished system, and determining whether the project should continue or be terminated.

These customers will expect deliverables from the project from time to time. The life cycle process must ensure that there are milestones when these are assembled and delivered. (It is then the job of project management to ensure that these milestones, and the tasks for preparing deliverables, can be completed by the time line that the customer requires.)

Whether the customer requires explicit intermediate deliverables or not, formally involving the customer may be important for keeping the project on track.

Similarly, regulatory bodies have processes by which a system that must be certified or licensed before operation can apply for that approval. Those processes will define activities that the team must perform, along with milestones and deadlines by which applications must be submitted or approvals received.

Auditing support. A project’s development practices may be audited for many reasons. Auditors may perform a review as part of an appraisal or certification against standards, such as CMMI [CMMI]. They may review processes to ensure compliance with regulatory standards, especially for security-sensitive projects. The processes may also be audited as part of a legal review. These reviewers need to see both the entire definition of processes, including the life cycle patterns, as well as evidence of how well the team has followed these practices.

21.5 A model for patterns

Each project will have several life cycle patterns, each covering a different part of the work.

Each pattern is defined by its purpose, the circumstances in which it applies, the phases or steps involved, and the dependencies among the steps. It should also include rationale that explains why the pattern is structured the way it is. In a previous chapter I used the example of a simple pattern for building one component:

undisplayed image

This pattern applies to building one low-level component where the purpose of the component is already known, and the component is straightforward to design and build in house. Similar but slightly different patterns might apply when the component has to be prototyped before deciding on a design, or when the component is being acquired from a supplier outside the project. This pattern would be used as one part of a larger pattern for building a higher-level component that includes this one.

Each phase of a pattern defines a way to move part of the work forward. It should have a defined purpose stating what work is to be achieved in that phase.

undisplayed image

The details of the phase are defined by:

Each action should also indicate who is responsible for performing that work. The responsibility will usually be defined as a role, not a specific individual. For example, a component design phase might involve three actions: design the component, review the design, and approve the design. The design action would be the responsibility of the component developer; the review action would be the responsibility of the developers responsible for components that interact with the one being designed, and the approval would be the responsibility of a systems engineer overseeing some higher-level component of which this one is part.

The rationale for this example design phase might say:

The actions defined for the phase should reference the procedures for doing those actions, when those procedures are defined. For the example design review action, the procedure might be:

The procedure might also name the tools to be used (an artifact repository for the design, a review workflow tool for the reviews).
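
Pulling these elements together, one way to record a pattern is as structured data: its purpose, when it applies, its phases, the actions within each phase with their responsible roles and procedure references, and its rationale. The sketch below is hypothetical and deliberately abbreviated; a project would adapt the fields to its own documentation conventions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Action:
    name: str
    responsible_role: str            # a role, not a specific person
    procedure: Optional[str] = None  # reference to a documented procedure

@dataclass
class Phase:
    name: str
    purpose: str
    actions: list
    exit_criteria: list              # checklist for declaring the phase done

@dataclass
class LifeCyclePattern:
    name: str
    purpose: str
    applies_when: str                # circumstances in which the pattern applies
    phases: list
    rationale: str

design_phase = Phase(
    name="design",
    purpose="produce an approved design for the component",
    actions=[
        Action("design the component", "component developer"),
        Action("review the design",
               "developers of interacting components",
               procedure="component design review"),
        Action("approve the design",
               "systems engineer for the parent component"),
    ],
    exit_criteria=["design baselined", "review comments resolved"],
)

simple_component_pattern = LifeCyclePattern(
    name="build a simple in-house component",
    purpose="take a low-level component from known purpose to verified implementation",
    applies_when="the component's purpose is known and it is built in house",
    phases=[design_phase],           # implementation and verification phases omitted
    rationale="independent review catches flaws before later work builds on them",
)
```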

21.6 Documenting life cycle patterns

A team needs clear documentation of the phases if they are to execute them properly. A team can’t be expected to guess at what they need to be doing, or how their work will be reviewed; it needs to be spelled out.

This documentation is assembled during the project preparation phase. The details are usually not completely worked out before any other work is begun; rather, “project preparation” more often proceeds in small increments, working out the rules shortly before the associated work begins.

Each life cycle pattern should have a purpose, and the steps or phases in the pattern should be checked to confirm that they can achieve that purpose (and that there is no extraneous work in the pattern).

A pattern should also have an explanation of when it applies and when it does not. For example, there may be multiple patterns for designing a component: one for a simple component that is built in house; one for a component that is outsourced to a supplier; one for a high-level component that is made up of several lower-level components; one for a component that requires investigation or prototyping before deciding on a conceptual approach to its design. All these patterns likely have a lot in common, but procuring an outsourced component will have contracting steps that an in-house component will not.

Someone using the documentation should be able to tell accurately whether they are using the correct version of the patterns. The life cycle patterns will be revised from time to time—as the team grows and as people find ways to improve how they work together. This means that the material a user sees should carry not just a revision number but also a clear indication of whether the version they are looking at is no longer current.

The form of the documentation is not as important as the content. It can be a written document. It can be made available electronically, with structured access and search capabilities (such as in a Wiki). Some companies offer tools that help define and document development processes or life cycle patterns, including definitions of phases. What matters is that each person who needs to use the documentation can do so conveniently and accurately.

21.7 Work steps and artifacts

Each phase or step has a number of artifacts that the team must develop. At the end of a phase, some of those artifacts need to be complete (allowing for future evolution), and others need to have reached some defined level of maturity. The work in a phase consists of the tasks that develop those artifacts.

I discussed artifacts in Chapter 17. The artifacts are the products of building the system, including the system being delivered as well as documentation of its design and rationale, records of actions taken during development, and information about how the project operates.

These artifacts are the inputs and outputs of the work specified in life cycle patterns (and the associated procedures). Using the component design step example, the work uses:

The design step produces:

In general, every artifact involved in building the system should be a product of some work phase or step, and every input or output of work steps should be included in the set of artifacts the team will develop. Ideally, the life cycle patterns will be checked for consistency with the list of artifacts the project uses.
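
That consistency check can be done mechanically. The sketch below assumes the patterns' inputs and outputs and the project's artifact list are available as plain collections; the step and artifact names are illustrative.

```python
# Hypothetical pattern steps and their input/output artifacts
pattern_steps = {
    "design component": {
        "inputs": {"component specification", "interface specifications"},
        "outputs": {"component design", "design rationale"},
    },
    "implement component": {
        "inputs": {"component design"},
        "outputs": {"component implementation"},
    },
}

# The project's declared artifact list
project_artifacts = {
    "component specification", "interface specifications",
    "component design", "design rationale", "component implementation",
}

# Check 1: every input or output named in a step appears in the artifact list.
for step, io in pattern_steps.items():
    for artifact in (io["inputs"] | io["outputs"]) - project_artifacts:
        print(f"{step}: references unlisted artifact '{artifact}'")

# Check 2: every listed artifact is used or produced by at least one step.
touched = set().union(
    *(io["inputs"] | io["outputs"] for io in pattern_steps.values()))
for artifact in sorted(project_artifacts - touched):
    print(f"artifact '{artifact}' does not appear in any work step")
```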

Artifacts are developed at different times during the course of a project. A few artifacts should be worked out as the project is started—especially those recording the initial understanding of the system’s purpose and initial documentation of how the project will operate. These will be refined over time. Other artifacts are developed during the course of development, and the life cycle patterns indicate which ones are to be worked out before others. The artifacts will be in flux during development: the team learns about the system as it designs and develops it; the customer or mission needs often change over time; flaws get discovered in designs or implementations.

Many of the project’s artifacts support how people work together, and the life cycle patterns should reflect these communication needs. For example, one person may work out the protocol that two components need to use to communicate with each other. Two other people may design and implement the two components. The interface specification that the first person develops serves to communicate the details of the interface among all three people. The patterns should record that the design and implementation work steps depend on the work to develop the interface specification. Later, if one of the component developers identifies a flaw in the interface, the people involved can work through how to revise the interface—and the revised specification artifact informs each person how to update their work to match the change. The pattern helps to show how information about a change to the interface specification triggers rework on dependent artifacts.

A good life cycle pattern has procedures to manage the change in artifacts, and how those changes affect other artifacts downstream from them. There are two separate problems these procedures must address:

  1. Managing how changes are coordinated across multiple artifacts and through the team while a part of the system is in development; and
  2. Ensuring that when a part of the system is complete, all the artifacts are consistent with each other.

Different life cycle patterns approach this in different ways, which we will discuss in later chapters on different patterns. The most common approach is to maintain different versions of an artifact, with at most one version being designated as a baseline or approved version, and other versions designated as works in progress. Many configuration management tools have a way to designate a baseline version, and many software repository tools provide branching and approval mechanisms to track a stable version.
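
As a small illustration of the baseline idea: each artifact keeps several versions, at most one of which is designated the baseline, and approving a new baseline flags the artifacts that depend on it for possible rework. The names and fields here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Artifact:
    name: str
    versions: list
    baseline: Optional[str] = None           # at most one approved version
    depends_on: list = field(default_factory=list)

artifacts = {
    "interface spec": Artifact("interface spec", ["v1", "v2-draft"], baseline="v1"),
    "component A design": Artifact("component A design", ["v1"], baseline="v1",
                                   depends_on=["interface spec"]),
    "component B design": Artifact("component B design", ["v1"], baseline="v1",
                                   depends_on=["interface spec"]),
}

def rebaseline(name: str, version: str) -> list:
    """Approve a new baseline version and list downstream artifacts to revisit."""
    artifacts[name].baseline = version
    return [a.name for a in artifacts.values() if name in a.depends_on]

print(rebaseline("interface spec", "v2-draft"))
# ['component A design', 'component B design']
```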

21.8 Life cycle and teams

What is the team size and background? How is it expected to change over time? A small team can often be a little less formal than a large team, because the small team (meaning no more than 5-10 people) can keep everyone informed through less formal communication. A large team is not able to rely on informal communication, so more explicit processes and communication mechanisms are important. Many teams start small when the project is first conceived, but grow large over time. A team that will grow will need to communicate more formally from the beginning than they otherwise might so that as they add people to the team, the larger team works smoothly.

Conversely, if the life cycle patterns indicate that some action will be performed by some person, does the team actually have the staff to do that work? When a project says that some work is to be done and then does not staff that function sufficiently, it sends a message to the team that they should not take the process as written seriously. This undermines the team’s trust. If the function is actually needed, either the team will find an ad hoc workaround or the function will not get done adequately. Either way, there will be a disconnect between what is written down and what actually happens.

21.9 Life cycle and planning

The life cycle patterns are just patterns that provide a guide to work that goes in the project’s plan. The plan is the actual definition of the tasks to be done. When the plan needs to be updated, the patterns provide a template for the work that goes into the plan.

Assembling the plan, however, takes into account many inputs, of which the pattern is only one. Planning involves deciding on the priority and deadlines for work, which is based on project deadlines, risk or uncertainty, and the project’s development methodology.

Chapter 61 discusses in detail how the plan is developed and maintained, including how the life cycle patterns get incorporated.

Consider the following example of how a pattern gets incorporated into the plan. This example shows how the pattern is only a template, and there are many decisions that will depend on other information.

This pattern defines what should happen when a customer requests a change. The basic pattern is that first someone on the team should evaluate the request; this may involve working with the customer to clarify the request, and with other engineers to estimate the scope and cost of the work. The project can then make a decision whether to accept the change or not. If the decision is to make the change, work to build, release, and deploy the update will follow. If not, there is another pattern for how to communicate with the customer that the change will not be made.

undisplayed image

The activity starts when the project receives a change request. Based on this, the plan can be updated to include three tasks right away: the evaluation, review, and decision tasks.

At the same time, the planner must make decisions: who should each task be assigned to? What priority should the flow of tasks have? The pattern can indicate the roles involved in the tasks, such as there being a small team responsible for evaluating change requests and a customer representative from the marketing team, but it doesn’t determine which specific people. That’s for the planning and tasking efforts to determine. Similarly, the pattern does not specify how the work should be prioritized relative to other work the same people are doing. The planner incorporates information about how urgent the customer’s request might be and the importance of the customer into the decision. The project might have decided, for example, that there should be a queue of outstanding change requests and they should be evaluated in their order in the queue.
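
To sketch how the pattern turns into plan entries: the first three tasks can be generated from the pattern as a template, with roles carried over from the pattern and assignees and priority left for the planner to decide. The identifiers and roles below are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PlanTask:
    name: str
    role: str                        # comes from the pattern
    assignee: Optional[str] = None   # decided by the planner, not the pattern
    priority: Optional[int] = None   # decided from queue position, urgency, etc.

# The first three steps of the change-request pattern, as (task, role) pairs
CHANGE_REQUEST_PATTERN = [
    ("evaluate the change request", "change evaluation team"),
    ("review the evaluation", "engineers for affected high-level components"),
    ("decide whether to proceed", "customer representative"),
]

def instantiate(request_id: str, queue_position: int) -> list:
    """Create plan tasks from the pattern template; assignment comes later."""
    return [
        PlanTask(name=f"{request_id}: {task}", role=role, priority=queue_position)
        for task, role in CHANGE_REQUEST_PATTERN
    ]

tasks = instantiate("change request 42", queue_position=3)
tasks[0].assignee = "a member of the evaluation team"   # the planner's decision
```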

Determining who should be involved in a review of the evaluation might depend on the results of the evaluation. The pattern might indicate that the evaluation should be reviewed by engineers responsible for each high-level component that will be affected by the change. This means that the decision about who specifically will be tasked with the review can’t be made until the evaluation has worked out the scope of the change.

The decision to proceed with making an update will depend in part on whether the team has the time and resources to make the update. The team will need to determine whether adding the work to the plan will cause a problem with meeting deadlines that have been established already, or if it will overload a team that is already busy. This determination will involve analysis of the current plan—something that the life cycle pattern can help with only to the extent that the patterns can help with generating estimates of the work that would be involved.

When the project decides to go ahead with developing an update for the request, the pattern shows that work steps follow to develop the change and then release and deploy the update. That decision triggers the planning activity to add development and release work into the plan. These are high-level work steps with little detail. The planner will find patterns for these steps and populate those patterns into the plan.

Decisions about the work involved in development will depend on the development methodology that the team has selected to follow. If the update will involve extensive changes and the team is following a spiral-style methodology [Spiral], the development plan might consist of two or three development rounds. Each round would design and implement part of the changes, with a milestone at the end of each round showing how the partial changes have been integrated into the system.

Decisions about the release and deployment work will also incorporate policy decisions about how the team works. Will each change request result in a separate update release? Or will updates be bundled together into releases that combine several updates, perhaps on a schedule defined in advance?

21.10 Principles for a life cycle pattern

In this section I list some principles to consider when designing a workflow pattern.

The act of designing—or refining—a life cycle pattern is an opportunity to think deliberatively about how the team should get its work done. Life cycle patterns are the templates for the project’s plan, and so they should be designed to achieve the work that is needed to move the project forward well.

Designing the patterns ahead of time means having time to define good work patterns. The pattern does not have to be worked out under pressure, as a reaction to something unexpected happening in the project. It can be discussed among multiple team members to get different perspectives and to ensure everyone’s needs are met. Working in advance gives time to check that the steps in the pattern are consistent with each other. It means that there is time to think about what exceptional situations might happen and define what to do in those cases.

Note that if an organization already has an approach to life cycle patterns, whether documented or not, one should aim for continuity with that approach. Anyone already in the organization will know that approach to organizing work; making a major change would mean losing the advantage of established team habits. On the other hand, if the current approach is not working well, then a new project is an opportunity to improve.

The life cycle patterns encode principles and methodology that encourage good work. Principles to consider include:

  1. Know the purpose for something before developing it.
  2. Build in time for and incentivize deliberative thinking.
  3. Assign decision-making authority to an appropriate level based on the nature of the decision.
  4. Build in ways to check work, and design them so they are a team norm and not prone to triggering defensive reactions.
  5. Build for the longer term.
  6. Build in project-wide decision points.
  7. Think about exceptions that might happen, how to handle them, and when to change course.
  8. Define the work so that everyone on the team can agree when a step has been completed.
  9. Give a clear definition for each step of the quality considerations by which the work can be judged.
  10. Make the pattern as light-weight as possible without compromising quality.

Purpose. I have mentioned this principle several times already, and I believe it is a basic principle of effective system-building. The life cycle patterns encode this principle for specific parts of the team’s work.

As with anything else that is designed, a pattern itself starts with a purpose. That purpose might be “build a simple component” or “build the whole system” or “handle a customer’s change request”. A good pattern addresses its purpose thoroughly, without trying to achieve other purposes.

The pattern that results should then ensure that team members follow this approach when building parts of the system. If the pattern is for handling a customer’s change request, for example, the pattern should address understanding and documenting what the customer wants changed (and why), before starting to work out whether to agree to the change or to begin implementing the change.

Time to think. Key parts of a complex system are best served by taking some time to properly understand the purpose or need of that part, and to look at options for how it can be designed or built. A project running at too fast a pace skips this thinking and uses the first thing that someone thinks of—though there may be subtle ramifications of that decision that are not appreciated until it causes a problem later. Asking someone to document the alternatives they considered, and rewarding them for doing so, improves the quality of the system.

At the same time, people can take too long to make a decision or fixate on making it perfectly. The time spent on deliberation should be bounded to avoid this.

Decision-making authority. Bezos introduced the idea of reversible and irreversible decisions [Bezos16]. He wrote:

Some decisions are consequential and irreversible or nearly irreversible—one-way doors—and these decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before. We can call these Type 1 decisions. But most decisions aren’t like that—they are changeable, reversible—they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through. Type 2 decisions can and should be made quickly by high judgment individuals or small groups.

As organizations get larger, there seems to be a tendency to use the heavy-weight Type 1 decision-making process on most decisions, including many Type 2 decisions. The end result of this is slowness, unthoughtful risk aversion, failure to experiment sufficiently, and consequently diminished invention.

For engineering projects, many decisions fall in the middle ground between reversible and irreversible. Consider building an aircraft. As long as the designs are just drawings, the designs can be changed with low to moderate cost. Early in the design process changes can be quite low cost; as the design progresses and more and more interdependent components are designed, the cost of rework increases. Once the airframe has been machined and assembled, the cost of changing its basic design becomes high, possibly high enough in time or in money that it is in effect irreversible.

Good life cycle patterns will account for different costs of reversing decisions. They should both build in time for deliberation and consultation before making hard-to-reverse decisions and use lighter-weight decision-making for less risky decisions. Similarly, the patterns should ensure that the authority for hard-to-reverse decisions is assigned to someone with high-level responsibility in the project, while the authority for low-risk decisions should be placed as close to the work as possible.
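
One way a pattern could encode this principle is a simple mapping from the estimated cost of reversing a decision to the level of authority and amount of deliberation required. The thresholds and role names below are purely illustrative assumptions, not a recommendation of specific values.

```python
def decision_process(reversal_cost_days: float) -> dict:
    """Map the estimated cost of reversing a decision (in person-days of
    rework) to a decision-making approach. Thresholds are illustrative."""
    if reversal_cost_days < 5:
        return {"authority": "component developer",
                "steps": ["decide and record the decision"]}
    if reversal_cost_days < 50:
        return {"authority": "component lead",
                "steps": ["consult affected developers",
                          "decide and record the decision"]}
    return {"authority": "project engineering board",
            "steps": ["written analysis of options",
                      "formal review and consultation",
                      "decision at a project-wide decision point"]}

print(decision_process(2)["authority"])     # component developer
print(decision_process(200)["authority"])   # project engineering board
```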

Checking work. Checking that work has been done well is commonly understood to improve the quality of results. It is essential for parts of a system that require high assurance—safety- or security-critical parts.

The key to checking is that the checks not be subject to implicit biases that the developer might have. This can be handled either by the developer doing analyses that force a stepping back from decisions (perhaps by encoding them mathematically) and that can be checked for accuracy by someone else, or by having an independent person review the work.

Either way, the developer’s pride in their work can feel threatened. Setting out life cycle patterns in which every part of the work is checked enables the project to make checks a norm. Designating in advance that checks will happen, and who will do them, helps depersonalize the effort and in the long term contributes both to quality work and team morale.

Building for the longer term. It is easy to solve an immediate problem at hand quickly and move on, leaving a problem for the future. Taking time to think about the problem (the principle of taking time for deliberative thinking, above) will help but is not sufficient.

It is likely that someone will revisit the work sometime in the future. They may need to understand the work in order to fix a flaw or make an upgrade. They may be auditing the work as part of a critical safety review. They will need to know the rationale for decisions that were made, and they will need to understand subtle aspects of the work. If this information has been documented, these people in the future will be able to do their work accurately and relatively quickly. If they have to deduce this information by looking at artifacts built in the work, they will have to spend time reverse-engineering the work and their accuracy is generally low.

Building checks into the pattern for documenting rationale and explanations will accelerate future work.

Project-wide decision points. Most projects have times when there is a decision whether to proceed or to cancel or to redirect the project. These include whether to start, times when funding is needed, public announcements, and irrevocable steps like launch. These decision points generally require work to prepare for them, which should be accounted for.

Exceptions. Things often go not to plan. What then? Who needs to know? What needs to be done to respond?

Sometimes this is as simple as setting an expectation for the team. If a component’s specification is inconsistent or cannot be met, who gets informed, and how does the problem get corrected?

Sometimes the situation is time-critical. If a major piece of equipment catches fire, what is the response? What if an insecure component has been incorporated and deployed? What if a large part of the system has been built, and someone finds a fundamental flaw? The responses to situations like these are complex, and there often isn’t time in the moment to work out the details.

Good life cycle patterns include pre-planned responses to these exceptional situations. This might consist of references to procedures that should be followed, or it might reference a pattern used to respond to the situation.

Completeness. Can everyone on the team agree when a part of the work has been completed? The person assigned a task should understand their assignment, so that they can do their work independently. Others will check the work, or mentor the person doing the work—and they should have the same understanding of the assignment.

The definition of actions, as well as the list of outputs and post-conditions for a pattern, should be clear to everyone.

Quality considerations. As with completeness, the people assigned to work on tasks need to have a clear definition of what makes the results of their work acceptable, or what makes one way better than another. Sometimes this is simple: when objectives or specifications, which would be inputs to a work step, are met. Other times considerations of quality arise not from specifications but from things like coding standards. In those cases the quality considerations should be spelled out explicitly so the people doing the work know to use them.

Light-weight patterns. Good patterns are lightweight enough to get their job done, and not more (Section 20.8.3). Working out the pattern in advance is an opportunity to work out what parts of the work are truly needed and which can be omitted or simplified. For example, a pattern should be adapted to the possible cost of making a wrong decision (see decision-making authority above). Patterns that involve easily-reversible decisions should include streamlined decision-making steps, pushing the decision authority to as low a level in the team as possible and involving as little work as possible. On the other hand, more difficult decisions should involve a pattern that calls for greater deliberation, more checking and consultation, and places decision-making authority higher in the team’s hierarchy.

Similarly, the patterns should be achievable by the team. If the team is small, it makes no sense to mandate complex work flows for which there isn’t the staff. Each decision about what to include in a pattern should be measured against what is possible for the team to perform.

Sidebar: Summary

Chapter 22: Development methodologies

2 October 2024

22.1 Purpose

A project should choose life cycle patterns that fit with its development methodology.

A development methodology is the overall style of how a project decides to organize the steps in developing the system. This includes decisions like whether to develop the system in increments of functionality, whether to design everything before building, whether to synchronize everyone’s efforts to a common cycle, and so on. These decisions are reflected in obvious ways in the life cycle patterns a project uses.

There are many methodologies named in the literature: waterfall, spiral, agile, and so on. Different sources interpret each of these differently, and they are rarely compared on a common basis. Some of these, like waterfall methodology, have evolved over time and do not have a single clear source or definition. Others, such as agile development, have a defining document (manifesto) to reference.

All of the methodologies I know of have come to be treated as dogma, and are more often caricatured than treated thoughtfully. This is unfortunate because each of the methodologies has something useful to offer, while all of them are harmful to project effectiveness if taken as dogma or used without thoughtful understanding.

These methodologies can be organized and compared based on a few characteristics.

Rather than try to argue for or against any specific methodology—which is difficult, since most methodologies are hard to pin down among many published variants—I focus on these characteristics, and argue for choosing a methodology that has the characteristics that a project needs.

Size of design-build cycle. Methodologies like waterfall use “big design up front”, where the entire system is specified and designed before implementation begins. Other methodologies break up development into many specify-design-implement cycles.

undisplayed image
Size of design-build cycle.

The argument for doing as much design up front as possible is that errors are easier and cheaper to catch and correct before implementation than after. The arguments against are that in some complex systems the design work is exploratory and requires implementing part of the system to learn enough to know how to design—or not design—critical system parts.

Many iterative methodologies claim to be better at supporting adaptation as system purposes change.

Coupled or decoupled design-build. Some iterative methodologies plan to complete adding a feature to the system in one iteration, by executing an entire specify-design-build-integrate cycle for that feature. Other methodologies break up that cycle into multiple steps, and allow those steps to spread across multiple iterations.

undisplayed image
Coupled or decoupled design-build.

Advance planning. Some methodologies emphasize planning out work activities as far as possible into the future, while others focus on planning as little as possible in order to adapt as needs change.

undisplayed image
Planning to different horizons.

The argument for planning as far as possible into the future is that it gives the team stability: they have a reasonable expectation of what they should be working on now and have a sense of how that work will flow into other tasks soon after.

The argument for planning to shorter horizons is that someone will come along and change priorities or system purpose, and so the work will need to be changed to adapt. Planning too far ahead is wasted effort, it is argued, and gives teams a false sense of stability.

Regular release or integration. When a methodology uses many design-implement cycles, at the end of each cycle it can require that new implementations be integrated into a partially-working system, or it can go farther and require that the partially-developed system be releasable. Most iterative methodologies recognize that very early partial systems may not be releasable because they are too incomplete.

Regular release is feasible for products that are largely software, where a new release can be put into operation for low effort. It is less feasible for products that involve a large, complex hardware manufacturing step between development and putting a system into operation.

The choice of whether to release regularly or not is often dictated by the relationship with the customer(s) and whether the system is still being implemented the first time, or is in maintenance. Once the system has been deployed, development is likely either for fixes or for new features; these are often released and deployed as soon as possible.

Synchronization across project. Some methodologies that break up development into multiple iterations align all the work being done at one time so that the iterations begin and end together. Other iterative methodologies allow some work iterations to proceed on different timelines from other work.

undisplayed image
Synchronized versus unsynchronized tasks.

Synchronizing work iterations across the whole project can provide common points to check that work is proceeding as it should and to share information about progress. However, it can also break up tasks that run far longer than others and result in a perception that the synchronization is wasteful management overhead rather than something useful.

Shared short-term purpose across project. Iterative methodologies can focus the entire team on one set of features across all the work going on at one time, or they can allow different streams of work to have different focuses in the short term.

The argument for this practice is that the more people share a common goal, the more they will be motivated to work together to meet that goal and to defer work that does not address that common goal. The argument for having multiple work streams with different focuses is that too often a project will involve work from different specialties and on different timelines: mechanically assembling an airframe and building a flight control algorithm have little in common.

undisplayed image
Shared purpose.

22.2 Methodologies

I present three of the most commonly discussed development methodologies in order to illustrate how they can be characterized. Each of these methodologies has many variants, and all are the subjects of debates comparing tiny details of each variant. The purpose of this section is to illustrate how they can be analyzed, not to capture all nuances of every methodology in use.

Waterfall development. This approach to development follows the major life cycle phases in sequential order. It begins with concept development, moves through specification to design, and only then begins implementation.

Waterfall development is well suited to building systems that have decision points that are difficult or expensive to reverse. The NASA project life cycle (Section 23.2.1) follows a waterfall-like sequence for its major phases because there are three decision points that do not allow for easy adjustment: getting government funding approval; building an expensive vehicle; and spacecraft launch.

This methodology can be inefficient when the system cannot be fully specified up front. When the system’s purpose changes mid-development, or when some early design decision proves to have been wrong, the methodology does not have support built in for how to respond. Projects using this kind of methodology are known to have difficulty sticking to schedules and costs that were developed early in the project, usually because something happened that was not anticipated at the start.

In one spacecraft design project I worked on (Section 4.1), the team assembled a giant schedule for the whole project on a 20-foot-long whiteboard. This schedule detailed all the major tasks needed across the entire system. That schedule ended up requiring constant modification as the work progressed.

Waterfall development requires great care when building a system with significant technical unknowns. The serial nature of execution means that some important decisions must be made early on, when little information is available on which to base that decision. When those unknowns are understood, the project can put investigation or prototyping steps into the specification or design phases in order to gather information for making a good decision. On the other hand, if the team does not learn that some technical uncertainty exists until the project is into the implementation phase, the cost of correcting the problem can be higher than with other methodologies. In addition, the sequential nature of execution can create an incentive for a team to muddle through without really addressing the unknown, resulting in a system that does not work properly.

In the spacecraft design project I mentioned, there were technical problems with the ability of the spacecraft to communicate with each other. These problems were not properly identified and investigated in the early phases of the project. As the team designed and implemented parts of the system, different people tried to find partial solutions in their own areas of responsibility, but the team overall continued to try to move ahead. In the end the problems were not solved and the spacecraft design was canceled.

Waterfall characteristics
Cycle size: One design-build cycle for the whole project
Coupled design-build: One cycle, so implicitly coupled
Planning: Plan as far as possible, especially after design
Release and integration: At end of project
Synchronization: n/a
Short-term purpose: n/a

Iterative and spiral development. This development methodology is characterized by building the system in increments. Each increment adds some amount of capability to the system, applying a specify-design-build-integrate cycle. Typically the whole team works together on that new capability.

Early increments in such a project often build a skeleton of the system. The skeleton includes simple versions of many components, along with the infrastructure needed to integrate and test them. Later increments add capabilities across many components to implement a system-wide feature.

Teams using iterative development often plan out their work at two levels: a detailed plan for the current iteration, and a general plan for the focus of the iterations that will follow.

This methodology builds in more flexibility to handle change than the waterfall methodology does.

Iterative development can be used to prioritize integration (Section 8.3.2), in order to detect and resolve problems with a system’s high-level structure as early as possible. This involves integration-first development, where the team focuses on determining whether the high-level system structure is good ahead of putting effort into implementing the details of the components involved.

Iterative and spiral characteristics
Cycle size: One set of features crossing the whole system
Coupled design-build: Generally add to design and implementation for the feature(s) in the iteration
Planning: At the beginning of each iteration; variants maintain a roadmap of iterations or spirals
Release and integration: Either; every iteration ends with an integrated working system
Synchronization: All work synchronized to the iteration
Short-term purpose: Shared within the iteration

Agile development. The agile methodologies—there are many variants—focus the team on time-limited increments, often called sprints. The approach is to maintain a list of potential features to build or tasks to perform (the backlog). At the beginning of a sprint, the team selects a set of features and tasks to do over the course of that sprint. By the end of the sprint, the features have been designed, implemented, verified, and integrated into the system. In other words, there is a life cycle pattern that applies to building each feature within a sprint.
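
A minimal sketch of that sprint-planning step: pick items from the backlog in priority order until the team's capacity for the sprint is filled. The backlog fields and the capacity measure are assumptions for illustration, not part of any particular agile method.

```python
# Hypothetical backlog items: (name, priority, estimated effort in person-days)
backlog = [
    ("fix login timeout", 1, 3),
    ("add export to CSV", 2, 5),
    ("refactor storage layer", 3, 8),
    ("update user guide", 4, 2),
]

def plan_sprint(items, capacity_days):
    """Select backlog items in priority order until capacity is used up."""
    selected, remaining = [], capacity_days
    for name, _priority, effort in sorted(items, key=lambda item: item[1]):
        if effort <= remaining:
            selected.append(name)
            remaining -= effort
    return selected

print(plan_sprint(backlog, capacity_days=10))
# ['fix login timeout', 'add export to CSV', 'update user guide']
```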

Agile development aims to be as responsive to changes as possible. The start of each sprint is an opportunity to adjust the course of the project as problems are found or the team gets requests for changes. The agile methodologies arose from projects that were trying to keep the customer as involved as possible in development, so that the team’s work would stay grounded in customer needs and so that the customer could give feedback as their own understanding of their needs changed.

At their worst, the agile methodologies have been criticized for three things: an excess of meetings, drifting focus, and difficulty handling long-duration tasks. Note that these critiques come from people in teams who claim to be using agile methodologies, and reflect problems with the way teams implement agile approaches and not necessarily problems with the definition of the methodology itself.

Agile development emphasizes continuous communication within a team. In practice, this can lead to everyone on the team having multiple meetings each day: daily stand up meetings, sprint planning, sprint retrospectives, and so on. This likely comes from teams using meetings as the primary way to communicate, and from democratizing planning decisions that could be made the responsibility of fewer people.

Some agile projects have been characterized as behaving like a particle in Brownian motion: taking a random new direction in each iteration or sprint. This can happen when the team only looks at its backlog of needed tasks each iteration, or when new outside requests are given priority over continuing work. The focus on agility and constant re-evaluation of priorities can lead teams to this behavior, but it is not integral to the ideal of agile development. A team can develop a longer-term plan and use that plan as part of prioritizing work for each new sprint.
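
One way to keep sprint planning anchored to a longer-term plan is to score backlog items against the current roadmap theme as well as their standalone value. The sketch below is only illustrative; the fields, weights, and item names are assumptions for the example, not a prescription.

    # Illustrative sketch: scoring backlog items against a longer-term roadmap so
    # that sprint planning is not driven only by whatever request arrived most
    # recently. Fields, weights, and item names are assumptions for the example.

    from dataclasses import dataclass

    @dataclass
    class BacklogItem:
        title: str
        value: int          # standalone value to the customer (1-5)
        roadmap_theme: str  # which roadmap theme this item advances, if any
        effort: int         # rough cost in ideal days

    def sprint_priority(item: BacklogItem, current_theme: str) -> float:
        """Higher scores are planned first; items on the current theme get a boost."""
        theme_bonus = 3 if item.roadmap_theme == current_theme else 0
        return (item.value + theme_bonus) / max(item.effort, 1)

    backlog = [
        BacklogItem("Export report as CSV", value=2, roadmap_theme="reporting", effort=2),
        BacklogItem("Harden login flow", value=4, roadmap_theme="security", effort=5),
        BacklogItem("Fix crash on startup", value=5, roadmap_theme="", effort=1),
    ]

    # Sprint planning for a sprint whose roadmap theme is "security".
    for item in sorted(backlog, key=lambda i: sprint_priority(i, "security"), reverse=True):
        print(f"{sprint_priority(item, 'security'):.1f}  {item.title}")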

Finally, many complex systems projects involve long-running tasks that do not fit the relatively short timeline of sprints or iterations. Acquiring a component from an outside vendor or manufacturing a large, complex hardware component do not really fit the model of short increments.

Agile characteristics
Cycle size: One short iteration with many independent features and tasks, bounded in duration
Coupled design-build: Some agile practices focus on features, with a design-build cycle within one sprint to implement a feature; other agile practices decouple designing, building, and verifying, allowing those to be spread over multiple iterations
Planning: At the beginning of each iteration; variants have a longer-term general plan
Release and integration: Either; every iteration ends with an integrated working system
Synchronization: All work synchronized to the iteration or sprint
Short-term purpose: Each task has its own purpose
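
The characteristic rows in these three tables can also be captured as a small record that a project can fill in when documenting its own hybrid methodology. The sketch below simply mirrors the rows above; the field names and the condensed values are mine, not part of any standard.

    # The characteristic rows from the tables in this chapter, captured as a record
    # that a project can fill in for its own hybrid methodology. The field names
    # mirror the table rows; the condensed values are mine, not part of any standard.

    from dataclasses import dataclass

    @dataclass
    class MethodologyProfile:
        cycle_size: str
        coupled_design_build: str
        planning: str
        release_and_integration: str
        synchronization: str
        short_term_purpose: str

    waterfall = MethodologyProfile(
        cycle_size="One design-build cycle for the whole project",
        coupled_design_build="One cycle, so implicitly coupled",
        planning="Plan as far as possible, especially after design",
        release_and_integration="At end of project",
        synchronization="n/a",
        short_term_purpose="n/a",
    )

    agile = MethodologyProfile(
        cycle_size="One short, time-bounded iteration with many independent tasks",
        coupled_design_build="Per-feature within a sprint, or spread over iterations",
        planning="At the start of each iteration; optionally a longer-term plan",
        release_and_integration="Every iteration ends with an integrated working system",
        synchronization="All work synchronized to the iteration or sprint",
        short_term_purpose="Each task has its own purpose",
    )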

22.3 Practical considerations

Most projects actually use a hybrid of the different methodologies. They may start from one of the generally available methodology definitions, but they adapt that template based on the needs of their project and their own experience.

I have been part of several projects that had hard decision points, reflected in project milestones. At these points, the project was expected to provide information that would lead to the project continuing or being canceled: the decision to award the team a contract to build the system, or decisions to continue funding. These decision points impose a degree of waterfall-like structure on the work. For example, one project had to present a proposal to a government agency in order to get a contract to perform detail design and prototype implementation. That proposal involved developing the concept for the system, showing how it met the customer’s objectives, and showing that there was a likely feasible design.

However, once the contract had been awarded, the team used a spiral development methodology to build a sequence of increasingly capable versions of the system, and used those to demonstrate the completion of defined features at the end of each spiral. Within each spiral, the software team used an agile-like approach based on two-week sprints.

Every successful project I have been part of has done some degree of planning and high-level design well ahead of detail design and implementation. When the project used spiral or iterative development, the general flow from one spiral to another provided guidance to keep the work on track to reach the defined end point. When the project used agile methods to manage tasks in the short term, the longer-range plan kept the tasking decisions in each sprint from going off track by providing a basis for prioritizing the work.

No matter what methodology the projects have followed, they have all had some kind of regular cycle for checking in so that project leadership could find out when there were problems, and so that team members could maintain awareness of progress across the whole project. In some cases this looked a lot like agile practice, with short daily stand up meetings and regular, more in-depth discussions. In other cases the nature and schedule for checking in depended on the part of the system people were working on—from continuous interaction in some parts of software development to weekly updates from people doing safety analysis.

All the projects I have worked on have also had tasks that varied widely in duration. Many software-related tasks were short, in the range of hours to a few days, while testing or hardware implementation often required weeks to complete. Most of the projects did not try to make all these tasks fit into a synchronized schedule across the project.

In practice, therefore, the projects I have seen succeed have applied common sense in choosing the development methodology for their specific project.

Sidebar: Summary

Chapter 23: Example life cycle patterns

2 October 2024

23.1 Introduction

In this chapter I survey some of the many different life cycle patterns in use.

The patterns have different scopes. Some cover the whole life of a system, from conception through retirement. Some are concerned only with developing a system. Others focus on more narrow parts of the work.

I group the patterns in this chapter into four sets, based on scope. The first group covers the whole life of a project, without much detail in the individual steps. The second dives into the development process. The third addresses post-development processes—for releasing and deploying a system; these patterns overlap with development processes. The fourth and final group is for patterns with a narrow focus on some specific detail of building a system.

Patterns with different scopes can potentially be combined. Most patterns that cover a system’s whole life, for example, define a “development phase” but do not detail what that is. One of the patterns for developing a system can be used for the details.

Each of the examples will include a comparison against the following baseline pattern for the whole life of a project.

undisplayed image

The baseline phases are the same as in Section 21.3:

23.2 Whole project life cycle

These patterns organize the overall flow of a project, from its inception through system retirement and project end. I have selected two examples: the NASA project life cycle, which is used in all NASA projects big and small, and the Rational Unified Process, which arose from a more theoretical understanding of how projects should work.

23.2.1 NASA project life cycle

The NASA life cycle has been refined through usage over several decades. It is defined in a set of NASA Procedural Requirement (NPR) documents. The NASA Space Flight Program and Project Management Requirements document [NPR7120] defines the phases of a NASA project.

The NASA life cycle model is designed to support missions—prototypically, a space flight mission that starts from a concept, builds a spacecraft, and flies the mission.

NASA space flight missions involve several irreversible decisions, and this is reflected in how the phases and decisions are organized. Obtaining Congressional funding for a major mission can take months or years. During development, constructing the physical spacecraft, signing contracts to acquire parts, and allocating time on a launch provider’s schedule are all expensive and time-consuming to reverse. Launching a spacecraft, placing it in a disposal orbit, and deactivating it are all irreversible. These decision points are reflected in where there are divisions between phases, and when there are designated decision points in the life cycle.

There are several life cycle patterns for NASA projects, depending on the specific kind of program or project. I focus on the most general project life cycle [NPR7120, Fig. 2-5, p. 20], which is reproduced below:

undisplayed image

The pattern includes seven phases. There is a Key Decision Point (KDP) between phases. Each decision point builds on reviews conducted during the preceding phase, and the project must get approval at each decision point to continue on to the next phase.

The key products for each phase are defined in Chapter 2 of the NPR and in Appendix I [NPR7120, Table I-4, p. 129].

Pre-Phase A (Concept studies). This phase occurs before the agency commits to a project. It develops a proposal for a mission, and builds evidence that the concept being proposed is both useful and feasible. A preliminary schedule and budget must be defined as well. If the project passes KDP A, it can begin to do design work.

Phase A (Concept and technology development). This phase takes the concept developed in the previous phase and develops requirements and a high-level system or mission architecture, including definitions of the major subsystems in the system. It can also involve developing technology that needs to be matured to make the mission feasible. This phase includes defining all the management plans and process definitions for the project.

Phase B (Preliminary design and technology completion). This phase develops the specifications and high-level designs for the entire mission, along with schedule and budget to build and complete the mission. Phase B is complete when the preliminary design is complete, consistent, and feasible.

Phase C (Final design and fabrication). This phase involves completing detailed designs for the entire system, and building the components that will make up the system. Phase C is complete when all the pieces are ready to be integrated and tested as a complete system.

Phase D (Assembly, integration, test, launch, checkout). This phase begins with assembling the system components together, verifying that the integrated system works, and developing the final operational procedures for the mission. Once the system has been verified, operational and flight readiness reviews establish that the system is ready to be launched or flown. The phase ends with launching the spacecraft and verifying that it is functioning correctly in flight.

Phase E (Operations and sustainment). This phase covers performing the mission.

Phase F (Closeout). In this phase, any flight hardware is disposed of (for example, placed in a graveyard orbit or commanded to enter the atmosphere in order to destroy the spacecraft). Data deliverables are recorded and archived; final reviews of the project provide retrospectives and lessons learned.

This pattern of phases grew out of complex space flight missions, where expensive and intricate hardware systems had to be built. These missions often required extensive new technology development. The projects involved building hardware systems that required extensive testing. The NASA procedures for such missions are therefore risk-averse, as is appropriate.

I have observed that many smaller, simpler space flight projects have not followed this sequence of phases as strictly as higher-complexity missions do. Many cubesat missions, where the hardware is relatively simple and more of the system complexity resides either in operations or in software, have blurred the distinctions between phases A through C. In these projects, software development has often begun before the Preliminary Design Review (PDR) in Phase B.

At the same time, I have observed some of these smaller space flight projects failing to develop the initial system concept and requirements adequately before committing to hardware and software designs. This has led to projects that failed to meet the mission needs—in one case, leading to project cancellation.

The phases in the NASA life cycle compare with the baseline model presented earlier as follows.

undisplayed image

The NASA life cycle splits the system development activities across four phases. The NASA approach does this because it needs careful control of the design process, in particular so that agency management can make decisions whether to continue with a project or not at reasonable intervals. The NASA approach also places reviews throughout the design and fabrication in order to manage the risk that the system’s components will not integrate properly. Many NASA missions involve spacecraft or aircraft that can only be built once because of the size, complexity, and expense of the final product; this makes it hard to perform early integration testing on parts of the system and places more emphasis on design reviews to catch potential integration problems.

The NASA pattern is notable for some initial work on a mission concept starting before the project is officially signed off and started. There are two reasons for this. First, because all NASA missions have common processes, there is less unique work to do for each individual project. Second, NASA is continuously developing concepts for potential missions, and this exploratory work is generally done by teams that have an ongoing charter to develop mission concepts. For example, the concept for one mission I worked on was developed by the center’s Mission Design Center, which performed the initial studies until the concept was ready for an application for funding.

23.2.2 Unified Process

The Unified Process (UP) was a family of software development processes developed originally by Rational Software, and continued by IBM after they acquired Rational. Several variants followed in later years, each adapting the basic framework for more specific projects.

The UP was an attempt to create a framework for formally defining processes. It defined building blocks used to create a process definition: roles, work products, tasks, disciplines (categories of related tasks), and phases.

The framework led to the creation of tools to help people develop the processes. IBM Rational released Rational Method Composer, which was later renamed IBM Engineering Lifecycle Optimization – Method Composer [IBM23]. A similar tool was included in the Eclipse Foundation’s process framework, which appears to have been discontinued [EPF]. These tools aimed to help people develop processes and then publish the process documentation in a way that would let people on a team explore the processes.

While the UP and its tools gained a lot of attention, their actual use appears to have been limited. I explored the composer tool in 2014, and found it remarkably hard to use. It came with a complex set of templates, which were too detailed for the project I was working on. Another author wrote that “RUP became unwieldy and hard to understand and apply successfully due to the large amount of disparate content”, and that it “was often inappropriately instantiated as a waterfall” [Ambler23]. Certainly I found that the presentation and tools encouraged weighty, complex process definitions and that they led the process designer into a waterfall development methodology.

The UP defined four phases: inception, elaboration, construction, and transition.

  1. Inception. The inception phase concerns defining “what to build”, including identifying key system functionality. It produces the system objectives and a general technical approach for the system.
  2. Elaboration. This phase is for defining the general system structure or architecture and the requirements for the system. The results of this phase should allow the customer to validate that the system is likely to meet their objectives. This phase may be short if the system is well defined or is an evolution of an existing system. If the system is complex or requires new technology, the elaboration phase may take longer.
  3. Construction. This involves developing detailed component specifications, then building and testing (verifying) the components. This includes integrating the components together into the whole system and verifying the result. The result is a completed system that is ready to transition to operation. RUP focuses on constructing the system in short iterations.
  4. Transition. This phase involves beta testing the system for final validation that the customer(s) agree that the system does what is needed, and deploying or releasing the final software product.

The UP does not directly address supporting production, system operation, or evolution; however, the expectation is that, for software products, there will be a series of regular releases (1.0, 1.1, 1.2, 2.0, …) that provide bug fixes and new features. Each release can follow the same sequence of phases while building on the artifacts developed for the previous release.

The four phases in UP compare with the simple model presented earlier as follows:

undisplayed image

The Unified Process provides lessons for defining life cycle patterns: keep the patterns simple, make them accessible to the people who will use them, and put the emphasis on what they are for, not on tools and forms. The basic ideas in UP are good—carefully defining a life cycle, and building tools to help with the definition. I believe these good ideas got lost because the effort became too focused on elaborate tools and models, losing sight of the purpose of life cycle patterns: to guide the team that actually does the work.

23.3 System development patterns

Some patterns focus only on the core work of developing a system. These patterns generally begin after the project has been started and the system’s purpose and initial concept are worked out. The patterns go up to the point when a system is evaluated for release and deployment. In between, the team has to work out the system’s design, build it, and verify that the implementation does what it is supposed to.

These examples all share the common basic sequence of specifying, designing, implementing, and verifying the system or its parts. Some of the examples include similar sequences of activity to evolve the system after release.

23.3.1 Systems V model

This pattern is used widely in systems engineering work. It is organized around a diagram in the shape of a large V. It appears in many texts on systems engineering; it has also been used to organize standards, such as the ISO 26262 functional safety standard [ISO26262, Part 1, Figure 1].

In general, the left arm of the V is about defining what should be built. The right arm is about integrating and verifying the pieces of the system. Implementation happens in between the two. One follows a path from the upper left, down the left arm, and back up the right side to a completed system.

There is no one V model. There are many variants of the diagram, depending on the message that the author is trying to convey. Here are two variants that one often encounters.

The first variant focuses on the sequence of work for the system as a whole:

undisplayed image

The second variant focuses on the hierarchical decomposition of the system into finer and finer components:

undisplayed image

The key idea is that specifications, of the system or of a component, are matched by verification steps after that thing has been implemented.

In general this model conflates three ideas that should be kept separate.

  1. Development follows a general flow of specification, then design, then implementation, then verification.
  2. System development proceeds from the top down: start with the whole system, and recursively break that into components until one reaches something that can be implemented on its own.
  3. Development follows a linear sequence from specification and design, through implementation of components, followed by bottom up integration of the components into a system (with verification along the way).

The first two ideas are reasonable. Having a purpose for something before designing and building it is a good idea. There are exceptions, such as when prototyping is needed in order to understand how to tackle design, but even that exception is merely an extension to the general flow. The second idea, of working top down, is necessary because at the beginning of a project one only knows what the system as a whole is supposed to do; working out the details comes next. Again there are exceptions, such as when it becomes clear early on that some components that are available off the shelf are likely useful—but again, that can be treated as an extension of the top down approach.

The third idea works poorly in practice. It is, in fact, an encoding of the waterfall development methodology into the life cycle pattern, and so the V model inherits all the problems that the waterfall methodology has.

In particular, the linear sequence pushes the riskiest development work as late as possible, when problems are most expensive to find and fix. By integrating components bottom up, minor integration problems are discovered first, shortly after the low-level components have been implemented, when fixing those components is cheapest. Higher-level integration problems are left until later, when complex assemblies of low-level components have been integrated together. These integration problems tend to be harder to find because the assemblies of components have complex behavior, and more expensive to fix because small changes in some of the components trigger further changes within the assemblies that have already been integrated.

Development methodologies other than waterfall address these issues better, as I discussed in Chapter 22.

23.3.2 Systems or software development life cycle (SDLC)

There are several life cycle definitions for system development, primarily of software systems, that go by the SDLC name. They generally have similar content, with variations that do not change the overall approach.

I have not found a definitive source for any of the SDLC variants; the pattern appears to be community lore, referenced in many web pages and articles.

The core of the SDLC consists of between six and ten phases, depending on the source, that give a sequence for how work should proceed in a project. The phases are:

Phases marked (*) are not included in all sources.

Most discussions of SDLC stress that the pattern is meant to help organize a project’s work, not to dictate the sequence of activities. Some sources then discuss how the SDLC relates to development methodologies. A project using the waterfall methodology would perform the phases in sequence. Iterative and spiral development would lead to a project repeating parts of the SDLC sequence multiple times, once for each increment of functionality that the project adds to a growing system. A project using an agile methodology would perform tasks at multiple points in the SDLC sequence in any given iteration, as long as the work for any one part or function of the system follows that sequence. I discussed how life cycles fit with development methodologies in Chapter 22.
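
As a rough illustration of that relationship, the same phase functions can be run once straight through (as in waterfall) or looped over per increment (as in iterative development). The phase names below follow the generic specify-design-implement-verify sequence used in this chapter rather than any particular SDLC source, and the functions are placeholders.

    # Rough illustration of how the same life cycle phases relate to different
    # development methodologies. The phase functions are placeholders; the names
    # follow the generic specify-design-implement-verify sequence used in this
    # chapter, not any particular SDLC source.

    def specify(scope):   print(f"specify   {scope}")
    def design(scope):    print(f"design    {scope}")
    def implement(scope): print(f"implement {scope}")
    def verify(scope):    print(f"verify    {scope}")

    PHASES = [specify, design, implement, verify]

    def waterfall(system="whole system"):
        # Each phase is performed once, in sequence, for the whole system.
        for phase in PHASES:
            phase(system)

    def iterative(increments):
        # The same sequence is repeated for each increment of functionality.
        for increment in increments:
            for phase in PHASES:
                phase(increment)

    waterfall()
    iterative(["skeleton", "feature A", "feature B"])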

23.4 Post-development patterns

23.4.1 EVT/DVT/PVT

Many electronics development organizations use a set of development and testing phases:

undisplayed image

This set of phases is intended for developing an electronic hardware component, such as an electronics board. Developing this kind of hardware differs from developing a software component: while software source code can be compiled and tested immediately, a board design must be built into a physical instance before much of its testing can happen. Simulating the board can be done earlier, of course, but much testing is only done on the physical instance. This is especially true for integrating multiple boards together.

This pattern also addresses not just the design and testing of the component itself, but also the ability to manufacture it—especially when the component is to be manufactured in large numbers. The NASA, V, and SDLC patterns do not address manufacturing specifically; this pattern can be combined with those if a project involves manufacturing.

EVT. The EVT phase is preceded by developing requirements for the hardware product. It is often also preceded by development of a proof of concept or prototype for the board.[1]

During EVT, the team designs and builds an initial working version, often going through a few revisions as testing reveals problems. The EVT phase ends when the team has a version whose design passes basic verification.

DVT. The DVT phase involves more rigorous testing of a small batch of the designed board. The design should be final enough that sample boards can be submitted for certification testing. The DVT phase ends when the sample boards pass verification and certification tests.

PVT. The PVT phase involves developing the mass manufacturing process for the board. This includes testing a production line, assembly techniques, and acceptance testing.

23.5 Detail patterns

The last two patterns have to do with managing changes to the system: when errors are found, and when customer needs change.

Both these patterns apply to specific, short parts of a project. They apply as needed—when an error report or a change request arrives. Both also potentially involve repeating parts of the overall development life cycle pattern. Both may be used many times in the course of a project.

23.5.1 Defect or error management

This life cycle applies when someone reports a defect or error in the system. It includes fixing the problem and learning from it.

Common practice is to use an issue or defect tracking tool to keep track of these reports and the status of fixing them. Many of those tools have an internal workflow, and parts of this life cycle pattern end up embedded in that internal workflow.

There are two different times when people handle error reports: when errors are found during testing, before an implementation is considered verified, and later, when a verified design or implementation must be re-opened. In the first case, the people doing verification are expected to be working closely with the people implementing that part of the system; the pattern for that activity amounts to reporting an error, fixing it, and verifying the fix.

undisplayed image

The general pattern for addressing later errors is:

  1. Reporting. Someone finds an error and reports it into the tracking tool.
  2. Triage and ranking. Determine what to do about the report. Someone investigates the report to determine whether it is understandable and actionable. This may involve communicating with whoever reported the problem to get more information, either about their expectation or about what they found. The result is either accepting or rejecting the report. If accepted, the report is given a priority level (typically one of four or five levels) and a likely part of the system affected. If rejected, the result is an explanation of why the report will not be acted on.
  3. Assignment. Who will be responsible for resolving this error? Most projects have a standard procedure: whoever is responsible for the component identified as the likely source, or people can pick up reports when they have time, or a manager can make assignments.
  4. Analysis. What is the actual problem that caused the report, and how can a fix be verified? This investigation might involve working to reproduce the problem. The analysis may find that the source of the erroneous behavior is a defect in a different part of the system than originally determined during triage, which may lead to assigning the responsibility to someone else. The analysis may show that the problem is broadly systemic or that it arises from multiple defects in multiple parts, in which case several people will be involved with fixing the problems and overall responsibility for the report may be moved to someone who can oversee all the affected parts.
  5. Fix. Making changes to the system amounts to recapitulating the overall life cycle pattern for building a part of the system. This can be seen as an instance of rewinding progress, as discussed in Section 20.3. The fix might be simple—there is one part of one component that is reimplemented, a test is developed to check the change, and it can be reviewed and approved. On the other extreme, the problem might come from a high-level design decision; the fix may involve changing that design, which in turn changes the specifications for multiple components, each of which must have their designs updated, their implementation and tests updated to match.
  6. Verify. Once a fix has been implemented, the changes are verified. The fix is verified to ensure that it actually addresses the reported problem, and that it hasn’t created new problems. The changes may have affected how components integrate together, in which case the verification status of integration and interactions among them may be invalidated by the change, and the integration must be revalidated.
  7. Review and accept. Once the fixes are complete and have been verified, the fix can be reviewed and accepted. At this point the updated designs and implementations are baselined (that is, made as the current mainline working version). The record of the error can be marked as completed.
  8. Learning. Is there something that can be learned from the error and its fix that can be used to avoid similar errors in the future? This may be informal learning by the people involved, or it may be important enough to document and use to educate others on the team.
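
Issue trackers typically encode a workflow like the one above as a set of states and allowed transitions. The sketch below is a simplified, hypothetical encoding of these steps; real tools differ in their state names and rules.

    # Simplified, hypothetical encoding of the defect-handling pattern above as a
    # state machine of the kind an issue tracker might enforce. State names and
    # allowed transitions are illustrative; real tools differ.

    ALLOWED = {
        "reported":        {"triaged", "rejected"},
        "triaged":         {"assigned"},
        "assigned":        {"in_analysis"},
        "in_analysis":     {"in_fix", "assigned"},      # analysis may reassign the report
        "in_fix":          {"in_verification"},
        "in_verification": {"in_fix", "accepted"},      # failed verification reopens the fix
        "accepted":        {"closed"},                  # closed once any lessons are recorded
        "rejected":        set(),
        "closed":          set(),
    }

    class DefectReport:
        def __init__(self, title: str) -> None:
            self.title = title
            self.state = "reported"

        def move_to(self, new_state: str) -> None:
            if new_state not in ALLOWED[self.state]:
                raise ValueError(f"cannot move from {self.state} to {new_state}")
            self.state = new_state

    report = DefectReport("Telemetry drops out every few minutes")
    for step in ["triaged", "assigned", "in_analysis", "in_fix",
                 "in_verification", "accepted", "closed"]:
        report.move_to(step)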

23.5.2 Change requests

From time to time, someone will request changes to the system. The request may come from a customer, asking for a change in behavior or capability. The request may come from the organization or funder, reflecting a desire to meet a different business objective. The request might even come from a regulator, when the regulations governing a system change or when the regulator finds a problem when reviewing the system.

The pattern for handling a change request has much in common with the one for handling a defect report.

undisplayed image

After receiving a request, someone evaluates it to ensure that it is complete and understood. Then there is a decision whether to proceed with the change and, if so, what priority to give it. Once the decision to proceed is made, there are steps to design, implement, and verify the changes and eventually release the new version of the system.

Change requests differ from defect reports in two ways. First, requests for changes do not reflect an error in the system as it stands. The team can proceed building the system to meet its current purpose and defer making changes until after the current version is complete and released. Second, most requests are expressed as a change in the system’s purpose or high-level concept rather than as a report that a specific behavior in a specific part of the system does not meet its specification or purpose. A high-level request will have to be translated into, first, changes in the top-level system specification, and then propagated downward through component specifications and designs to work out how to realize the changes. This sequence of activities to work from the change of objective to specifications to designs to implementations is essentially the same as the activities to specify, design, and implement the system in the first place. In the pattern shown above, the “develop update” step amounts to recapitulating the overall system development pattern.

The decision to proceed with a change or to reject it depends on whether the change is technically feasible and whether it can be done with the time and resources available. This depends on having an analysis of the complexity involved in making the change. Ideally, the team will be able to estimate the complexity with reasonable accuracy and little effort. Analyzing a change request will go faster and more reliably if the team has maintained specification and design artifacts that allow someone to trace from a system purpose, down through the system concept, into specifications and designs, to find all the parts of the system that might be affected by a change. If the team has not maintained this information, someone will have to work out these relationships from the information that is available—which is difficult and error-prone.
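
Where those traceability links have been maintained, impact analysis can be as simple as walking the links from an affected need or specification down to the components that realize it. The sketch below uses a toy graph with made-up artifact names.

    # Toy sketch of impact analysis over maintained traceability links: walk from a
    # changed need or specification down to every artifact derived from it. The
    # artifact names and links are made up for illustration.

    from collections import deque

    TRACE = {  # artifact -> artifacts derived from it
        "need: operators see alerts within 5 seconds": ["spec: alert latency"],
        "spec: alert latency": ["design: alert pipeline"],
        "design: alert pipeline": ["component: collector", "component: notifier"],
        "component: collector": [],
        "component: notifier": [],
    }

    def impacted(changed: str) -> list[str]:
        """Return every artifact reachable from the changed one."""
        seen, queue = [], deque([changed])
        while queue:
            current = queue.popleft()
            for child in TRACE.get(current, []):
                if child not in seen:
                    seen.append(child)
                    queue.append(child)
        return seen

    print(impacted("need: operators see alerts within 5 seconds"))
    # ['spec: alert latency', 'design: alert pipeline',
    #  'component: collector', 'component: notifier']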

23.6 Comparisons and lessons learned

The life cycle patterns in this chapter have all been developed in order to guide teams through their work. To meet this objective, they have to be accessible and understandable by the teams using them; they can’t be explained in legalistic documents that include many layers of qualification and exceptions. Some of these have passed this test and have been used successfully. Others, such as the Unified Process, have not caught on.

Some of the patterns cover the whole project, while others address specific phases or activities. One pattern often references other patterns: for example, a high-level pattern like the NASA project life cycle uses lower-level patterns for developing components or handling change requests. Some low-level patterns, such as handling change requests or error reports, can end up using or recapitulating higher-level patterns.

The specific patterns that a project uses depend on that project’s needs. A software project that is expected to be continuously reactive to new customer needs works differently from a project that is building an aircraft, where rebuilding the airframe can cost lots of money and time. The NASA approach is influenced by the US Government fiscal appropriation and acquisition mechanisms, which require programs to have multiple points where the government can assess progress and choose to continue or cancel a program.

All of these patterns implicitly start with working out the purpose of some activities before proceeding to do detailed work.

These patterns also implicitly reflect the cost of making and reversing a decision (Section 21.10). The NASA life cycle puts design effort before a decision to spend money and effort building hardware. The change request and defect report patterns place evaluating the work involved ahead of committing to make a change.

Sidebar: Summary

Part VI: Reference life cycle

A reference life cycle pattern for projects. It models what a full life cycle contains, and can be the basis for developing an actual project’s life cycle.

Chapter 24: Introduction

2 October 2024

The previous chapters have introduced the ideas of life cycle patterns and development methodologies, along with the ways that the two affect each other. Chapter 22 introduced a number of characteristics that one can choose to match a project. Chapter 23 presented a number of example life cycle patterns, along with a rough framework for comparing the examples.

In this chapter, I present a reference model of development methodology and life cycle patterns. It is based on approaches I have used myself or have observed others using in successful projects, along with lessons from projects that have gone poorly. These recommendations do not attempt to follow any of the development methodologies dogmatically, instead taking the parts from several of them that work well. In other words, I have tried to distill a pragmatic set of solutions from the many options available.

The reference life cycle covers the entire life of a systems-building project. It has four high-level phases: preparation, development, operation, and ending.

undisplayed image

Project preparation is about setting up the project: how it will work, who is sponsoring it, who is funding it. Development covers working out what the system is for and then designing and building it, until it is ready for use. Operation is about producing the system, deploying it, using it, and evolving it. Ending is about shutting down the project when its work is done.

This reference also includes a project support “phase”, which includes all the activities that go on throughout the project to support operations.

Some projects are only concerned with building a system; once the system has been implemented and tested, it goes into production or operation and is no longer the concern of the development team. Those projects skip the operations phase. Most projects, on the other hand, have some level of involvement after the system is deployed and in operation, such as fixing bugs or enhancing the system. These projects involve all the phases.

The phases in the top-level life cycle in turn expand into more detailed patterns. Development consists of working out a purpose and a concept for the system, then developing a system to match, ending with a review to determine that the system is acceptable for putting into operation. Operation expands into a pattern of several phases, which I will discuss below.

undisplayed image

Some projects will spend most of their time in development, while others spend most of their time in evolution after the system is in operation. Exploratory spacecraft missions usually consist mostly of development, since once the spacecraft is launched there is little opportunity to change the spacecraft beyond the occasional software update. Mass-market consumer software, on the other hand, often spends as little time as possible on initial development and can spend years developing upgraded versions to keep consumers satisfied. This reference life cycle fits both kinds of projects.

The arrows in this diagram show how information and artifacts flow from one phase to another, but they do not necessarily indicate complete temporal orderings. For example, the project preparation phase often lasts quite a while, and overlaps early parts of the development phase. Within operation, different customers might deploy and operate their own instances of the system, and the project may be working on multiple system improvements at once.

Two of these phases—system development and system evolution—involve designing and implementing parts of the system. These are the two phases where a development methodology applies.

I will discuss each of the top-level project phases in turn in the coming chapters.

24.1 Projects with proposals

Some projects require developing a proposal to get funding or approval to proceed.

The life cycle for this kind of project adds a phase between preparation and development to develop a proposal. Developing the proposal typically involves working out the purpose and a preliminary concept for the system, so that the potential customer or funder can understand what they will be getting if they agree to fund developing the system. The initial concept is then documented as part of the proposal itself, which is typically a document (often a large one) explaining what the system will be, how it responds to the customer’s requirements, how long it will take to develop, and how much it will cost.

Much has been written about how to do proposal development well. There is best practice for how to organize a proposal development team and what kinds of reviews are helpful.[1]

undisplayed image

After the customer or funder has agreed to the proposal, system development proceeds as it does for other kinds of projects.

24.2 Project-wide decisions and reviews

Projects have times when there will be a decision whether to continue the project, end it, or continue with significant changes. Some examples: whether to start a project, when additional funding is needed to continue, or at periodic progress reviews.

These are often not driven by progress on making the system. They can be driven by external considerations, such as the need for funding, or by a regular cadence of progress checks.

Such reviews or decision points do not fit neatly into the flow of phases defined in the life cycle pattern. When multiple steps are in progress concurrently, as happens during most of the development phase, the decision often happens in the middle of several of them. Preliminary specification or design reviews are also common; they happen part way through specifying or designing part of the system. Design reviews often mean that design should have reached a given level of completeness for the top X layers of components in the system.

I will note some representative decision points in this reference life cycle, but the actual milestones are project-specific.

Sidebar: Summary

Chapter 25: Project preparation

2 October 2024

Project preparation is about getting together the things that the team will need to operate.

The case for the project. Preparation includes getting funding or approval to begin pursuing the project. This usually includes developing an initial pitch for what the project might be about, who will benefit, and roughly what level of resources will be needed. This initial case for the project will evolve from a vague notion at the start to whatever is needed to get approval and funding.

I have found two guides useful for making this initial case. The so-called Heilmeier Catechism [Heilmeier24] is a set of questions originally developed to guide people pitching project ideas to the US Defense Advanced Research Projects Agency (DARPA). (Appendix B lists the questions.) It consists of eight questions that prompt one to articulate the what and why of the project, along with what it will take to do the work. The second is the CSP project startup document template [Wilkes90], which was developed at the Concurrent Systems Project at HP Labs in the 1990s to guide people to think through what they mean to do in a new project. It is organized around the scientific method, and is phrased in terms of a research investigation; however, it is just as useful for other kinds of projects. There are variations on these guides that add questions, such as: How might the result of the work be misused?

In practice the people starting the project will not begin with answers to these questions. They will have some general ideas for a system project, and their job during the preparation phase is to investigate those ideas to work out answers to the questions. As anyone who has tried to form a startup knows, the system that eventually gets built usually is different from the first ideas—and it is the process of investigating answers to these questions that will find the final answer.

These efforts to work out the project’s case naturally include identifying stakeholders (Section 16.2). They also include some of the work to define the system’s purpose (Section 27.3).

Project operations. Project preparation also works out how the project will operate. This includes:

Decision point. At some point during preparation, a decision must be made whether to pursue the project or to stop, based on the case for the project and a general understanding of its costs. The decision should be included as an explicit milestone for the preparation phase, or immediately after, so that people on the team are reminded to take the time to think through whether the project makes sense before more resources are committed.

It may seem that the decision can be left implicit when the project needs no external resources—but in practice the resources used always represent an investment and there is an opportunity cost if the team could be working on something more useful.

Outputs. The preparation phase results in many document artifacts, which the team uses later as they execute the project. The documents record the many decisions that people make during preparation.

People will use these artifacts in a number of situations:

Timing. Bear in mind that the project and the team are themselves systems that deserve careful design and implementation; working out how the project will run is a process that takes time. Most projects start small, with just a few people and a general approach to how the project will operate, and develop additional details over time. Project preparation thus usually overlaps the beginning of development.

Progress on developing the project’s operations plans is balanced against the project’s progress on getting started and working out the system concept. Bear in mind Section 8.1.5—Principle: Team habits: the team will develop habits based on the procedures and organization they are working with, and changing those habits is hard. If the project leadership takes too long to develop team organization or life cycle patterns and procedures, it can become expensive and error-prone for the team to change behavior. On the other hand, if the project leadership rushes to develop these procedures and organization and gets them wrong, the team can end up in a similar situation.

The resolution to this dilemma depends on judgment by the project leadership; I know of no recipe for getting things exactly right. A few principles can help:

Completion. Project preparation is complete, for the most part, when the project is set up to execute. This includes having funding or approval to do the project, as well as having the team structure, life cycle and procedures, artifact management, basic tools, and resources worked out.

Preparation is never truly complete, however. Many of the things worked out in preparation will need to be revised as the project goes on. For example, a team’s organization usually needs to change as the team grows from a few people who can collaborate informally to a large team who need more formal organization (Section 19.3.2). The project may also need to change the focus of the system based on funder or customer needs; changing the system may mean changing how the project runs.

Milestones. There are no milestones intrinsic to project preparation in general. The principle of working out how some part of the project will work before the team needs that information applies, but that is not a milestone in itself.

Other stakeholders may impose milestones on project preparation. For example, getting funding from a funder or approval for the project from the organization may be required.

Sidebar: Summary

Chapter 26: Project support

Project support covers all the various things done continuously in the project to keep the team working. Project support starts with the beginning of the project and ends when the project ends.

This phase includes work to monitor and manage parts of the project. Teams are one example (Section 19.3.1); maintaining plans and tasking (Section 20.5) is another. Tracking project risk (Chapter 63) and technical uncertainty (Chapter 62) supports planning.

Other elements of project support include:

These efforts, similar to project preparation, will usually start small and develop over time. Similar principles apply to the timing of work on project support.

Sidebar: Summary

Chapter 27: Development

2 October 2024

The development phase sees the project work out what the system is supposed to do, and then build the system to meet that objective.

27.1 What is development?

Before going into the sub-phases that make up the development phase, it’s worth thinking about how a system actually gets developed. A great many systems have been built over the centuries without the benefit of methodologies; with some experience, good systems engineers usually have intuition that guides them through development.

Development starts with a rough idea of what the system is for: what problem the system will solve, or what it can do for people. An aqueduct begins with the idea that something should transport water from a source to a town. A pump driven by a steam engine starts with an idea that a machine could pump water out of a mine better than a human- or animal-driven pump, and thus allow mines to go deeper than they had before.

The people thinking about the problem to be solved also often have some approaches in mind that might be applied. Someone responsible for moving water into a town might know about aqueducts that have already been built. Steam pumps were developed incrementally by many people over a period of over two hundred years.

For developing modern complex systems, the development process still begins with a general idea of what the system might do and what problems it might solve, perhaps with some key technical approach in mind.

The team needs to get from this general idea to a clear and precise definition of what they need to design and implement. This does not occur in one step; the detailed design of the system does not spring fully-formed from the chief engineer’s head. Instead, the team starts with a vague understanding and refines it bit by bit until it is clear enough for design and implementation to start.

The team does need to understand the system’s purpose before working out how the system should work. However, in practice these are often parallel efforts, where some people work with customers and other stakeholders to clarify the system purpose while some people begin to brainstorm ideas of what kind of system might meet that purpose. As the understanding of purpose becomes clearer, those who are investigating what the system might look like—the concept of the system—will refine their ideas. Those who are working on the system concept track updates to the purpose, often feeding questions back to stakeholders when they find something potentially ambiguous or when they suspect that some part of the purpose might not yet be worked out.

The system concept represents the bridge between understanding the customer’s needs and building the details of the system. The concept sets the general approach that the team will use. Working out the concept is a time for creativity, when the team can entertain many possible ways to build the system, eliminating those that aren’t likely and refining those that are promising. The team evaluates these possible approaches along the way to see if they are likely feasible to build and to meet the system purpose.

The team may be tempted to turn the concept-building exercise into a full system design exercise. This is unwise. First, the techniques used to develop a system concept are meant to be fast and fluid; they do not work to the degree of rigor that design and implementation require. Second, this can lead to a concept and design period that drags on and on, when the team instead needs to make a decision about the high-level structure and then move on to investigating a design based on that decision. Third, stopping to review the basic concept before committing to it makes for a stronger concept that will better guide the team later.

This means that a system concept will be (and should be) incomplete. It should show some of the big ideas of the system’s structure, and it should show that these ideas are likely to meet the stakeholders’ needs and are likely to be technically feasible. It should be accurate, in that anything named in the concept should in fact be a necessary part of the system, but it should not be precise, having all of the details worked out.

Once the team has a concept, it is a good time to step back. Is this system still worth building? Is it likely to be feasible? Is it going to be a good answer to customer needs? And is it plausible that the resources needed will be available?

As the development work moves forward, the team will refine the concept. They will find things missing in the concept and have to find designs that fill those gaps. They will find inconsistencies or mistakes, and they will have to correct them. At the same time, customer needs may change—so the initial concept will always be different from the final system.

The level of detail and analysis needed in the concept depends on the project. A project that is building a revolutionary system for potential future customers probably only needs a rough sketch of the system, since investigations will continue for months or years into what those customers really need. On the other hand, a project that is answering a request for proposal typically needs a much more developed concept in order to explain to a funder what they will get and why their funding will be a reasonable risk.

Once the purpose and concept are completed, the team can turn to actually developing the system itself. In practice this is rarely a sharp transition; instead, some part of the team may begin moving forward in working out a system specification even before the concept is finalized, or they may begin prototyping parts of the system that seem especially uncertain.

27.2 The development phase

Development consists of many sub-phases. Purpose development comes first, in which the team determines what customer needs the system will address. After determining the system’s purpose, the team develops a high-level concept for the system, then builds the system itself. The development phase ends when there is agreement that the built system is ready to be produced, deployed, and put into operation.

undisplayed image

The first two steps set the direction for the system development work. Purpose development establishes a record of who the stakeholders are for a project, and what each of them needs the system to do. This record of the system’s purpose will be incomplete, initially, but it must be accurate at the time it is documented. The concept phase then provides the time to explore different ways that a system might be built to meet those needs. The concept records a high-level picture of how the system will behave, the environment in which it operates, and some of the main top-level components that will make up the system. The concept phase is also the time when constraints related to security and safety are refined, turning general objectives coming from the customer or other stakeholders into more precise statements of what those objectives mean. Part way through or at the end of concept development is a good time for a review and decision about whether to continue the project.

The system development step in turn consists of many tasks. In this reference approach, the development phase is organized first into a number of system feature development phases, using the development methodology to determine what those phases are. Each system feature development phase, in turn, is organized as a sequence of specify-design-implement-verify patterns.

undisplayed image
Figure 27.1: The development phase recurses through the component hierarchy
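
The recursion in Figure 27.1 can be sketched as a function that applies the specify-design-implement-verify pattern to a component and then to each of its subcomponents. The component tree below is hypothetical, and the print statements only stand in for the real work of each step.

    # Sketch of the recursion in Figure 27.1: the specify-design-implement-verify
    # pattern is applied to a component and then to each of its subcomponents.
    # The component tree is hypothetical; print() stands in for the real work.

    from dataclasses import dataclass, field

    @dataclass
    class Component:
        name: str
        children: list["Component"] = field(default_factory=list)

    def develop(component: Component, depth: int = 0) -> None:
        pad = "  " * depth
        print(f"{pad}specify {component.name}")
        print(f"{pad}design {component.name}")
        for child in component.children:   # the design drives each child's specification
            develop(child, depth + 1)
        print(f"{pad}implement and integrate {component.name}")
        print(f"{pad}verify {component.name}")

    system = Component("system", [
        Component("flight software", [Component("guidance"), Component("telemetry")]),
        Component("ground segment"),
    ])
    develop(system)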

In this section, I will first discuss the development phase as a whole, then go into more detail about each of the subphases and development methodology.

Beginning. Development begins as soon as the project has a general idea of which customer needs it will meet, has gotten funding and approval to start working on the system, and the project leadership has completed enough preparation that people know the basics of how to do development work.

As I noted earlier, project preparation work is rarely complete by the time development begins. Enough of the preparation should be done that people can begin working out and documenting the system concept, and later parts of development should be gated on other preparation steps.

Completion. Development ends with a system that is ready to be released for production and deployment. Being ready means that the system purpose identified in concept development has been met in the system’s implementation, that this fact has been verified, and that the customer and other stakeholders agree.

The acceptance phase addresses checking that stakeholders (Section 16.2) agree the system is ready for production. The customer—or a proxy for the customer—provides a final validation check that their needs will be met. The organization and funder, as other stakeholders, may weigh in to validate that their objectives have been met, such as that the system will be sufficiently profitable, before investing in production. Some systems will require regulator approval or certification before the system can proceed to production; for example, civil aviation authorities require type certification for commercial aircraft before mass-producing and deploying new aircraft models.

Outputs. There are five kinds of artifacts that are created in the development phase:

Milestones. The primary milestone comes at the end, in the acceptance phase. This milestone can go by different names; the NASA life cycle calls it the operational readiness review, for example. Passing this milestone implies that the system is ready for production (manufacturing) and deployment. As I noted above, this involves checking that the system meets stakeholder needs, and obtaining their agreement that it does. It can also include regulatory approval.

There are other possible project-wide decision points or milestones for checking whether the project is on track and can continue or not. These do not necessarily fall at the beginning or end of phases; sometimes they happen in the middle, in order to correct the project’s trajectory or as dictated by external needs.

Other subphases in development define their own milestones.

27.3 Purpose development

The purpose development phase is for working out in detail what the system is to be, in terms of what it will do for its users and who those users are (Chapter 9).

undisplayed image

The people responsible for working out the purpose work with the customer (or a proxy for customers, when the customers are hypothetical; see Section 16.2.1). This requires the team to work directly with the customers, in order to understand not just what the customers are saying they need but also to identify implicit needs and to find constraints on the system that the customers may not be able to articulate.

The team does similar work with other stakeholders. They identify the objectives that their organization has: is it to make a certain level of profit? Are there time constraints on demonstrating capability? Who might be the funder, and what are they looking for? And finally, who might have regulatory authority over the system, and what regulations or standards apply? All this information creates constraints on how the system can be built and what it can do, and will be considered when determining whether these other stakeholders will agree that the project should continue.

The needs found in this phase define objectives that the system should try to address. The constraints, on the other hand, define things that must be true about the system.

I discuss working out system purpose further in Chapter 31.

Inputs. The project should already have a vague idea of who the system will benefit and what their needs are. This is usually worked out when making the initial case for the project, as part of project preparation (Chapter 25).

Completion. The purpose development phase is complete when the list of stakeholders is complete, when the needs of each of those stakeholders are understood and have been documented, and the stakeholders agree that their needs have been documented correctly.

Outputs. The purpose phase produces two artifacts:

These artifacts together define the system’s purpose and constraints on its design.

Milestones. The purpose phase can end when each of the stakeholders, or a reasonable proxy for them, has reviewed the list of their needs and agrees that the list is complete and accurate.

27.4 Concept development

The concept development phase is the transition between working out system purpose and beginning to design the system in detail. It is a time to work out an initial, rough idea of how a system might be built to meet the purpose and constraints worked out in the purpose development phase. It is a time to brainstorm many different possible approaches and to be creative. These different approaches can be evaluated and narrowed down to one concept. That concept is the start, not the end, of design; it will guide the work in the subsequent system development phase.

The system concept is a sketch of the system on paper or similar media. It should cover all the major behaviors of the system, but it should not go into great detail about how those will be achieved.

undisplayed image

The concept has two general parts: an external view and an internal view. The external view takes a black-box perspective of the system, and includes:

The internal view is an initial sketch of the insides of the system’s black box. This view includes:

The concept does not usually go more than one or two levels deep in the component breakdown.

undisplayed image

This information can be recorded in different forms, and it usually takes more than one form to capture it adequately.

Documents recording analyses complement these records. The whole collection of concept documents also records the rationale for decisions taken, and perhaps includes records of alternative designs that were considered and not chosen.

The concept is used for three purposes. First, it reveals whether there is likely to be a feasible approach for implementing a system that meets the customer needs. Second, it provides an illustration to customers and other stakeholders that they can use to validate whether the concept meets what they expect their needs to be. Third, it provides a guide as the team begins to specify and design parts of the system for real.

A likely-feasible concept is one where there is likely to be some way to design and implement each of the high-level parts of the system, and where combining those parts will likely satisfy stakeholder needs. The concept can only be judged likely feasible, not certain, because it is supposed to be developed quickly; the uncertainties about whether the concept will actually work are not completely resolved until the whole system has been built and verified. The process of developing the concept can generate a list of the technical uncertainties people have found or suspect. (These uncertainties guide work planning as the project moves into the system development phase.)

The concept gets reviewed by stakeholders, including customers or customer proxies. While a stakeholder might look at the list of their needs as generated in the previous phase and think it complete, I have found that when they step through how a system concept will operate they get a different perspective and come to realize things they missed in the list of needs. When they find that a system concept appears to meet all their needs, the act of validating the concept with them gives them confidence in the project.

Finally, the road that the team follows from initial idea to a complete design (and implementation) has to start somewhere. The concept provides that starting point. The high-level components identified in the system concept become the starting point to specify, design, and build all of the rest of the components in the system.

Chapter 32 discusses the work involved in making and documenting the concept. To summarize that chapter, developing a system concept involves brainstorming many possible approaches to meeting customer needs and sketching them out. These different approaches get evaluated and compared to find out how well they meet the system purpose and how feasible they are; this often involves doing simple analyses. The evaluations show where there are gaps in meeting customer needs or in the technical solution. The best possibilities get refined or combined and improved until the approaches have been narrowed to one best option.

I have said that the concept should be likely feasible, and that the technical uncertainty and project risk uncovered in the investigation should be acceptable. The obvious next question is, how likely or how much uncertainty? In fact these uncertainties and risks are not generally quantifiable, as they deal in unknowns and the point of the concept development exercise is to expose unknowns and not to work them out. Qualitatively, some projects can accept more risk than others: a startup that is developing a speculative new technology can accept far more risk than a project proposing a system for a fixed-price contract. The decision will require a judgment call on the part of project leaders.

Inputs. The concept phase starts with the list of stakeholders and their objectives and constraints, which was developed in the purpose development phase. It can also use whatever informal investigation has been done in advance about system function or possible implementation approaches.

Completion. The concept phase is complete when either the team has found what they believe to be the best approach to designing the system, or they have determined that they cannot come up with a feasible approach.

A feasible system concept provides an understanding of how the system will function when viewed from the outside as a black box, and that external function has been shown to meet stakeholder needs.

A feasible system concept also defines some amount of internal structure and behavior, enough to support an argument that the team can plausibly build a system that works that way. This means that there are likely ways to build each of the components, and that the amount of time, money, and people required to build and verify the system is within what is available to support the project.

The system concept phase must end while the concept is still a concept. In many projects I have seen the temptation to keep improving the concept—make things a little more certain, make things a little better—before declaring the concept done. When this is left unchecked, concept development slides into system design and development, and leaves out the check of reviewing the imperfect and incomplete concept. Skipping that check means that easy and inexpensive course corrections don’t happen and the problems that will always be there aren’t detected and corrected until they are more expensive to fix.

Outputs. The concept development phase produces a number of artifacts that record the system concept, along with the rationales for why that concept was chosen. I noted earlier what the documentation of the concept should include. These artifacts are placed under configuration management, as they are likely to be revised as the project continues.

Milestones. The concept development phase ends with a conceptual design review (CoDR). This review checks the system concept to ensure that the concept meets stakeholder needs, is internally consistent, and is likely feasible to build. Customers and other stakeholders participate in this review when possible. Team members also participate, as a way to both check each other’s work and to share a common understanding of the concept. Some independent reviewers should also participate in order to check for gaps or biases that the team may have missed.

The conceptual design review is often used as a project go/no-go decision point. If the team has not found a likely feasible concept, or one that meets organization and funder needs, this is a time for the organization to decide not to continue with the project. In this way the fewest resources are spent before the decision to stop the project.

27.5 Development methodology

The system development phase is about creating the system based on the concept worked out in the previous phase. At the end, the project has the artifacts for a working system ready to hand off to production and deployment. Along the way, the project may need to meet other milestones—preliminary and critical design reviews for government customers, or feature demonstrations for funders.

The reference development methodology structures how the team does the work to design, implement, and verify that system. It is based on the spiral or incremental methodology. Project leadership works out a set of intermediate milestones where the team builds and demonstrates some set of system features working—usually integrating different parts of the system along the way. There is a life cycle phase leading up to each of these milestones, in which the team does the tasks needed to add features to the system. These are called feature development phases. Each feature development phase has an expected duration. If it appears not to be on track to meet that deadline, the team takes this as a signal that corrective action is needed. Unlike in the spiral methodology, this methodology leads to multiple overlapping feature development phases, running in parallel on different timelines and working toward different milestones.

undisplayed image
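
To make the structure concrete, the following is a minimal sketch, in Python, of how a project might record its feature development phases as data: each phase works toward a milestone, has an expected duration, and carries a rough progress estimate, and a phase that appears to be falling behind its timeline is flagged for corrective action. The phase names, dates, and slack threshold are hypothetical; real projects keep this information in their planning tools rather than in code.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class FeaturePhase:
        name: str
        milestone: str             # the demonstration this phase works toward
        start: date
        expected_end: date
        fraction_done: float       # rough progress estimate, 0.0 to 1.0

        def at_risk(self, today: date) -> bool:
            """Flag a phase that appears not to be on track for its milestone."""
            if today >= self.expected_end:
                return self.fraction_done < 1.0
            total_days = max((self.expected_end - self.start).days, 1)
            elapsed = (today - self.start).days / total_days
            return self.fraction_done + 0.1 < elapsed   # allow modest slack

    # Two overlapping phases on different timelines (dates are invented).
    phases = [
        FeaturePhase("attitude control demo", "Demo 3", date(2025, 1, 6), date(2025, 3, 28), 0.4),
        FeaturePhase("ground link integration", "Demo 4", date(2025, 2, 3), date(2025, 5, 16), 0.5),
    ]
    today = date(2025, 3, 1)
    for phase in phases:
        print(phase.name, "AT RISK" if phase.at_risk(today) else "on track")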

This approach was motivated by several goals.

  1. Provide mesoscale guidance to the development process, so that there is continuity and planning while providing a way to adapt when needed. This level of guidance operates on a time scale longer than a few days or a couple weeks, but less than the whole project; it is intermediate between the time scales used in agile and waterfall development.
  2. Allow threads of activity (feature development phases) to operate on longer or shorter timelines as is appropriate to the work.
  3. Allow different feature development phases to follow the microscale methodology appropriate to the work in that phase. For example, software development often benefits from short, agile-like sprints within one feature development phase, while fabricating large mechanical structures is better done using techniques derived from construction industries.
  4. Promote interaction and collaboration across parts of the system during design, while avoiding involving people working on unrelated parts of the system. Those who are actively working together to build some interconnected capabilities in a few components need to communicate frequently, but people working on some other part of the system do not need to sit through meetings discussing things they aren’t working on.
  5. Support integration-first and uncertainty-first planning practices (Chapters 47 and 62).
  6. Support a partial plan—one that is concrete in the short term and less so in the medium to longer term, with growing specificity as the project progresses (Chapter 64).

Compare this approach to waterfall and agile development methodologies.

Waterfall development, practiced strictly, does not handle uncertainty or adaptation well: the system is designed up front, and implementation follows thereafter. In practice, projects nominally using the waterfall methodology often develop intermediate milestones to organize the work.

Agile development, on the other hand, can lead teams to constantly change direction—unless they develop a plan with some longer-term objectives. When they do so, agile development ends up looking a lot like this reference methodology. Short sprint periods can also work poorly for parts of a project doing work that does not complete within one sprint, like building an airframe or developing detailed analyses.

Example. Consider the following example, taken and simplified from a spacecraft project I worked on. The mission involved multiple spacecraft working together to perform a science mission.

The mission’s concept development defined the overall design of the system: multiple spacecraft, communication links between them, communication with ground stations, and so on. The concept also defined an initial breakdown of the system, where the spacecraft had a set of major subsystems like structure, power, avionics, sensors, flight software, and so on. The concept identified some existing software and hardware designs that could be re-used for this mission.

The development phase, then, was about building hardware, software, and operational procedures that would implement that concept.

The team worked out the major steps that had to happen to build the system, such as designing the avionics, designing the structure, testing and integrating them, and putting sample spacecraft units through environmental testing (heat, vacuum, vibration). The project also would build software to run each spacecraft, which involved tasks like prototyping algorithms for attitude control and then verifying that they would work in testbed equipment. These major steps were partly worked out based on experience on previous missions, and partly from working backwards from the high-level system design to determine major functions to be implemented.

The following shows the first part of the sequence of feature development phases for the main flight software (simplified and abstracted from the original). The flight software had a series of milestones that started with the basic software infrastructure and a simulation environment for testing it. Later milestones then added capabilities one after another. Each milestone integrated new functions across several different components. In most milestones, the work involved behind the scenes was as important as what was overtly demonstrated; for example, the first demo was as much about establishing a software configuration management and build system as it was about demonstrating simple software running.

undisplayed image

This project made extensive use of software skeletons or scaffolds, mockups, and emulations. This is typical of a project that prioritizes integration over feature depth. In this case, the main spacecraft control software for the first couple of demos was a simple skeleton of what it would become. The software modules involved could start up and interact with some others in simple ways, but there was no real logic in the control. Building this part first reduced the integration risk that the control software modules would not interact properly with the middleware and operating system on which they ran—and indeed it exposed middleware bugs that caused the system to crash. By the third demo, the team added basic attitude control logic to the control software. This attitude control still had only limited function; its purpose was as much to show that the control software could interact with (emulated) sensors and actuators as to provide real control capability.

Sidebar: Kinds of development output artifacts

Feature development phases can produce four different kinds of artifacts, and it is important to differentiate between them.

  1. Real artifacts are the ones that will be part of the final system. They may be incomplete at some points in development, but they will evolve into the final artifact. These include operational procedures, software source code, and hardware drawings or designs.
  2. Skeleton, scaffold, or emulation artifacts stand in for real artifacts until they are developed. They may be a seed from which a real artifact is developed, or they may be replaced by a real artifact later.
  3. Prototype artifacts are developed rapidly, and to less than production quality, solely for the purpose of learning about a potential design. These artifacts will not and must not be turned into real parts of the system (see Section 8.3.5 and Chapter 41).
  4. Verification artifacts are used in testing that an implementation meets its specifications.

27.6 System feature development

A system feature development phase is a stream of work that adds a defined set of features (the purpose of the phase) to the system, ending in a milestone with those features implemented, integrated, and demonstrable. It starts from the design work that has already been done and the purpose of the phase, and ends with the system artifacts updated to meet that purpose.

This approach to organizing development is focused on the features rather than on the components or component breakdown. One feature development phase usually involves several components (and their subcomponents). It promotes the integration of work across parts of the system.

undisplayed image

Inputs. A feature development phase takes as input the system concept, design, and implementation artifacts that have already been developed, plus a definition of the features that are to be implemented in this particular development phase.

Completion. The feature development phase is complete when the system has been built or modified to implement all the features named for this phase. The completeness and correctness of the implementation is documented in verification records and by demonstrating selected features working in the new system version.

Outputs. Feature development produces several different outputs.

Along the way, the design phase work may also produce:

Milestones. A feature development phase has one milestone, at its end. At this milestone, the completion conditions listed above should hold. The verification records are checked to ensure that the implementation passed verification, and the team who worked on the changes demonstrate key features to the rest of the project.

As will be seen next, the feature development phase is made up of several subphases, and each of these has its own milestones.

Reference pattern for feature development. A feature development phase recapitulates the overall system development life cycle. It starts with purpose, works out a concept, then proceeds into the specification, design, implementation, and verification of parts of the system to build in that purpose.

undisplayed image

The concept for a feature development phase includes working out the general design approach for adding the phase's features. As with the system concept, the feature concept involves brainstorming different ways to implement the features, along with evaluations of the alternatives until the team selects one concept. The concept for the features should give a general idea of what components will be modified or created in this phase, along with the internal structure among those components and a narrative of how they will interact (the concept of operations for the features).

Identifying the components that will be affected is key to being able to scope how much effort will be required to implement the features, and who will need to be involved in the work.

The next step is to develop or modify specifications for the components involved (Chapter 33). These detail how the components are to behave and the non-functional attributes they are to provide. This may involve adding to or modifying the top level system specifications, or flowing those specifications down to components. Security, safety, and reliability specifications are particularly important.

Design follows specifications, working out how each component can be built to provide the behaviors and properties it is specified to have (Chapter 37). Design may require evaluating alternatives, perhaps by modeling or prototyping (Section 8.3.5; Chapter 41).

Two separate and independent implementation steps follow. One step implements components and changes to components, following the design. The other step works out how to verify the features in the feature development phase, including verifying both the individual components by themselves (using unit tests, for example) and the features that are provided by the components integrated into the system. If the verification implementation runs ahead of the component implementation, the component implementers can verify as they go (using test-driven development).
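
As an illustration of verification running ahead of implementation, here is a minimal sketch in Python's unittest style. The component (a hypothetical thruster-command limiter) and its specification are invented for the example; the point is only that the test cases are written from the specification and can exist before the implementation they check.

    import unittest

    def clamp_thruster_command(level: float) -> float:
        """Component implementation under development: limit commands to the range 0..1."""
        return min(max(level, 0.0), 1.0)

    class ThrusterCommandSpec(unittest.TestCase):
        """Verification written from the specification, before or alongside
        the implementation above."""

        def test_commands_within_range_pass_through(self):
            self.assertEqual(clamp_thruster_command(0.25), 0.25)

        def test_commands_are_clamped_to_limits(self):
            self.assertEqual(clamp_thruster_command(1.7), 1.0)
            self.assertEqual(clamp_thruster_command(-0.3), 0.0)

    if __name__ == "__main__":
        unittest.main()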

As parts of the feature set are implemented, they are verified. By the end of the feature development phase, the components created or changed in the phase and the features the phase is adding are all verified.

The feature development phase ends when the team successfully demonstrates that the system now has the features they have worked to implement. This demonstration might amount to showing that the new system version has passed its verification checks, but doing an actual demonstration gives the people who did the work an opportunity to show the rest of the project what they have done and for the project as a whole to celebrate their work.

Once again, note that this work is organized around the features, not the components. This methodology does not necessarily mean implementing each component's changes in isolation, verifying those, and then verifying their integration. Rather, the team can order the work in whatever way works best for the particular task at hand. For example, an integration-first approach might lead the team to build simple skeletons or mockups of component changes and focus on checking out how the components will interact before implementing detailed changes to the components—which means verifying integration before verifying the unit components. (Of course, the finished changes still need to be verified as a whole before the verification work is done.)

The reference pattern for the feature development phase, in the diagram above, includes review milestones for each of the steps (concept, specification, design, implementation, verification) involved. These reviews serve two purposes. First, they are an opportunity for someone independent to check the work in order to find things the team doing the work might miss. Second, they provide an opportunity for the team working on the features to pause long enough to ensure that they all understand the work in the same way.

Finally, the team responsible for a feature development phase may decide that the phase is large enough that it should be split up into subphases. Each of the subphases might have its own milestone goals; those subphase goals build on each other to reach the features of the main feature development phase. These subphases might focus on individual components or smaller groups of components, or they might split the work into sequential steps, or some combination of the two. These subphases follow the same pattern as the higher-level feature development phase of which they are part.

Interaction between parallel feature development phases. The feature-oriented focus of this methodology can cause problems. If the team is working on two sets of features in parallel, these features could affect some of the same components. Someone working on feature set A might change component C to support A’s features. At the same time, someone working on feature set B might also change component C. In the worst case, the changes might be in conflict and the changes for A might preclude the changes for B working, or vice versa.

The underlying problem is known as serializability in database and parallel computing systems, where it has been studied extensively. In these systems, different approaches to handling concurrent changes are measured by whether they produce the same result as if the work had been done serially, one task at a time rather than concurrently. That is, the work is serializable if it ends up with component C looking as if the work for feature set A were done entirely and then the work for feature set B were done, or vice versa. This has led to many algorithms for coordinating concurrent work.
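
The following minimal Python sketch illustrates the serializability idea with a hypothetical shared component record: the two serial orderings of changes from feature sets A and B agree, while uncoordinated parallel work in which one group's finished version simply replaces the other's loses an update and matches neither serial outcome. The parameter names and values are invented for the example.

    # A shared component record, represented here as a dict of design parameters.
    base = {"timeout_s": 10, "retries": 1}

    def change_for_feature_a(record):
        """Feature set A raises the timeout on component C."""
        updated = dict(record)
        updated["timeout_s"] = 30
        return updated

    def change_for_feature_b(record):
        """Feature set B adds retries on component C."""
        updated = dict(record)
        updated["retries"] = 3
        return updated

    # Serial execution: either order yields the same final record.
    serial_ab = change_for_feature_b(change_for_feature_a(base))
    serial_ba = change_for_feature_a(change_for_feature_b(base))
    assert serial_ab == serial_ba == {"timeout_s": 30, "retries": 3}

    # Uncoordinated parallel work: both groups start from the same base, and
    # group B's result simply replaces the record, discarding A's change.
    parallel_lost_update = change_for_feature_b(base)
    assert parallel_lost_update == {"timeout_s": 10, "retries": 3}
    assert parallel_lost_update != serial_ab    # matches no serial order: not serializable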

The simplest approach is to make changes serially: the people working on feature set A change C first, and when they are done, people working on feature set B get a turn. This is useful when the component cannot be physically shared, like a paper drawing or a mechanical device. There are two costs to this approach. First, one group must wait for the other to be done. Second, when group A changes C in ignorance of what group B will need, group B may have a lot of rework to do when its turn comes (and it is likely to need to consult with group A to keep their changes working).

Another approach is to let the two groups independently change C in parallel, keeping two separate versions of C and merging the changes when both groups are done. This is the approach taken by distributed version control systems like git [Git], which were developed for geographically separated software development teams that cannot continuously coordinate their changes. These tools rely on being able to reliably compare the different versions and to guide people through reconciling conflicting changes. The cost comes when the two groups make incompatible changes that cannot simply be merged together.

The third way, and the one I have found most successful in complex systems projects, is to have one person or a small team be responsible for the shared component C. That person (or team) becomes part of both groups A and B working on parallel feature set changes. This responsible person can choose to handle the changes serially, or may choose to use a version control tool to manage their work. The advantage of this approach is that the person responsible for C understands the rationale for why the component is designed as it is, and will make changes that fit with the designs already completed. That person can also understand the needs of both sets of features, and design changes to support both rather than having to undo and redo incompatible design work.

27.7 Recursion to component development

A system feature, in the end, is made up of behaviors and properties of a number of components. That is, system features are emergent from the individual components involved.

The work to implement a system feature is thus made up of the work on each of the components, along with the effort to integrate those components and their changes. The team working out the concept for the feature determines how parts of the high-level feature are allocated to components. That is, they work out what behaviors or properties are needed from each component so that together they produce the high-level feature. Along the way during concept development, the team works out what components are affected by the feature development work.

undisplayed image

The feature development life cycle pattern for the high-level feature applies for developing the changes to each of the affected components. Just as the feature as a whole has concept, specification, design, and implementation steps, so do each of the components. Developing the concept for the feature includes developing a concept for each affected component. Developing the specification for the feature leads to developing specifications for each component, and so on. The implementation of the feature is the implementation step for each component.

The people who are working on all these component pieces coordinate their work so that it all integrates properly and produces the desired features.

That coordination means that the work on each component moves at a pace at least partly constrained by the work on other components: for example, the specification step for any one component cannot be completely finished until the specifications for all the affected components are finished. Otherwise, the specification work on some other component could reveal a surprise that affects a specification that was thought to be finished.

At the same time, teams rarely just stop and sit idle when the work on some component lags. They proceed from specification to design to starting implementation, accepting the risk that some surprise may happen that will require them to re-do some amount of work. The choice of how much work to do at risk has to be made based on the usual estimates of likelihood and consequence. If the work on some other component is almost done and is in the final stages of cleaning up details, the likelihood of finding something that will require a change to other components is low. On the other hand, if the work on some other component is just getting started, then the chances of a surprise are high. If part of the component in question appears to be fairly immune to changes in other components, then there is little risk of having to redo that work. For example, if the component will definitely need to communicate over a network with other components, then getting network communication designed is low risk.

undisplayed image

The figure above illustrates how the work for a feature is coordinated across all the components. The top row shows the steps or phases for the feature as a whole. That work is broken down into the work for two components, shown in the middle two rows. The components each follow the feature development pattern of concept, specification, design, implementation, and verification. The last row covers the thread of work done to address integrating the changes to individual components, and it follows a reduced form of that pattern. The feature integration thread of work is primarily about checking that the work on the components properly combines to produce the high-level system features, and so it focuses on verification methods for this integration.

The figure also shows that the concept development work for the high-level feature and the affected components may often be done as a single task. If the feature and components are simple enough, a small group can work out the concept together and produce one set of concept artifacts that cover both the feature as a whole and its effects on specific components. In this case, the artifacts for each component will reference the shared concept artifacts; after a while, the records for a component may reference several concepts for different features.

If the feature or the components are more complex, the work may need to be divided up so people can work on different parts in parallel, combining and reconciling the pieces before the concept is completed. The artifacts for the components will then reference their own concept for that feature as well as the high-level feature concept documents.

27.8 Feature development variations

The feature development pattern in the last section covers the simplest case: when the team is designing and building a straightforward feature. There are three variants to consider: when the component carries enough uncertainty that prototyping is warranted; when the component will be acquired from outside the project rather than built in house; and when the component is hardware, which has its own implementation needs.

Prototyping. Prototyping is used when there are candidate technical approaches for designing some part of the system, but the technical uncertainty about them is too high. In these cases, taking steps to reduce the uncertainty before committing to one particular design can lead to better outcomes.

The uncertainty can take different forms. In one case, the team might have an idea, but they don't know if it will work correctly. In another case, they may not have an idea for a solution, and they need to explore and learn in order to find possible solutions. Or the team might have a solution, but lack skills essential to completing the design or implementing it. Finally, the team may have a solution that is not technically mature enough, and they need to validate its suitability. In each case, developing a prototype of some kind can help.

undisplayed image

The prototyping effort is added to the design step. The prototype might take the form of a simple implementation, or of a model of a possible solution. Any prototyping effort should have a clear purpose: to see if an idea works (and working out what it means “to work”) or the like. The focus must be on learning what is needed as quickly as possible. The work should prioritize speed of learning over quality of the prototype implementation.

Prototyping can be a necessary part of learning about a design and managing its uncertainty, but its contribution to the system is indirect—by leading to a good design. The amount of effort or time spent on the prototype should be bounded so that the prototyping effort does not take over the development effort.

The principles about prototyping (Section 8.3.5) apply. The prototype artifacts should be built as quickly as possible to maximize efficient learning, without putting in effort to make them high quality. The artifacts that come out of the prototyping work must not end up in the real implementation.

Acquired component. Sometimes components are best acquired from somewhere else rather than being designed and built by the team. This might involve reusing a component from another project, or using an open source design, or purchasing a component from a supplier. Acquiring a design or component can take advantage of work that others have already done, reducing development costs. It can take advantage of expertise that the team does not have itself, such as a supplier that can manufacture an electronics board or a software vendor that has developed a component with a particular algorithm.

The pattern for an acquired component proceeds with developing a concept for what is needed and a specification for the component. The specification is the basis for a request for proposal (RFP), which is sent out to suppliers that are expected to offer potential solutions. The suppliers in their turn use the specification to develop a design, which might simply be an off-the-shelf product or might involve development work on their part. Once the suppliers have a design, they respond to the team. The team evaluates whether the design in fact meets the specification and determines which option is best, if they have more than one potential choice. In many cases the team will build a simple prototype using a supplier's prototype implementation, if they have one, as part of the evaluation. After that, the supplier implements, builds, and delivers the component. In other words, this pattern moves the design and implementation work away from the project team and onto the supplier.

undisplayed image

The team, however, still does some amount of verification once they have received the implementation. This acceptance testing may be more limited than it would be for a bespoke design, if the supplier provides information about the verification steps they have taken. Nonetheless, the team should spot check any verification work that the supplier has done and must check that the supplied component integrates as expected into the rest of the system.

Acquiring components like open-source designs or software does not have to go through the process of developing a formal RFP. However, these components do still require evaluation before deciding whether to use the design or not. The team must ensure that the license terms are compatible with the system. The team must also ensure that the potential component meets the specification of what is needed of it. Finally, the team must evaluate the quality of the component—which, for open-source components, includes not just the quality of the artifact itself but also its governance and supply chain security [Goodin24][CVE24].

This pattern involves support roles that I have not detailed elsewhere. For example, the acquisition might involve someone who manages contracting or payment. The acquisition will likely involve checking that the license terms and intellectual property rights associated with the component are appropriate for the system the team is building, which may require legal expertise.

Hardware components. Hardware development has different constraints than some other kinds of component development, and so a different development pattern applies. The primary cause of the differences is that a hardware component involves physically building one or more artifacts, which can take time and resources. This makes iterating on a design to work out bugs or to change features much more expensive than it is for software or higher-level designs. In addition, some verification testing is destructive, putting a component in increasingly harsh environments or under harder loads to determine when it fails.

Hardware development also differs from other kinds of component and feature development in the way terms like “design” are used. A design for an electronics board is a full description of how it is to be implemented; in some cases, it can be sent to an automated production system to create a complete physical board. Similarly, many mechanical designs are complete enough to send to a CNC machine or additive printer to create the physical artifact. By comparison, a software design is more abstract; it cannot be directly translated into a working program. Software source code is closer to mechanical or electronic designs, as source code can be sent to compilation tools that produce the executable artifact.

These constraints have led to disciplines about how to organize hardware development. I discussed the EVT/DVT/PVT pattern earlier (Section 23.4.1), which defines a sequence of phases for developing and verifying a hardware component. The NASA approach uses different language [NASA16, p. 124] to describe the sequence of hardware artifacts to be developed and verified. The two approaches are similar, with one naming the phases and one naming the artifacts.

This approach splits up the design, implementation, and verification phases into multiple iterations. There are typically four iterations.

  1. Preliminary development: produces a breadboard or brassboard, which are low- to medium-fidelity versions of the component. This version is focused on function but often has a form unlike that of the final version. It may use off-the-shelf components that will not be used in the final version. These may be subscale or digital models. This step may be repeated multiple times, adding features at each iteration.
  2. Engineering unit development (or EDU, engineering demonstration unit): produces a version that closely resembles the final version in both function and form. It is put through EVT (engineering validation and testing), which verifies that the version mostly meets its specifications. The engineering unit may have a small number of defects, but at the end of verification the team should have confidence that these can all be corrected to produce the final version.
  3. Qualification unit development: this produces one or more units that are built to the final design. These units are typically built manually in small numbers. They are put through DVT (design validation and testing), which involves thorough testing of the function and form of the component. For some systems, some of these units will be tested to failure in order to show that the design can function correctly across the full range of environments in which it will operate. Aircraft wings, for example, are tested by flexing them until they fail. Spacecraft electronics are subjected to heat, vacuum, radiation, and vibration beyond what they will experience in use, and those tests are likely to damage the components. These units are also used for certification with regulators or similar industry organizations.
  4. Production or flight unit development: this step produces a small number of components that can be deployed into operational systems. These are used to verify final manufacturing processes, using the PVT (production validation and testing). This includes matters like supply chain operation, manufacturing logistics, and delivery and storage of the final units. These units are put through acceptance testing, which verifies that the manufacturing process builds components that are identical to those built for qualification.
undisplayed image
Figure 27.2: Hardware development phases

The fourth step, producing production or flight units that can be deployed, can occur as part of development or later, in a production phase after the system has been accepted (Section 28.1). If a component is going to be mass produced, verifying the manufacturing methods is worth doing before declaring that the component is complete. After acceptance, the manufacturer will build more units. On the other hand, if only a handful of units will be built and they are expensive to build, such as with individual spacecraft, delaying the production of those units until after acceptance can manage risk.

Finally, the development of a hardware component is part of the development of the larger system. This leads to two ways that the hardware development steps can be organized, depending on how the hardware development will be synchronized and integrated with other parts of the system.

The first way is to plan out the hardware component development as its own thread of work. This way has the advantage of keeping the team focused on designing and building the component.

The second way is to break up the hardware development thread into smaller steps, and put some or all of those steps in feature development threads. For example, when building a circuit board that will run a control system, it will be hard to verify that the board works without some version of the software that runs on it or the interfaces to sensors and actuators of what it controls. In other words, verifying the integration of the hardware component with other parts of the system is an essential part of checking that the component actually works. This is the way virtually every project I have worked on has actually planned out its hardware development work.

undisplayed image

As an example, this sequence of feature development steps is loosely based on two different control system implementations in projects I have worked on. The sequence shows how different hardware and software components come together to implement increasingly complex features. This approach integrates the hardware and software parts in incremental steps.

27.9 Acceptance

The acceptance phase is the time for final checks that the developed system is indeed ready for production and deployment. It is the last step in the overall system development life cycle.

undisplayed image

There are three kinds of checks involved: that the system can be put into production and deployed; that the customer (or their surrogate) validates that the system is what they need; and that regulators approve the system, if needed.

The check for production and deployment involves verifying that the manufacturing and distribution process is ready for operation, and that all the procedures and tools are in place to install a manufactured product for customer use. For a software-only product, the manufacturing and distribution procedure involves packaging the software release and putting it on distribution servers (or manufacturing distribution media if it is not distributed over networks). The deployment readiness involves verifying that the packaged software has prominent and understandable instructions on how to install it and start using it. On the other hand, for a mass-produced hardware product, verifying manufacturing and distribution involves checking that the manufacturing line can correctly build the system, that it has the proper supply chains in place to support the manufacturing, and that the products can be shipped and warehoused before delivery to customers.

Validating that the system meets customer needs involves customers trying out an instance of the system—not just looking at documentation about the system. This often involves getting one or more customers to use a test installation of the system to do the tasks that the customers need. For some systems, this kind of validation can be done by beta testers, who are given an almost-ready version of the system and try it out in their environment while providing feedback on what works or doesn't. Systems that require more installation and setup may instead have dedicated test installations that customers come to use.

Regulatory approval involves different procedures in different industries. An aircraft, for example, must be reviewed and certified by the appropriate civil aviation authority. A spacecraft mission typically requires licenses for launch, communication, and certain kinds of earth observation. Other systems may need approval by an industry safety organization. Most of the work to get these approvals or licenses is part of the development phase, and the acceptance phase is the final check that the necessary approvals are in place.

Once these checks are completed, the final milestone is for the organization and the project to decide whether to proceed to production and deployment or not. Many systems are designed and built, but in the end the organization behind the project decides that the result does not justify the investment in production. Some commercial aircraft, for example, have been designed and built, but without sufficient sales interest to start production the aircraft model is quietly retired.

Sidebar: Summary

Chapter 28: Operation

Once the system has been developed and verified, it is ready to be manufactured, deployed, and put into use. The initial work of building is done, but there is much more to go. There are several ways the operation phase can proceed, depending on the kind of system, kind of customer, and the role that the organization that developed the system plays.

undisplayed image

The general flow is first to manufacture or produce the system using the artifacts that have been developed, then deploy instances of the system. After that, the system instance is in operation. Further development of the system, to evolve it or to fix problems, continues in parallel with customer operation. Finally, at some point, the customer will decide to retire and dispose of the system instance. The steps of deploying, operating, and retiring system instances can occur multiple times in parallel for different customers.

28.1 System production

Production is not the application of tools to materials. It is the application of logic to work.

—Peter Drucker [Drucker93, Chapter 17]

The production phase covers manufacturing the artifacts to be deployed.

Bear in mind that this is a brief overview of manufacturing, intended to explain the main points that people like systems engineers or project managers will need to know in order to understand the general scope of the work, and to understand how the manufacturing steps relate to other parts of the system-building work. Manufacturing has been studied and refined for a couple of centuries, and there is an extensive literature with far more information.

There are several kinds of production that different projects might use. These include:

Production of a new system for a new installation can also differ from production of parts for maintaining or upgrading an existing installation. A new system might consist of a complete collection of hardware components that will be assembled from scratch for the installation. Producing replacement or upgraded parts, on the other hand, consists only of manufacturing a few parts and making them available for deployment into existing installations.

undisplayed image

A review and approval milestone for beginning production checks that the project has everything ready before committing to production, as discussed below. The review checks that system development has completed all its milestones and that a system will be ready to deploy when manufactured. It also checks that everything needed for production itself is ready: the manufacturing tools and people, suppliers, and testing. Finally, it checks that the organization is prepared: that it can pay for supply and manufacture, and that people are ready to deploy systems once their parts have been manufactured, so that capital does not remain tied up in unneeded inventory.

Production relies critically on security of the supply chain, management of the developed artifacts, the manufacturing process, and the delivery mechanisms. All these elements of the production process have been attacked in recent years. For example, the SolarWinds attack [Zetter23] compromised the production process for their software, which was then distributed to and installed by many other organizations and led to attacks on those other systems. There are other reports of fake hardware components (e.g. pressure sensors [Control19]) being injected into a supply chain. These attacks can result in loss of system components, delays in deployment to a customer, exposure of intellectual property, deployment of a faulty or dangerous system, or creation of security problems for the system's customer.

The overall production process has the following steps:

undisplayed image

This flow depends on the supply chain of parts used in manufacturing or production. Any physical parts or stock used must be on hand to perform manufacturing; this implies that the stock is in inventory, and that it has been supplied from some qualified source. Sourcing implies selecting the suppliers and setting up contracts for them to provide the stock. The contracts with the suppliers should include clear specifications of exactly what stock or components are to be supplied, along with evidence that the delivered parts meet the specification.

Procedures for receiving materials from suppliers and maintaining inventory are part of the definition of manufacturing procedures. The procedure will typically need some amount of space for maintaining this input stock, along with managing information about what stock is on hand and what should be used next. The storage space maintains the input components or stock in an environment that will keep the material in its designed storage conditions. The procedures include determining when to order more stock. The receiving and storage facilities should have security that ensures that material is not stolen or replaced.

The production process relies on accurate configuration or version management. The artifacts used to manufacture the production components should have consistent versions, and those should match the versions used for final verification during development. If inconsistent implementations were manufactured, the components might not work together—and the resulting problems are often subtle.
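
As a small illustration of this kind of consistency check, the sketch below compares a hypothetical production manifest against the versions that were signed off at final verification, and refuses to proceed if they differ. The artifact names and version strings are invented; a real project would drive this check from its configuration management system.

    # Versions signed off at final verification (from configuration management).
    verified_baseline = {
        "avionics_fw": "2.3.1",
        "control_sw": "5.0.0",
        "board_layout": "revC",
    }

    # Versions of the artifacts pulled for this manufacturing run.
    production_manifest = {
        "avionics_fw": "2.3.1",
        "control_sw": "5.0.1",    # drifted after final verification
        "board_layout": "revC",
    }

    mismatches = {
        name: (verified_baseline.get(name), version)
        for name, version in production_manifest.items()
        if verified_baseline.get(name) != version
    }
    if mismatches:
        raise RuntimeError(f"production artifacts differ from verified baseline: {mismatches}")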

The manufacturing procedures specify who does what steps, in what order, using what tools. These procedures are designed during system development and verified during production verification testing (see the section on hardware development above).

After system components have been manufactured, they are checked to ensure that there are no manufacturing defects. This is typically called acceptance testing. For many hardware components, this involves putting the component through a set of tests that are defined during system development. These tests do not stress the component to a level that will induce faults, like testing at high temperatures or voltages; the tests only look for potential manufacturing problems. Some mechanical or electrical components go through a “burn in” period, which operates the component long enough to catch early component (“infant mortality”) failures. For some other kinds of components, only a sample of each batch of components gets tested, under the assumption that manufacturing defects will tend to cluster in one production batch (for example, one day’s production shift).

The production process involves a significant amount of record keeping. Each produced component has its own set of records. These records start with the component’s identity, typically represented as a serial number. The record identifies what version of the input development artifacts were used, often by associating a release version number or code with the serial number. The records include when, by whom, and using what equipment the component was built, so that if parts start failing an analysis can identify other components that may be at higher than expected risk of failure. The records track what parts or stock were used to manufacture the component: the serial number of components used, if appropriate, or the supplier, model, and batch number of stock.

In addition, each manufactured component must be identifiable. That typically means that it should be clearly labeled with its model or version information and serial numbers, at minimum. The labeling is typically in both human- and machine-readable forms.
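
A per-unit record of this kind can be sketched as a simple data structure. The field names below are hypothetical and much simplified; the point is that the unit's identity (serial number), the release version it was built from, who and what built it, and the traceability of the parts used all live in one record.

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class UnitRecord:
        serial_number: str                    # identity of this manufactured unit
        release_version: str                  # development artifact release used to build it
        built_on: date
        built_by: str                         # operator, line, or shift identifier
        equipment: list[str] = field(default_factory=list)   # tools and fixtures used
        parts_used: list[str] = field(default_factory=list)  # part serials or supplier/batch ids
        acceptance_passed: bool = False

    record = UnitRecord(
        serial_number="SN-00042",
        release_version="2.3.1",
        built_on=date(2025, 3, 14),
        built_by="line-2/shift-A",
        equipment=["reflow-oven-7"],
        parts_used=["supplier-X/batch-113"],
    )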

Once a component has been manufactured and checked, it is placed in inventory and later delivered for deployment. The components in inventory are stored in secure spaces that maintain the components in their designed storage environment—often dust-free, within a particular temperature and humidity range, and so on. The inventory is managed to know what components are in stock and ready to send for deployment.

The production process needs to be resilient to disruptions. One company I worked for was building hardware systems outside the US, and investors asked the company how they would handle a political or military disruption in that country. (The answer was that the company would go out of business because it had no alternative manufacturing option.) Many production or manufacturing processes are also in places that can be vulnerable to natural disasters, including earthquakes and storms.

Finally, the manufacturing process is generally a human process, and processes involving humans have a tendency to drift over time away from their originally-intended procedures (see e.g. Leveson [Leveson11, Chapter 12]). This drift can come from changes in how people are trained, people finding potential simplifications in the procedures, changes in the environment in which the people are working, and many other causes. The designs of robust, safe manufacturing procedures include periodic audits to check that people are performing the procedures as originally designed, and to re-design the procedures if they are found to have problems in use.

Inputs. The production step uses many inputs:

I use two terms loosely: input component and stock. By input component, I mean something that is used as it is in manufacture, such as a chip or a valve. By stock I mean material that has to be worked during manufacture, such as a metal or wood block that is machined to make a component, or plastic that is melted and formed in a 3D printer to make something else. Others may use other terms for these two kinds of inputs, but the distinction remains.

Outputs. Production has two major outputs: deployable artifacts that are in inventory storage or on their way to a customer, and records of each artifact.

Milestones. Production does not begin until there has been a review that ensures that the organization is ready to perform production activities. The approval milestone checks that all of the manufacturing, testing, inventory, and logistics procedures are complete and performable. These checks typically depend on results from production verification testing. The review also checks that all the necessary suppliers are qualified and under contract to deliver manufacturing inputs. Finally, approval to begin production depends on having the capital or cash flow needed to support production and on the organization being ready to deploy the manufactured system once it has been produced.

Each component has an acceptance testing milestone, as discussed above.

28.2 System production examples

I mentioned earlier that different projects follow different kinds of production patterns. Here are a few examples that show some of these different approaches.

Software only. This example covers a software-only system that is delivered electronically to customers for installation.

When building a software-only system, many people don’t put much thought into what happens between the moment a version of the source code is marked as ready for release and its delivery to a consumer. In practice there are several steps between the two, and those steps require careful design.

The input to production is a version of the software—either as source code or as binaries—that has been verified to meet its specifications, and validated against the original customer needs. This code is under version control and has been labeled as being ready for release.

The output is one or more installation packages on servers that customers (or deployment teams) can access over networks. Some software packages are not, or cannot be, delivered over networks, in which case the output is a physical artifact, such as a CD or USB drive, containing a copy of the installation package.

The production process involves the steps to generate these installation packages and then stage them on distribution servers. The procedure typically involves building binary versions of the software from the appropriate source code artifacts, then performing acceptance tests on the binaries. The binaries are then bundled with other material, such as manuals and configuration files, into an installable package. The package also includes metadata recording what the package is, its version, and the environment in which it is intended to be used. The procedure also adds security information, such as signatures or encryption, to ensure the integrity of the package. The installation package is then copied to distribution servers and tested to ensure that the package can be downloaded and verified correctly. Once the package is available for distribution, the final step is to let customers know that it is available.

If the software is intended to run in multiple environments, such as on different operating systems or CPU architectures, the procedure will need to be repeated for each target environment.
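To make these steps concrete, here is a minimal sketch of how such a production pipeline could be organized, assuming hypothetical stub helpers in place of a real build, test, signing, and distribution toolchain. The target names, paths, and URL are invented for the example.

    # Minimal sketch of a software production pipeline. The helpers are stubs
    # standing in for whatever build, test, signing, and distribution tools a
    # project actually uses; the targets, paths, and URL are invented.
    TARGETS = ["linux-x86_64", "windows-x86_64", "macos-arm64"]    # example target environments

    def build_binaries(source_tag: str, target: str) -> str:
        return f"build/{source_tag}/{target}/app.bin"              # stub: run the real build here

    def run_acceptance_tests(binaries: str, target: str) -> None:
        pass                                                       # stub: raise an error if tests fail

    def bundle(binaries: str, extras: list[str], metadata: dict) -> str:
        # stub: combine binaries, manuals, and configuration into one package
        return f"dist/{metadata['name']}-{metadata['version']}-{metadata['target']}.pkg"

    def sign_and_stage(package: str) -> str:
        # stub: add signatures, copy to a distribution server, verify the download
        return f"https://downloads.example.com/{package}"

    def produce_release(source_tag: str) -> list[str]:
        urls = []
        for target in TARGETS:                                     # repeat the procedure per target
            binaries = build_binaries(source_tag, target)
            run_acceptance_tests(binaries, target)
            package = bundle(binaries,
                             extras=["manuals/", "default-config/"],
                             metadata={"name": "example-app",
                                       "version": source_tag,
                                       "target": target})
            urls.append(sign_and_stage(package))
        return urls                                                # announce these to customers

The point of the sketch is the ordering: build, test, bundle with metadata, sign, stage, and verify, repeated once per target environment.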

In recent years, the integrity of the software production and distribution process has received increasing attention [CISA21]. This has led to standards for protecting the production and distribution processes.

Single spacecraft mission. Building a spacecraft is different from producing software: it involves physical artifacts, and it produces only one or a few instances of the spacecraft.

A project will typically build at least one spacecraft that will fly the mission, but may build a backup or an extra that is used on the ground to verify behavior during the mission.

The objective is to deliver a flight-ready spacecraft that is ready to ship to the launch site, be placed on a launch vehicle, and fly the mission (the deployment), or to deliver a test unit that is otherwise identical to the flight unit to testing teams.

Before assembling the flight instance, many projects separately manufacture all or parts of additional spacecraft that are treated as qualification units for testing, especially for environmental testing that pushes the test unit beyond normal operating limits and might damage it. These units may be built and tested as part of the development phase or during production, as appropriate to a specific project’s rules.

The production process starts with acquiring and building all the components, then assembling them according to procedures worked out during development. The assembly is typically done in a “clean room” that keeps out contaminants that could affect the spacecraft’s ability to function, such as dust entering into cable connectors or hinge bearings. The team typically performs incremental acceptance testing along the way to ensure that subassemblies have been built correctly while they are accessible.

The team assembling the spacecraft documents what components are used in each unit as they are assembled. The accumulated records are maintained for the entire life of the spacecraft, as they can be essential to establishing the causes of problems encountered in flight.

Once the entire spacecraft has been assembled, the team performs final acceptance testing, ensuring that testing remains within limits that will not inflict damage. They then package up the built spacecraft for delivery, typically in sealed containers that will protect it from contamination and shock during shipping. The packaged spacecraft is then delivered to the launch site, where it is mounted to the launch vehicle in preparation for launch.

Some spacecraft require final preparation shortly before launch. This can include charging batteries, entering final configuration data, or loading gases and fluids (such as fuel). These steps follow carefully-defined procedures, as they often involve hazardous materials (such as hydrazine fuel) and because there is risk of damaging the launch vehicle in ways that could cause in-flight failure.

The overall production step typically has strong requirements for safety and security. A malfunctioning spacecraft can lead to the failure of a mission, at the cost of significant invested capital. In some cases a malfunction can risk life and property on the ground, such as when a spacecraft causes failure of a launch vehicle, enters the atmosphere and damages or injures something on the ground, or creates debris that damages other spacecraft or injures people on orbit. To this end spacecraft are regulated and must obtain safety approvals before being allowed to launch (see, for example, the US regulations [14CFR450]).

Mass consumer product. This kind of production is for a device that is produced in large numbers for use by the public. These are often produced regularly, in multiple shifts or over multiple days, though not necessarily continuously. The production rate is often ramped up and down to reflect demand. Mass production for consumer products is often done by a contract manufacturer rather than in house, but not always.

Mass production requires a supply chain that can deliver the right parts on a steady schedule, with warehousing to maintain enough parts to keep the production line going and absorb any expected interruptions in delivery.

While mass production for consumers often does not use security standards as high as those for high-assurance systems, security still applies. In particular, using component parts different from those specified can cause unexpected failures in use. Consumer products also need security to keep the features of a new product secret until it is released, and security to avoid theft during and after production.

The manufacturing process uses assembly instructions for workers. These instructions are developed during the system development phase, and are verified during PVT. The instructions must be understandable by the people who will actually do the assembly, who often have different backgrounds from the people who develop the system. The instructions must also account for the fact that people may switch from working on one product to another and back over time.

Manufacturing may involve molds or jigs used to create mechanical parts. These are designed and produced during development and verified during PVT.

Products need acceptance testing and possibly burn-in after being assembled. The acceptance tests are also designed and verified during the system development phase. The tests often use test equipment that is also designed and verified during development.

Manufacturing results in many assembled and packaged products ready for delivery. These are then delivered to customers or to warehouses using a logistics provider.

The production process should be checked regularly. Because production goes on for a long time, the people or procedures may drift from the procedures originally developed. People find shortcuts, or worker training may change, or the environment in which assembly is done may change. The production activities may also reveal mistaken assumptions embedded in the assembly and testing procedures. Regular checks or audits will find where these discrepancies exist, and allow people to either bring the assembly and testing procedures back on track or create change requests to update the procedures.

28.3 System deployment

The objective of deployment is to set up a system instance for a customer and get them successfully using that system.

There are several kinds of deployments. The first variation is: who is doing the deployment? Consumer products are set up and installed by the customer. More complex systems are delivered and set up by a team that is part of the project. I will refer to this as “assisted deployment”. Other systems are deployed and used internally by the organization that created them. The second variation is whether one is deploying a complete new system installation, or installing an upgrade into an existing system.

The overall flow of events is the same for all these variants:

  1. Deliver and install the system (or its new components) at a customer’s location;
  2. Verify that the installed system works as expected;
  3. Train users on how to use the system properly;
  4. Migrate any data or material if needed; and
  5. Put the system into operation.

A system is deployed into an environment. That environment might be a customer site for a physical system; it might be spread over multiple sites; it might be an attachment to a launch vehicle; it might be resources on a compute server somewhere. In all those cases, the customer finds the places where the system can be installed. The deployment team and the customer usually interact before deployment starts to let the customer know what is required for the system, and for the customer to let the deployment team know what is available.

The environment for a software system might include the number and kind of compute servers used, the amount of memory or storage on each, and the reliability and security of each server.

The environment for physical systems might include physical space, along with the temperature and atmosphere in that space. It might include the mechanical mounting needed, along with electrical, water, networking, and other supply lines.

Some customers will be migrating from an existing system to the new system being deployed. The migration might include moving information from the old system to the new system, or it might involve moving physical artifacts or supply from one to the other. Developing the migration procedures is a development activity on its own; in effect, the procedures are a second mini-system to design and implement.

Complex systems will have users who need to be trained in order to operate the system safely and correctly. The initial group of users are trained during deployment, so that they can verify that the system works correctly and can take over its use once the installation has been accepted. Other users will learn to work with the system later, perhaps years later.

The installed system includes education and training materials for these users. These materials are assembled during the development phase of the project.

Different kinds of users may interact with the system. At the simplest, there are users who directly command and use the system’s primary behavior. A system may also have administrators who are responsible for specialized tasks, such as managing the set of users or the system’s security. It may have people who are responsible for maintenance and repair. It likely has other people who set policy for how the system should be used. All these people use the education and training material, and that material must address each of their needs.

Deployment presents a number of ways that someone could attack and compromise the system. The system components will be in transit from the production facility or warehouse and could be tampered with; they will be received at the customer site and might be accessed before being installed. The system components may be partially installed but not fully configured to be secure during the deployment process. The deployment procedures themselves could be altered or hijacked. All these potential exposures mean that the deployment procedure must be designed with security in mind, and that security must be evaluated as part of the system requirements.
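One small but concrete piece of such a security design is verifying the integrity of delivered components before installing them. As a minimal sketch, assuming the expected digest is obtained through a separate trusted channel, a deployment script might check a package like this (real deployments typically use cryptographic signatures, such as code-signing certificates, rather than a bare checksum):

    # Minimal sketch: check a delivered installation package against a digest
    # obtained through a separate, trusted channel before installing it.
    import hashlib

    def verify_package(path: str, expected_sha256: str) -> bool:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):   # read in 1 MiB chunks
                digest.update(chunk)
        return digest.hexdigest() == expected_sha256

    # Example use (the file name and digest are placeholders):
    # if not verify_package("installer.pkg", "3a7bd3e2..."):
    #     raise RuntimeError("package failed integrity check; do not install")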

Deployment includes setting up the customer on a customer service system. Once the system has been installed and the initial users have been trained and given access, the customer will begin to take over system operations. As they do this, they are likely to find they do not actually understand some parts of the system and have questions. They will use the customer service system to communicate with the project team for questions and to report problems.

Accidents and incidents may happen during system operation. When these happen, the customer works with the team that developed and maintains the system to investigate what happened. If the accident is serious enough, regulatory agencies may be involved. During the deployment process, the team establishes the necessary working relationships with the customer that will help the customer to detect when accidents have happened and to bring in the team for investigation. The investigation may determine that there is a flaw in the system, in which case a problem report and change requests are sent to the team to guide fixing the flaws. Section 28.7 below addresses how the team handles such changes.

Setting up the customer for ongoing success using the system is the last part of deployment. Once the system is in operation, the customer’s users are responsible for the ongoing safe and secure use of the system. Users of complex systems tend, over time, to find shortcuts and workarounds for how they use the system. They may forget part of their training, and new users may not be trained fully correctly. The environment in which the system operates may also change—parts might be moved, air conditioners changed out, or electrical feeds changed, for example. All of these can slowly change how the system is working and lead to accidents. Regular monitoring or auditing of system and user behavior is necessary to detect and correct these drifts and avoid accidents, and this auditing must be backed by management policy and actions. (See Leveson [Leveson11, Chapter 12] for background.) The deployment activities, therefore, must include working with the customer to establish the necessary monitoring activities and to establish necessary management policies.

Customer deployment. The components for these systems are delivered to the customer, who is responsible for installing or upgrading the system. The process includes:

undisplayed image

Assisted deployment (internal or external). When someone from the project team does the deployment, the process is similar to customer deployment.

The process includes:

undisplayed image

Inputs. The deployment step takes as input:

Deployment can also involve migrating materials or information from a previous system. If so, the procedures for doing the migration are also an input.

Outputs. The deployment step results in:

Milestones. When the customer handles deployment, the milestones involved are their concern.

When the team handles deployment, there are three potential milestones:

  1. Deployment readiness is for determining whether the customer and the team are prepared for the deployment. The customer is prepared by having the environment ready for the installation and having people ready for training. The deployment team has the parts and time available to set up the system and perform training.
  2. Migration readiness applies when the customer is moving from an earlier system. Readiness means that procedures have been developed and tested for moving information or materials from the old system to the new, that the old system has been prepared to extract the information or materials, and that the newly installed system is ready to take in what is being migrated.
  3. Deployment acceptance is the final check that the system works as expected and that the customer’s users have been trained. At this point the customer is ready to put the system into regular operation.

28.4 System deployment examples

Deployment follows many different patterns, depending on the kind of system and customer. The following four examples illustrate some of the range of ways that the general deployment step can happen.

Digital product. Start with a digital product, such as a software application. These are often deployed by the customer, and involve deploying no physical artifacts. The customer downloads the application over a network and runs an installer package to perform the deployment.

The deployment process begins with the customer ensuring they have the resources needed to support the application. This includes operating system and CPU architecture compatibility, and the amount of available memory and storage needed. The customer gets and checks this information, presumably online, before deciding to download and install the application.

Next, the customer downloads an installation package and runs the installation. The package performs checks to ensure that the application is supported in the local environment, and copies in the application contents. The download or the installation package may interact with the customer for payment or licensing.
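A minimal sketch of the kind of preflight check an installer might run is shown below; the specific requirements are invented for the example, not taken from any particular product.

    # Illustrative preflight check before installation; the requirements shown
    # here (supported platforms, free disk space) are invented for the example.
    import platform
    import shutil

    REQUIREMENTS = {
        "os": {"Linux", "Darwin", "Windows"},
        "arch": {"x86_64", "AMD64", "arm64"},
        "min_free_disk_bytes": 500 * 1024 * 1024,   # 500 MB, illustrative
    }

    def environment_problems(install_path: str = ".") -> list[str]:
        problems = []
        if platform.system() not in REQUIREMENTS["os"]:
            problems.append(f"unsupported operating system: {platform.system()}")
        if platform.machine() not in REQUIREMENTS["arch"]:
            problems.append(f"unsupported CPU architecture: {platform.machine()}")
        free = shutil.disk_usage(install_path).free
        if free < REQUIREMENTS["min_free_disk_bytes"]:
            problems.append(f"insufficient free disk space: {free} bytes available")
        return problems    # an empty list means the environment passes the checks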

This process is more or less the same whether the customer is installing a new application, or installing an upgrade to an application they already have.

At this point, the customer has an application they can use. However, they may not yet know how to use it. The customer can learn about the application using training media provided with the application. If the customer is updating an application, they usually look for information on whatever changes the update might include.

The customer is responsible for copying in any information that they may already have that they want to use with the new application.

Most consumer applications provide some kind of customer service, which the customer can use to report problems they find or to ask for help. These services are often provided online as web sites.

Consumer product. Now consider a simple consumer hardware product: a home light fixture.

In this example, the customer is responsible for all of the deployment steps. Unlike the previous example, the deployment involves hardware artifacts and includes steps required to maintain safety.

The customer starts the process of deploying a new light by determining what kind of light they need—in the ceiling, on the wall, stand alone, and so on, as well as the needed brightness and the electrical supply voltage. They then research what fixtures are available from their preferred suppliers, implying that an organization that is building light fixtures sends out specifications and advertising materials to those suppliers well before the customer goes looking.

Once the customer selects, purchases, and receives the fixture, they review the installation instructions that the team has developed and included with the fixture.

The customer then installs the fixture using those instructions. The instructions should include basic safety steps, like turning off power to the affected circuit before working with the wiring. The customer tests that the light works after it has been installed.

Complex system, shared deployment responsibility. The previous examples have been simple, performed entirely by the customer. The next example covers a more complex deployment.

Consider an information system that supports a repair and maintenance workshop. This example is based loosely on a system I worked on for local government public works agencies, which maintained a wide range of equipment from buses to lawn mowers to backhoes.

The repair and maintenance organization had multiple shop sites. Some shops were specialized for working on particular kinds of equipment.

The system provided record-keeping support for managing work orders (repair orders), scheduling resources like work bays or large equipment, and managing parts inventory. It also interfaced with the customer’s other IT systems: security and user authentication systems, and systems to place orders to buy parts and to pay for them.

For a particular installation, the customer asked for a set of features to be added to an existing software package. The development phase of the project for this customer involved work to determine their specific objectives and changes, implement changes to the base system, and then validate the customized system with the customer. Once the customer accepted the changes at the end of the development phase, a production phase generated the software installation packages and other materials for the deployment.

Physically, the system consisted of a small set of servers in a server room, plus workstations of different kinds at the workshops. Communication links connected the shop sites to the server room. The server room provided power, cooling, communications, and support services like backup and security for the servers.

The customer wanted to perform a phased installation and roll-out, where initially only a few people would use the system and over time its use would be extended to more and more sites. The goal was to minimize risk by avoiding disruption to the shop’s existing work, and to contain any problems that might come up as the shop users learned to work with the system. A phased installation would also allow the customer and deployment team to monitor the performance of the servers and communication systems in order to identify unexpected behaviors before they caused problems. The customer decided to continue using their existing (paper-based) system for all existing work, so no data would be migrated into the new system.

The project’s deployment team was responsible for installing and configuring the initial system, and for training the initial users. The customer installed servers and communications, along with workstations at the shop sites. The customer was also responsible for adding users to the system and would take over training and configuration after the system had been rolled out to half the shop sites.

The deployment process proceeded as follows:

This system thus had a phased transition between deployment and operation, rather than a hard split between one phase and another.

undisplayed image

Spacecraft. Deploying a spacecraft covers the activities from when it is delivered to the integration site to be integrated into a carrier or onto the launch vehicle to when it is on orbit and ready to perform its mission.

The general sequence for spacecraft deployment is:

  1. Operational training and rehearsals. The team that will be operating the spacecraft learns how to use the ground systems that will interact with the spacecraft and how the spacecraft works. The training includes rehearsals of events that might happen during the mission.
  2. An operational readiness review, to verify that the operations team is ready.
  3. Spacecraft delivery to integration site.
  4. Final preparation and checkout. The spacecraft gets final checks to verify that it was not damaged in transit. Final data or software are loaded. Fuels and gases are loaded and batteries charged.
  5. Integration with the launch vehicle or carrier. The spacecraft is mounted to its attachment points on the launch vehicle or in a carrier spacecraft. Shortly after this, the spacecraft becomes inaccessible.
  6. Flight or mission readiness review. The final check that the team has completed all the work needed for the spacecraft to be ready to launch, including performing all planned checks.
  7. Launch and deployment. The spacecraft is placed on orbit and released from the launch vehicle or carrier.
  8. Startup and stabilization. The spacecraft turns itself on and stabilizes its state. This typically means stabilizing its attitude and spin, beginning to generate electrical power, and beginning to communicate.
  9. Checkout. The operations team communicates with the spacecraft to ensure that it is in good working order. The operations team can address any problems that have come up during and after launch, and can take steps to calibrate on-board sensors.
  10. Commissioning. The spacecraft is deemed operational.

When deployment is done, the spacecraft is ready to perform its planned mission, it is in communication with other systems, and the operations team is managing the spacecraft.

Deploying a spacecraft is different from the other deployment examples above in two key ways. First, a spacecraft poses far higher safety risks than the other examples. The deployment process reflects this by using procedures that have been designed and checked to meet safety constraints, and the deployment team are trained accordingly. Second, significant parts of the deployment occur beyond human access: while the spacecraft is on orbit, people cannot stop by to observe or fix a potential problem. The spacecraft’s design thus must provide sufficient information to the operations team on the ground to be able to detect and analyze problems without visiting the spacecraft. The operations team also uses detailed records of the spacecraft’s configuration, and so the production process must record all the details of what components were used, their provenance, and inspections of the work.

28.5 System operation

In this phase, the system is placed into operation. The customer uses the system, performing administration and maintenance as needed. Most of the system operation is the customer’s responsibility; in this section, I focus only on what the project does to support the customer’s operation.

The system operation phase affects the project team in two ways. First, the team will sometimes support the customer during operation. Second, the point of the team’s work is to build a system that can go into operation, which means that the system’s design supports all the activities that the users will do. This includes the rare and exceptional activities, not just everyday usage, so these activities are included in the concept and specification to which the system is built.

The customer is responsible for maintaining the system. That may mean only following procedures for periodic checking, but for many systems maintenance can be far more intrusive, and involve regular replacement of some components. The customers rely on maintenance procedures that are designed as part of the system to keep the system operating safely; these maintenance procedures are designed to take safety, security, and reliability constraints into account. The customer also periodically orders replacement parts to install into their system.

The customer also takes care of their users. This includes adding and removing user access to the system and training those users. The project team supports these tasks by including features to manage users and their roles as part of the system. The team also develops training material that the customer uses when bringing on new users.

The system may have problems from time to time. These may reflect flaws in the system, improper usage, wear and tear, or combinations of all three. The customer, as the system owner, is responsible for handling the problems. However, the project team sets the customer up to be able to address problems by developing instructions for detecting and diagnosing problems, and training some of the customer’s staff on how these work. The project may also provide services to help diagnose and repair problems. The project also provides some form of customer support that the customer can use to report problems back to the project.

Most complex systems have human elements—users who operate the system and in doing so act as a control system that manages system behavior. As I noted in the previous section, these users can change how they interact with the system over time, finding shortcuts or using the system in ways they are not expected to. The customer establishes usage policies and performs monitoring and auditing tasks that check that people continue to interact with the system in safe and secure ways. The project team sets the customer up to perform this work by documenting what constitutes safe and secure system usage, including the rationale for why some interactions are acceptable and others are not.

Accidents happen. When some loss or injury occurs due to the use of the system, both the customer and the project team have a responsibility and interest to determine why the accident occurred in order to avoid future accidents. The accident investigation may also be mandated by regulation, in which case regulators are involved. The customer may be able to pursue the investigation on their own, if they have sufficient information about how the system is supposed to be used safely. The project assists in the customer’s investigation by providing that information, which includes the documentation of how to use the system safely, and why. However, for serious accidents, the investigation often requires a more in-depth understanding of the system’s design and implementation. The project prepares for supporting these investigations by maintaining complete records about the system’s concept, specification, design, and implementation, including explanations of the rationale for why choices were made and safety or security analyses that the team did about the system’s design.

Finally, the customer may find that their needs change over time, or that there is some aspect of the system that does not work as well as they had planned. These changes can be externally driven; for example, regulatory changes that affect the customer’s industry can affect what the customer needs from the system. The project team can receive change requests (along with problem reports) through a customer service mechanism.

Inputs and outputs. The operation phase is ongoing, unlike some other phases. It continues as long as the customer continues using the system. It is also primarily the customer’s responsibility.

The working system, as accepted by the customer at the end of deployment, is the primary input. That working system includes parts that support the customer’s tasks:

From the point of view of the project, the customer’s operation produces a few outputs:

Milestones. Most organizations require some kind of authorization to operate in order to place a system in operation. This is typically a review that all of the system deployment steps, including acceptance, have been completed successfully and that the system meets the customer’s policies. All these steps should have occurred earlier, and the authorization to operate is usually just confirmation that none of the steps were skipped.

The system remains in operation as long as the customer chooses and as long as they maintain the system in good repair. The customer thus periodically performs maintenance tasks and audits to check that usage remains safe and secure. The customer periodically determines—perhaps implicitly—whether to continue the system in operation.

28.6 System operation examples

Operations vary widely depending on the kind of system. Here are some examples illustrating the range.

Consumer product. A consumer product is generally the responsibility of its users. The development team is responsible primarily for designing a system that the users can understand and providing enough documentation or training material so that the users learn how the system works. The development team also provides documentation on any cleaning or maintenance tasks the users should perform.

Some consumer products can require occasional more complex maintenance, and a product team might offer a maintenance service in addition to the system itself.

Aircraft. Operating a commercial aircraft is a joint endeavor between the air carrier and its staff, the manufacturer, and the civil aviation authority (CAA). While the carrier’s pilots are responsible for an aircraft in flight, the carrier has overall responsibility for safe operation. The carrier is responsible for setting policy and training its staff in order to meet CAA regulation. The manufacturer supports the carrier by, first, getting type certification for the aircraft design, and then providing the carrier with documentation on the general limitations of the aircraft’s design.

The air carrier is generally responsible for ensuring all its employees and contractors have training and know which rules to follow—pilots, flight attendants, ground handlers, maintenance personnel, dispatchers and so on. Individual people are responsible for complying with the rules and limitations of their certificates—pilots, dispatchers, and mechanics, for example.

The manufacturer works in concert with the air carrier and repair facilities to develop training materials and is responsible for promulgating maintenance documentation, including service bulletins generated from operational reports back to the manufacturer about problems discovered through use of the aircraft. This means that the project team develops this material during the development phase.

If there is an incident or an accident with the aircraft, the carrier typically works together with the CAA and other government organizations as well as with the manufacturer to investigate what occurred. The records of the aircraft’s design and manufacture, along with safety analyses, implementation, and verification, are one of the inputs to these investigations.

Summarizing, the project team has the following responsibilities that affect operations:

Uncrewed spacecraft. Unlike the other examples, an uncrewed spacecraft is operated completely remotely. The only way to interact with it is through command and telemetry communication channels. Without the ability to interact physically with the spacecraft, its operators rely on design records and hardware instances on the ground to interpret the information they receive.

A spacecraft is typically managed by an operations team. This team uses ground systems—which are designed and implemented as an integral part of the overall mission system—to watch the telemetry sent by the spacecraft and send up commands. The operations team plans upcoming activities for the spacecraft, such as observations to take or maneuvers to make, based on the mission plan. The team uses design information about the spacecraft’s capabilities to determine what activities to plan, and the order in which different steps must occur. The team turns these plans into commands that are sent up to the spacecraft, which then follows the commands. The spacecraft sends telemetry messages down to the ground systems. The operations team processes and interprets this data, using information about the sensors generating it, such as records of how each sensor has been calibrated, its position and attitude on the spacecraft, and the format of data it sends.

The operations team also monitors the telemetry data for off-nominal conditions. It detects that the spacecraft has had a problem by comparing the data received against what is expected from the plan, such as expected attitude information, and looking for data values that are out of normal range, such as a high temperature or low battery voltage. After identifying that a problem has occurred, the operations team looks for the causes of the problem and then works out how to return the spacecraft to normal operation. The investigation relies on the spacecraft’s design records. The team often uses simulation models or duplicate spacecraft systems on the ground to see if they can replicate the problem and to verify that any recovery plans will work as intended. Once they have a plan, they formulate the corresponding commands and send them up to the spacecraft.
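The routine part of this monitoring is often simple limit checking. A minimal sketch follows; the telemetry channel names and limits are invented for the example.

    # Minimal sketch of limit checking on incoming telemetry; channel names and
    # limits are invented, and a real system would track trends, not just ranges.
    LIMITS = {
        "battery_voltage": (24.0, 33.0),     # volts, illustrative
        "propellant_temp": (-10.0, 45.0),    # degrees Celsius, illustrative
    }

    def check_telemetry(sample: dict[str, float]) -> list[str]:
        """Return a list of off-nominal findings for one telemetry sample."""
        findings = []
        for channel, (low, high) in LIMITS.items():
            value = sample.get(channel)
            if value is None:
                findings.append(f"{channel}: no data received")
            elif not (low <= value <= high):
                findings.append(f"{channel}: {value} outside [{low}, {high}]")
        return findings

Findings from checks like this feed the investigation process described above: identifying causes, replicating the problem on the ground, and planning recovery commands.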

For example, consider the first crewed Starliner CST-100 flight [Foust24]. During the early part of the flight, several thrusters began showing poor performance that led the flight systems to shut them down. Even though the spacecraft was carrying crew and eventually docked to the International Space Station, no one could physically access the thrusters to determine what had happened. In the end, teams on the ground replicated the performance problems using duplicate thruster units. Having learned the likely cause of the failures, NASA changed the flight procedures for departing the ISS and returning to ground. (The agency also determined that the failures posed sufficient safety risks that the vehicle did not carry crew on the return to Earth.)

Factory system. Consider a generic plant that produces chemicals. Its operation involves multiple chemicals that can cause serious injury and death to both workers and the surrounding population in an accident. While parts of the plant’s operations are automated, there are many manual operations—cleaning, responding to a failure, maintaining machinery, and so on. The plant, therefore, relies on its operators following safe procedures. This generic example is inspired by several real-world examples; see Leveson [Leveson11, Section 2.2.4] for one relevant case study.

Chemical plants are subject to incentives that work against safety. The desire for profitability leads to streamlining operations or shutting down safety-specific systems, which then break safety requirements. Individual staff are likewise incentivized to work quickly, and often look for workarounds that make their jobs easier or faster. These can also break safety requirements. Finally, staff turnover leads to knowledge gaps at all levels, so that workers and management don’t know what is needed to maintain safe operation.

Plants like this are operated by a company. The company’s upper management is the ultimate authority responsible for safe and profitable plant operation; it sets policy for how the plant’s workers will balance profitability against safety. The plant management acts on this policy to run the plant, making specific operational decisions to set procedures. The plant workers then follow the procedures to operate the plant (or shut it down when needed).

The hierarchy within the company forms a control hierarchy, involving decisions, feedback, and commands. Upper management sets policy, gives instructions to plant management, and observes feedback metrics. Plant management gives instructions to staff, adjusting those instructions to meet the company’s policies. The staff in turn control portions of the plant.

Two steps are needed for this control hierarchy to keep the plant operating safely. The first is that everyone working on the plant or overseeing it must have an accurate understanding of how the plant has been designed for safety. The project staff who design and build the plant make this information available to people in the company, both as reference documentation and as training material. The second is that the behavior of each level of the control hierarchy must be regularly monitored to ensure that the people are operating their part of the system consistent with safety designs. If there is a deviation from safe practice that violates safety constraints, the company takes corrective action to stop the unsafe behavior. This is true at all levels of the company, and especially for upper management: cost-cutting measures meant to improve profitability are a common cause of accidents, and upper management must be answerable to checks that will prevent such decisions.

These control systems are part of the system to be designed and implemented during the system’s development phase. Accurate controls do not arise spontaneously; they come from intentional design. A safe system’s implementation defines roles for upper management, plant management, and plant staff, and includes the procedures that each is to follow. These procedures are verified both analytically and (where possible) by testing, in order to ensure that each level will behave in ways that keep plant operation safe. The analyses account for human factors—what kind of information each role can receive, how likely that is to convey the correct understanding of what is happening in the system, the incentives driving people in each role, and how accurately they can implement instructions.

In some cases, auditing operations will find that people are not following the designed procedures but that these changes do not pose a safety risk. These changes first must be checked thoroughly for evidence that they do not violate safety constraints in the system. If they are found to be acceptable, they should lead to a formal change to the documented procedures (in the form of a change request; see Section 28.7 below). The documented procedures must always remain consistent with what people are actually doing so that all staff clearly understand what is acceptable operation and what is not.

28.7 System evolution

The system evolution phase is about making changes to the system after it has been released and potentially deployed to customers. These changes can happen for many different reasons—such as a planned roadmap for adding to the system over time, requests for changes from customers, fixing problems, or changes in regulation. System evolution can occur in parallel with system deployment and operation.

Overall, system evolution is a recapitulation of system development (Chapter 27). It involves working out a purpose for the change, a concept for how the system will work when changed, leading to specification, implementation, and verification. These steps use information about what has already been specified and implemented in the system, along with the reasons why it is that way, to work out how to make changes that achieve the desired results without disturbing the system’s existing behaviors.

undisplayed image

Making changes starts with a change request. In whatever form the request takes, it identifies who is asking for a change, what their purpose is in the change, and why it is worth doing. In practice change requests are usually maintained in a database. Requests can come from many sources. They may be part of the project’s long-term plan to continue developing the system. They may come from customers, who ask for new or changed capabilities. They may stem from the investigations into reported problems or accidents, in order to avoid problems in the future.

A project does not act on all change requests. Some of them will be technically impossible; some will be infeasible because of time or resources. Others might be reasonable requests that have to wait until higher-priority requests have been addressed. The team looks at each request received to determine its importance, its feasibility, and its cost, and makes decisions about whether to accept or reject the request based on the analysis. If a request is accepted, the team determines a relative priority compared to other work or a potential deadline. These are used in planning the team’s upcoming work (Section 20.5).

Determining whether a request is feasible involves determining how much of the system will be affected by a potential change. While working out the concept for the change, a team member determines what parts of the system will be affected, using documentation about the system’s structure and design (Chapter 12). The result is a preliminary analysis listing the set of components that will be changed and the general nature of those changes. This information is then used to estimate the effort that will be needed to design and implement changes for the request.
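As a minimal sketch, a change request record and its triage outcome might look like the following; the fields are assumptions about what such a database could hold, not a prescribed schema.

    # Illustrative sketch of a change request record; the fields are assumptions,
    # not a prescribed schema.
    from dataclasses import dataclass, field
    from enum import Enum

    class Decision(Enum):
        PENDING = "pending"
        ACCEPTED = "accepted"
        REJECTED = "rejected"

    @dataclass
    class ChangeRequest:
        request_id: str
        requester: str                                  # who is asking for the change
        purpose: str                                    # what the change is for and why it is worth doing
        decision: Decision = Decision.PENDING
        priority: int | None = None                     # set when the request is accepted
        affected_components: list[str] = field(default_factory=list)   # from the impact analysis
        estimated_effort_days: float | None = None      # from the preliminary analysis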

Changes happen iteratively. There may be multiple iterations in progress concurrently if multiple changes have been accepted. Handling multiple concurrent iterations requires careful configuration management discipline (Section 17.4).

Making the changes involves changing the specifications and designs for affected components. These changes can be difficult to make accurately because they are done to an existing, complex set of relationships between components. Making a change without causing flaws depends, then, on being able to accurately understand the structure of the system and how parts of that structure contribute to emergent properties like safety constraints. This relies on having rationales, analyses, and earlier designs available, so that people can work from an accurate information base.

Once a change has been specified, designed, and implemented, it is verified. Verifying the work for a change has two parts: ensuring that the modified system meets the new specifications and the purpose of the change request, and ensuring that the rest of the system continues to work correctly.

Once the changes have been verified, they can be deployed to customers as an upgrade or incorporated in new deployments, using the production (Section 28.1) and deployment (Section 28.3) patterns already discussed.

The team continues to evolve the system until it is relieved of responsibility for fixing problems or the system is taken out of operation.

The overall process for system evolution includes:

Inputs. The system evolution phase starts with change requests. A change request is a record of the desired new behavior or properties, or the problem that should be fixed. It also records who is making the request, their reasons for doing so, and information about priority or deadlines if appropriate. The change request may reference incident analysis reports or other background information needed for context.

The evolution phase will take in the current development plan and the current system.

Outputs. The primary outputs are updates to the system artifacts, including updated concept, specifications, design, rationale, and verification artifacts. These artifacts feed to production and deployment phases, which result in other outputs.

The development plan is updated as a side effect of deciding whether to act on a change request and, if accepted, what its priority or deadlines are.

Milestones. There is one milestone unique to the system evolution phase: the decision whether to proceed or reject a change request.

In addition, this phase incorporates all the milestones associated with the development pattern while developing a new version of the system.

28.8 System evolution examples

Consumer software. Many consumer applications are released initially as a simple version, with a roadmap to add features in future releases. This approach lets the developer test the market and build awareness of their application as early as possible, with the least investment, before adapting to customer needs.

These upgrades are often planned to be released on a regular schedule, with a plan or roadmap of what new capabilities will be released each time. Additional bug-fix releases are released as needed between the planned upgrade versions. These are driven by a balance between problem fixes and the roadmap, which is updated by a marketing team listening to customer requests.

Spacecraft. In most missions, the spacecraft hardware cannot be changed once the spacecraft is launched. The opportunities for evolving the system are to update on-board flight software and ground systems.

Flight software is updated for several reasons: correcting bugs found after launch, adding fixes to work around hardware problems discovered in flight, and adding new capabilities. New capabilities might include new kinds of data analysis and science operations, such as the autonomous dust devil detection uploaded to the Mars rovers [Castano06]. The project team develops and tests these software updates using simulations and replicas of the spacecraft on the ground before risking sending changes to the spacecraft. This test equipment is an important output of the original development phase. The ability for the spacecraft systems to continue functioning even after a buggy software update is also an important system property, often addressed using internal fault detection, software rollback, and “safe modes” where the spacecraft operates with only a minimum amount of well-tested software running [Wertz11, Chapter 14, p. 410].

Flight software updates are driven first by problem fixes that are needed and second by mission opportunities to use new capabilities. It is uncommon to plan to regularly produce new flight software versions during a mission.

Ground systems are easier to update, since people can access them directly. For example, a mission can add new ground communication stations or upgrade the workstations and servers in mission control. New mission planning or data analysis tools are regularly tried out during a mission. Some ground system updates are planned on a regular schedule over the course of the mission, though more happen when problems or opportunities are identified.

Some spacecraft mission systems in recent years have tackled in-flight upgrades. The GPS constellation is regularly updated with new spacecraft [Albon24]. Low Earth orbit constellations, such as the Starlink communication constellation, use spacecraft in low orbits that have intentionally limited life spans, and they are regularly replaced with newer-generation spacecraft. The System F6 project, on which I worked, looked at flying in new capabilities over time [LoBosco08].

Factory system. Consider the chemical plant example from the previous section. Over the plant’s life, there can be many reasons why the plant will change from what was originally implemented. New technology can become available that will improve the factory’s operation. Parts can wear out and need replacement, but duplicate parts might not be available any longer and a substitute must be found. The factory’s chemical process may be changing to meet new demands, leading to changes in the plant’s equipment. And finally, there will be changes to operational procedures as noted in the previous section.

All of these involve changing the design of the plant. Following the pattern for system evolution ensures that the necessary design and implementation steps are done so that the plant continues operating safely.

For example, when substituting a different model part for one that is no longer available, there are a number of questions to answer. Does the replacement part meet the functional and safety assumptions of the original? Will the replacement fit into the physical space available, and connect to other parts properly? Does it fit into the control mechanisms, both automated systems and manual control? Is the replacement manufactured with equivalent reliability, and does the supplier provide the same assurances about provenance? How do maintenance and operation procedures need to change to reflect the substitute part?

28.9 System retirement

No system lives forever, and most are deliberately taken out of service when their usefulness has ended.

Most systems continue in operation until there is a decision to retire them. For some systems, this comes when the purpose for the system has been completed—for a spacecraft mission, for example. For others, it comes when the system has worn out enough that ongoing maintenance and repair costs outweigh the cost of replacement, such as for vehicles that wear out. Yet others are replaced because newer systems become available that can meet the customer’s need better.

A system being retired and disposed of typically goes through three periods. In the first period, the system is in normal operation, but the decision has been made to retire it. During this period people plan how to shut the system down and transition its functions or information. They should conduct dress rehearsals to verify that the procedures will work as expected. The system then enters the second period, where it is no longer in normal use but may remain at least partly operational to support transition and archival. Once those are verified complete, the system is shut down for the last time, is dismantled, and its resources are disposed of.

undisplayed image

There are two primary aspects of retiring a system to consider: what to do with information or materials that should be migrated to a new system, or archived, and how to dispose of the artifacts that make up the system.

I discussed migrating into a system in Section 28.3 above. The task of migrating out of a system is part of the same process, involving developing a plan for migrating information or materials from the old system and into the new.

There will be other information that people will want long after the system has been retired, in many cases. This can include logs of system activity or user access that may be needed for later accident investigations or legal inquiries. It can also include information or materials that the system processed that are not being migrated to another system, but that may be valuable in the future. This information is moved from the working system to some kind of archive. Developing the procedures to archive the information, how the information will be organized, and the system to hold the archive requires development on its own, just as migrating information from one system to another does. This development phase involves determining what information needs to be archived and how it will be used once it has been stored, which in turn leads to a concept, then specifications, then a design.

Archived information is usually retained for a long term. If a system has been used for business or manufacturing, retention is mostly governed by regulation—anywhere from one to 30 years in the US, depending on the kind of information. Scientific and medical data is often of value indefinitely, though legal retention requirements may be shorter. Scientific data is often re-evaluated decades after it was first gathered; for example, data collected from the Viking landers on Mars in the mid-1970s was re-interpreted thirty years later after other missions gathered more information about Martian soil composition [Navarro-Gonzalez10]. This particular example also illustrates a problem with many data archives: the mission data were recorded on microfilm and had to be scanned to get digital data to process.

Long-term archival media often have two problems. First, the media wear out and decay over time, which has led to information believed to be safely archived turning out to be unreadable [Purdy24]. Second, even if the media are readable, there may no longer be machines that can read them. I have a number of backup tapes for which I have not been able to find a drive to try reading them.

Sometimes physical artifacts are retained from a retired system. It is common to keep parts of aircraft and spacecraft in museums after they are retired, for example.

undisplayed image

Disposing of system artifacts can range from trivial to complex. Erasing a software application and its data, for example, is easy; once the storage media have been erased, there is no further meaningful trace of the system remaining. Disposing of a system that processed hazardous biological or chemical materials, on the other hand, can be difficult.

The retirement and disposal procedures must be secure. An unauthorized attempt to shut down a working system can cause major losses, and can lead to safety hazards. Information and materials are being moved around during migration and archival, and are potentially accessible to being copied or corrupted. Physical artifacts that are being decommissioned can carry confidential information about both the way the system works and about the customer that has been using the system.

Inputs. Retirement begins with a system in operation, along with records of its specification and design.

Some projects develop the data archival, shutdown, and system disposal procedures during the development phase. If so, these procedures are inputs to system retirement. If not, they are developed during the retirement phase.

If the system’s function is being migrated to a new system, the specification and design of the new system are inputs, used to develop a migration plan during the retirement phase. An unpopulated but functional installation of the new system is also needed.

Outputs. There are three kinds of outputs from system retirement:

  1. A new system that includes information and materials migrated from the system being retired, if appropriate.
  2. Archived information and artifacts.
  3. Physical resources and debris. Resources that have residual value or are reusable are separated from debris that cannot feasibly be reused.

Milestones. The overall retirement phase starts with a milestone decision that the system should be retired.

After that, the three threads of activity—migration, archival, and disposal—each have readiness milestones for reviewing and approving a plan for each, and a verification milestone to confirm that each was completed correctly. The disposal readiness milestone also checks that migration and archival have completed.

There is also a decision milestone to determine when the running system should be taken out of service in order to start migration to a new system and archival.

28.10 System retirement examples

There are many different ways systems are retired. Here are three examples that illustrate different approaches.

Simple software system. When retiring software such as a workstation or phone application, the objective is to remove the software from the system on which it runs, so that none of the software or its related files remain. This is typically done by running an uninstall program that is set up to remove any files that were added on installation, plus any internal files that might have been created (configuration, logs, caches). This uninstaller is typically developed as a part of the application and packaged with it. In some cases, the software can be disposed of by erasing the storage devices that held the software and its related files.
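To make this concrete, here is a minimal sketch of what such an uninstaller might do, written in Python. The application name, install location, manifest file, and per-user directories are hypothetical, not taken from any particular installer framework; a real uninstaller would follow whatever conventions its packaging system defines.

    #!/usr/bin/env python3
    """Minimal uninstall sketch: remove the files recorded at install time,
    then remove configuration, log, and cache files created while running."""
    import json
    import shutil
    from pathlib import Path

    APP_NAME = "exampleapp"                            # hypothetical application name
    INSTALL_DIR = Path("/opt") / APP_NAME              # assumed install location
    MANIFEST = INSTALL_DIR / "install-manifest.json"   # files written by the installer
    USER_STATE = [                                     # files created during operation
        Path.home() / ".config" / APP_NAME,
        Path.home() / ".cache" / APP_NAME,
        Path.home() / ".local" / "state" / APP_NAME / "logs",
    ]

    def uninstall() -> None:
        # Remove every file the installer recorded in its manifest.
        if MANIFEST.exists():
            for name in json.loads(MANIFEST.read_text()):
                Path(name).unlink(missing_ok=True)
        # Remove configuration, cache, and log directories created at run time.
        for directory in USER_STATE:
            shutil.rmtree(directory, ignore_errors=True)
        # Finally remove the installation directory itself.
        shutil.rmtree(INSTALL_DIR, ignore_errors=True)

    if __name__ == "__main__":
        uninstall()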

Sometimes retiring an application means that the server on which the software was running is no longer needed, and so the server can be retired. Disposing of the server is similar to disposing of a vehicle, as discussed next.

Vehicle. Retiring a vehicle, such as a car or aircraft, involves getting rid of the vehicle’s physical parts while recovering as much value from them as possible. At the same time, records about the vehicle are retained for longer, both to meet financial record-keeping needs and to support analysis of maintenance or reliability for other similar vehicles.

The overall process is:

Spacecraft disposal. The objective when retiring a spacecraft is to ensure that it will pose no future hazard to the Earth, other spacecraft, or other bodies. Some of the most important hazards are impacting the Earth and causing damage or injury; colliding with other spacecraft; or contaminating other planets or moons that potentially carry life. Collision can occur either with the whole spacecraft, or with fragments of it if the spacecraft breaks up on orbit. Interfering with radio spectrum is another, though lesser, hazard.

There are four approaches usually used to retire and dispose of a spacecraft.

  1. Cause it to enter an atmosphere and disintegrate, so that no parts remain in orbit and no parts will pose a risk of falling on people or property. This is used for spacecraft in low Earth orbit, such as small spacecraft that burn up when entering the atmosphere and large spacecraft that are directed to impact in the deep ocean. It has also been used for deep space probes around Jupiter and Saturn to avoid the spacecraft impacting and contaminating moons there.
  2. Cause it to enter the Earth’s atmosphere, land, and be recovered. This is used for crewed missions and returning cargo from low Earth orbit. The returned spacecraft equipment is often re-used for later flights.
  3. Cause it to impact another body, such as the moon. This has been used for Apollo hardware and science missions such as LADEE [LADEE13].
  4. Place it in a parking orbit that is stable and reserved for retired spacecraft. This is used for spacecraft in Earth geosynchronous orbit, which require significant energy to deorbit.

If a spacecraft is going to remain in orbit after its useful mission is complete, for example because it is being placed in a parking orbit or left in a low decaying orbit to enter the atmosphere passively, then regulations require passivating the spacecraft. Passivation removes any stored energy that could cause the spacecraft to explode, change its orbit, or activate radios, eliminating ways that it could cause collisions or interfere with communications. It typically involves venting any fuel and other gases or fluids and permanently shutting down the electrical systems.

All of these disposal approaches can experience problems. A spacecraft may lose its communication capacity before passivation commands have been sent to it. Thrusters may fail, interfering with the ability to put the spacecraft into an orbit that will enter the atmosphere or impact as planned. The design of the disposal methods must account for these potential problems, and safety analyses must show that the spacecraft and its procedures will avoid the identified hazards with acceptable likelihood.

The NASA life cycle standards require that a mission develop the plan for retiring and disposing of a spacecraft during the development phase [NPR7123]. This includes the plan for how the spacecraft will be disposed of, including meeting safety requirements. The plan must also include the procedures for archiving all mission and project data.

Sidebar: Summary

Chapter 29: Project ending

When a project ends, there are three objectives: completing obligations and support for stakeholders (Section 16.2); saving information and artifacts that might be needed in the future; and releasing resources that the project used.

Note that ending the project is separate from retiring any particular instance of a system. Ending the project is about stopping development and support for a system product, independent of whether there are instances of that system in operation or not. Some projects will combine these, such as for exploration space missions that build and fly one spacecraft.

A project might end for one of many reasons. It might have a fixed term or have completed a defined system deliverable. It might run out of money or time. It might no longer fit the organization’s or funder’s strategy, perhaps because a better replacement system is planned. Competitors might have won over customers, leaving no remaining demand for the system. The team might be unable to deliver, with the project behind schedule, over budget, or lacking key features.

The first step is a decision to wind down the project. This is typically a decision made by the organization that hosts the project, or its funders; the project staff generally do not make the decision on their own.

A decision to end the project is followed by a plan for how to do so, which defines the steps the team will take to meet the final objectives. The plan typically gets review and approval before proceeding. (In some environments, at least part of the plan must be worked out early in the project, long before any decisions are made.)

undisplayed image

The following sections list some of the steps involved in ending a project. The specific steps will depend on the project; for example, not all projects have contracts with funders that must be closed out. The team can use this list to help build the plan, bearing in mind that some steps must be done in a particular order: customers should be notified of the project’s impending end before contracts are ended, and production should be shut down only after all upgraded components have been manufactured.

Obligations to customers. If there are any system instances still in operation, the first step is to let those customers know that the project is ending. If system instances are owned and operated by customers separate from the project, then they will want to work out how to keep their system in operation after project support ends or they will decide to retire the system. The terms on which the system is licensed may affect what the customer can do after the project ends.

The project develops any final updates or fixes for the system, and releases them for deployment along with appropriate documentation and training. The project builds or acquires such spare parts inventory as is needed for remaining customers before shutting down production.

If there are contractual relations with the customer, the contract is closed out. This might include final billing or payments, or other deliverables.

Finally, the customer service mechanisms are shut down.

Obligations to team. Ending a project means loss of work for everyone working on it. It can also mean the loss of social relationships.

The first obligation to the team is to keep them informed once the decision has been made to end the project. The people should understand why the project is ending, the plans or timeline for winding down, and their roles during that time.

People will be needed on the project for different lengths of time. Some roles, such as developing new system changes, will end shortly after the wind-down begins. Other roles, such as closing out finances and contracts, will last to the end. Each person needs an expectation of how long they will be needed so that they can make plans for what to do next. (In some jurisdictions, notices of layoffs are required well in advance.)

At the same time, many people will have incentives to move on to something else before their project role is complete. The plans for ending the project must take this into account, and often include incentives for people to stay on as long as they are needed.

Finally, the team’s experience represents an asset. These people can be a resource to other projects in their organizations. Helping people transition benefits those projects and, done well, generates goodwill that encourages people not to leave early.

Obligations to funders. Some projects will have contracts or other agreements with funders. These projects provide final reports and other deliverables to the funder. They can then finalize financial accounting with the funder and close out the contractual relationship.

Obligations to regulators. Some projects for systems in highly regulated industries may need to work with their regulators when the project is shutting down. This might include filing notices that the project is ending. The project is responsible for determining what other requirements its regulators may have.

Obligations to organization. The project takes two final steps: saving information and releasing resources.

There are several reasons that information about the project may be needed in the future. There may be a need to restart the project, in which case the new team must be able to learn about the system’s design and implementation, as well as the reasons behind its design. The intellectual property in the system may be valuable for licensing or sale. There may also be investigations related to the system or the project that need information about how the project was conducted.

The project may archive the artifacts needed to restart the project. It may also archive records of project execution, known issues, and any plans that will not be completed. Some projects will archive physical artifacts: molds and forms that support production, for example; some artifacts may be kept for museums.

The end of the project is a time to hold a retrospective on how the project went. A bit of introspection about what went well and what didn’t will help people on the team do better on future projects, and it helps build institutional knowledge.

Archiving project information raises security concerns. The process of moving information to an archive must maintain the information’s integrity and confidentiality: the information must not be modified, lost, or disclosed during the move. After that, the archive itself must continue to maintain the information’s integrity and confidentiality.
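As one minimal illustration of the integrity part of this, the sketch below computes a cryptographic digest of each file before it is copied to the archive and verifies the copy afterward, keeping the digests for later audits. The directory names are hypothetical, and a real archival process would also need to address confidentiality (for example, encryption and access control), which this sketch does not.

    #!/usr/bin/env python3
    """Sketch: verify that files copied into an archive arrive unmodified,
    by comparing SHA-256 digests computed before and after the copy."""
    import hashlib
    import shutil
    from pathlib import Path

    SOURCE_DIR = Path("project-records")            # hypothetical working location
    ARCHIVE_DIR = Path("archive/project-records")   # hypothetical archive location

    def sha256_of(path: Path) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def archive_with_verification() -> dict:
        manifest = {}
        for source in sorted(SOURCE_DIR.rglob("*")):
            if not source.is_file():
                continue
            destination = ARCHIVE_DIR / source.relative_to(SOURCE_DIR)
            destination.parent.mkdir(parents=True, exist_ok=True)
            expected = sha256_of(source)             # digest before the move
            shutil.copy2(source, destination)        # copy contents and metadata
            if sha256_of(destination) != expected:   # digest after the move
                raise RuntimeError(f"archive copy of {source} is corrupted")
            manifest[str(destination)] = expected    # keep digests for later audits
        return manifest

    if __name__ == "__main__":
        print(f"archived {len(archive_with_verification())} files")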

The project also releases the resources it has held. This includes:

Lastly, the people on the team will move on as discussed above.

29.1 Project cancellation

Some projects end because they are canceled, even before they have completed their development phase. Anecdotally, it seems that more projects are canceled than go to completion—this is a consequence of using competitive approaches to programs, and the net effects of competition are generally regarded as valuable. The information in this chapter applies to canceled projects just as to other projects.

Consider two examples, based on projects I have worked on.

In the first project, the team was writing a proposal for a US DoD spacecraft system. In the proposal-writing phase, the team has to establish the basic architectural and management approaches for the project, show that they meet the department’s needs, and establish the price at which the team proposes to build the system. We progressed through establishing the initial concept and architecture for the system, and began evaluating the solution to see how well it would serve the customer and how much it would cost to build.

We had a checkpoint milestone where we reviewed what we had found. At that review, it became clear that while our team had a decent solution for the needs, we did not have a great solution, and that other companies we expected to propose designs would likely have better solutions (because they had more experience in a couple of key technical areas). We made the decision not to pursue the proposal.

This was a good decision. Assembling a proposal is not a small task; we had a team of about 15 people working long hours. For US government projects, the proposer generally pays for the proposal development. Choosing to spend our team’s time and money on this project meant that the team couldn’t work on some other project. We judged that the opportunity cost was not matched by the probability of successfully winning the contract, so we freed up the team to work on a different system that did prove successful. If we had continued to work on the original proposal, we would have spent the budget available to develop proposals and could not have spent it on the proposal that succeeded.

In the second example, a different US DoD spacecraft program, the team was about two years into a multi-year contract. The team had performed excellently in a competitive first prototyping phase, and was the only team selected to move on to a second phase for building an initial working version. A key subcontractor on the team had staffing and management problems and was not delivering results. Within the team we struggled to fix the execution problems or find another way to build the necessary components, all the while keeping a large staff on payroll and running through budget. While the technological solutions for many system capabilities were probably sound, the team could not deliver. The customer observed the problems and, after working with the team to try to resolve them, went through the process to cancel the project.

This was also a good decision. In hindsight, the team lacked necessary capability in the subcontractor and in the project management team. If the project had been allowed to continue, it is unlikely that the team would have solved the problem and more money would have been spent without benefit in the end.

The takeaway from these examples is that there are many sound reasons for canceling a project. Sometimes the cancellation is designed in (as with competitive acquisition); other times it is because continuing to invest money, time, and the care of the team building the system has become unlikely to pay off.

For a more general discussion of US DoD project failures, see the report by Bogan et al. [Bogan17].

Sidebar: Summary

Chapter 30: Using the reference life cycle

25 September 2024

The last several chapters have presented a reference life cycle pattern. This pattern is intended to inspire thoughts about how a project can organize its own work; it is not ready to use off the shelf. Each project will have its own needs, its people will have preferred ways to work, and some projects will have to follow life cycles mandated by regulation or industry standards.

undisplayed image
Figure 30.1: The entire reference life cycle.

The purpose of the life cycle is to get a system built, deployed, and sustained that meets customer and other stakeholder needs. In doing so, the project develops the whole system, including all its development artifacts, not just the end product. Using a life cycle well calls for flexibility combined with discipline, keeping in mind the edifice of artifacts the project builds up along the way.

The reference pattern does not discuss the roles involved. A full definition of each phase will include definitions of who performs different tasks in each phase, and in particular who is responsible for milestones. I argue elsewhere (e.g. Section 8.2.6) that, to be meaningful, reviews must be done by people with an independent perspective on the material being reviewed, and that approvals must reflect a check on the work fitting into the project’s big picture.

The objective for any project is to develop and adopt a life cycle that meets its needs. In the next section I discuss several principles that a good life cycle will follow; these can help people evaluate a life cycle they are considering. Some other considerations include:

The project works out what life cycle patterns it uses, and documents the patterns. This effort starts during project preparation (Chapter 25). The team does not necessarily need to define the entire life cycle all at once; the definition can be built iteratively, as long as it stays ahead of the work the team needs to do. In practice I have found that enough should be completed in the project preparation phase that the team understands the general complexity of the work ahead, has chosen a development methodology, and can name the major milestones they will need to meet. The remainder of the life cycle can be worked out during purpose and concept development, and will likely be refined or adjusted as the project moves along. (I worked on one project that had a limited budget and spent much of that budget writing elaborate management and engineering plan documents before even beginning to work out the high-level system concept. The result was a pile of documents that were never looked at again, which was a waste of effort.)

In no case should the team get ahead of the defined life cycle. See Section 8.1.5—Principle: Team habits for a discussion of this principle.

The life cycle patterns have value only if the team actually uses them. This means that the team must know that the patterns exist, understand them, and agree that they are useful. The people in the team must also understand that they have a responsibility to follow the patterns, or to raise an issue when they find a problem with the life cycle’s definition. Achieving these means educating people as they join the team about what the life cycle is and how to learn about it, as well as monitoring that everyone actually follows the patterns. The team can also learn about and accept a life cycle a bit more easily if they are involved in developing the patterns; at minimum, they should be able to give feedback before the patterns are adopted.

The life cycle patterns are documented in a way that the team can find them and learn about them when they are joining the team and when they need to refresh their understanding of how some step works. The documentation is an artifact that should be managed using the principles in Section 17.4: it should be versioned and under change management; it should be stored in a way that the team can find it when needed; and it should be secure enough that it will not be tampered with.

There is no one right way to document them, as long as the documentation is well-organized and accessible. Some organizations prefer to define the life cycle in a prose plan document, which can be printed in its entirety if needed. I have had some success maintaining the documentation in a wiki or in a collection of web documents; the advantage of these is that they allow linking between parts of the document. The patterns should be explained and listed explicitly; they should not be hidden in a workflow system that doesn’t let team members see and understand the whole context for their work (see Section 4.6 for an example).

The documentation for each phase or step in the life cycle should include the information listed in Sections 21.5, 21.6, and 21.7.
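As an illustration of one way to keep that documentation uniform, a project could capture each phase definition in a small structured record, so that the inputs, outputs, and milestones are always present and easy to review. The following Python sketch is hypothetical: the field names and the example phase shown are mine, drawn from the retirement phase description earlier in this part, not a prescribed format.

    from dataclasses import dataclass, field

    @dataclass
    class PhasePattern:
        """One life cycle phase or step, with the kinds of information a
        phase description should carry (field names are illustrative)."""
        name: str
        purpose: str
        inputs: list = field(default_factory=list)
        outputs: list = field(default_factory=list)
        milestones: list = field(default_factory=list)

        def missing_parts(self) -> list:
            # A simple completeness check the team could run before adopting the pattern.
            return [part for part in ("inputs", "outputs", "milestones")
                    if not getattr(self, part)]

    # Hypothetical example, summarizing the system retirement phase (Section 28.9).
    retirement = PhasePattern(
        name="System retirement",
        purpose="Take the system out of service: migrate, archive, and dispose of its parts.",
        inputs=["Operational system, with its specification and design records",
                "Archival, shutdown, and disposal procedures, if developed earlier"],
        outputs=["New system populated by migration, if applicable",
                 "Archived information and artifacts",
                 "Physical resources and debris"],
        milestones=["Retirement decision",
                    "Migration, archival, and disposal readiness reviews",
                    "Verification that each thread completed correctly"],
    )
    assert retirement.missing_parts() == []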

30.1 Meeting life cycle principles

In Section 21.10, I listed principles that a life cycle pattern should meet. The reference life cycle pattern in this part reflects these principles, though it cannot address all of them. Here are ways that a life cycle built using this reference as a base can address them.

Know the purpose for something before developing it. The development phases in the reference life cycle all start with a purpose development step, in which the purpose for the system, component, or feature gets worked out before proceeding on to concept and design. The system evolution phase reiterates these patterns.

The project preparation phase is a time to think about the purpose of the project as a whole, and to work out the purposes for the different aspects of project operations: for example, what the team organization should achieve, or what is expected of the life cycle and procedures.

Documenting these purposes means that when the questions are revisited—and they will be—people can understand the reasons why decisions were made, instead of forgetting why and making up new and probably different reasons.

A good life cycle definition will ensure that these phases have review and approval milestones that check that the purpose has been worked out and documented.

Build in time for and incentivize deliberative thinking. The concept steps in development and evolution support this kind of deliberation, as long as the team culture actually incentivizes taking the time to work through a concept deliberately.

The procedures and instructions for reviews complement the life cycle patterns by prompting reviewers to ask questions about the deliberations taken, and by encouraging them to reject work that has not been thought through. Again, projects that are in a rush will tend to disincentivize this, usually storing up trouble for later. The project leadership can set an example and incentivize taking enough time to think.

Assign decision-making authority to an appropriate level based on the nature of the decision. The reference life cycle does not address this as written. The structure of the team, and how roles are organized within it, complement the life cycle and determine how authority is distributed. The specific choices about which roles can make which decisions are encoded in the details of the life cycle phases and in the procedures that apply during those phases.

For more details, see ! unknown reference XXX.

Build in ways to check work, and design them so they are a team norm and not prone to triggering defensive reactions. The reference life cycle includes reviews at regular points in the work in order to support this principle. The definitions of procedures for reviews augment the life cycle by making it clear what is to be reviewed and how people are to go about the reviews.

Build for the longer term. The reference life cycle supports this somewhat by providing development steps in which the team can think about how to design for the long term and can produce documentation to support future revision. A project can define more specifically what kinds of documentation are expected from development phases, and review procedures can make it clear that such documentation must be provided before a piece of the work advances in its development steps.

Project-wide decision points. I have pointed out some times when the life cycle might have review and decision points, such as after purpose development, during concept development, and in the acceptance phase at the end of development.

Think about exceptions that might happen, how to handle them, and when to change course. I have not tried to address this principle in the reference life cycle. Working out how to handle exceptions is a process like designing for safety or reliability (Chapter 43): working out the kinds of hazards (exceptions) that might be foreseen, then deciding what should be done about each one. The particular kinds of exceptions depend on the project: a delay in getting a new funding round affects a project in a startup but not a small project in a well-funded organization, for example.

Some kinds of exceptional conditions are not really a matter for the life cycle, but rather for the development methodology, procedures, and planning approach that the life cycle patterns organize. Risk management (! unknown reference XXX) and the way that planning accounts for uncertainty (! unknown reference XXX) are ways to anticipate specific exceptional conditions and, in many cases, avoid them.

The choice of development methodology affects how easily the project can adjust when it needs to change direction (Section 27.5).

Define the work so that everyone on the team can agree when a step has been completed. This is achieved by clearly documenting each step or phase in the life cycle.

Give a clear definition for each step of the quality considerations by which the work can be judged. Similar to the previous principle, this is met by the documentation for each phase or step.

Make the pattern as light-weight as possible without compromising quality. The reference life cycle in these chapters is only a skeleton of a complete life cycle definition. I believe that everything in it is necessary for most projects, though some projects will likely be able to trim out some parts. As long as a project’s life cycle does not add too much to this reference, the life cycle itself will likely be acceptably lightweight.

There are three ways that I have seen a project end up with a too-heavyweight process. One is adding too many new phases or steps to the life cycle, to the point that people on the team have trouble figuring out where the project is and what steps they should be doing. Another is making the work inside one phase too complex: adding more reviews than the minimum necessary, for example. The third is letting the procedures that say how to do parts of the steps become too complex. In Section 4.6, I discussed how one overly complex procedure (in that case, for qualifying component vendors) caused problems for a large launch vehicle project.

30.2 Relation to NASA life cycle

As many people are familiar with the NASA life cycle, or may be obliged to use it (or a variant of it), in this section I discuss how the canonical NASA life cycle compares to this reference life cycle. I will use the general NASA life cycle defined in NPR 7120.5 [NPR7120, Figure 2-5]. I presented an overview of this life cycle in Section 23.2.1.

The NASA life cycle is divided into seven major phases:

  1. Pre-Phase A: concept studies that define a potential mission.
  2. Phase A: concept development that results in the definition of a feasible and useful mission.
  3. Phase B: high-level design of the system for a mission and evaluation of the technology available for the mission.
  4. Phase C: the bulk of development, in which the system components are developed, verified, and manufactured.
  5. Phase D: assembly, integration, and test of the flight and ground systems; launch and initial on-orbit checkout.
  6. Phase E: operation in flight.
  7. Phase F: close out, including flight system disposal.

This life cycle was developed over several decades as NASA learned how to develop and operate complex missions. Elements of this approach have been adopted by many other organizations—terms like “System Requirements Review” and “Preliminary Design Review” have become nearly ubiquitous in the aerospace industry.

The overall flow of the NASA life cycle is organized around two constraints: fitting in with the US Federal funding cycle, and managing risk for a few highly expensive steps. The funding constraints come at the transition from Pre-phase A to Phase A, when the mission is approved and funded enough to develop its concept, and between Phases B and C, when the agency commits to funding the full mission [NASA16, Section 3.5, p. 25]. The distinction between Phases C and D is that Phase C covers developing designs and fabricating components, while actual assembly of the spacecraft does not start until Phase D, by which point there should be little residual risk that the system design will not work out.

undisplayed image
Figure 30.2: NASA life cycle mapped to reference life cycle

The NASA approach was, however, developed for hardware-heavy systems, and people who today develop spacecraft or aircraft with a larger proportion of software components sometimes find it difficult to map software project best practices onto it. There are usually two issues: software development best practice puts integration earlier than the way many people interpret the NASA model, and many software developers combine design and implementation, especially for novel software functions. I show one way to reconcile these approaches in the mapping in this section.

The reference life cycle I have presented is organized around types of work—conception, specification, design, and so on. The NASA life cycle is organized at the highest level around milestones that check progress early, allowing corrections before committing agency resources. This means that the NASA life cycle splits several of the early phases in the reference life cycle in two, with a major review or checkpoint of the project’s progress before continuing. These two approaches are compatible: almost every project will have some kind of project-wide milestones alongside the milestones specific to the work phases.

In the following, I present how each of the NASA phases maps to the reference life cycle.

30.2.1 Outside the NASA life cycle

The reference life cycle defines the project preparation phase and the project support “phase”. The preparation phase involves roughly defining the project and establishing basic operational capabilities. Project support covers ongoing support functions, like managing teams, finances, or artifacts.

In the NASA environment, the initial support is provided by one or more agency centers and external collaborators, using budget, tools, space, and people for general concept exploration. Each center has its own procedures for starting up a concept exploration project.

Similarly, the NASA agency provides essential support services to its projects.

In one project I worked on, the NASA Ames Research Center had a Mission Design Center that was charged with exploring potential mission concepts. A small group developed the mission idea and explored ways it could be realized. Ames and the agency provided all the key support infrastructure: staffing, finance, office and lab space, and IT services, for example.

30.2.2 Pre-phase A—Concept studies

The Pre-phase A work develops a concept for a mission, presumably in response to NASA agency priorities. It is expected to limit its work to the concept of a mission: what it might achieve, who would benefit from the mission, and high-level technical approaches that might support such a mission.

There is one major review in this phase: the Mission Concept Review (MCR). This checks that the potential mission is well formulated and that there is sufficient interest to justify funding “project formulation”—working out a detailed concept and high-level design.

At the end of Pre-phase A, after the MCR, the agency makes a decision whether to continue the project and fund it for “formulation”: the phases where the concept and high-level designs are worked out. This involves greater financial commitment than the early studies, and is the start of the “real” mission.

Pre-phase A maps to the purpose development phase (Section 27.3) and part of the concept development phase (Section 27.4). The purpose development phase covers identifying what the mission might do, and who the mission stakeholders might be. The concept development produces an initial sketch of a mission concept, without breaking the concept down into great detail.

30.2.3 Phase A—Concept and technical development

This phase is the first of two that are about developing a feasible high level design for a mission and ensuring that necessary technologies are available. Phase A includes developing a complete mission concept and high-level system designs. The team identifies any new technology that the mission will require and works out what will be needed for it to be ready to use in flight.

The depth of design and requirements is not clearly specified in the NASA procedural documents. However, my experience is that it is generally taken to include the spacecraft and its major subsystems, ground systems and their major subsystems, potential launch vehicles, and testing and other ground support equipment to a similar level. The exercise is intended in part to develop the general structure of the system and its likely cost, and in part to find those parts of the system that will require new technology.

Phase A includes developing a list of new technology that will be used for the mission, an evaluation of its maturity, and plans to develop that technology so that it will be mature enough for flight.

This is the first phase where a NASA project is funded for itself, as opposed to using resources allocated for general mission concept development. The various management and development plans required by NASA procedures get developed in this phase.

Phase A includes two key reviews:

The NASA Phase A maps to the second part of the concept development phase in the reference life cycle, along with the concept, specification, and preliminary design steps for the highest-level components in the system.

30.2.4 Phase B—Preliminary design and technology completion

Phase B continues the work from Phase A, completing a preliminary design and refining any new technology to the point where it is sufficiently mature to use in flight. This often involves building models and prototypes of parts of the system.

Phase B also involves addressing the safety and security of the mission. The high-level design should incorporate designs for safety, security, and other critical mission success factors, and the design should be backed up by analysis showing why it is sufficient. (See Chapter 43 for more on safety design.)

At the end of Phase B, the project should have a high-level design for the entire mission. That design should meet all the mission objectives, be technically feasible, and fit within the available cost and schedule.

After Phase B, the agency allocates money to actually implement the system. The process can be complex and time-consuming, potentially involving legislative approval. The estimates for cost and schedule should be accurate enough that the project is unlikely to exceed them, which would require repeating the process to find more funding or time. This imposes limits on how much risk the project can carry going from Phase B to Phase C.

There is one key review in Phase B:

The end of Phase B maps to a slice across the development phase in the reference life cycle. It includes the concept, specification, and preliminary design of the first two or three levels of components in the breakdown hierarchy (Section 11.3; Chapter 38). In general this might include the major spacecraft subsystems: payload, structure, propulsion, attitude control, and so on. The design work at this point includes prototyping or modeling components that pose technical risk, and the design may go to deeper levels of the breakdown hierarchy if needed to understand and address that risk.

The mission-level PDR follows reviews of the component-level preliminary designs.

30.2.5 Phase C—Final design and fabrication

This phase is when most of the development and production work is done. It involves designing, building, and verifying all the components in the system, to the point where they are ready to be assembled into the working spacecraft and ground systems.

Phase C is designed around the spacecraft being difficult and expensive to assemble, involving building large structures, using complex manufacturing tools, threading complex wiring harnesses through the structure, and putting large amounts of money at risk during the assembly. This leads to organizing the final assembly work to avoid as much risk as possible by ensuring that all the components are ready to assemble before committing to the final assembly steps.

During this phase, the team completes all of the designs and implementations of the system components, and verifies all of them. This usually includes producing engineering and qualification units of hardware components (Section 27.8) for testing, including destructive testing for some parts. It also usually includes integrating all of the engineering or qualification units and the corresponding software into a testing version of the entire spacecraft in order to verify the entire integrated system.

Verification in Phase C typically includes verifying the human interfaces in the system: can an operations team use the ground systems to accurately control the spacecraft, using simulated telemetry that shows the spacecraft in different conditions (including off-nominal conditions)?

Phase C is typically divided into two parts: the first for completing all the designs, and the second for implementing and producing the components. The Critical Design Review (CDR), at which all the designs are checked, separates the two parts.

I have seen the Critical Design Review milestone cause confusion: how far should work progress before the CDR? What is the boundary between “design” and “implementation”? For hardware components, such as an electronics board, engineers work on the board design: the layout of the components and traces that will be fabricated. The NASA CDR definition ([NPR7123, Table G-7, pp. 113-4]) indicates that the CDR should include “integrated schematics” and “fabrication, assembly, integration, and test plans”, which would indicate that the board design is complete. That the document also indicates that the CDR and Production Readiness Review are often coupled lends credence to the interpretation that the CDR reviews the board designs.

If this same interpretation were applied to software, it would imply that the software would be essentially complete by CDR. Software source code is the equivalent of electronics board design: while it is thought of as implementation, it must be processed through a build system to produce the actual executable software, just as a board’s design file is used to manufacture the boards.

However, the NASA Systems Engineering Handbook states that the CDR for a software component should occur “prior to the start of coding of deliverable software products” [NASA16, Section 3.6, p. 29]. In other words, the documents appear to disagree, though NPR 7123.1 is presumed to have precedence.

Further, software is often developed iteratively, implementing one version after another, each version adding some amount of functionality over time. Some lower-level functionality is left as a mockup, perhaps not even fully designed, until some of the higher-level integrated functionality has been implemented and verified (the idea of integration-first development (! unknown reference XXX), done to reduce risk as quickly as possible). Software development best practice also has verification proceeding continuously throughout implementation, with feedback to the implementer as early as possible. This often implies having some of the hardware components built and available for testing the software before the software is completed.

An official answer to how a team should resolve the discrepancies and interpret the CDR for a NASA project will have to come from the relevant NASA authorities.

However, in practice, I have found that focusing on the review before implementation is more useful for components in the upper and middle levels of the breakdown hierarchy. For example, this might include the major components within a subsystem, such as power distribution or generation within the electrical power subsystem, or attitude control algorithms in the guidance, navigation, and control subsystem. Components at these levels realize the important relationships between components in the system structure (Section 12.2) and the way components work together to produce emergent properties (Section 12.4). Analyzing these designs allows one to check whether key system behaviors will be met, and whether properties like safety or reliability are handled correctly. These are the properties that are difficult to change if the implementation is found during verification not to meet them. The design and implementation of low-level components should be reviewed, but as long as there is an obvious, low-risk approach for them, their review need not block the design reviews of the system as a whole. This interpretation, of performing the critical design review before implementation, means that the team is then free to implement software components incrementally if that is the best approach for that part of the system.

There are three reviews in Phase C:

The NASA Phase C maps to completing the development phase, the acceptance phase (Section 27.9), and the system production phase (Section 28.1) in the reference life cycle.

The CDR milestone maps to a slice through the system and component development phases, at the end of the design step for most or all of the components. The PRR for a component is equivalent to a review at the end of the production unit development step (Section 27.8). Note that the reference life cycle has a manufacture and deployment check milestone in the acceptance phase; this applies when the entire system is manufactured together, rather than following the model implied in the NASA life cycle, where different hardware components go to production individually. Finally, the SIR is equivalent to the deployment readiness review at the beginning of the deployment phase (Section 28.3) in the reference life cycle.

30.2.6 Phase D—System assembly, integration, and test, launch and checkout

Phase D covers the work between the end of designing and building all the parts and having a spacecraft on orbit ready to begin its mission proper. This includes assembling the spacecraft and ground systems, and verifying that they work (and work together). The verification typically involves testing the assembled spacecraft in vacuum, under strong vibrations, and in thermal environments equivalent to what it is expected to handle in flight—but not testing beyond those levels, in ways that might damage the vehicle. After testing, the team proceeds onward to integrating the spacecraft with its launch vehicle, final preparations, launch, and starting operations on orbit. The team on the ground finally checks the spacecraft out before declaring it ready to begin its mission.

Some missions build a second copy of the spacecraft to be used on the ground for debugging issues with the one in flight and to test possible commands before sending them to the operational spacecraft. The duplicate is typically assembled in Phase D. It might use qualification units for hardware that were used for testing in Phase C, rather than flight-ready units.

There are several reviews in Phase D. All of them are final checks that some part of the mission is ready for taking an irrevocable step. These include:

The NASA Phase D maps directly to the deployment phase in the reference life cycle. It takes in manufactured components and procedures, assembles them into a working system, tests that it has been assembled properly, and starts it in operation. The milestones in the NASA Phase D are different from the deployment phase milestones mainly because they are specific to launching a spacecraft.

30.2.7 Phase E—Operations and sustainment

In this phase, the team operates the mission through to its end.

There are two kinds of reviews that occur in Phase E:

Phase E is equivalent to the system operation (Section 28.5) and evolution (Section 28.7) phases in the reference life cycle. The Decommissioning Review is equivalent to the decision to retire the system at the beginning of the system retirement phase (Section 28.9).

30.2.8 Phase F—Close out

The final phase in the NASA life cycle involves retiring and disposing of the flight systems, retiring or releasing ground systems, archiving mission data, and closing out the project.

There is one review identified in the NASA life cycle:

This phase corresponds to the system retirement (Section 28.9) and project ending (Chapter 29) phases in the reference life cycle.
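To recap the mapping walked through above, the sketch below records the correspondence between NASA phases and reference life cycle phases as a simple data structure that a project could keep next to its own life cycle documentation. The groupings are my summary of the preceding subsections, not an official NASA mapping.

    # A compact summary of the mapping in Sections 30.2.2 through 30.2.8.
    # Keys are NASA phases; values are the reference life cycle phases or steps they cover.
    NASA_TO_REFERENCE = {
        "Pre-Phase A": ["purpose development (27.3)",
                        "initial part of concept development (27.4)"],
        "Phase A": ["remainder of concept development",
                    "concept, specification, and preliminary design of highest-level components"],
        "Phase B": ["concept, specification, and preliminary design of the next component levels",
                    "prototyping or modeling of high-risk components"],
        "Phase C": ["remainder of development", "acceptance (27.9)", "system production (28.1)"],
        "Phase D": ["deployment (28.3)"],
        "Phase E": ["system operation (28.5)", "system evolution (28.7)"],
        "Phase F": ["system retirement (28.9)", "project ending (Chapter 29)"],
    }

    if __name__ == "__main__":
        for nasa_phase, reference_phases in NASA_TO_REFERENCE.items():
            print(f"{nasa_phase}: {', '.join(reference_phases)}")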

Sidebar: Summary

Bibliography

[14CFR450] “Part 450—Launch and reentry license requirements”, in Title 14, Code of Federal Regulations, United States Government, August 2024, https://www.ecfr.gov/current/title-14/chapter-III/subchapter-C/part-450, accessed 2 September 2024.
[Albon24] Courtney Albon, “Space Force may launch GPS demonstration satellites to test new tech”, C4ISRNET, February 2024, https://www.c4isrnet.com/battlefield-tech/space/2024/02/09/space-force-may-launch-gps-demonstration-satellites-to-test-new-tech/, accessed 11 September 2024.
[Ambler23] Scott Ambler, “What happened to the Rational Unified Process (RUP)?”, https://scottambler.com/what-happened-to-rup/, accessed 29 February 2024.
[Bezos16] Jeffrey P. Bezos, “2015 Letter to Shareholders”, Amazon.com, Inc., 2016, https://s2.q4cdn.com/299287126/files/doc_financials/annual/2015-Letter-to-Shareholders.PDF, accessed 22 February 2024.
[Bogan17] Matthew R. Bogan, Thomas W. Kellermann, and Anthony S. Percy, “Failure is not an option: a root cause analysis of failed acquisition programs”, Naval Postgraduate School, Technical report NPS-AM-18-011, December 2017, https://nps.edu/documents/105938399/110483737/NPS-AM-18-011.pdf.
[CISA21] “Defending against software supply chain attacks”, Cybersecurity and Infrastructure Security Agency, U.S. National Institute of Standards and Technology, April 2021, https://www.cisa.gov/sites/default/files/publications/defending_against_software_supply_chain_attacks_508.pdf.
[CMMI] ISACA, “What is CMMI?”, https://cmmiinstitute.com/cmmi/intro, accessed 24 March 2024.
[CVE24] Information Technology Laboratory, National Institute of Standards and Technology, “CVE-2024-3094 detail”, in National Vulnerability Database, https://nvd.nist.gov/vuln/detail/CVE-2024-3094, accessed 4 August 2024.
[Castano06] Andres Castano, Alex Fukunaga, Jeffrey Biesiadecki, Lynn Neakrase, Patrick Whelley, Ronald Greeley, Mark Lemmon, Rebecca Castano, and Steve Chien, “Autonomous detection of dust devils and clouds on Mars”, Proceedings of the International Conference on Image Processing, October 2006.
[Control19] “Yokogawa announcement warns of counterfeit transmitters”, Control, 29 May 2019, https://www.controlglobal.com/measure/pressure/news/11301415/yokogawa-announcement-warns-of-counterfeit-transmitters.
[DFARS] “Defense Federal Acquisition Regulation Supplement”, General Services Administration, United States Government, January 2024, https://www.acquisition.gov/dfars, accessed 16 February 2024.
[Drucker93] Peter F. Drucker, Management: Tasks, Responsibilities, Practices, New York, NY: Harper Business, 1993.
[EPF] Eclipse Process Framework Project (archived), Eclipse Foundation, 2018?, https://projects.eclipse.org/projects/technology.epf, accessed 29 February 2024.
[FAR] “Federal Acquisition Regulation”, General Services Administration, United States Government, January 2024, https://www.acquisition.gov/browse/index/far, accessed 16 February 2024.
[Foust24] Jeff Foust, “Slow Burn: How Starliner’s crewed test flight went awry”, Space News, 4 September 2024, https://spacenews.com/slow-burn-how-starliners-crewed-test-flight-went-awry/, accessed 9 September 2024.
[Git] Git contributors, “Git documentation”, https://git-scm.com/doc, accessed 31 July 2024.
[Goodin24] Dan Goodin, “What we know about the xz Utils backdoor that almost infected the world”, Ars Technica, 31 March 2024, https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/, accessed 4 August 2024.
[Heilmeier24] George H. Heilmeier, “The Heilmeier Catechism”, in DARPA, https://www.darpa.mil/work-with-us/heilmeier-catechism, accessed 13 July 2024.
[IBM23] Engineering Lifecycle Optimization—Method Composer, IBM, version 7.6.2, 2023, https://www.ibm.com/docs/en/engineering-lifecycle-management-suite/lifecycle-optimization-method-composer/7.6.2, accessed 29 February 2024.
[ISO26262] “Road vehicles — Functional safety”, Geneva, Switzerland: International Organization for Standardization, Standard ISO 26262:2018, 2018.
[LADEE13] “LADEE—Lunar atmosphere and dust environment explorer”, NASA Ames Research Center, Fact sheet FA-ARC-2013-01-29, 2013, https://smd-cms.nasa.gov/wp-content/uploads/2023/05/ladee-fact-sheet-20130129.pdf, accessed 16 September 2024.
[Leveson11] Nancy G. Leveson, Engineering a safer world: systems thinking applied to safety, Engineering Systems, Cambridge, Massachusetts: MIT Press, 2011.
[LoBosco08] David M. LoBosco, Glen E. Cameron, Richard A. Golding, and Theodore M. Wong, “The Pleiades fractionated space system architecture and the future of national security space”, AIAA Space 2008 Conference, September 2008, https://chrysaetos.org/papers/Pleiades%20fractionated%20space%20system.pdf.
[NASA16] “NASA Systems Engineering Handbook”, National Aeronautics and Space Administration (NASA), Report NASA SP-2016-6105 Rev2, 2016.
[NPR7120] “NASA Space Flight Program and Project Management Requirements”, National Aeronautics and Space Administration (NASA), NASA Procedural Requirement NPR 7120.5F, 2021.
[NPR7123] “NASA Systems Engineering Processes and Requirements”, National Aeronautics and Space Administration (NASA), NASA Procedural Requirement NPR 7123.1D, 2023.
[Navarro-Gonzalez10] Rafael Navarro-Gonzalez, Edgar Vargas, José de la Rosa, A. C. Raga, and Christopher P. McKay, “Reanalysis of the Viking results suggests perchlorate and organics at midlatitudes on Mars”, Journal of Geophysical Research, vol. 115, December 2010.
[Purdy24] Kevin Purdy, “Music industry’s 1990s hard drives, like all HDDs, are dying”, Ars Technica, 12 September 2024, https://arstechnica.com/gadgets/2024/09/music-industrys-1990s-hard-drives-like-all-hdds-are-dying/, accessed 13 September 2024.
[Spiral] Wikipedia contributors, “Spiral model”, in Wikipedia, the Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Spiral_model&oldid=1068244887, accessed 14 February 2024.
[Wertz11] Space Mission Engineering: The New SMAD, James R. Wertz, David F. Everett, and Jeffery J. Puschell, editors, Torrance, CA: Microcosm Press, 2011.
[Wilkes90] John Wilkes, “CSP project startup documents”, Concurrent Computing Department, Hewlett-Packard Laboratories, Report HPL-CSP-90-42, 11 October 1990, https://john.e-wilkes.com/papers/HPL-CSP-90-42.pdf.
[Zetter23] Kim Zetter, “The untold story of the boldest supply-chain hack ever”, Wired, 2 May 2023, https://www.wired.com/story/the-untold-story-of-solarwinds-the-boldest-supply-chain-hack-ever/.