Copyright ©2024 by Richard Golding
Release: 0.4-review
This part describes what life cycles and development methodologies are. It covers what goes into a life cycle pattern and how the patterns relate to other parts of how a project operates. It defines development methodologies and explains how they relate to life cycles. The last chapter lists several example life cycle patterns, which lead into a comprehensive reference life cycle in Part VI.
System building in general follows a common story.
A project to develop a new system begins when someone has an idea that people should make the system. At this initial moment, the system is largely undefined. There is a vague concept in a few minds, but all the details are uncertain.
The project then moves the system from this initial concept through to an operational system, and through the system’s operational life and eventual retirement. During development, the team will need to ensure steps are taken in order to produce a correct, safe system. Designs will be checked. Implementations will be tested. The system as a whole will be verified before being deployed into service. At the same time, the resources spent on building the system must be used efficiently, doing the work that needs to be done and avoiding the work that doesn’t need to be done.
Many projects continue system development beyond the first operational version, with ongoing development or problem fixes. Some projects include the steps to shut down and dispose of the system once it has completed its functions.
The life cycle is how a project organizes the way the team moves through this story. It is a pattern that defines the phases and steps in the work: what will come first, what will be done before something else, when checks will happen. It provides checklists to know when some step is ready to be done, and when it should wait for prerequisites. It provides checkpoints and milestones for reviewing the work, so that problems are found and dealt with in a timely way. It provides an overall checklist to ensure that all the work that needs to be done is in fact done.
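To make the checklist role concrete, here is a minimal sketch in Python of a pattern as a set of steps with prerequisites. The phase names and the structure are illustrative only, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One step in a life cycle pattern (names here are illustrative)."""
    name: str
    prerequisites: list = field(default_factory=list)  # steps that must finish first

def ready_steps(steps, completed):
    """Return the steps whose prerequisites are all complete.
    This is the checklist role a life cycle plays: knowing when a step
    is ready to be done, and when it should wait."""
    return [s.name for s in steps
            if s.name not in completed
            and all(p in completed for p in s.prerequisites)]

# A toy pattern: purpose -> specification -> design -> implementation -> verification
pattern = [
    Step("purpose"),
    Step("specification", ["purpose"]),
    Step("design", ["specification"]),
    Step("implementation", ["design"]),
    Step("verification", ["implementation"]),
]

print(ready_steps(pattern, completed={"purpose"}))  # only "specification" is ready
```

A real pattern would also carry milestones and review checkpoints; the point here is only that readiness falls out mechanically once prerequisites are recorded.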
Section 20.3 introduced the basic ideas for life cycle patterns. These include:
Each project will use its own life cycle patterns. The patterns may incorporate a framework that is standard for the industry or the parent organization. Selecting and documenting the patterns is an essential part of starting up a project, and people in the project should review how well the patterns are working for them from time to time and may want to improve the patterns.
In this part, I discuss life cycles in general. In Part VI, I present a reference life cycle pattern.
Life cycle patterns are related to, but separate from, the development methodology that a team chooses to use, such as waterfall, spiral, or agile methodologies. I address these methodologies in the next chapter (Chapter 22).
Speaking broadly, the development methodology determines how the work is organized in time: in a single sequence or iteratively, synchronized tasks or separate tasks, how far ahead to plan. The life cycle patterns reflect some of those methodology decisions and encode how to do different tasks.
Put another way, the life cycle patterns help organize what work the project has to do, and what dependencies there are among different steps in the work. The development methodology organizes how that work is planned and scheduled. As a result, the two go hand-in-hand but are distinct from each other.
Almost all project life cycle patterns, for both whole systems and components, follow a similar overall flow. Abstracting from the story in the introduction, there are phases:
For a whole system, this looks like:
Note that this flow starts with the system or component’s purpose. Good engineering always begins with having a clear understanding of what a thing is for. I have watched many engineers rush into designing and building a component without putting time into understanding what the component is going to be used for. By random chance their design has occasionally worked out to match what the component actually needed to do, but only rarely.
Understanding a system’s purpose or a component’s purpose also provides a way to bound the work. If one doesn’t know what a component is for, it is easy to keep working on a design without stopping because there isn’t a clear way to know when the design is good enough to be called done.
There are many points in this flow where one can add checks on the correctness of the work. These checks improve system quality by building in the opportunity to discover and correct flaws before other work builds on the flawed work. Finding minor problems quickly usually means the cost of correction remains low.
There are also points where a project might make project-wide decisions: go/no-go decisions or key decision points. These provide opportunities to check the entire project’s progress, sometimes occurring in the middle of other work, or at times when irrevocable actions are to be taken, such as funding, launch, or public announcements.
This general pattern applies recursively. One can start by creating a specification and design for the system. The system design will decompose the system into high-level components (Section 6.4). The act of defining a set of components implies identifying a purpose for each one, then specifying and designing each high-level component. The design of a high-level component might in turn decompose into a set of lower-level components, which in turn need a purpose, then specification and design.
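The recursion can be sketched in a few lines; the component names and the decompose() rule below are hypothetical stand-ins for real design work:

```python
def develop(component, depth=0):
    """Sketch of the recursive flow: every component, at every level,
    gets a purpose, a specification, and a design before its children do."""
    indent = "  " * depth
    log = [f"{indent}{step} {component}" for step in ("purpose:", "specify:", "design:")]
    for child in decompose(component):
        log.extend(develop(child, depth + 1))
    return log

def decompose(component):
    # Hypothetical decomposition table standing in for actual design decisions.
    children = {"system": ["nav", "comms"], "nav": ["gps-driver"]}
    return children.get(component, [])

for line in develop("system"):
    print(line)
```

The output shows the same purpose-specify-design sequence repeating at each level of the decomposition, which is exactly the recursive structure described above.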
The overall flow shows a move from high uncertainty at the beginning to lower uncertainty as the work proceeds. I will address managing uncertainty in a later chapter.
Finally, a project’s life cycle patterns will reflect the development methodology that the team has selected. Waterfall, spiral, and agile development all affect the contents of the patterns. I discuss this more in Chapter 22.
The life cycle provides a general set of patterns for how work should proceed, but it should not define exactly how each work step should be done. That is left to procedures (Section 20.4), which should provide step-by-step instructions for how to do key parts of the life cycle. For example, if a life cycle phase indicates that a design review and approval should occur before the end of a design phase, then there should be a corresponding procedure for design reviews. That procedure should indicate who should be involved in a review, what they should look for, how those people will communicate about the results, who is responsible for approving the design, and how they indicate approval.
The life cycle patterns are the basis for the project’s plan (Section 20.5). The patterns are a set of building blocks that people in the project can use to develop the plan. The plan, in turn, guides tasking: the selection of which tasks (as defined in the plan) people should be working on next.
Life cycle patterns address problems that projects have. They can help the team have a predictable and reproducible flow to how work should be done, so that everyone shares the same understanding of how the team works.
There are six ways that life cycle patterns help a project.
Gaining these benefits is not a result of using life cycle patterns per se; rather, it comes from using patterns that are designed to provide the benefits. For example, if the customer has an acquisition process that specifies certain milestones, then the top-level life cycle pattern for the project should incorporate those milestones. If the project is likely to have auditing requirements, then the patterns should include tasks to generate and maintain auditing records.
Quality of work. The purpose of a project’s approach to operations is, in the end, to produce a system for the customer that meets their objectives. This means it should do what they need, meet safety and security needs, and support future system evolution. In other words, the team’s work needs to produce a system with good quality.
Neither the life cycle patterns by themselves nor the plan that derives from them directly results in good product quality. System quality comes from all of the detailed work steps that everyone on the team performs. If they do their work well, and if mistakes they make are caught and corrected, then the system can turn out well. If some work is not done well, nothing in the life cycle patterns can prevent that.
However, the life cycle patterns can create an environment that will more likely lead to good quality. They can proactively make flaws less likely by ensuring that steps happen in order: identifying purpose and concept before design and implementation, for example. They can insert points in the work that encourage people to think through what they should design or implement. They can also avoid problems by providing a checklist for what should be complete at the end of a work step. They can ensure that when a system is delivered, all the work needed to put it into operation is complete. They can build in checkpoints for reviews and verification to catch problems early. They also help project management organize the work so that it is complete, that is, so that no parts of the system or some work steps are overlooked.
Sometimes the value of a life cycle pattern will come from slowing down work. Most of the work done on a project is done by people who are focused on a particular part of the system; it is not their job to manage how the project goes as a whole. Their job is to get that one part designed and built, according to the specifications they have been given. If the specialists start building before the context for their work has been established, they are likely to design or implement something that does not meet system needs. I have been part of more than one project where the resulting rework caused the project to be canceled or required a company to get additional funding rounds to make up for the resources spent on the mistakes.
Efficiency. Most systems projects will be resource-bound, with more tasks than there are people on the team to do them. In this kind of project, it is important to keep each person busy with useful work. This means that nobody on the team is blocked with no tasks they can usefully perform. It also means that almost all the tasks that people perform contribute to the final system—that there is little work that has to be thrown out and redone because it had flaws that made it unusable.[1]
As project management builds the project’s plan, using the life cycle patterns as building blocks, they must detect where there are dependencies between work steps and plan the work steps so that later steps are unlikely to get blocked. For example, if some part will require an unusually long time to specify and acquire from an outside vendor, then management will need to ensure that work on that part starts early. The life cycle patterns provide part of the structure on which the plan is based, and provide a template for some of the dependencies.
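As a sketch of how dependency information in the patterns feeds planning, the following computes earliest finish times from durations and dependencies. The task names and durations are invented for illustration, and the dependency graph is assumed to be acyclic:

```python
def earliest_finish(tasks):
    """Earliest finish time per task, given (duration, dependencies).
    Long-lead items, like the vendor part below, show up as tasks that
    must start immediately to avoid blocking later steps."""
    finish = {}
    def fin(name):
        if name not in finish:
            dur, deps = tasks[name]
            finish[name] = dur + max((fin(d) for d in deps), default=0)
        return finish[name]
    for t in tasks:
        fin(t)
    return finish

tasks = {
    # name: (duration_weeks, depends_on)
    "specify vendor part": (2, []),
    "acquire vendor part": (12, ["specify vendor part"]),  # unusually long lead time
    "design housing":      (3, []),
    "integrate":           (2, ["acquire vendor part", "design housing"]),
}
print(earliest_finish(tasks))
```

Even this toy calculation makes the point in the text: the vendor acquisition dominates the schedule, so the specification that gates it must start early.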
Life cycle patterns can also help avoid unnecessary rework. This comes partly from the ways that the patterns help improve the quality of work. In particular, a good life cycle pattern can lead people to take the time to think through the purpose and specification of something, rather than jumping into design and implementation unprepared and building something that does not meet the system’s needs.
Finally, the patterns can help bound the work to be done. When a project does not define the scope of work to be done, it is likely that someone will start working on something in excess of or not related to the customer needs. Good patterns help avoid this by defining an orderly and thoughtful process for identifying what work needs to be done.
Team effectiveness. Members of an effective team respect and trust each other. Having shared norms and understandings for how work is done and how people communicate is important as part of the environment that allows the team to develop respect and trust.
A defined life cycle for a project addresses part of this by defining a common understanding of how work should be done. Good patterns define expectations of what will be done in different work steps. Everyone on the team can agree when a work step has been completed. Good patterns also create times when people know they are expected to communicate about some work step. This makes it easier for someone to trust that they will be consulted at appropriate points about work that might affect what they are doing, so that they do not need to create separate, ad hoc communication channels or try to micromanage something that is not their direct responsibility.
As I have noted elsewhere (Section 20.8.3), the life cycle patterns can only have this benefit if the team actually follows them.
Management support. The team, or designated parts of it, will be responsible for making a plan (Section 20.5) for the project’s work, then coordinating and tracking the resulting tasks. The life cycle patterns provide templates for the tasks that will go into the plan, and the key milestones that anchor the work. The life cycle sets the pattern for phases that the project will go through, such as initial conception, initial customer acceptance, concept exploration, implementation, and verification. The cycle also sets the pattern for milestones that gate the progression from one phase to another, such as a concept review, a design review (and approval), or an operational readiness review.
The plan will change from time to time, both in response to external change requests and as the project progresses and the team learns more about the work ahead. Sometimes the need for change emerges gradually, with an issue slowly manifesting itself without causing any acute problem that prompts people to recognize the need for change. A good life cycle will build in times for people to step back to get perspective and detect when there is a slow-building problem to address. Review milestones are often a good time to plan for this.
Having life cycle patterns and corresponding procedures that apply when these changes occur will help the team adjust their work in an orderly way. It will help them ensure that steps don’t get missed as they work out how to change the plan (and the system being built).
Good life cycle patterns can help a project steadily decrease its uncertainty and risk as work proceeds. Most of the time, a project will start with high uncertainty about what the system will look like, and early project phases result in increasing understanding of what the system will need to be. This process repeats at smaller scales: once the general breakdown of the system into major components is decided on, each of those components will start with high uncertainty about how it will be structured. The uncertainty about the major components will then gradually resolve, and so on. However, this occurs only when the project is guided so that uncertainty is addressed systematically, not haphazardly.
Customer and regulatory support. Many customers will have a process they go through to decide whether to build a system and to track its development process. For US governmental customers, much of the process is encoded in law or regulation, such as the Federal Acquisition Regulation (FAR) [FAR] or Defense Federal Acquisition Regulation Supplement (DFARS) [DFARS]. The process governs matters like which design proposal is selected for contract, providing evidence of good progress, providing information that determines periodic contract payments, accepting the finished system, and determining whether the project should continue or be terminated.
These customers will expect deliverables from the project from time to time. The life cycle process must ensure that there are milestones when these are assembled and delivered. (It is then the job of project management to ensure that these milestones, and the tasks for preparing deliverables, can be completed by the time line that the customer requires.)
Whether the customer requires explicit intermediate deliverables or not, formally involving the customer may be important for keeping the project on track.
Similarly, regulatory bodies have processes by which a system that must be certified or licensed before operation can apply for that approval. Those processes will define activities that the team must perform, along with milestones and deadlines by which applications must be submitted or approvals received.
Auditing support. A project’s development practices may be audited for many reasons. Auditors may perform a review as part of an appraisal or certification against standards, such as CMMI [CMMI]. They may review processes to ensure compliance with regulatory standards, especially for security-sensitive projects. The processes may also be audited as part of a legal review. These reviewers need to see both the complete definition of the processes, including the life cycle patterns, and evidence of how well the team has followed these practices.
Each project will have several life cycle patterns, each covering a different part of the work.
Each pattern is defined by its purpose, the circumstances in which it applies, the phases or steps involved, and the dependencies among the steps. It should also include rationale that explains why the pattern is structured the way it is. In a previous chapter I used the example of a simple pattern for building one component:
This pattern applies to building one low-level component where the purpose of the component is already known, and the component is straightforward to design and build in house. Similar but slightly different patterns might apply when the component has to be prototyped before deciding on a design, or when the component is being acquired from a supplier outside the project. This pattern would be used as one part of a larger pattern for building a higher-level component that includes this one.
Each phase of a pattern defines a way to move part of the work forward. It should have a defined purpose stating what work should be achieved in that phase.
The details of the phase are defined by:
Each action should also indicate who is responsible for performing that work. The responsibility will usually be defined as a role, not a specific individual. For example, a component design phase might involve three actions: design the component, review the design, and approve the design. The design action would be the responsibility of the component developer; the review action would be the responsibility of the developers responsible for components that interact with the one being designed, and the approval would be the responsibility of a systems engineer overseeing some higher-level component of which this one is part.
The rationale for this example design phase might say:
The actions defined for the phase should reference the procedures for doing those actions, when those procedures are defined. For the example design review action, the procedure might be:
The procedure might also name the tools to be used (an artifact repository for the design, a review workflow tool for the reviews).
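One way to picture a phase definition is as structured data naming each action and its responsible role. This is a hypothetical sketch of the design phase described above; the role and action names are illustrative, not prescriptive:

```python
# A life cycle phase expressed as data: each action names the role
# responsible for it, mirroring the design/review/approve example above.
design_phase = {
    "purpose": "produce an approved design for one component",
    "actions": [
        {"name": "design the component", "role": "component developer"},
        {"name": "review the design",    "role": "developers of interacting components"},
        {"name": "approve the design",   "role": "overseeing systems engineer"},
    ],
}

def responsible_role(phase, action_name):
    """Look up who is responsible for a named action in a phase."""
    for action in phase["actions"]:
        if action["name"] == action_name:
            return action["role"]
    raise KeyError(action_name)

print(responsible_role(design_phase, "approve the design"))
```

Recording responsibility as a role, not a person, is what lets the same pattern be reused as the team changes; binding roles to individuals happens later, during planning and tasking.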
A team needs clear documentation of the phases if they are to execute them properly. A team can’t be expected to guess at what they need to be doing, or how their work will be reviewed; it needs to be spelled out.
This documentation is assembled during the project preparation phase. The details are usually not completely worked out before any other work is begun; rather, “project preparation” more often proceeds in small increments, working out the rules shortly before the associated work begins.
Each life cycle pattern should have a purpose, and the steps or phases in the pattern should be checked that they can achieve that purpose (and that there is no extraneous work in the pattern).
A pattern should also have an explanation of when it applies and when it does not. For example, there may be multiple patterns for designing a component: one for a simple component that is built in house; one for a component that is outsourced to a supplier; one for a high-level component that is made up of several lower-level components; one for a component that requires investigation or prototyping before deciding on a conceptual approach to its design. All these patterns likely have a lot in common, but procuring an outsourced component will have contracting steps that an in-house component will not.
Someone using the documentation should be able to tell accurately whether they are using the correct version of the patterns. The life cycle patterns will be revised from time to time—as the team grows and as people find ways to improve how they work together. This means that the material a user sees should indicate not just a revision number but also give a clear indication of whether the version they are looking at is no longer current.
The form of the documentation is not as important as the content. It can be a written document. It can be made available electronically, with structured access and search capabilities (such as in a Wiki). Some companies offer tools that help define and document development processes or life cycle patterns, including definitions of phases. What matters is that each person who needs to use the documentation can do so conveniently and accurately.
Each phase or step has a number of artifacts that the team must develop. At the end of a phase, some of those artifacts need to be complete (allowing for future evolution), and others need to have reached some defined level of maturity. The work in a phase consists of the tasks that develop those artifacts.
I discussed artifacts in Chapter 17. The artifacts are the products of building the system, including the system being delivered as well as documentation of its design and rationale, records of actions taken during development, and information about how the project operates.
These artifacts are the inputs and outputs of the work specified in life cycle patterns (and the associated procedures). Using the component design step example, the work uses:
The design step produces:
In general, every artifact involved in building the system should be a product of some work phase or step, and every input or output of work steps should be included in the set of artifacts the team will develop. Ideally, the life cycle patterns will be checked for consistency with the list of artifacts the project uses.
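Such a consistency check can be sketched mechanically: every input or output of a work step should appear in the artifact list, and every artifact should be produced by some step. The step and artifact names below are illustrative:

```python
def consistency_gaps(steps, artifact_list):
    """Cross-check life cycle steps against the project's artifact list.
    Returns artifacts referenced by steps but missing from the list, and
    artifacts in the list that no step produces. A sketch only; a real
    project would run this over its full process documentation."""
    known = set(artifact_list)
    used = set()
    produced = set()
    for step in steps:
        used |= set(step["inputs"]) | set(step["outputs"])
        produced |= set(step["outputs"])
    return {
        "unknown to artifact list": sorted(used - known),
        "never produced": sorted(known - produced),
    }

steps = [
    {"name": "design component", "inputs": ["component spec"], "outputs": ["component design"]},
]
artifacts = ["component spec", "component design", "verification report"]
print(consistency_gaps(steps, artifacts))
```

In this toy example the check flags that no step produces the component spec or the verification report, which is exactly the kind of gap such a review is meant to surface.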
Artifacts are developed at different times during the course of a project. A few artifacts should be worked out as the project is started—especially those recording the initial understanding of the system’s purpose and initial documentation of how the project will operate. These will be refined over time. Other artifacts are developed during the course of development, and the life cycle patterns indicate which ones are to be worked out before others. The artifacts will be in flux during development: the team learns about the system as it designs and develops it; the customer or mission needs often change over time; flaws get discovered in designs or implementations.
Many of the project’s artifacts support how people work together, and the life cycle patterns should reflect these communication needs. For example, one person may work out the protocol that two components need to use to communicate with each other. Two other people may design and implement the two components. The interface specification that the first person develops serves to communicate the details of the interface among all three people. The patterns should record that the design and implementation work steps depend on the work to develop the interface specification. Later, if one of the component developers identifies a flaw in the interface, the people involved can work through how to revise the interface—and the revised specification artifact informs each person how to update their work to match the change. The pattern helps to show how information about a change to the interface specification triggers rework on dependent artifacts.
A good life cycle pattern has procedures to manage the change in artifacts, and how those changes affect other artifacts downstream from them. There are two separate problems these procedures must address:
Different life cycle patterns approach this in different ways, which we will discuss in later chapters on different patterns. The most common approach is to maintain different versions of an artifact, with at most one version being designated as a baseline or approved version, and other versions designated as works in progress. Many configuration management tools have a way to designate a baseline version, and many software repository tools provide branching and approval mechanisms to track a stable version.
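As a minimal sketch of the baseline idea, assuming nothing more than a list of versions plus a single approved designation (real configuration management tools offer far richer mechanisms):

```python
class Artifact:
    """Minimal versioned artifact with one designated baseline,
    in the spirit of the configuration-management approach described above."""
    def __init__(self, name):
        self.name = name
        self.versions = []      # all versions, in order
        self.baseline = None    # index of the approved version, if any

    def add_draft(self, content):
        """Record a new work-in-progress version; return its version id."""
        self.versions.append(content)
        return len(self.versions) - 1

    def approve(self, version_id):
        """Designate one version as the baseline; others remain works in progress."""
        self.baseline = version_id

    def current_baseline(self):
        return None if self.baseline is None else self.versions[self.baseline]

spec = Artifact("interface spec")
spec.add_draft("draft A")
v1 = spec.add_draft("draft B")
spec.approve(v1)
print(spec.current_baseline())  # "draft B"
```

The design choice worth noting is that drafts and the baseline coexist: approval moves a pointer rather than destroying in-progress work, which is what lets dependent teams build against a stable version while revisions continue.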
What is the team size and background? How is it expected to change over time? A small team can often be a little less formal than a large team, because the small team (meaning no more than 5-10 people) can keep everyone informed through less formal communication. A large team is not able to rely on informal communication, so more explicit processes and communication mechanisms are important. Many teams start small when the project is first conceived, but grow large over time. A team that will grow will need to communicate more formally from the beginning than they otherwise might so that as they add people to the team, the larger team works smoothly.
Conversely, if the life cycle patterns indicate that some action will be performed by some person, does the team actually have the staff to do that work? When a project says that some work is to be done and then does not staff that function sufficiently, it sends a message to the team that they should not take the process as written seriously. This undermines the team’s trust. If the function is actually needed, either the team will find an ad hoc workaround or the function will not get done adequately. Either way, there will be a disconnect between what is written down and what actually happens.
The life cycle patterns are just patterns that provide a guide to work that goes in the project’s plan. The plan is the actual definition of the tasks to be done. When the plan needs to be updated, the patterns provide a template for the work that goes into the plan.
Assembling the plan, however, takes into account many inputs, of which the pattern is only one. Planning involves deciding on the priority and deadlines for work, which is based on project deadlines, risk or uncertainty, and the project’s development methodology.
Chapter 60 discusses in detail how the plan is developed and maintained, including how the life cycle patterns get incorporated.
Consider the following example of how a pattern gets incorporated into the plan. This example shows how the pattern is only a template, and there are many decisions that will depend on other information.
This pattern defines what should happen when a customer requests a change. The basic pattern is that first someone on the team should evaluate the request; this may involve working with the customer to clarify the request, and with other engineers to estimate the scope and cost of the work. The project can then make a decision whether to accept the change or not. If the decision is to make the change, work to build, release, and deploy the update will follow. If not, there is another pattern for how to communicate with the customer that the change will not be made.
The activity starts when the project receives a change request. Based on this, the plan can be updated to include three tasks right away: the evaluation, review, and decision tasks.
At the same time, the planner must make decisions: who should each task be assigned to? What priority should the flow of tasks have? The pattern can indicate the roles involved in the tasks, such as there being a small team responsible for evaluating change requests and a customer representative from the marketing team, but it doesn’t determine which specific people. That’s for the planning and tasking efforts to determine. Similarly, the pattern does not specify how the work should be prioritized relative to other work the same people are doing. The planner incorporates information about how urgent the customer’s request might be and the importance of the customer into the decision. The project might have decided, for example, that there should be a queue of outstanding change requests and they should be evaluated in their order in the queue.
Determining who should be involved in a review of the evaluation might depend on the results of the evaluation. The pattern might indicate that the evaluation should be reviewed by engineers responsible for each high-level component that will be affected by the change. This means that the decision about who specifically will be tasked with the review can’t be made until the evaluation has worked out the scope of the change.
The decision to proceed with making an update will depend in part on whether the team has the time and resources to make the update. The team will need to determine whether adding the work to the plan will cause a problem with meeting deadlines that have been established already, or if it will overload a team that is already busy. This determination will involve analysis of the current plan—something that the life cycle pattern can help with only to the extent that the patterns can help with generating estimates of the work that would be involved.
When the project takes the decision to go ahead with developing an update for the request, the pattern shows that work steps follow to develop a change and then release and deploy the update. When the decision gets made, this will trigger the planning activity to add development and release work into the plan. These are high-level work steps with little detail. The planner will find patterns for these steps and populate those patterns into the plan.
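The two-stage expansion of this pattern into plan tasks can be sketched as follows; the task names and the change-request numbering are purely illustrative:

```python
def expand_change_request(request_id, decision=None):
    """Expand the change-request pattern into plan tasks.
    The first three tasks are added as soon as the request arrives;
    development and release tasks are added only after an accept decision."""
    tasks = [f"{step} CR-{request_id}"
             for step in ("evaluate", "review evaluation", "decide on")]
    if decision == "accept":
        tasks += [f"{step} CR-{request_id}"
                  for step in ("develop", "release", "deploy")]
    return tasks

print(expand_change_request(42))             # just the evaluation flow
print(expand_change_request(42, "accept"))   # full flow after the go decision
```

Note what the sketch deliberately leaves out: assignees, priorities, and deadlines. Those come from the planner’s judgment and other project information, which is the distinction between pattern and plan that this example illustrates.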
Decisions about the work involved in development will depend on the development methodology that the team has selected to follow. If the update will involve extensive changes and the team is following a spiral-style methodology [Spiral], the development plan might consist of two or three development rounds. Each round would design and implement part of the changes, with a milestone at the end of each round showing how the partial changes have been integrated into the system.
Decisions about the release and deployment work will also incorporate policy decisions about how the team works. Will each change request result in a separate update release? Or will updates be bundled together into releases that combine several updates, perhaps on a schedule defined in advance?
In this section I list some principles to consider when designing a workflow pattern.
The act of designing—or refining—a life cycle pattern is an opportunity to think deliberatively about how the team should get its work done. Life cycle patterns are the templates for the project’s plan, and so they should be designed to achieve the work that is needed to move the project forward well.
Designing the patterns ahead of time means having time to define good work patterns. The pattern does not have to be worked out under pressure, as a reaction to something unexpected happening in the project. It can be discussed among multiple team members to get different perspectives and to ensure everyone’s needs are met. Working in advance gives time to check that the steps in the pattern are consistent with each other. It means that there is time to think about what exceptional situations might happen and define what to do in those cases.
Note that if an organization already has an approach to life cycle patterns, whether documented or not, one should aim for continuity with that approach. Anyone already in the organization will know that approach to organizing work; making a major change would mean losing the advantage of established team habits. On the other hand, if the current approach is not working well, then a new project is an opportunity to improve.
The life cycle patterns encode principles and methods that encourage good work. Principles to consider include:
Purpose. I have mentioned this principle several times already, and I believe it is a basic principle of effective system-building. The life cycle patterns encode this principle for specific parts of the team’s work.
As with anything else that is designed, a pattern itself starts with a purpose. That purpose might be “build a simple component” or “build the whole system” or “handle a customer’s change request”. A good pattern addresses its purpose thoroughly, without trying to achieve other purposes.
The pattern that results should then ensure that team members follow this approach when building parts of the system. If the pattern is for handling a customer’s change request, for example, the pattern should address understanding and documenting what the customer wants changed (and why), before starting to work out whether to agree to the change or to begin implementing the change.
Time to think. Key parts of a complex system are best served by taking some time to properly understand the purpose or need of that part, and to look at options for how it can be designed or built. A project running at too fast a pace skips this thinking and uses the first thing that someone thinks of—though there may be subtle ramifications of that decision that are not appreciated until the decision causes a problem later. Asking someone to document the alternatives they considered, and rewarding them for doing so, works to improve the quality of the system.
At the same time, people can take too long to make a decision or fixate on making it perfectly. The time spent on deliberation should be bounded to avoid this.
Decision-making authority. Bezos introduced the idea of reversible and irreversible decisions [Bezos16]. He wrote:
Some decisions are consequential and irreversible or nearly irreversible—one-way doors—and these decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before. We can call these Type 1 decisions. But most decisions aren’t like that—they are changeable, reversible—they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through. Type 2 decisions can and should be made quickly by high judgment individuals or small groups.
As organizations get larger, there seems to be a tendency to use the heavy-weight Type 1 decision-making process on most decisions, including many Type 2 decisions. The end result of this is slowness, unthoughtful risk aversion, failure to experiment sufficiently, and consequently diminished invention.
For engineering projects, many decisions fall in the middle ground between reversible and irreversible. Consider building an aircraft. As long as the designs are just drawings, the designs can be changed with low to moderate cost. Early in the design process changes can be quite low cost; as the design progresses and more and more interdependent components are designed, the cost of rework increases. Once the airframe has been machined and assembled, the cost of changing its basic design becomes high, possibly high enough in time or in money that it is in effect irreversible.
Good life cycle patterns will account for different costs of reversing decisions. They should both build in time for deliberation and consultation before making hard-to-reverse decisions and use lighter-weight decision-making for less risky decisions. Similarly, the patterns should ensure that the authority for hard-to-reverse decisions is assigned to someone with high-level responsibility in the project, while the authority for low-risk decisions should be placed as close to the work as possible.
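This matching of decision weight to reversal cost can be sketched as a simple rule. The sketch below is a hypothetical illustration of the principle, not a prescribed process; the authority levels and step lists are assumptions for the example.

```python
from enum import Enum

class Reversal(Enum):
    """Rough cost of undoing a decision, in the spirit of Bezos's
    Type 1 / Type 2 framing, with a middle ground for engineering."""
    CHEAP = 1      # two-way door: easy to undo
    MODERATE = 2   # undoable, but rework cost grows over time
    EXPENSIVE = 3  # one-way door: effectively irreversible

def decision_process(reversal: Reversal) -> dict:
    """Map reversal cost to a (hypothetical) decision-making process.

    Cheap decisions go to the people closest to the work; expensive
    ones escalate and add deliberation steps before commitment.
    """
    if reversal is Reversal.CHEAP:
        return {"authority": "task owner",
                "steps": ["decide", "record rationale"]}
    if reversal is Reversal.MODERATE:
        return {"authority": "team lead",
                "steps": ["consult affected teams", "decide", "record rationale"]}
    return {"authority": "project leadership",
            "steps": ["analyze alternatives", "consult affected teams",
                      "formal review", "decide", "record rationale"]}
```

Note that in the aircraft example, the same design decision would move from CHEAP toward EXPENSIVE as development progresses; a good pattern accounts for that drift.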
Checking work. Checking that work has been done well is commonly understood to improve the quality of results. It is essential for parts of a system that require high assurance—safety- or security-critical parts.
The key to checking is that the checks not be subject to implicit biases that the developer might have. This can be handled either by the developer doing analyses that force a stepping back from decisions (perhaps by encoding them mathematically) and that can be checked for accuracy by someone else, or by having an independent person review the work.
Either way, the developer’s pride in their work can feel threatened. Setting out life cycle patterns in which every part of the work is checked enables the project to make checks a norm. Designating in advance that checks will happen, and who will do them, helps depersonalize the effort and in the long term contributes both to quality work and team morale.
Building for the longer term. It is easy to solve an immediate problem at hand quickly and move on, leaving a problem for the future. Taking time to think about the problem (the principle of taking time for deliberative thinking, above) will help but is not sufficient.
It is likely that someone will revisit the work sometime in the future. They may need to understand the work in order to fix a flaw or make an upgrade. They may be auditing the work as part of a critical safety review. They will need to know the rationale for decisions that were made, and they will need to understand subtle aspects of the work. If this information has been documented, these people in the future will be able to do their work accurately and relatively quickly. If they have to deduce this information by looking at artifacts built in the work, they will have to spend time reverse-engineering the work, and the results will generally be less accurate.
Building checks for documentation of rationale and explanations into the pattern will accelerate future work.
Project-wide decision points. Most projects have times when there is a decision whether to proceed or to cancel or to redirect the project. These include whether to start, times when funding is needed, public announcements, and irrevocable steps like launch. These decision points generally require work to prepare for them, which should be accounted for.
Exceptions. Things often do not go to plan. What then? Who needs to know? What needs to be done to respond?
Sometimes this is as simple as setting an expectation for the team. If a component’s specification is inconsistent or cannot be met, who gets informed, and how does the problem get corrected?
Sometimes the situation is time-critical. If a major piece of equipment catches fire, what is the response? What if an insecure component has been incorporated and deployed? What if a large part of the system has been built, and someone finds a fundamental flaw? The responses to situations like these are complex, and there often isn’t time in the moment to work out the details.
Good life cycle patterns include pre-planned responses to these exceptional situations. This might consist of references to procedures that should be followed, or it might reference a pattern used to respond to the situation.
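To make the idea concrete, a pattern's exception handling can be thought of as a lookup from situation to pre-planned response. This is a minimal sketch, not a prescription; the situation names and procedure references are hypothetical.

```python
# Hypothetical registry mapping exceptional situations to the
# pre-planned responses a life cycle pattern would reference.
RESPONSES = {
    "inconsistent specification": "notify component owner; run spec-revision pattern",
    "insecure component deployed": "follow incident-response procedure",
    "equipment fire": "follow site emergency procedure; notify safety officer",
}

def respond(situation: str) -> str:
    """Look up the pre-planned response; unplanned situations escalate."""
    return RESPONSES.get(situation, "escalate to project leadership")
```

The important property is the default case: when no response was planned, the pattern still says who decides next, rather than leaving the team to improvise under pressure.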
Completeness. Can everyone on the team agree when a part of the work has been completed? The person assigned a task should understand their assignment, so that they can do their work independently. Others will check the work, or mentor the person doing the work—and they should have the same understanding of the assignment.
The definition of actions, as well as the list of outputs and post-conditions for a pattern, should be clear to everyone.
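One way to see what "clear to everyone" means in practice is to imagine a pattern step written down as a record of its inputs, outputs, and post-conditions. The following sketch is illustrative only; the field names and the example change-request step are my own invention, not a standard notation.

```python
from dataclasses import dataclass

@dataclass
class PatternStep:
    """One step in a life cycle pattern: what it needs, what it
    produces, and how everyone agrees it is done."""
    name: str
    inputs: list            # prerequisites that must exist before starting
    outputs: list           # artifacts the step must produce
    postconditions: list    # checks that define "complete"
    checker: str = "peer"   # who independently reviews the result

    def is_complete(self, produced: set, checks_passed: set) -> bool:
        # Done only when every output exists and every post-condition
        # has been checked, not merely when work stops.
        return (set(self.outputs) <= produced
                and set(self.postconditions) <= checks_passed)

# Hypothetical example: capturing a customer's change request.
step = PatternStep(
    name="capture change request",
    inputs=["customer request"],
    outputs=["documented change", "rationale"],
    postconditions=["customer confirms intent"],
)
```

When the step is written this explicitly, the assignee, the checker, and a mentor all share the same definition of done.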
Quality considerations. As with completeness, the people assigned to work on tasks need to have a clear definition of what makes the results of their work acceptable, or what makes one way better than another. Sometimes this is simple: when objectives or specifications, which would be inputs to a work step, are met. Other times considerations of quality arise not from specifications but from things like coding standards. In those cases the quality considerations should be spelled out explicitly so the people doing the work know to use them.
Light-weight patterns. Good patterns are lightweight enough to get their job done, and not more (Section 20.8.3). Working out the pattern in advance is an opportunity to work out what parts of the work are truly needed and which can be omitted or simplified. For example, a pattern should be adapted to the possible cost of making a wrong decision (see decision-making authority above). Patterns that involve easily-reversible decisions should include streamlined decision-making steps, pushing the decision authority to as low a level in the team as possible and involving as little work as possible. On the other hand, more difficult decisions should involve a pattern that calls for greater deliberation, more checking and consultation, and places decision-making authority higher in the team’s hierarchy.
Similarly, the patterns should be achievable by the team. If the team is small, it makes no sense to mandate complex work flows for which there isn’t the staff. Each decision about what to include in a pattern should be measured against what is possible for the team to perform.
A project should choose life cycle patterns that fit with its development methodology.
A development methodology is the overall style of how a project decides to organize the steps in developing the system. This includes decisions like whether to develop the system in increments of functionality, whether to design everything before building, whether to synchronize everyone’s efforts to a common cycle, and so on. These decisions are reflected in obvious ways in the life cycle patterns a project uses.
There are many methodologies named in the literature: waterfall, spiral, agile, and so on. Different sources interpret each of these differently, and they are rarely compared on a common basis. Some of these, like waterfall methodology, have evolved over time and do not have a single clear source or definition. Others, such as agile development, have a defining document (manifesto) to reference.
All of the methodologies I know of have come to be treated as dogma, and are more often caricatured than treated thoughtfully. This is unfortunate because each of the methodologies has something useful to offer, while all of them are harmful to project effectiveness if taken as dogma or used without thoughtful understanding.
These methodologies can be organized and compared based on a few characteristics.
Rather than try to argue for or against any specific methodology—which is difficult, since most methodologies are hard to pin down among many published variants—I focus on these characteristics, and argue for choosing a methodology that has the characteristics that a project needs.
Size of design-build cycle. Methodologies like waterfall use “big design up front”, where the entire system is specified and designed before implementation begins. Other methodologies break up development into many specify-design-implement cycles.
The argument for doing as much design up front as possible is that errors are easier and cheaper to catch and correct before implementation than after. The arguments against are that in some complex systems the design work is exploratory and requires implementing part of the system to learn enough to know how to design—or not design—critical system parts.
Many iterative methodologies claim to be better at supporting adaptation as system purposes change.
Coupled or decoupled design-build. Some iterative methodologies plan to complete adding a feature to the system in one iteration, by executing an entire specify-design-build-integrate cycle for that feature. Other methodologies break up that cycle into multiple steps, and allow those steps to spread across multiple iterations.
Advance planning. Some methodologies emphasize planning out work activities as far as possible into the future, while others focus on planning as little as possible in order to adapt as needs change.
The argument for planning as far as possible into the future is that it gives the team stability: they have a reasonable expectation of what they should be working on now and have a sense of how that work will flow into other tasks soon after.
The argument for planning to shorter horizons is that someone will come along and change priorities or system purpose, and so the work will need to be changed to adapt. Planning too far ahead is wasted effort, it is argued, and gives teams a false sense of stability.
Regular release or integration. When a methodology uses many design-implement cycles, at the end of each cycle it can require that new implementations be integrated into a partially-working system, or it can go farther and require that the partially-developed system be releasable. Most iterative methodologies recognize that very early partial systems may not be releasable because they are too incomplete.
Regular release is feasible for products that are largely software, where a new release can be put into operation for low effort. It is less feasible for products that involve a large, complex hardware manufacturing step between development and putting a system into operation.
The choice of whether to release regularly or not is often dictated by the relationship with the customer(s) and whether the system is still being implemented the first time, or is in maintenance. Once the system has been deployed, development is likely either for fixes or for new features; these are often released and deployed as soon as possible.
Synchronization across project. Some methodologies that break up development into multiple iterations align all the work being done at one time so that the iterations begin and end together. Other iterative methodologies allow some work iterations to proceed on different timelines from other work.
Synchronizing work iterations across the whole project can provide common points to check that work is proceeding as it should and to share information about progress. However, it can also break up tasks that run far longer than others and result in a perception that the synchronization is wasteful management overhead rather than something useful.
Shared short-term purpose across project. Iterative methodologies can focus the entire team on one set of features across all the work going on at one time, or they can allow different streams of work to have different focuses in the short term.
The argument for this practice is that the more people share a common goal, the more they will be motivated to work together to meet that goal and to defer work that does not address that common goal. The argument for having multiple work streams with different focuses is that too often a project will involve work from different specialties and on different timelines: mechanically assembling an airframe and building a flight control algorithm have little in common.
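The characteristics above can serve as a common basis for comparison. As a sketch, each methodology could be encoded as a record of its characteristics; the field names and the two example profiles below are my own simplification of the discussion, not a standard taxonomy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MethodologyProfile:
    """Hypothetical encoding of the comparison characteristics."""
    cycle_size: str             # "whole project", "feature", "time-boxed"
    coupled_design_build: bool  # one cycle completes a feature?
    planning_horizon: str       # "full project" vs "per iteration"
    regular_release: bool       # releasable system at each cycle end?
    synchronized: bool          # all work aligned to a common cycle?
    shared_short_term_purpose: bool

waterfall = MethodologyProfile("whole project", True, "full project",
                               False, False, False)
agile = MethodologyProfile("time-boxed sprint", True, "per iteration",
                           True, True, False)

def differences(a: MethodologyProfile, b: MethodologyProfile) -> list:
    """List the characteristics on which two profiles differ."""
    return [f for f in a.__dataclass_fields__ if getattr(a, f) != getattr(b, f)]
```

Comparing profiles field by field makes the choice concrete: a project asks which values it needs on each characteristic, rather than which methodology label to adopt.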
I present three of the most commonly discussed development methodologies in order to illustrate how they can be characterized. Each of these methodologies has many variants, and all are the subjects of debates comparing tiny details of each variant. The purpose of this section is to illustrate how they can be analyzed, not to capture all nuances of every methodology in use.
Waterfall development. This approach to development follows the major life cycle phases in sequential order. It begins with concept development, moves through specification to design, and only then begins implementation.
Waterfall development is well suited to building systems that have decision points that are difficult or expensive to reverse. The NASA project life cycle (Section 23.2.1) follows a waterfall-like sequence for its major phases because there are three decision points that do not allow for easy adjustment: getting government funding approval; building an expensive vehicle; and spacecraft launch.
This methodology can be inefficient when the system cannot be fully specified up front. When the system's purpose changes mid-development, or when some early design decision proves to have been wrong, the methodology does not have support built in for how to respond. Projects using this kind of methodology are known to have difficulty sticking to schedules and costs developed early in the project, usually because of events that were not anticipated at the beginning.
In one spacecraft design project I worked on (Section 4.1), the team assembled a giant schedule for the whole project on a 20-foot-long whiteboard. This schedule detailed all the major tasks needed across the entire system. That schedule ended up requiring constant modification as the work progressed.
Waterfall development requires great care when building a system with significant technical unknowns. The serial nature of execution means that some important decisions must be made early on, when little information is available on which to base that decision. When those unknowns are understood, the project can put investigation or prototyping steps into the specification or design phases in order to gather information for making a good decision. On the other hand, if the team does not learn that some technical uncertainty exists until the project is into the implementation phase, the cost of correcting the problem can be higher than with other methodologies. In addition, the sequential nature of execution can create an incentive for a team to muddle through without really addressing the unknown, resulting in a system that does not work properly.
In the spacecraft design project I mentioned, there were technical problems with the ability for spacecraft to communicate with each other. These problems were not properly identified and investigated in the early phases of the project. As the team designed and implemented parts of the system, different people tried to find partial solutions in their own area of responsibility but the team overall continued to try to move ahead. In the end the problems were not solved and the spacecraft design was canceled.
| Characteristic | Waterfall |
| --- | --- |
| Cycle size | One design-build cycle for the whole project |
| Coupled design-build | One cycle, so implicitly coupled |
| Planning | Plan as far as possible, especially after design |
| Release and integration | At end of project |
| Synchronization | n/a |
| Short-term purpose | n/a |
Iterative and spiral development. This development methodology is characterized by building the system in increments. Each increment adds some amount of capability to the system, applying a specify-design-build-integrate cycle. Typically the whole team works together on that new capability.
Early increments in such a project often build a skeleton of the system. The skeleton includes simple versions of many components, along with the infrastructure needed to integrate and test them. Later increments add capabilities across many components to implement a system-wide feature.
Teams using iterative development often plan out their work at two levels: a detailed plan for the current iteration, and a general plan for the focus of the iterations that will follow.
This methodology builds in more flexibility to handle change than the waterfall methodology does.
Iterative development can be used to prioritize integration (Section 8.3.2), in order to detect and resolve problems with a system’s high-level structure as early as possible. This involves integration-first development, where the team focuses on determining whether the high-level system structure is good ahead of putting effort into implementing the details of the components involved.
| Characteristic | Iterative and spiral |
| --- | --- |
| Cycle size | One set of features crossing the whole system |
| Coupled design-build | Generally add to design and implementation for the feature(s) in the iteration |
| Planning | At the beginning of each iteration; variants maintain a roadmap of iterations or spirals |
| Release and integration | Either; every iteration ends with an integrated working system |
| Synchronization | All work synchronized to the iteration |
| Short-term purpose | Shared within the iteration |
Agile development. The agile methodologies—there are many variants—focus the team on time-limited increments, often called sprints. The approach is to maintain a list of potential features to build or tasks to perform (the backlog). At the beginning of a sprint, the team selects a set of features and tasks to do over the course of that sprint. By the end of the sprint, the features have been designed, implemented, verified, and integrated into the system. In other words, there is a life cycle pattern that applies to building each feature within a sprint.
Agile development aims to be as responsive to changes as possible. The start of each sprint is an opportunity to adjust the course of the project as problems are found or the team gets requests for changes. The agile methodologies arose from projects that were trying to keep the customer as involved as possible in development, so that the team’s work would stay grounded in customer needs and so that the customer could give feedback as their own understanding of their needs changed.
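The sprint-planning step described above can be sketched as selecting from a prioritized backlog under a capacity limit. This is a deliberately minimal illustration; real teams also weigh a longer-term plan and task dependencies, and the task names and point values here are invented.

```python
def plan_sprint(backlog, capacity):
    """Pick work for the next sprint from a prioritized backlog.

    Backlog items are (name, priority, cost) tuples; the team takes
    the highest-priority items that still fit its remaining capacity.
    """
    chosen, remaining = [], capacity
    for name, priority, cost in sorted(backlog, key=lambda t: -t[1]):
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen

# Hypothetical backlog: (task, priority, cost in team-days).
backlog = [("fix login bug", 9, 2), ("new report", 5, 5),
           ("refactor parser", 3, 8), ("update docs", 7, 1)]
print(plan_sprint(backlog, capacity=8))
# → ['fix login bug', 'update docs', 'new report']
```

The point of re-running this selection each sprint is responsiveness: new problems or customer requests enter the backlog and can be picked up at the very next boundary.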
At their worst, the agile methodologies have been criticized for three things: an excess of meetings, drifting focus, and difficulty handling long-duration tasks. Note that these critiques come from people in teams who claim to be using agile methodologies, and reflect problems with the way teams implement agile approaches and not necessarily problems with the definition of the methodology itself.
Agile development emphasizes continuous communication within a team. In practice, this can lead to everyone on the team having multiple meetings each day: daily stand up meetings, sprint planning, sprint retrospectives, and so on. This likely comes from teams using meetings as the primary way to communicate, and from democratizing planning decisions that could be made the responsibility of fewer people.
Some agile projects have been characterized as behaving like a particle in Brownian motion: taking a random new direction in each iteration or sprint. This can happen when the team only looks at its backlog of needed tasks each iteration, or when new outside requests are given priority over continuing work. The focus on agility and constant re-evaluation of priorities can lead teams to this behavior, but it is not integral to the ideal of agile development. A team can develop a longer-term plan and use that plan as part of prioritizing work for each new sprint.
Finally, many complex systems projects involve long-running tasks that do not fit the relatively short timeline of sprints or iterations. Acquiring a component from an outside vendor or manufacturing a large, complex hardware component do not really fit the model of short increments.
| Characteristic | Agile |
| --- | --- |
| Cycle size | One short iteration with many independent features and tasks, bounded in duration |
| Coupled design-build | Some agile practices focus on features, with a design-build cycle within one sprint to implement a feature. Other agile practices decouple designing, building, and verifying, allowing those to be spread over multiple iterations |
| Planning | At the beginning of each iteration; variants have a longer-term general plan |
| Release and integration | Either; every iteration ends with an integrated working system |
| Synchronization | All work synchronized to the iteration or sprint |
| Short-term purpose | Each task has its own purpose |
Most projects actually use a hybrid of the different methodologies. They may start from one of the generally available methodology definitions, but they adapt that template based on the needs of their project and their own experience.
I have been part of several projects that had hard decision points, reflected in project milestones. At these points, the project was expected to provide information that would lead to the project continuing or being canceled: the decision to award the team a contract to build the system, or decisions to continue funding. These decision points impose a degree of waterfall-like structure on the work. For example, one project had to present a proposal to a government agency in order to get a contract to perform detail design and prototype implementation. That proposal involved developing the concept for the system, showing how it met the customer’s objectives, and showing that there was a likely feasible design.
However, once the contract had been awarded, the team used a spiral development methodology to build a sequence of increasingly capable versions of the system, and used those to demonstrate the completion of defined features at the end of each spiral. Within each spiral, the software team used an agile-like approach based on two-week sprints.
Every successful project I have been part of has done some degree of planning and high-level design well ahead of detail design and implementation. When the project used spiral or iterative development, the general flow from one spiral to another provided guidance to keep the work on track to reach the defined end point. When the project used agile methods to manage tasks in the short term, the longer-range plan kept the tasking decisions in each sprint from going off track by providing a basis for prioritizing the work.
No matter what methodology the projects have followed, they have all had some kind of regular cycle for checking in so that project leadership could find out when there were problems, and so that team members could maintain awareness of progress across the whole project. In some cases this looked a lot like agile practice, with short daily stand up meetings and regular, more in-depth discussions. In other cases the nature and schedule for checking in depended on the part of the system people were working on—from continuous interaction in some parts of software development to weekly updates from people doing safety analysis.
All the projects I have worked on have also had tasks that varied widely in duration. Many software-related tasks were short, in the range of hours to a few days, while testing or hardware implementation often required weeks to complete. Most of the projects did not try to make all these tasks fit into a synchronized schedule across the project.
In practice, therefore, the successful projects I have seen applied common sense when choosing the development methodology for their specific project.
In this chapter I survey some of the many different life cycle patterns in use.
The patterns have different scopes. Some cover the whole life of a system, from conception through retirement. Some are concerned only with developing a system. Others focus on more narrow parts of the work.
I group the patterns in this chapter into four sets, based on scope. The first group covers the whole life of a project, without much detail in the individual steps. The second dives into the development process. The third addresses post-development processes—for releasing and deploying a system; these patterns overlap with development processes. The fourth and final group is for patterns with a narrow focus on some specific detail of building a system.
Patterns with different scopes can potentially be combined. Most patterns that cover a system’s whole life, for example, define a “development phase” but do not detail what that is. One of the patterns for developing a system can be used for the details.
Each of the examples will include a comparison against the following baseline pattern for the whole life of a project.
The baseline phases are the same as those in Section 21.3.
These patterns organize the overall flow of a project, from its inception through system retirement and project end. I have selected two examples: the NASA project life cycle, which is used in all NASA projects big and small, and the Rational Unified Process, which arose from a more theoretical understanding of how projects should work.
The NASA life cycle has been refined through usage over several decades. It is defined in a set of NASA Procedural Requirement (NPR) documents. The NASA Space Flight Program and Project Management Requirements document [NPR7120] defines the phases of a NASA project.
The NASA life cycle model is designed to support missions—prototypically, a space flight mission that starts from a concept, builds a spacecraft, and flies the mission.
NASA space flight missions involve several irreversible decisions, and this is reflected in how the phases and decisions are organized. Obtaining Congressional funding for a major mission can take months or years. During development, constructing the physical spacecraft, signing contracts to acquire parts, and allocating time on a launch provider’s schedule are all expensive and time-consuming to reverse. Launching a spacecraft, placing it in a disposal orbit, and deactivating it are all irreversible. These decision points are reflected in where there are divisions between phases, and when there are designated decision points in the life cycle.
There are several life cycle patterns for NASA projects, depending on the specific kind of program or project. I focus on the most general project life cycle [NPR7120, Fig. 2-5, p. 20], which I summarize below.
The pattern includes seven phases. There is a Key Decision Point (KDP) between phases. Each decision point builds on reviews conducted during the preceding phase, and the project must get approval at each decision point to continue on to the next phase.
The key products for each phase are defined in Chapter 2 of the NPR and in Appendix I [NPR7120, Table I-4, p. 129].
Pre-Phase A (Concept studies). This phase occurs before the agency commits to a project. It develops a proposal for a mission, and builds evidence that the concept being proposed is both useful and feasible. A preliminary schedule and budget must be defined as well. If the project passes KDP A, it can begin to do design work.
Phase A (Concept and technology development). This phase takes the concept developed in the previous phase and develops requirements and a high-level system or mission architecture, including definitions of the major subsystems in the system. It can also involve developing technology that needs to be matured to make the mission feasible. This phase includes defining all the management plans and process definitions for the project.
Phase B (Preliminary design and technology completion). This phase develops the specifications and high-level designs for the entire mission, along with schedule and budget to build and complete the mission. Phase B is complete when the preliminary design is complete, consistent, and feasible.
Phase C (Final design and fabrication). This phase involves completing detailed designs for the entire system, and building the components that will make up the system. Phase C is complete when all the pieces are ready to be integrated and tested as a complete system.
Phase D (Assembly, integration, test, launch, checkout). This phase begins with assembling the system components together, verifying that the integrated system works, and developing the final operational procedures for the mission. Once the system has been verified, operational and flight readiness reviews establish that the system is ready to be launched or flown. The phase ends with launching the spacecraft and verifying that it is functioning correctly in flight.
Phase E (Operations and sustainment). This phase covers performing the mission.
Phase F (Closeout). In this phase, any flight hardware is disposed of (for example, placed in a graveyard orbit or commanded to enter the atmosphere in order to destroy the spacecraft). Data deliverables are recorded and archived; final reviews of the project provide retrospectives and lessons learned.
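The gated structure of these phases can be sketched as a simple sequential state machine: a project advances to the next phase only when the Key Decision Point following its current phase is approved. This is an illustrative sketch of the gating logic only; the actual KDP criteria and products are defined in [NPR7120].

```python
# The seven phases of the NASA project life cycle, in order.
PHASES = ["Pre-Phase A", "Phase A", "Phase B", "Phase C",
          "Phase D", "Phase E", "Phase F"]

def advance(current: str, kdp_approved: bool) -> str:
    """Return the next phase if the KDP is approved; otherwise the
    project remains in (or repeats work within) its current phase."""
    i = PHASES.index(current)
    if not kdp_approved or i == len(PHASES) - 1:
        return current
    return PHASES[i + 1]
```

The model captures the essential discipline: there is no path from Phase A design work to Phase C fabrication that does not pass through the intervening decision points.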
This pattern of phases grew out of complex space flight missions, where expensive and intricate hardware systems had to be built. These missions often required maturing new technology, and the hardware systems they produced required extensive testing. The NASA procedures for such missions are therefore risk-averse, as is appropriate.
I have observed that many smaller, simpler space flight projects have not followed this sequence of phases as strictly as higher-complexity missions do. Many cubesat missions, where the hardware is relatively simple and more of the system complexity resides either in operations or in software, have blurred the distinctions between phases A through C. In these projects, software development has often begun before the Preliminary Design Review (PDR) in Phase B.
At the same time, I have observed some of these smaller space flight projects failing to develop the initial system concept and requirements adequately before committing to hardware and software designs. This has led to projects that failed to meet the mission needs—in one case, leading to project cancellation.
The phases in the NASA life cycle compare with the baseline model presented earlier as follows.
The NASA life cycle splits the system development activities across four phases. The NASA approach does this because it needs careful control of the design process, in particular so that agency management can decide at reasonable intervals whether to continue with a project or not. The NASA approach also places reviews throughout the design and fabrication in order to manage the risk that the system’s components will not integrate properly. Many NASA missions involve spacecraft or aircraft that can only be built once because of the size, complexity, and expense of the final product; this makes it hard to perform early integration testing on parts of the system and places more emphasis on design reviews to catch potential integration problems.
The NASA pattern is notable for some initial work on a mission concept starting before the project is officially signed off and started. There are two reasons for this. First, because all NASA missions have common processes, there is less unique work to do for each individual project. Second, NASA is continuously developing concepts for potential missions, and this exploratory work is generally done by teams that have an ongoing charter to develop mission concepts. For example, the concept for one mission I worked on was developed by the center’s Mission Design Center, which performed the initial studies until the concept was ready for an application for funding.
The Unified Process (UP) was a family of software development processes developed originally by Rational Software, and continued by IBM after they acquired Rational. Several variants followed in later years, each adapting the basic framework for more specific projects.
The UP was an attempt to create a framework for formally defining processes. It defined building blocks used to create a process definition: roles, work products, tasks, disciplines (categories of related tasks), and phases.
The framework led to the creation of tools to help people develop the processes. IBM Rational released Rational Method Composer, which was later renamed IBM Engineering Lifecycle Optimization – Method Composer [ELOMC]. A similar tool was included in the Eclipse Foundation’s process framework, which appears to have been discontinued [EPF]. These tools aimed to help people develop processes and then publish the process documentation in a way that would let people on a team explore the processes.
While the UP and its tools gained a lot of attention, their actual use appears to have been limited. I explored the composer tool in 2014, and found it remarkably hard to use. It came with a complex set of templates, which were too detailed for the project I was working on. Another author wrote that “RUP became unwieldy and hard to understand and apply successfully due to the large amount of disparate content”, and that it “was often inappropriately instantiated as a waterfall” [Ambler23]. Certainly I found that the presentation and tools encouraged weighty, complex process definitions and that they led the process designer toward a waterfall development methodology.
The UP defined four phases: inception, elaboration, construction, and transition.
The UP does not directly address supporting production, system operation, or evolution; however, the expectation is that, for software products, there will be a series of regular releases (1.0, 1.1, 1.2, 2.0, …) that provide bug fixes and new features. Each release can follow the same sequence of phases while building on the artifacts developed for the previous release.
The four phases in UP compare with the simple model presented earlier as follows:
The Unified Process provides lessons for defining life cycle patterns: keep the patterns simple, make them accessible to the people who will use them, and put the emphasis on what they are for, not on tools and forms. The basic ideas in UP are good—carefully defining a life cycle, and building tools to help with the definition. I believe that these good ideas got lost because the effort became too focused on elaborate tools and models, losing focus on the purpose of life cycle patterns: to guide the team that actually does the work.
Some patterns focus only on the core work of developing a system. These patterns generally begin after the project has been started and the system’s purpose and initial concept are worked out. The patterns go up to the point when a system is evaluated for release and deployment. In between, the team has to work out the system’s design, build it, and verify that the implementation does what it is supposed to.
These examples all share the common basic sequence of specifying, designing, implementing, and verifying the system or its parts. Some of the examples include similar sequences of activity to evolve the system after release.
This pattern is used widely in systems engineering work. It is organized around a diagram in the shape of a large V. It appears in many texts on systems engineering; it has also been used to organize standards, such as the ISO 26262 functional safety standard [ISO26262, Part 1, Figure 1].
In general, the left arm of the V is about defining what should be built. The right arm is about integrating and verifying the pieces of the system. Implementation happens in between the two. One follows a path from the upper left, down the left arm, and back up the right side to a completed system.
There is no one V model. There are many variants of the diagram, depending on the message that the author is trying to convey. Here are two variants that one often encounters.
The first variant focuses on the sequence of work for the system as a whole:
The second variant focuses on the hierarchical decomposition of the system into finer and finer components:
The key idea is that specifications, of the system or of a component, are matched by verification steps after that thing has been implemented.
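The matching of specifications to verification steps can be made concrete with a small sketch. The level names below are illustrative only—there is no one V model, and different sources label the arms differently:

```python
# A minimal sketch of the V-model pairing: each specification level on the
# left arm is matched by a verification level on the right arm. The level
# names here are illustrative, not taken from any particular standard.

V_MODEL_PAIRS = [
    # (left arm: what is specified,  right arm: how it is verified)
    ("system requirements",          "system validation"),
    ("system architecture",          "system verification"),
    ("subsystem specifications",     "subsystem integration tests"),
    ("component designs",            "component unit tests"),
]

def verification_for(spec_level: str) -> str:
    """Return the verification activity matched to a specification level."""
    for spec, verify in V_MODEL_PAIRS:
        if spec == spec_level:
            return verify
    raise KeyError(spec_level)
```

The pairing is the useful part of the model: when a team writes a specification at some level, the matched entry tells them what verification work they are committing to later.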
In general this model conflates three ideas that should be kept separate.
The first two ideas are reasonable. Having a purpose for something before designing and building it is a good idea. There are exceptions, such as when prototyping is needed in order to understand how to tackle design, but even that exception is merely an extension to the general flow. The second idea, of working top down, is necessary because at the beginning of a project one only knows what the system as a whole is supposed to do; working out the details comes next. Again there are exceptions, such as when it becomes clear early on that some components that are available off the shelf are likely useful—but again, that can be treated as an extension of the top down approach.
The third idea works poorly in practice. It is, in fact, an encoding of the waterfall development methodology into the life cycle pattern, and so the V model inherits all the problems that the waterfall methodology has.
In particular, the linear sequence orders work so that the greatest development risk is pushed as late as possible, to the point where problems are most expensive to find and fix. By integrating components bottom up, minor integration problems are discovered first, shortly after the low-level components have been implemented, when fixing those components is cheapest. Higher-level integration problems are left until later, after complex assemblies of low-level components have been integrated together. These integration problems tend to be harder to find, because the assemblies of components have complex behavior, and more expensive to fix, because small changes in some of the components ripple through the assemblies already integrated.
Development methodologies other than waterfall address these issues better, as I discussed in Chapter 22.
There are several life cycle definitions for system development, primarily of software systems, that go by the SDLC name. They generally have similar content, with variations that do not change the overall approach.
I have not found definitive sources for any SDLC variants. It appears to be referenced as community lore in many web pages and articles.
The core of the SDLC consists of between six and ten phases, depending on the source, that give a sequence for how work should proceed in a project. The phases are:
Phases marked (*) are not included in all sources.
Most discussions of SDLC stress that the pattern is meant to help organize a project’s work, not to dictate the sequence of activities. Some sources then discuss how the SDLC relates to development methodologies. A project using the waterfall methodology would perform the phases in sequence. Iterative and spiral development would lead to a project repeating parts of the SDLC sequence multiple times, once for each increment of functionality that the project adds to a growing system. A project using an agile methodology would perform tasks at multiple points in the SDLC sequence in any given iteration, as long as for any one part or function of the system the work follows that sequence. I discussed how life cycles fit with development methodologies in Chapter 22.
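The difference between the two traversals can be sketched briefly. The phase names below are assumed for illustration, since sources vary on the exact set of SDLC phases:

```python
# Illustrative sketch: one SDLC phase sequence traversed two ways.
# Phase names are assumptions for the example; sources differ on the set.

SDLC_PHASES = ["plan", "analyze", "design", "implement", "test",
               "deploy", "maintain"]

def waterfall_schedule():
    # Waterfall: each phase performed once, in order.
    return list(SDLC_PHASES)

def iterative_schedule(increments):
    # Iterative/spiral: repeat the design-implement-test portion once per
    # increment of functionality, then deploy and maintain the result.
    schedule = ["plan", "analyze"]
    for inc in increments:
        for phase in ["design", "implement", "test"]:
            schedule.append(f"{phase}({inc})")
    schedule += ["deploy", "maintain"]
    return schedule
```

The point is that the SDLC fixes the ordering constraint for any one piece of the system, while the methodology decides how many times, and over what scope, that ordering is traversed.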
Many electronics development organizations use a set of development and testing phases:
This set of phases is intended for developing an electronic hardware component, such as an electronics board. Developing this kind of hardware differs from developing a software component: while software source code can be compiled and tested immediately, a board design must be built into a physical instance before much of its testing can happen. Simulating the board can be done earlier, of course, but much testing is only done on the physical instance. This is especially true for integrating multiple boards together.
This pattern also addresses not just the design and testing of the component itself, but also the ability to manufacture it—especially when the component is to be manufactured in large numbers. The NASA, V, and SDLC patterns do not address manufacturing specifically; this pattern can be combined with those if a project involves manufacturing.
EVT. The EVT phase is preceded by developing requirements for the hardware product. It is often also preceded by development of a proof of concept or prototype for the board.[1]
During EVT, the team designs and builds an initial working version, often continuing through a few revisions as testing reveals problems. The EVT phase ends when the team has a version whose design passes basic verification.
DVT. The DVT phase involves more rigorous testing of a small batch of the designed board. The design should be final enough that sample boards can be submitted for certification testing. The DVT phase ends when the sample boards pass verification and certification tests.
PVT. The PVT phase involves developing the mass manufacturing process for the board. This includes testing a production line, assembly techniques, and acceptance testing.
The last two patterns have to do with managing changes to the system: when errors are found, and when customer needs change.
Both these patterns apply to specific, short parts of a project. They apply as needed—when an error report or a change request arrives. Both also potentially involve repeating parts of the overall development life cycle pattern. Both may be used many times in the course of a project.
This life cycle applies when someone reports a defect or error in the system. It includes fixing the problem and learning from it.
Common practice is to use an issue or defect tracking tool to keep track of these reports and the status of fixing them. Many of those tools have an internal workflow, and parts of this life cycle pattern end up embedded in that internal workflow.
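The kind of internal workflow these tools embed can be sketched as a small state machine. The states and transitions below are illustrative; real trackers each define their own:

```python
# A minimal sketch of an issue tracker's internal workflow as a state
# machine. The state names and allowed transitions here are assumptions
# for illustration; real tools define their own workflows.

ALLOWED = {
    "reported":    {"triaged", "rejected"},
    "triaged":     {"in_progress"},
    "in_progress": {"fix_ready"},
    "fix_ready":   {"verified", "in_progress"},  # verification may bounce back
    "verified":    {"closed"},
}

class Issue:
    def __init__(self, title):
        self.title = title
        self.state = "reported"

    def move_to(self, new_state):
        """Advance the issue, refusing transitions the workflow forbids."""
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state
```

Embedding the life cycle pattern in the tool this way means the tool itself enforces the ordering—an issue cannot be closed before its fix has been verified.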
There are two different times when people handle error reports: when errors are found during testing, before an implementation is considered verified, and later, when a verified design or implementation must be re-opened. In the first case, the people doing verification are expected to be working closely with the people implementing that part of the system; the pattern for that activity amounts to reporting an error, fixing it, and verifying the fix.
The general pattern for addressing later errors is:
From time to time, someone will request changes to the system. The request may come from a customer, asking for a change in behavior or capability. The request may come from the organization or funder, reflecting a desire to meet a different business objective. The request might even come from a regulator, when the regulations governing a system change or when the regulator finds a problem when reviewing the system.
The pattern for handling a change request has much in common with the one for handling a defect report.
After receiving a request, someone evaluates the request to ensure that it is complete and that they understand it. After that, there is a decision whether to proceed with making the change and, if so, what priority to give it. After making the decision to proceed, there are steps to design, implement, and verify the changes and eventually release the new version of the system.
Change requests differ from defect reports in two ways. First, requests for changes do not reflect an error in the system as it stands. The team can proceed building the system to meet its current purpose and defer making changes until after the current version is complete and released. Second, most requests are expressed as a change in the system’s purpose or high-level concept rather than as a report that a specific behavior in a specific part of the system does not meet its specification or purpose. A high-level request will have to be translated into, first, changes in the top-level system specification, and then propagated downward through component specifications and designs to work out how to realize the changes. This sequence of activities to work from the change of objective to specifications to designs to implementations is essentially the same as the activities to specify, design, and implement the system in the first place. In the pattern shown above, the “develop update” step amounts to recapitulating the overall system development pattern.
The decision to proceed with a change or to reject it depends on whether the change is technically feasible and whether it can be done with the time and resources available. This depends on having an analysis of the complexity involved in making the change. Ideally, the team will be able to estimate the complexity with reasonable accuracy and little effort. Analyzing a change request will go faster and more smoothly if the team has maintained specification and design artifacts that allow someone to trace from a system purpose, down through system concept, into specifications and designs, to find all the parts of the system that might be affected by a change. If the team has not maintained this information, someone will have to work out these relationships from the information that is available—which is difficult and error-prone.
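When those traceability links are maintained, the impact analysis amounts to a reachability query over them. A minimal sketch, with artifact names invented for illustration:

```python
# Sketch of impact analysis over maintained traceability links: from a
# changed artifact, follow "refines" links downward to find every
# specification, design, and implementation artifact that might be
# affected. Artifact names are invented for illustration.

TRACE = {
    "purpose":              ["system-spec"],
    "system-spec":          ["nav-subsystem-spec", "ui-subsystem-spec"],
    "nav-subsystem-spec":   ["route-planner-design"],
    "ui-subsystem-spec":    ["map-view-design"],
    "route-planner-design": [],
    "map-view-design":      [],
}

def affected_by(artifact):
    """All artifacts reachable downward from the changed artifact."""
    seen, stack = set(), [artifact]
    while stack:
        node = stack.pop()
        for child in TRACE.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

A change to the system specification immediately yields the set of subsystem specifications and designs that need review; without the links, someone has to reconstruct that set by hand.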
The life cycle patterns in this chapter have all been developed in order to guide teams through their work. To meet this objective, they have to be accessible and understandable by the teams using them; they can’t be explained in legalistic documents that include many layers of qualification and exceptions. Some of these have passed this test and have been used successfully. Others, such as the Unified Process, have not caught on.
Some of the patterns cover the whole project, while others address specific phases or activities. One pattern often references other patterns: for example, a high-level pattern like the NASA project life cycle uses lower-level patterns for developing components or handling change requests. Some low-level patterns, such as handling change requests or error reports, can end up using or recapitulating higher-level patterns.
The specific patterns that a project uses depend on that project’s needs. A software project that is expected to be continuously reactive to new customer needs works differently from a project that is building an aircraft, where rebuilding the airframe can cost lots of money and time. The NASA approach is influenced by the US Government fiscal appropriation and acquisition mechanisms, which require programs to have multiple points where the government can assess progress and choose to continue or cancel a program.
All of these patterns implicitly start with working out the purpose of an activity before proceeding to detailed work.
These patterns also implicitly reflect the cost of making and reversing a decision (Section 21.10). The NASA life cycle puts design effort before a decision to spend money and effort building hardware. The change request and defect report patterns place evaluating the work involved ahead of committing to make a change.
A reference life cycle pattern for projects. It models what a full life cycle contains, and can be the basis for developing an actual project’s life cycle.
The previous chapters have introduced the ideas of life cycle patterns and development methodologies, along with the ways that the two affect each other. Chapter 22 introduced a number of characteristics that one can choose to match a project. Chapter 23 presented a number of example life cycle patterns, along with a rough framework for comparing the examples.
In this chapter, I present a reference model of development methodology and life cycle patterns. This approach is based on the approaches I have used myself or have observed others using in successful projects, along with learning from projects that have gone poorly. These recommendations do not attempt to follow any of the development methodologies dogmatically, instead taking the parts from several of them that work well. In other words, I have tried to distill a pragmatic set of solutions from the many options available.
The reference life cycle covers the entire life of a systems-building project. It has four high-level phases: preparation, development, operation, and ending.
Project preparation is about setting up the project: how it will work, who is sponsoring it, who is funding it. Development covers working out what the system is for and then designing and building it, until it is ready for use. Operation is about producing the system, deploying it, using it, and evolving it. Ending is about shutting down the project when its work is done.
This reference also includes a project support “phase”, which includes all the activities that go on throughout the project to support operations.
Some projects are only concerned with building a system; once the system has been implemented and tested, it goes into production or operation and is no longer the concern of the development team. Those projects skip the operations phase. Most projects, on the other hand, have some level of involvement after the system is deployed and in operation, such as fixing bugs or enhancing the system. These projects involve all the phases.
The phases in the top-level life cycle in turn expand into more detailed patterns. Development consists of working out a purpose and a concept for the system, then developing a system to match, ending with a review to determine that the system is acceptable for putting into operation. Operation expands into a pattern of several phases, which I will discuss below.
Some projects will spend most of their time in development, while others spend most of their time in evolution after the system is in operation. Exploratory spacecraft missions usually consist mostly of development, since once the spacecraft is launched there is little opportunity to change the spacecraft beyond the occasional software update. Mass-market consumer software, on the other hand, often spends as little time as possible on initial development and can spend years developing upgraded versions to keep consumers satisfied. This reference life cycle fits both kinds of projects.
The arrows in this diagram show how information and artifacts flow from one phase to another, but they do not necessarily indicate complete temporal orderings. For example, the project preparation phase often lasts quite a while, and overlaps early parts of the development phase. Within operation, different customers might deploy and operate their own instances of the system, and the project may be working on multiple system improvements at once.
Two of these phases—system development and system evolution—involve designing and implementing parts of the system. These are the two phases where a development methodology applies.
I will discuss each of the top-level project phases in turn in the coming chapters.
Some projects require developing a proposal to get funding or approval to proceed.
The life cycle for this kind of project adds a phase between preparation and development to develop a proposal. Developing the proposal typically involves developing the purpose and a preliminary concept for the system, so that the potential customer or funder can understand what they will be getting if they agree to fund developing the system. The initial concept is then documented as part of the proposal itself, which is typically a document (often a large document) explaining what the system will be, how it responds to the customer’s requirements, how long it will take to develop, and how much it will cost.
Much has been written about how to do proposal development well. There is best practice for how to organize a proposal development team and what kinds of reviews are helpful.[1]
After the customer or funder has agreed to the proposal, system development proceeds as it does for other kinds of projects.
Projects have times when there will be a decision whether to continue the project, end it, or continue with significant changes. Some examples: whether to start a project, when additional funding is needed to continue, or at periodic progress reviews.
These are often not driven by progress on making the system. They can be driven by external considerations, such as the need for funding, or by a regular cadence of progress checks.
Such reviews or decision points do not fit neatly into the flow of phases defined in the life cycle pattern. When multiple steps are in progress concurrently, as happens during most of the development phase, the decision often happens in the middle of several of them. Preliminary specification or design reviews are also common; they happen part way through specifying or designing part of the system. Design reviews often mean that the design should have reached a given level of completeness for the top X layers of components in the system.
I will note some representative decision points in this reference lifecycle, but the actual milestones are project-specific.
Project preparation is about assembling the things that the team will need to operate.
The case for the project. Preparation includes getting funding or approval to begin pursuing the project. This usually includes developing an initial pitch for what the project might be about, who will benefit, and roughly what level of resources will be needed. This initial case for the project will evolve from a vague notion at the start to whatever is needed to get approval and funding.
I have found two guides useful for making this initial case. The so-called Heilmeier Catechism [Heilmeier24] is a set of questions originally developed to guide people pitching project ideas to the US Defense Advanced Research Projects Agency (DARPA). (Appendix B lists the questions.) It consists of eight questions that prompt one to articulate the what and why of the project, along with what it will take to do the work. The second is the CSP project startup document template [Wilkes90], which was developed at the Concurrent Systems Project at HP Labs in the 1990s to guide people to think through what they mean to do in a new project. It is organized around the scientific method, and is phrased in terms of a research investigation; however, it is just as useful for other kinds of projects. There are variations on these guides that add questions, such as: How might the result of the work be misused?
In practice the people starting the project will not begin with answers to these questions. They will have some general ideas for a system project, and their job during the preparation phase is to investigate those ideas to work out answers to the questions. As anyone who has tried to form a startup knows, the system that eventually gets built usually is different from the first ideas—and it is the process of investigating answers to these questions that will find the final answer.
These efforts to work out the project’s case naturally include identifying stakeholders (Section 16.2). They also include some of the work to define the system’s purpose (Section 27.3).
Project operations. Project preparation also works out how the project will operate. This includes:
Decision point. At some point during preparation, a decision must be made whether to pursue the project or to stop, based on the case for the project and a general understanding of its costs. The decision should be included as an explicit milestone for the preparation phase, or immediately after, so that people on the team are reminded to take the time to think through whether the project makes sense before more resources are committed.
It may seem that the decision can be left implicit when the project needs no external resources—but in practice the resources used always represent an investment, and there is an opportunity cost: the team could be working on something more useful.
Outputs. The preparation phase results in many document artifacts, which the team uses later as they execute the project. The documents record the many decisions that people make during preparation.
People will use these artifacts in a number of situations:
Timing. Bear in mind that the project and the team are themselves systems that deserve careful design and implementation of their own; working out how the project will run takes time. Most projects start small, with just a few people and a general approach to how the project will operate, and develop additional details over time. Project preparation thus usually overlaps the beginning of development.
Progress on developing the project’s operations plans is balanced against the project’s progress on getting started and working out the system concept. Bear in mind Section 8.1.5—Principle: Team habits: the team will develop habits based on the procedures and organization they are working with, and changing those habits is hard. If the project leadership takes too long to develop team organization or life cycle patterns and procedures, it can become expensive and error-prone for the team to change behavior. On the other hand, if the project leadership rushes to develop these procedures and organization and gets them wrong, the team can end up in a similar situation.
The resolution to this dilemma depends on judgment by the project leadership; I know of no recipe for getting things exactly right. A few principles can help:
Completion. Project preparation is complete for the most part when the project is set up to execute. This includes having funding or approval to do the project, as well as having team structure, life cycle and procedures, artifact management, basic tools, and resources worked out.
Preparation is never truly complete, however. Many of the things worked out in preparation will need to be revised as the project goes on. For example, a team’s organization usually needs to change as the team grows from a few people who can collaborate informally to a large team who need more formal organization (Section 19.3.2). The project may also need to change the focus of the system based on funder or customer needs; changing the system may mean changing how the project runs.
Milestones. There are no milestones intrinsic to project preparation in general. The principle of working out how some part of the project will work before the team needs that information applies, but that is not a milestone in itself.
Other stakeholders may impose milestones on project preparation. For example, getting funding from a funder or approval for the project from the organization may be required.
Project support covers all the various things done continuously in the project to keep the team working. Project support starts with the beginning of the project and ends when the project ends.
This phase includes work to monitor and manage parts of the project. Teams are one example (Section 19.3.1); maintaining plans and tasking (Section 20.5) is another. Tracking project risk (Chapter 62) and technical uncertainty (Chapter 61) supports planning.
Other elements of project support include:
These efforts, similar to project preparation, will usually start small and develop over time. Similar principles apply to the timing of work on project support.
The development phase sees the project work out what the system is supposed to do, and then build the system to meet that objective.
Before going into the sub-phases that make up the development phase in the large, it’s worth thinking about how a system actually gets developed. A great many systems have been built over the centuries without the benefit of methodologies; with some experience, good systems engineers usually have intuition that guides them through development.
Development starts with a rough idea of what the system is for: what problem the system will solve, or what it can do for people. An aqueduct begins with the idea that something should transport water from a source to a town. A pump driven by a steam engine starts with an idea that a machine could pump water out of a mine better than a human- or animal-driven pump, and thus allow mines to go deeper than they had before.
The people thinking about the problem to be solved also often have some approaches in mind that might be applied. Someone responsible for moving water into a town might know about aqueducts that have already been built. Steam pumps were developed incrementally by many people over a period of over two hundred years.
For developing modern complex systems, the development process still begins with a general idea of what the system might do and what problems it might solve, perhaps with some key technical approach in mind.
The team needs to get from this general idea to a clear and precise definition of what they need to design and implement. This does not occur in one step; the detailed design of the system does not spring fully-formed from the chief engineer’s head. Instead, the team starts with a vague understanding and refines it bit by bit until it is clear enough for design and implementation to start.
The team does need to understand the system’s purpose before working out how the system should work. However, in practice these are often parallel efforts, where some people work with customers and other stakeholders to clarify the system purpose while some people begin to brainstorm ideas of what kind of system might meet that purpose. As the understanding of purpose becomes clearer, those who are investigating what the system might look like—the concept of the system—will refine their ideas. Those who are working on the system concept track updates to the purpose, often feeding questions back to stakeholders when they find something potentially ambiguous or when they suspect that some part of the purpose might not yet be worked out.
The system concept represents the bridge between understanding the customer’s needs and building the details of the system. The concept sets the general approach that the team will use. Working out the concept is a time for creativity, when the team can entertain many possible ways to build the system, eliminating those that aren’t likely and refining those that are promising. The team evaluates these possible approaches along the way to see if they are likely feasible to build and to meet the system purpose.
The team may be tempted to turn the concept-building exercise into a full system design exercise. This is unwise. First, the techniques used to develop a system concept are meant to be fast and fluid, not working to the degree of rigor that design and implementation require. Second, this can lead to a concept and design period that drags on and on when the team needs instead to make a decision about the high-level structure and then move on to investigating design based on that decision. Third, stopping to review the basic concept before committing to it makes for a better concept that will better guide the team later.
This means that a system concept will be (and should be) incomplete. It should show some of the big ideas of the system’s structure, and it should show that these ideas are likely to meet the stakeholders’ needs and are likely to be technically feasible. It should be accurate, in that anything named in the concept should in fact be a necessary part of the system, but it should not be precise, having all of the details worked out.
Once the team has a concept, it is a good time to step back. Is this system still worth building? Is it likely to be feasible? Is it going to be a good answer to customer needs? And is it plausible that the resources needed will be available?
As the development work moves forward, the team will refine the concept. They will find things missing in the concept and have to find designs that fill those gaps. They will find inconsistencies or mistakes, and they will have to correct them. At the same time, customer needs may change—so the initial concept will always be different from the final system.
The level of detail and analysis needed in the concept depends on the project. A project that is building a revolutionary system for potential future customers probably only needs a rough sketch of the system, since investigations will continue for months or years into what those customers really need. On the other hand, a project that is answering a request for proposal typically needs a much more developed concept in order to explain to a funder what they will get and why their funding will be a reasonable risk.
Once the purpose and concept are completed, the team can turn to actually developing the system itself. In practice this is rarely a sharp transition; instead, some part of the team may begin moving forward in working out a system specification even before the concept is finalized, or they may begin prototyping parts of the system that seem especially uncertain.
Development consists of many sub-phases. Purpose development comes first, in which the team determines what customer needs the system will address. After determining the system’s purpose, the team develops a high-level concept for the system, then builds the system itself. The development phase ends when there is agreement that the built system is ready to be produced, deployed, and put into operation.
The first two steps set the direction for the system development work. Purpose development establishes a record of who the stakeholders are for a project, and what each of them needs the system to do. This record of the system’s purpose will be incomplete, initially, but it must be accurate at the time it is documented. The concept phase then provides the time to explore different ways that a system might be built to meet those needs. The concept records a high-level picture of how the system will behave, the environment in which it operates, and some of the main top-level components that will make up the system. The concept phase is also the time when constraints related to security and safety are refined, turning general objectives coming from the customer or other stakeholders into more precise statements of what those objectives mean. Part way through or at the end of concept development is a good time for a review and decision about whether to continue the project.
The system development step in turn consists of many tasks. In this reference approach, the system development phase is organized into a number of system feature development phases, using the development methodology to determine what those phases are. Each system feature development phase, in turn, is organized as a sequence of specify-design-implement-verify patterns.
In this section, I will first discuss the development phase as a whole, then go into more detail about each of the subphases and development methodology.
Beginning. Development begins as soon as the project has a general idea of which customer needs the system will meet, has funding and approval to start working on the system, and project leadership has completed enough preparation that people know the basics of how to do development work.
As I noted earlier, project preparation work is rarely complete by the time development begins. Enough of the preparation should be done that people can begin working out and documenting the system concept, and later parts of development should be gated on other preparation steps.
Completion. Development ends with a system that is ready to be released for production and deployment. Being ready means that the system purpose identified in concept development has been met in the system’s implementation, that this fact has been verified, and that the customer and other stakeholders agree.
The acceptance phase addresses checking that stakeholders (Section 16.2) agree the system is ready for production. The customer—or a proxy for the customer—provides a final validation check that their needs will be met. The organization and funder, as other stakeholders, may weigh in to validate that their objectives have been met, such as that the system will be sufficiently profitable, before investing in production. Some systems will require regulator approval or certification before the system can proceed to production; for example, civil aviation authorities require type certification for commercial aircraft before mass-producing and deploying new aircraft models.
Outputs. There are five kinds of artifacts that are created in the development phase:
Milestones. The primary milestone comes at the end, in the acceptance phase. This milestone can go by different names; the NASA life cycle calls it the operational readiness review, for example. Passing this milestone implies that the system is ready for production (manufacturing) and deployment. As I noted above, this involves checking that the system meets stakeholder needs, and securing stakeholders' agreement that it does. This can also include regulatory approval.
There are other possible project-wide decision points or milestones for checking whether the project is on track and can continue or not. These do not necessarily fall at the beginning or end of phases; sometimes they happen in the middle, in order to correct the project’s trajectory or as dictated by external needs.
Other subphases in development define their own milestones.
The purpose development phase is for working out in detail what the system is to be, in terms of what it will do for its users and who those users are (Chapter 9).
The people responsible for working out the purpose work with the customer (or a proxy for customers, when the customers are hypothetical; see Section 16.2.1). This requires the team to work directly with the customers, not just to understand what the customers say they need but also to identify implicit needs and to find constraints on the system that the customers may not be able to articulate.
The team does similar work with other stakeholders. They identify the objectives that their organization has: is it to make a certain level of profit? Are there time constraints on demonstrating capability? Who might be the funder, and what are they looking for? And finally, who might have regulatory authority over the system, and what regulation or standards apply? All this information creates constraints on how the system can be built and what it can do, and will be considered when determining whether these other stakeholders will agree for the project to continue.
The needs found in this phase define objectives that the system should try to address. The constraints, on the other hand, define things that must be true about the system.
I discuss working out system purpose further in Chapter 31.
Inputs. The project should already have a vague idea of who the system will benefit and what their needs are. This is usually worked out when making the initial case for the project, as part of project preparation (Chapter 25).
Completion. The purpose development phase is complete when the list of stakeholders is complete, when the needs of each of those stakeholders are understood and have been documented, and the stakeholders agree that their needs have been documented correctly.
Outputs. The purpose phase produces two artifacts:
These artifacts together define the system’s purpose and constraints on its design.
Milestones. The purpose phase can end when each of the stakeholders, or a reasonable proxy for them, has reviewed the list of their needs and agrees that the list is complete and accurate.
The concept development phase is the transition between working out system purpose and beginning to design the system in detail. It is a time to work out an initial, rough idea of how a system might be built to meet the purpose and constraints worked out in the purpose development phase. It is a time to brainstorm many different possible approaches and to be creative. These different approaches can be evaluated and narrowed down to one concept. That concept is the start, not the end, of design; it will guide the work in the subsequent system development phase.
The system concept is a sketch of the system on paper or similar media. It should cover all the major behaviors of the system, but it should not go into great detail about how those will be achieved.
The concept has two general parts: an external view and an internal view. The external view takes a black-box perspective of the system, and includes:
The internal view is an initial sketch of the insides of the system’s black box. This view includes:
The concept does not usually go more than one or two levels deep in the component breakdown.
This information can be recorded in different forms, and it usually takes more than one form to capture it adequately.
Documents recording analyses complement these records. The whole collection of concept documents also records the rationale for decisions taken, and perhaps includes records of alternative designs that were considered and not chosen.
The concept is used for three purposes. First, it reveals whether there is likely to be a feasible approach for implementing a system that meets the customer needs. Second, it provides an illustration to customers and other stakeholders that they can use to validate whether the concept meets what they expect their needs to be. Third, it provides a guide as the team begins to specify and design parts of the system for real.
A likely-feasible concept is one where there is likely to be some way to design and implement each of the high-level parts of the system, and that combining those parts will likely satisfy stakeholder needs. The concept can only be likely, because it is supposed to be developed quickly; the uncertainties about whether the concept will actually work are not completely resolved until the whole system has been built and verified. The process of developing the concept can generate a list of what technical uncertainties people have found or suspect. (These uncertainties guide work planning as the project moves into the system development phase.)
The concept gets reviewed by stakeholders, including customers or customer proxies. While a stakeholder might look at the list of their needs as generated in the previous phase and think it complete, I have found that when they step through how a system concept will operate they get a different perspective and come to realize things they missed in the list of needs. When they find that a system concept appears to meet all their needs, the act of validating the concept with them provides them confidence in the project.
Finally, the road that the team follows from initial idea to a complete design (and implementation) has to start somewhere. The concept provides that starting point. The high-level components identified in the system concept become the starting point to specify, design, and build all of the rest of the components in the system.
Chapter 32 discusses the work involved in making and documenting the concept. To summarize that chapter, developing a system concept involves brainstorming many possible approaches to meeting customer needs and sketching them out. These different approaches get evaluated and compared to find out how well they meet the system purpose and how feasible they are; this often involves doing simple analyses. The evaluations show where there are gaps in meeting customer needs or in the technical solution. The best possibilities get refined or combined and improved until the approaches have been narrowed to one best option.
I have said that the concept should be likely feasible, and that the technical uncertainty and project risk uncovered in the investigation should be acceptable. The obvious next question is, how likely or how much uncertainty? In fact these uncertainties and risks are not generally quantifiable, as they deal in unknowns and the point of the concept development exercise is to expose unknowns and not to work them out. Qualitatively, some projects can accept more risk than others: a startup that is developing a speculative new technology can accept far more risk than a project proposing a system for a fixed-price contract. The decision will require a judgment call on the part of project leaders.
Inputs. The concept phase starts with the list of stakeholders and their objectives and constraints, which was developed in the purpose development phase. It can also use whatever informal investigation has been done in advance about system function or possible implementation approaches.
Completion. The concept phase is complete when either the team has found what they believe to be the best approach to designing the system, or they have determined that they cannot come up with a feasible approach.
A feasible system concept provides an understanding of how the system will function when viewed from the outside as a black box, and when that function has been shown to meet stakeholder needs.
A feasible system concept also defines some amount of internal structure and behavior, enough to support an argument that the team can plausibly build a system that works that way. This means that there are likely ways to build each of the components, and that the amount of time, money, and people required to build and verify the system is within what is available to support the project.
The system concept phase must end while the concept is still a concept. In many projects I have seen the temptation to keep improving the concept—make things a little more certain, make things a little better—before declaring the concept done. When this is left unchecked, concept development slides into system design and development, and leaves out the check of reviewing the imperfect and incomplete concept. Skipping that check means that easy and inexpensive course corrections don’t happen and the problems that will always be there aren’t detected and corrected until they are more expensive to fix.
Outputs. The concept development phase produces a number of artifacts that record the system concept, along with the rationales for why that concept was chosen. I noted earlier what the documentation of the concept should include. These artifacts are placed under configuration management, as they are likely to be revised as the project continues.
Milestones. The concept development phase ends with a conceptual design review (CoDR). This review checks the system concept to ensure that the concept meets stakeholder needs, is internally consistent, and is likely feasible to build. Customers and other stakeholders participate in this review when possible. Team members also participate, as a way to both check each other’s work and to share a common understanding of the concept. Some independent reviewers should also participate in order to check for gaps or biases that the team may have missed.
The conceptual design review is often used as a project go/no go decision point. If the team has not found a likely feasible concept, or one that meets organization and funder needs, this is a time for the organization to decide not to continue with the project. In this way the least resources are used before deciding to stop the project.
The system development phase is about creating the system based on the concept worked out in the previous phase. At the end, the project has the artifacts for a working system ready to hand off to production and deployment. Along the way, the project may need to meet other milestones—preliminary and critical design reviews for government customers, or feature demonstrations for funders.
The reference development methodology structures how the team does the work to design, implement, and verify that system. It is based on the spiral or incremental methodology. Project leadership works out a set of intermediate milestones where the team builds and demonstrates some set of system features working—usually integrating different parts of the system along the way. There is a life cycle phase leading up to each of these milestones, in which the team does the tasks needed to add features to the system. These are called feature development phases. Each feature development phase has an expected duration. If it appears not to be on track to meet that deadline, the team takes this as a signal that corrective action is needed. Unlike in the spiral methodology, this methodology leads to multiple overlapping feature development phases, running in parallel on different timelines and working toward different milestones.
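The milestone-and-duration bookkeeping described here can be sketched as a small data model. This is a hypothetical illustration (the class and field names are mine, not from any project tool); it shows the kind of crude schedule signal the text describes, comparing work completed against calendar time used:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class FeaturePhase:
    """One stream of work adding a set of features, ending at a milestone."""
    name: str
    features: list[str]
    start: date
    milestone: date                      # expected date of the demo/milestone
    done: list[str] = field(default_factory=list)

    def on_track(self, today: date) -> bool:
        """Crude check: is the fraction of features done keeping up with
        the fraction of calendar time used?  Falling behind is the signal
        that corrective action is needed."""
        total_days = (self.milestone - self.start).days
        if total_days <= 0 or not self.features:
            return True
        time_used = min((today - self.start).days / total_days, 1.0)
        work_done = len(self.done) / len(self.features)
        return work_done >= time_used or today <= self.start

# Two phases running in parallel, on different timelines,
# toward different milestones.
infra = FeaturePhase("infrastructure",
                     ["build system", "skeleton", "sim environment"],
                     date(2024, 1, 1), date(2024, 3, 1),
                     done=["build system", "skeleton"])
attitude = FeaturePhase("attitude control",
                        ["prototype algorithm", "testbed check"],
                        date(2024, 2, 1), date(2024, 5, 1))
```

On 1 February, `infra` is on track (two of three features done, about half the time used) while `attitude` is just starting; by 1 April, `attitude` with nothing done would signal trouble.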
This approach was motivated by several goals.
Compare this approach to waterfall and agile development methodologies.
Waterfall development, practiced strictly, does not handle uncertainty or adaptation well: the system is designed up front, and implementation follows thereafter. In practice, projects nominally using the waterfall methodology often develop intermediate milestones to organize the work.
Agile development, on the other hand, can lead teams to constantly change direction—unless they develop a plan with some longer-term objectives. When they do so, agile development ends up looking a lot like this reference methodology. Short sprint periods can also work poorly for parts of a project doing work that does not complete within one sprint, like building an airframe or developing detailed analyses.
Example. Consider the following example, taken and simplified from a spacecraft project I worked on. The mission involved multiple spacecraft working together to perform a science mission.
The mission’s concept development defined the overall design of the system: multiple spacecraft, communication links between them, communication with ground stations, and so on. The concept also defined an initial breakdown of the system, where the spacecraft had a set of major subsystems like structure, power, avionics, sensors, flight software, and so on. The concept identified some existing software and hardware designs that could be re-used for this mission.
The development phase, then, was about building hardware, software, and operational procedures that would implement that concept.
The team worked out the major steps that had to happen to build the system, such as designing the avionics, designing the structure, testing and integrating them, and putting sample spacecraft units through environmental testing (heat, vacuum, vibration). The project also would build software to run each spacecraft, which involved tasks like prototyping algorithms for attitude control and then verifying that they would work in testbed equipment. These major steps were partly worked out based on experience on previous missions, and partly from working backwards from the high-level system design to determine major functions to be implemented.
The following shows the first part of the sequence of feature development phases for the main flight software (simplified and abstracted from the original). The flight software had a series of milestones that started with the basic software infrastructure and a simulation environment for testing it. Later milestones then added capabilities one after another. Each milestone integrated new functions across several different components. In most milestones, the work involved behind the scenes was as important as what was overtly demonstrated; for example, the first demo was as much about establishing a software configuration management and build system as it was about demonstrating simple software running.
This project made extensive use of software skeletons or scaffolds, mockups, and emulations. This is typical of a project that prioritizes integration over feature depth. In this case, the main spacecraft control software for the first couple of demos was a simple skeleton of what it would become. The software modules involved could start up and interact with some others in simple ways, but there was no real logic in the control. Building this part first reduced the integration risk that the control software modules would not interact properly with the middleware and operating system on which they ran; indeed, it exposed middleware bugs that caused the system to crash. By the third demo, the team added basic attitude control logic to the control software. This attitude control still had only limited function; its main purpose was to show that the control software could interact with (emulated) sensors and actuators.
A system feature development phase is a stream of work that adds a defined set of features (the purpose of the phase) to the system, ending in a milestone with those features implemented, integrated, and demonstrable. It starts with design work that has already been done and the purpose of the work, and ends with system artifacts updated to meet the phase’s purpose.
This approach to organizing development is focused on the features rather than on the components or component breakdown. One feature development phase usually involves several components (and their subcomponents). It promotes the integration of work across parts of the system.
Inputs. A feature development phase takes as input the system concept, design, and implementation artifacts that have already been developed, plus a definition of the features that are to be implemented in this particular development phase.
Completion. The feature development phase is complete when the system has been built or modified to implement all the features named for this phase. The completeness and correctness of the implementation is documented in verification records and by demonstrating selected features working in the new system version.
Outputs. Feature development produces several different outputs.
Along the way, the design phase work may also produce:
Milestones. A feature development phase has one milestone, at its end. At this milestone, the completion conditions listed above should hold. The verification records are checked to ensure that the implementation passed verification, and the team who worked on the changes demonstrate key features to the rest of the project.
As will be seen next, the feature development phase is made up of several subphases, and each of these have their own milestones.
Reference pattern for feature development. A feature development phase recapitulates the overall system development life cycle. It starts with purpose, works out a concept, then proceeds into the specification, design, implementation, and verification of parts of the system to build in that purpose.
The concept for a feature development phase includes working out the general design approach for adding the phase's features. As with the system concept, the feature concept involves brainstorming different ways to implement the features, along with evaluations of the alternatives until the team selects one concept. The concept for the features should give a general idea of what components will be modified or created in this phase, along with the internal structure among those components and a narrative of how they will interact (the concept of operations for the features).
Identifying the components that will be affected is key to being able to scope how much effort will be required to implement the features, and who will need to be involved in the work.
The next step is to develop or modify specifications for the components involved (Chapter 33). These detail how the components are to behave and the non-functional attributes they are to provide. This may involve adding to or modifying the top level system specifications, or flowing those specifications down to components. Security, safety, and reliability specifications are particularly important.
Design follows specifications, working out how each component can be built to provide the behaviors and properties it is specified to have (Chapter 37). Design may require evaluating alternatives, perhaps by modeling or prototyping (Section 8.3.5; Chapter 41).
Two separate and independent implementation steps follow. One step implements components and changes to components, following the design. The other step works out how to verify the features in the feature development phase, including verifying both the individual components by themselves (using unit tests, for example) and the features that are provided by the components integrated into the system. If the verification implementation runs ahead of the component implementation, the component implementers can verify as they go (using test-driven development).
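The test-driven ordering mentioned above can be sketched in a few lines. The function and command names below are hypothetical; the point is only the sequence: the verification is written from the specification first, and the implementation is then written to make it pass.

```python
# Verification written first, from the component's specification:
# a (hypothetical) command-validation function must accept known
# commands and reject unknown ones.

def test_validate_command():
    assert validate_command("SET_MODE") is True
    assert validate_command("LAUNCH_CONFETTI") is False

# Implementation follows, written to make the verification pass.
KNOWN_COMMANDS = {"SET_MODE", "GET_TELEMETRY", "REBOOT"}

def validate_command(name: str) -> bool:
    """Return True if `name` is a command this component accepts."""
    return name in KNOWN_COMMANDS

test_validate_command()   # the check the implementers run as they go
```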
As parts of the feature set are implemented, they are verified. By the end of the feature development phase, the components created or changed in the phase and the features the phase is adding are all verified.
The feature development phase ends when the team successfully demonstrates that the system now has the features they have worked to implement. This demonstration might amount to showing that the new system version has passed its verification checks, but doing an actual demonstration gives the people who did the work an opportunity to show the rest of the project what they have done and for the project as a whole to celebrate their work.
Once again, note that this work is organized around the features, not the components. This methodology does not necessarily mean implementing each component's changes in isolation, verifying those, and then verifying their integration. Rather, the team can order the work in whatever way best suits the task at hand. For example, an integration-first approach might lead the team to build simple skeletons or mockups of component changes and focus on checking out how the components will interact before implementing detailed changes to the components, which means verifying integration before verifying the unit components. (Of course, the finished changes still need to be verified as a whole before the verification work is done.)
The reference pattern for the feature development phase, in the diagram above, includes review milestones for each of the steps (concept, specification, design, implementation, verification) involved. These reviews serve two purposes. First, they are an opportunity for someone independent to check the work in order to find things the team doing the work might miss. Second, they provide an opportunity for the team working on the features to pause long enough to ensure that they all understand the work in the same way.
Finally, the team responsible for a feature development phase may decide that the phase is large enough that it should be split up into subphases. Each of the subphases might have its own milestone goals; those subphase goals build on each other to reach the features of the main feature development phase. These subphases might focus on individual components or smaller groups of components, or they might split the work into sequential steps, or some combination of the two. These subphases follow the same pattern as the higher-level feature development phase of which they are part.
Interaction between parallel feature development phases. The feature-oriented focus of this methodology can cause problems. If the team is working on two sets of features in parallel, these features could affect some of the same components. Someone working on feature set A might change component C to support A’s features. At the same time, someone working on feature set B might also change component C. In the worst case, the changes might be in conflict and the changes for A might preclude the changes for B working, or vice versa.
The underlying problem is known as serializability in database and parallel computing systems, where it has been studied extensively. In these systems, different approaches to handling concurrent changes are measured by whether they produce the same result as if the work were done serially, one task at a time rather than concurrently. That is, the work is serializable if it ends up with component C looking as if the work for feature set A were done entirely and then the work for feature set B were done, or vice versa. This has led to many algorithms for coordinating concurrent work.
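A minimal sketch of the serializability idea, with component C modeled as a dictionary of design elements (the element names are invented for illustration): two concurrent changes are serializable when applying them in either order produces the same final component.

```python
# Two feature teams each apply a change to a shared component C,
# modeled here as a dict of named design elements.

def change_for_a(component):
    c = dict(component)
    c["telemetry"] = "extended frame format"   # feature set A's change
    return c

def change_for_b(component):
    c = dict(component)
    c["power"] = "dual-bus supply"             # feature set B's change
    return c

base = {"telemetry": "basic frame format", "power": "single-bus supply"}

a_then_b = change_for_b(change_for_a(base))
b_then_a = change_for_a(change_for_b(base))

# The changes touch disjoint parts of C, so either order gives the
# same result: the concurrent work is serializable.
serializable = (a_then_b == b_then_a)
```

Had both teams rewritten the same element differently, the two orders would disagree, which is exactly the conflict the coordination approaches below must handle.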
The simplest approach is to make changes serially: the people working on feature set A change C first, and when they are done, people working on feature set B get a turn. This is useful when the component cannot be physically shared, like a paper drawing or a mechanical device. There are two costs to this approach. First, one group must wait for the other to be done. Second, when group A changes C in ignorance of what group B will need, group B may have a lot of rework to do when its turn comes (and it is likely to need to consult with group A to keep their changes working).
Another approach is to let the two groups independently change C in parallel, keeping two separate versions of C and merging the changes when both groups are done. This is the approach taken by distributed version control systems like git [Git], which were developed for use by geographically separated, non-communicating software development teams. These tools rely on being able to reliably compare the different versions and to guide people through reconciling conflicting changes. The cost comes when the two groups make incompatible changes that cannot simply be merged together.
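The logic of such a merge can be sketched simply. The example below works at the level of named attributes of a component rather than lines of text, so it is an illustration of the three-way-merge idea, not git’s actual algorithm: each group’s version is compared against the common base version, and a conflict is reported only where both groups changed the same part differently.

```python
# Three-way merge sketch: "ours" and "theirs" are two groups'
# independent revisions of a common "base" version of component C.
# The attribute-level model is illustrative, not how git works
# internally (git merges line ranges of files).

def three_way_merge(base, ours, theirs):
    merged, conflicts = {}, []
    for key in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(key), ours.get(key), theirs.get(key)
        if o == t:            # both agree (changed alike, or unchanged)
            merged[key] = o
        elif o == b:          # only "theirs" changed this part
            merged[key] = t
        elif t == b:          # only "ours" changed this part
            merged[key] = o
        else:                 # both changed it differently: conflict
            conflicts.append(key)
    return merged, conflicts

base   = {"interface": "v1", "buffer_size": 256, "timeout_ms": 100}
ours   = {"interface": "v1", "buffer_size": 512, "timeout_ms": 100}
theirs = {"interface": "v2", "buffer_size": 128, "timeout_ms": 100}

merged, conflicts = three_way_merge(base, ours, theirs)
# The non-overlapping changes merge cleanly; the shared edit is
# flagged for a person to reconcile.
assert merged == {"interface": "v2", "timeout_ms": 100}
assert conflicts == ["buffer_size"]
```

The flagged conflict is exactly the cost mentioned above: the tool can detect that both groups changed `buffer_size`, but it cannot decide which change should win.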
The third way, and the one I have found most successful in complex systems projects, is to have one person or a small team be responsible for the shared component C. That person (or team) becomes part of both groups A and B working on parallel feature set changes. This responsible person can choose to handle the changes serially, or may choose to use a version control tool to manage their work. The advantage of this approach is that the person responsible for C understands the rationale for why the component is designed as it is, and will make changes that fit with the designs already completed. That person can also understand the needs of both sets of features, and design changes to support both rather than having to undo and redo incompatible design work.
A system feature, in the end, is made up of behaviors and properties of a number of components. That is, system features are emergent from the individual components involved.
The work to implement a system feature is thus made up of the work on each of the components, along with the effort to integrate those components and their changes. The team working out the concept for the feature determines how parts of the high-level feature are allocated to components. That is, they work out what behaviors or properties are needed from each component so that together they produce the high-level feature. Along the way during concept development, the team works out what components are affected by the feature development work.
The feature development life cycle pattern for the high-level feature applies for developing the changes to each of the affected components. Just as the feature as a whole has concept, specification, design, and implementation steps, so do each of the components. Developing the concept for the feature includes developing a concept for each affected component. Developing the specification for the feature leads to developing specifications for each component, and so on. The implementation of the feature is the implementation step for each component.
The people who are working on all these component pieces coordinate their work so that it all integrates properly and produces the desired features.
That coordination means that the work on each component moves at a pace at least partly constrained by the work on other components: for example, the specification step for any one component cannot be completely finished until the specifications for all the affected components are finished. Otherwise, the specification work in some other component could reveal a surprise that affected the specification that was thought to be finished.
At the same time, teams rarely just stop and sit idle when the work on some component lags. They proceed from specification to design to starting implementation, accepting the risk that some surprise may happen that will require them to re-do some amount of work. The choice of how much work to do at risk has to be made based on the usual estimates of likelihood and consequence. If the work on some other component is almost done and is in the final stages of cleaning up details, the likelihood of finding something that will require a change to other components is low. On the other hand, if the work on some other component is just getting started, then the chances of a surprise are high. If part of the component in question appears to be fairly immune to changes in other components, then there is little risk of having to redo that work. For example, if the component will definitely need to communicate over a network with other components, then getting network communication designed early is low risk.
The figure above illustrates how the work for a feature is coordinated across all the components. The top row shows the steps or phases for the feature as a whole. That work is broken down into the work for two components, shown in the middle two rows. The components each follow the feature development pattern of concept, specification, design, implementation, and verification. The last row covers the thread of work done to address integrating the changes to individual components, and it follows a reduced form of that pattern. The feature integration thread of work is primarily about checking that the work on the components properly combines to produce the high-level system features, and so it focuses on verification methods for this integration.
The figure also shows that the concept development work for the high-level feature and the affected components may often be done as a single task. If the feature and components are simple enough, a small group can work out the concept together and produce one set of concept artifacts that cover both the feature as a whole and its effects on specific components. In this case, the artifacts for each component will reference the shared concept artifacts; after a while, the records for a component may reference several concepts for different features.
If the feature or the components are more complex, the work may need to be divided up so people can work on different parts in parallel, combining and reconciling the pieces before the concept is completed. The artifacts for the components will then reference their own concept for that feature as well as the high-level feature concept documents.
The feature development pattern in the last section covers the simplest case: when the team is designing and building a straightforward feature. There are three variants to consider: when the component carries enough uncertainty that prototyping is warranted; when the component will be acquired from outside the project rather than built in house; and the specific needs for implementing hardware components.
Prototyping. Prototyping is used when there are multiple possible technical approaches to designing some part of the system and the technical uncertainty is too high to commit to one. In these cases, taking steps to reduce the uncertainty before choosing a particular design can lead to better outcomes.
The uncertainty can take different forms. In one case, the team might have an idea, but they don’t know if it will work correctly. In another case, they may not have an idea for a solution, and they need to explore and learn in order to find possible solutions. Or the team might have a solution, but lack skills essential to completing design or implementing it. Finally, the team may have a solution that is not technically mature enough, and they need to validate its suitability. In each case, developing a prototype of some kind can help.
The prototyping effort is added to the design step. The prototype might take the form of a simple implementation, or of a model of a possible solution. Any prototyping effort should have a clear purpose: to see if an idea works (and working out what it means “to work”) or the like. The focus must be on learning what is needed as quickly as possible. The work should prioritize speed of learning over quality of the prototype implementation.
Prototyping can be a necessary part of learning about a design and managing its uncertainty, but its contribution to the system is indirect—by leading to a good design. The amount of effort or time spent on the prototype should be bounded so that the prototyping effort does not take over the development effort.
The principles about prototyping (Section 8.3.5) apply. The prototype artifacts should be built as quickly as possible to maximize efficient learning, without putting in effort to make them high quality. The artifacts that come out of the prototyping work must not end up in the real implementation.
Acquired component. Sometimes components are best acquired from somewhere else rather than being designed and built by the team. This might involve reusing a component from another project, or using an open source design, or purchasing a component from a supplier. Acquiring a design or component can take advantage of work that others have already done, reducing development costs. It can take advantage of expertise that the team does not have itself, such as a supplier that can manufacture an electronics board or a software vendor that has developed a component with a particular algorithm.
The pattern for an acquired component proceeds with developing a concept for what is needed and a specification for the component. The specification is the basis for a request for proposal (RFP), which is sent out to potential suppliers that are expected to offer potential solutions. The suppliers in their turn use the specification to develop a design, which might simply be an off-the-shelf product or might involve development work on their part. Once the suppliers have a design, they respond to the team. The team evaluates whether the design in fact meets the specification and determines which option is best, if they have more than one potential choice. In many cases the team will build a simple prototype using a supplier’s prototype implementation, if they have one, as part of the evaluation. After that, the supplier implements, builds, and delivers the component. In other words, this pattern moves the design and implementation work away from the project team and onto the supplier.
The team, however, still does some amount of verification once they have received the implementation. This acceptance testing may be more limited than it would be for a bespoke design, if the supplier provides information about the verification steps they have taken. Nonetheless, the team should spot check any verification work that the supplier has done and must check that the supplied component integrates as expected into the rest of the system.
Acquired components like open-source designs or software do not have to go through the process of developing a formal RFP. However, these components do still require evaluation before deciding whether to use the design or not. The team must ensure that the license terms are compatible with the system. The team must also ensure that the potential component meets the specification of what is needed of it. Finally, the team must evaluate the quality of the component—which for open-source components, includes not just the quality of the artifact itself but also its governance and supply chain security [Goodin24][CVE24].
This pattern involves support roles that I have not detailed elsewhere. For example, the acquisition might involve someone who manages contracting or payment. The acquisition will likely involve checking that the license terms and intellectual property rights associated with the component are appropriate for the system the team is building, which may require legal expertise.
Hardware components. Hardware development has different constraints than some other kinds of component development, and so a different development pattern applies. The primary cause of the differences is that a hardware component involves physically building one or more artifacts, which can take time and resources. This makes iterating on a design to work out bugs or to change features much more expensive than it is for software or higher-level designs. In addition, some verification testing is destructive, putting a component in increasingly harsh environments or under increasing loads to determine when it fails.
Hardware development also differs from other kinds of component and feature development in the way terms like “design” are used. A design for an electronics board is a full description of how it is to be implemented; in some cases, it can be sent to an automated production system to create a complete physical board. Similarly, many mechanical designs are complete enough to send to a CNC machine or additive printer to create the physical artifact. By comparison, a software design is more abstract; it cannot be directly translated into a working program. Software source code is closer to mechanical or electronic designs, as source code can be sent to compilation tools that produce the executable artifact.
These constraints have led to disciplines about how to organize hardware development. I discussed the EVT/DVT/PVT pattern earlier (Section 23.4.1), which defines a sequence of phases for developing and verifying a hardware component. The NASA approach uses different language [NASA16, p. 124] to describe the sequence of hardware artifacts to be developed and verified. The two approaches are similar, with one naming the phases and one naming the artifacts.
This approach splits up the design, implementation, and verification phases into multiple iterations. There are typically four iterations.
The fourth step, producing production or flight units that can be deployed, can occur as part of development or later, in a production phase after the system has been accepted (Section 28.1). If a component is going to be mass produced, verifying the manufacturing methods is worth doing before declaring that the component is complete. After acceptance, the manufacturer will build more units. On the other hand, if only a handful of units will be built and they are expensive to build, such as with individual spacecraft, delaying the production of those units until after acceptance can manage risk.
Finally, the development of a hardware component is part of the development of the larger system. This leads to two ways that the hardware development steps can be organized, depending on how the hardware development will be synchronized and integrated with other parts of the system.
The first way is to plan out the hardware component development as its own thread of work. This way has the advantage of keeping the team focused on designing and building the component.
The second way is to break up the hardware development thread into smaller steps, and put some or all of those steps in feature development threads. For example, when building a circuit board that will run a control system, it will be hard to verify that the board works without some version of the software that runs on it or the interfaces to sensors and actuators of what it controls. In other words, verifying the integration of the hardware component with other parts of the system is an essential part of checking that the component actually works. This is the way virtually every project I have worked on has actually planned out its hardware development work.
As an example, this sequence of feature development steps is loosely based on two different control system implementations in projects I have worked on. The sequence shows how different hardware and software components come together to implement increasingly complex features. This approach integrates the hardware and software parts in incremental steps.
The acceptance phase is the time for final checks that the developed system is indeed ready for production and deployment. It is the last step in the overall system development life cycle.
There are three kinds of checks involved: that the system can be put into production and deployed; that the customer (or their surrogate) validates that the system is what they need; and that regulators approve the system, if needed.
The check for production and deployment involves verifying that the manufacturing and distribution process is ready for operation, and that all the procedures and tools are in place to install a manufactured product for customer use. For a software-only product, the manufacturing and distribution procedure involves packaging the software release and putting it on distribution servers (or manufacturing distribution media if it is not distributed over networks). The deployment readiness involves verifying that the packaged software has prominent and understandable instructions on how to install it and start using it. On the other hand, for a mass-produced hardware product, verifying manufacturing and distribution involves checking that the manufacturing line can correctly build the system, that it has the proper supply chains in place to support the manufacturing, and that the products can be shipped and warehoused before delivery to customers.
Validating that the system meets customer needs involves customers trying out an instance of the system—not just looking at documentation about the system. This often involves getting one or more customers to use a test installation of the system to do the tasks that the customers need. For some systems, this kind of validation can be done by beta testers, who are given an almost-ready version of the system and try it out in their environment while providing feedback on what works or doesn’t. Other systems that involve more installation and setup can involve setting up test installations that the customers come to use.
Regulatory approval involves different procedures in different industries. An aircraft, for example, must be reviewed and certified by the appropriate civil aviation authority. A spacecraft mission typically requires licenses for launch, communication, and certain kinds of earth observation. Other systems may need approval by an industry safety organization. Most of the work to get these approvals or licenses is part of the development phase, and the acceptance phase is the final check that the necessary approvals are in place.
Once these checks are completed, the final milestone is for the organization and the project to decide whether to proceed to production and deployment or not. Many systems are designed and built, but in the end the organization behind the project decides that the result does not justify the investment in production. Many commercial aircraft, for example, are designed and built, but in the end there is not sufficient sales interest to start production and the aircraft model is quietly retired.
Once the system has been developed and verified, it is ready to be manufactured, deployed, and put into use. The initial work of building is done, but there is much more to go. There are several ways the operation phase can proceed, depending on the kind of system, kind of customer, and the role that the organization that developed the system plays.
The general flow is first to manufacture or produce the system using the artifacts that have been developed, then deploy instances of the system. After that, the system instance is in operation. Further development of the system, to evolve it or to fix problems, continues in parallel with customer operation. Finally, at some point, the customer will decide to retire and dispose of the system instance. The steps of deploying, operating, and retiring system instances can occur multiple times in parallel for different customers.
Production is not the application of tools to materials. It is the application of logic to work.
The production phase covers manufacturing the artifacts to be deployed.
Bear in mind that this is a brief overview of manufacturing, intended to explain the main points that people like systems engineers or project managers will need to know in order to understand the general scope of the work, and to understand how the manufacturing steps are related to other parts of the system-building work. Manufacturing has been studied and refined for a couple of centuries, and there is an extensive literature with far more information.
There are several kinds of production that different projects might use. These include:
Production of a new system for a new installation can also differ from production of parts for maintaining or upgrading an existing installation. A new system might consist of a complete collection of hardware components that will be assembled from scratch for the installation. Producing replacement or upgraded parts, on the other hand, consists only of manufacturing a few parts and making them available for deployment into existing installations.
A review-and-approval milestone to begin production checks that the project has everything ready before committing to production, as discussed below. The review checks that the system development has completed all its milestones and that a system will be ready to deploy when manufactured. It also checks that everything needed for production itself is ready: the manufacturing tools and people, suppliers, and testing. Finally, it checks that the organization is prepared: able to pay for supply and manufacture, and with people ready to deploy systems once their parts have been manufactured, so that capital does not remain tied up in unneeded inventory.
Production relies critically on security of the supply chain, management of the developed artifacts, the manufacturing process, and the delivery mechanisms. All these elements of the production process have been attacked in recent years. For example, the SolarWinds attack [Zetter23] compromised the production process for their software, which was then distributed to and installed by many other organizations and led to attacks on those other systems. There are other reports of fake hardware components (e.g. pressure sensors [Control19]) being injected into a supply chain. These attacks can result in loss of system components, delaying deployment to a customer, exposure of intellectual property, deployment of a faulty or dangerous system, or creation of security problems for the system’s customer.
The overall production process has the following steps:
This flow depends on the supply chain of parts used in manufacturing or production. Any physical parts or stock used must be on hand to perform manufacturing; this implies that the stock is in inventory, and that it has been supplied from some qualified source. Sourcing implies selecting the suppliers and setting up contracts for them to provide the stock. The contracts with the suppliers should include clear specifications of exactly what stock or components are to be supplied, along with evidence that the delivered parts meet the specification.
Procedures for receiving materials from suppliers and maintaining inventory are part of the definition of manufacturing procedures. The procedure will typically need some amount of space for maintaining this input stock, along with managing information about what stock is on hand and what should be used next. The storage space maintains the input components or stock in an environment that will keep the material in its designed storage conditions. The procedures include determining when to order more stock. The receiving and storage facilities should have security that ensures that material is not stolen or replaced.
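One common rule of thumb for the "when to order more stock" decision is the reorder point: reorder once on-hand inventory falls to the amount expected to be consumed during the supplier’s lead time, plus a safety margin against variability. The sketch below uses that standard formula; the specific quantities are made up for illustration.

```python
# Reorder-point sketch: ROP = daily usage * lead time + safety stock.
# The numbers are illustrative, not from any real supply chain.

def reorder_point(daily_usage, lead_time_days, safety_stock):
    return daily_usage * lead_time_days + safety_stock

def should_reorder(on_hand, daily_usage, lead_time_days, safety_stock):
    return on_hand <= reorder_point(daily_usage, lead_time_days,
                                    safety_stock)

# Example: 20 units consumed per day, 10-day supplier lead time,
# 50 units held as a buffer.
rop = reorder_point(daily_usage=20, lead_time_days=10, safety_stock=50)
assert rop == 250
assert should_reorder(on_hand=240, daily_usage=20,
                      lead_time_days=10, safety_stock=50)
assert not should_reorder(on_hand=400, daily_usage=20,
                          lead_time_days=10, safety_stock=50)
```

Real inventory procedures layer review cadences, batch order sizes, and supplier constraints on top of this basic threshold.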
The production process relies on accurate configuration or version management. The artifacts used to manufacture the production components should have consistent versions, and those should match the versions used for final verification during development. If inconsistent implementations were manufactured, the components might not work together—and the resulting problems are often subtle.
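The version-consistency check described above can be sketched as a simple manifest comparison: freeze the version of every artifact at final verification, and refuse to start production if the artifacts on hand differ. The artifact names and version strings here are hypothetical.

```python
# Sketch of a release-consistency check. The manifest records the
# artifact versions frozen at final verification; production should
# not proceed with any artifact at a different version.

verified_release = {
    "flight_software": "2.4.1",
    "board_layout":    "rev-C",
    "test_procedures": "1.9",
}

def check_release_consistency(release_manifest, artifacts_on_hand):
    """Return the artifacts whose on-hand versions do not match."""
    return [name for name, version in release_manifest.items()
            if artifacts_on_hand.get(name) != version]

# A stray newer board layout is caught before manufacturing begins.
on_hand = {"flight_software": "2.4.1",
           "board_layout":    "rev-D",
           "test_procedures": "1.9"}
assert check_release_consistency(verified_release, on_hand) == ["board_layout"]
```

In practice the "version" entries are often content hashes from a configuration management system rather than human-assigned labels, which also defends against artifacts being silently modified.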
The manufacturing procedures specify who does what steps, in what order, using what tools. These procedures are designed during system development and verified during production verification testing (see the section on hardware development above).
After system components have been manufactured, they are checked to ensure that there are no manufacturing defects. This is typically called acceptance testing. For many hardware components, this involves putting the component through a set of tests that are defined during system development. These tests do not stress the component to a level that will induce faults, like testing at high temperatures or voltages; the tests only look for potential manufacturing problems. Some mechanical or electrical components go through a “burn in” period, which operates the component long enough to catch early component (“infant mortality”) failures. For some other kinds of components, only a sample of each batch of components gets tested, under the assumption that manufacturing defects will tend to cluster in one production batch (for example, one day’s production shift).
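The batch-sampling approach mentioned above can be sketched briefly: rather than testing every unit, test a fixed fraction of each production batch, on the assumption that manufacturing defects cluster by batch. The sampling fraction and serial-number format below are illustrative.

```python
# Batch-sampling sketch for acceptance testing. A fixed fraction of
# each batch is selected for test; the fraction and batch here are
# made up for illustration.

import math
import random

def sample_for_acceptance(batch_serials, fraction=0.1, seed=0):
    """Pick ceil(fraction * batch size) units from a batch to test."""
    count = math.ceil(fraction * len(batch_serials))
    rng = random.Random(seed)   # seeded only to keep the example reproducible
    return rng.sample(batch_serials, count)

batch = [f"SN-{i:04d}" for i in range(100)]
to_test = sample_for_acceptance(batch, fraction=0.1)
assert len(to_test) == 10
assert set(to_test) <= set(batch)
```

If a sampled unit fails, the usual response is to widen testing to the rest of that batch, since the clustering assumption suggests its batch-mates are at elevated risk.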
The production process involves a significant amount of record keeping. Each produced component has its own set of records. These records start with the component’s identity, typically represented as a serial number. The record identifies what version of the input development artifacts were used, often by associating a release version number or code with the serial number. The records include when, by whom, and using what equipment the component was built, so that if parts start failing an analysis can identify other components that may be at higher than expected risk of failure. The records track what parts or stock were used to manufacture the component: the serial number of components used, if appropriate, or the supplier, model, and batch number of stock.
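A per-unit record of the kind described above might be organized as follows. The field names are hypothetical; a real system would use whatever schema the project’s manufacturing records system defines.

```python
# Sketch of a per-unit production record, keyed by serial number,
# tying together the release version, the build circumstances, and
# the parts that went into the unit. Field names are illustrative.

from dataclasses import dataclass, field

@dataclass
class ProductionRecord:
    serial_number: str
    release_version: str   # version of the input development artifacts
    built_on: str          # date of manufacture
    built_by: str          # operator or shift
    equipment: str         # manufacturing line or station
    parts_used: dict = field(default_factory=dict)  # part -> serial/batch

record = ProductionRecord(
    serial_number="SN-0042",
    release_version="2.4.1",
    built_on="2024-03-15",
    built_by="shift-2",
    equipment="line-A",
    parts_used={"pressure_sensor": "batch-771", "cpu_board": "SN-9913"},
)

# If sensors from batch-771 start failing in the field, records like
# this let an analysis find every unit built with the same batch.
assert record.parts_used["pressure_sensor"] == "batch-771"
```

The batch and serial cross-references are what make the failure analyses described above possible: a query over these records identifies every unit sharing a suspect part.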
In addition, each manufactured component must be identifiable. That typically means that it should be clearly labeled with its model or version information and serial numbers, at minimum. The labeling is typically in both human- and machine-readable forms.
Once a component has been manufactured and checked, it is placed in inventory and later delivered for deployment. The components in inventory are stored in secure spaces that maintain the components in their designed storage environment—often dust-free, within a particular temperature and humidity range, and so on. The inventory is managed to know what components are in stock and ready to send for deployment.
The production process needs to be resilient to disruptions. One company I worked for was building hardware systems outside the US, and investors asked the company how they would handle a political or military disruption in that country. (The answer was that the company would go out of business because it had no alternative manufacturing option.) Many production or manufacturing processes are also in places that can be vulnerable to natural disasters, including earthquakes and storms.
Finally, the manufacturing process is generally a human process, and processes involving humans have a tendency to drift over time away from their originally-intended procedures (see e.g. Leveson [Leveson11, Chapter 12]). This drift can come from changes in how people are trained, people finding potential simplifications in the procedures, changes in the environment in which the people are working, and many other causes. The designs of robust, safe manufacturing procedures include periodic audits to check that people are performing the procedures as originally designed, and to re-design the procedures if they are found to have problems in use.
Inputs. The production step uses many inputs:
I use two terms loosely: input component and stock. By input component, I mean something that is used as it is in manufacture, such as a chip or a valve. By stock I mean material that has to be worked during manufacture, such as a metal or wood block that is machined to make a component, or plastic that is melted and formed in a 3D printer to make something else. Others may use other terms for these two kinds of inputs, but the distinction remains.
Outputs. Production has two major outputs: Deployable artifacts that are in inventory storage or on their way to a customer, and records of each artifact.
Milestones. Production does not begin until there has been a review that ensures that the organization is ready to perform production activities. The approval milestone checks that all of the manufacturing, testing, inventory, and logistics procedures are complete and performable. These checks typically depend on results from production verification testing. The review also checks that all the necessary suppliers are qualified and under contract to deliver manufacturing inputs. Finally, approval to begin production depends on having the capital or cash flow needed to support production, and that the organization is ready to deploy the manufactured system once it has been produced.
Each component has an acceptance testing milestone, as discussed above.
I mentioned earlier that different projects follow different kinds of production patterns. Here are a few examples that show some of these different approaches.
Software only. This example covers a software-only system that is delivered electronically to customers for installation.
When building a software-only system, many people don’t put much thought into what happens between when a version of the source code is marked as ready for release and the delivery to a consumer. In practice there are several steps between the two, and those steps require careful design.
The input to production is a version of the software—either as source code or as binaries—that has been verified to meet its specifications, and validated against the original customer needs. This code is under version control and has been labeled as being ready for release.
The output is one or more installation packages on servers that customers (or deployment teams) can access over networks. Some software packages are not or cannot be delivered over networks, in which case the output is some physical artifact, such as a CD or USB drive, containing a copy of the installation package.
The production process involves the steps to generate these installation packages and then stage them on distribution servers. The procedure typically involves building binary versions of the software from the appropriate source code artifacts, then performing acceptance tests on the binaries. The binaries are then bundled with other material, such as manuals and configuration files, into an installable package. The package also includes metadata recording what the package is, its version, and the environment in which it is intended to be used. The procedure also adds security information, such as signatures or encryption, to ensure the integrity of the package. The installation package is then copied to distribution servers and tested to ensure that the package can be downloaded and verified correctly. Once the package is available for distribution, the final step is to let customers know that it is available.
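The bundling and integrity steps can be sketched as follows. The file names and metadata fields are invented for illustration, and the sketch records only a SHA-256 digest; real release pipelines also apply cryptographic signatures so that customers can verify the publisher, not just the bytes.

```python
# Packaging sketch: bundle files and metadata into a tar.gz archive
# and compute a digest for download verification. Names, fields, and
# contents are illustrative.

import hashlib
import io
import json
import tarfile

def build_package(files, metadata):
    """Return (archive_bytes, sha256 hex digest) for name->bytes files."""
    manifest = json.dumps(metadata).encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in {**files, "metadata.json": manifest}.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    archive = buf.getvalue()
    return archive, hashlib.sha256(archive).hexdigest()

archive, digest = build_package(
    files={"app.bin": b"\x7fELF...", "README": b"install instructions"},
    metadata={"name": "example-app", "version": "2.4.1",
              "target": "linux-x86_64"},
)

# The published digest lets a customer check the download's integrity.
assert hashlib.sha256(archive).hexdigest() == digest
assert len(digest) == 64
```

Repeating this procedure per target environment, as the next paragraph notes, just means calling the build with a different binary and `target` entry for each platform.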
If the software is intended to run in multiple environments, such as on different operating systems or CPU architectures, the procedure will need to be repeated for each target environment.
In recent years, the integrity of the software production and distribution process has received increasing attention [CISA21]. This has led to standards for protecting the production and distribution processes.
Single spacecraft mission. Building a spacecraft is different from producing software: it involves physical artifacts, and it produces only one or a few instances of the spacecraft.
A project will typically build at least one spacecraft that will fly the mission, but may build a backup or an extra that is used on the ground to verify behavior during the mission.
The objective is to deliver a flight-ready spacecraft that is ready to ship to the launch site, be placed on a launch vehicle, and fly the mission (the deployment), or to deliver a test unit that is otherwise identical to the flight unit to testing teams.
Before assembling the flight instance, many projects separately manufacture all or parts of additional spacecraft that are treated as Qualification Units for testing, especially for environmental testing that pushes the test unit beyond normal operating limits and might damage it. These units may be built and tested as part of the development phase or during production, as appropriate to a specific project’s rules.
The production process starts with acquiring and building all the components, then assembling them according to procedures worked out during development. The assembly is typically done in a “clean room” that keeps out contaminants that could affect the spacecraft’s ability to function, such as dust entering into cable connectors or hinge bearings. The team typically performs incremental acceptance testing along the way to ensure that subassemblies have been built correctly while they are accessible.
The team assembling the spacecraft documents what components are used in each unit as they are assembled. The accumulated records are maintained for the entire life of the spacecraft, as they can be essential to establishing the causes of problems encountered in flight.
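These as-built records amount to a structured log of which component instances went into which unit. A minimal sketch of such a record, with field names invented for illustration rather than drawn from any configuration-management standard:

```python
from dataclasses import dataclass, field

@dataclass
class InstalledComponent:
    part_number: str    # design identity of the component
    serial_number: str  # the specific physical instance installed
    supplier: str       # provenance, for tracing batch-level problems
    installed_by: str   # who performed and inspected the work
    installed_at: str   # date of installation

@dataclass
class AsBuiltRecord:
    unit_id: str                          # which spacecraft unit this describes
    components: list = field(default_factory=list)

    def record_installation(self, comp: InstalledComponent) -> None:
        self.components.append(comp)

    def instances_of(self, part_number: str) -> list:
        """In-flight anomaly investigations often start here: which serial
        numbers of a suspect part are flying on this unit?"""
        return [c for c in self.components if c.part_number == part_number]
```

The query method illustrates why the records are kept for the spacecraft’s whole life: when a part type is implicated in a flight problem, the team needs to know exactly which instances, from which supplier, are on board.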
Once the entire spacecraft has been assembled, the team performs final acceptance testing, ensuring that testing remains within limits that will not inflict damage. They then package up the built spacecraft for delivery, typically in sealed containers that will protect it from contamination and shock during shipping. The packaged spacecraft is then delivered to the launch site, where it is mounted to the launch vehicle in preparation for launch.
Some spacecraft require final preparation shortly before launch. This can include charging batteries, entering final configuration data, or loading gases and fluids (such as fuel). These steps follow carefully defined procedures, as they often involve hazardous materials (such as hydrazine fuel) and carry the risk of damaging the launch vehicle in ways that could cause in-flight failure.
The overall production step typically has strong requirements for safety and security. A malfunctioning spacecraft can lead to the failure of a mission, at the cost of significant invested capital. In some cases a malfunction can risk life and property on the ground, such as when a spacecraft causes failure of a launch vehicle, enters the atmosphere and damages or injures something on the ground, or creates debris that damages other spacecraft or injures people on orbit. To this end spacecraft are regulated and must obtain safety approvals before being allowed to launch (see, for example, the US regulations [14CFR450]).
Mass consumer product. This kind of production is for a device that is produced in large numbers for use by the public. These are often produced regularly, in multiple shifts or over multiple days, though not necessarily continuously. The production rate is often ramped up and down to reflect demand. Mass production for consumer products is often done by a contract manufacturer rather than in house, but not always.
Mass production requires a supply chain that can deliver the right parts on a steady schedule, with warehousing to maintain enough parts to keep the production line going and absorb any expected interruptions in delivery.
While mass production for consumers often does not use security standards as high as those for high-assurance systems, security still applies. In particular, using component parts different from those specified can cause unexpected failures in use. Consumer products also need security to keep the features of a new product secret until it is released, and security to avoid theft during and after production.
The manufacturing process uses assembly instructions for workers. These instructions are developed during the system development phase, and are verified during PVT. The instructions must be understandable by the people who will actually do the assembly, who often have different backgrounds from the people who develop the system. The instructions must also account for the fact that people may switch from working on one product to another and back over time.
Manufacturing may involve molds or jigs used to create mechanical parts. These are designed and produced during development and verified during PVT.
Products need acceptance testing and possibly burn-in after being assembled. The acceptance tests are also designed and verified during the system development phase. The tests often use test equipment that is also designed and verified during development.
Manufacturing results in many assembled and packaged products ready for delivery. These are then delivered to customers or to warehouses using a logistics provider.
The production process should be checked regularly. Because production goes on for a long time, the people or procedures may drift from the procedures originally developed. People find shortcuts, or worker training may change, or the environment in which assembly is done may change. The production activities may also reveal mistaken assumptions embedded in the assembly and testing procedures. Regular checks or audits will find where these discrepancies exist, and allow people to either bring the assembly and testing procedures back on track or create change requests to update the procedures.
The objective of deployment is to set up a system instance for a customer and get them successfully using that system.
There are several kinds of deployments. The first variation is: who is doing the deployment? Consumer products are set up and installed by the customer. More complex systems are delivered and set up by a team that is part of the project. I will refer to this as “assisted deployment”. Other systems are deployed and used internally by the organization that created them. The second variation is whether one is deploying a complete new system installation, or installing an upgrade into an existing system.
The overall flow of events is the same for all these variants:
A system is deployed into an environment. That environment might be a customer site for a physical system; it might be spread over multiple sites; it might be an attachment to a launch vehicle; it might be resources on a compute server somewhere. In all those cases, the customer finds the places where the system can be installed. The deployment team and the customer usually interact before deployment starts to let the customer know what is required for the system, and for the customer to let the deployment team know what is available.
The environment for a software system might include the number and kind of compute servers used, the amount of memory or storage on each, and the reliability and security required of each server.
The environment for physical systems might include physical space, along with the temperature and atmosphere in that space. It might include the mechanical mounting needed, along with electrical, water, networking, and other supply lines.
Some customers will be migrating from an existing system to the new system being deployed. The migration might include moving information from the old system to the new system, or it might involve moving physical artifacts or supply from one to the other. Developing the migration procedures is a development activity in its own right; in effect, the procedures are a second mini-system to design and implement.
Complex systems will have users who need to be trained in order to operate the system safely and correctly. The initial group of users are trained during deployment, so that they can verify that the system works correctly and can take over its use once the installation has been accepted. Other users will learn to work with the system later, perhaps years later.
The installed system includes education and training materials for these users. These materials are assembled during the development phase of the project.
Different kinds of users may interact with the system. At the simplest, there are users who directly command and use the system’s primary behavior. A system may also have administrators who are responsible for specialized tasks, such as managing the set of users or the system’s security. It may have people who are responsible for maintenance and repair. It likely has other people who set policy for how the system should be used. All these people use the education and training material, and that material must address each of their needs.
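One way to make “each role has its own needs” concrete is to map roles to the training modules each requires. The role and module names below are invented purely for illustration; a real project derives them from its concept and specification.

```python
# Hypothetical role-to-training mapping; all names are invented for illustration.
TRAINING_MODULES = {
    "operator":      ["basic-operation", "safety-procedures"],
    "administrator": ["basic-operation", "user-management", "security-config"],
    "maintainer":    ["safety-procedures", "repair-procedures"],
    "policy-owner":  ["usage-policy", "audit-reporting"],
}

def required_training(roles):
    """A person may hold several roles; collect every module they need,
    once each, preserving the order modules first appear."""
    modules = []
    for role in roles:
        for m in TRAINING_MODULES[role]:
            if m not in modules:
                modules.append(m)
    return modules
```

A person who is both an operator and a maintainer, for instance, needs the union of those two lists, without repeating the shared safety module.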
Deployment presents a number of ways that someone could attack and compromise the system. The system components will be in transit from the production facility or warehouse and could be tampered with; they will be received at the customer site and might be accessed before being installed. The system components may be partially installed but not fully configured to be secure during the deployment process. The deployment procedures themselves could be altered or hijacked. All these potential exposures mean that the deployment procedure must be designed with security in mind, and that security must be evaluated as part of the system requirements.
Deployment includes setting up the customer on a customer service system. Once the system has been installed and the initial users have been trained and given access, the customer will begin to take over system operations. As they do this, they are likely to find they do not actually understand some parts of the system and have questions. They will use the customer service system to communicate with the project team for questions and to report problems.
Accidents and incidents may happen during system operation. When these happen, the customer works with the team that developed and maintains the system to investigate what happened. If the accident is serious enough, regulatory agencies may be involved. During the deployment process, the team establishes the necessary working relationships with the customer that will help the customer to detect when accidents have happened and to bring in the team for investigation. The investigation may determine that there is a flaw in the system, in which case a problem report and change requests are sent to the team to guide fixing the flaws. Section 28.7 below addresses how the team handles such changes.
Setting up the customer for ongoing success using the system is the last part of deployment. Once the system is in operation, the customer’s users are responsible for the ongoing safe and secure use of the system. Users of complex systems tend, over time, to find shortcuts and workarounds for how they use the system. They may forget part of their training, and new users may not be trained fully correctly. The environment in which the system operates may also change—parts might be moved, air conditioners changed out, or electrical feeds changed, for example. All of these can slowly change how the system is working and lead to accidents. Regular monitoring or auditing of system and user behavior is necessary to detect and correct these drifts and avoid accidents, and this auditing must be backed by management policy and actions. (See Leveson [Leveson11, Chapter 12] for background.) The deployment activities, therefore, must include working with the customer to establish the necessary monitoring activities and to establish necessary management policies.
Customer deployment. The components for these systems are delivered to the customer, who is responsible for installing or upgrading the system. The process includes:
Assisted deployment (internal or external). When someone from the project team does the deployment, the process is similar to customer deployment.
The process includes:
Inputs. The deployment step takes as input:
Deployment can also involve migrating materials or information from a previous system. If so, the procedures for doing the migration are also an input.
Outputs. The deployment step results in:
Milestones. When the customer handles deployment, the milestones involved are their concern.
When the team handles deployment, there are three potential milestones:
Deployment follows many different patterns, depending on the kind of system and customer. The following four examples illustrate some of the range of ways that the general deployment step can happen.
Digital product. Start with a digital product, such as a software application. These are often deployed by the customer, and involve deploying no physical artifacts. The customer downloads the application over a network and runs an installer package to perform the deployment.
The deployment process begins with the customer ensuring they have the resources needed to support the application. This includes operating system and CPU architecture compatibility, and the amount of available memory and storage needed. The customer gets and checks this information, presumably online, before deciding to download and install the application.
Next, the customer downloads an installation package and runs the installation. The package performs checks to ensure that the application is supported in the local environment, and copies in the application contents. The download or the installation package may interact with the customer for payment or licensing.
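The environment checks an installer performs can be sketched as follows. The specific operating systems, architectures, and disk-space minimum here are assumptions for illustration; a real installer takes its requirements from the product’s specification.

```python
import platform
import shutil

# Illustrative minimums; a real installer's requirements come from its spec.
SUPPORTED_OS = {"Linux", "Darwin", "Windows"}
SUPPORTED_ARCH = {"x86_64", "arm64", "AMD64", "aarch64"}
MIN_FREE_BYTES = 500 * 1024 * 1024  # assume 500 MB of free disk space needed

def environment_problems(install_dir: str = ".") -> list:
    """Return a list of problems found; an empty list means the
    environment passes and installation can proceed."""
    problems = []
    if platform.system() not in SUPPORTED_OS:
        problems.append(f"unsupported OS: {platform.system()}")
    if platform.machine() not in SUPPORTED_ARCH:
        problems.append(f"unsupported CPU architecture: {platform.machine()}")
    if shutil.disk_usage(install_dir).free < MIN_FREE_BYTES:
        problems.append("not enough free disk space")
    return problems
```

Reporting every problem at once, rather than failing on the first, lets the customer fix their environment in one pass.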
This process is more or less the same whether the customer is installing a new application, or installing an upgrade to an application they already have.
At this point, the customer has an application they can use. However, they may not yet know how to use it. The customer can learn about the application using training media provided with it. If the customer is updating an application, they usually look for information on whatever changes the update might include.
The customer is responsible for copying in any information that they may already have that they want to use with the new application.
Most consumer applications provide some kind of customer service, which the customer can use to report problems they find or to ask for help. These services are often provided online as websites.
Consumer product. Now consider a simple consumer hardware product: a home light fixture.
In this example, the customer is responsible for all of the deployment steps. Unlike the previous example, the deployment involves hardware artifacts and includes steps required to maintain safety.
The customer starts the process of deploying a new light by determining what kind of light they need—in the ceiling, on the wall, stand alone, and so on, as well as the needed brightness and the electrical supply voltage. They then research what fixtures are available from their preferred suppliers, implying that an organization that is building light fixtures sends out specifications and advertising materials to those suppliers well before the customer goes looking.
Once the customer selects, purchases, and receives the fixture, they review the installation instructions that the team has developed and included with the fixture.
The customer then installs the fixture using those instructions. The instructions should include basic safety steps, like turning off power to the affected circuit before working with the wiring. The customer tests that the light works after it has been installed.
Complex system, shared deployment responsibility. The previous examples have been simple, performed entirely by the customer. The next example covers a more complex deployment.
Consider an information system that supports a repair and maintenance workshop. This example is based loosely on a system I worked on for local government public works agencies, which maintained a wide range of equipment from buses to lawn mowers to backhoes.
The repair and maintenance organization had multiple shop sites. Some shops were specialized for working on particular kinds of equipment.
The system provided record-keeping support for managing work orders (repair orders), scheduling resources like work bays or large equipment, and managing parts inventory. It also interfaced with the customer’s other IT systems: security and user authentication systems, and systems to place orders to buy parts and to pay for them.
For a particular installation, the customer asked for a set of features to be added to an existing software package. The development phase of the project for this customer involved work to determine their specific objectives and changes, implement changes to the base system, and then validate the customized system with the customer. Once the customer accepted the changes at the end of the development phase, a production phase generated the software installation packages and other materials for the deployment.
Physically, the system consisted of a small set of servers in a server room, plus workstations of different kinds at the workshops. Communication links connected the workshop sites to the server room. The server room provided power, cooling, communications, and support services like backup and security for the servers.
The customer wanted to perform a phased installation and roll-out, where initially only a few people would use the system and over time its use would be extended to more and more sites. The goal was to minimize risk by avoiding disruption to the shop’s existing work, and to contain any problems that might come up as the shop users learned to work with the system. A phased installation would also allow the customer and deployment team to monitor the performance of the servers and communication systems in order to identify unexpected behaviors before they caused problems. The customer decided to continue using their existing (paper-based) system for all existing work, so no data would be migrated into the new system.
The project’s deployment team was responsible for installing and configuring the initial system, and for training the initial users. The customer installed servers and communications, along with workstations at the shop sites. The customer was also responsible for adding users to the system and would take over training and configuration after the system had been rolled out to half the shop sites.
The deployment process proceeded as follows:
This system thus had a phased transition between deployment and operation, rather than a hard split between one phase and another.
Spacecraft. Deploying a spacecraft covers the activities from when it is delivered to the integration site to be integrated into a carrier or onto the launch vehicle to when it is on orbit and ready to perform its mission.
The general sequence for spacecraft deployment is:
When deployment is done, the spacecraft is ready to perform its planned mission, it is in communication with other systems, and the operations team is managing the spacecraft.
Deploying a spacecraft is different from the other deployment examples above in two key ways. First, a spacecraft poses far higher safety risks than the other examples. The deployment process reflects this by using procedures that have been designed and checked to meet safety constraints, and the deployment team are trained accordingly. Second, significant parts of the deployment occur beyond human access: while the spacecraft is on orbit, people cannot stop by to observe or fix a potential problem. The spacecraft’s design thus must provide sufficient information to the operations team on the ground to be able to detect and analyze problems without visiting the spacecraft. The operations team also uses detailed records of the spacecraft’s configuration, and so the production process must record all the details of what components were used, their provenance, and inspections of the work.
In this phase, the system is placed into operation. The customer uses the system, performing administration and maintenance as needed. Most of the system operation is the customer’s responsibility; in this section, I focus only on what the project does to support the customer’s operation.
The system operation phase affects the project team in two ways. First, the team will sometimes support the customer during operation. Second, the point of the team’s work is to build a system that can go into operation, which means that the system’s design must support all the activities that the users will do. This includes the rare and exceptional activities, not just everyday usage, so these activities are included in the concept and specification to which the system is built.
The customer is responsible for maintaining the system. That may mean only following procedures for periodic checking, but for many systems maintenance can be far more intrusive, and involve regular replacement of some components. The customers rely on maintenance procedures that are designed as part of the system to keep the system operating safely; these maintenance procedures are designed to take safety, security, and reliability constraints into account. The customer also periodically orders replacement parts to install into their system.
The customer also takes care of their users. This includes adding and removing user access to the system and training those users. The project team supports these tasks by including features to manage users and their roles as part of the system. The team also develops training material that the customer uses when bringing on new users.
The system may have problems from time to time. These may reflect flaws in the system, improper usage, wear and tear, or combinations of all three. The customer, as the system owner, is responsible for handling the problems. However, the project team sets the customer up to be able to address problems by developing instructions for detecting and diagnosing problems, and training some of the customer’s staff on how these work. The project may also provide services to help diagnose and repair problems. The project also provides some form of customer support that the customer can use to report problems back to the project.
Most complex systems have human elements—users who operate the system and in doing so act as a control system that manages system behavior. As I noted in the previous section, these users can change how they interact with the system over time, finding shortcuts or using the system in ways they are not expected to. The customer establishes usage policies and performs monitoring and auditing tasks that check that people continue to interact with the system in safe and secure ways. The project team sets the customer up to perform this work by documenting what constitutes safe and secure system usage, including the rationale for why some interactions are acceptable and others are not.
Accidents happen. When some loss or injury occurs due to the use of the system, both the customer and the project team have a responsibility and interest to determine why the accident occurred in order to avoid future accidents. The accident investigation may also be mandated by regulation, in which case regulators are involved. The customer may be able to pursue the investigation on their own, if they have sufficient information about how the system is supposed to be used safely. The project assists in the customer’s investigation by providing that information, which includes the documentation of how to use the system safely, and why. However, for serious accidents, the investigation often requires a more in-depth understanding of the system’s design and implementation. The project prepares for supporting these investigations by maintaining complete records about the system’s concept, specification, design, and implementation, including explanations of the rationale for why choices were made and safety or security analyses that the team did about the system’s design.
Finally, the customer may find that their needs change over time, or that there is some aspect of the system that does not work as well as they had planned. These changes can be externally driven; for example, regulatory changes that affect the customer’s industry can affect what the customer needs from the system. The project team can receive change requests (along with problem reports) through a customer service mechanism.
Inputs and outputs. The operation phase is ongoing, unlike some other phases. It continues as long as the customer continues using the system. It is also primarily the customer’s responsibility.
The working system, as accepted by the customer at the end of deployment, is the primary input. That working system includes parts that support the customer’s tasks:
From the point of view of the project, the customer’s operation produces a few outputs:
Milestones. Most organizations require some kind of authorization to operate before placing a system in operation. This is typically a review confirming that all of the system deployment steps, including acceptance, have been completed successfully and that the system meets the customer’s policies. All these steps should have occurred earlier, and the authorization to operate is usually just confirmation that none of them were skipped.
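The confirmatory nature of this review can be captured as a simple checklist gate. The step names below are invented for illustration; a real checklist comes from the customer’s policies and the project’s deployment procedures.

```python
# Illustrative deployment checklist; real step lists come from the
# customer's policies and the project's deployment procedures.
DEPLOYMENT_STEPS = [
    "environment prepared",
    "system installed and configured",
    "acceptance tests passed",
    "initial users trained",
    "customer service access established",
]

def authorization_to_operate(completed_steps) -> tuple:
    """Confirm no step was skipped; returns (authorized, missing_steps)."""
    missing = [s for s in DEPLOYMENT_STEPS if s not in completed_steps]
    return (len(missing) == 0, missing)
```

The review adds no new work when everything was done; its value is that the `missing` list is empty only if nothing was skipped.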
The system remains in operation as long as the customer chooses and as long as they maintain the system in good repair. The customer thus periodically performs maintenance tasks and audits that usage remains safe and secure. The customer periodically determines—perhaps implicitly—whether to continue the system in operation.
Operations vary widely depending on the kind of system. Here are some examples illustrating the range.
Consumer product. A consumer product is generally the responsibility of its users. The development team is responsible primarily for designing a system that the users can understand and providing enough documentation or training material so that the users learn how the system works. The development team also provides documentation on any cleaning or maintenance tasks the users should perform.
Some consumer products can require occasional more complex maintenance, and a product team might offer a maintenance service in addition to the system itself.
Aircraft. Operating a commercial aircraft is a joint endeavor between the air carrier and its staff, the manufacturer, and the civil aviation authority (CAA). While the carrier’s pilots are responsible for an aircraft in flight, the carrier has overall responsibility for safe operation. The carrier is responsible for setting policy and training its staff in order to meet CAA regulation. The manufacturer supports the carrier by, first, getting type certification for the aircraft design, and then providing the carrier with documentation on the general limitations of the aircraft’s design.
The air carrier is generally responsible for ensuring all its employees and contractors have training and know which rules to follow—pilots, flight attendants, ground handlers, maintenance personnel, dispatchers and so on. Individual people are responsible for complying with the rules and limitations of their certificates—pilots, dispatchers, and mechanics, for example.
The manufacturer works in concert with the air carrier and repair facilities to develop training materials and is responsible for promulgating maintenance documentation, including service bulletins generated from operational reports back to the manufacturer about problems discovered through use of the aircraft. This means that the project team develops this material during the development phase.
If there is an incident or an accident with the aircraft, the carrier typically works together with the CAA and other government organizations as well as with the manufacturer to investigate what occurred. The records of the aircraft’s design and manufacture, along with safety analyses, implementation, and verification, are one of the inputs to these investigations.
Summarizing, the project team has the following responsibilities that affect operations:
Uncrewed spacecraft. Unlike the other examples, an uncrewed spacecraft is operated completely remotely. The only way to interact with it is through command and telemetry communication channels. Without the ability to interact physically with the spacecraft, its operators rely on design records and hardware instances on the ground to interpret the information they receive.
A spacecraft is typically managed by an operations team. This team uses ground systems—which are designed and implemented as an integral part of the overall mission system—to watch the telemetry sent by the spacecraft and send up commands. The operations team plans upcoming activities for the spacecraft, such as observations to take or maneuvers to make, based on the mission plan. The team uses design information about the spacecraft’s capabilities to determine what activities to plan, and the order in which different steps must occur. The team turns these plans into commands that are sent up to the spacecraft, which then follows the commands. The spacecraft sends telemetry messages down to the ground systems. The operations team processes and interprets this data. They use information about the sensors generating the information, such as records of how the sensor has been calibrated, its position and attitude on the spacecraft, and the format of data it sends.
The operations team also monitors the telemetry data for off-nominal conditions. It detects that the spacecraft has had a problem by comparing the data received against what is expected from the plan, such as expected attitude information, and looking for data values that are out of normal range, such as a high temperature or low battery voltage. After identifying that a problem has occurred, the operations team looks for the causes of the problem and then works out how to return the spacecraft to normal operation. The investigation relies on the spacecraft’s design records. The team often uses simulation models or duplicate spacecraft systems on the ground to see if they can replicate the problem and to verify that any recovery plans will work as intended. Once they have a plan, they formulate the corresponding commands and send them up to the spacecraft.
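The out-of-range portion of this monitoring can be sketched as a limit check against nominal ranges. The channels and limits below are invented for illustration; real limits come from the spacecraft’s design and verification records.

```python
# Nominal ranges are invented for illustration; real limits come from the
# spacecraft's design and verification records.
NOMINAL_RANGES = {
    "battery_voltage": (24.0, 32.0),    # volts
    "tank_temperature": (-10.0, 45.0),  # degrees Celsius
}

def off_nominal(telemetry: dict) -> dict:
    """Compare one telemetry frame against nominal ranges and report the
    channels whose values fall outside their limits."""
    alerts = {}
    for channel, value in telemetry.items():
        low, high = NOMINAL_RANGES.get(channel, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            alerts[channel] = value
    return alerts
```

In practice this check is only the trigger: a flagged channel starts the investigation against the design records and ground-based duplicates described above.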
For example, consider the first crewed Starliner CST-100 flight [Foust24]. During the early part of the flight, several thrusters began showing poor performance that led the flight systems to shut them down. Even though the spacecraft was carrying crew and eventually docked to the International Space Station, no one could physically access the thrusters to determine what had happened. In the end, teams on the ground replicated the performance problems using duplicate thruster units. Having learned the likely cause of the failures, NASA changed the flight procedures for departing the ISS and returning to ground. (The agency also determined that the failures posed sufficient safety risks that the vehicle did not carry crew on the return to Earth.)
Factory system. Consider a generic plant that produces chemicals. Its operation involves multiple chemicals that can cause serious injury and death to both workers and the surrounding population in an accident. While parts of the plant’s operations are automated, there are many manual operations—cleaning, responding to a failure, maintaining machinery, and so on. The plant, therefore, relies on its operators following safe procedures. This generic example is inspired by several real-world examples; see Leveson [Leveson11, Section 2.2.4] for one relevant case study.
Chemical plants are subject to incentives that work against safety. The desire for profitability leads to streamlining operations or shutting down safety-specific systems, which then break safety requirements. Individual staff are likewise incentivized to work quickly, and often look for workarounds that make their jobs easier or faster. These can also break safety requirements. Finally, staff turnover leads to knowledge gaps at all levels, so that workers and management don’t know what is needed to maintain safe operation.
Plants like this are operated by a company. The company’s upper management are the ultimate authority that is responsible for safe and profitable plant operation. They set policy for how the plant’s workers will balance profitability against safety. The plant management act on this policy to run the plant, making specific operational decisions to set procedures. The plant workers then follow the procedures to operate the plant (or shut it down when needed).
The hierarchy within the company forms a control hierarchy, involving decisions, feedback, and commands. Upper management sets policy, gives instructions to plant management, and observes feedback metrics. Plant management give instructions to staff, adjusting those instructions to meet the company’s policies. The staff in turn control portions of the plant.
Two steps are needed for this control hierarchy to keep the plant operating safely. The first is that everyone working on the plant or overseeing it must have an accurate understanding of how the plant has been designed for safety. The project staff who design and build the plant make this information available to people in the company, both as reference documentation and as training material. The second is that the behavior of each level of the control hierarchy must be regularly monitored to ensure that the people are operating their part of the system consistent with safety designs. If there is a deviation from safe practice that violates safety constraints, the company takes corrective action to stop the unsafe behavior. This is true at all levels of the company, and especially for upper management: cost-cutting measures meant to improve profitability are a common cause of accidents, and upper management must be answerable to checks that will prevent such decisions.
These control systems are part of the system to be designed and implemented during the system’s development phase. Accurate controls do not arise spontaneously; they come from intentional design. A safe system’s implementation defines roles for upper management, plant management, and plant staff, and includes the procedures that each is to follow. These procedures are verified both analytically and (where possible) by testing, in order to ensure that each level will behave in ways that keep plant operation safe. The analyses account for human factors—what kind of information each role can receive, how likely that is to convey the correct understanding of what is happening in the system, the incentives driving people in each role, and how accurately they can implement instructions.
In some cases, auditing operations will find that people are not following the designed procedures, but in ways that do not appear to pose a safety risk. These deviations must first be checked thoroughly for evidence that they do not violate safety constraints in the system. If they are found to be acceptable, they should lead to a formal change to the documented procedures (in the form of a change request; see Section 28.7 below). The documented procedures must always remain consistent with what people are actually doing, so that all staff clearly understand what is acceptable operation and what is not.
The system evolution phase is about making changes to the system after it has been released and potentially deployed to customers. These changes can happen for many different reasons—such as a planned roadmap for adding to the system over time, requests for changes from customers, fixing problems, or changes in regulation. System evolution can occur in parallel with system deployment and operation.
Overall, system evolution is a recapitulation of system development (Chapter 27). It involves working out a purpose for the change, a concept for how the system will work when changed, leading to specification, implementation, and verification. These steps use information about what has already been specified and implemented in the system, along with the reasons why it is that way, to work out how to make changes that achieve the desired results without disturbing the system’s existing behaviors.
Making changes starts with a change request. In whatever form the request takes, it identifies who is asking for a change, what their purpose is in the change, and why it is worth doing. In practice change requests are usually maintained in a database. Requests can come from many sources. They may be part of the project’s long-term plan to continue developing the system. They may come from customers, who ask for new or changed capabilities. They may stem from the investigations into reported problems or accidents, in order to avoid problems in the future.
A project does not act on all change requests. Some of them will be technically impossible; some will be infeasible because of time or resources. Others might be reasonable requests that have to wait until higher-priority requests have been addressed. The team looks at each request received to determine its importance, its feasibility, and its cost, and makes decisions about whether to accept or reject the request based on the analysis. If a request is accepted, the team determines a relative priority compared to other work or a potential deadline. These are used in planning the team’s upcoming work (Section 20.5).
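The triage described above can be sketched as a small record and decision rule. This is only an illustration: the field names, the scoring formula, and the example data are assumptions, not a prescribed schema for a change-request database.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class Disposition(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass
class ChangeRequest:
    """Illustrative change-request record; field names are assumptions."""
    requester: str        # who is asking for the change
    purpose: str          # what the change is meant to achieve
    rationale: str        # why it is worth doing
    feasible: bool        # the team's feasibility finding
    importance: int       # 1 (low) .. 5 (high)
    cost_estimate: float  # e.g. person-weeks, from the impact analysis

def triage(req: ChangeRequest, budget: float) -> Tuple[Disposition, Optional[int]]:
    """Accept or reject a request; accepted requests get a relative priority."""
    if not req.feasible or req.cost_estimate > budget:
        return Disposition.REJECTED, None
    # Invented heuristic: higher importance and lower cost sort earlier.
    priority = req.importance * 10 - int(req.cost_estimate)
    return Disposition.ACCEPTED, priority

req = ChangeRequest("customer A", "export reports as PDF",
                    "frequently requested", True, 4, 3.0)
print(triage(req, budget=10.0))  # accepted, with a computed priority
```

A real project would record these decisions in its change-request database so that the history of accepted and rejected requests remains available for planning.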
Determining whether a request is feasible involves determining how much of the system will be affected by a potential change. While working out the concept for the change, a team member determines what parts of the system will be affected, using documentation about the system’s structure and design (Chapter 12). The result is a preliminary analysis listing the set of components that will be changed and the general nature of those changes. This information is then used to estimate the effort that will be needed to design and implement changes for the request.
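The preliminary impact analysis can be thought of as a walk over the system’s component-dependency structure. The sketch below assumes the structural documentation can be reduced to a map from each component to the components that depend on it; the component names here are invented for illustration.

```python
from collections import deque

# Illustrative dependency map: component -> components that depend on it.
# In practice this comes from the system's structural documentation.
DEPENDENTS = {
    "fuel_valve": ["flow_controller"],
    "flow_controller": ["plant_hmi", "safety_monitor"],
    "plant_hmi": [],
    "safety_monitor": [],
}

def affected_components(changed: str) -> set:
    """Breadth-first walk listing every component a change may ripple into."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        for dep in DEPENDENTS.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(sorted(affected_components("fuel_valve")))
# → ['flow_controller', 'fuel_valve', 'plant_hmi', 'safety_monitor']
```

The resulting set of components, together with the nature of the changes to each, is the basis for the effort estimate.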
Changes happen iteratively. There may be multiple iterations in progress concurrently if multiple changes have been accepted. Handling multiple concurrent iterations requires careful configuration management discipline (Section 17.4).
Making the changes involves changing the specifications and designs for affected components. These changes can be difficult to make accurately because they are done to an existing, complex set of relationships between components. Making a change without causing flaws depends, then, on being able to accurately understand the structure of the system and how parts of that structure contribute to emergent properties like safety constraints. This relies on having rationales, analyses, and earlier designs available, so that people can work from an accurate information base.
Once a change has been specified, designed, and implemented, it is verified. Verifying the work for a change has two parts: ensuring that the modified system meets the new specifications and the purpose of the change request, and ensuring that the rest of the system continues to work correctly.
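The two-part verification rule can be expressed as a simple gate: the change-specific tests and the existing regression suite must both pass. The callable-returning-bool interface below is an assumption made for illustration.

```python
def verify_change(change_tests, regression_tests) -> bool:
    """A change is verified only when both parts pass: the new tests
    covering the changed behavior, and the existing regression suite
    confirming the rest of the system still works."""
    return all(t() for t in change_tests) and all(t() for t in regression_tests)

# Hypothetical example: one new test for the change, two pre-existing tests.
print(verify_change([lambda: True], [lambda: True, lambda: True]))   # True
print(verify_change([lambda: True], [lambda: True, lambda: False]))  # False
```

The second part matters as much as the first: a change that passes its own tests but breaks an existing behavior has not been verified.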
Once the changes have been verified, they can be deployed to customers as an upgrade or incorporated in new deployments, using the production (Section 28.1) and deployment (Section 28.3) patterns already discussed.
The team continues to evolve the system until it is relieved of responsibility for fixing problems or until the system is taken out of operation.
The overall process for system evolution includes:
Inputs. The system evolution phase starts with change requests. A change request is a record of the desired new behavior or properties, or the problem that should be fixed. It also records who is making the request, their reasons for doing so, and information about priority or deadlines if appropriate. The change request may reference incident analysis reports or other background information needed for context.
The evolution phase will take in the current development plan and the current system.
Outputs. The primary outputs are updates to the system artifacts, including updated concept, specifications, design, rationale, and verification artifacts. These artifacts feed into the production and deployment phases, which produce further outputs.
The development plan is updated as a side effect of deciding whether to act on a change request, and of setting its priority and deadlines if it is accepted.
Milestones. There is one milestone unique to the system evolution phase: the decision to accept or reject a change request.
In addition, this phase incorporates all the milestones associated with the development pattern while developing a new version of the system.
Consumer software. Many consumer applications are first released as a simple version, with a roadmap to add features in future releases. This approach lets the developer test the market and build awareness of their application as early as possible, with the least possible investment before adapting to customer needs.
These upgrades are often planned to be released on a regular schedule, with a plan or roadmap of what new capabilities will be released each time. Additional bug-fix releases are released as needed between the planned upgrade versions. These are driven by a balance between problem fixes and the roadmap, which is updated by a marketing team listening to customer requests.
Spacecraft. In most missions, the spacecraft hardware cannot be changed once the spacecraft is launched. The opportunities for evolving the system are to update on-board flight software and ground systems.
Flight software is updated for several reasons: correcting bugs found after launch, adding fixes to work around hardware problems discovered in flight, and adding new capabilities. New capabilities might include new kinds of data analysis and science operations, such as the autonomous dust devil detection uploaded to the Mars rovers [Castano06]. The project team develops and tests these software updates using simulations and replicas of the spacecraft on the ground before risking sending changes to the spacecraft. This test equipment is an important output of the original development phase. The ability for the spacecraft systems to continue functioning even after a buggy software update is also an important system property, often addressed using internal fault detection, software rollback, and “safe modes” where the spacecraft operates with only a minimum amount of well-tested software running [Wertz11, Chapter 14, p. 410].
Flight software updates are driven first by problem fixes that are needed and second by mission opportunities to use new capabilities. It is uncommon to plan to regularly produce new flight software versions during a mission.
Ground systems are easier to update, since people can access them directly. For example, a mission can add new ground communication stations or upgrade the workstations and servers in mission control. New mission planning or data analysis tools are regularly tried out during a mission. Some ground system updates are planned on a regular schedule over the course of the mission, though more happen when problems or opportunities are identified.
Some spacecraft mission systems in recent years have tackled in-flight upgrades. The GPS constellation is regularly updated with new spacecraft [Albon24]. Low Earth orbit constellations, such as the Starlink communication constellation, use spacecraft in low orbits that have intentionally limited life spans, and they are regularly replaced with newer-generation spacecraft. The System F6 project, on which I worked, looked at flying in new capabilities over time [LoBosco08].
Factory system. Consider the chemical plant example from the previous section. Over the plant’s life, there can be many reasons why the plant will change from what was originally implemented. New technology can become available that will improve the factory’s operation. Parts can wear out and need replacement, but duplicate parts might not be available any longer and a substitute must be found. The factory’s chemical process may be changing to meet new demands, leading to changes in the plant’s equipment. And finally, there will be changes to operational procedures as noted in the previous section.
All of these involve changing the design of the plant. Following the pattern for system evolution ensures that the necessary design and implementation steps are done so that the plant continues operating safely.
For example, when substituting a different model part for one that is no longer available, there are a number of questions to answer. Does the replacement part meet the functional and safety assumptions of the original? Will the replacement fit into the physical space available, and connect to other parts properly? Does it fit into the control mechanisms, both automated systems and manual control? Is the replacement manufactured with equivalent reliability, and does the supplier provide the same assurances about provenance? How do maintenance and operation procedures need to change to reflect the substitute part?
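The questions above amount to a checklist that must pass in full before a substitute part is approved. The sketch below makes that explicit; the check names and the pass/fail interface are assumptions for illustration, not a standard form.

```python
# Illustrative substitution checklist drawn from the questions above.
SUBSTITUTION_CHECKS = [
    "meets functional and safety assumptions of the original",
    "fits the available physical space and connections",
    "integrates with automated and manual control mechanisms",
    "equivalent reliability and supplier provenance assurances",
    "maintenance and operating procedures updated",
]

def substitution_approved(results: dict) -> bool:
    """A substitute part is acceptable only if every check passes."""
    missing = [c for c in SUBSTITUTION_CHECKS if not results.get(c, False)]
    if missing:
        print("Blocked by:", missing)
    return not missing

# A part that passes every check is approved.
print(substitution_approved({c: True for c in SUBSTITUTION_CHECKS}))  # True
```

Recording which check blocked a substitution gives the team a concrete work item rather than a vague rejection.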
No system lives forever, and most are deliberately taken out of service when their usefulness has ended.
Most systems continue in operation until there is a decision to retire them. For some systems, this comes when the purpose for the system has been completed—for a spacecraft mission, for example. For others, it comes when the system has worn out enough that ongoing maintenance and repair costs outweigh the cost of replacement, as happens with vehicles. Yet others are replaced because newer systems become available that meet the customer’s needs better.
A system being retired and disposed of typically goes through three periods. In the first period, the system is in normal operation, but the decision has been made to retire it. During this period people plan how to shut the system down and transition its functions or information. They should conduct dress rehearsals to verify that the procedures will work as expected. The system then enters the second period, where it is no longer in normal use but may remain at least partly operational to support transition and archival. Once those are verified complete, the system is shut down for the last time, is dismantled, and its resources are disposed of.
There are two primary aspects of retiring a system to consider: what to do with information or materials that should be migrated to a new system, or archived, and how to dispose of the artifacts that make up the system.
I discussed migrating into a system in Section 28.3 above. The task of migrating out of a system is part of the same process: developing a plan for moving information or materials out of the old system and into the new.
In many cases, there will be other information that people will want long after the system has been retired. This can include logs of system activity or user access that may be needed for later accident investigations or legal inquiries. It can also include information or materials that the system processed that are not being migrated to another system, but that may be valuable in the future. This information is moved from the working system to some kind of archive. Developing the archival procedures, the organization of the information, and the system that will hold the archive requires development work of its own, just as migrating information from one system to another does. This development involves determining what information needs to be archived and how it will be used once it has been stored, which in turn leads to a concept, then specifications, then a design.
Archived information is usually retained for a long term. If a system has been used for business or manufacturing, retention is mostly governed by regulation—anywhere from one to 30 years in the US, depending on the kind of information. Scientific and medical data is often of value indefinitely, though legal retention requirements may be shorter. Scientific data is often re-evaluated decades after it was first gathered; for example, data collected from the Viking landers on Mars in the mid-1970s was re-interpreted thirty years later after other missions gathered more information about Martian soil composition [NavarroGonzalez10]. This particular example also illustrates a problem with many data archives: the mission data were recorded on microfilm and had to be scanned to get digital data to process.
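Retention rules of this kind are usually recorded as a simple schedule mapping each kind of information to how long it must be kept. The sketch below illustrates the idea; the categories and periods are invented examples, since actual periods are set by regulation and vary by jurisdiction and information type.

```python
from datetime import date

# Illustrative retention schedule (years); None means retain indefinitely.
# Real periods come from the applicable regulations, not from this table.
RETENTION_YEARS = {
    "financial_record": 7,
    "safety_log": 30,
    "scientific_data": None,
}

def disposal_year(kind, archived):
    """Earliest year the archived information may be disposed of."""
    years = RETENTION_YEARS[kind]
    return None if years is None else archived.year + years

print(disposal_year("financial_record", date(2024, 6, 1)))  # → 2031
print(disposal_year("scientific_data", date(2024, 6, 1)))   # → None (keep)
```

Making the schedule explicit lets the archive system flag when information becomes eligible for disposal, rather than relying on someone remembering the rules.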
Long-term archival media often have two problems. First, the media wear out and decay over time, which has led to information believed to be safely archived being found unreadable [Purdy24]. Second, even if the media are readable, there may no longer be machines that can read them. I have a number of backup tapes for which I have not been able to find a drive that can read them.
Sometimes physical artifacts are retained from a retired system. It is common to keep parts of aircraft and spacecraft in museums after they are retired, for example.
Disposing of system artifacts can range from trivial to complex. Erasing a software application and its data, for example, is easy; once the storage media have been erased, there is no further meaningful trace of the system remaining. Disposing of a system that processed hazardous biological or chemical materials, on the other hand, can be difficult.
The retirement and disposal procedures must be secure. An unauthorized attempt to shut down a working system can cause major losses, and can lead to safety hazards. Information and materials are being moved around during migration and archival, and are potentially accessible to being copied or corrupted. Physical artifacts that are being decommissioned can carry confidential information about both the way the system works and about the customer that has been using the system.
Inputs. Retirement begins with a system in operation, along with records of its specification and design.
Some systems develop data archival, shutdown, and system disposal procedures during the development phase. If so, then these are input to system retirement. If not, then the procedures are developed during the retirement phase.
If the system’s function is being migrated to a new system, the specification and design of the new system is an input, and is used to develop a migration plan during the retirement phase. An unpopulated but functional installation of the new system is also involved.
Outputs. There are three kinds of outputs from system retirement:
Milestones. The overall retirement phase starts with a milestone decision that the system should be retired.
After that, the three threads of activity—migration, archival, and disposal—each have readiness milestones for reviewing and approving a plan for each, and a verification milestone to confirm that each was completed correctly. The disposal readiness milestone also checks that migration and archival have completed.
There is also a decision milestone to determine when the running system should be taken out of service in order to start migration to a new system and archival.
There are many different ways systems are retired. Here are three examples that illustrate different approaches.
Simple software system. When retiring software such as a workstation or phone application, the objective is to remove the software from the system on which it runs, so that none of the software or its related files remain. This is typically done by running an uninstall program that is set up to remove any files that were added on installation, plus any internal files that might have been created (configuration, logs, caches). This uninstaller is typically developed as a part of the application and packaged with it. In some cases, the software can be disposed of by erasing the storage devices that held the software and its related files.
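The manifest-based approach described above can be sketched in a few lines: installation records every file it writes, and uninstallation removes those files plus internal files (logs, caches) created during operation. The file layout and patterns here are invented for illustration.

```python
import pathlib
import tempfile

def install(root, files):
    """Write application files and record each one in a manifest."""
    manifest = []
    for name, content in files.items():
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        manifest.append(path)
    return manifest

def uninstall(manifest, extra_patterns=("*.log", "*.cache")):
    """Remove installed files, plus internal files created after install."""
    dirs = {p.parent for p in manifest}
    for p in manifest:
        p.unlink(missing_ok=True)
    for d in dirs:
        for pattern in extra_patterns:
            for leftover in d.glob(pattern):
                leftover.unlink()

root = pathlib.Path(tempfile.mkdtemp())
manifest = install(root, {"app/bin": "binary", "app/config": "settings"})
(root / "app" / "session.log").write_text("runtime log")  # created after install
uninstall(manifest)
print(any(p.is_file() for p in root.rglob("*")))  # → False: no files remain
```

Keeping the manifest with the application is what lets the uninstaller remove exactly what installation added, without guessing.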
Sometimes retiring an application means that the server on which the software was running is no longer needed, and so the server can be retired. Disposing of the server is similar to disposing of a vehicle, as discussed next.
Vehicle. Retiring a vehicle, such as a car or aircraft, involves getting rid of the vehicle’s physical parts while recovering as much value from the parts as possible. At the same time, records about the vehicle are retained for longer in order to meet financial record-keeping needs as well as supporting analysis of maintenance or reliability for other similar vehicles.
The overall process is:
Spacecraft disposal. The objective when retiring a spacecraft is to ensure that it will pose no future hazard to the Earth, other spacecraft, or other bodies. Some of the most important hazards are impacting the Earth and causing damage or injury; colliding with other spacecraft; or contaminating other planets or moons that potentially carry life. Collision can occur either with the whole spacecraft, or with fragments of it if the spacecraft breaks up on orbit. Interfering with radio spectrum is another, though lesser, hazard.
There are four approaches usually used to retire and dispose of a spacecraft.
If a spacecraft is going to remain in orbit after its useful mission is complete, such as when it is placed in a parking orbit or left in a low, decaying orbit to enter the atmosphere passively, then regulations require passivating the spacecraft. This involves removing any stored energy that could cause the spacecraft to explode, change its orbit, or activate radios—eliminating ways that it could cause collisions or interfere with communications. This typically involves venting any fuel and other gases or fluids and permanently shutting down any electrical systems.
All of these disposal approaches can experience problems. A spacecraft may lose its communication capacity before passivation commands have been sent to it. Thrusters may fail, interfering with the ability to put the spacecraft into an orbit that will enter the atmosphere or impact as planned. The design of the disposal methods must account for these potential problems, and safety analyses must show that the spacecraft and its procedures will avoid the identified hazards with acceptable likelihood.
The NASA life cycle standards require that a mission develop the plan for retiring and disposing of a spacecraft during the development phase [NPR7123]. This includes the plan for how the spacecraft will be disposed of, including meeting safety requirements. The plan must also include the procedures for archiving all mission and project data.
When a project ends, there are three objectives: completing obligations and support for stakeholders (Section 16.2); saving information and artifacts that might be needed in the future; and releasing resources that the project used.
Note that ending the project is separate from retiring any particular instance of a system. Ending the project is about stopping development and support for a system product, independent of whether there are instances of that system in operation or not. Some projects will combine these, such as for exploration space missions that build and fly one spacecraft.
A project might end for one of many reasons. It might have a fixed term or have completed a defined system deliverable. It might run out of money or time. It might no longer fit the organization’s or funder’s strategy, perhaps because a better replacement system is planned. Competitors might have won over customers and there is no longer demand for the system. The team might be unable to deliver, with the project behind schedule or over budget or lacking key features.
The first step is a decision to wind down the project. This is typically a decision made by the organization that hosts the project, or its funders; the project staff generally do not make the decision on their own.
A decision to end the project is followed by a plan for how to do so, which defines the steps the team will take to meet the final objectives. The plan typically gets review and approval before proceeding. (In some environments, at least part of the plan must be worked out early in the project, long before any decisions are made.)
The following sections list some of the steps involved in ending a project. The specific steps will depend on the project; for example, not all projects have contracts with funders that must be closed out. The team can use this list to help build the plan, bearing in mind that some steps must be done in a particular order: customers should be notified of the project’s impending end before contracts are ended, and production should be shut down only after all upgraded components have been manufactured.
Obligations to customers. If there are any system instances still in operation, the first step is to let those customers know that the project is ending. If system instances are owned and operated by customers separate from the project, then they will want to work out how to keep their system in operation after project support ends or they will decide to retire the system. The terms on which the system is licensed may affect what the customer can do after the project ends.
The project develops any final updates or fixes for the system, and releases them for deployment along with appropriate documentation and training. The project builds or acquires such spare parts inventory as is needed for remaining customers before shutting down production.
If there are contractual relations with the customer, the contract is closed out. This might include final billing or payments, or other deliverables.
Finally, the customer service mechanisms are shut down.
Obligations to team. Ending a project means loss of work for everyone working on it. It can also mean the loss of social relationships.
The first obligation to the team is to keep them informed once the decision has been made to end the project. The people should understand why the project is ending, the plans or timeline for winding down, and their roles during that time.
People will be needed on the project for different lengths of time. Some roles, such as developing new system changes, end shortly after the wind-down begins. Other roles, such as closing out finances and contracts, last to the end. Each person needs an expectation of how long they will be needed so that they can make plans for what to do next. (In some jurisdictions, notices of layoffs are required well in advance.)
At the same time, many people will have incentives to move on to something else before their project role is complete. The plans for ending the project must take this into account, and often include incentives for people to stay on as long as they are needed.
Finally, the team’s experience represents an asset. These people can be a resource to other projects in their organizations. Helping people transition can help other projects and, done well, generates good will that helps incentivize people not to leave early.
Obligations to funders. Some projects will have contracts or other agreements with funders. These projects provide final reports and other deliverables to the funder. They can then finalize financial accounting with the funder and close out the contractual relationship.
Obligations to regulators. Some projects for systems in highly-regulated industries may need to work with their regulators when the project is shutting down. This might include filing notices that the project is ending. The project is responsible for determining what other requirements their regulators may have.
Obligations to organization. The project takes two final steps: saving information and releasing resources.
There are several reasons that information about the project may be needed in the future. There may be a need to restart the project, in which case the new team must be able to learn about the system’s design and implementation, as well as the reasons behind its design. The intellectual property in the system may be valuable for licensing or sale. There may also be investigations related to the system or the project that need information about how the project was conducted.
The project may archive the artifacts needed to restart the project. It may also archive records of project execution, known issues, and any plans that will not be completed. Some projects will archive physical artifacts: molds and forms that support production, for example; some artifacts may be kept for museums.
The end of the project is time to gather a retrospective on how the project went. A bit of introspection about what went well and what didn’t will help people on the team to do better on future projects, and helps build institutional knowledge.
Archiving project information has security concerns. The process of moving information to an archive must maintain the information’s integrity and confidentiality: it must not be modified, lost, or disclosed during the move. The archive itself must then maintain that integrity and confidentiality for as long as the information is retained.
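A common way to check integrity during the move is to compute a cryptographic digest of each item before copying and verify it on the archive side. The sketch below shows the idea with SHA-256; the file names are invented, and a real archive process would also store the digests for later re-checks against media decay.

```python
import hashlib
import pathlib
import shutil
import tempfile

def sha256(path):
    """Digest of a file's contents, used as an integrity fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def archive_with_check(src, dst):
    """Copy a file into the archive and verify the copy is bit-identical."""
    digest = sha256(src)
    shutil.copy2(src, dst)
    if sha256(dst) != digest:
        raise IOError("archive copy of %s corrupted in transit" % src)
    return digest  # keep alongside the archive for periodic re-verification

work = pathlib.Path(tempfile.mkdtemp())
src = work / "activity.log"
src.write_text("2024-06-01 shutdown sequence started\n")
digest = archive_with_check(src, work / "archive.log")
print(digest[:8])  # first bytes of the stored fingerprint
```

Confidentiality is a separate concern from integrity: digests detect modification and loss but do not prevent disclosure, which requires access controls or encryption on the archive itself.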
The project also releases the resources it has held. This includes:
Lastly, the people on the team will move on as discussed above.
Some projects end because they are canceled, even before they have completed their development phase. Anecdotally, it seems that more projects are canceled than go to completion—this is a consequence of using competitive approaches to programs, and the net effects of competition are generally regarded as valuable. The information in this chapter applies to canceled projects just as to other projects.
Consider two examples, based on projects I have worked on.
In the first project, the team was writing a proposal for a US DoD spacecraft system. In the proposal-writing phase, the team has to establish the basic architectural and management approaches for the project, show they meet the department’s needs, and establish the price at which the team proposes to build the system. The team progressed through establishing the initial concept and architecture for the system, and we began evaluating the solution to see how good a job it would do for the customer and how much it would cost to build it.
We had a checkpoint milestone where we reviewed what we had found. At that review, it became clear that while our team had a decent solution for the needs, we did not have a great solution, and that other companies we expected to propose designs would likely have better solutions (because they had more experience in a couple of key technical areas). We made the decision not to pursue the proposal.
This was a good decision. Assembling a proposal is not a small task; we had a team of about 15 people working long hours. For US government projects, the proposer generally pays for the proposal development. Choosing to spend our team’s time and money on this project meant that the team couldn’t work on some other project. We judged that the opportunity cost was not matched by the probability of successfully winning the contract, so we freed up the team to work on a different system that did prove successful. If we had continued to work on the original proposal, we would have spent the budget available to develop proposals and could not have spent it on the proposal that succeeded.
In the second example, a different US DoD spacecraft program, the team was about two years into a multi-year contract. The team had performed excellently in a competitive first prototyping phase, and was the only team selected to move on to a second phase for building an initial working version. A key subcontractor on the team had staffing and management problems, and was not delivering results. Within the team we struggled to fix the execution problems or find another way to build the necessary components, all the while keeping a large staff on payroll and running through budget. While the technological solutions for many system capabilities were probably sound, the team could not deliver. The customer observed the problem, and after working with the team to try to resolve the problems, went through the process to cancel the project.
This was also a good decision. In hindsight, the team lacked necessary capability in the subcontractor and in the project management team. If the project had been allowed to continue, it is unlikely that the team would have solved the problem and more money would have been spent without benefit in the end.
The takeaway from these examples is that there are many sound reasons for canceling a project. Sometimes the cancellation is designed in (as with competitive acquisition); other times it is because continuing to invest money, time, and the care of the team building the system has become unlikely to pay off.
For a more general discussion of US DoD project failures, see the report by Bogan et al. [Bogan17].
The last several chapters have presented a reference life cycle pattern. This pattern is intended to inspire thoughts about how a project can organize its own work. It is not in itself ready to use off the shelf. Each project will have its own needs, its people will have preferred ways to work, and some projects will have life cycles mandated by regulation or industry standards to follow.
The purpose of the life cycle is to get a system built, deployed, and sustained in a way that meets customer and other stakeholder needs. In doing so, the project develops the whole system, including all its development artifacts, not just the end product. A good life cycle combines flexibility with discipline: it adapts to the project’s circumstances while maintaining the edifice of artifacts that the system depends on.
The reference pattern does not discuss the roles involved. A full definition of each phase will include definitions of who performs different tasks in each phase, and in particular who is responsible for milestones. I argue elsewhere (e.g. Section 8.2.6) that, to be meaningful, reviews must be done by people with an independent perspective on the material being reviewed, and that approvals must reflect a check on the work fitting into the project’s big picture.
The objective for any project is to develop and adopt a life cycle that meets its needs. In the next section I discuss several principles that a good life cycle will follow; these can help people evaluate a life cycle they are considering. Some other considerations include:
The project works out what life cycle patterns it uses, and documents the patterns. This effort starts during project preparation (Chapter 25). It does not necessarily need to define the entire life cycle all at once; it can be done iteratively, as long as the definition stays ahead of what the team needs. In practice I have found that enough should be completed in the project preparation phase that the team understands the general complexity of the work ahead, has chosen a development methodology, and can name the major milestones they will need to meet. The remainder of the life cycle can be worked out during purpose and concept development, and will likely be refined or adjusted as the project moves along. (I worked on one project that had a limited budget, and spent much of that budget writing elaborate management and engineering plan documents before even beginning to work out the high-level system concept. The result was a pile of documents that were never looked at again, which was a waste of the team’s efforts.)
In no case should the team get ahead of the defined life cycle. See Section 8.1.5—Principle: Team habits for a discussion of this principle.
The life cycle patterns have value only if the team actually uses them. This means that the team must know that the patterns exist, understand them, and agree that they are useful. The people in the team must also understand that they have a responsibility to follow the patterns, or to raise an issue when they find a problem with the life cycle’s definition. Achieving this means educating people as they join the team about what the life cycle is and how to learn about it, as well as monitoring that everyone actually follows the patterns. The team can also learn about and accept a life cycle a bit more easily if they are involved in developing the patterns; at minimum, they should be able to give feedback before the patterns are adopted.
The life cycle patterns are documented in a way that the team can find them and learn about them when they are joining the team and when they need to refresh their understanding of how some step works. The documentation is an artifact that should be managed using the principles in Section 17.4: it should be versioned and under change management; it should be stored in a way that the team can find it when needed; and it should be secure enough that it will not be tampered with.
There is no one right way to document them, as long as the documentation is well-organized and accessible. Some organizations prefer to define the life cycle in a prose plan document, which can be printed in its entirety if needed. I have had some success maintaining the documentation in a wiki or in a collection of web documents; the advantage of these is that they allow linking between parts of the document. The patterns should be explained and listed explicitly; they should not be hidden in a workflow system that doesn’t let team members see and understand the whole context for their work (see Section 4.6 for an example).
The documentation for each phase or step in the life cycle should include the information listed in Sections 21.5, 21.6, and 21.7.
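To make this concrete, a project could keep the definition of each step in a small structured record alongside the prose documentation, so that the entry criteria, exit criteria, and milestones can be checked mechanically. The sketch below is a hypothetical illustration in Python, not a prescribed format; the field names are assumptions standing in for whatever the project’s own step documentation calls for.

```python
from dataclasses import dataclass, field

@dataclass
class StepDefinition:
    """A minimal, hypothetical record for one life cycle step."""
    name: str
    purpose: str
    entry_criteria: list = field(default_factory=list)  # prerequisites before starting
    exit_criteria: list = field(default_factory=list)   # checklist for completion
    milestones: list = field(default_factory=list)      # reviews or approvals in this step

    def ready_to_start(self, satisfied):
        """True when every entry criterion appears in the satisfied set."""
        return all(c in satisfied for c in self.entry_criteria)

    def complete(self, satisfied):
        """True when every exit criterion appears in the satisfied set."""
        return all(c in satisfied for c in self.exit_criteria)

# Example: a purpose development step for one component (names invented).
purpose_step = StepDefinition(
    name="purpose development",
    purpose="Work out and document why the component is needed",
    entry_criteria=["parent concept approved"],
    exit_criteria=["purpose documented", "purpose review passed"],
    milestones=["purpose review"],
)

print(purpose_step.ready_to_start({"parent concept approved"}))  # True
print(purpose_step.complete({"purpose documented"}))             # False: review not yet passed
```

Even if a project never automates such checks, writing each step down in this shape forces the documentation to answer the questions every step definition must answer: when may it start, when is it done, and what reviews it contains.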
In Section 21.10, I listed principles that a life cycle pattern should meet. The reference life cycle pattern in this part reflects these principles, though it cannot address all of them. Here are ways that a life cycle built using this reference as a base can address them.
Know the purpose for something before developing it. The development phases in the reference life cycle all start with a purpose development step, in which the purpose for the system, component, or feature gets worked out before proceeding on to concept and design. The system evolution phase reiterates these patterns.
The project preparation phase is a time to think about the purpose for the project as a whole, and to work out the purposes for the different aspects of project operations; for example, what the team organization should achieve, or what is expected of the life cycle and procedures.
Documenting these purposes means that when the questions are revisited—and they will be—people can understand the reasons why decisions were made, instead of forgetting why and making up new and probably different reasons.
A good life cycle definition will ensure that these phases have review and approval milestones that check that the purpose has been worked out and documented.
Build in time for and incentivize deliberative thinking. The concept steps in development and evolution support this kind of deliberation, as long as the team culture actually incentivizes taking the time to work through a concept deliberately.
The procedures and instructions for reviews complement the life cycle patterns by prompting reviewers to ask questions about deliberations taken, and encouraging them to reject work that has not been thought through. Again, projects that are in a rush will tend to disincentivize this, usually storing up trouble for themselves for later. The project leadership can create an example and incentivize taking enough time to think.
Assign decision-making authority to an appropriate level based on the nature of the decision. The reference life cycle does not address this as written. The structure of the team, and how roles are organized in the team, complement the life cycle and determine how authority is distributed. The specific assignments of which roles can make which decisions are encoded in the details of the life cycle phases and in the procedures that apply during those phases.
For more details, see ! unknown reference XXX.
Build in ways to check work, and design them so they are a team norm and not prone to triggering defensive reactions. The reference life cycle includes reviews at regular points in the work in order to support this principle. The definitions of procedures for reviews augment the life cycle by making it clear what is to be reviewed and how people are to go about the reviews.
Build for the longer term. The reference life cycle supports this somewhat by providing development steps in which the team can think about how to design for the long term and can produce the documentation that supports future revision. A project can define more specifically what kinds of documentation are expected from development phases, and review procedures can make clear that such documentation must be provided before a piece of the work advances in its development steps.
Project-wide decision points. I have noted several places where the life cycle might have project-wide review and decision points, such as after purpose development, during concept development, and in the acceptance phase at the end of development.
Think about exceptions that might happen, how to handle them, and when to change course. I have not tried to address this principle in the reference life cycle. Working out how to handle exceptions is a process like designing for safety or reliability (Chapter 43): working out the kinds of hazards (exceptions) that might be foreseen, then deciding what should be done about each one. The particular kinds of exceptions depend on the project: a delay in getting a new funding round affects a project in a startup but not a small project in a well-funded organization, for example.
Some kinds of exceptional conditions are not really a matter for the life cycle, but rather for the development methodology, procedures, and planning approach that the life cycle patterns organize. Risk management (! unknown reference XXX) and the way that planning accounts for uncertainty (! unknown reference XXX) are ways to anticipate specific exceptional conditions and, in many cases, avoid them.
The choice of development methodology affects how easily the project can adjust when it needs to change direction (Section 27.5).
Define the work so that everyone on the team can agree when a step has been completed. This is achieved by clearly documenting each step or phase in the life cycle.
Give a clear definition for each step of the quality considerations by which the work can be judged. Similar to the previous principle, this is met by the documentation for each phase or step.
Make the pattern as light-weight as possible without compromising quality. The reference life cycle in these chapters is only a skeleton of a complete life cycle definition. I believe that everything in it is necessary for most projects, though some projects will likely be able to trim out some parts. As long as a project’s life cycle does not add too much to this reference, the life cycle itself will likely be acceptably lightweight.
There are three ways that I have seen a project end up with a too-heavyweight process. One is to add too many new phases or steps to the life cycle, to the point that people in the team have trouble figuring out where the project is and what steps they should be doing. Another is to make the work inside one phase too complex: adding more reviews than the minimum necessary, for example. The third is when the procedures that say how to do parts of the steps get complex. In Section 4.6, I discussed how complex one procedure (in this case, for qualifying component vendors) caused problems for a large launch vehicle project.
As many people are familiar with the NASA life cycle, or may be obliged to use it (or a variant of it), in this section I discuss how the canonical NASA life cycle compares to this reference life cycle. I will use the general NASA life cycle defined in NPR 7120.5 [NPR7120, Figure 2-5]. I presented an overview of this life cycle in Section 23.2.1.
The NASA life cycle is divided into seven major phases: Pre-phase A (concept studies); Phase A (concept and technology development); Phase B (preliminary design and technology completion); Phase C (final design and fabrication); Phase D (system assembly, integration and test, and launch); Phase E (operations and sustainment); and Phase F (closeout).
This life cycle was developed over several decades as NASA learned how to develop and operate complex missions. Elements of this approach have been adopted by many other organizations—terms like “System Requirements Review” and “Preliminary Design Review” have become nearly ubiquitous in the aerospace industry.
The overall flow of the NASA life cycle is organized around two constraints: fitting in with the US Federal funding cycle, and managing risk for a few highly expensive steps. The funding constraints come at the transition from Pre-phase A to Phase A, when the mission is approved and funded enough to develop its concept, and between Phases B and C when the agency commits to funding the full mission [NASA16, Section 3.5, p. 25]. The distinction between Phases C and D comes with Phase C covering development of designs and fabricating components, but actual assembly of a spacecraft does not start until Phase D, at which point there should be little residual risk that the system design will not work out.
The NASA approach was, however, developed for hardware-heavy systems, and people who today develop spacecraft or aircraft with a larger proportion of software components sometimes find it difficult to map software project best practices onto the NASA approach. There are usually two issues: software development best practice puts integration earlier than the way many people interpret the NASA model; and many software developers combine design and implementation, especially for novel software functions. I show one way to reconcile these approaches in the mapping in this section.
The reference life cycle I have presented is organized around types of work—conception, specification, design, and so on. The NASA life cycle is organized at the highest level around milestones that check progress early, allowing corrections before committing agency resources. This means that the NASA life cycle splits several of the early phases in the reference life cycle in two, with a major review or checkpoint of the project’s progress before continuing. These two approaches are compatible: almost every project will have some kind of project-wide milestones alongside the milestones specific to the work phases.
In the following, I present how each of the NASA phases maps to the reference life cycle.
The reference life cycle defines the project preparation phase and project support “phase”. The preparation phase involves a rough definition of the project and establishing basic operations abilities. Project support covers support functions, like managing teams, finances, or artifacts.
In the NASA environment, the initial support is provided by one or more agency centers and external collaborators, using budget, tools, space, and people for general concept exploration. Each center has its own procedures for starting up a concept exploration project.
Similarly, the NASA agency provides essential support services to its projects.
In one project I worked on, the NASA Ames Research Center had a Mission Design Center that was charged with exploring potential mission concepts. A small group developed the mission idea and explored ways it could be realized. Ames and the agency provided all the key support infrastructure: staffing, finance, office and lab space, and IT services, for example.
The Pre-phase A work develops a concept for a mission, presumably in response to NASA agency priorities. It is expected to limit its work to the concept of a mission: what it might achieve, who would benefit from the mission, and high-level technical approaches that might support such a mission.
There is one major review in this phase: the Mission Concept Review (MCR). This checks that the potential mission is well formulated and that there is sufficient interest to justify funding “project formulation”—working out a detailed concept and high-level design.
At the end of Pre-phase A, after the MCR, the agency makes a decision whether to continue the project and fund it for “formulation”: the phases where the concept and high-level designs are worked out. This involves greater financial commitment than the early studies, and is the start of the “real” mission.
Pre-phase A maps to the purpose development phase (Section 27.3) and part of the concept development phase (Section 27.4). The purpose development phase covers identifying what the mission might do, and who the mission stakeholders might be. The concept development produces an initial sketch of a mission concept, without breaking the concept down into great detail.
This phase is the first of two that are about developing a feasible high level design for a mission and ensuring that necessary technologies are available. Phase A includes developing a complete mission concept and high-level system designs. The team identifies any new technology that the mission will require and works out what will be needed for it to be ready to use in flight.
The depth of design and requirements is not clearly specified in the NASA procedural documents. However, my experience is that it is generally taken to include the spacecraft and its major subsystems, ground systems and their major subsystems, potential launch vehicles, and testing and other ground support equipment to a similar level. The exercise is intended in part to develop the general structure of the system and its likely cost, and in part to find those parts of the system that will require new technology.
Phase A includes developing a list of new technology that will be used for the mission, an evaluation of its maturity, and plans to develop that technology so that it will be mature enough for flight.
This is the first phase where a NASA project is funded for itself, as opposed to using resources allocated for general mission concept development. The various management and development plans required by NASA procedures get developed in this phase.
Phase A includes two key reviews:
The NASA Phase A maps to the second part of the concept development phase in the reference life cycle, along with the concept, specification, and preliminary design steps for the highest-level components in the system.
Phase B continues the work from Phase A, completing a preliminary design and refining any new technology to the point where it is sufficiently mature to use in flight. This often involves building models and prototypes of parts of the system.
Phase B also involves developing the safety and security of the mission. The high-level design should incorporate designs for safety, security, and other critical mission success factors, and the design should be backed up by analysis showing why the design is sufficient. (See Chapter 43 for more on safety design.)
At the end of Phase B, the project should have a high-level design for the entire mission. That design should meet all the mission objectives, be technically feasible, and fit within the available cost and schedule.
After Phase B, the agency allocates money to actually implement the system. The process can be complex and time-consuming, potentially involving legislative approval. The estimates for cost and schedule should be accurate enough that the project is unlikely to exceed them, which would require repeating the process to find more funding or time. This imposes limits on how much risk the project can carry going from Phase B to Phase C.
There is one key review in Phase B: the mission-level Preliminary Design Review (PDR).
The end of Phase B maps to a slice across the development phase in the reference life cycle. It includes the concept, specification, and preliminary design of the first two or three levels of components in the breakdown hierarchy (Section 11.3; Chapter 38). In general this might include the major spacecraft subsystems—payload, structure, propulsion, attitude control, and so on. The portion of the design step includes prototyping or modeling of components that pose technical risk, and the design may go to deeper levels of the breakdown hierarchy if needed to understand and address that risk.
The mission-level PDR follows reviews of the component-level preliminary designs.
This phase is when most of the development and production work is done. It involves designing, building, and verifying all the components in the system, to the point where they are ready to be assembled into the working spacecraft and ground systems.
Phase C is designed around the spacecraft being difficult and expensive to assemble, involving building large structures, using complex manufacturing tools, threading complex wiring harnesses through the structure, and putting large amounts of money at risk during the assembly. This leads to organizing the final assembly work to avoid as much risk as possible by ensuring that all the components are ready to assemble before committing to the final assembly steps.
During this phase, the team completes all of the designs and implementations of the system components, and verifies all of them. This usually includes producing engineering and qualification units of hardware components (Section 27.8) for testing, including destructive testing for some parts. It also usually includes integrating all of the engineering or qualification units and the corresponding software into a testing version of the entire spacecraft in order to verify the entire integrated system.
Verification in Phase C typically includes verifying the human interfaces in the system: for example, can an operations team use the ground systems to accurately control the spacecraft, using simulated telemetry that shows the spacecraft in different conditions (including off-nominal conditions)?
Phase C is typically divided into two parts: the first part for completing all the designs, and the second part for implementing and producing the components. The Critical Design Review separates the two parts, where all the designs are checked.
I have seen the Critical Design Review milestone cause confusion: how far should work progress before the CDR? What is the boundary between “design” and “implementation”? For hardware components, such as an electronics board, engineers work on the board design: the layout of the components and traces that will be fabricated. The NASA CDR definition ([NPR7123, Table G-7, pp. 113-4]) indicates that the CDR should include “integrated schematics” and “fabrication, assembly, integration, and test plans”, which would indicate that the board design is complete. That the document also indicates that the CDR and Production Readiness Review are often coupled lends credence to the interpretation that the CDR reviews the board designs.
If this same interpretation were applied to software, it would imply that the software would be essentially complete by CDR. Software source code is the equivalent of electronics board design: while it is thought of as implementation, it must be processed through a build system to produce the actual executable software, just as a board’s design file is used to manufacture the boards.
However, the NASA Systems Engineering Handbook states that the CDR for a software component should occur “prior to the start of coding of deliverable software products” [NASA16, Section 3.6, p. 29]. In other words, the documents appear to disagree, though NPR 7123.1 is presumed to have precedence.
Further, software is often developed iteratively, implementing one version after another, each version adding some amount of functionality over time. Some lower-level functionality is left as a mockup, perhaps not even fully designed, until some of the higher-level integrated functionality has been implemented and verified (the idea of integration-first development (! unknown reference XXX), done to reduce risk as quickly as possible). Software development best practice also has verification proceeding continuously throughout implementation, with feedback to the implementer as early as possible. This often implies having some of the hardware components built and available for testing the software before the software is completed.
An official answer to how a team should resolve the discrepancies and interpret the CDR for a NASA project will have to come from the relevant NASA authorities.
However, in practice, I have found that focusing on the review before implementation is more useful for components in the upper and middle levels of the breakdown hierarchy. For example, this might include the major components within a subsystem, such as power distribution or generation within the electrical power subsystem, or attitude control algorithms in the guidance, navigation, and control subsystem. Components at these levels realize the important relationships between components in the system structure (Section 12.2) and the way components work together to produce emergent properties (Section 12.4). Analyzing these designs allows one to check whether key system behaviors will be met, and that properties like safety or reliability are handled correctly. These are the properties that are difficult to change if the implementation is found during verification not to meet them. The design and implementation of low-level components should be reviewed, but as long as there is an obvious, low-risk approach for them, their review need not block the design reviews of the system as a whole. This interpretation, of performing the critical design review before implementation, means that the team is then free to implement software components incrementally if that is the best approach for that part of the system.
There are three reviews in Phase C: the Critical Design Review (CDR), the Production Readiness Review (PRR), and the System Integration Review (SIR).
The NASA Phase C maps to completing the development phase, the acceptance phase (Section 27.9), and the system production phase (Section 28.1) in the reference life cycle.
The CDR milestone maps to a slice through the system and component development phases, at the end of the design step for most or all of the components. The PRR for a component is equivalent to a review at the end of the production unit development step (Section 27.8). Note that the reference life cycle has a manufacture and deployment check milestone in the acceptance phase; this applies when the entire system is manufactured together, rather than the model implied in the NASA life cycle where different hardware components go to production individually. Finally, the SIR is equivalent to the deployment readiness review that is at the beginning of the deployment phase (Section 28.3) in the reference life cycle.
Phase D covers the work between the end of designing and building all the parts and having a spacecraft on orbit ready to begin its mission proper. This includes assembling the spacecraft and ground systems, and verifying that they work (and work together). The verification typically involves testing the assembled spacecraft in vacuum, under strong vibrations, and in thermal environments equivalent to what it is expected to handle in flight—but not testing beyond those levels, in ways that might damage the vehicle. After testing, the team proceeds onward to integrating the spacecraft with its launch vehicle, final preparations, launch, and starting operations on orbit. The team on the ground finally checks the spacecraft out before declaring it ready to begin its mission.
Some missions build a second copy of the spacecraft to be used on the ground for debugging issues with the one in flight and to test possible commands before sending them to the operational spacecraft. The duplicate is typically assembled in Phase D. It might use qualification units for hardware that were used for testing in Phase C, rather than flight-ready units.
There are several reviews in Phase D. All of them are final checks that some part of the mission is ready for taking an irrevocable step. These include:
The NASA Phase D maps directly to the deployment phase in the reference life cycle. It takes in manufactured components and procedures, assembles them into a working system, tests that it has been assembled properly, and starts it in operation. The milestones in the NASA Phase D are different from the deployment phase milestones mainly because they are specific to launching a spacecraft.
In this phase, the team operates the mission through its end.
There are two kinds of reviews that occur in Phase E:
Phase E is equivalent to the system operation (Section 28.5) and evolution (Section 28.7) phases in the reference life cycle. The Decommissioning Review is equivalent to the decision to retire the system at the beginning of the system retirement phase (Section 28.9).
The final phase in the NASA life cycle involves retiring and disposing of the flight systems, retiring or releasing ground systems, archiving mission data, and closing out the project.
There is one review identified in the NASA life cycle:
This phase corresponds to the system retirement (Section 28.9) and project ending (Chapter 29) phases in the reference life cycle.
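The phase-by-phase correspondence laid out in this section can be condensed into a small lookup table. The sketch below is a hypothetical illustration in Python, using shorthand names for the reference life cycle phases; it is a summary aid for this comparison, not part of either life cycle definition.

```python
# Hypothetical summary of the NASA-to-reference-life-cycle mapping described
# in this section. NASA phase names follow NPR 7120.5; the reference phase
# names are shorthand for the phases defined in earlier chapters.
nasa_to_reference = {
    "Pre-phase A": ["purpose development", "concept development (initial part)"],
    "Phase A": ["concept development (remainder)",
                "concept, specification, and preliminary design of top-level components"],
    "Phase B": ["concept, specification, and preliminary design of the "
                "upper levels of the breakdown hierarchy"],
    "Phase C": ["development (completion)", "acceptance", "system production"],
    "Phase D": ["deployment"],
    "Phase E": ["system operation", "system evolution"],
    "Phase F": ["system retirement", "project ending"],
}

# Every NASA phase maps to at least one reference life cycle phase,
# though the boundaries cut across phases rather than aligning one-to-one.
assert all(nasa_to_reference.values())
```

Note how the NASA milestone-oriented phases slice across the work-type-oriented reference phases: several NASA boundaries (end of Pre-phase A, end of Phase B) fall in the middle of a reference phase, reflecting the project-wide funding checkpoints discussed earlier.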
The first part of the life cycle: purpose, concept, and specification.
work to build a concept
XXX rewrite to reduce use of conops term
XXX harmonize stakeholders with list from earlier chapters
XXX harmonize concept contents with earlier chapters
XXX pull out document management section
XXX split purpose and purpose investigation out from concept
XXX discuss internal and external view, consistent with section 26.4 on concept development phase
At the very beginning of a system development project, there is generally only a rough idea of what the system should be. The understanding of the system objectives is too vague to launch into development or writing tests right at the start.
The purpose of developing the initial system concept is to get a reasonably clear initial definition of what the system should be or do. The definition should accurately reflect what the customer needs (whether the customer is an actual customer or a representative of an expected customer). The definition need not be perfect; it will be revised as the project moves forward and the initial concept gets validated or the understanding of the customer’s needs improves.
Internal to the project, the initial concept provides the information that the team needs to begin working out the structure or architecture of the system and to begin writing the high-level system specification. In the absence of a well-structured concept, the development team cannot begin working out how to implement the system without taking risks that the design is wrong and will have to be redone.
Outside the project, concept development supports the relationship between the project and the customer. Clear agreement between the customer and the development team makes for efficient and less fraught development. Agreement on the concept can support writing a contract for development that protects both the development team and the customer from feature creep or extra costs.
The documentation of the concept will be used throughout the entire life of system development and operation. The concept is the record of the big picture for the system. While it gathers the information needed to guide system design, the big picture is also important to new people joining the team who need to learn about the system they will be working on. The concept also serves management as a definition of the goal that system development is trying to reach, allowing the team to check from time to time whether they are designing and building the right thing. In that way a good definition of the concept is essential to being able to validate the system design.
The concept development work in a project seeks to establish a clear statement of what the system should be and do. At the end of the work, there should be a record of the customer’s objectives for the system, a concept of operations that embodies those objectives and explains what the system is, and review and approval from both the customer and the project leadership.
The core of this is to determine and record what the customer wants. This is rarely an easy task: there may or may not be a well-defined customer, and the customer may or may not be able to articulate what they need and what they want.
Start with working out who the customer is. Some projects are customer driven, meaning that there is a customer for the system and the project is working in response to that customer’s needs. A customer who contracts with a development team to build a system for them is the simplest example of this kind of project. Other projects are RFP driven, meaning that there is a customer but they are asking one or more teams to propose a system design before committing to funding development. These projects differ from customer driven projects in that the customer provides a request for proposal (RFP) that should contain all the information needed about what the customer wants. (In practice the RFP is rarely sufficient.) Still other projects are visionary, meaning that there is no specific customer driving what the system should do, but instead the system is being designed in the expectation that there will be customers for the system in the future. This kind of project includes innovative systems that are expected to help create a market for themselves.
Every project should document who their customer is and how they work. Where there is an actual customer, this should include information about how they make decisions about systems, who the decision-makers and influencers are, and any contacts that might be able to provide background information or advice about the customer. For visionary projects, common practice is to develop a profile of one or more hypothetical customers, including how they are expected to decide whether to acquire a system and where information can be found to characterize these potential customers.
A document recording what the customer wants is the first major work product in the concept development phase. This document serves as the primary record of what the customer has asked for. It is used in later work to check whether the interpretation that the development team puts on what they think they have heard from the customer is accurate. It is vital that the customer objectives document be free of the biases that the project team brings, because this document is used to detect when the team has applied its biases.
The customer objectives document, therefore, should be written using the customer’s own language, and should only be a summary of what the customer has said. One way to achieve this is to collect primary source material from the customer—such as notes or recordings from meetings, a request for proposals, or externally-sourced market analyses—and then write a summary of their contents. If possible, the summary document should include references to the primary sources so that someone can check the summary for accuracy. Writing the objectives document as prose, or as a bulleted list, is reasonable. The customer objectives document should not be organized into formal requirements; that step comes later and formal requirements should derive from the customer objectives document.
Where possible, the customer objectives document should be shared with the customer (or representative potential customers) to validate that the document is accurate. It is normal for there to be many iterations with the customer to get the details right; indeed, that is the point of gathering the customer objectives.
The next major document, or collection of documents, records other constraints on the system’s design. This includes things like
This document of other constraints should reference the source material for each kind of constraint.
With the customer definition, customer objectives, and constraints documented, the team can develop a concept of operations, or CONOPS. This document records a simple model of how the system can be structured and operate at a high level. The contents of the CONOPS should be limited to how the system will interact with things outside itself, whether that is the function and structure that the customer will see, the interactions with other systems, or its interactions with regulatory organizations. The CONOPS should be fairly brief, and diagrams are helpful. The CONOPS should not become a specification of the system’s design; again, the specification and design are derived from the CONOPS. The CONOPS document is often written in the language that the development team uses. It is common to use graphical notation standards for some elements where appropriate.
The contents of the CONOPS should derive from the customer objectives and other constraints. A good CONOPS document will include references to those sources to show why the concept is structured the way it is.
At the end of the concept development phase, the team will have gathered information about what the system needs to be, both to satisfy the customer and to satisfy other stakeholders, and created a conceptual version of the system’s functions in the CONOPS.
The CONOPS should be shared with the customer for their review and approval. Because the CONOPS document is written for the development team using the team’s language, it is often necessary to interpret the contents of the CONOPS to the customer. The customer should validate that the functions included in the CONOPS cover everything they are expecting. The development team’s management should also review and approve the concept material to validate that it meets organization policies and regulatory needs.
The goal for the initial concept development phase is to have a clear but informal understanding of what the customer wants, what constraints other stakeholders place on the system, and agreement from all the parties that the documented understanding is correct, so that the team can move on to formalizing the design of the system.
Put another way, the system design efforts depend on knowing what the system is supposed to do. The amount of effort put into system design should be low until the team has confidence that they understand what the system should do.
Specifically:
The concept for the system will change over time. This can happen in the early stages of a project, when one is still working with the customer for the first time to understand their objectives. It also happens later: when the customer’s needs change, when the team realizes that they have misinterpreted the customer’s objectives, or when regulation changes.
In general terms, while working to develop the system concept for the first time, any changes can be made as needed. At some point, the initial versions of the concept documents will be “done”, reviewed, and approved. The documents are then baselined: marked as stable, meaning that people can use the information in the baselined concept documents to develop system architecture and specifications without worrying that the documents will be shifting on them all the time.
After the concept has been baselined, the team must follow a more careful process to make changes. First one identifies what has changed: a change in the customer objectives, or in constraints such as regulations. Next one analyzes the change to determine where the concept documents need to be revised. The revised document versions exist in parallel with the baseline version, but only the baseline version is official until the revised documents are reviewed, approved, and marked as the new baseline.
When the baseline version is updated, changes may need to propagate to other documents. For example, the system architecture and specifications derive from the concept documents. A change that adds a function to the system concept in the CONOPS document will induce changes to the architecture (possibly adding new components) and specifications (adding functional requirements to some components’ specifications). If the change happens late in the project, when parts of the system have been implemented and verified, the change may propagate all the way to updated software, hardware designs, and test cases.
This explanation simplifies the process somewhat: documents are versioned and baselined individually. A change to the customer objectives results in an updated version of the objectives document. This leads to an updated version of the CONOPS, which in turn can lead to a need to review and approve the updated CONOPS, as shown in the diagram above. The new versions of the documents should not be baselined until they have collectively been reviewed and approved. They should be baselined together, so that the official, stable versions of all the documents remain consistent with each other. This means that updated versions should remain work in progress until the review step has been completed.
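This baselining discipline can be sketched in a few lines of code. The sketch below is purely illustrative—the class and function names are our own assumptions, not a prescribed tool. It models each document with mutable working versions and a single official baseline, and promotes a set of revised documents to new baselines together so that the official versions stay mutually consistent.

```python
# Hypothetical sketch of per-document versioning with grouped baselining.
# All names here are illustrative assumptions, not a prescribed tool.

class Document:
    def __init__(self, name, content):
        self.name = name
        self.baseline = None            # the one official, stable version
        self.working = {"draft": content}  # working versions, keyed by label

    def revise(self, label, content):
        """Store a working version; the baseline is untouched."""
        self.working[label] = content

def baseline_together(documents, label):
    """Promote one working version of each document to baseline at once,
    so the official versions remain consistent with each other."""
    missing = [d.name for d in documents if label not in d.working]
    if missing:
        raise ValueError(f"not ready to baseline: {missing} lack '{label}'")
    for d in documents:
        d.baseline = d.working[label]   # becomes the new official version

objectives = Document("customer-objectives", "v1 objectives text")
conops = Document("conops", "v1 conops text")
baseline_together([objectives, conops], "draft")

# A change to the objectives produces working versions of both documents;
# only after both are revised (and reviewed) are they baselined together.
objectives.revise("add-function-x", "v2 objectives text")
conops.revise("add-function-x", "v2 conops text")
baseline_together([objectives, conops], "add-function-x")
print(objectives.baseline)   # v2 objectives text
```

The key property the sketch captures is that `baseline_together` refuses to promote a partial set: if the CONOPS has not yet been revised to match the objectives, neither document's baseline moves.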
Concept development is all about knowing what the customer wants. But who is the customer? How does one learn what they want? There are multiple answers to these questions.
We can divide projects into three general groups:
Customer-driven projects are those that have a specific customer whose needs or desires drive what the system should do. The development team is focused on satisfying this specific organization.
In these projects, the development team can communicate with the customer to learn how the customer works and what their needs are. Ideally, the customer will be involved all through system development, so that the team can check what they are building directly with the customer.
The Agile Development approach to software development grew out of customer-driven projects. The Agile approach advocates for the customer being continuously involved, including helping to prioritize work in each development sprint. This kind of direct involvement is only possible when the development team can interact constantly with the customer. Note that we do not advocate dogmatic Agile Development (nor do we think many projects actually use it); we will discuss this more in chapters on validation and management.
Some customers will begin working with the team with only a general concept in mind. The team will need to draw out from the customer what that concept means, and explore all the corner cases with them.
Other customers will bring a partially-developed concept of what they want. In these cases, the team must, first, ensure that they properly understand what the customer is saying, and second, explore the concept with the customer to find any missing information.
Working with a customer on the concept is full of pitfalls. The most serious is that the team will interpret what the customer is saying in a way that the customer does not actually mean. The development team can bring their own interpretations to understanding what the customer says; the result can be a system design that doesn’t meet the customer’s actual needs. The team should use communication techniques that allow them to validate their understanding of what the customer is trying to say, such as active listening methods. The key ideas in these techniques are
There are many references available on these techniques.
The concept needs to include everything that the customer actually wants. Most commonly, the customer will be thinking about the most important functions they need but will not be considering all of the other functions that are needed to make the important functions work.
The team needs to work with the customer to elicit these other functions or use cases. While there is no recipe for finding all these other use cases, we have found that there are some questions that help ferret them out.
An RFP-driven project is one where a customer is asking for proposals from development teams about how they will design and build a system. The customer is usually asking for multiple, competing teams; the customer will choose one or more teams for a contract to build the system.
The customer writes a request for proposals (RFP) document that defines both the characteristics of the desired system and how the customer expects to judge between multiple competing proposals, if there are any. The RFP should thus document the customer’s objectives. In many competitive acquisition cases, the RFP must be the only official source that a team can have so that all proposing teams work from the same information—thus treating all teams equally.
When deciding to respond to an RFP, the team must learn what acquisition rules the (potential) customer is using in order to determine what restrictions to follow when communicating with the client. The team must also learn how the customer makes decisions, including who makes the decisions, who influences the decisions, and how the decision will be made. When responding to a commercial RFP, this can be easy: there is a contact who sends out the RFP and who can answer questions as needed, there is someone they work for who reviews and decides whether to accept a proposal or not, and the decision is based on what the decision-maker thinks meets their needs at the best price. For a US Government agency RFP, on the other hand, the decision process is defined by Federal Acquisition Regulations and by the agency’s supplemental rules. There are formal processes for submitting questions; there is typically a defined scoring and weighting system that a formal review team must use to rate each proposal.
The information gathered about how the customer communicates and makes decisions should be included in the Customer Definition document.
When the customer is doing a competitive acquisition, the team also needs to gather information on the other teams that may be choosing to submit a proposal. This information helps shape the proposed design and the proposal itself to make them look better than the competition to the customer. This can include relative strengths and weaknesses of the other teams, such as whether this team has proprietary technology that will do a better job for the customer (a weakness of the other team), or whether the other team has more flexibility in pricing (which might be a strength of the other team). This information should be gathered into a Competition document.
In practice RFPs are rarely complete or unambiguous. This is because they are written only by the customer, and there is little opportunity for dialog so that the customer can get alternative perspectives and check that their work is clear and complete. When it is possible, the team should engage in the kind of dialog with the customer that they would in a customer-driven project in order to confirm their understanding of what the RFP says and to flesh out the request to include a more complete picture of what the customer actually needs. When this is not possible, the team should find people who can accurately represent the customer’s way of thinking and needs, such as people who have a similar position in a different organization in the same industry, or someone who has worked closely with the customer in the past and knows the business or people involved.
Whether one can get clarifying information or not, the concept documents should include documentation on where the team has made assumptions or interpretations of the RFP source material. These points are matters where there is greater than usual risk that the team’s assumption does not match what the customer is thinking. This means that there is a higher than usual risk that the concept or design that the team proposes will be interpreted differently than what the team means—and so it is worth putting extra effort into making those parts of the proposed concept or design as clear as possible.
There are two end results of the process for responding to an RFP: first, a decision whether to complete and submit a proposal, and second, submitting a proposal if the first decision is positive.
The decision about whether to submit a proposal or not depends on
Determining whether the team has resources requires estimating the resources needed. For the first steps of concept development, this may be small, perhaps one or a handful of people to gather information and to get an initial understanding of what the customer wants. As the work progresses, more resources will be needed—to gather more information, to do concept development, to gather competitive market data. At each step of the process, it will become clearer how many people or other resources are needed for the next step of developing the proposal. At the same time, the team must be able to estimate how much resource will be needed to build the system if they win a contract. This will be unknown to start, but as the system concept and architecture work move forward the estimates will improve. The team must develop the architecture enough to be able to determine prices to charge the customer and to be able to determine if the team will have the capacity to do the work. These analyses grow out of the concept of operations and later architecture documents.
Determining whether the team has a reasonable chance of winning is a combination of knowing how the customer will judge proposals, how strong other teams are likely to be, and how well this team can satisfy the customer. This information is gathered in the Customer Definition document, the Competition document, and in how the Concept of Operations and architecture respond to the customer’s objectives.
Finally, determining whether building the system can be worthwhile depends on knowing what the team’s organization values. Does the organization require a particular profit margin? Is there a minimum or maximum contract price that is considered “interesting”? Does the system fit within the organization’s business strategy? These kinds of questions are captured in the Business objectives document, and analyses use customer objectives, CONOPS, and architecture documents to develop an answer to them.
Developing proposals is a complex specialty, and much has been written about it; we refer the reader to the literature on proposal development for further reading.
A visionary project, as we are using the term, is one where the system being designed and built is not for a specific, existing customer. Instead, the system might be marketed to several potential customers down the line, or the system might be part of a strategy to change an existing market or create a new one, thus creating new customers who may not even exist yet.
Consider building a new commercial passenger transport aircraft. The air transportation system is mature, and so one can name who buys these aircraft: airlines, aircraft leasing companies that provide the aircraft to airlines, businesses using aircraft for private transportation, and government organizations that fly passenger aircraft. No aircraft company in recent decades has built a new large passenger aircraft to be sold only to a single customer; instead, the companies work out the needs of many potential customers and design an aircraft that will be good for many of those customers. Since airlines come and go often relative to the lifetime of an aircraft design, many of the potential airline customers do not yet exist when the aircraft company has to decide on the capabilities of the new aircraft. This is a case where the market exists but there is not a single customer to satisfy.
In contrast, consider the first generation of global satellite data and telephony networks (such as Iridium and Globalstar). When they were being designed, there was no mass market for ground-to-space mobile communications. These companies, and others that did not end up deploying their networks, had to work out who their potential customers might be and what they might need. Indeed, all of these first-generation providers went bankrupt at some point as they developed their network systems while simultaneously building up a subscriber base. This is an example of a project that was creating a new market.
In both these cases, there is not a single definition of a customer. Instead, the team must determine the market—the set of customers—who might want the system. The team looks for the set of features or capabilities that will satisfy a large enough market to be worth supporting. The plan will often be to start with a small market segment and grow over time by adding capabilities to satisfy more people, while having learned more about the first set of customers and gaining some revenue to help fund growth.
All this information will need to be collected from a number of sources, including market analysts, surveys of potential customers, and the experience of people who have worked in related industries. Finding people or organizations who can act as a proxy for a class of possible customers is helpful. It is important to gather from multiple sources in order to cross-check the information and to account for sampling bias that can happen if information comes from only one perspective.
The information about the target market segment(s) will change regularly over the course of the project as customers come and go, or as new opportunities appear. This means that the design and implementation of the system will likely need to adjust as time goes by. This also means that the team needs to continue to survey the market and talk to potential customers.
At the same time, it is a rare project that can successfully chase arbitrarily changing customer objectives. The design and implementation team needs enough stability that they can complete a version of the system. Marketing and sales teams need stability so that they know what they can actually sell to a customer. The stable version of the customer objectives should be baselined (see section on configuration management below). Changes to the baseline should occur only periodically, when the team decides that either there is a change in the understanding about customers that is vital to reflect in the design of the system right away, even at the cost of delaying the system being ready for use, or when there is a change that does not delay or significantly change the system being designed and built right now.
The idea of a minimum viable product (MVP) has become fashionable in recent years. The general approach is to create the simplest system that will meet the needs of just a few customers, focus the team on building that first version, then plan on adding capabilities over time to make the product attractive to more customers. This is an example of planning how to handle changes in understanding what customers want.
Visionary projects can expect that there will be competition with other teams’ products. Indeed, customer choice is a fundamental precept of the Western market system, and often required by regulation. A team should develop a record of what their competition might be, whether that is another organization offering a similar product (as happens with large passenger aircraft), or whether a customer could meet their needs a different way, or whether customers will choose not to buy a new product and live without its benefits (which is common with new technology trying to create a new market). The team should also build up an analysis of what sets this team’s system apart from alternatives—why a customer would choose this system over other options. Maintaining the Competition document with this information will help the team make decisions about changes to the customer objectives or business objectives around which the team is designing the system.
The concept development phase involves artifacts such as the customer objectives and concept of operations. We can now define each of these artifacts, but first we will address document management as a necessary supporting capability.
Each of the artifacts worked on and produced in the concept development effort should be placed under document management. The document management system should provide:
A project should establish a document management system early in the initial concept phase. The concept will be represented in the artifacts listed below, and when these artifacts are reviewed and approved, a baselined version of each should be available in the repository.
Organization. Ideally, a project will designate one tool for storing all electronic information, and organize the documents stored in that tool so that it is convenient to find each kind of document. In practice most projects use different tools for different kinds of artifacts—a source code management system for software, a document system for ordinary documents, design repositories for hardware designs.
A document does no good if the people who need to use it cannot find it. A project must provide a single starting point for finding documents, whether those are stored in a single tool or spread over multiple tools. The contents of each repository must be well organized; we have too often seen projects build up a long, long list of documents, each with a document number and unhelpful title, requiring users to scroll through the list or guess at search terms. Creating an index that organizes the artifacts by the relevant phase and component helps people significantly.
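Such an index need not be maintained by hand; it can be generated from a small amount of artifact metadata. The sketch below is a minimal illustration under our own assumptions (the record fields `title`, `phase`, and `component` are hypothetical, not a standard schema):

```python
# Hypothetical index generator: groups artifact records by phase, then by
# component, so readers can browse by project structure instead of a flat list.
from collections import defaultdict

artifacts = [
    {"title": "Customer Objectives", "phase": "concept", "component": "system"},
    {"title": "Concept of Operations", "phase": "concept", "component": "system"},
    {"title": "Flight Software Spec", "phase": "specification", "component": "flight-sw"},
]

def build_index(records):
    """Build a two-level index: phase -> component -> list of titles."""
    index = defaultdict(lambda: defaultdict(list))
    for record in records:
        index[record["phase"]][record["component"]].append(record["title"])
    return index

index = build_index(artifacts)
for phase in sorted(index):
    print(phase)
    for component, titles in sorted(index[phase].items()):
        print(f"  {component}: {', '.join(titles)}")
```

Regenerating the index from metadata keeps it from drifting out of date as artifacts are added, which is the usual failure mode of hand-maintained document lists.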
Repository organization takes effort. We recommend making at least one person explicitly responsible for maintaining the organization in the repository, maintaining indexes, and (if necessary) updating the organization to address how people actually use it.
This means that the repository should:
Versions. The tools for storing artifacts must be able to maintain both a baselined version and multiple working versions.
The baselined version must be clearly identified as the baseline, so that people know what the official document is. A baselined version must also be immutable: it has been approved as a stable version. The baseline version should be replaced when a new version is approved as the baseline.
Working versions, on the other hand, can be updated often. People store working versions in a repository for multiple reasons: to preserve a copy of the work in case their local copy is lost or damaged, to share work in progress with others, and to provide a version as a proposed new baseline. People may be working on different changes concurrently—one person addressing one change, while another person works to address some other issue.
This means that the repository should support:
Approvals and workflow. The team relies on the integrity of baselined artifact versions. Any updates to the baselined version should, therefore, be carefully controlled. The typical workflow is that someone develops a working version of the artifact, then proposes it for a new baseline. The proposed version then gets reviews, and is either approved to become a new baseline or is given issues that need to be addressed before it can be approved. Once approved, the proposed version is promoted to become a new baseline.
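The propose-review-approve-promote workflow can be pictured as a small state machine. The sketch below is illustrative only; the state names and transitions are assumptions about a typical project, not a standard:

```python
# Hypothetical state machine for baselining a proposed artifact version.
# State names and transition rules are illustrative assumptions.

TRANSITIONS = {
    "working":   {"propose": "proposed"},
    "proposed":  {"approve": "baselined", "reject": "working"},
    "baselined": {},                      # immutable; no further transitions
}

def advance(state, action):
    """Apply one workflow action, refusing any transition not allowed."""
    allowed = TRANSITIONS[state]
    if action not in allowed:
        raise ValueError(f"cannot '{action}' from state '{state}'")
    return allowed[action]

state = "working"
state = advance(state, "propose")   # author proposes a new baseline
state = advance(state, "reject")    # review finds issues to address
state = advance(state, "propose")   # revised version proposed again
state = advance(state, "approve")   # approved: promoted to baseline
print(state)   # baselined
```

The point of the sketch is that the baselined state has no outgoing transitions: once promoted, a baseline is never edited in place, only replaced by a later approved version.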
Every project needs to have a clear, written procedure for this workflow. It should be clear to every team member how they go about proposing a working version to be baselined, how the review and approval steps are performed, who is responsible for approval, and the steps required to turn a proposed version into a new baseline.
Some artifact repositories provide support for these workflows. Software repositories, for example, provide functions to create branches (working versions), and to control the process where a branch is merged into the master branch (baselined version). Other tools provide a general workflow functionality that one can use to implement and enforce these steps.
We have seen some projects that do not use automated workflows, instead following a well-documented manual procedure for each of the steps. While this can be error-prone and means that one or more people must be responsible for managing the repository contents, the approach works well as long as the team is not too large and no more than a few dozen artifacts are being managed. It is especially useful when a project is starting up and has not yet determined what tools it will be using.
Finally, we noted earlier that sometimes it is important to update the baselines of several artifacts at once so that they stay consistent with each other. For example, consider when a customer requests a new function be added to the system. The new function must be added to the customer objectives document. The customer objectives and the concept of operations will then be inconsistent: the objectives will include the function, but the CONOPS will not. Someone will then need to update the CONOPS to add the function, followed by reviews and approval. It can be best to baseline the updated customer objectives and CONOPS documents at the same time, once they have both been updated and the updated CONOPS has been approved.
The repository, thus, should:
It is desirable, but not required, that the repository:
Other considerations. The previous sections have outlined the functions that a repository should provide. Effective artifact management requires some other capabilities.
These capabilities include:
In addition, the repository will work in conjunction with issue tracking or change order management tools. Those will be discussed in a later chapter.
Purpose. The customer definition captures information about who the customer is. It is used partly to help inform the process of developing the initial concept for the system, but it is also the place for recording things like who the points of contact for the customer are.
Form. The customer definition document is generally a prose document; it does not need structuring the way objectives or requirements do. Some organizations may have customer relationship management tools that will capture some of the content defined below.
Input. The customer definition document contents come from a number of informal sources. These include:
Dependents. The customer definition document affects the other artifacts developed during the concept development phase, because identifying the customer is the first step in defining what the customer wants.
Content. The customer definition includes:
If this is a visionary project, the customer definition does not describe a single, specific customer, but instead describes the general class of customers who are expected to want the system. The definition may include information about one or more customers who are representative of the class. It will also need to include more general information—for example, the range of ways that customers in the class decide about acquiring a system, or the range of company sizes and budgets.
Completion. The customer definition document usually gets regular updates through the life of a project, such as when the points of contact at the customer change or when a new market analysis shows that the potential customers for a visionary project have changed. The customer definition should be baselined initially when most of the content has been recorded.
Purpose. The customer objectives document is a record of what the customer wants out of a system. This document is a summary of what they have said they want and what has been drawn out from them in discussion or research. The objectives document is the source of all the rest of the artifacts developed for the system.
The objectives should record
The customer objectives document should be as close to the customer’s words and organization as possible. The document is a summary of the customer’s wants and needs. It is used as a proxy for the customer throughout system development, rather than having every developer talk directly to the customer to check their specification or design.
Other information must be kept out of the customer objectives document; it is captured separately in things like the regulatory and business objectives documents, which we discuss next. We have seen organizations include their business objectives, such as profitability, in the list of customer objectives. We have also seen teams include internal technical objectives, like being able to reuse parts of existing designs. Doing so creates confusion: is an objective in the document actually something the customer wants, or something the customer doesn’t care about? There will come times in the development process when hard decisions must be made about some part of the system design; at those moments, the team must be clear about what is actually a customer need and what is an internal need. If a customer need can’t be met reasonably, then the team needs to talk to the customer to resolve the issue. If an internal business or technical objective is proving hard to meet, the decision should be handled internally and the customer should not be involved—they don’t care or know about the issue.
Form. The customer objectives document is a predecessor to formal specification of the system, so it does not need to be a formally-structured document. A prose document with plenty of diagrams works well.
Input. Where the objectives come from depends on the kind of project and the kind of customer. If this is a customer-driven project, the information will come from discussion with the customer. For an RFP-driven project, the information will come from the request for proposals, possibly supplemented by information gathered in discussion or from market research. For a visionary project, the information must come from market research.
Dependents. The customer objectives document is the source, direct or indirect, for every technical artifact developed for the system.
The CONOPS derives directly from the customer objectives. The CONOPS is the first level of turning the informal statement of customer objectives into something more formal. The top-level specification of the system—which is developed after the initial concept phase—derives in turn from the CONOPS, with references back to the customer objectives.
The initial concept development phase ends with review and approval of the customer objectives document and the CONOPS.
The customer objectives provide input to contracting materials. If there is a contract between the development organization and the customer, there is usually a statement of work defining what the development organization should deliver. The statement of work will need to match the material in the customer objectives document. For RFP-driven projects, the development organization’s proposal to the customer must match the customer objectives document; the proposal is one of the sources that leads to a contract and its statement of work.
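Because every technical artifact derives, directly or indirectly, from the customer objectives, a change there should flag its dependents for re-review. The derivation structure described above can be sketched as a small dependency graph; the graph contents here are illustrative assumptions about one project's artifacts:

```python
# Hypothetical dependency graph: artifact -> artifacts derived from it.
# The specific artifact names are illustrative assumptions.
DERIVES = {
    "customer-objectives": ["conops", "statement-of-work"],
    "conops": ["top-level-spec"],
    "top-level-spec": ["component-specs"],
}

def stale_after_change(changed):
    """Return every artifact that must be re-reviewed when `changed` changes,
    following derivation links transitively."""
    stale, frontier = set(), [changed]
    while frontier:
        current = frontier.pop()
        for dependent in DERIVES.get(current, []):
            if dependent not in stale:
                stale.add(dependent)
                frontier.append(dependent)
    return stale

print(sorted(stale_after_change("customer-objectives")))
# ['component-specs', 'conops', 'statement-of-work', 'top-level-spec']
```

A walk like this is the mechanical core of change-impact analysis: a change to the objectives reaches everything downstream, while a change to the top-level specification leaves the CONOPS and statement of work untouched.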
Content. The customer objectives document should include everything that the customer has said they want the system to be. This should include things like:
The document should organize this information in understandable ways. The information from the customer will likely come in small increments, in arbitrary order—especially if it is obtained in discussions or from market research.
The objectives document must not include material that is not directly about customer needs or wants.
Completion. The customer objectives document is ready to be baselined when it includes everything that has been obtained from the customer or other input sources, and when the customer agrees that the objectives document is complete and correct.
Determining when everything is recorded is not easy. There are three conditions that we have used to decide that the objectives are complete:
To do these, we have built up a collection of the messages or documents received from the customer and of the notes from discussions. We maintain this collection as the source material that the objectives document references. While writing the objectives document, we mark a copy of these sources to show each piece of information that should be included as an objective, and cross them off as they are incorporated. This usually leads to a final (tedious) review of all these sources to check that nothing has been missed before declaring that we have properly checked everything off.
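This bookkeeping can live in a spreadsheet, but the check itself is simple enough to sketch in code. The source document names and objective IDs below are invented for illustration:

```python
# Hypothetical sketch of the source-coverage check described above:
# every piece of customer input must be incorporated into an objective
# before the objectives document is declared complete.

# Each source item: (source document, note) -> objective id, or None if
# the note has not yet been incorporated.
source_items = {
    ("email-2024-03-02", "vehicle must operate in rain"): "OBJ-014",
    ("meeting-notes-07", "fleet of up to 50 vehicles"): "OBJ-021",
    ("meeting-notes-07", "operators prefer tablet UI"): None,  # not yet covered
}

def uncovered(items):
    """Return source notes not yet incorporated into an objective."""
    return [src for src, obj in items.items() if obj is None]

for doc, note in uncovered(source_items):
    print(f"NOT COVERED: {doc}: {note}")
```

Running a check like this before declaring the document complete is the mechanical version of the final, tedious review described above.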
Where possible, the customer should review the objectives document and approve that it correctly includes all their needs, and nothing else. If the customer cannot do such a review, then someone who is independent of the team and can be an accurate proxy for the customer should review the document. For a visionary project, where there is no customer, this could be someone who has done market research. For an RFP-driven project, this could be someone who is familiar with the customer.
Comments. For some users, working in terms of use cases will be familiar. While documenting use cases—with users and functions—is helpful, it cannot capture all of the information from the customer in their own words. Resist the temptation to document the objectives as formal use cases unless the customer is providing information that way. Formalization comes in the concept of operations document, which derives from the objectives.
While the system being built needs to meet the customer’s needs, there are other stakeholders whose needs must be addressed as well. These are, broadly, internal objectives of the development organization and external objectives of third parties, such as regulatory bodies. Some of these objectives will define capabilities that the system must have. Other objectives provide constraints on how the system can function or be implemented, without defining specific capabilities.
Many kinds of systems are subject to regulation. Some systems require licensing or certification, to prove that they meet regulations; others only need to be able to show compliance on demand.
These regulations pose constraints on the design of the system. Some are simple: aircraft emergency exits must be marked in particular locations. Some are complex: the crew of an aircraft must be able to properly determine what is happening with an aircraft even when there are complex failure situations—which involves human factors as well as the design of aircraft sensing systems.
A system will not be able to be put into operation unless it can meet these regulations. This means that the regulations must be incorporated into the design, just as the functional desires of the customer must be. One cannot do this unless one knows what the regulatory constraints are, and so one must search out and document all the regulations that apply.
The regulatory objectives document should at minimum list the source regulatory documents that apply to the system. Before design validation is complete, the information in the regulatory objectives document must translate into a detailed collection of requirements against which the system can be checked.
It is often necessary to involve either experts in the regulation of a particular industry or the regulatory agencies themselves to properly gather all of the regulations that apply.
Regulatory examples. We look at regulation of two kinds of systems: aircraft and spacecraft. These two examples show different approaches to regulation. Most (but not all) aviation regulation comes from a single government organization, the national civil aviation authority (CAA). In the US, this is the Federal Aviation Administration. The civil aviation authorities worldwide are harmonized through the International Civil Aviation Organization (ICAO). In contrast, regulation of spacecraft is spread over multiple organizations, and there is little or no international harmonization of regulations.
Aircraft regulation. Aircraft regulation is focused on managing the risk to aviation non-participants (such as people on the ground) or casual participants (passengers on board an aircraft). The body of regulation is complex, taking a number of different approaches both to protect people in general and to allow those who can take responsibility for aircraft behavior the maximum feasible freedom to do as they need. This results in a combination of rules: licensing of aircraft types, constraints on where different kinds of aircraft can be flown, pilot training and certification, air traffic control over where aircraft are flying, and many others. It requires the combination of all of these rules to meet the objective of controlling risk to the public.
The regulations that apply to aircraft in particular (as opposed to the larger aviation system) begin with classifying the kind of aircraft by the risk it poses. Ultralight aircraft are lightly regulated, with the category defined primarily by maximum weight, speed, stall speed, and so on. Pilots either do not need a license or only need a limited license for ultralight aircraft. They generally can only be flown in daylight. There are intermediate kinds including those for general aviation, aerobatic and utility aircraft, commuter aircraft, and finally transport aircraft. Each category has limitations on its weight, speeds, number of passengers, acceptable pilot qualifications, and allowed maneuvers. The restrictions increase as the number of passengers, weight, and speed increase because each of these induces greater risk to the public.
CAAs throughout the world have encoded the regulations for each category of aircraft. In the US, for example, the regulations for transport aircraft (the largest category) are defined in the Code of Federal Regulations, Title 14 (the FAA), Part 25 (Transport category airplanes). Other parts of Title 14 cover topics like airports, the structure of airspace, air traffic control, carriers or operators, and navigation facilities; these other parts define the environment in which the aircraft will operate.
Most kinds of aircraft require a type certification. This is issued by a CAA to show that the CAA has verified that the aircraft’s design meets all these regulations. This is the first enforcement mechanism used to ensure that an aircraft complies with regulations. There are additional mechanisms, including registering individual aircraft and periodic inspection of the aircraft and its records by CAA-authorized auditors. The final level of enforcement comes from air traffic control granting permission to fly or not.
There are some regulations that apply to aircraft that are not typically handled by a CAA. This includes radio communication, which is typically regulated by a national communications authority (in the US, the Federal Communications Commission) and harmonized worldwide through the International Telecommunication Union.
Spacecraft regulation. Unlike aircraft, spacecraft do not have a unified regulatory regime. This is in part because there is no single unifying principle behind the regulations, as there is for aviation (safety of the public). Most spacecraft pose a negligible danger to the public during operation, as they are small enough to be destroyed when they re-enter the atmosphere. Historically, there has been concern about the military value of the information produced by spacecraft; more recently, there is increasing concern about the dangers one spacecraft poses to other spacecraft.
At the time of writing, in the US, spacecraft regulation includes:
These regulations are spread over multiple agencies, and are changing rapidly as commercial uses of space change.
No systems operate in isolation. Instead, they operate within the context of a larger system of people, businesses, and organizations. This might include:
The interactions and dependencies within this larger system also create constraints on how the system being designed must function. It is important to identify each of these organizations or systems, document how the system will interact with them, and then document the more specific objectives that are involved in working with them.
This information should be collected into one or more documents that record, first, the structure of the larger system and its interfaces with the system being designed; and second, the sources of constraints or objectives for each interface.
Information about the ecosystem in which the system will operate is likely to change frequently over the course of developing a system, especially for visionary projects. This means that it is important to update information about these objectives, and when it changes, flow those changes down into the system design.
Example: communication services. Consider a system of multiple vehicles—such as cars, trucks, or small UAVs—that need to communicate continuously with a central operations facility. The system itself is the vehicles and the operations facility. The communications are likely to be provided by a third party: a cellular communications company, for example.
As the system design progresses, the team will be able to define more and more accurately what capabilities are needed from the communication system. How reliable does it need to be? Can there be areas with poor or no coverage? What data rates are needed?
At the same time, communication providers will have their own constraints and capabilities. This might include pricing—both how pricing is calculated (Flat rate? Amount per data transferred?) and what the rates are. It might include their coverage area, and their mechanisms to provide information about outages or new coverage. It might include terms of use, with restrictions on what kind of data can be transmitted and what security measures the system must take in order to be connected to the provider’s network.
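As a rough illustration of why these terms matter to the design, here is a sketch comparing the two pricing models mentioned above; the rates, fleet size, and data volumes are invented:

```python
# Hypothetical comparison of flat-rate versus metered pricing for a
# vehicle fleet's communications. All numbers are illustrative only.
FLAT_RATE_PER_VEHICLE = 30.00   # dollars per vehicle per month
PER_GB_RATE = 2.50              # dollars per gigabyte transferred

def monthly_cost_flat(vehicles):
    return vehicles * FLAT_RATE_PER_VEHICLE

def monthly_cost_metered(vehicles, gb_per_vehicle):
    return vehicles * gb_per_vehicle * PER_GB_RATE

fleet, usage = 50, 10           # 50 vehicles, 10 GB each per month
print(monthly_cost_flat(fleet))
print(monthly_cost_metered(fleet, usage))
```

Which model is cheaper depends on data volume, which in turn depends on design decisions such as how much processing happens on the vehicle versus at the operations facility—another reason the ecosystem constraints must flow into the design.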
Example: spacecraft launch provider. Most spacecraft launches are performed by a company different from the organization that builds and operates the spacecraft. The launch service provider is responsible for receiving the spacecraft from its builder, integrating it onto the launch vehicle, and placing the spacecraft in a designated orbit. The launch provider is in turn responsible to regulatory agencies that ensure that the launch operations are safe, and in many sites the launch provider must work with a range safety organization (in the US, the US Space Force provides range safety for the Eastern and Western Test Ranges).
There are two classes of interactions between the launch vehicle and the spacecraft: the effects that the spacecraft can have on the launch vehicle, and the effects that the launch vehicle can have on the spacecraft. The provider gives the spacecraft designers specifications of the launch vehicle, including how the spacecraft will be attached and released; what vibration, pressure, and thermal environment the spacecraft will be in during processing and launch; and what communication is possible between the launch vehicle and the spacecraft. The provider also gives constraints on what the spacecraft can do, such as constraints on the spacecraft’s mass, volume, center of gravity, or gas releases. The provider also gives safety constraints, such as the allowed propellants or toxic materials, the state of batteries or other energy storage systems, or the permitted electromagnetic radiation. These constraints usually derive in part from the launch provider’s safety certification with the appropriate regulators or range safety organizations.
Most launch providers make a Payload User’s Guide available that documents this information.
Example: safety-critical component provider. A recent project we worked on involved acquiring a number of sensors for measuring the environment around a vehicle, so that the vehicle could safely plan a path around obstacles. Some of the sensors were not yet available in production, and the team had to work with the providers to obtain evaluation units.
The interaction between the team and the sensor provider was typical of interactions with providers in general. Negotiations between the team and the provider covered topics like:
These issues do not affect the core technical function of the component. Some of them do, however, place constraints on how the team can use the component (it might not be possible to repurpose the sensor for any arbitrary function). Other issues, such as quality control or acceptance testing processes, affect the safety of the system that incorporates that component.
As a result, these constraints also need to be captured in an objectives document, and the system’s design must be validated against the terms.
An organization that is devoting resources to build a system must be able to obtain those resources. At the minimum, the organization must be able to hire and pay the people who design and build the system; it must be able to pay for the tools and prototypes it uses; it must be able to pay people to gather customer objectives and work with regulators and all the hundred other tasks involved.
Most organizations are also building an ongoing business, not just coming together long enough to build one system and then disbanding. Sustaining a business requires obtaining funding, getting sufficient return on the work the organization does in order to fund continuing work, and building capabilities that allow the organization to keep building or maintaining systems into the future.
All these imply that an organization needs to have a business strategy, which leads to business objectives. The organization may have a strategy of developing a product line that serves a wide variety of customers. This might translate into an objective to build a simple initial system product that is able to generate X revenue, and that can be extended over time to address the needs of more customers.
Many organizations develop these objectives at the executive level but do not feed the information downward explicitly to the team that must design a system. This is a problem because the design team knows that such objectives exist but does not necessarily know exactly what they are, and thus cannot make accurate design judgments. We have seen, over and over, questions in a design team like “should we design this board with extra capability now, or design the minimal board and replace it later?” These have often led to arguments because the design team did not have the information needed to choose between a higher up-front investment cost for extra capability and incurring cost later in a redesigned board.
There are many different kinds of business objectives to document.
Some objectives are easily quantifiable:
There are general business case objectives:
Finally there are business strategy objectives:
These business objectives change continuously. When there is a proposal to change the objectives, the team must follow a disciplined process to determine what the effects of the change might be. This involves tracking down how the change will affect technical requirements and designs, which in turn affects whether the changes will affect the system’s ability to satisfy customer, regulatory, safety, or security needs. Changes to the design will also affect development cost and the time required to bring the system to operation. Sometimes a change to business objectives will make sense: changing the rate at which the system should scale up after the initial operational version may not affect the development time much but will increase customer satisfaction. Other times a change will have negative consequences: setting the goals for the size of the addressable market too high too early may require a higher development budget and longer development time than is available. Making a well-informed decision about these changes is only possible if the team can determine what the effects of a potential change in business objectives are.
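One way to make this determination tractable is to keep an explicit dependency graph between objectives, specifications, and designs, and to walk it when a change is proposed. The artifact names here are invented; the traversal is the point:

```python
# Hypothetical sketch of the impact analysis described above: artifacts
# form a dependency graph, and a change to one objective is traced to
# every artifact that derives from it, directly or indirectly.
from collections import deque

# artifact -> artifacts that derive from it (all names illustrative)
derives = {
    "BUS-OBJ-3 market size":  ["SPEC-12 fleet capacity"],
    "SPEC-12 fleet capacity": ["DES-40 server sizing", "DES-41 comms bandwidth"],
    "DES-41 comms bandwidth": ["IMPL-90 radio selection"],
}

def impacted(changed, graph):
    """Breadth-first walk of everything downstream of a changed artifact."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

print(impacted("BUS-OBJ-3 market size", derives))
```

The output is the set of artifacts that must be re-examined before the change to the business objective can be accepted.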
Safety is the condition that a system, when operated in the intended way, does not produce too many events that cause harm. There are four parts to this statement:
In the end, a system must be shown to be safe by showing that the rate at which it causes harm is below a threshold. The process of designing a system to be safe is well known to be a difficult task, and there are many books and standards that try to give guidance on how to do so. As the system is designed, it must be evaluated to show the likely rates at which harmful events will occur.
This is a complex topic, and later chapters will address the design and analysis of safe systems. For now we focus on safety objectives.
Performing these evaluations requires defining what kinds of harms are to be measured, along with the acceptable rates at which they may occur. There is no way to justify a claim that a system is “safe” or “unsafe” without defining the harms those terms refer to.
A project, therefore, must define and document its high-level safety objectives in terms of the harms and the acceptable rates of those harms occurring. This is the safety objectives document.
Some industries have conventional definitions of harm and rates. The automotive industry has adopted a scale of zero to three for “severity” in the ISO 26262 standard, focused entirely on injury to persons. Severity 0 is no injuries, 1 is light to moderate injuries, 2 is severe injuries with survival probable, and 3 is severe or fatal injuries. The aviation industry has defined a five-level scheme in the ARP 4754 standard, ranging from minor (slight increase in crew workload or minor passenger inconvenience) through hazardous (serious or fatal injuries among passengers) and catastrophic (many deaths, loss of aircraft).
These two standards differ in two respects. They consider different ranges of harm: ISO 26262 has any severe or fatal injury as its highest category, while ARP 4754 considers the distinction between fatal injury and mass fatality. They also consider different kinds of harms: ISO 26262 only considers injury to persons, while ARP 4754 considers effects on the crew’s ability to control the aircraft and damage to the aircraft.
These point to deficiencies in the standards, and to the reason why a project should define its safety objectives more carefully. There are many harmful incidents that these standards do not address, such as damage to property, economic harm, or damage or injury to non-person cargo. Consider an incident involving a truck that damages an overpass, but does not injure anyone directly. The cost of repairing or replacing the bridge can run to several millions of dollars; the economic impact on the community of not being able to use the bridge can be equally high. In addition, depending on the industry, the range of severity in these standards can also be too limited: they do not account for harms that spread beyond the people and vehicles immediately involved in an incident. The use of aircraft as missiles in the 9/11 attacks showed how an aircraft safety incident can result in mass casualties or worse.
In addition to defining the harms that system design will consider, the safety objectives document sets targets for how often those harms can occur. Guidance issued for commuter aircraft, for example, gives a maximum allowed rate of incidents per flight hour for each severity category: minor, 10⁻³; major, 10⁻⁵; hazardous, 10⁻⁷; catastrophic, 10⁻⁹.
The safety objectives document should define a maximum rate and the time interval over which that rate applies for each category of harm.
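Once those rates are defined, checking a design’s estimated rates against them is mechanical. A sketch, with invented estimates standing in for the results of a real safety analysis:

```python
# Hypothetical check of estimated incident rates against the
# per-flight-hour targets quoted above. The estimates are invented for
# illustration; real numbers come from safety analysis of the design.
MAX_RATE_PER_FLIGHT_HOUR = {
    "minor": 1e-3,
    "major": 1e-5,
    "hazardous": 1e-7,
    "catastrophic": 1e-9,
}

estimated = {"minor": 4e-4, "major": 2e-5, "hazardous": 5e-8, "catastrophic": 1e-10}

def violations(estimates, limits):
    """Return harm categories whose estimated rate exceeds its target."""
    return [cat for cat, rate in estimates.items() if rate > limits[cat]]

print(violations(estimated, MAX_RATE_PER_FLIGHT_HOUR))
```

In this invented example the estimated rate of major incidents exceeds its target, so that part of the design would have to be reworked before the system could be shown to meet its safety objectives.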
Some organizations may choose to say that the system they build should allow zero safety incidents above a certain level. This is possible only if the system can be guaranteed never to perform operations that could induce such serious events. For example, an aircraft can be guaranteed never to cause catastrophic harm, involving multiple fatalities—but only if the aircraft has a maximum weight of a few tens of kilograms, a low maximum speed before it disintegrates in the air, can only carry a single person, and so on. No transport aircraft (more than 19 seats or maximum takeoff weight greater than 19,000 lbs) that actually flies can ever have a zero rate of catastrophic harm. Similarly, many weapons systems can never have a zero rate of mass casualty harms simply because of the energy they carry. In most cases, as the conventional wisdom goes, the only way to get a system to have a zero rate of harm is not to build the system.
Safety objectives, like customer, business, or regulatory objectives, are sources that lead to the concept of operations and top-level system specifications.
Defining precise safety objectives early in a project is required for building a safe system. We have observed many projects that made aspirational statements about “safety being a first priority”. In every single instance where the definition stopped at that statement, the team designed an obviously unsafe system—often because, in the absence of an objective standard, each person took steps they thought would be safe but in aggregate the design missed even basic scenarios that resulted in hazards. Further, the absence of an objective meant that no one could perform an objective analysis of a design to determine whether it was good enough.
The security objectives document provides guidance for how the system should be designed and validated to ensure that it can handle a reasonable range of attacks.
Security objectives are similar to safety objectives: they define a set of harms resulting from security incidents that the system must work to avoid or contain. Unlike safety objectives, however, security incidents occur as the result of malicious, intentional actions rather than as a result of failures, accidents, or design flaws. Like safety objectives, the security objectives document names the harms that the system should avoid. It cannot, however, generally specify maximum acceptable incident rates because the rate at which attacks occur is something that attackers can deliberately control.
The approach to defining security objectives, then, is to name threat actors and the harms they can cause. A threat actor is a person or organization that can choose to initiate an attack on the system, such as a hacker, a criminal organization, or a hostile nation state. Each threat actor can be characterized by their motivations (a criminal organization for financial gain, a nation state to disable defense-relevant capabilities). The harms the threat actors can cause include disclosure of confidential information, interruption of business, death of persons, financial loss, or theft of goods. The list of harms includes every kind of harm addressed as a safety concern, plus harms that do not involve damage or injury but do involve loss of value or information.
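A sketch of how this information might be structured; the actors and harms below are drawn from the examples in this section, but the representation itself is illustrative:

```python
# Hypothetical model of a security objectives document's core content:
# threat actors, their motivations, and the harms they can cause.
from dataclasses import dataclass, field

@dataclass
class ThreatActor:
    name: str
    motivation: str
    harms: list = field(default_factory=list)

actors = [
    ThreatActor("opportunistic hacker", "notoriety",
                ["interruption of business"]),
    ThreatActor("criminal organization", "financial gain",
                ["financial loss", "theft of goods",
                 "disclosure of confidential information"]),
    ThreatActor("hostile nation state", "disable defense-relevant capabilities",
                ["interruption of business", "death of persons"]),
]

def harms_to_address(actor_list):
    """The union of harms: each one needs a mitigation in the design."""
    return sorted({h for a in actor_list for h in a.harms})

print(harms_to_address(actors))
```

The union of harms across all threat actors is the list the design must address; tracing each harm to a mitigation is what makes the later security analysis possible.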
The system must then be designed to address the different harms that different threat actors might pose. The resulting design can be analyzed to determine whether the threats are sufficiently addressed. The built system can be tested to verify that key defensive features are working as intended.
The definition of “sufficiently addressed” remains subjective. Some security analysis techniques have rationales for assigning weights to different threats. For those analyses, ensuring that all high- and medium-priority threats have been mitigated might be sufficient.
There are many standards related to security, and depending on the industry and geographic region compliance with some standards may be mandatory. These may define security objectives that a system must meet for regulatory or business acceptance. This information should be documented in the regulatory objectives document, and information about threat actors or harms should flow from the regulatory objectives document into the security objectives document.
Purpose. The concept of operations (CONOPS) document is the systems team’s response to the customer’s objectives. It collects in one place a description of how the system will work, from the point of view of the people who will use it. Whereas the customer objectives come from the customer and should record their needs from the customer’s point of view, the CONOPS shows how a system could behave in a way that meets the customer needs. The CONOPS is written from the point of view of the system.
The CONOPS document organizes the ideas about how the system will behave. In doing so it gives a model for understanding how the system’s functions can be organized and how different behaviors relate to each other. It does not aim to provide every detail about the behavior; its value is in documenting the big picture.
The CONOPS document has three primary purposes:
The document is typically written as a narrative, and not as formal requirements or detailed behavioral models. Its value is in its explanation, not its precision. It is an explanation of how the system might function, without reference to how the system can be implemented to achieve those functions. The details of operations, as well as the implementation, are recorded in documents that derive from the CONOPS. It should, however, expose the users, features, functions, states, and use cases that model what the customers’ objectives mean.
The CONOPS should include functions that are implicit in the customer’s objectives. For example, the document should cover the system’s entire life cycle, from deployment or initial startup through shutdown and disposal. The document should cover major faults that might occur, and the system’s behavior when those occur. For a spacecraft, for example, it should include recovery modes that allow the spacecraft (perhaps under ground control) to re-establish normal operation after a fault. It should include not just the technical core of the system, but also how the human or organizational elements that use the system behave. For an automobile, the CONOPS should not just say that there is a driver, but include expectations like the driver being trained and licensed. For an airline, the CONOPS should include the airline’s safety management system and how that interacts with an aircraft or maintenance technician.
To produce the concept, the systems team reviews and understands all the objectives documents already described, then follows a process to extract from those objectives a model of how a system could behave. The process of analyzing all the objectives will almost always reveal things that the customer or others have not addressed—customers often focus on the main operational behaviors, for example, and don’t address how to deploy or dispose of the system. The systems team needs to find these gaps and address them. Where possible, the systems team should work with the customer to check whether the customer in fact has expectations about these topics before committing to a concept.
Form. The CONOPS is typically a narrative document, though organization is important. Diagrams are especially helpful as long as they only expand on the narrative description.
It is not recommended that the CONOPS document consist solely of diagrams, such as UML/SysML use cases. While these can be helpful as a part of the document, the CONOPS must provide the explanation for what these use cases are and how they relate to each other.
We have seen many projects that try to use a “CONOPS document” to record the specification of the system. One can recognize when this has happened because the document runs to hundreds of pages, includes lots of details, and is usually abandoned shortly after system development begins. This is bad practice.
The CONOPS document should be short. It is a high-level explanation, not the details. The details come in the system specification, which will be long, tedious, and written in stylized forms that are not easy for the uninitiated reader to understand. The CONOPS document should explain and illustrate the life cycle of the system, from deployment, through operation, to retirement. It should define the major users and the major functions they need from the system. A good CONOPS document is often anchored around the “big scary picture”, like the OV-1 overview diagram in the DODAF standard: a diagram that illustrates the main behavior of the system in one place.
Input. The CONOPS derives primarily from the customer objectives. It is the systems team’s distillation of what the customer has indicated they need, combined with the team’s exploration to define the users and behaviors.
Dependents. The CONOPS document is the primary technical output of the initial concept development effort, and all of the other technical artifacts derive from it. The CONOPS is the source for the top-level specification of the system, which is a more formal interpretation of the CONOPS. Other design, evaluation, and implementation artifacts in turn flow from the specification.
The CONOPS is provided back to the customer for their review. The customer should check whether the system described in the concept meets what they need and what they were expecting. If so, the customer’s review leads to approval of the concept.
Content. The CONOPS document is recommended to include the following information:
Again, the CONOPS document is intended to be short. Many engineers have succumbed to the temptation to make the concept of operations document be the design document; don’t do that. The CONOPS should remain tightly focused on the users, use cases, and externally-visible behaviors without going into implementation. If there is a need to provide great detail about some externally-visible behaviors, write a document with the details and reference it from the CONOPS document, or defer this to the more detailed specifications that follow on from the CONOPS.
There are many templates for CONOPS documents. Two examples are:
Completion. The document can be considered complete when three conditions hold:
Purpose. The initial concept development phase gathers and organizes information about what a system should be and how it should behave. It gathers this information from many sources: the customer (if there is one), third parties that impose constraints, the developing organization’s policies and standards.
The concept leads, in turn, to all of the technical artifacts that make up the system and its design: the specifications, designs, analyses, implementations, and so on. Those technical artifacts can only be as good as the concept from which they are developed, so checking that the concept is accurate and complete is vital to producing a good system.
The team is ready to proceed to system specification when two conditions hold:
Form. The review and approval steps can take many forms, but at minimum they should include providing the reviewers and approvers with the documents to be reviewed, and a mechanism for recording comments and approvals.
Input. The review and approval steps use the various documents developed in the initial concept phase.
The people who should provide reviews and the people who have approval authority should be identified before starting the review process.
Dependents. The approvals of the initial concept are a gateway to all further technical development.
Content. There can be several different reviewers and approvers for the initial concept. In general, part of the concept needs to be reviewed by the customer, in order to get feedback from them on whether the concept meets their needs. Other documents created during the initial concept development are not necessarily for the customer’s knowledge—matters like the developing organization’s business objectives. These other documents should also be reviewed, at minimum by people inside the development organization.
As always, the reviewers and approvers should be independent of the people who wrote the documents. The goal is to have people who do not bring the same preconceptions to reading the material as the people who wrote it, in order to catch assumptions that need to be made explicit.
One should expect that the reviews will generate comments and questions. Some of the comments will require the team to revise the various documents, at which point the changes will need to be re-reviewed. Clearly identifying what has changed helps reviewers focus their re-review.
Completion. The review and approval step is complete when all the approvers have formally indicated that they concur with the documents and that they believe that the project is ready to move on to designing the system.
Purpose. A proposal, in the sense meant here, is a document that is sent to a potential customer in response to a request for proposal (RFP) that the potential customer has issued.
A proposal needs to make four cases to the customer:
The proposal derives from the work done during concept development, but usually also must include initial system specification and design work. This initial technical work is needed both to be able to explain to the customer what they would be getting if they choose this team to develop the system, and to generate a reasonable price for building the system.
Many processes and guidelines for proposal development have been published over the years, and we refer the reader to that large body of literature for details.
Form. Many proposals are required to follow a precise form. The form and contents are typically specified in the RFP, and often derive from regulatory requirements.
A typical proposal to NASA, for example, must follow a structure specified in the RFP. The form usually consists of:
The RFP also specifies the format in which the proposal must be delivered (PDF electronic form, paper), the allowed number of pages for different sections, the font choices and sizes, and many other details.
Input. The proposal contains a summary of a lot of information about the technical system design. This means that the team must have developed a top-level system architecture, which in turn depends on being clear about the system’s concept and the objectives it has to meet.
The proposal also needs to include cost or pricing information. This depends on having a reasonably accurate idea of what work will have to be done to provide the system to the customer, and on understanding the business objectives that a contract would need to meet, such as the expected profitability.
Finally, a good proposal needs to clearly demonstrate to the customer that the system being proposed meets their needs. This is often presented in the form of a compliance matrix or compliance table. This table lists each of the customer’s major objectives and points to where in the proposal this objective is addressed, so that the customer has an easy way to check how their objectives are met.
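As a sketch of how a compliance matrix might be kept machine-checkable (the objective names and section numbers here are invented for illustration), a small script can flag customer objectives that no proposal section yet addresses:

```python
# Hypothetical compliance matrix: each customer objective maps to the
# proposal section(s) where it is addressed. IDs and sections are invented.
compliance_matrix = [
    {"objective": "OBJ-01 uptime of 99.9%", "sections": ["3.2", "4.1"]},
    {"objective": "OBJ-02 delivery within 18 months", "sections": ["5.3"]},
    {"objective": "OBJ-03 operator training", "sections": []},  # gap: not addressed
]

def unaddressed(matrix):
    """Objectives with no pointer into the proposal -- gaps to fix before delivery."""
    return [row["objective"] for row in matrix if not row["sections"]]

print(unaddressed(compliance_matrix))  # → ['OBJ-03 operator training']
```

In a real proposal effort the matrix would be maintained in whatever tool the team uses for requirements, but the check is the same: every customer objective must point somewhere.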
Dependents. A successful proposal leads to a contract and to system development. The contract will specify what the team is supposed to build, superseding any information that was gleaned from an RFP. The team will need to develop updated versions of the customer objectives, regulatory objectives, and safety/security objectives, then revise the concept of operations, to reflect the actual commitments they have made to the customer.
As with all development projects, the real system specification will then flow from the revised CONOPS and onward into technical implementations.
Getting from the proposal to the contract may involve negotiation, which (from the developing team’s side) will use the material generated for the initial concept and for the proposal to inform negotiating positions.
Content. A proposal, as we have said, needs to convey to the potential customer what the team proposes to provide to them, along with evidence that the team can actually do the work and do it better than any competitors.
Completion. The proposal is complete when it is delivered to the customer. When delivered, it must meet the format and content requirements that the customer has provided in their RFP.
It is common for a customer to have clarification questions or ask for revisions to a proposal. In those cases, the team may need to respond with an update to the proposal.
Purpose. Most systems projects will be in some kind of competition—whether for a customer contract, for sales of a developed system, or for acceptance of a new technology over an existing approach. A team can develop a good concept or a good system, but then fail to get that system used.
The competition document gathers together intelligence about who and what might compete with this team’s system. It lists strengths and weaknesses of each competitor.
Knowing about competition applies to every project, not just those which must generate a competitive proposal. A customer-driven project must still satisfy its customer; the customer will be aware that they have choices about what investments they make in new systems or upgrades. A visionary project may have direct competitors who may try to build similar systems—but visionary projects can also have competition from the way problems are already being solved, as a customer can always choose not to buy the team’s new system and stick with what they already have.
Form. The competition document does not have any set form. We have often organized the document with one section per competitor, with a description of each and bulleted lists of their strengths and weaknesses.
Input. The information in the competition document comes from a number of primary sources: people who track relevant markets, interactions with the potential customers, market surveys, and so on.
Dependents. Information about competition can feed into many parts of the initial concept development:
Content. The competition document must be an unbiased presentation of the alternatives to using the system being designed, and of the advantages and disadvantages of those alternatives.
Many people will naturally want to emphasize what they see as their own strengths and try to contrast the competition to those strengths. That makes for a misleading competition document.
The competition must be presented as fairly as possible, and from the customer’s point of view. The document must be honest about the strengths that competitors have: they will have strengths and the team cannot defend against those if they do not have an accurate assessment of them. The document must be equally honest about the competitors’ weaknesses. The team cannot design a better solution if they do not accurately know what customers don’t like about what their competition offers (or might offer), or if they don’t understand what structural problems the competition might have in designing or building their own offering.
Completion. The competition document is never really complete, because other teams and other technologies will always be changing. The competition document can be complete enough to support CONOPS development or proposal development when the people who are searching out potential competition haven’t found any new competitors in a while.
Specification is about recording how a component (or system) should behave or the structure that the component should present. It only documents how the component appears from the outside, as a black box; it does not specify how the component achieves these ends. A specification derives from the less-formal concept for the system or component.
XXX address specification vs requirement
XXX make sure this ties into the broader flow of phases
A specification provides a simplified and abstract view of a component. This abstract view allows one to reason about how the component will work with other components. Without the abstract view, one would have to analyze the details of a component’s implementation to determine whether it will interact properly with another. While that is possible, the work of figuring out how the component will behave only serves to reconstruct design information that was originally worked out when designing the component. The reconstructed information will not necessarily match the information used during design, and the effort is wasteful.
A good specification records the intent and assumptions that went into working out what the component is supposed to do. This information helps the component’s implementer and designer to check that they understand what they need to build, and to check that the specification matches the intent. These assumptions also help people understand how a component might need to change when part of the system is redesigned—to add a new feature, for example. A record of the intentions helps people who come along later to understand the system, and the particular component’s role in it.
Finally, a specification serves as a sort of contract between a component and the rest of the system in which it functions. The people building the component in question can proceed to work on their component with confidence that the result will likely integrate correctly into the system as long as they build to that contract. The people building other parts of the system can likewise proceed with reasonable confidence that when they go to use the component, it will do what they expect.
A specification is used for several different tasks by different people over the course of a project. A good specification needs to be structured and contain the information needed to support these people.
Specifications should be clear and unambiguous. Each of the people who will read and use a specification needs to arrive at the same intended meaning.
They should be testable. Someone using the specification should be able to look at a design or implementation and determine whether it is compliant with the specification. That does not mean that determining compliance is easy; it only means that it must be possible. Sometimes the most that is possible is to build a body of evidence that a design is very probably compliant. For a specification to be testable, however, it cannot contain vague terms like “approximately” or “fast” or “heavy”; it needs specific values that define what “approximately” (“+/- 10%”), “fast” (“at least 20 m/s”), or “heavy” (“greater than 5 kilograms”) mean, so that compliance is not a matter of subjective judgment that can differ between two people.
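A minimal sketch of what it means to make those vague words testable: each becomes a predicate with specific values, so two people checking compliance will get the same answer. The thresholds below are the illustrative ones from the text, not values from any real specification.

```python
# Hypothetical testable interpretations of vague specification words.

def within_tolerance(measured, nominal, tol=0.10):
    """'approximately' defined as within +/- 10% of the nominal value."""
    return abs(measured - nominal) <= tol * nominal

def is_fast(speed_m_per_s):
    """'fast' defined as at least 20 m/s."""
    return speed_m_per_s >= 20.0

def is_heavy(mass_kg):
    """'heavy' defined as greater than 5 kilograms."""
    return mass_kg > 5.0
```

The point is not the code itself but the discipline: once the specification pins down the numbers, a verification method can apply them mechanically.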
The specifications need to be organized. A specification is no good if the people who need to use it don’t know it exists or can’t find it. A specification is also not useful if the people who need it can’t tell whether it is currently applicable, outdated, or a speculative proposal. Specifications should be kept in one place where everyone on the project can find all of them, and they should be maintained under configuration management.
A good specification is minimal. It addresses the needs for the system or component that have been identified in the concept work leading up to the specification, but it does not add other elements that are not relevant to the identified needs. (Note, however, that the process of developing a specification can often reveal needs that were missed in building up the concept and CONOPS. When those gaps are found, the concept and CONOPS need to be updated in addition to addressing the gap in the specification.)
Specification and documentation play different roles. Specification is a record of what something should be, while documentation is a record of what it has been designed and implemented to actually be. Specification deals with the black-box, external behavior, while documentation deals with the internals of the component. The documentation should connect decisions about the component’s internal structure to the external behavior or structure documented in the specification.
A small project, implemented by a very small group of people over a short time and thereafter left alone, and that does not provide safety- or security-critical functions, does not necessarily need specification.
Unless all of those conditions hold, some level of specification is necessary in order to communicate between people and across time.
The communication includes:
A specification defines the metaphorical shape that the component should have in order to fit into and support the system.
A specification treats the component as a black box: it considers only how the component should be seen from the outside, without determining how the component’s internals should be designed or implemented. One way to look at the specification is that it defines a contract between the system and the component: if the component behaves according to the specification, the system should work correctly as a whole.
A specification may define behaviors or attributes that in effect narrow the range of possible designs, possibly to only a single design. That situation in itself does not make a specification invalid. However, the specification should not include definitions solely to constrain the design; every definition should be needed to record some external behavior.
After a component has been specified, design of the internals of that component begins. The internal design often uses sub-components. The designers will develop specifications for the sub-components.
This process repeats recursively to lower and lower components, until one reaches components that have no further sub-components. The result is a tree (or possibly a DAG) consisting of alternating layers of specifications and designs. (This has been called the “layer cake model”.) The design of one component (or the system) responds to its specification. The specification for subcomponents depends on the design that has been selected for the component—the design determines both what subcomponents there are, and how they are to work together.
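The layer-cake structure described above can be sketched as a simple tree of components, each carrying its black-box specification and the design that induces its subcomponents. The component names and spec text here are invented for illustration:

```python
from dataclasses import dataclass, field

# Hypothetical model of the "layer cake": each component carries a black-box
# specification, a chosen internal design, and the subcomponents that the
# design induces. Real projects would keep these in a requirements tool.
@dataclass
class Component:
    name: str
    specification: str                 # external, black-box behavior
    design: str = ""                   # internal design responding to the spec
    subcomponents: list = field(default_factory=list)

def layers(component, depth=0):
    """Yield (depth, name) pairs, walking the alternating spec/design layers."""
    yield depth, component.name
    for sub in component.subcomponents:
        yield from layers(sub, depth + 1)
```

Walking such a tree makes the recursion in the text concrete: the specification at one depth is an input to the design at that depth, which in turn generates the specifications one level down.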
Some years ago, I worked on a rack-mounted computing system that had high reliability and uptime goals. A decision was taken to include a battery pack in each server assembly, so that if the mains power went out the servers would have enough time to record their state on storage before shutting down.
Consider the specification for the battery pack. It may seem simple—provide enough power to run the server assembly for some period of time—but the actual specification contains several subtle elements because its function is entwined with other system-wide reliability and safety behaviors.
Here are some of the system behaviors that affect the specifications for the battery pack:
These are rough objectives for the server assembly as a whole. These translate into specifications on the battery pack itself.
Addressing keeping the server assembly running:
Addressing the server assembly changing its behavior:
Addressing the server assembly lifetime:
Addressing likely failure:
Addressing safe and convenient customer installation:
Addressing fire, toxic gasses, and similar safety issues:
Addressing supply chain attacks:
Addressing fitting into a standard rack:
Addressing the environmental conditions:
These example objectives are not all of what would be needed for a server battery pack, but they illustrate several of the kinds of concerns that the battery pack’s designers will need to consider. These rough objectives must be turned into more precise specifications in order to guide the designers accurately. For example, some of the statements above use subjective words like “nominally” that need to be made precise. Other statements are too general and need to be decomposed into a set of more specific statements.
“Specification” is a deliberately broad term, encompassing many different ways of recording what something should be or do (and why).
Many people assume that “specification” means “requirements”. While requirements are one kind of specification, they are not the only one—and requirements are not generally sufficient by themselves to record all the information needed about behavior or structure.
Kinds of specification include:
There are many kinds of models.
In practice I have found that no one kind of specification meets all needs, and have used multiple kinds of specification together.
Generally, each kind of specification we use meets the good specification objectives of being clear and testable, as defined earlier.
Mixing multiple kinds of specification, however, requires care in organizing the specifications. Different kinds are often written and stored in different tools (a tabular tool for requirements; a CAD tool for mechanical drawings). This easily leads to a situation where a practitioner cannot find all of the specifications to which they need to be paying attention.
One way we have addressed this is to use a table of textual requirement statements as a primary specification, and include requirements like “the component shall comply with state machine X”, including a reference to the drawing of the state machine. Using a tool that makes all these forms accessible through one common user interface helps make this convenient for users. Using tools that can perform configuration management across all the different forms of specification also helps.
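A minimal sketch of that approach (the requirement IDs, text, and drawing path are all invented) keeps textual requirements in one table, with explicit references out to specification artifacts held in other tools:

```python
# Hypothetical requirements table mixing textual requirement statements
# with references to specification artifacts stored in other tools
# (for example, a state machine drawing kept in a CAD or modeling tool).
requirements = [
    {"id": "REQ-001",
     "text": "The component shall respond to any command within 50 ms."},
    {"id": "REQ-002",
     "text": "The component shall comply with state machine SM-PWR-1.",
     "reference": "drawings/sm-pwr-1.svg"},   # artifact in a separate tool
]

def external_references(reqs):
    """Collect cross-tool references so users can find every applicable spec."""
    return [(r["id"], r["reference"]) for r in reqs if "reference" in r]
```

Keeping the references explicit means a practitioner can start from the requirements table and enumerate everything else they must read, rather than hoping they know all the other tools to check.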
We first look at how specifications are developed and used from the outside: from the perspective of those who are concerned with how a component fits into the system, and not with what the specification means for the design internal to a component.
A specification for a system derives from the objectives and CONOPS developed during the system concept development phase.
The system-level specification leads, in turn, to a system design and then recursively to the concepts and specifications for components in the system.
This is the first step in using specifications. The specification developer looks through all of the conceptual material assembled for the system or for a component, and organizes and formalizes it to make a specification.
In practice this does not happen all at once. People develop the various kinds of objectives that lead to the specification iteratively, and parts of the specification will be developed as the objectives and concept become clear. As people develop the specification, they will identify gaps in the concept, which will lead to improvements in the objectives and CONOPS and in turn to updates to the specification.
The needs that a system solves change over time. New capabilities get requested. Regulations evolve. Problems with the system are found and need to be fixed. All of these can lead to changes in the concept and thus to changes in the system specification.
The concept and design of components also change, and for similar reasons. In addition, a component may have a perfectly adequate design that becomes outdated because subcomponents become unavailable. This leads to a redesign of the component, inducing new specifications for its subcomponents.
It is important to follow an organized process when a specification changes. Many process standards recommend specific approaches; for example, ISO 26262 [ISO26262] specifies that any change to a system must begin with an impact analysis, which determines how a change to objectives or specification propagates through the design of the system, and downward through the hierarchy of components. Standards like that also specify that the specifications and designs be maintained under configuration control so that everyone can know whether a change is a work-in-progress proposal or has been committed to.
The specification must reflect all of the needs identified in the concept from which it derives, and the specification must not add needs that do not appear in the concept and objectives. Before a specification can be declared complete, someone must go through all the material in the concept to check that the specification accurately reflects each of the identified needs or objectives.
A specification validation exercise can also help identify gaps in the objectives. Checking the specification often involves someone who was not part of developing the objectives and CONOPS; a fresh perspective can lead to asking questions about the objectives or the specifications that in turn lead to discoveries of topics that are missing.
As the system design grows and more and more components are defined and specified, someone needs to check that the designs and specifications are all consistent. This is especially important for “long distance” dependencies: where the correct function of one component depends on the correct function of another component in a different part of the system. (More formally, when two components A and B depend on each other for correct function, and the lowest common parent of A and B in the component hierarchy is near the top of the hierarchy.)
As we will discuss in future chapters, the safety and security properties of a system must be designed top down, and they need to be defined early in system development, before too many low-level components are designed.
We advocate using the systems safety methodology, which emphasizes starting with the accidents or losses that are to be avoided, and then identifying the conditions that must be maintained in a system to achieve safe operation. (This is different from many safety methodologies, such as functional safety, which focus on safety in the face of failure conditions and do not address safety problems arising from design or component interactions.) The categories of losses come from the safety and security objectives defined in the concept development phase.
Some example conditions:
Once these conditions are identified, systems engineers must determine how to address them in the design of the top-level system. They must then create derived specifications for each of the top-level components in the system, and show that if each of the components meets its specifications the overall system will exhibit safe or secure behavior by complying with the safety and security conditions. This process is repeated at successively lower levels of the system.
A specification guides the design and implementation of parts of the system. Given the importance of this role, a specification—or an update to a specification—should be reviewed before being committed to. Each specification should be checked by the people whose work it affects: system designers, the designer of the component or system that contains the thing being specified, potential implementers, and those people who are working on components that will interface with or use the component being specified.
As with other system artifacts, a specification or specification update should be under configuration management so that each user can determine whether they are using the correct version or not, and whether the version they are using is a proposed or work in progress version, is the current approved (baselined) version, or a version that has become obsolete.
We now turn our attention to the people and activities that use a component’s specification to design and implement the component; that is, those concerned with how the internals of a component reflect its specification.
There are two tracks of activity that use a component’s specification:
One track follows the design and implementation of the component itself, which should result in a component that complies with the specification. The other track follows the design and implementation of verification methods, such as tests or static analyses. The tracks come together when the implementation gets checked by the various verification methods, resulting in a determination of whether the implementation is in fact compliant, or whether the design and implementation need to be fixed to bring it to compliance.
A specification is an abstracted view of what a component should be. That makes it useful as a guide for someone who needs to learn about a component, before diving into the design or implementation of that component.
Someone who is learning about a component—or about the structure of the system across many components—needs to be able to find the relevant specifications. The specifications should be organized to support them:
The general task of a designer or implementer is to create a component that complies with its specification. In practice, of course, this is a complex activity.
The designer needs to be able to clearly identify all of the behaviors or capabilities that the component must implement. This implies that the specification must be organized in a way that helps the designer find all of these, and in a way that can serve as a checklist for tracking which features have been satisfied and which have not yet.
As we will discuss further in upcoming chapters, the designer or implementer should be able to identify which aspects of the component have the highest design risk or are the most technically complex. The designer and implementer will often choose to focus on these hard aspects first, before dealing with aspects that are easy to solve. The hard aspects are often candidates for prototyping, in order to determine if a design approach is feasible and can meet the specification. (See XXX for more on prototyping and risk reduction.)
Complex systems and components can benefit from the combination of incremental development and continuous integration. Incremental development involves selecting a few parts of the component’s specification and implementing those, followed by testing. Once those aspects of the component appear sound, the developers perform a second iteration by selecting a few more aspects of the specification and adding them to the design and implementation. Continuous integration, in this context, involves performing integration testing of these partial designs and implementations in a skeleton of the rest of the system. The partial implementation of this component may use mockups of subcomponents, or interact with mockups of peer components in the system. We discuss incremental development and continuous integration more in XXX.
As people work through design and implementation, they are likely to find problems or gaps with the specification. The specification may be ambiguous in some part, or the specification may not define the behavior for some condition. The developers must be able to work with those who defined the specification to sort out these issues. The developers should check the specifications in depth, asking the specifiers questions to check their understanding or to confirm that there are issues. The developers then should work with the specifier to resolve the issues.
The developers should not make an assumption about a gap or ambiguity and move forward without confirming their assumption. The people who wrote the specification are responsible for ensuring that the specifications for different components are consistent and address large-scale safety or security concerns. The behaviors needed to support correct interaction are encoded in the specification. The developers are responsible for implementing components that correctly support these behaviors so that the resulting system works correctly. The developers do not necessarily have the big-picture perspective to make changes to these critical behaviors, and do not necessarily know who else needs to know about an assumption in how a component is defined. The developers need to work collaboratively with those responsible for the specifications so that all the pieces of the system remain consistent and correct, and so that everyone involved shares a common understanding of how the components and system are to function.
A component’s implementation will need to be verified against the component’s specification. People using continuous testing or test-driven development methods have had good results producing correct component implementations efficiently by testing an implementation in small increments as functionality gets added to it. This reduces the risk that the design or implementation has made some fundamental, early mistake that becomes increasingly expensive to correct as more functionality is implemented on top of the erroneous implementation. Performing continuous testing (or verification) requires having verification cases defined and implemented concurrent with the implementation of the corresponding functionality.
Finally, each component design and implementation will need to be reviewed and approved before being accepted as finished. Verifying that the design and implementation comply with the specification is a major part of the review process. The review activities will be much easier if the specification is well organized.
As mentioned earlier, a component’s specification will likely change when a system remains in use for a long time. Systems engineers will need to investigate the impact of making a change to a specification before committing to the change.
The component designers and implementers are part of the investigation process. While a systems engineer can look at what will change in how a component interacts with other parts of the system, the component designers and implementers are better positioned to evaluate the effect that a change in specification will have on implementation or verification.
To change a design and implementation in response to a change in specification, the developers need to correctly determine what has changed in the specification. Having a clear mechanism for showing what requirements have been removed, added, or changed, and for showing specifically how other parts of the specification have changed, makes this task possible. In particular, being able to accurately enumerate every change is important; the developer should not have to hunt for subtle changes that may be hidden.
The decisions that are encoded in a component’s design include how different parts of the component interact with and depend on each other. When a component’s design is to be changed in response to a change in specification, some parts of the design will be directly affected. For example, a decision to add a new input message to a component directly implies that new message reception and handling functions must be implemented. However, one change can affect other parts of the existing design, and the designer and implementer must find and address all of these effects. The example new input message, for example, might require changes to a database schema for storing additional information, or might affect response time behaviors that require changes to foundational concurrency control capabilities in the design. Having a clear record of how parts within one component are designed to depend on or affect each other reduces the effort involved in making this kind of change, and reduces the chances of an error stemming from some dependency being overlooked.
The specification defines what a component should be or do; the design and implementation define how it is or does these things. Verification is the process of ensuring that the implementation produces behaviors that match the specification.
Every element of the specification should have a corresponding method for verifying compliance of the implementation. Different aspects of the specification will require different methods: some aspects can be verified by testing, such as showing that given some input A, the component responds with behavior B. Other aspects will require demonstration, such as showing that a physically representative user can see and reach control devices. Some aspects—especially safety and security—can only be verified by analysis or formal methods, such as showing that a component never performs some action identified as unsafe.
Verification methods involve design and implementation, similar to the design and implementation of the component itself.
Designing a verification method involves, first, determining how a specification property can be verified. (Sometimes a property is best verified using more than one approach in parallel.) Once the approach—testing, review, demonstration, or analysis—has been determined, the next step is to design how that specific specification property will be checked. That can involve designing a set of test cases that cover the expected behaviors, or defining a test procedure to evaluate a mechanical component, or defining who will perform a review and what they will look for.
Implementing a verification method turns the design into a specific set of tools and actions that, when used, give a yes-or-no answer to whether the component is compliant.
The verification methods can have errors. Indeed, in some cases the verification of a property can be more complex than the component implementation it is checking. This means that the verification designs and implementation need careful scrutiny to ensure that they are, in fact, checking the specified properties and not something else.
The verification methods also must be complete: if some property is worth specifying, it is worth verifying. The verification designs and implementations need to be checked to ensure that they cover all of the specification. Explicitly recording which parts of the specification any particular verification method checks helps the task of checking completeness.
Finally, it is common for project management to track what portion of a component’s specification has been completed and verified. This can be organized by identifying each property in the specification, and tracking which verification methods check each one. As verifications are done, the project managers can determine which parts of the specification correspond to verification activities that passed.
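The completeness and progress-tracking ideas above can be sketched as a simple coverage check. The identifiers and the mapping structure here are illustrative assumptions, not the schema of any particular requirements tool:

```python
# Sketch: track which verification methods check which specification
# properties, and report any properties with no verification coverage.

spec_properties = {"REQ-1", "REQ-2", "REQ-3", "REQ-4"}

# Each verification method records the properties it checks.
verification_methods = {
    "test_input_response": {"REQ-1", "REQ-2"},
    "demo_reachability":   {"REQ-3"},
}

covered = set().union(*verification_methods.values())
uncovered = spec_properties - covered   # properties worth specifying but not verified

print(sorted(uncovered))  # → ['REQ-4']
```

A real project would attach pass/fail results to each method as verifications are run, giving project managers the progress view described above.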
Specification activities take as input the objectives and CONOPS artifacts that were generated during concept development.
The specifications themselves involve:
The elements in the specification should include traces that show how each individual part of the specification derives from some part of the objectives or CONOPS, and conversely how each part of the objectives is reflected in the specification.
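The two directions of tracing described above can be checked mechanically. As a sketch, with hypothetical identifiers:

```python
# Sketch: check traceability in both directions between objectives and
# specification elements. Identifiers are illustrative only.

objectives = {"OBJ-1", "OBJ-2", "OBJ-3"}

# Each specification element records which objective(s) it derives from.
traces = {
    "SPEC-10": {"OBJ-1"},
    "SPEC-11": {"OBJ-1", "OBJ-2"},
}

# Forward check: every spec element traces only to known objectives.
dangling = {s for s, objs in traces.items() if not objs <= objectives}

# Reverse check: every objective is reflected somewhere in the specification.
unreflected = objectives - set().union(*traces.values())

assert not dangling
assert unreflected == {"OBJ-3"}   # OBJ-3 has no spec element yet
```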
The specification artifacts should be maintained under configuration management. That means that there should be a common repository that everyone working on the system can use to retrieve (and potentially update) the artifacts. The repository should maintain separate versions of each artifact, and clearly identify which version is the current, baselined version that people should use, which versions are outdated, and which are works in progress.
The configuration management system should support people reviewing a specification, and must support recording when a particular version has been approved to be baselined.
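The version and baseline bookkeeping described above can be sketched as a small data structure. A real project would use a configuration management tool; the class and method names here are illustrative assumptions:

```python
# Sketch: an artifact with multiple versions, one of which may be
# approved as the current baseline after review.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Artifact:
    name: str
    versions: List[str] = field(default_factory=list)  # ordered, oldest first
    baselined: Optional[int] = None                    # index of approved version

    def add_draft(self, text: str) -> int:
        """Record a new work-in-progress version; return its index."""
        self.versions.append(text)
        return len(self.versions) - 1

    def approve(self, index: int) -> None:
        """Record that a review has approved this version as the baseline."""
        self.baselined = index

    def current(self) -> Optional[str]:
        """The version everyone should use, or None if nothing is baselined."""
        return None if self.baselined is None else self.versions[self.baselined]

spec = Artifact("encabulator-spec")
rev_a = spec.add_draft("rev A")
spec.approve(rev_a)
spec.add_draft("rev B (work in progress)")
assert spec.current() == "rev A"   # rev B exists but is not the baseline
```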
Requirements are one kind of specification: they say something about a property that a component or system should have, or a behavior they should exhibit.
A requirement is a specification in the form of a single, declarative textual statement. In the simplest case, a requirement is a statement of the form:
<thing> <specification mode verb, like “shall"> <do or exhibit something>
For example,
The encabulator shall be colored green.
There are many nuances and variations on this basic form, but they are all extensions of this basic idea.
Requirements are written this way in order to maximize the simplicity and clarity of the specification.
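The basic subject-mode-property shape can be checked mechanically. As a sketch—the set of accepted mode verbs and the crude pattern below are illustrative assumptions, not a standard:

```python
# Sketch: a structural check that a requirement statement follows the
# <thing> <mode verb> <property> form.

import re

MODE_VERBS = ("shall", "must", "should", "may")

def has_basic_form(text: str) -> bool:
    # Require a subject, then a mode verb, then some property text.
    pattern = r"^\S.*\b(%s)\b\s+\S" % "|".join(MODE_VERBS)
    return re.search(pattern, text) is not None

assert has_basic_form("The encabulator shall be colored green.")
assert not has_basic_form("Green encabulator.")
```

A check like this catches only gross structural problems; it says nothing about whether the property is precise, achievable, or verifiable.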
Requirements are only one part of the specification for a component or system. They document specific facts about a system’s design, but they do not document the explanation of how that particular design came to be. They do not document the general purpose and scope of a particular component. They do not document complex interaction patterns. These other parts of a specification are documented in other design artifacts that complement requirements.
One of the jobs of systems engineering is to ensure that a user or consumer of some artifact (system or component) will be satisfied with the artifact once it is built and deployed.
The specifications for a system or component serve as a way to organize the information about what the user wants, and to organize the process of checking that the final result meets the user’s desires. The specification thus acts as a kind of implicit contract between the end user and the implementers: if the user agrees that the specification properly records their objectives, and the resulting system can be verified to meet the specification, then the implementers have built something that satisfies what the user agreed to. (Whether the user is actually satisfied is a separate matter.)
XXX would a couple diagrams help here? A first one might show user → conceptual artifact, conceptual artifact → developer → concrete artifact; a second one might show systems and verification in the picture?
This means that there are three main uses for requirements (and the rest of specifications):
A systems engineer is typically the keeper of the specifications, responsible for overseeing the writing, changing, and verification of requirements and other specifications.
Requirements—and all specifications—are therefore acts of communication between multiple groups of people with different roles in building the system.
Systems engineers are facilitators and interpreters in this communication between users and implementers. They are responsible for translating information received from users into specifications (including requirements), and for explaining the specifications back to the users for validation. The information from the user is often unstructured and incomplete. It is up to the systems engineer to work with the user to clarify their objectives and ensure that the result accurately reflects the user’s intent. The systems engineer also works to ensure that the specifications are complete. This often involves identifying use cases that the user has not thought of themselves and working with the user to define what behavior the system should have in those other cases.
The systems engineer also facilitates the implementer’s work. The systems engineer develops specifications so that the implementer has a clear guide to what they need to design and build; this requires that the systems engineer provide translation or explanation when the specification does not use the same terms or concepts that the implementers do. The systems engineer is also responsible for ensuring that the final artifact meets the customer’s objectives by overseeing the verification of the implementation against requirements (and other specifications). This involves working with verifiers to ensure that verification methods match the requirements, and checking that all requirements have been verified before the system is declared done.
A systems engineer performs other tasks using requirements, such as checking consistency or completeness. We will discuss these tasks in a later section.
A good requirement must meet several objectives in order to provide accurate communication between all these parties:
These needs lead to conventions about how requirements are written and organized, as we will discuss later.
Requirements are a general-purpose way of writing down facts about what something is supposed to be (or not be).
Requirements can apply to just about anything. In a typical system project, they will be used to:
Requirements don’t stand on their own.
Most requirements in a system will apply to particular components in the system. The component breakdown structure provides the list of components that requirements can be about.
Requirements are part of more general specifications for the system and its components. The specifications include
The requirements must be consistent with these other parts of a component’s specification.
In the end, requirements are satisfied by the implementation of the components in the system. Being able to trace the connection from a component’s requirements to the pieces of the implementation matters in order to be able to show that the requirements are satisfied.
A requirement itself is a single statement about something that should be true about something.
More formally, a requirement has three parts:
where “be placed” is the verb.
Some examples:
Consider an example of a statement of what the mission manager for a small spacecraft mission wants:
A spacecraft mission wants a small spacecraft that is expected to operate in low Earth orbit (LEO) for at least three years.
This sentence has a number of problems. It mixes statements together: the mission and the spacecraft, the operating environment and the lifetime. The sentence is not very precise: what is “low Earth orbit”? What does the spacecraft have to do to “operate”? It is unachievable: nobody can guarantee absolutely that a spacecraft will function for a particular duration; what if there is an unusual solar flare that fries its electronics?
We can improve the example sentence a bit by splitting it into three requirements statements:
These requirements improve the original statement. First, the original is split so that each requirement is about a single topic (and each is written in the subject-mode-property form). Second, two of the requirements are improved by making them more achievable (“95% probability”) and more precise (an altitude range is given).
These three requirements in themselves are not sufficient. Before the requirements are done being written, for example, there will need to be a definition of what “operate nominally” means. Similarly, the “at least three years” requirement is not enough by itself: three years would be difficult or impossible to meet if the intended environment were the surface of Venus; it would be almost trivially easy if the intended environment were an air-conditioned clean room. Adding more information about the environment is necessary to interpret the three-year condition—for example, what is the expected radiation environment at those altitudes?
The three example requirements are not sufficient in another way: they are high-level, and provide the designer of, say, a battery subsystem no guidance about how the battery must be designed so that the spacecraft meets these requirements. This derivation, or flow down, of lower-level requirements is the topic of an upcoming section.
A well-written requirement is concise. As such, it makes a statement about what a component should do—but the text of the requirement does not capture why the component should do that.
Good requirements should include a rationale statement that documents the thinking that went into choosing to make the requirement. The rationale does not change the requirement; it only adds explanation. The rationale helps those who must come along later, after the requirements are written, to understand or evaluate the requirements. It helps educate other engineers about considerations that may not be obvious. It helps those who later need to revise requirements understand what constraints there may be on the requirement they are changing.
Requirements actually come in groups; they are practically never singular.
The meaning of a group of requirements is the logical and of all of them. If there are ten requirements, an implementation complies with the requirements if it complies with all ten of them individually.
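The logical-and rule above is literally a one-line check. As a sketch, assuming per-requirement compliance results from verification activities (the identifiers are hypothetical):

```python
# Sketch: compliance with a group of requirements is the logical AND of
# compliance with each requirement individually.

results = {"REQ-1": True, "REQ-2": True, "REQ-3": False}

compliant = all(results.values())                    # fails if any one fails
failed = [r for r, ok in results.items() if not ok]  # which ones to fix

assert compliant is False
assert failed == ["REQ-3"]
```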
There are two issues to watch out for when there are multiple requirements: contradictions and exclusivity.
Contradiction: Two requirements contradict if complying with one of them means that it is impossible to comply with the other, and vice versa. Every collection of requirements must be checked to ensure there are no contradictions. The section on consistency below discusses this further.
Exclusivity: If a collection includes a requirement
it is perfectly reasonable to also have another requirement
A must do Y.
Having both of them means that there are two things that A must do.
The question then arises: if component A also does Z, is that compliant or not? In some cases it is okay if A does Z (it has a feature that isn’t used) and sometimes it is not (if it is important that A only does X and Y and nothing else ever).
The answer is that having requirements about doing X and Y means that the requirements are silent on Z. If the requirements are silent on a topic, that topic is not considered important and it doesn’t matter for compliance. (If the topic is important, it needs to be included in the requirements.)
If it is important that A only does X and Y and nothing else, that needs to be stated explicitly. This can sometimes be written directly into one requirement:
The component must be colored one of red, green, or blue
This can also be written in a general negative form:
The component must not do any activity not listed in these requirements
Explicitly listing the allowed activities is preferable to a “must not” requirement—the negative form is convoluted and easy to misread.
Even a moderately-sized system will typically have thousands of requirements. Users need some kind of organization of all those requirements in order to find the requirements they will be working with.
There are three concepts to discuss: organizing by subject, organizing by sections, and hierarchical writing.
People use requirements for different purposes. This leads to fundamentally different kinds of requirements.
At the most abstract level, the general product or mission objectives capture what stakeholders want the system to do—its purpose. These almost always start as general, vague statements. The stakeholders, system engineers, and product managers refine these over time into a clearer definition of the system’s purpose. The exercise may or may not result in proper requirements statements, but it is worth treating the results as if they are requirements and showing how the top-level system requirements derive from these objectives.
Projects also have guiding objectives that do not specify the system directly, but instead define policy or standards that the system must adhere to. There are many kinds of policies, including:
It is helpful to organize the product/mission objectives and all the various policies and standards into separate collections, identified by the kind of policy or source of objectives. For example, one can maintain one collection for business policy and a separate one for the quality assurance standard being used to build a system.
The top-level requirements on the system as a whole are part of the formal or semi-formal definition of what the system is to do. These requirements say what the system is and does when looked at from the outside, as a black box. These requirements are best kept separate from the more vague product/mission objectives—the objectives represent desires, while the top-level requirements represent the commitments made for what the system will do. The derivation mapping from objectives to top-level requirements provides a place to record the rationale for why different decisions were made about the commitments in the system, and why the decision was made not to commit to supporting some desires, represented in objectives.
Requirements on lower-level components provide definitions of what the pieces that make up the system must do. These obviously have a different scope than the top-level requirements for the whole system.
The first concept is that requirements should be organized by their subject, following the component breakdown structure.
The system objectives are those requirements that apply to the system as a whole. These typically encode the CONOPS for the system, along with requirements derived from the process or design standards.
The rest of the requirements apply to specific components within the system. The component breakdown structure defines what the components are, and gives them names.
Organizing by component is important for proper verification, so that each requirement can be connected to the implementation artifacts that are expected to comply with the requirement, and so that the implementer of some component can properly determine all the requirements they need to adhere to.
One single component or process/design standard can often have several hundred requirements. Users can find and work with all these requirements more easily if they are organized by topic as well as by subject.
This can be done by creating a set of topic sections within each component. Often these sections are the same for all components—sometimes empty when they are not relevant, but having the same organization across all components helps people find what they are looking for.
There is no one recommended set of sections that will apply to every system. The choice of sections is affected by the kind of system or components being developed, as well as by process and design standards. For example, if an automotive project is following the ISO 26262 Functional Safety standards [ISO26262], the Safety Goals and/or Safety Requirements should be collected into one section.
As a starting point, we have used variations on the following set of sections in several projects:
It’s a good idea to work out one or a few section structures that work for your project, then use those sections consistently across all components.
Keep in mind that there will always be some requirements that fit into multiple sections. For example, a requirement may both be about regulatory compliance and define a function the component is supposed to provide. Try to make consistent choices about which section a requirement goes in, but don’t try to make some perfect hierarchical section scheme that would let people avoid making such choices.
There are two general structures for organizing requirements on a particular topic:
The flat organization has all requirements within a section be at the same level. Each requirement is independent of the others and can be understood only by reading the text of the requirement.
The hierarchical organization places requirements into an outline, with general requirements and more specific sub-requirements. The sub-requirements must be read and understood in the context of their parent. The sub-requirements provide details, clarification, or limitations on the general parent.
Consider a set of requirements for security on a TCP/IP communication channel. The general requirement is that the communication channel should be authenticated and encrypted. In outline form, this looks like:
Consider requirement 1.1.1, requiring mutual authentication for the communication channel in question. The requirement for mutual authentication must be understood only to apply to communication channel X. There could well be another communication channel, called Y, that does not have the same authentication requirements.
Written in a flat style, the requirements might be expressed as:
Each of these statements can be read on their own; each statement includes all the necessary qualifications (“the protocol for communication channel X must…”) to identify the scope of its subject without having to refer to other statements.
There are pros and cons of each approach.
Every requirement needs a unique identifier.
People use this identifier to refer to the requirement, including using it as a bookmark or link to reference the requirement in other documents. Software check-ins to a repository often use the requirement identifier to indicate what functionality is being added to the repository. Task management systems use requirement identifiers to track the progress on implementing and verifying particular requirements. In general, the requirement identifier enables the integration of requirements management with other tools and tasks.
The identifier must be stable. That is, once a requirement has been given an identifier, that identifier should not change. The text of the requirement can (and will) change, but the identifier remains a stable way to refer to the requirement in documents, email, and other messages without having to track down all the uses of the identifier and change them.
It is good practice for the identifier to convey some information about the requirement. At minimum, the identifier should make it clear what component or body of external requirements the identifier applies to. If one writes requirements hierarchically, then using the number of the requirement in the outline is a good identifier.
Having the identifier carry some information helps the user check that they are referencing the requirement they intended to reference. It also helps the reader to know generally what the writer is talking about, without going into a requirements management system to check.
For many projects, I have used the format <component id>:<hierarchical requirement number> as the identifier. For example, space.eps.panels:3.4.2 for a requirement applying to a spacecraft’s solar panels.
There are requirements management systems that use a universal, flat namespace for identifiers, such as REQ-82763. This is not a good identifier, because it makes it hard to check when one has accidentally mistyped or miscopied the identifier into another document. If one accidentally types REQ-82764 into another document, that other requirement could apply to a completely different component—and the mistake is obscured.
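A format like the one above can be validated mechanically, which also helps catch the miscopying problem. As a sketch—the exact grammar below is an illustrative assumption:

```python
# Sketch: validate the <component id>:<hierarchical number> identifier
# format, e.g. "space.eps.panels:3.4.2".

import re

ID_PATTERN = re.compile(r"[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)*:\d+(\.\d+)*")

def is_valid_req_id(identifier: str) -> bool:
    return ID_PATTERN.fullmatch(identifier) is not None

assert is_valid_req_id("space.eps.panels:3.4.2")
assert not is_valid_req_id("REQ-82763")   # flat namespace: carries no context
```

The component-id prefix is what makes a mistyped identifier visible: a reader who sees `space.eps.panels:` knows at a glance which component is being discussed.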
Requirements are a way of communicating between people on a project: between the customer and systems engineers, between those who look at how multiple systems work together and those who implement the pieces, between those who design and those who test. A good requirement is one understood equally well by all the people who use that requirement.
Writing good requirements takes practice, but the following guidelines will help in writing and reading requirements.
Individual requirements have a general form:
<subject> <specification mode verb> <property>
The subject is often a component named in the component breakdown structure. It should be named explicitly:
The solar panels shall generate at minimum…
The rudder shall move between 10º left and 10º right
The majority of requirements use either the word “shall” or “must”, depending on the organization and industry. “Shall” indicates an assertion that the statement about the subject is to be true in the implemented system. “Must” expresses the obligation that the statement will be true in the system. In practice the two words mean the same thing when writing requirements.
The solar panels shall generate at minimum…
The flight computer must consume no more than X watts in any mode
The property is a predicate that should be verifiably true about the subject.
Writing the predicate is usually the complex part of writing a requirement. In some cases the predicate is simple:
The subject shall be painted green
The subject shall generate at most X watts of heat
In other cases, the predicate must have conditions added, saying when or under what conditions the predicate applies:
The subject shall generate at most X watts of heat while powered on.
Sometimes the requirement statement is easier to read if the condition clauses are presented in a different, natural order. However, the semantics remain the same: the clause is part of the property statement:
While powered on, the subject shall generate at most X watts of heat
A requirement should specify a single property of the subject. The examples above all deal with a single property.
There are requirements that may have multiple things in their property statement that still deal only with a single property. For example:
The widget must be painted green, gray, or white
Formally, this requirement deals with a single property: what color the widget may be painted. The color is restricted to a set of three colors—but the property in question is the color.
Note that this requirement is slightly ambiguous: it is not clear whether the widget can be painted only one of those colors, or some mixture of them. This requirement could be improved by either rewriting it as:
The widget must be painted one of green, gray, or white
Or adding a second requirement:
The widget must be painted a single color
A good requirement must be clear about what thing it applies to. In general it is best to write down a proper name of the subject—the name of the relevant component in the breakdown structure, for example.
This rule makes for a lot of repetition in requirements. “The control system must X”, “The control system must Y”, “The control system must Z”, and so on. While it means a little more typing, using the component’s name in each requirement means that each requirement can be understood on its own.
Use consistent terms throughout requirements. Always call component X by one name; don’t change it from requirement to requirement. Always call some one function by the same name, so that it’s clear that all the relevant requirements really are talking about the same thing.
Having lists of names or terms helps those who write requirements to use consistent terms, and provides those who read requirements with definitions when they need to confirm what a term refers to. This means:
Requirements (and the rest of specifications) may be written by one or a few people, but they will be read by many people. The readers need to understand correctly what the requirements mean. Many of those readers will be learning about the system by reading requirements or other documents, so they won’t enter into reading the requirements with the same context that system engineers writing the requirements will have.
This means: don’t get fancy with requirements language. Some conventions, like the subject-mode-property form, will make requirements sound stilted. Some technical jargon is needed to make the requirement precise. But don’t make the language more complex than it needs to be.
For any words or phrases that do not have a meaning that will be obvious to all your readers, help them out by defining how those words are being used in the specifications. Start with “must” versus “shall” and any other mode words (see Advanced Requirements below). Provide a glossary of the definitions of the rest of the words.
Many organizations prohibit requirements that say “shall not”. Negative requirements have their place, but they are tricky to get right. The problems arise with exactly how broad or narrow the requirement actually is.
Consider a component implementation that could do one of three behaviors, A, B, or C.
If the component has a requirement “the component shall do A”, the implementation satisfies the requirement (it does A). That is because the requirement, as written, allows for the implementation to do other behaviors as well.
If the component has a requirement “the component shall only do A”, then the implementation does not satisfy the requirement because the implementation might do other things.
Now consider a requirement such as “the component shall not do D”. The implementation does satisfy the requirement, but not necessarily in a helpful way. Just because the component doesn’t do D, what should it do? Are behaviors A, B, and C all acceptable? What about behavior E?
In most cases it is clearer to name exactly the behaviors that are required, because that is unambiguous. One can write verification conditions to test exactly what is allowed.
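The difference between naming the allowed behaviors and forbidding one behavior can be made concrete. As a sketch, with hypothetical behavior names:

```python
# Sketch: checking observed behaviors against an explicit allow-list,
# versus against a negative "shall not" requirement.

ALLOWED = {"A", "B", "C"}       # "the component shall do only A, B, or C"
FORBIDDEN = {"D"}               # "the component shall not do D"

def complies_allowlist(observed: set) -> bool:
    return observed <= ALLOWED          # unambiguous: anything else fails

def complies_negative(observed: set) -> bool:
    return not (observed & FORBIDDEN)   # silent on E, F, ...

assert complies_allowlist({"A", "B"})
assert not complies_allowlist({"A", "E"})   # E is outside the allow-list
assert complies_negative({"A", "E"})        # E is unaddressed, yet "passes"
```

The last assertion is exactly the ambiguity described above: the negative requirement says nothing about behavior E, so an implementation doing E still complies.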
Sometimes, however, one should write a negative requirement. If there is some behavior that really, truly must never happen, then writing a “shall not” requirement calls out that important condition, and a verification test can be designed to show that the system will not do the thing it isn’t supposed to. The negative requirement should usually be paired with a positive requirement that says what the system should do instead.
Safety and security properties often require stating a negative requirement, because these properties are fundamentally definitions of things that the system is to be designed not to do. I have not been able to imagine a way to write “a robot may not injure a human being” [Asimov50] as a positive requirement.
Verifying negative requirements is more complex than verifying positive requirements. See Section 14.4.
Avoid the word “it” and other non-specific pronouns or modifiers (“they”, “those”, “them”, “its”). Repeat the name of a thing involved in the property, even if that seems repetitive and wordy. An example:
The control system must enter mode X when it is allowed
This is better written:
The control system must enter mode X when mode X is allowed
The “it” in the first example is ambiguous: the word could refer to the mode or to the control system.
There are things that we want a system to do. When writing a requirement, it is tempting to write something like
The spacecraft shall function nominally for at least three years on orbit
Unfortunately, this three-year required property of the spacecraft is virtually impossible to meet (unless, maybe, the “spacecraft” is a large, inert chunk of rock). A spacecraft has many parts, operates in a difficult environment, and is built by fallible humans.
The problem with this requirement is that it sets a bar that is so high that no real spacecraft can meet it. The requirement does not allow for any off-nominal operation. It doesn’t allow for a spacecraft to have a temporary fault and then recover. It doesn’t allow for debris to impact the spacecraft. In fact, this requirement is met only when the spacecraft is perfect for those three years. Any real spacecraft will fail verification if it has a requirement like this.
This kind of requirement needs to be modified to something more realistic. There are many ways to do that. The NASA Systems Engineering Handbook has the rule that a requirement should specify “tolerances for qualitative/performance values (e.g., less than, greater than or equal to, plus or minus, 3 sigma root sum squares)” [NASA16, Appendix C].
Three common ways are:
Of course, these are often combined.
The point of a requirement is that someone can determine whether an implementation complies with the statement in the requirement. Operationally, this means that a requirement can be verified (see the section on verification below).
One way to make a requirement measurable is to specify the condition quantitatively. For example, a spacecraft’s battery must be able to store at minimum X milliamp-hours. It’s not hard for a test engineer to see how to create a test to verify that the battery complies.
Other requirements, especially those that specify an action that should be taken under some condition, aren’t quantitative, but instead are measured by observing whether the required action is taken. The verification tests will involve either creating the condition under which the action is to occur or observing that the condition has occurred, and then observing that the required action has been taken. For this kind of requirement to be useful, a test engineer must be able to understand accurately the enabling condition and be able to create or detect that condition. The test engineer must also be able to understand the action that is supposed to occur, and detect that it has occurred. If the enabling condition or action can’t be detected, then the requirement is not readily measurable.
Requirements on low-level components are often easier to make measurable than requirements on high-level components. This is why high-level requirements are often verified by looking at requirements derived from the high-level requirement rather than by trying to construct a verification test directly on the high-level requirement.
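A quantitative requirement like the battery example above translates almost directly into a pass/fail verification check. As a sketch—the threshold and function names are illustrative assumptions:

```python
# Sketch: a quantitative requirement as a yes-or-no verification method.
# "The battery shall store at minimum 2000 milliamp-hours."

MIN_CAPACITY_MAH = 2000.0

def verify_battery_capacity(measured_mah: float) -> bool:
    """Yes-or-no answer to whether the measured capacity complies."""
    return measured_mah >= MIN_CAPACITY_MAH

assert verify_battery_capacity(2150.0)
assert not verify_battery_capacity(1800.0)
```

The measurement itself (charging the battery and metering the discharge) is the test procedure's job; the requirement supplies the unambiguous threshold to compare against.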
When writing requirements for human-machine interaction or user interfaces, the underlying need is that a user can understand what the system is doing, and give it the right commands so that the system does what the user wants.
How would someone verify that the system as designed or implemented actually meets this objective? The statement is too vague to test as written.
There are multiple ways to address this issue.
First, one needs to break the objective up into a number of more-specific objectives. This often involves putting together a list of what it means to “understand what the system is doing”. This might involve:
And so on.
This breakdown is an improvement over the original desired objective, but the conditions are still not verifiable. As we will see in the later section on requirement derivation, these can be turned into high-level requirements that are broken down further. The verification condition on these high-level requirements then consists of, first, verifying all of the derived requirements, and second, presenting an argument that satisfying all the derived requirements implies that the high-level requirement is satisfied.
The derived requirements about “perceiving” or “observing” are themselves not verifiable: how does one verify that a person has observed, or can observe, some state of the system? This needs to be broken down into yet further, more specific requirements. For example,
Observing how much fuel the system has remaining
is a process, consisting of a chain:
System has fuel → system can measure how much fuel → system transmits this information → an indicator shows the amount measured → a person can see the indicator → a person can accurately observe the indication XXX
If all these steps are satisfied and work correctly, then the person should be able to see the amount of fuel remaining.
Focus on the last two functions in the chain: that a person can see the indicator and that they can observe the indication. Seeing the indicator can in turn be broken down into further requirements, primarily on the physical structure around the person. For example, some of these might be:
There is some prerequisite information needed to verify these examples. For example, what range of sizes will the users be? In order to check for unobstructed line of sight, one must know where the user’s head will be. What visual acuity or color perception abilities are required of the users? A color-blind user will not be able to perceive some color differences that might be used to convey necessary information. What expectations will a user bring to the task? If a user is socially conditioned that green means good and red means bad or stop, a display that uses different colors for those meanings will be hard for the user to interpret.
How would one go about verifying these requirements? There are multiple techniques that will help—and usually the techniques must be used together to really check whether a requirement is satisfied. These techniques are a combination of analysis using models and real-world measurement.
The experimental approaches are often the most expensive in time and money, but they are the gold standard for verifying a human interface requirement. Conforming to standards can help address expectations that users will bring to tasks.
In summary, there are several tools for addressing requirements that are too vague or complex to verify:
XXX revisit this section to bring it into line with the Leveson viewpoint on user interaction as control
Requirements should be written as a description of what one sees in a component when looking at it from the outside—a black box view. A good requirement does not go into how the feature or behavior is implemented inside the black box.
Put another way, the requirements for a component are documentation of how the component fits into the system around it. If component A is part of a larger component B, the requirements on A document what the implementation of B needs for A to do its part correctly. If components C and D are peers, the requirements document what they will need from each other for both to do their job.
This matter connects directly to requirements derivation from component to subcomponent, which is discussed in the next section.
There are four reasons to follow this rule.
It is tempting to skip right to the details of how a component is built. Don’t do it; provide other people the benefit of your understanding of the problem, not just the final design answer.
XXX revisit this to bring it in line with system model terms
No requirement stands entirely on its own. Almost all requirements have some reason that they have been included in a system, starting with: this requirement is necessary so that the system meets some objective. In lower-level components, the reason often is: this requirement is necessary so that this component provides some feature that other components depend on.
These are examples of requirement derivation. Derivation encodes the relationship between requirements.
Almost all requirements are derived from other requirements, and the requirements in a system must keep track of how one requirement leads to another, or how one is dependent upon another.
There are several kinds of relationships that people record. Some of these are:
Let’s look at each of these kinds of derivation.
A parent component has a requirement that the component provide some feature. The requirement in the parent specifies what the parent must do, but does not specify how to implement that feature. The design of the parent component, and later, the implementation, document how the parent component will satisfy that requirement.
When the designer decides on the implementation, they will decide (among other things) how the parent component will use subcomponents to implement the feature. These decisions create requirements on the subcomponents so that they provide the features that the parent component will use.
The reason for these requirements on subcomponents is that they are necessary to satisfy the requirement on the parent component. A derivation relationship between the parent requirement and the subcomponent requirements documents why the subcomponents have the requirements they do.
Consider a spacecraft example. The spacecraft as a whole has a requirement that it be able to point at a ground location, with some number of degrees of accuracy. To implement that feature, the spacecraft designer chooses to use the spacecraft’s attitude control system to point the spacecraft toward a ground location, and then slowly rotate the spacecraft as it passes over the ground location. The parent component—the spacecraft—has the high-level requirements for what it needs to do. The subcomponent—the attitude control system—must be able to slew accurately to an initial pointing vector, and then be able to slew slowly and accurately until the spacecraft is done with an observation. The slewing accuracy and speed are the derived requirements on the attitude control system.
The process continues recursively. The attitude control system designer decides to use reaction wheels as the primary attitude control mechanism. The requirements for slewing accuracy and speed create requirements on the reaction wheels for how quickly or slowly they can turn the spacecraft.
Some components will have a requirement that specifies a very high-level capability the component must provide. For example, in a section on disposing of a component that is being discarded:
The component shall have a procedure for disposal that ensures that no confidential information is leaked to unauthorized parties
There are several ways this requirement could be met: destroying the retired component in house, crashing the component into the atmosphere or ground in a way that ensures the component is destroyed, or erasing the data on the component before giving the component to an outside entity for recycling.
Whatever the implementation decision is, it creates more requirements on the component, and those requirements derive from the decision on how to satisfy the requirement on protecting confidential information. If, for example, the implementation decision is to recycle a retired part, then this might lead to requirements like:
The component shall provide an interface by which an authorized user can command the erasure of all data stored in the component
The component shall provide a function that erases all data stored in the component
In some organizations, the practice is only to record derivation from one component to another. Sometimes that works out; in the example, the requirement for an erasure command could be on a command handling subcomponent, and the erasure requirement could be on a memory component. However, some components do not break down into subcomponents easily—for example, when the component is being implemented by an outside vendor. In other cases, it is simply clearer to document the implementation requirements for the component directly and then pass the requirements through to subcomponents, so that a user can see the totality of the functional interface to the component in one place rather than having to search through subcomponents for something they don’t know exists.
External objectives and standards often impose general requirements on “all components of type X”, or the like. For example, an automobile might have a requirement that all electronic components function nominally across a temperature range of -40° C to +125° C. (See the section on Sets as subjects below for more on this.)
This requirement can be placed on the automobile as a whole; the requirement might read
All electronic components in the automobile shall function nominally across the temperature range of -40° C to +125° C
If the automobile includes engine, braking system, and entertainment systems as parts, the temperature range requirement can be passed down to those subcomponents:
All electronic components in the engine system shall function nominally across the temperature range of -40° C to +125° C
All electronic components in the braking system shall function nominally across the temperature range of -40° C to +125° C
The braking system controller unit shall function nominally across the temperature range of -40° C to +125° C
But the entertainment system, which is not safety critical and operates in the more benign environment of the passenger cabin, might have the requirement:
All electronic components in the entertainment system shall function nominally across the temperature range of -10° C to +50° C
In these examples, the general requirement is copied down into lower-level subcomponents until it reaches some component (such as the braking controller in the example) that does not have further subcomponents. Sometimes the requirement is copied verbatim, just changing the scope of the subject; other times, some component will have a variant on the general requirement.
This kind of derivation is sometimes referred to as allocating requirements to subcomponents.
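The copying-down pattern can be sketched as a walk over the component tree. The component names mirror the automobile example above; the verbatim rescoping at every level is a simplification, since in practice a leaf component often gets a reworded direct requirement (as the braking controller does above) or a variant (as the entertainment system does):

```python
# Sketch of allocating a general requirement down a component tree.
# Component names follow the automobile example; the uniform rescoping
# is a simplification of real allocation practice.

from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    subcomponents: list["Component"] = field(default_factory=list)
    requirements: list[str] = field(default_factory=list)

def allocate(component: Component, template: str) -> None:
    """Copy a general requirement to this component and all its
    subcomponents, rescoping the subject at each level."""
    component.requirements.append(template.format(scope=component.name))
    for sub in component.subcomponents:
        allocate(sub, template)

controller = Component("braking system controller unit")
automobile = Component("automobile", [
    Component("engine system"),
    Component("braking system", [controller]),
])

allocate(automobile,
         "All electronic components in the {scope} shall function nominally "
         "across the temperature range of -40° C to +125° C")

print(controller.requirements[0])  # the fully allocated leaf requirement
```

A variant scope (like the entertainment system’s narrower range) would be handled by stopping the recursion at that subtree and attaching the variant requirement by hand.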
Sometimes two components are peers of each other, and need to interact. A fuel tank provides fuel to an engine; a spacecraft communicates with a ground station to send telemetry and receive commands; client and server applications send messages to each other.
These interactions involve requirements on each of the components involved, showing how the components support each other. The fuel tank must send fuel; the engine must consume fuel. The spacecraft must be able to communicate with the ground station; the ground station must be able to communicate with the spacecraft.
This leads to pairs of requirements that record this mutual dependency. At a high level,
The spacecraft must be able to communicate with ground stations using protocol standard X
and
Ground stations must be able to communicate with the spacecraft using protocol standard X
These two requirements should show a two-way relationship with each other. (Formally, this introduces a cycle in the derivation graph.)
Derivation shows how requirements are related to each other.
Systems engineers use the record of these relationships for several tasks.
A derivation relationship between requirements on two different components helps to document the implementation approach for meeting a higher-level requirement. When a designer looks at the high-level requirement, they can see what features are used to implement the high-level requirement. The lower level requirements and their rationale allow the designer to see the argument that the implementation will be sufficient to meet the high-level requirement. This makes the design rationale available to people who didn’t create the design in the first place, but need to understand it to evaluate it or to make changes.
The section on analyzing requirements, below, goes into more detail on how one can look at the requirement derivation relationships to evaluate completeness or sufficiency, to argue whether low-level features are actually necessary, and to trace out the effects of making a change in requirements.
There are two ways that a user should be able to see derivation relationships. First, when looking at any one requirement, the user should be able to see what requirements this one is derived from directly, and what requirements derive directly from this one.
Good requirement management tools will also provide a view of the graph that shows derivation graphically. Derivation relationships can be viewed as a graph, as a way to see multiple levels of derivation. The graph is typically mostly a tree or DAG, but there are legitimate reasons that the graph will sometimes have cycles (between peer components, for example).
Here is an example showing how a top-level requirement is the source for a number of other requirements.
All the requirements discussed so far are simple requirements. Simple requirements have a single, clearly specified subject component. Each simple requirement expresses one property about that subject that must be true.
Simple requirements are not sufficient to express every need that real systems encounter. There are two other kinds that we have seen many times: requirements on sets of components, and requirements for standards.
Consider a system where all code is expected to adhere to a published coding standard. The implied requirement does not apply to any single component; it applies to all of them that include software.
This expectation can be written as a top-level requirement on the system as a whole:
All subcomponents of <the system> that include software shall adhere to the XYZ Coding Standard.
The subject of this requirement is the set of all software components in the system. The property is that their implementation adheres to the named coding standard.
This kind of requirement is placed on the top-level system, and then each first-level subcomponent includes a derived requirement that propagates the requirement downward:
All subcomponents of component X that include software shall adhere to the XYZ Coding Standard.
A component Y that has software as part of its implementation then has:
The software in component Y shall adhere to the XYZ Coding Standard.
If component Y has subcomponents, Y should also have a second requirement that continues to pass the requirement down to Y’s subcomponents.
This is an example of a general technique:
Many texts on requirements approach the subject from an assumption that there is one system being built: these are the requirements for System X. System X will be built in its entirety as specified; any and all requirements must be satisfied.
Writing standards is a different problem. A standard specifies requirements on multiple hypothetical systems that may exist at some point. Those systems will not be identical, but every system that adheres to the standard must satisfy the requirements in the standard.
Standards often provide options. The standard has a set of optional features. If the system chooses to implement those features, the features must conform to the standard. However, the system does not have to implement those features. This means that the system does not have to satisfy every requirement in the standard.
Some standards also present best practices. For some feature, it is recommended that the feature conforms to a part of the standard, but it is not absolutely required to do so.
The vocabulary of “shall” or “must” does not accommodate these situations well. The Internet Engineering Task Force (IETF) has defined a richer set of requirement modes. For example:
MAY. This word, or the adjective “OPTIONAL”, means that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item. An implementation which does not include a particular option MUST be prepared to interoperate with another implementation which does include the option, though perhaps with reduced functionality. In the same vein an implementation which does include a particular option MUST be prepared to interoperate with another implementation which does not include the option (except, of course, for the feature the option provides.) [BCP14]
The words used to indicate these more complex conditions must be defined just as carefully as “must” or “shall”, and must be used consistently.
Many people think of requirements only as a contract for guiding implementation and a checklist for performing verification tests later. However, requirements—along with other specifications—are useful in themselves for helping build a design and making sure the design is good.
There are three kinds of analysis that systems engineers do on the requirements themselves:
These are all analyses that should be done on the specifications of a system, including the requirements, and not delayed until implementation. Some of these tasks are easier to perform on the abstracted and simplified view of the system that specifications give. Performing these tasks before implementation will reduce the amount of re-implementation needed when one finds that the requirements aren’t sufficient or minimal.
The expectation is that if a system is built to conform to its specification, including requirements, the system will do the job that its users need and do it correctly. (Of course, this assumes that the top-level specifications are themselves a correct and complete record of the users’ objectives; we discuss this more in the section on validating requirements below.)
To meet this expectation, the system’s requirements need to be complete and correct. This means that when one looks at any given top-level requirement, one can trace out the features on other components that will be used to implement the requirement and argue that those features will combine correctly to produce the desired result.
There are two parts to this analysis:
Having tools that allow one to view parts of the derivation graph in visual, graphical form is invaluable to performing this analysis.
Consider an example. A UAV (drone) is supposed to receive and process commands from an operator on the ground. This leads to requirements:
These requirements are not complete, because they leave out a critical step: when a command is sent from the ground operator to the UAV, the message first goes to the transceiver. The transceiver extracts the message, and then sends the message to the command and data handling component. The example omits the part about the transceiver and command handler passing information to each other. This means that one could build an aircraft that had a radio and had a flight computer, but the two would never talk to each other. Obviously, the UAV would not be acting on commands with that design.
This leads to a more complete set of requirements:
In the example, the communication between the transceiver and command handling components should be documented in some other specification for the UAV, perhaps an activity diagram showing how commands flow through components. The requirements then need to be checked against these other parts of the specification to make sure that all of the functions in each of the steps are reflected in the functions each component is required to implement.
Sometimes determining whether a set of requirements is complete or not will require further analyses. As a simple example, the maximum mass for an aircraft might be X kg. Making sure that the aircraft’s overall mass comes in under that limit means enumerating all the components in the aircraft that have mass, adding up their mass, and determining that the result is below X kg. For that analysis to be complete, it cannot leave out, say, the mass of the motors; all components must be considered.
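The mass budget analysis can be sketched as a simple roll-up. The component names, masses, and the 25 kg limit below are made-up numbers; the completeness concern shows up directly, since a missing entry (say, the motors) would let the check pass for the wrong reason:

```python
# Toy mass budget roll-up. All masses and the limit are invented numbers.
# Completeness matters: omitting any massy component makes the analysis
# invalid, even though the arithmetic still "passes".

MASS_LIMIT_KG = 25.0  # the "X kg" maximum in the requirement

component_mass_kg = {
    "airframe": 9.5,
    "motors":   4.2,
    "battery":  6.8,
    "avionics": 1.9,
    "payload":  2.0,
}

total = sum(component_mass_kg.values())
verdict = "PASS" if total <= MASS_LIMIT_KG else "FAIL"
print(f"total {total:.1f} kg against limit {MASS_LIMIT_KG} kg: {verdict}")
```

The hard part of the real analysis is not the addition but establishing that the enumeration of components is complete; that is a judgment backed by the system’s component breakdown, not something the arithmetic can show.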
As a more complex example, a system might have a maximum acceptable failure rate target. Being able to argue that the system is reliable enough involves performing a fault tree analysis, enumerating all the ways that failures in components can lead to system failures. The analysis cannot leave out components and be complete; nor can it leave out some failure modes of some of those components.
Checking whether the design is complete is not a simple task that can be performed just by inspecting the graph of requirements. The analysis is helped by being able to see the requirements, but it requires imagination and effort to actually check the result.
XXX Sidebar: relationship to Goal Structuring Notation and safety cases
Every feature and every requirement on a component should have a reason for being there.
At the top level, for the system as a whole, only features that address customer needs or business objectives should be included. At lower levels, the only requirements that should be placed on components should be ones that are actually needed to make the system work properly—meaning the system meets those top-level objectives.
The derivation relationships between requirements encode the reasons for a requirement to exist. This leads to a condition that should hold across all requirements:
Every requirement for a system and all its components should derive from one or more customer or business objectives
This is straightforward to check using the derivation graph: every requirement should derive from at least one parent requirement, and it should be possible to trace upward through the derivations to reach a customer or business objective.
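This upward trace can be sketched as a walk over the derivation graph. The requirement IDs and links below are invented for illustration; a real project would pull them from its requirements management tool:

```python
# Sketch of checking that every requirement traces up to a top-level
# objective. Requirement IDs and derivation links are invented.

derives_from = {          # requirement -> requirements it derives from
    "SYS-1": [],          # top-level objective: no parent needed
    "ACS-3": ["SYS-1"],   # attitude control requirement, from SYS-1
    "RW-7":  ["ACS-3"],   # reaction wheel requirement, from ACS-3
    "MEM-2": [],          # orphan: no recorded parent -- should be flagged
}
TOP_LEVEL = {"SYS-1"}     # requirements that ARE customer/business objectives

def reaches_objective(req, seen=None):
    """True if req is, or transitively derives from, a top-level objective."""
    if req in TOP_LEVEL:
        return True
    seen = seen if seen is not None else set()
    if req in seen:       # tolerate cycles (e.g. peer-to-peer derivation)
        return False
    seen.add(req)
    return any(reaches_objective(p, seen) for p in derives_from.get(req, []))

orphans = [r for r in derives_from
           if r not in TOP_LEVEL and not reaches_objective(r)]
print(orphans)  # ['MEM-2'] -- flag for the review described below
```

Note the cycle guard: as discussed earlier, peer-component derivation legitimately introduces cycles, so the trace must not loop forever on them.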
Often while requirements are being developed, a requirement will be placed on some component without setting up the derivation. This requirement will not have a parent, and so the checking method will flag it. But what to do then?
In most cases, there was a good reason that someone wrote that component requirement. When one finds a requirement that is not documented as supporting some higher-level reason, it is worth exploring why that requirement is valuable. In some cases, the parent requirement(s) are present, and the requirement just needs to be linked to them. In other cases, the requirement can be a clue that there is some higher-level principle that the writer had in mind, and that higher-level principle should be added into the requirements higher up in the system.
For example, consider a data storage component where an engineer placed a requirement that all data be stored in an encrypted form. As written, that requirement doesn’t derive from any other requirement. But why did the engineer believe that encryption was necessary?
One answer is that encryption isn’t necessary. In that case the encryption requirement can be removed. Another answer is that the engineer wrote that requirement because they believed that the component would be storing confidential data that should be protected against disclosure. In that case, it is worth checking: does the system have requirements—or business objectives—about protecting confidential data? If not, then this exercise will have found a topic that has not been adequately addressed, and new requirements need to be added to make a correct specification. Those requirements should be added throughout the system, and the requirement we started with should show that it derives from those new features.
Many such requirements result from external standards that are supposed to be met, such as regulatory, safety, or security standards. Those standards should be included in the external objectives for the system, and their requirements should flow down through the system to the components where the standards apply. This produces a record of how the system’s design complies with those standards.
Some requirements that show how they are derived from some parent requirement are still not actually necessary.
There is no simple, mechanical way to find these unnecessary requirements. However, the analysis used to determine whether a collection of requirements is complete is also useful for finding these unneeded requirements.
Consider this example:
The requirement about encryption is not actually needed for the system in question. That is because the connection between the transceiver and command handling components is physically contained within the UAV, and the physical encapsulation provides enough security to protect the messages passing between the two. The encryption requirement can be removed with no loss of capability.
However, in this example, the engineer who wrote the encryption requirement had a good idea but expressed it wrongly. The engineer understood that the integrity of communication between the two components was important; a command that was properly received but garbled in being sent to the command handling component could be a problem. The encryption requirement should therefore be replaced by a less costly requirement: that the channel must protect the messages it carries against corruption.
Consistency in a body of requirements is when the requirements don’t contradict each other. If requirements do contradict each other, the system as specified isn’t implementable and the specification needs to be fixed.
Broadly speaking, there are three kinds of consistency that one should check:
As long as requirements are written as text, and not in a formal notation, consistency checking will be manual. It involves reading through each requirement, finding other requirements that address related topics, and checking that they are consistent with each other.
Some inconsistencies are fairly easy to detect. If one requirement says component X shall be blue and another says component X shall be red, it’s obvious—one must just read through all the requirements on component X and see that two requirements both deal with the color property and they say opposing things.
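This same-property check can be sketched mechanically, assuming each requirement has first been tagged with its subject component, the property it constrains, and the value it specifies; real textual requirements would need that tagging done by hand or by a parsing step:

```python
# Sketch of flagging "same property, opposing values" inconsistencies.
# Assumes requirements have been reduced to (component, property, value)
# triples; the triples below are illustrative.

from collections import defaultdict

requirements = [
    ("component X", "color", "blue"),
    ("component X", "color", "red"),       # contradicts the line above
    ("component X", "mass",  "under 2 kg"),
]

values_seen = defaultdict(set)
for component, prop, value in requirements:
    values_seen[(component, prop)].add(value)

conflicts = {key: vals for key, vals in values_seen.items() if len(vals) > 1}
for (component, prop), vals in conflicts.items():
    print(f"{component}: conflicting values for {prop}: {sorted(vals)}")
    # component X: conflicting values for color: ['blue', 'red']
```

This only catches contradictions stated in the same vocabulary; the harder cases discussed next require domain knowledge that no such mechanical grouping provides.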
Other inconsistencies are harder to spot because they do not use the same language in the properties they are specifying. As an example, one requirement might say component X shall use encryption algorithm Y while another requirement says component X shall use protocol standard Z. If protocol standard Z allows encryption algorithm Y, this is fine. But if the standard does not allow that particular encryption algorithm (perhaps because the algorithm is outdated and no longer considered secure enough) then there is an inconsistency.
Another class of inconsistency comes from the states a component can take on. Elsewhere in the specification of a component, there should be a definition of the state machine that the component is supposed to follow. The requirements translate that state machine into individual actions that the component is expected to take in response to particular inputs. It is easy—especially when editing or updating the component’s specification—to have two requirements: when condition A occurs, component X must transition to state Y and when condition A occurs, component X must transition to state Z. The inconsistency can be more subtle, such as leaving out some transition, or using inconsistent definitions of the condition that causes the transition. This class of problem can be addressed by having a single, clear definition of the state machine the component is expected to follow, and then checking the requirements against the state machine.
Finally, another class of inconsistency that can be hard to detect has to do with timing. Two requirements can impose timing constraints that cannot both be satisfied. For example:
When event A happens on component X, event B must happen within 10 milliseconds
When event C happens on component X, event B must happen within 15 milliseconds
Component X must perform the events A, C, and B in that order
Whether component X can meet the timing requirements depends on how quickly event C can follow event A: event B must come after C, yet still within 10 milliseconds of A. If other constraints force a longer gap between A and C, no schedule works. Building a timing model of the component in question, and performing a timing feasibility analysis using that model, can help find this kind of inconsistency.
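One way such a timing model can be built is as a system of difference constraints checked with the Bellman-Ford algorithm. To make the example concretely infeasible, an assumed minimum delay of 12 ms between events A and C is added below; that number is invented for illustration (the two deadlines and the ordering alone would leave room for a schedule if C could follow A almost immediately):

```python
# Timing feasibility via difference constraints and Bellman-Ford.
# Each constraint (u, v, w) encodes t_v - t_u <= w (milliseconds).
# The 12 ms minimum A-to-C delay is an assumed, invented constraint.

def feasible(variables, constraints):
    """Return True if the difference constraints admit a schedule."""
    dist = {v: 0 for v in variables}   # implicit source at distance 0
    for _ in range(len(variables) + 1):
        updated = False
        for u, v, w in constraints:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            return True                # converged: a schedule exists
    return False                       # negative cycle: no schedule

events = ["A", "C", "B"]
constraints = [
    ("A", "B", 10),    # B within 10 ms of A
    ("C", "B", 15),    # B within 15 ms of C
    ("B", "C", 0),     # C no later than B (the required order A, C, B)
    ("C", "A", -12),   # assumed: C at least 12 ms after A
]
print(feasible(events, constraints))  # False: deadlines cannot all be met
```

The infeasibility shows up as a negative cycle: B within 10 ms of A, C before B, yet C at least 12 ms after A sums to an impossible loop.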
This is by no means an exhaustive list of the kinds of inconsistency one must look for.
Systems change. This can happen because customer needs change, or because technology changes, or because someone has found a better design for part of the system. A good development process supports constant evolution and change of the design and implementation of a system.
Not every change that is proposed will be performed. When someone proposes a change, someone else will analyze the proposal to determine the effects of the change. Based on this analysis, people may decide to go ahead, postpone the change, or not make the change.
The analysis must accurately determine:
This analysis makes use of all the specifications in the system, but requirements are a major contributor. In particular, the derivation relationships help show how component features depend on each other, and thus help guide an analysis of how far some change will spread.
Top-level changes include adding a new feature to the system, removing a desired feature, or changing a standard or other external source of requirements.
If the change modifies a top-level requirement, look at the requirements derived from it and see whether they are still necessary and sufficient to satisfy the newly changed requirement. If they are, then no further action is needed. If they are not, then the derived requirements must be revised, possibly adding or removing some of them. The process then needs to repeat with these changed derived requirements. If the change affects a requirement that supports a different top-level requirement, then one must check that the other top-level requirement is still satisfied by the changed derived requirements.
If the change adds a new top-level requirement, work out what derived requirements are necessary and sufficient to satisfy the new requirement. Look for lower-level requirements that already exist that can also support the new requirement. This may involve a change in design, not just requirements; this will cause more changes to propagate out.
If the change removes a top-level requirement, see if any lower-level derived requirements are no longer needed or can be relaxed. If so, work downwards to propagate the effects of those changes.
Many more changes will come to lower-level components in the system. There are many reasons this can happen: because people have found that a design in process is infeasible or too costly; because a vendor’s part specification or availability has changed; or because someone has found a better design for some lower-level component.
Evaluating a lower-level change involves all the checks for a top-level change above, along with the need to see how the change will affect higher-level requirements. Will the change leave the higher-level requirement unsatisfied? Will this change make some other sibling requirement redundant (that is, the parent is satisfied without the sibling)?
Tracking down these effects is much easier if the derivation relationships among requirements are accurate.
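Given accurate derivation links, finding everything a proposed change may touch is a graph traversal. Here is a sketch with invented requirement IDs; `derived_by` is simply the reverse of the derives-from links discussed earlier:

```python
# Sketch of tracing the downstream impact of a proposed requirement change.
# Requirement IDs and links are invented for illustration.

derived_by = {                       # requirement -> requirements derived from it
    "SYS-1":  ["ACS-3", "COMM-2"],
    "ACS-3":  ["RW-7", "RW-8"],
    "COMM-2": [],
    "RW-7":   [],
    "RW-8":   [],
}

def impacted(changed):
    """All requirements that may need re-examination if `changed` changes."""
    result, stack = set(), [changed]
    while stack:
        req = stack.pop()
        for child in derived_by.get(req, []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

print(sorted(impacted("ACS-3")))  # ['RW-7', 'RW-8']
```

The traversal only bounds the search; as the text above notes, a person must still judge, for each flagged requirement, whether it remains necessary and sufficient under the change.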
Good tools help the process of evaluating changes. There are three features in particular to look for:
XXX rewrite this to bring into line with introductory language on deriving verification
Validation is the process of determining whether a set of requirements accurately reflects the needs of the system. This can mean that the system will meet customer needs, or mission needs, or other external objectives.
It is important to keep validation separate from verification, which is discussed below. Validation is about seeing whether the requirements (and the rest of the specification) are an accurate reflection of external needs. Verification is about seeing whether the implementation is an accurate reflection of the requirements. (Some software engineering texts focus validation on consistency, completeness, and similar properties. Systems engineering has generally kept those kinds of checks separate from validating customer or mission satisfaction.)
The validation process starts with checking the system objectives, business objectives, security and safety objectives, and regulatory objectives to see if they are an accurate reflection of the customer or mission needs. Presumably appropriate care has been taken while these objectives are being gathered and written down, but mission understandings or desires change over time and an independent check on the objectives will help avoid having problems be discovered late, when it is expensive to make changes.
At the top level, one should check:
At lower levels, one is checking whether the derived requirements from a parent are necessary and sufficient. The analyses for complete and minimal design, discussed above, cover those checks.
There are many different ways to validate a system’s specifications. They generally fall into two groups: analysis and simulation.
Validation by analysis involves people reviewing the requirements and using their judgment to check the specifications. This can involve performing joint reviews with stakeholders so that they can check the requirements.
Validation by simulation involves stakeholders somehow seeing a model of the system in action. There are many ways to do this. Stakeholders can be invited to define some scenarios that represent how they will use the system, and then try out those scenarios using a model of the system. Some ways we have done this include:
These validation exercises should be completed and the stakeholders should concur that the specifications are correct before one baselines the specifications, including requirements.
People must be able to navigate from a requirement to its associated implementation artifacts and vice versa. The people implementing a part of a system according to requirements need to be able to quickly and accurately find the requirements that they need to comply with. In the other direction, the people verifying requirements must be able to find the artifact or artifacts that implement a particular requirement.
The approach to organizing systems artifacts that I advocate here, which organizes much of the systems work around a hierarchical component breakdown structure, is designed to meet this need conveniently. The requirements that apply to a component are implicitly connected to the other specifications and the implementation of that component because they are all organized by the same component names and identifiers.
One can also explicitly label artifacts with component identifiers or requirement ids. For example, verification test specifications are associated with specific requirements, so the test specification needs to be labeled with the requirement ids that it applies to.
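Labeling artifacts with requirement ids makes the reverse direction cheap to compute: invert the labeling to get a requirement-to-artifact index. The artifact names and requirement ids below are hypothetical, chosen only to illustrate the two-way lookup.

```python
# Each artifact is labeled with the requirement ids it implements.
# Names and ids are invented for illustration.
artifact_to_reqs = {
    "motor_controller.c": ["REQ-101", "REQ-102"],
    "thermal_housing.step": ["REQ-102"],
}

# Invert the mapping so a verifier can go from a requirement to the
# artifact or artifacts that implement it.
req_to_artifacts = {}
for artifact, req_ids in artifact_to_reqs.items():
    for req_id in req_ids:
        req_to_artifacts.setdefault(req_id, []).append(artifact)

print(req_to_artifacts["REQ-102"])
```

An implementer uses the first mapping to find what to comply with; a verifier uses the inverted one to find what to check.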
Verification is the process of showing that the implementation of the system, or parts of it, complies with the requirements.
Verification involves gathering evidence that every requirement is satisfied by the implementation.
There are four general methods used to verify the implementation’s compliance:
Inspection is verification by having people review parts of the implementation to check that it complies with a requirement. The inspection review should be performed by people who did not implement that part of the system, so that the reviewers are not misguided by preconceptions (“I’m sure I implemented this correctly”).
Some inspections are particularly simple. Consider a high-level requirement that is the source for a few lower-level requirements. In many cases, the high-level requirement is satisfied when the lower-level derived requirements are all satisfied. In these cases inspection becomes a simple matter of checking that the derived requirements are all satisfied. The rationale associated with the derivation or with the high-level requirement should indicate when this situation applies.
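The simple inspection case above, where a parent requirement is satisfied exactly when its derived requirements are all satisfied, reduces to a one-line check. The ids and the verification bookkeeping here are hypothetical.

```python
# Hypothetical derivation and verification records for illustration.
derived = {"SYS-5": ["SUB-10", "SUB-11"]}
verified = {"SUB-10": True, "SUB-11": False}

def parent_satisfied(parent_id):
    """A parent is satisfied once every derived requirement is verified.
    This applies only when the rationale says derivation is sufficient."""
    return all(verified.get(child, False) for child in derived[parent_id])

print(parent_satisfied("SYS-5"))  # False until SUB-11 is verified
```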
Test and demonstration are similar. Testing is generally more exhaustive, and is used to verify lower-level components. A single electronic component, for example, might be operated across all the specified thermal, vibration, and atmospheric environments it must handle. Demonstration is less exhaustive, and is used to verify top-level system objectives. A prototype spacecraft radio transceiver might demonstrate that it can communicate with ground stations from an orbit similar to the one where the final spacecraft system will operate.
Some requirements cannot effectively be verified by test or demonstration, and must be verified using analysis. This occurs when one is verifying a negative condition: the verification must show that the system will not perform some action or be in some condition at any time. Providing evidence of the absence of some condition is a long-standing scientific and engineering problem because proving the presence of some condition is relatively easy (demonstrate it happens in one case and that is sufficient), but showing absence often requires exhaustive search. These verification problems often arise in safety and security requirements, where unsafe failures must be rare (e.g. no more than once in 10⁹ operating hours) or a system must resist a class of attacks (showing that no attack of that class will succeed).
Each requirement should have an associated verification specification. The specification should lay out what steps must be taken to determine whether the implementation is correct or not. A verification specification is often complex—many pages of documentation for a three-line requirement.
Verification status is a measure of how well the implementation matches the specification, including requirements. In practice this means how well a version of the implementation complies with a version of the specification, as both implementation and specification evolve over time. This means that, during design or implementation, there is no one single “verification status” that can be tracked: with each new update to the implementation, the verification status changes. Some practitioners and tools make the mistake of tracking verification status only in terms of requirements: which requirements have been satisfied by the implementation? This leads to project management errors when a change is made to the implementation that improves the implementation in one area but causes other parts of the system to go out of compliance—a common occurrence while in the middle of implementation using iterative approaches.
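One way to avoid the mistake described above is to record verification results keyed by both the specification version and the implementation version, rather than per requirement alone. The record layout and version labels below are illustrative assumptions.

```python
# Each record: (requirement id, spec version, implementation version, passed).
# The ids and version labels are invented for illustration.
verification_results = []

def record(req_id, spec_version, impl_version, passed):
    verification_results.append((req_id, spec_version, impl_version, passed))

def status(req_id, spec_version, impl_version):
    """Latest recorded result for this requirement at this combination of
    versions, or None if it has never been verified at this combination."""
    for rec in reversed(verification_results):
        if rec[:3] == (req_id, spec_version, impl_version):
            return rec[3]
    return None

record("REQ-7", "spec-v2", "impl-v5", True)
record("REQ-7", "spec-v2", "impl-v6", False)  # a later change broke compliance

print(status("REQ-7", "spec-v2", "impl-v5"))  # True
print(status("REQ-7", "spec-v2", "impl-v6"))  # False
print(status("REQ-7", "spec-v2", "impl-v7"))  # None: not yet re-verified
```

Keying by both versions makes it explicit that a requirement verified against one implementation version says nothing about a newer version until it is re-verified.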
Requirements have limitations. Writing a good specification for a system means understanding these limitations and addressing them in one way or another.
One limitation is that requirements are written in natural language. Human language is notoriously difficult for pinning down precise meanings, even within a single group of people. Specifications, including requirements, are used to communicate between different groups of people with different outlooks, experiences, and jargon. This makes it hard to write requirements that will be interpreted the same way by all of the people involved.
The limitation of natural language can be partly mitigated using a couple of techniques. One is to maintain a glossary that defines words or phrases that have specific meanings in the specification beyond common understanding. The second is through social cohesion: having enough people from different groups interacting and discussing the system so that they evolve a common understanding of the meanings of things.
Precision is another limitation. Some specifications can be clear and simple in mathematical notation, while they are hard to follow in prose. (Consider expressing Newton’s law of gravitation as an equation versus in prose.)
A third limitation comes from requirements being single statements. Sometimes the specification needs to encode a complex, multistep activity. Each of the steps might be encoded as an individual requirement, but the result is awkward and hard to understand. Sometimes the better answer is to write that part of the specification in a different form: a flowchart, a state machine, or a set of equations.
As a result, requirements are only one part of the total specification. They cannot do the entire job of recording the full specification of the artifact in question—but they are often the most flexible way to organize most of the specifications. Be prepared to supplement textual requirements with other kinds of specification to get the whole job done.
This chapter has mostly covered what requirements are. This section touches on what one does with them and how they evolve over time.
Requirements will change continuously over the life of a project. The rate of change will be high at the project’s beginning, when the team is trying to sort out what the system should be. The rate of change will increase after the high-level system purpose is sorted out and as the design work proceeds in parallel on different components in the system. The rate will taper off as the design and implementation become more mature, with occasional bumps as people find problems with the specifications, or as stakeholders request changes. Ideally the rate will reach zero when the system is ready to go operational, but even while in use people will find changes they would like to make.
Detailed requirements are expensive to develop and maintain. They encapsulate the complexity of how all the parts of a system are interconnected. They require effort to develop in the first place, involving checking for consistency and feasibility across large parts of the system. Changes later involve even more effort, especially if the changes involve reorganizing specifications that have already been developed.
This leads to a tension: changes will always happen, especially with modern, flexible systems, but the cost incentivizes developing all the requirements at once and then freezing them to minimize the cost of change.
This tension is unavoidable, but there are things one can do to reduce the difficulty.
The requirements for a system, and indeed all the specifications for the system, grow and evolve over time. When and how requirements change depends on the development process a project is using. However, all these processes have some tasks in common.
Collaborative development. In some phases of developing the specifications and requirements for a system, there will be many unknowns and the possible specifications will be in constant flux. In periods like this, many people will be involved in writing down possible requirements, often collaboratively. In phases like this, what matters to people is the ability to quickly sketch out some requirements, and the ability to share and collaborate on these sketches.
Incremental change. At other times, when the requirements and specifications are more stable, there will be incremental changes to the requirements. When someone makes a request for a change to the system, a systems person will need to evaluate the effects of that change. The ability to trace out the implications of a change using derivation relationships helps make the analysis process accurate. As the systems person works out the effects of the change, they need to be able to create an independent working version of the requirements where their updates will not affect an official, baselined version of all the specifications.
Baseline. While the requirements and specifications will be in some degree of flux all the time, the people who use those requirements need stability. The most common approach is to designate a version of the requirements as the current stable version, and then control updates to that stable version. The stable version goes by different names in different fields: baseline, release, plan of record, committed version. For the purposes of this document, we use the term baseline.
A project should use a configuration management or version management process to maintain the baseline requirements. There are many tools that automate such processes. The key features needed are that
Review and approval. People will propose updates to the system’s design as a project moves forward. This occurs often at the beginning of a project, as the design goes from vague ideas to concrete specifications; it continues during the life of the project as stakeholders ask for changes, as engineers find problems or improvements with the current design; and it can continue after a system is released to operation, as people find problems in actual use. These changes will result in specific proposed updates to the requirements. The proposed updates need to be checked before they are accepted and applied to the baseline. Once applied to the baseline, everyone developing the system implementation will need to work to revise their part of the implementation to match, and verification steps will be required, and so on—thus it is important to control changes to the baseline to be sure that they are sound and within the project’s scope before committing to them.
Projects generally use a review and approval process to decide whether to apply an update to the baseline or not. In the review part, systems engineers check the updates to ensure they meet guidelines, including consistency, completeness, and minimality. People who will be affected by the update are asked to review the update, to evaluate whether it is technically correct from their point of view and whether the change is feasible. Project managers are asked to evaluate the update to determine whether the change is in scope and whether there are resources to accommodate the change. If all those parties agree, then the update is approved and someone creates a new requirements baseline that incorporates the changes.
Verification. The implementation of the system needs to be verified from time to time to ensure that what is being constructed complies with specifications. Verification can happen at many different times and with different scopes. As someone implements a feature into a component, verification tests can provide immediate feedback to the implementer. In software development, this is related to test-driven development. Regular verification activities can detect whether a change in the implementation in one place has had an unexpected consequence that causes something else to go out of compliance. This is sometimes called continuous integration testing. When a vendor supplies a prototype component, the prototype needs to be verified for acceptance testing. And when the system is believed to be complete, final verification checks are required before the system enters into operation.
Many people generate or use requirements during the lifetime of a project. These include:
The right tools make working with requirements much easier and more accurate. However, different requirements management tools are designed to support different styles of requirement writing and use, so you need to choose tools that match how you will write, organize, and use requirements.
Here are some questions that can help you evaluate requirements management tools.
People will use the requirements management tools to perform a number of tasks. You should evaluate how well requirements tools support these activities.
Previous chapters introduced how to work out what a system or a component should do, by determining what the objectives are for it and then turning those objectives into a specification.
The next step is to design the system or component that will fill those needs.
A design for a component provides a simplified model of how the component will achieve the behaviors, qualities, and structure laid out in its specification. The design is not the full details of how it will achieve those things, or a detailed implementation. The design is a plan for how the component will be built, at a high level; it records the high-level decisions about how the component will be implemented without actually being the implementation.
“Design” is an activity that lacks sharp boundaries from other development activities. On the one hand, it responds to the objectives and specifications that have been developed for the thing being built; on the other hand, the act of designing usually reveals gaps in the specifications that lead to feedback that causes people to update the specification. Specification and design proceed recursively as a system is built, where the act of designing one component leads to writing specifications for its subcomponents.
“Design” also lacks a distinct boundary with “implementation”. Indeed, the boundary between the two varies by convention in different disciplines.
Given the diversity of ways the word “design” is used, I will define what I mean by the term in general.
A design is:
A design is not:
In some projects I have used the term “design model” for the design, to emphasize that the design is a simplification and explanation of the most important aspects of the component’s implementation.
There are several kinds of information that should be recorded in a design.
All of this information should be annotated with a rationale for the decisions that led to the particular design.
Why should one take the deliberate and separate step of putting together a design for a system or component, rather than just implementing a component directly based on its specification?
For an exceptionally simple component, one can skip design and just implement the component—but the component must be truly simple, completely understandable from its implementation, involving no significant design choices, and with no future need to change the component, for this to pay off in the long run.
The value of an explicit design comes partly from its abstraction and simplification, and partly from being done mostly before putting together the detailed implementation.
Time to reflect. This is perhaps the most important reason to take the time to build a design before implementing a component. Modern systems are deeply interconnected. The design choices for one component have effects not limited to that component, and the design choices must usually reflect the needs that many other components place on the one being designed. It takes time to find and understand all these interdependencies.
Many components can be designed in multiple different ways. It is often useful to spend some time developing multiple design approaches before settling on one of them. It is common for one of two or three candidate approaches to impose requirements on some subcomponent that are difficult to achieve, and that difficulty may not reveal itself until people have proceeded into the specification and design of that subcomponent. Only then may one realize that an alternative design for the original component is better.
Finally, the design needs to support all of the component’s or system’s specification. Rushing through the design increases the likelihood that some essential requirement will get missed, leading to problems later when the component is integrated with others, or when the system goes into operation, and a subtle failure occurs.
Balanced and incremental design. Modern, complex systems involve many different kinds of constraints on components. A component may need to meet all of structural, safety, functional, security, reliability, environmental, maintainability, user interface, and budget constraints to meet its specification and thus to function correctly in the system as a whole.
I have found that focusing too much on any one of these aspects leads to an unbalanced design that does not meet some other aspect. This can lead to repeated partial design followed by redesign after redesign, each time focusing on a different aspect.
The alternative is to consider a little of each aspect at the same time, working to find a rough design that looks like it will be going in a feasible direction for all of these aspects. After there is a rough design, one can go into greater depth on individual aspects with lower risk that the dive into one area will result in not meeting constraints on another aspect.
As one example, reliability and safety often work against each other. The safer choice is often to shut down a component rather than trying to keep it in operation after a failure. Conversely, the redundancy needed to increase reliability increases the complexity of the component, leading to more conditions that could lead to a safety violation.
Guide and explanation. Multiple people will use a design over the course of a project. While one person may develop the first design, others will analyze it for safety or security; still others will review the design for completeness or correctness; one or more people will use it to implement the component; other people will use it to develop and perform verifications. Later, other people will use the design to understand a component that may need a bug fix or feature change.
In other words, the design is for communicating among many different people and over potentially long periods of time, when the people who originally made the design are no longer available to answer questions from their memory.
For all those people who work on the component later, the design provides a guide to understand how the component is organized.
All too often, an engineer is asked to figure out why some existing software component is not working as expected. There is no design, just the source code. The engineer has to try to extract the design from the source code in order to figure out where the component is not behaving as it should. Extracting the design takes time and effort that could be avoided if the design could just be consulted. An extracted design is rarely accurate: the source code does not have a record of where there are subtle, unobvious aspects of the design; nor does it record why the design is what it is. The result is greater cost and time required to update the component, and a higher risk of a change introducing more problems than it fixes.
Decision rationales. A good design includes an explanation of why particular decisions were made. This information helps those who review and analyze the design to determine whether good choices were made. More important, the rationale informs the people who later need to update or redesign the component.
Any electronic board that remains in production for more than a handful of years will commonly run into a situation where some chip is no longer available. The manufacturer has stopped making the original chip X, but another manufacturer makes a chip Y that is supposed to be pin-compatible with chip X. Is it okay to substitute chip Y for chip X? That depends on what it was about chip X that made it the choice. If the choice was based just on the basic chip function, the substitution is probably okay. However, if the choice was based on something unobvious, like chip X's radiation tolerance resulting from a particular lithography technique, chip Y may not be an acceptable replacement. The only way to know that the radiation tolerance was a key part of the decision is if someone writes down that rationale.
Supporting analysis. Many key component properties, especially those related to safety, security, or reliability, are emergent from the design. It is increasingly evident that these properties are difficult to retrofit into a completed design: they involve the fundamental organization of elements of the design.
This leads to approaches of security-guided or safety-guided design. In these approaches, the security or safety properties are considered from the start and included in the design. As the design progresses from a rough sketch to something more detailed, it can be analyzed with progressively greater accuracy to determine whether these properties are being met.
This approach is relatively inexpensive and easy when it is being done as part of the original design effort. A safety analysis can determine what high-level aspects of a control loop are essential for safe operation; a security analysis can determine what information flow properties must be met to maintain security. These analyses help early pruning of potential design approaches that would not meet safety or security needs.
The alternative is to proceed without including safety or security considerations, then having to go back and work out control or data flow on a more complex design, then repeat parts of the design process while undoing earlier decisions. Repeating work like this takes more time and effort, and is more likely to result in an implementation that has safety or security flaws.
Alternative designs. In the early stages of designing a complex component, there are likely to be multiple different approaches for the component. The choice among the approaches is often not immediately evident. Which one uses chips that will be available on the needed schedule at the needed quantity? Which one uses a subcomponent that will require significant research to make work? Which one will require a significant up-front investment in acquiring long lead time parts? Which one will be acceptable to regulatory agencies? It may take quite some time and effort to find answers to these issues: prototyping a subcomponent, making legal arrangements with suppliers to find out about availability, and so on.
When there are these kinds of risks in the designs, it is helpful to explicitly keep multiple designs open during the investigations, and to delay investing in detailed implementation effort on any one design that would not be useful if that design turns out not to be feasible.
As noted above, a design enables communication among multiple people, across different times, and for different purposes.
Developing the initial design. One or more people take the objectives, CONOPS, and specification for a component and eventually produce one or more potential designs for that component.
Developing the design is not a single, monolithic activity. It almost always proceeds incrementally, evolving the design from a rough sketch through multiple ideas that turn out not to be quite right until reaching a design that looks like it will meet the component’s specification. The designers will need to try out multiple ideas along the way, meaning that what they document will need to evolve as they try different approaches.
The process of assembling a design can be characterized as working through each of the elements of the specification, while at the same time matching the specification against the possible building blocks for the component. As a simple example, this might involve matching a specification for an electrical energy storage system to store X mAh of energy against a catalog of available battery products.
Actual component specifications involve multiple aspects, some of which will work against each other. A realistic electrical energy storage system must meet performance specifications such as the amount of storage, maximum safe current, reliability constraints, and a number of constraints related to safety. This leads to the recommendation that a designer consider many specification aspects at once, but only at a high level, before going into greater detail.
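The battery example above, extended to several aspects at once, can be sketched as filtering a parts catalog against every constraint simultaneously. The catalog entries, field names, and constraint values are invented for illustration.

```python
# A toy parts catalog; entries and values are hypothetical.
catalog = [
    {"part": "BAT-A", "capacity_mah": 2000, "max_current_a": 2.0, "safety_cert": False},
    {"part": "BAT-B", "capacity_mah": 3500, "max_current_a": 1.5, "safety_cert": True},
    {"part": "BAT-C", "capacity_mah": 4000, "max_current_a": 3.0, "safety_cert": True},
]

# A specification spanning storage, current, and safety aspects.
spec = {"capacity_mah": 3000, "max_current_a": 2.5, "safety_cert": True}

# Keep only the parts that satisfy every aspect at once.
candidates = [
    entry["part"]
    for entry in catalog
    if entry["capacity_mah"] >= spec["capacity_mah"]
    and entry["max_current_a"] >= spec["max_current_a"]
    and entry["safety_cert"] == spec["safety_cert"]
]
print(candidates)
```

Note how parts that look adequate on one aspect (BAT-B has plenty of capacity) drop out when all aspects are checked together, which is the point of considering the aspects in parallel rather than one at a time.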
In the end, the designers must either show that the design they have created fulfills the corresponding specification, or show that the specification is flawed in some way and feed that information back to the people responsible for the specification to get it changed.
Tracking alternative designs. There are usually many ways to design some component, with pros and cons to each. Early in design, there may be multiple promising approaches that require more investigation before a decision can be made among them.
This means that each of the alternatives needs to be documented, along with the investigations needed for each of them, until a decision can be made. It must also be clear to everyone working with the alternatives which one is which. When one alternative is selected, that choice must be clear to everyone working with the designs.
Evolving a design. Every design will evolve, both during the initial system development and over time as the system is used or upgraded or fixed. Any change to the design needs to be evaluated for its scope, its effects, and its correctness.
Evaluating scope and effects means determining what effects the change will have in addition to the specific change being considered. A change in one part of a component might affect some safety property of the component as a whole, for example. A change might also affect some behavior or structure that some other component depends upon, possibly indirectly across multiple intervening components. Substituting one chip for another in a board design might change the timing of some signal, which leads to a subtle change in the sequence of operations performed by software on another board, which in turn invalidates a monitor watching for faults.
Evaluating correctness involves checking that any analyses done on the previous design to show that safety, security, or other properties hold either continue to hold or that the analyses can be adjusted to show that the updated design still meets those criteria.
Analyses. Complex systems will have a number of properties they must exhibit to be correct. These include safety, reliability, and security properties; they also include meeting business objectives and other more mundane properties.
Safety- and security-guided design methods involve incrementally building up these analyses as design progresses, so that a simple, preliminary analysis can provide input to an evolving design.
When a design is believed to be complete enough to select and baseline, it will need review to ensure that it meets all of its specification. Part of this review involves checking the analyses that show the design is compliant. The reviewers need to have the analysis in order to check it.
When a change is being made to a component’s design, the analyses provide a starting point for analyzing the effects of the changes to check that the safety, security, or other properties will continue to hold if the change is made.
Generate specifications for lower-level components. The choice of what subcomponents will be part of a component is a major part of the design effort. The choice of subcomponents means that the role each subcomponent will play has to be worked out; this amounts to developing a specification for each subcomponent.
A subcomponent’s specification is a reflection of the component design. The subcomponent will only work properly as a part of the component if it meets that specification. This leads to the layering principle discussed earlier.
Navigating through the system. Many people will need to find things in the system over time—developers, reviewers, auditors, and many others. Virtually none of them will come in with a complete understanding of the system and its structure, so they will need a guide that helps them learn the structure of the system and find where some behavior or feature is implemented.
The system design can support such users in three ways. First, the design can provide the breakdown structure, showing how the system is divided into components, those into subcomponents, and so on. The breakdown structure also groups related components together, so that a user can narrow down where they are looking. Second, the design can show how components are related to each other. If one component in one part of the system is providing feedback signals to a component in a different part of the system, making these relationships explicit provides a way for a user to trace out these interactions. And third, including explanations or rationales for why the design is the way it is helps educate the user about subtleties that are not going to be apparent from just reading about the structure, interactions, or behaviors.
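The first of these three supports, the breakdown structure, can be illustrated as a small hierarchy that yields the chain of components leading to any feature. The component names here are hypothetical.

```python
# A minimal component breakdown structure; names are invented.
breakdown = {
    "spacecraft": ["power", "comms"],
    "power": ["battery", "solar-array"],
    "comms": ["transceiver", "antenna"],
}

def path_to(target, root="spacecraft"):
    """Return the chain of components from the root down to `target`,
    or None if the target is not in the breakdown."""
    if root == target:
        return [root]
    for child in breakdown.get(root, []):
        sub = path_to(target, child)
        if sub:
            return [root] + sub
    return None

print(path_to("antenna"))
```

A reader looking for the radio behavior can follow the returned chain instead of searching the whole system, which is the narrowing-down effect the breakdown structure provides.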
Guiding project management. As the design progresses, there will be more components to design than there are people to work on them, and some components will be ready to implement or verify. Project management must make decisions about where to put effort.
Project managers will need information like how risky some potential component designs are, as opposed to those component designs that are fairly certain and thus reasonable to implement. They will need to know which component designs have significant uncertainty, and which would benefit from investing resources to prototype a potential design.
These decisions benefit from information that can be gathered and maintained in the overall system design, such as:
Progress tracking. Project management needs to be able to track the development progress of different parts of the system, in order to determine whether a project is on track for completion or is having problems that need to be addressed.
Being able to name each of the components that need to be developed, and being able to determine the development progress on each of them, enables project tracking.
As well as all the uses listed above, the developer uses the design as a guide for the implementation. The resulting implementation must be consistent with the design: having the same structure and behavior, including all the functions in the design, and including no functions not in the design.
The developer or implementer must be able to understand the design to build a component that matches the design. The developer must also be able to check that they understand the design properly, so that there is a way to catch misunderstandings. A good design uses consistent structure, terminology, and diagrams to aid understanding. It provides a glossary of terms that may have multiple meanings to define how they are used in the design.
Developers will find problems with the design as they proceed through implementation. They may find ambiguities, where the design is unclear or where the design does not address some important condition. The developer may find errors, where the design is inconsistent internally or with its specification. The developer may find that parts of the design aren’t feasible to implement. All of these problems need to be fed back to designers for clarification or correction.
When the design changes, the developer needs to be able to identify what parts of the design have changed so they can change the corresponding implementation. The change might come in response to feedback from the developer, or evolution of the design to address changing needs or broader system fixes. This can be supported by using tools that track design versions and highlight design changes between versions.
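As a minimal sketch of this kind of tooling (using Python's standard difflib, with hypothetical design statements), comparing two versions of a design kept as plain text highlights exactly what changed, so a developer can find the affected implementation:

```python
import difflib

# Two hypothetical versions of a component design document,
# one statement per line.
design_v1 = [
    "EPS battery: capacity 40 Ah",
    "EPS battery: chemistry Li-ion",
    "EPS battery: two redundant strings",
]
design_v2 = [
    "EPS battery: capacity 50 Ah",
    "EPS battery: chemistry Li-ion",
    "EPS battery: two redundant strings",
]

# unified_diff marks removed lines with "-" and added lines with "+";
# filter out the file headers to keep just the changed statements.
changes = [
    line for line in difflib.unified_diff(design_v1, design_v2, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changes)  # → ['-EPS battery: capacity 40 Ah', '+EPS battery: capacity 50 Ah']
```

A real project would run this kind of comparison against versions stored in a version-control system rather than in-memory lists, but the principle is the same.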
Finally, the developer must be incentivized to follow the design (or provide correction feedback) as they implement the component. This includes having the designer and independent people review the implementation to compare it to the design. If they find that the design and implementation are not consistent, they must decide on how to change the design, the implementation, or both in order to achieve consistency. The component implementation should not be accepted as complete until they match.
The artifacts that record the design enable all the usage cases listed above. The key functions they need to fill include:
The designs for a system need to be available to everyone associated with the project, so that they can use the design to learn about the system and navigate through it.
An ideal solution provides a “single source of truth”: a user can go to one place and see all of the information about the system. The ideal solution also ensures that the user always sees a single consistent version of all the information. To the best of our knowledge, at present there are no systems that completely meet this ideal. However, there are ways to come close by integrating multiple tools and applying conventions to how they are used.
The infrastructure for maintaining designs needs to, at minimum:
The following sections list the key artifacts that should be part of a design. Later chapters will detail these artifacts.
The breakdown structure consists of the hierarchical relationship of system, components, and their subcomponents recursively. It gives a name or identifier to each component, and provides the index or table of contents to the parts that make up the system. See ! Unknown link ref.
A complex system will have behaviors or structures that cross multiple parts of the system, and don’t neatly fit within a single hierarchy of components. There are two important examples of these behaviors to document.
The first example is behavior or activity sequences that show how different parts interact with each other. These are sometimes documented as UML or SysML activity diagrams, which show how control or data pass among components, and how different components take actions in response to those. The point of these patterns is to show how components work together, which informs the interfaces, actions, and states that the components involved in the activity must support.
The second example is the hierarchies of control that operate in the system. These document how one part of the system controls the functions in other parts, including how some components provide sense data to drive the control logic, and how the control logic in turn sends commands to other components to effect control actions. Documenting and analyzing these control systems is an essential part of some safety and security process methodologies, such as STPA [Leveson11].
Each component in the system should have its own design. This is the primary content about individual components, as opposed to how components work together.
A component’s design can be represented in many different ways. However, it is easiest for users if all designs follow the same general format so that they know how to find particular kinds of information within every design.
All designs should include:
Each component’s design should include rationale: the reasons why different design choices were made. This information helps those who must come along later to review or update the design.
In some cases, not all of this information can be represented in one way or in one tool. For example, for electronics designs the best way to represent some information will be in a CAD drawing that is maintained in a separate tool from the rest of the design information. In these cases, there should be unambiguous references from the main design to the CAD drawing and vice versa, and the versioning in the main design should be reflected in versioning in the CAD tool.
Part of the reason for developing a design—as a simplified model of what will be implemented—is to enable analysis of the design’s essentials. These analyses address whether the design will meet aspects of the component’s specification. These can include safety and security, as well as meeting business objectives, regulatory requirements, performance specifications, or resource budgets.
As I will discuss in the next section, it is recommended practice to develop these analyses incrementally in parallel with the design itself. In this way, a rough analysis of a rough design can provide quick, early feedback that will guide the design toward meeting its specified properties as it is developed in more detail.
These analyses become an important part of the record of a design once complete. They provide an extended rationale for why the design is the way it is. They may be needed to answer to external stakeholders, including regulators or courts of law, when it becomes necessary to provide evidence why the design is acceptable. The analyses also help people who must later evolve the designs to understand both the constraints on what they can change, and where they have freedom to make changes without invalidating the safety or other properties of the design.
As a matter of principle, the design for a system or component should be done after its objectives and specification are done, and before its implementation. Similarly, the design for the components in a system should proceed top down, starting with the system as a whole and proceeding to lower and lower level components. When the design of one component depends on the design of another, the two should be designed together.
These principles often lead people to conclude that systems should be built using a waterfall-like process, where everything is specified before design, everything designed before implementation, and so on.
Real projects are not so simple. I have never observed a project that actually used such a process, even when they tried to. This is because every complex system I have encountered is not fully and accurately knowable in advance. One can write a set of specifications that turn out to require some impossible component design. One might miss some important system objective when developing the initial system concept because the customer was not able to conceive of system operation until they could see part of the system in operation, or because the customer’s needs change. An initial design may be invalidated because a supplier discontinues an essential part. Some part of the system may require significant investigation or research before one can find a feasible way to approach its design.
All of these situations lead to cases where the specification, design, and implementation of the system do not proceed in a tidy one-way sequence through the waterfall stages. Instead, part of a component’s specification gets worked out, and some tentative design goes ahead using that partial specification. Or multiple possible design approaches are defined, and then someone proceeds to build simple prototype implementations of two or more of them to compare their feasibility. Or the design for a component must change, leading to a change in implementation. All of these may be happening in multiple parts of the system at once.
At the worst, all this change happening all over a system can lead to chaos where people working on different components are working to incompatible specifications or designs and building parts that will not integrate into a system. Project management may not be able to determine how much progress has actually been made on any part of the system, and thus be unable to detect when there are schedule or resource problems.
Therefore, while the simple waterfall model is not a feasible way to organize the work on a system, there is still a need to organize development work.
The principles I started with are good ideas in general, when used flexibly.
Develop specifications, then design. When one designs a component without first working out what the rest of the system needs that component to do, one usually ends up with a design that doesn’t actually meet needs (once those are worked out). When a specification gets developed, the people involved will tend to look at the effort that has already been spent on designing (and possibly implementing) the component and will try to adjust the specification to fit that sunk cost—after all, that work has already been done, why should it be discarded? Unfortunately this tends over time to produce safety and security problems, and to dramatically increase the cost of the system as people try to integrate the wrong component into the rest of the system.
It is better to explicitly defer some design decisions until the specification is firm—but not to avoid doing any design. (Deferring all design until the specification is done forgoes the way design activity can reveal problems with a specification.) Do a minimal amount of design, bearing in mind the risk that the design may need to change as the specification changes, as well as the risk that the specification may need to change as design reveals problems.
Instead:
Develop design, then implement. Similar to the way design reflects specification, the implementation reflects design. Proceeding with implementing a component before it has been designed is not really possible: doing so means that design is done implicitly and is left unrecorded. This leads to components that fail to meet functional, safety, or security constraints because those constraints have not been properly considered and analyzed before committing effort to implementation.
At the same time, deferring all implementation until all design is complete is a recipe for an infeasible system. It is all too easy to create a design that involves impossible feats of implementation, from requiring metals that do not currently exist (“unobtainium”) to algorithms that have not been invented.
I have found that a middle ground often works well. As I will discuss in future chapters on implementation, I have used a software implementation approach that emphasizes continuous integration (by which I do not mean continuous testing) and skeleton building for implementation, where the implementation proceeds in many small iterations. Using this approach the team can build a simplified implementation of the general structure of a component, focusing on those aspects where the design appears either to be relatively certain or where there is higher risk in the design that needs to be checked with a rough implementation.
I have also made a point of prototyping implementations of parts of a design in order to validate whether the design is feasible. I will also discuss prototyping in a future chapter.
There is a high risk with any implementation done before specification and design are solid, even when the implementation is done for good reasons (like prototyping to validate a design approach). The effort spent on implementing something is a sunk cost: it cannot be recovered. As the design evolves, there is a strong incentive to try to continue to reuse the implementation that has already been completed, as the incremental cost or time of modification is almost always perceived to be less than starting a new implementation from scratch. This leads to a sequence of incremental changes, each of which by themselves can be perceived as the lower-cost way of handling a sequence of design changes. However, it is often the case that after a few of these incremental changes, it will have become more cost-effective to have thrown away the initial prototype or implementation and started over with better information. This sequence of incremental changes also tends to result in an implementation that has many vestiges of implementations that are no longer applicable, but which continue to present a source of bugs, security flaws, or safety problems.
The cost of incrementalism is often apparent only in retrospect. It is also driven by basic business imperatives to minimize cost at each step, or to get features implemented as rapidly as possible. This is an example of an online optimization problem, which is often hard to solve well theoretically and even harder when human incentives are involved. The techniques used to solve similar online optimization problems (notably the ski rental problem ! Unknown link ref) apply. Limiting the amount of implementation effort that may be at risk for incrementalism by deferring as much implementation as possible until the design is solid helps avoid this situation.
Thus we:
Design top down and coordinate the design of interdependent components. Many aspects of a system’s design can only be developed effectively when they are developed from the top down, notably safety and security properties. That is because these properties apply to the system as a whole and are emergent from the designs of the components that make up the system. (See [Leveson11] for an in-depth discussion of this effect.)
However, designing from the top down creates risk, similar to the previous principles, that a high-level design may create unachievable specifications for lower-level components. There is also a risk that during high-level design the cost or time involved in developing some lower-level parts of the system is unknown. This can lead to effort being spent, unknowingly, on subcomponents that are simple to design and build while subcomponents that will take far longer to develop are left for later, leading to a drawn-out schedule.
Our recommendation for managing this risk is to sketch the design for multiple layers, creating a rough outline of a design for a component and some layers of its subcomponents, then proceeding to add detail to the high-level component and fleshing out the specification for its subcomponents. Proceeding incrementally in this way allows one to obtain some information about the feasibility and complexity of a particular design approach before committing all of one’s effort to the detail of the top-level component. This approach is similar to our recommended implementation approach of building skeletons or prototypes of components rather than immediately progressing to detailed implementation.
The same issues about the cost of incrementalism apply to top-down design as they do to implementation. It can be useful to make sketch designs that are not in the final form needed, to reduce the temptation to turn sketches that have been changed over and over directly into the design for a component.
Balance design work. I have found that focusing on one aspect of a component’s design to the exclusion of others often leads to dead-end designs, where a work in progress becomes too biased toward one aspect and is not readily evolved as other aspects begin to be considered. Focusing on primary features first, and leaving security or safety for later, is a common example of this pattern.
I have found it more useful to consider many different aspects of a component’s design at a high level, sketching out different rough possible designs and making simple comparisons as one learns about the design problem. This approach has the advantage of investing relatively less effort on detail design and analysis while the design has a higher degree of uncertainty, and focusing effort on those approaches that pass the first simple evaluations.
This approach to design has its pitfalls. Some components’ designs are constrained by particular aspects—such as a need for high performance or the ability to operate in an extreme environment. These aspects are sometimes called design drivers: they have a disproportionate effect on the final design. Recognizing when some aspect drives the design in this way, and putting more effort earlier into understanding these drivers, is part of the art of designing well.
Plan for updates. Nearly every design in a successful system will be updated as time goes by. Over time, the effort spent on these updates will dwarf the effort spent on the initial design. This means that if one is developing a system for the long run, the processes, tools, and artifacts used in the design effort should be organized in a way that supports those who will come along to learn about, evaluate, and redesign parts of the system—long after those who initially designed it have moved on.
This necessitates documenting more than just the structure of the implementation. For these people to understand a design, they need to know the thinking behind the choices and the subtle aspects that are not necessarily apparent from looking at the implementation. They will need guidance for how components relate to each other. They will need to understand the analyses that determined whether the component’s design was sufficiently safe or secure. This documentation takes more effort than proceeding through a one-time design, building an implementation, and then moving on, but it provides a project with a future.
Making updates effective also involves creating a team structure and human processes that can handle updates. This involves giving the team a clear way to understand how design changes happen, and how to distinguish proposals or work in progress from a design they should work from, or how to determine what design applies to a specific deployed system. It also involves developing a team culture that incentivizes good design and good documentation, giving them enough time to document enough design that their successors can build on their work and avoiding creating unnecessary time pressures that disincentivize people from doing good design.
Use appropriate infrastructure. Finally, effective design relies on having tools, processes, and standards that give people the means to do design work. The key principles I recommend include:
The component breakdown, or breakdown structure, is the way to name and organize all the components that make up a system.
The component breakdown organizes and names all the pieces in the system. It serves three main purposes:
These purposes lead to a few objectives that a breakdown should meet.
Some institutions, notably NASA [NPR7120][NASA18] and other parts of the US Federal government [DOD22], specify the use of a work breakdown structure (WBS) in project management and systems engineering. A WBS as used in those projects is different from a component breakdown structure as defined here.
A WBS is oriented toward project management, not systems engineering. It is focused on defining the work to be done (hence the name) rather than the items or components being built by the work. From the NASA WBS Handbook [NASA18, p. 35]:
The WBS is a project management tool. It provides a framework for specifying the technical aspects of the project by defining the project in terms of hierarchically-related, product-oriented elements for the total project scope of work. The WBS also provides the framework for schedule and budget development. As a common framework for cost, schedule, and technical management, the WBS elements serve as logical summary points for insight and assessment of measuring cost and schedule performance.
Other project management methodologies define a work breakdown structure as, in effect, a checklist of the kinds of work that may be required for a system, feature, or component. McConnell discusses using a generic work breakdown structure in estimation to ensure all the effort involved is accounted for [McConnell09, Table 10-3].
This difference in intent leads to two major differences in the contents of a WBS compared to a component breakdown. The first is that a WBS includes work items that are not product artifacts. The standard NASA WBS, for example, includes project management, systems engineering, and education and public outreach branches of the work breakdown tree [NASA18, p. 47]. Given that part of the goal of the WBS is to organize resources and budget for a project, that’s an appropriate choice. The other difference is that some people break a task for building a component down into multiple revisions or releases. For example, a “motor control software” component might have subitems “prototype”, “release 1”, and “release 2”, recording the phases of work done to develop that software package.
The component breakdown structure presented in this chapter is narrower in focus than a WBS. The component breakdown lists only the things that are being built. It must be complemented by other engineering and management artifacts to provide everything needed to run a project.
The component breakdown is one of several views into the system’s design and specification. The component breakdown has only two purposes: listing all the components and giving them unique names, and providing a structure that people can use to navigate through the components to find one they are looking for.
The component breakdown is not for expressing other facts about components and relationships between them. There are other views and other breakdowns for representing that information—and for doing so in ways that are better suited to the specific information that needs to be explained. For example, a network or wiring diagram does a better job of illustrating how multiple hardware components are connected together. Mechanical drawings are a better way to show how components relate to each other physically. Data and control flow diagrams, perhaps realized as SysML activity and sequence diagrams, are better suited to expressing relationships between software components.
When developing a component breakdown, the first question to be settled is: what is a component?
First, a component is something that people think of as a unit. Terms like “system”, “subsystem”, or “module” are all clues that people think of a thing as a unit. More generally, a component is something
Components do not have to be atomic units. Systems have subsystems; components have subcomponents. For example, the electrical power system (EPS) in a spacecraft is a medium-level component in a typical breakdown structure. It is part of the spacecraft as a whole. It is made up of several subcomponents: power generation, power storage, power distribution, and power system control. Each of those subcomponents in turn has constituent components of its own: for example, power generation has solar cells, perhaps arrays that hold the cells, and perhaps some other power generation mechanism.
This illustrates the general pattern for the breakdown structure. The structure is a tree, with the highest-level component being the system as a whole. The system as a whole is typically not just a vehicle or box; it is the entire mission or business of which a vehicle is a part. Underneath the whole system come the major component systems. For a spacecraft mission, this might be the spacecraft, ground systems, launch systems, and related assembly and test systems. The next level of components are the major subsystems. The structure continues recursively until reaching components that are the smallest that are sensible to model using systems tools.
The recursive process of defining smaller and smaller components ends when there is a judgment that further subdivision won’t help the systems engineering process. In practice, for example, continuing the breakdown structure all the way to individual resistors and capacitors on a printed circuit board is too detailed to be useful for systems engineering tasks.
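The recursive pattern can be sketched as a small data structure. This is an illustrative fragment, not a real breakdown; the component names follow the spacecraft EPS example earlier:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A node in the breakdown structure tree."""
    name: str
    subcomponents: list["Component"] = field(default_factory=list)

# A fragment of the spacecraft EPS breakdown described in the text.
# Leaves like "storage" are where further subdivision stops.
eps = Component("eps", [
    Component("generation", [Component("solar-cells"), Component("arrays")]),
    Component("storage"),
    Component("distribution"),
    Component("control"),
])

def count(component: Component) -> int:
    """Total number of components in the tree, including the root."""
    return 1 + sum(count(sub) for sub in component.subcomponents)

print(count(eps))  # → 7
```

A representation this simple is enough to anchor the other artifacts discussed below: each node becomes a place to attach a specification, a design, and analyses.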
Some criteria I have used for deciding when to continue subdividing a component into subcomponents include:
Some examples:
The approach laid out here is fundamentally hierarchical, and reflects the way people usually approach breaking down a complex system—by a reductive approach that organizes parts into a hierarchy.
That is not the only approach to organizing the components. Mechanical and electrical engineering systems often use a more-or-less flat space of part numbers to identify components. The specifications for each part can have attributes, and the attributes allow one to search for a desired part.
A flat part number approach works well for low-level, physical components. A 100 ohm resistor can be used in many different components; there is little value in giving a different name for its use in one place on one board and a different name for a second place on that board, or on a different board. Similarly, when manufacturing many instances of a vehicle, using a part number to identify the part in an assembly works well.
I have generally not used a part number approach for higher-level systems activities, however, because the uses are not the same. During design, each component that systems engineering deals with is generally unique.
A component’s identifier provides a unique way to refer to that component. It is like the address for a building: it allows one to find the component (or its specifications), but does not by itself convey much more information. The keys are that the identifier be unique, and that people can use the identifier to find what they are looking for.
The pathname is the long-standing practice for creating identifiers for elements in a hierarchy. This is familiar from file systems and URLs: the path /a/b/c/d refers to a file or object named “d”, which is contained in “c”, which is in turn contained in “b”, which is part of “a”, which is one of the top-level objects or folders in the system. While the object name “d” is not necessarily unique (there can be another object /a/f/d, for example), the path as a whole does give a unique identifier for the object or file.

This approach applies to the identifiers for components in a breakdown structure as well. The names in the path are typically separated by a slash (/) or period (.).
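As a sketch, pathname identifiers can be generated mechanically from the nesting, and checked for uniqueness. The nesting here uses hypothetical spacecraft component names:

```python
# Breakdown structure as nested dicts: component name -> subcomponents.
breakdown = {
    "sc": {
        "eps": {"batt": {}},
        "cdh": {"fp": {}},
    }
}

def pathnames(tree, prefix=""):
    """Yield a dotted pathname for every component in the tree."""
    for name, subs in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        yield path
        yield from pathnames(subs, path)

paths = list(pathnames(breakdown))
# Each full path is unique even when leaf names repeat elsewhere.
assert len(paths) == len(set(paths))
print(paths)  # → ['sc', 'sc.eps', 'sc.eps.batt', 'sc.cdh', 'sc.cdh.fp']
```

Generating identifiers from the structure, rather than assigning them by hand, keeps the names and the tree from drifting apart as the breakdown evolves.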
The names of each component in the tree can be abbreviations or short words describing the component. Both work well; the choice is primarily a matter of style. When there are commonly used abbreviations for some components, it is reasonable to mix and match abbreviations and longer names. For example, a spacecraft’s computing system is often called the CDH (command and data handling); attitude control is the ACS (attitude control system); and the electrical system is the EPS (electrical power system).
Some examples from a fictitious spacecraft system:
| Abbreviations | Short names |
|---|---|
| sc | spacecraft |
| sc.eps | spacecraft.power |
| sc.eps.batt | spacecraft.power.battery |
| sc.cdh.fp | spacecraft.cdh.flightprocessor |
Long component identifiers can become a problem. Long identifiers are harder to type than shorter ones. Sometimes there are limits on how long an identifier can be; for example, if one is recording information about components in a spreadsheet and putting each different component on a different sheet, most spreadsheet packages have a limit on how long a sheet name can be.
The length of an identifier is driven by how deeply the breakdown structure tree goes. The path name for a component six layers down in the hierarchy will be much longer than the path name for a component in the third layer. This suggests that one should try not to make the component hierarchy any deeper than it needs to be.
Many people find a visual representation of the breakdown structure helpful for understanding it. Here is a drawing of an incomplete breakdown structure for a simple spacecraft:
It is worth finding tools that can show this kind of visual representation of the breakdown structure.
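Lacking a dedicated tool, even a short script can produce a usable text rendering. A sketch, with illustrative component names:

```python
def render(tree, indent=0):
    """Render a nested-dict breakdown structure as an indented text tree."""
    lines = []
    for name, subs in tree.items():
        lines.append("  " * indent + name)
        lines.extend(render(subs, indent + 1))
    return lines

# An incomplete breakdown for a hypothetical simple spacecraft.
spacecraft = {
    "spacecraft": {
        "power": {"battery": {}, "solar-array": {}},
        "cdh": {"flight-processor": {}},
        "acs": {},
    }
}

tree_text = "\n".join(render(spacecraft))
print(tree_text)
```

The indentation makes the containment relationships visible at a glance, which is the essential property any breakdown-structure visualization needs.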
The breakdown structure provides the fundamental organization for most systems engineering artifacts. This means that the structure chosen for the breakdown will affect how most other parts of a specification are organized.
Each component named in the breakdown has a specification. The specification includes information like
When two components interact, the interface between them must name which components are involved. The specifications for each component must indicate what data or control they will be sending and receiving in the interaction.
The identifier for a component provides a way to express a reference between implementation and test artifacts, like source code or drawings, and the specifications to which they should comply.
The breakdown structure affects almost everyone working on the project. This includes:
The understanding of the system evolves gradually from the initial concept to the time that a final product is delivered (if indeed there is a final product). At each step of this evolution, the understanding of what should be in the breakdown structure and how it should be organized will change.
Because the breakdown structure is central to many other processes and artifacts, a change to the breakdown structure will result in changes to potentially many other artifacts. The cost of the change grows as the size of the breakdown structure tree grows.
Don’t try to build an elaborate and complete breakdown structure too early. At the beginning, while still working out the basic concepts of the system and its structure, just sketch out the first level of the structure—and try out several potential structures until one appears to match the system’s objectives. Often the main structure will be suggested by common practice for similar projects: the automobile industry has a common, vernacular breakdown of cars and trucks into common subsystems, for example.
In general, it is best to keep a branch of the breakdown structure shallow as long as there is significant uncertainty about how that part of the system will be designed. In an aircraft, for example, the propulsion system should be left unrefined in the breakdown structure until the team has settled on the general approach to propulsion—will it use turbofans, turboprops, propfans, electric rotors, or some combination? The broad choice can typically be settled early in concept development by working out the concept of operations and determining what capabilities, performance, and physical layout will meet the aircraft’s operational needs. Once the general architecture has been decided, then one can refine the propulsion system by adding a layer of components for each engine or other major unit involved in propulsion.
The point of the breakdown structure is to help people find and refer to components. The breakdown structure should reflect common ideas of how a system breaks down into components, and should result in short, easy-to-use identifiers. The breakdown structure should focus on these capabilities and not be drafted into serving other purposes.
Consider the breakdown structure for all the sensors that provide information to an autonomous vehicle. One way to organize the sensors is to create a general “sensors” component and include all the sensors as its children. Another way is to break the sensors down first by general type (camera, lidar, radar, sonar, microphone), then by general location of the sensor on the vehicle (front, left, right, top, back), and then by the specific sensor unit. The first approach leads to a shallow, broad breakdown structure; the second leads to a narrow, deep one.
In general, a shallow, broad breakdown structure will meet these objectives better than a narrow and deep structure. There are a few reasons for this.
This leads to a general principle. The breakdown structure should be used only for providing a unique name, and not for embedding a taxonomy or search attributes. The tools that people use to navigate through the breakdown structure and its related artifacts, like specifications, should provide search mechanisms that let someone find a component by attributes. Embedding extraneous information, like a location attribute or model number or power requirement in the name will just make the names longer, harder to use, and less resilient to change.
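The principle can be illustrated with a small sketch: the identifier stays short and stable, while attributes like sensor type or location live in the component's record and are found by query. All the identifiers and attributes here are invented for illustration:

```python
# Each component has a short hierarchical id; everything else is metadata
# kept in the component's record, not encoded into the name.
components = {
    "veh.sensors.cam1":   {"kind": "camera", "location": "front", "power_w": 3.5},
    "veh.sensors.cam2":   {"kind": "camera", "location": "left",  "power_w": 3.5},
    "veh.sensors.lidar1": {"kind": "lidar",  "location": "top",   "power_w": 12.0},
}

def find(**criteria):
    """Return ids of components whose attributes match all criteria."""
    return [cid for cid, attrs in components.items()
            if all(attrs.get(k) == v for k, v in criteria.items())]

print(find(kind="camera"))     # search by type, without type in the name
print(find(location="front"))  # search by location the same way
```

If the front camera is later moved to the roof, only its `location` attribute changes; the identifier, and every reference to it, stays valid.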
The hierarchical, tree-structured approach recommended here makes each component part of exactly one parent component. It does not accommodate components that have more than one natural affinity to parent groupings.
Consider a radio transceiver that is used to communicate between aircraft, such as the ADS-B systems used for collision avoidance. This transceiver could be categorized multiple ways. It is part of the aircraft, but it is also part of an air traffic management safety system. The transceiver within the aircraft is part of a communication system, but it is also a part of the flight control system and intimately connected with human interface components on the flight deck. The transceiver, in other words, is part of several different groupings of components, depending on who is looking and for what purpose.
There is a fundamental tension between simple organizing structures, like a tree, and the richer relationships that elements of a system have with each other. For an excellent discussion of this, see Alexander’s essay on trees as a structuring approach for cities [Alexander15]. In that essay, Alexander proposes that a lattice structure is a more appropriate model for organizing urban structures. In his account, a tree-oriented description of a city fails to account for the ways that a house can be both a place for a family to live as well as a node in a social network and a place of work; in each of these roles, the house is related to different buildings or locations in the city.
The systems engineering approach presented here addresses this problem by separating naming or identity from the complex relationships that each component actually has. The breakdown structure only tries to give a name to each thing, like the address for a building. The relationships, functions, requirements, and everything else that goes into defining a component are all left to other artifacts, such as the component’s specification and models of the components.
This means: don’t try to make the breakdown structure do too much. When a component fits into multiple categories, pick the one that seems most natural for most users and leave it at that. Other artifacts and tools will address greater complexity.
The breakdown structure is for organizing components: things that are built and that can be seen or touched (possibly virtually).
There is sometimes a temptation to try to organize system functions into the breakdown hierarchy. Don’t do that. The breakdown of function—and of the allocation of function to component—is a separate task that needs to be addressed by a structure that focuses on how functions are organized.
A better approach is to maintain the component breakdown and a functional breakdown separately, and maintain an allocation mapping that shows how different subfunctions are achieved by different components. The functional breakdown is often better reflected in the structure of how specifications or requirements derive from each other. See the chapter on requirements for more on this.
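An allocation mapping can be as simple as a table relating function identifiers to the component identifiers that realize them, with lookups in both directions. The function ids below are hypothetical; the component ids follow the spacecraft example later in this chapter:

```python
# Functional breakdown ids -> component ids that realize each function.
allocation = {
    "fn.point-imager":  ["space.acs.control", "space.acs.wheels"],
    "fn.capture-image": ["space.pl.imager"],
    "fn.downlink-data": ["space.comm.trans", "space.comm.ant"],
}

def components_for(function_id):
    """Which components are allocated to a function?"""
    return allocation.get(function_id, [])

def functions_of(component_id):
    """Reverse lookup: which functions does a component help realize?"""
    return [fn for fn, comps in allocation.items() if component_id in comps]

print(components_for("fn.downlink-data"))
print(functions_of("space.acs.control"))
```

Keeping this mapping separate from both breakdowns means either one can be reorganized without rewriting the other; only the mapping entries change.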
Some projects have proposed organizing components primarily by some fundamental, nonfunctional attribute. One project considered separating hardware, electronics, software, and operational procedures at the top level, and then organizing components within each of those categories by subsystem. Another project organized components first by the vendor organization that was to implement each component.
These approaches make it harder for people to use the breakdown structure to find things. Consider an electrical power controller on a spacecraft. It has an electronic component (the board and processor that runs the power control function) and a software component (which decides what to power on and off, and reports information to a telemetry function). Someone working on the power controller will generally want to know about both aspects. Requiring them to look in two widely separated parts of the breakdown structure is inconvenient, and (more seriously) it increases the chances that someone will miss a component that they need to know about to do their work.
As a general principle, it is better to group components by how people naturally think of them as being grouped. Keep functionally related components close together in the breakdown structure so that people can find everything they need about something by looking in one place.
As noted above, this doesn’t always work. The breakdown structure will not be perfect because not everything in a system naturally falls into a hierarchical organization. But the more that like things can be grouped, the easier it will be for people.
There is one case of a component fitting into multiple places in a breakdown structure that deserves special treatment: generic and reusable components.
Consider an operating system. A system may contain multiple processors that all run instances of the same operating system. It is useful to have one specification for that operating system: there is one product acquired from a vendor, one master copy kept somewhere, and so on. At the same time, that operating system will be loaded onto many different processor components in different subsystems.
One way to address this is to have a part of the breakdown structure for generic components, and then put an instance of that component in the places where it is used. The specification of each instance component can refer to the specification for the generic, with those functions or requirements that are specific to the instance added. This is an example of using the class-instance model from object-oriented programming to solve the problem.
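The class-instance idea can be sketched directly: one record holds the generic component's specification, and each instance record refers to it and adds any instance-specific requirements. The identifiers and requirement texts here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class GenericSpec:
    cid: str            # id under a "generic components" branch of the tree
    requirements: list  # requirements shared by every instance

@dataclass
class InstanceSpec:
    cid: str                 # id where this instance sits in the breakdown
    generic: GenericSpec     # reference to the shared specification
    extra_requirements: list = field(default_factory=list)

    def all_requirements(self):
        # An instance inherits the generic requirements, plus its own.
        return self.generic.requirements + self.extra_requirements

# One generic operating system, instantiated on two different processors.
rtos = GenericSpec("generic.rtos", ["boot in < 2 s", "POSIX API"])
main = InstanceSpec("space.cdh.main.os", rtos, ["pin scheduler to core 0"])
gps  = InstanceSpec("space.cdh.gps.os", rtos)

print(main.all_requirements())
```

A change to the generic specification then propagates to every instance automatically, while each instance still has a distinct place, and name, in the breakdown structure.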
The NASA project management process and systems engineering standards use a common WBS structure across all NASA projects. The use of the WBS is codified in a Procedural Requirement document [NPR7120], with details in an accompanying handbook [NASA18].
The NASA WBS is used as a project management artifact to organize work tasks, resources, and budget, and to report progress. The hierarchy must “support cost and schedule allocation down to a work package level” [NPR7120, p. 113]. A “work package” is one task or work assignment that is tracked, budgeted, and assigned as a single unit.
A NASA project’s WBS tree is rooted in the official NASA project authorization, with its associated project code.
The first level of elements is defined by NASA standards, and each element has a standard numbering. The standard elements for a space flight project are [NPR7120, Fig. H-2, p. 113]:
Note how this organization mixes technical artifacts (payloads, spacecraft, ground systems) and management activities (project management, safety and mission assurance, public outreach).
The NASA WBS is intended to be one part of an overall project plan document. The project plan also contains information like:
This breakdown structure standard aims to provide a “consistent and visible framework” [DOD22] for communicating and contracting between a government program manager and contractors that perform the work. It addresses needs such as “performance, cost, schedule, risk, budget, and contractual” issues [DOD22, p. 1]. This kind of WBS is thus focused on supporting contractual relationships with suppliers.
The standard defines a number of different templates for different kinds of projects. It includes templates for aircraft systems, space systems, unmanned maritime systems, missiles, and several others.
The template for an aircraft system includes the following Level 2 items:
As this example makes clear, this WBS template aims to address not just the design and building of a system but also the operation of the entire program, including testing, deployment, and initial operation.
This is an example component breakdown for a simplified imaging spacecraft. The spacecraft uses solar panels to collect energy; it has a single imaging camera to collect mission data; it has a flight computer to run the system; an attitude control system to point the imager where needed; and a radio to communicate to ground. (The graphical version of this breakdown structure is included earlier in this chapter.)
Id | Title |
---|---|
space | Space segment |
space.acs | Attitude control system |
space.acs.control | Control logic |
space.acs.sun | Sun sensor |
space.acs.wheels | Reaction wheels |
space.cdh | Command and data handling avionics |
space.cdh.gps | GPS receiver |
space.cdh.gps.ant | Antenna |
space.cdh.main | Main processor |
space.cdh.storage | Data storage |
space.comm | Communications system |
space.comm.ant | Antenna |
space.comm.ant-tran | Antenna-transceiver cable |
space.comm.trans | Transceiver |
space.eps | Electrical power system |
space.eps.battery | Battery |
space.eps.controller | Power controller |
space.eps.panels | Solar panels |
space.eps.sep | Separation switch |
space.harness | Harnesses |
space.harness.canbus | Data CAN bus |
space.harness.pl | Payload harness |
space.harness.power | Power cabling |
space.harness.radio | Radio harness |
space.pl | Payloads |
space.pl.imager | Imager payload |
space.prop | Propulsion system |
space.prop.lines | Fuel lines |
space.prop.tank | Fuel tank |
space.prop.tank.pressure | Pressurization system |
space.prop.tank.sensor | Fuel pressure sensor |
space.prop.thruster | Thruster |
space.structure | Structure |
space.thermal | Thermal management system |
space.thermal.propheat | Prop tank heater |
space.thermal.radiator | Thermal radiator |
This example only goes four levels deep. The actual breakdown structure would likely include at least two more levels, to represent, for example, different parts of the flight control software or subcomponents of the radio transceiver.
The example also shows a component that could fit in multiple places in the structure: the propellant tank heater. It is part of the thermal management system—its function is to keep the fuel in the propellant tank within a certain temperature range—but it is also part of the propulsion system. In this example the choice was to categorize it as part of the thermal management system.
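Because the identifiers encode the hierarchy with dots, the tree can be reconstructed mechanically from a flat list of ids, which is the basis for the kind of visual tooling mentioned earlier in this chapter. A minimal sketch using a few ids from the example:

```python
def build_tree(ids):
    """Build a nested dict tree from dotted component ids."""
    tree = {}
    for cid in ids:
        node = tree
        for part in cid.split("."):
            node = node.setdefault(part, {})
    return tree

ids = ["space.acs.control", "space.acs.sun", "space.acs.wheels",
       "space.eps.battery", "space.eps.panels"]
tree = build_tree(ids)
print(sorted(tree["space"]["acs"]))  # children of the attitude control system
```

The same traversal can render the structure as an indented outline or feed a graphical viewer; the ids themselves carry all the structural information needed.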
- purpose of prototyping
- reasons to prototype
- pitfalls of prototyping
- good practices
  - bounded effort
  - explicit non-reusability
[14CFR450] | “Part 450—Launch and reentry license requirements”, in Title 14, Code of Federal Regulations, United States Government, August 2024, https://www.ecfr.gov/current/title-14/chapter-III/subchapter-C/part-450, accessed 2 September 2024. |
[Albon24] | Courtney Albon, “Space Force may launch GPS demonstration satellites to test new tech”, C4ISRNET, February 2024, https://www.c4isrnet.com/battlefield-tech/space/2024/02/09/space-force-may-launch-gps-demonstration-satellites-to-test-new-tech/, accessed 11 September 2024. |
[Alexander15] | Christopher Alexander, A City is not a Tree, Portland, Oregon: Sustasis Press, 2015. |
[Ambler23] | Scott Ambler, “What happened to the Rational Unified Process (RUP)?”, https://scottambler.com/what-happened-to-rup/, accessed 29 February 2024. |
[Asimov50] | Isaac Asimov, I, Robot, New York: Gnome Press, 1950. |
[BCP14] | Scott Bradner, “Key words for use in RFCs to Indicate Requirement Levels”, Internet Engineering Task Force (IETF), Best Current Practice BCP 14, March 1997, https://www.ietf.org/rfc/bcp/bcp14.html. |
[Bezos16] | Jeffrey P. Bezos, “2015 Letter to Shareholders”, Amazon.com, Inc., 2016, https://s2.q4cdn.com/299287126/files/doc_financials/annual/2015-Letter-to-Shareholders.PDF, accessed 22 February 2024. |
[Bogan17] | Matthew R. Bogan, Thomas W. Kellermann, and Anthony S. Percy, “Failure is not an option: a root cause analysis of failed acquisition programs”, Naval Postgraduate School, Technical report NPS-AM-18-011, December 2017, https://nps.edu/documents/105938399/110483737/NPS-AM-18-011.pdf. |
[CISA21] | “Defending against software supply chain attacks”, Cybersecurity and Infrastructure Security Agency, U.S. Department of Homeland Security, April 2021, https://www.cisa.gov/sites/default/files/publications/defending_against_software_supply_chain_attacks_508.pdf. |
[CMMI] | ISACA, “What is CMMI?”, https://cmmiinstitute.com/cmmi/intro, accessed 24 March 2024. |
[CVE24] | Information Technology Laboratory, National Institute of Standards and Technology, “CVE-2024-3094 detail”, in National Vulnerability Database, https://nvd.nist.gov/vuln/detail/CVE-2024-3094, accessed 4 August 2024. |
[Castano06] | Andres Castano, Alex Fukunaga, Jeffrey Biesiadecki, Lynn Neakrase, Patrick Whelley, Ronald Greeley, Mark Lemmon, Rebecca Castano, and Steve Chien, “Autonomous detection of dust devils and clouds on Mars”, Proceedings of the International Conference on Image Processing, October 2006. |
[Control19] | “Yokogawa announcement warns of counterfeit transmitters”, Control, 29 May 2019, https://www.controlglobal.com/measure/pressure/news/11301415/yokogawa-announcement-warns-of-counterfeit-transmitters. |
[DFARS] | “Defense Federal Acquisition Regulation Supplement”, General Services Administration, United States Government, January 2024, https://www.acquisition.gov/dfars, accessed 16 February 2024. |
[DOD22] | “Work Breakdown Structures for Defense Materiel Items”, Department of Defense, United States Government, Standard Practice MIL-STD-881F, May 2022, https://cade.osd.mil/Content/cade/files/coplan/MIL-STD-881F_Final.pdf. |
[Drucker93] | Peter F. Drucker, Management: Tasks, Responsibilities, Practices, New York, NY: Harper Business, 1993. |
[ELOMC] | Engineering Lifecycle Optimization—Method Composer, IBM, version 7.6.2, 2023, https://www.ibm.com/docs/en/engineering-lifecycle-management-suite/lifecycle-optimization-method-composer/7.6.2, accessed 29 February 2024. |
[EPF] | Eclipse Process Framework Project (archived), Eclipse Foundation, 2018?, https://projects.eclipse.org/projects/technology.epf, accessed 29 February 2024. |
[FAR] | “Federal Acquisition Regulation”, General Services Administration, United States Government, January 2024, https://www.acquisition.gov/browse/index/far, accessed 16 February 2024. |
[Foust24] | Jeff Foust, “Slow Burn: How Starliner’s crewed test flight went awry”, Space News, 4 September 2024, https://spacenews.com/slow-burn-how-starliners-crewed-test-flight-went-awry/, accessed 9 September 2024. |
[Git] | Git contributors, “Git documentation”, https://git-scm.com/doc, accessed 31 July 2024. |
[Goodin24] | Dan Goodin, “What we know about the xz Utils backdoor that almost infected the world”, Ars Technica, 31 March 2024, https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/, accessed 4 August 2024. |
[Heilmeier24] | George H. Heilmeier, “The Heilmeier Catechism”, in DARPA, https://www.darpa.mil/work-with-us/heilmeier-catechism, accessed 13 July 2024. |
[ISO26262] | “Road vehicles — Functional safety”, Geneva, Switzerland: International Organization for Standardization, Standard ISO 26262:2018, 2018. |
[LADEE13] | “LADEE—Lunar atmosphere and dust environment explorer”, NASA Ames Research Center, Fact sheet FA-ARC-2013-01-29, 2013, https://smd-cms.nasa.gov/wp-content/uploads/2023/05/ladee-fact-sheet-20130129.pdf, accessed 16 September 2024. |
[Leveson11] | Nancy G. Leveson, Engineering a safer world: systems thinking applied to safety, Engineering Systems, Cambridge, Massachusetts: MIT Press, 2011. |
[LoBosco08] | David M. LoBosco, Glen E. Cameron, Richard A. Golding, and Theodore M. Wong, “The Pleiades fractionated space system architecture and the future of national security space”, AIAA Space 2008 Conference, September 2008, https://chrysaetos.org/papers/Pleiades%20fractionated%20space%20system.pdf. |
[McConnell09] | Steve McConnell, Software Estimation: Demystifying the Black Art, Redmond, Washington: Microsoft Press, 2009. |
[NASA16] | “NASA Systems Engineering Handbook”, National Aeronautics and Space Administration (NASA), Report NASA SP-2016-6105 Rev2, 2016. |
[NASA18] | “NASA Work Breakdown Structure (WBS) Handbook”, National Aeronautics and Space Administration (NASA), Handbook SP-2016-3404/REV1, 2018, https://essp.larc.nasa.gov/EVM-3/pdf_files/NASA_WBS_Handbook_20180000844.pdf. |
[NPR7120] | “NASA Space Flight Program and Project Management Requirements”, National Aeronautics and Space Administration (NASA), NASA Procedural Requirement NPR 7120.5F, 2021. |
[NPR7123] | “NASA Systems Engineering Processes and Requirements”, National Aeronautics and Space Administration (NASA), NASA Procedural Requirement NPR 7123.1D, 2023. |
[NavarroGonzalez10] | Rafael Navarro-Gonzalez, Edgar Vargas, José de la Rosa, A. C. Raga, and Christopher P. McKay, “Reanalysis of the Viking results suggests perchlorate and organics at midlatitudes on Mars”, Journal of Geophysical Research, vol. 115, December 2010. |
[Purdy24] | Kevin Purdy, “Music industry’s 1990s hard drives, like all HDDs, are dying”, Ars Technica, 12 September 2024, https://arstechnica.com/gadgets/2024/09/music-industrys-1990s-hard-drives-like-all-hdds-are-dying/, accessed 13 September 2024. |
[Spiral] | Wikipedia contributors, “Spiral model”, in Wikipedia, the Free Encyclopedia, https://en.wikipedia.org/w/index.php?title=Spiral_model&oldid=1068244887, accessed 14 February 2024. |
[Wertz11] | Space Mission Engineering: The New SMAD, James R. Wertz, David F. Everett, and Jeffery J. Puschell, editors, Torrance, CA: Microcosm Press, 2011. |
[Wilkes90] | John Wilkes, “CSP project startup documents”, Concurrent Computing Department, Hewlett-Packard Laboratories, Report HPL-CSP-90-42, 11 October 1990, https://john.e-wilkes.com/papers/HPL-CSP-90-42.pdf. |
[Zetter23] | Kim Zetter, “The untold story of the boldest supply-chain hack ever”, Wired, 2 May 2023, https://www.wired.com/story/the-untold-story-of-solarwinds-the-boldest-supply-chain-hack-ever/. |