Previous chapters introduced how to work out what a system or a component should do, by determining what the objectives are for it and then turning those objectives into a specification.
The next step is to design the system or component that will fill those needs.
A design for a component provides a simplified model of how the component will achieve the behaviors, qualities, and structure laid out in its specification. The design is not the full details of how it will achieve those things, or a detailed implementation. The design is a plan for how the component will be built, at a high level; it records the high-level decisions about how the component will be implemented without actually being the implementation.
“Design” is an activity that lacks sharp boundaries from other development activities. On the one hand, it responds to the objectives and specifications that have been developed for the thing being built; on the other hand, the act of designing usually reveals gaps in the specifications that lead to feedback that causes people to update the specification. Specification and design proceed recursively as a system is built, where the act of designing one component leads to writing specifications for its subcomponents.
“Design” also lacks a distinct boundary with “implementation”. Indeed, the boundary between the two varies by convention in different disciplines.
Given the diversity of ways the word “design” is used, I will define what I mean by the term in general.
A design is:
A design is not:
In some projects I have used the term “design model” for the design, to emphasize that the design is a simplification and explanation of the most important aspects of the component’s implementation.
There are several kinds of information that should be recorded in a design.
All of this information should be annotated with a rationale for the decisions that led to the particular design.
Why should one take the deliberate and separate step of putting together a design for a system or component, rather than just implementing a component directly based on its specification?
For an exceptionally simple component, one can skip design and just implement the component—but the component must be truly simple, completely understandable from its implementation, involving no significant design choices, and with no future need to change the component, for this to pay off in the long run.
The value of an explicit design comes partly from its abstraction and simplification, and partly from being done mostly before putting together the detailed implementation.
Time to reflect. This is perhaps the most important reason to take the time to build a design before implementing a component. Modern systems are deeply interconnected. The design choices for one component have effects not limited to that component, and the design choices must usually reflect the needs that many other components place on the one being designed. It takes time to find and understand all these interdependencies.
Many components can be designed in multiple different ways. It is often useful to spend some time developing multiple design approaches before settling on one of them. In many cases it is useful to have two or three design approaches, one of which imposes requirements on some subcomponent that are difficult to achieve. That difficulty may not reveal itself until people have proceeded into the specification and design of that subcomponent. Only then may one realize that an alternative design for the original component is better.
Finally, the design needs to support all of the component’s or system’s specification. Rushing through the design increases the likelihood that some essential requirement will get missed, leading to problems later when the component is integrated with others, or when the system goes into operation, and a subtle failure occurs.
Balanced and incremental design. Modern, complex systems involve many different kinds of constraints on components. A component may need to meet all of structural, safety, functional, security, reliability, environmental, maintainability, user interface, and budget constraints to meet its specification and thus to function correctly in the system as a whole.
I have found that focusing too much on any one of these aspects leads to an unbalanced design that does not meet some other aspect. This can lead to repeated partial design followed by redesign after redesign, each time focusing on a different aspect.
The alternative is to consider a little of each aspect at the same time, working to find a rough design that looks like it will be going in a feasible direction for all of these aspects. After there is a rough design, one can go into greater depth on individual aspects with lower risk that the dive into one area will result in not meeting constraints on another aspect.
As one example, reliability and safety often work against each other. The safer choice is often to shut down a component rather than trying to keep it in operation after a failure. Conversely, the redundancy needed to increase reliability increases the complexity of the component, leading to more conditions that could lead to a safety violation.
Guide and explanation. Multiple people will use a design over the course of a project. While one person may develop the first design, others will analyze it for safety or security; still others will review the design for completeness or correctness; one or more people will use it to implement the component; other people will use it to develop and perform verifications. Later, other people will use the design to understand a component that may need a bug fix or feature change.
In other words, the design is for communicating among many different people and over potentially long periods of time, when the people who originally made the design are no longer available to answer questions from their memory.
For all those people who work on the component later, the design provides a guide to understand how the component is organized.
All too often, an engineer is asked to figure out why some existing software component is not working as expected. There is no design, just the source code. The engineer has to try to extract the design from the source code in order to figure out where the component is not behaving as it should. Extracting the design takes time and effort that could be avoided if the design could just be consulted. An extracted design is rarely accurate: the source code does not have a record of where there are subtle, unobvious aspects of the design; nor does it record why the design is what it is. The result is greater cost and time required to update the component, and a higher risk of a change introducing more problems than it fixes.
Decision rationales. A good design includes an explanation of why particular decisions were made. This information helps those who review and analyze the design to determine whether good choices were made. More important, the rationale informs the people who later need to update or redesign the component.
It is common that any electronic board component that is in production more than a handful of years will run into a situation where some chip is no longer available. The manufacturer has stopped making the original chip X, but another manufacturer is making a chip Y that is supposed to be pin-compatible with chip X. Is it okay to substitute chip Y for chip X? That depends on what it was about chip X that led to it being the choice. If the choice was just on the basic chip function, the substitution is probably okay. However, if the choice was based on something unobvious like the chip X’s radiation tolerance resulting from a particular lithography technique, chip Y may not be an acceptable replacement. The only way to know that the radiation tolerance was a key part of the decision is if someone writes down that rationale.
Supporting analysis. Many key component properties, especially those related to safety, security, or reliability, are emergent from the design. It is increasingly evident that these properties are difficult to retrofit into a completed design: they involve the fundamental organization of elements of the design.
This leads to approaches of security-guided or safety-guided design. In these approaches, the security or safety properties are considered from the start and included in the design. As the design progresses from a rough sketch to something more detailed, it can be analyzed with progressively greater accuracy to determine whether these properties are being met.
This approach is relatively inexpensive and easy when it is being done as part of the original design effort. A safety analysis can determine what high-level aspects of a control loop are essential for safe operation; a security analysis can determine what information flow properties must be met to maintain security. These analyses help early pruning of potential design approaches that would not meet safety or security needs.
The alternative is to proceed without including safety or security considerations, then having to go back and work out control or data flow on a more complex design, then repeat parts of the design process while undoing earlier decisions. Repeating work like this takes more time and effort, and is more likely to result in an implementation that has safety or security flaws.
Alternative designs. In the early stages of designing a complex component, there are likely to be multiple different approaches for the component. The choice among the approaches is often not immediately evident. Which one uses chips that will be available on the needed schedule at the needed quantity? Which one uses a subcomponent that will require significant research to make work? Which one will require a significant up-front investment in acquiring long lead time parts? Which one will be acceptable to regulatory agencies? It may take quite some time and effort to find answers to these issues: prototyping a subcomponent, making legal arrangements with suppliers to find out about availability, and so on.
When there are these kinds of risks in the designs, it is helpful to explicitly keep multiple designs open during the investigations, and to delay investing in detailed implementation effort on any one design that would not be useful if that design turns out not to be feasible.
As noted above, a design enables communication among multiple people, across different times, and for different purposes.
Developing the initial design. One or more people take the objectives, CONOPS, and specification for a component and eventually produce one or more potential designs for that component.
Developing the design is not a single, monolithic activity. It almost always proceeds incrementally, evolving the design from a rough sketch through multiple ideas that turn out not to be quite right until reaching a design that looks like it will meet the component’s specification. The designers will need to try out multiple ideas along the way, meaning that what they document will need to evolve as they try different approaches.
The process of assembling a design can be characterized as working through each of the elements of the specification, while at the same time matching the specification against the possible building blocks for the component. As a simple example, this might involve matching a specification for an electrical energy storage system to store X mAh of energy against a catalog of available battery products.
Actual component specifications involve multiple aspects, some of which will work against each other. A realistic electrical energy storage system must meet performance specifications such as the amount of storage, maximum safe current, reliability constraints, and a number of constraints related to safety. This leads to the recommendation that a designer consider many specification aspects at once, but only at a high level, before going into greater detail.
In the end, the designers must either show that the design they have created fulfills the corresponding specification, or show that the specification is flawed in some way and feed that information back to the people responsible for the specification to get it changed.
Tracking alternative designs. There are usually many ways to design some component, with pros and cons to each. Early in design, there may be multiple promising approaches that require more investigation before a decision can be made among them.
This means that each of the alternatives needs to be documented, along with the investigations needed for each of them, until a decision can be made. It must also be clear to everyone working with the alternatives which one is which. When one alternative is selected, that choice must be clear to everyone working with the designs.
Evolving a design. Every design will evolve, both during the initial system development and over time as the system is used or upgraded or fixed. Any change to the design needs to be evaluated for its scope, its effects, and its correctness.
Evaluating scope and effects means determining what effects the change will have in addition to the specific change being considered. A change in one part of a component might affect some safety property of the component as a whole, for example. A change might also affect some behavior or structure that some other component depends upon, possibly indirectly across multiple intervening components. Substituting one chip for another in a board design might change the timing of some signal, which leads to a subtle change in the sequence of operations performed by software on another board, which in turn invalidates a monitor watching for faults.
Evaluating correctness involves checking that any analyses done on the previous design to show that safety, security, or other properties hold either continue to hold or that the analyses can be adjusted to show that the updated design still meets those criteria.
Analyses. Complex systems will have a number of properties they must exhibit to be correct. These include safety, reliability, and security properties; they also include meeting business objectives and other more mundane properties.
Safety- and security-guided design methods ! Unknown link ref involve incrementally building up these analyses as design progresses, so that a simple, preliminary analysis can provide input to an evolving design.
When a design is believed to be complete enough to select and baseline, it will need review to ensure that it meets all of its specification. Part of this review involves checking the analyses that show the design is compliant. The reviewers need to have the analysis in order to check it.
When a change is being made to a component’s design, the analyses provide a starting point for analyzing the effects of the changes to check that the safety, security, or other properties will continue to hold if the change is made.
Generate specifications for lower-level components. The choice of what subcomponents will be part of a component is a major part of the design effort. The choice of subcomponents means that the role each subcomponent will play has to be worked out; this amounts to developing a specification for each subcomponent.
A subcomponent’s specification is a reflection of the component design. The subcomponent will only work properly as a part of the component if it meets that specification. This leads to the layering principle discussed earlier ! Unknown link ref.
Navigating through the system. Many people will need to find things in the system over time—developers, reviewers, auditors, and many others. Virtually none of them will come in with a complete understanding of the system and its structure, so they will need a guide that helps them learn the structure of the system and to find where some behavior or feature is implemented.
The system design can support such users in three ways. First, the design can provide the breakdown structure, showing how the system is divided into components, those into subcomponents, and so on. The breakdown structure also groups related components together, so that a user can narrow down where they are looking. Second, the design can show how components are related to each other. If one component in one part of the system is providing feedback signals to a component in a different part of the system, making these relationships explicit provides a way for a user to trace out these interactions. And third, including explanations or rationales for why the design is the way it is helps educate the user about subtleties that are not going to be apparent from just reading about the structure, interactions, or behaviors.
Guiding project management. As the design progresses, there will be more components to design than there are people to work on them, and some components will be ready to implement or verify. Project management must make decisions about where to put effort.
Project managers will need information like how risky some potential component designs are, as opposed to those component designs that are fairly certain and thus reasonable to implement. They will need to know which component designs have significant uncertainty, and will benefit from investing resources to prototype a potential design.
These decisions benefit from information that can be gathered and maintained in the overall system design, such as:
Progress tracking. Project management needs to be able to track the development progress of different parts of the system, in order to determine whether a project is on track for completion or is having problems that need to be addressed.
Being able to name each of the components that need to be developed, and being able to determine the development progress on each of them, enables project tracking.
As well as all the uses listed above, the developer uses the design as a guide for the implementation. The resulting implementation must be consistent with the design: having the same structure and behavior, including all the functions in the design, and including no functions not in the design.
The developer or implementer must be able to understand the design to build a component that matches the design. The developer must also be able to check that they understand the design properly, so that there is a way to catch misunderstandings. A good design uses consistent structure, terminology, and diagrams to aid understanding. It provides a glossary of terms that may have multiple meanings to define how they are used in the design.
Developers will find problems with the design as they proceed through implementation. They may find ambiguities, where the design is unclear or where the design does not address some important condition. The developer may find errors, where the design is inconsistent internally or with its specification. The developer may find that parts of the design aren’t feasible to implement. All of these problems need to be fed back to designers for clarification or correction.
When the design changes, the developer needs to be able to identify what parts of the design have changed so they can change the corresponding implementation. The change might come in response to feedback from the developer, or evolution of the design to address changing needs or broader system fixes. This can be supported by using tools that track design versions and highlight design changes between versions.
Finally, the developer must be incentivized to follow the design (or provide correction feedback) as they implement the component. This includes having the designer and independent people review the implementation to compare it to the design. If they find that the design and implementation are not consistent, they must decide on how to change the design, the implementation, or both in order to achieve consistency. The component implementation should not be accepted as complete until they match.
The artifacts that record the design enable all the usage cases listed above. The key functions they need to fill include:
The designs for a system need to be available to everyone associated with the project, so that they can use the design to learn about the system and navigate through it.
An ideal solution provides a “single source of truth”: a user can go to one place and see all of the information about the system. The ideal solution also ensures that the user always sees a single consistent version of all the information. To the best of our knowledge, at present there are no systems that completely meet this ideal. However, there are ways to come close by integrating multiple tools and applying conventions to how they are used.
The infrastructure for maintaining designs needs to, at minimum:
The following sections list the key artifacts that should be part of a design. Later chapters will detail these artifacts.
The breakdown structure consists of the hierarchical relationship of system, components, and their subcomponents recursively. It gives a name or identifier to each component, and provides the index or table of contents to the parts that make up the system. See ! Unknown link ref.
A complex system will have behaviors or structures that cross multiple parts of the system, and don’t neatly fit within a single hierarchy of components. There are two important examples of these behaviors to document.
The first example is behavior or activity sequences that show how different parts interact with each other. These are sometimes documented as UML or SysML activity diagrams, which show how control or data pass among components, and how different components take actions in response to those. The point of these patterns is to show how components work together, which informs the interfaces, actions, and states that the components involved in the activity must support.
The second example is the hierarchies of control that operate in the system. These document how one part of the system controls the functions in other parts, including how some components provide sense data to drive the control logic, and how the control logic in turn sends commands to other components to effect control actions. Documenting and analyzing these control systems is an essential part of some safety and security process methodologies, such as STPA [Leveson11].
Each component in the system should have its own design. This is the primary content about individual components, as opposed to how components work together.
A component’s design can be represented in many different ways. However, it is easiest for users if all designs follow the same general format so that they know how to find particular kinds of information within every design.
All designs should include:
Each component’s design should include rationale: the reasons why different design choices were made. This information helps those who must come along later to review or update the design.
In some cases, not all of this information can be represented in one way or in one tool. For example, for electronics designs the best way to represent some information will be in a CAD drawing that is maintained in a separate tool from the rest of the design information. In these cases, there should be unambiguous references from the main design to the CAD drawing and vice versa, and the versioning in the main design should be reflected in versioning in the CAD tool.
Part of the reason for developing a design—as a simplified model of what will be implemented—is to enable analysis of the design’s essentials. These analyses address whether the design will meet aspects of the component’s specification. These can include safety and security, as well as meeting business objectives, regulatory requirements, performance specifications, or resource budgets.
As I will discuss in the next section, it is recommended practice to develop these analyses incrementally in parallel with the design itself. In this way, a rough analysis of a rough design can provide quick, early feedback that will guide the design toward meeting its specified properties as it is developed in more detail.
These analyses become an important part of the record of a design once complete. They provide an extended rationale for why the design is the way it is. They may be needed to answer to external stakeholders, including regulators or courts of law, when it becomes necessary to provide evidence why the design is acceptable. The analyses also help people who must later evolve the designs to understand both the constraints on what they can change, and where they have freedom to make changes without invalidating the safety or other properties of the design.
As a matter of principle, the design for a system or component should be done after its objectives and specification are done, and before its implementation. Similarly, the design for the components in a system should proceed top down, starting with the system as a whole and proceeding to lower and lower level components. When the design of one component depends on the design of another, the two should be designed together.
These principles often lead people to conclude that systems should be built using a waterfall-like process, where everything is specified before design, everything designed before implementation, and so on.
Real projects are not so simple. I have never observed a project that actually used such a process, even when they tried to. This is because every complex system I have encountered is not fully and accurately knowable in advance. One can write a set of specifications that turn out to require some impossible component design. One might miss some important system objective when developing the initial system concept because the customer was not able to conceive of system operation until they could see part of the system in operation, or because the customer’s needs change. An initial design may be invalidated because a supplier discontinues an essential part. Some part of the system may require significant investigation or research before one can find a feasible way to approach its design.
All of these situations lead to cases where the specification, design, and implementation of the system does not proceed in a tidy one-way sequence through the waterfall stages. Instead, part of a component’s specification gets worked out, and some tentative design goes ahead using that part of the specification gets worked out. Or multiple possible design approaches are defined, and then someone proceeds to build simple prototype implementations of two or more of them to compare their feasibility. Or the design for a component must change, leading to a change in implementation. All of these may be happening in multiple parts of the system at once.
At the worst, all this change happening all over a system can lead to chaos where people working on different components are working to incompatible specifications or designs and building parts that will not integrate into a system. Project management may not be able to determine how much progress has actually been made on any part of the system, and thus be unable to detect when there are schedule or resource problems.
Therefore while the simple waterfall model, which organizes the work on a system, is not feasible, there is still a need to organize development work.
The principles I started with are good ideas in general, when used flexibly.
Develop specifications, then design. When one designs a component without first working out what the rest of the system needs that component to do, one usually ends up with a design that doesn’t actually meet needs (once those are worked out). When a specification gets developed, the people involved will tend to look at the effort that has already been spent on designing (and possibly implementing) the component and will try to adjust the specification to fit that sunk cost—after all, that work has already been done, why should it be discarded? Unfortunately this tends over time to produce safety and security problems, and to dramatically increase the cost of the system as people try to integrate the wrong component into the rest of the system.
It is better to explicitly defer some design decisions until the specification is firm—but not avoid doing any design. (Doing no design until specification is done is not possible when the design activity can reveal problems with a specification.) Do a minimal amount of design, bearing in mind the risk that design may need to change as the specification changes, as well as the risk that the specification may need to change as design reveals problems.
Instead:
Develop design, then implement. Similar to the way design reflects specification, the implementation reflects design. Proceeding with implementing a component before it has been designed is not really possible: doing so means that design is done implicitly and is left unrecorded. This leads to components that fail to meet functional, safety, or security constraints because those constraints have not been properly considered and analyzed before committing effort to implementation.
At the same time, deferring all implementation until all design is complete is a recipe for an infeasible system. It is all too easy to create a design that involves impossible feats of implementation, from requiring metals that do not currently exist (“unobtainium”) to algorithms that have not been invented.
I have found that a middle ground often works well. As I will discuss in future chapters on implementation, I have used a software implementation approach that emphasizes continuous integration (by which I do not mean continuous testing) and skeleton building for implementation, where the implementation proceeds in many small iterations. Using this approach the team can build a simplified implementation of the general structure of a component, focusing on those aspects where the design appears either to be relatively certain or where there is higher risk in the design that needs to be checked with a rough implementation.
I have also made a point of prototyping implementations of parts of a design in order to validate whether the design is feasible. I will also discuss prototyping in a future chapter.
There is a high risk with any implementation done before specification and design are solid, even when the implementation is done for good reasons (like prototyping to validate a design approach). The effort spent on implementing something is a sunk cost: it cannot be recovered. As the design evolves, there is a strong incentive to try to continue to reuse the implementation that has already been completed, as the incremental cost or time of modification is almost always perceived to be less than starting a new implementation from scratch. This leads to a sequence of incremental changes, each of which by themselves can be perceived as the lower-cost way of handling a sequence of design changes. However, it is often the case that after a few of these incremental changes, it will have become more cost-effective to have thrown away the initial prototype or implementation and started over with better information. This sequence of incremental changes also tends to result in an implementation that has many vestiges of implementations that are no longer applicable, but which continue to present a source of bugs, security flaws, or safety problems.
The cost of incrementalism is often apparent only in retrospect. It is also driven by basic business imperatives to minimize cost at each step, or to get features implemented as rapidly as possible. This is an example of an online optimization problem, which is often hard to solve well theoretically and even harder when human incentives are involved. The techniques used to solve similar online optimization problems (notably the ski rental problem ! Unknown link ref) apply. Limiting the amount of implementation effort that may be at risk for incrementalism by deferring as much implementation as possible until the design is solid helps avoid this situation.
Thus we:
Design top down and coordinate the design of interdependent components. Many aspects of a system’s design can only be developed effectively when they are developed from the top down, notably safety and security properties. That is because these properties apply to the system as a whole and are emergent from the designs of the components that make up the system. (See [Leveson11] for an in-depth discussion of this effect.)
However, designing from the top down creates risk, similar to the previous principles, that a high-level design may create unachievable specifications for lower-level components. There is also a risk that during high-level design the cost or time involved in developing some lower-level parts of the system is unknown. This can lead to effort being spent, unknowingly, on subcomponents that are simple to design and build while subcomponents that will take far longer to develop are left for later, leading to a drawn-out schedule.
Our recommendation for managing this risk is to sketch the design for multiple layers, creating a rough outline of a design for a component and some layers of its subcomponents, then proceeding to add detail to the high-level component and fleshing out the specification for its subcomponents. Proceeding incrementally in this way allows one to obtain some information about the feasibility and complexity of a particular design approach before committing all of one’s effort to the detail of the top-level component. This approach is similar to our recommended implementation approach of building skeletons or prototypes of components rather than immediately progressing to detailed implementation.
The same issues about the cost of incrementalism apply to top-down design as they do to implementation. It can be useful to make sketch designs that are not in the final form needed, to reduce the temptation to turn sketches that have been changed over and over directly into the design for a component.
Balance design work. I have found that focusing on one aspect of a component’s design to the exclusion of others often leads to dead-end designs, where a work in progress becomes too biased toward one aspect and is not readily evolved as other aspects begin to be considered. Focusing on primary features first, and leaving security or safety for later, is a common example of this pattern.
I have found it more useful to consider many different aspects of a component’s design at a high level, sketching out different rough possible designs and making simple comparisons as one learns about the design problem. This approach has the advantage of investing relatively less effort on detail design and analysis while the design has a higher degree of uncertainty, and focusing effort on those approaches that pass the first simple evaluations.
This approach to design has its pitfalls. Some components’ designs are constrained by particular aspects—such as a need for high performance or the ability to operate in an extreme environment. These aspects are sometimes called design drivers: they have a disproportionate effect on the final design. Recognizing when some aspect drives the design in this way, and putting more effort earlier into understanding these drivers, is part of the art of designing well.
Plan for updates. Nearly every design in a successful system will be updated as time goes by. Over time, the effort spent on these updates will dwarf the effort spent on the initial design. This means that if one is developing a system for the long run, the processes, tools, and artifacts used in the design effort should be organized in a way that supports those who will come along to learn about, evaluate, and redesign parts of the system—long after those who initially designed it have moved on.
This necessitates documenting more than just the structure of the implementation. For these people to understand a design, they need to know the thinking behind what the choices were and the subtle aspects that are not necessarily apparent from looking at the implementation. These people will need guidance for how components relate to each other. They will need to understand the analyses that determined whether the component’s design was sufficiently safe or secure. This documentation takes more effort than proceeding through a one-time design, building an implementation, and then moving on, but it provides a project with a future.
Making updates effective also involves creating a team structure and human processes that can handle updates. This involves giving the team a clear way to understand how design changes happen, and how to distinguish proposals or work in progress from a design they should work from, or how to determine what design applies to a specific deployed system. It also involves developing a team culture that incentivizes good design and good documentation, giving them enough time to document enough design that their successors can build on their work and avoiding creating unnecessary time pressures that disincentivize people from doing good design.
Use appropriate infrastructure. Finally, effective design relies on having the tools, processes, and standards that give people the tools to do design work. The key principles I recommend include: