Robust Design: Macro & Meta Contradictions
Kobus Cilliers | On 07, Apr 2019
I’m speaking at a Robust Design conference this month and I think my job is to shake things up a bit. It shouldn’t be too difficult. The Robust Design world is still thinking it’s a good idea to be talking about Robust Design. Robust sounds good until things go wrong. When they do, a robust system goes really wrong. Never mind. One day the community will work out that Resilient Design would be a better target. Or, better yet, the real target ought to be AntiFragile Design. Systems that get stronger the more they are stressed. Some industries are heading there. Most have a way to go. This is for them.
Like everything else in life, a desire to improve the robustness – or reliability, availability, etc – of a system is bound by the universal law of the S-Curve. This law tells us that our first attempts to improve the robustness of a system bear a lot of fruit. Then, as time progresses, the improvements we seek get tougher to make. Later still, diminishing returns kick in, until eventually no amount of extra effort achieves any increase in robustness. Something like this:
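For readers who prefer numbers to pictures, here is a minimal sketch of the diminishing-returns end of that curve. It assumes a logistic function, which is only one common way of drawing an S-Curve; the function names and the `ceiling`, `steepness` and `midpoint` parameters are illustrative choices of mine, not values taken from any real system. Note that the logistic form also has a slow start, which is a property of the assumed formula rather than a claim about any particular design.

```python
import math

def s_curve(effort, ceiling=1.0, steepness=1.0, midpoint=5.0):
    """Assumed logistic S-curve: robustness achieved for a given cumulative effort."""
    return ceiling / (1.0 + math.exp(-steepness * (effort - midpoint)))

def marginal_gain(effort, step=1.0):
    """Extra robustness bought by one more unit of effort at this point on the curve."""
    return s_curve(effort + step) - s_curve(effort)

# Past the midpoint, each extra unit of effort buys less and less robustness.
for effort in (5, 8, 11, 14):
    print(f"effort={effort:2d}  robustness={s_curve(effort):.4f}  "
          f"next-unit gain={marginal_gain(effort):.5f}")
```

The printed gain column shrinks towards zero as the curve approaches its ceiling, which is the point of the law: beyond a certain stage, more effort on the same curve stops paying.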
In crude terms, also universal, we can observe three distinct phases of Robust Design. The first one is the low-hanging-fruit stage. This stage is found early in the evolution of the system we’re designing, when we’ve only just started to understand what the reliability of the system is once we place it in the real world (as opposed to the FEA calculations in the office). This is the phase where lots of things can and do go wrong with the system, but when we attempt to put them right, for the most part our efforts pay off.
Then comes Phase II. This is the phase I think of as ‘Robust With Consequences’. Attempts to improve the reliability of a system will deliver results, but at the expense of something else. We’ll improve reliability, but the cost of the system goes up. Or we’ll improve the reliability and make the operating efficiency worse. This is the contradiction stage of the S-Curve. It’s one thing to trade off reliability against other design parameters, but no customer is particularly going to thank us for it.
Because the whole world of Robust Design is still relatively immature, I think a lot of designers are still of the view that these trade-offs are an inevitable part of their design challenge. Traditionally – if we all look back through our college notes – they’d be right. In a world in which TRIZ and tools like the Contradiction Matrix exist, of course, we know that the smart designer will recognize the existence of the trade-off and tap into the conflict-resolving strategies of those who have successfully challenged the design conventions. Something like this…
Some designers accidentally find their own solutions to these kinds of trade-off problems. The majority embark on a game of Robust Design Whack-A-Mole, occasionally getting rid of the mole for a while, but the game gets tougher and tougher until we enter the next Phase.
Phase III is what I call the Dodo Phase. It’s the time when we think we’re being really smart continuing to make best use of our resources to improve reliability, but in so doing we’ve unwittingly begun to make fatal compromises to the overall efficacy of the design. As with the Dodo, we’ve made really reliable leg muscles, but forgotten that wings might be useful if hungry sailors start shooting at us.
In an ideal world, we recognize we’re about to enter Phase III before we actually reach its threshold. Sadly, few organisations seem to have in place the mechanisms that allow them to recognize the fundamental-ness of the approaching fundamental limits. It usually takes months of beating one’s head against a wall before people get an inkling that something is amiss. Maybe that’s the sign. The sign that tells us that solving the Phase II contradictions can only help so far.
True enough, the Contradiction Matrix will offer certain clues that more fundamental jumps are required in order to fix the prevailing conflict. Principle 28, Mechanics Substitution, is a pretty good clue that a mechanical system needs to stop being mechanical. Principle 36, Phase Transition, is a more subtle but no less profound signal that a bigger discontinuity is in order.
Fortunately, as we plot out how Robust (or should that now be Resilient or AntiFragile?) system design mechanisms evolve through these bigger paradigm-shifting jumps we get to observe a very clear pattern.
We’re talking now about meta-contradictions. Solving Phase II reliability-with-consequences contradictions will carry us up the S-Curve of the current design paradigm (while enabling macro-scale S-Curve jumps), but when we hit its limits, we need to change the way we think about the design processes we deploy if we’re to jump to a new paradigm.
Typically, it seems, each one of these meta-design S-Curves will offer designers one or sometimes two orders of magnitude increase in the levels of (compromise-free) reliability. Hence, if we think about reliability in terms of the number of ‘nines’ we’re able to achieve (Table 3 offers a way of calibrating where we are and where we need to be), then the number of ‘nines’ we want to achieve will ultimately tell us how many design-method paradigm jumps we are likely to have to make.
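As a quick aid to the ‘nines’ way of thinking, here is a small sketch of the usual conversion, assuming reliability is expressed as a fraction between 0 and 1. The function names are hypothetical, and reading the result as ‘failures per million demands’ is my own assumption about how the figure would be applied; it is not a reproduction of the calibration table.

```python
import math

def nines(reliability):
    """Number of 'nines' for a reliability given as a fraction, e.g. 0.999 -> about 3."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be at least 0 and strictly less than 1")
    return -math.log10(1.0 - reliability)

def failures_per_million(n):
    """Assumed reading: expected failures per million demands at n nines."""
    return 1e6 * 10.0 ** (-n)

for r in (0.999, 0.99999, 0.999999):
    n = nines(r)
    print(f"reliability {r}: {n:.1f} nines, "
          f"roughly {failures_per_million(n):.0f} failures per million")
```

Each extra nine is one order of magnitude fewer failures, which is why a single S-Curve jump worth ‘one or sometimes two orders of magnitude’ translates into one or two nines.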
To those familiar with our Trends Of Evolution, it shouldn’t surprise you too much to learn that this design-paradigm trend is one of the thirty-eight step-change trends we’ve uncovered over the years. The pattern – reproduced in Figure 3 – has been established for some time. It has, in light of the emergence of the ‘antifragile’ word, also evolved. What is less well known is how the various evolution jumps correspond to the number of nines they are likely to deliver to the designer of a given system. Figure 3 contains an attempt to calibrate the trend, but it is still very visible that there’s a lot of uncertainty. Achieving nine-nines in a jet engine is not the same as achieving the same level of reliability in an epi-pen, for example.
It is best, at this stage in the evolution of the Trend, to see it as a relative rather than an absolute tool. Which means that, to stick with the epi-pen thought for a few more seconds, if we’re currently achieving three-nines reliability using design methods that are ‘steady state’, and we have a desire to achieve six-nines, then we’re likely to have to make two design paradigm jumps: we need to think about and do something about transients, and we need to think about and do something about slow degradation effects. Each jump, as stated earlier, is likely to give us an increase in our number-of-nines of one or two.
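The arithmetic behind that estimate can be sketched in a few lines. This is a rough illustration only: it takes the ‘one or two nines per jump’ rule of thumb at face value, and the function name and its parameters are hypothetical.

```python
import math

def jumps_needed(current_nines, target_nines, min_gain=1, max_gain=2):
    """Return (fewest, most) paradigm jumps, assuming each jump adds
    between min_gain and max_gain nines (the rule of thumb in the text)."""
    gap = target_nines - current_nines
    if gap <= 0:
        return (0, 0)
    fewest = math.ceil(gap / max_gain)  # every jump delivers the full two nines
    most = math.ceil(gap / min_gain)    # every jump delivers only one nine
    return (fewest, most)

# Epi-pen example: three-nines today, six-nines wanted.
print(jumps_needed(3, 6))  # (2, 3)
```

Two jumps is the optimistic end of that range, where at least one of the jumps delivers two nines; if each jump delivers only one, a third would be needed. That spread is one more reason to treat the Trend as a relative guide rather than an absolute one.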
For the jet engine example, where the current state of the art on modern engines is ten-nines or higher, we know that the next evolution jump in terms of design method is to go beyond the ‘design for Murphy’ idea that the engine should survive no matter how badly the customer tries to treat it, to the idea that when the customer does something dumb, the engine learns to make itself stronger.
What happens beyond the ‘AntiFragile Design’ paradigm remains to be seen. Keeping an eye out for that new jump is a job for the SI research team. In true ‘think of someone with a more extreme version of your problem because they’ve probably solved it’ fashion, and given that the aerospace industry currently defines the highest capability, it will very likely be them that we need to look to first. Meanwhile, the good news for all the other industries is that because aerospace has already made all the paradigm shifts along the Trend, the wheel doesn’t have to be re-invented. If you’re designing to include transient effects, in other words, you know where you need to look for your next big meta-level S-Curve shift.