ROBUSTICITY! What it is and why it's important

In the systematic trading and investing world, you see it all the time. Someone will design a system or rule set that shows tremendous historical returns - returns that seem too good to be true. They will start trading the system in real time with their own money. Or worse, they will go out and market that strategy, luring in naive investors who lack the training or experience to recognize an improperly built system when they see it.

The actual results come in much weaker than implied by the test data. After a few days, weeks or months of lackluster returns, the system is tossed into the waste pile.

Trading systems can fail for a number of reasons. Among the most common are underestimated slippage and commissions, problems with trade execution, and systems built on a weak market anomaly that is easily arbitraged away. In many cases, however, the deeper reason is that the designer failed to keep robusticity in mind.

What, exactly, is robusticity? Robusticity is simply the ability to withstand changes in environment, and in this context, “environment” refers to market conditions. A robust system is one that can survive widely varying market conditions across a long span of time. It flexes with market conditions, and it is unlikely to fall apart due to a market regime change. The longer a system has been working, and the more asset classes and countries it has been working on, the more likely it is to continue working. This is called the Lindy Effect - what has been around a long time is likely to be around a long time.

The opposite of a robust system is one that is “curve-fit” or “over-parameterized.” Such a system is designed (knowingly or not) to perform optimally in the narrow market conditions upon which it was tested. Extremely high returns over a short test period are a red flag, as a robust system is unlikely to be the very best-performing system on any narrow time frame.

Markets are ever evolving, so designing systems with robusticity in mind is of utmost importance if one hopes to have success with live systematic trading and investing. So how do we design and implement a robust trading strategy? Some points for consideration:

Does the backtest work on multiple assets, or just a single security?

A system based on a backtest using just Apple stock, or even just the S&P 500 index, would be suspect. We would want to test the rules on a wide range of stocks (for instance, all stocks in the S&P 500) or a wide range of equity indices (the German DAX, the Hong Kong Hang Seng Index, the Japanese TOPIX, the Australian All Ordinaries, etc.). A rule set designed to generate trading profits on a narrow universe is likely “over-fit,” meaning that if market conditions change in that instrument, the system could break. If, on the other hand, we confirm that the rules also work across a wide range of assets, we might be onto something.
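To make the multi-asset check concrete, here is a minimal Python sketch. It assumes a simple long/flat rule (hold the asset the next day only if today's close is above its 200 day moving average, otherwise sit in cash), uses the Sharpe ratio as the yardstick, and runs on synthetic random-walk prices. `breadth_check`, its helper functions and the `market_N` names are hypothetical stand-ins for a real multi-market dataset, not our production code.

```python
import math
import random
import statistics


def synthetic_prices(n_days, drift, vol, seed):
    """Hypothetical geometric random walk standing in for one market's history."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(drift, vol)))
    return prices


def sharpe(daily_rets, days_per_year=252):
    """Annualized mean over annualized volatility (risk-free rate ignored)."""
    sd = statistics.pstdev(daily_rets)
    return 0.0 if sd == 0 else statistics.mean(daily_rets) / sd * math.sqrt(days_per_year)


def breadth_check(universe, lookback=200):
    """For each market, compare the moving average rule with buy-and-hold over
    the same days. Returns {name: (rule_sharpe, buy_and_hold_sharpe)}."""
    report = {}
    for name, prices in universe.items():
        rule, hold = [], []
        for t in range(lookback - 1, len(prices) - 1):
            sma = sum(prices[t - lookback + 1 : t + 1]) / lookback
            next_ret = prices[t + 1] / prices[t] - 1.0
            hold.append(next_ret)
            rule.append(next_ret if prices[t] > sma else 0.0)  # long or flat
        report[name] = (sharpe(rule), sharpe(hold))
    return report


if __name__ == "__main__":
    universe = {f"market_{i}": synthetic_prices(2500, 0.0003, 0.012, seed=i) for i in range(6)}
    report = breadth_check(universe)
    wins = sum(rule > hold for rule, hold in report.values())
    print(f"Rule beat buy-and-hold on {wins} of {len(report)} markets")
```

Because the synthetic markets are pure random walks, there is no edge to find here; the point is the shape of the test. On real data, a rule that only beats buy-and-hold on one or two markets is the narrow-universe red flag described above.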

Does the backtest work when we vary the underlying parameter values?

If we design a system that creates buy and sell logic based on the 200 day moving average, we would want to stress test the system by varying the length of that moving average. We would want to try a whole range, perhaps 50 days to 400 days, sampling results at 10 day increments. If the system performs well with a 200 day lookback but the results fall apart with a 150 day or 250 day lookback, the system is likely not robust. Market conditions will almost certainly differ in the future from what they were in the past, so the success of the 200 day version was likely due to luck rather than a true edge. If the 200 day moving average signal reflects a true edge, then lookbacks of a similar magnitude should capture it too.
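A parameter sweep like this takes only a few lines. The Python sketch below is illustrative: it assumes the same hypothetical long/flat moving average rule, scores each lookback by Sharpe ratio rather than by return alone, evaluates every lookback over the same days so the results are comparable, and uses synthetic prices in place of real data.

```python
import math
import random
import statistics


def synthetic_prices(n_days, drift=0.0003, vol=0.01, seed=0):
    """Hypothetical geometric random walk standing in for real price history."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(drift, vol)))
    return prices


def ma_rule_returns(prices, lookback, start):
    """Long/flat rule: hold the asset on day t+1 only if the close on day t is
    above its `lookback`-day simple moving average; otherwise earn 0% in cash."""
    if start < lookback - 1:
        raise ValueError("start must leave room for a full moving average window")
    cum = [0.0]
    for p in prices:
        cum.append(cum[-1] + p)  # prefix sums turn each SMA into an O(1) lookup
    rets = []
    for t in range(start, len(prices) - 1):
        sma = (cum[t + 1] - cum[t + 1 - lookback]) / lookback
        next_ret = prices[t + 1] / prices[t] - 1.0
        rets.append(next_ret if prices[t] > sma else 0.0)
    return rets


def sharpe(daily_rets, days_per_year=252):
    """Annualized mean over annualized volatility (risk-free rate ignored)."""
    sd = statistics.pstdev(daily_rets)
    return 0.0 if sd == 0 else statistics.mean(daily_rets) / sd * math.sqrt(days_per_year)


def sweep(prices, lookbacks):
    """Backtest every lookback over the same evaluation window."""
    start = max(lookbacks) - 1  # first day on which even the longest SMA exists
    return {lb: sharpe(ma_rule_returns(prices, lb, start)) for lb in lookbacks}


if __name__ == "__main__":
    results = sweep(synthetic_prices(20 * 252), range(50, 401, 10))
    for lb, score in results.items():
        print(f"{lb:>3} day SMA: Sharpe {score:5.2f}")
```

Plotting `results` against the lookback gives the kind of curve discussed later in this post; what matters is whether neighboring lookbacks score similarly, not which single lookback scores best.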

Has the system been tested on different market regimes?

If we test a system that shows solid results on US stocks from 2010 to 2018, that is a good start, but we would never put money to work based on this alone. Over that period, interest rates were low, the economy was expanding, and stock valuations rose about as smoothly as they ever have. For greater confidence, we would want to see how the system performs during bull markets, bear markets, rising rates, falling rates, inflation, deflation, and stagflation, not to mention the periods before and after the introduction of computerized trading. While we can never test all possible scenarios (indeed, new and unfathomable scenarios are bound to occur in the future), generally, the more data and the more variety in market regimes, the more confidence we can draw from our testing.
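One simple way to organize a regime analysis, sketched here in Python under stated assumptions: label each day with a regime, then report the strategy's statistics separately for each label. How the labels are assigned (for example, treating an index more than 20% below its prior peak as a bear market) is left to the reader; `stats_by_regime` and the sample inputs are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def stats_by_regime(daily_rets, regimes, days_per_year=252):
    """Group a strategy's daily returns by a regime label per day and report
    annualized mean return, annualized volatility and day count per regime."""
    if len(daily_rets) != len(regimes):
        raise ValueError("need exactly one regime label per daily return")
    groups = defaultdict(list)
    for ret, label in zip(daily_rets, regimes):
        groups[label].append(ret)
    return {
        label: {
            "ann_mean": statistics.mean(rets) * days_per_year,
            "ann_vol": statistics.pstdev(rets) * math.sqrt(days_per_year),
            "days": len(rets),
        }
        for label, rets in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical inputs, not real data; real labels would come from a rule
    # applied to index prices, interest rates, inflation, etc.
    rets = [0.004, -0.012, 0.006, 0.001, -0.008, 0.010]
    labels = ["bull", "bear", "bull", "rising rates", "bear", "bull"]
    for label, s in stats_by_regime(rets, labels).items():
        print(f"{label:>12}: mean {s['ann_mean']:+.1%}  vol {s['ann_vol']:.1%}  days {s['days']}")
```

A regime covered by only a handful of days tells us little, so the day count matters as much as the mean.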

Putting the concept of robusticity into practice in testing:

How does a systematic trader guard against over-parameterization and do the best he can to ensure robusticity? Stress test! Using something called Monte Carlo simulation, a system designer is able to vary the parameters in their simulations and see if and how things break.

For example, in our Alpha Momentum Strategy, we vary parameters such as the moving average filters we apply at the index level and at the individual stock level, using both long-term and medium-term moving averages. These filters give us multiple signals that determine whether, and how much of, the portfolio is invested, and which individual stocks are held.
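Strictly speaking, stepping through evenly spaced lookbacks is a deterministic parameter sweep; a Monte Carlo test in the narrower sense draws parameter values at random, which scales better when several parameters vary at once. The Python sketch below assumes a simplified two-filter version of the idea (hold a stock only when both the index and the stock are above their own moving averages), synthetic index and stock prices, and uniformly sampled lookbacks. All function names are hypothetical, and none of this is the actual Alpha Momentum Strategy code.

```python
import math
import random
import statistics


def synthetic_prices(n_days, drift, vol, rng):
    """Hypothetical geometric random walk standing in for real price history."""
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(drift, vol)))
    return prices


def above_sma(prices, t, lookback):
    """True if the close on day t is above its `lookback`-day simple moving average."""
    return prices[t] > sum(prices[t - lookback + 1 : t + 1]) / lookback


def two_filter_returns(index, stock, index_lb, stock_lb, start):
    """Hold the stock on day t+1 only if BOTH the index and the stock closed
    above their own moving averages on day t; otherwise earn 0% in cash."""
    rets = []
    for t in range(start, len(stock) - 1):
        invested = above_sma(index, t, index_lb) and above_sma(stock, t, stock_lb)
        rets.append(stock[t + 1] / stock[t] - 1.0 if invested else 0.0)
    return rets


def sharpe(rets, days_per_year=252):
    """Annualized mean over annualized volatility (risk-free rate ignored)."""
    sd = statistics.pstdev(rets)
    return 0.0 if sd == 0 else statistics.mean(rets) / sd * math.sqrt(days_per_year)


def monte_carlo_params(index, stock, n_trials, index_range, stock_range, seed=1):
    """Draw (index lookback, stock lookback) pairs uniformly at random from the
    inclusive ranges and record the Sharpe ratio of each backtest."""
    rng = random.Random(seed)
    start = max(index_range[1], stock_range[1]) - 1  # same window for every trial
    trials = []
    for _ in range(n_trials):
        i_lb, s_lb = rng.randint(*index_range), rng.randint(*stock_range)
        rets = two_filter_returns(index, stock, i_lb, s_lb, start)
        trials.append((i_lb, s_lb, sharpe(rets)))
    return trials


if __name__ == "__main__":
    rng = random.Random(0)
    index = synthetic_prices(2000, 0.0003, 0.010, rng)
    stock = synthetic_prices(2000, 0.0004, 0.020, rng)
    scores = [s for _, _, s in monte_carlo_params(index, stock, 100, (100, 300), (50, 200))]
    cuts = statistics.quantiles(scores, n=20)  # 19 cut points: 5th ... 95th percentile
    print(f"Sharpe 5th pct {cuts[0]:.2f}, median {statistics.median(scores):.2f}, 95th pct {cuts[-1]:.2f}")
```

A tight spread between the 5th and 95th percentile Sharpe ratios suggests the results do not hinge on a lucky parameter pair.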

Here is a chart of the output from a Monte Carlo test on our long-term, individual stock-level moving average. It shows the results of thirty-one backtests in which all parameters of the system are held constant except the moving average lookback, which we test from 100 days to 400 days in 10 day increments. So we test the system using a 100 day moving average, a 110 day moving average, a 120 day moving average, and so on all the way up to a 400 day moving average, and plot the results along a curve.

We can see the annualized return that corresponds to each moving average lookback on the Monte Carlo curve here. Be aware that when we are doing this type of analysis, (1) we do not want to look at compound annual returns and (2) we want to look at measures of system success other than just returns, but we are using this chart for illustrative purposes.

What we are hoping to see in this analysis is a large “flat spot” on the Monte Carlo curve, indicating that the exact choice of this parameter (the lookback in days) isn’t all that important. If we had instead seen a jumpy pattern or a single sharp peak, we would suspect that any good results were due to luck and not a true market phenomenon. What we see above, though, is parameter stability, indicating that this element of the system is likely robust to a change in market conditions. If markets start to “move faster,” we should be OK, and the same goes if they start “moving slower.” Again, this is just one test among many that we would require before committing live capital. We ask “what if?” and don’t assume the future will look like the past.
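Eyeballing a chart works, but the “flat spot” can also be quantified. One hypothetical way, sketched in Python: find the longest run of consecutive tested lookbacks whose metric varies by no more than a tolerance you choose. The `widest_flat_spot` function, the tolerance and the sample values are illustrative assumptions, not results from our tests.

```python
def widest_flat_spot(results, tolerance):
    """Return (first, last) parameter value of the longest run of consecutive
    tested values whose metric spread (max - min) stays within `tolerance`."""
    if not results:
        raise ValueError("results must not be empty")
    keys = sorted(results)
    best, best_len = (keys[0], keys[0]), 1
    for i in range(len(keys)):
        lo = hi = results[keys[i]]
        for j in range(i + 1, len(keys)):
            lo, hi = min(lo, results[keys[j]]), max(hi, results[keys[j]])
            if hi - lo > tolerance:
                break  # any longer run starting at keys[i] would also exceed it
            if j - i + 1 > best_len:
                best, best_len = (keys[i], keys[j]), j - i + 1
    return best


if __name__ == "__main__":
    # Hypothetical Sharpe ratios by lookback, NOT results from any real test.
    stable = {100: 0.61, 110: 0.63, 120: 0.64, 130: 0.62, 140: 0.63, 150: 0.60}
    spiky = {100: 0.20, 110: 0.85, 120: 0.15, 130: 0.30, 140: 0.10, 150: 0.25}
    print(widest_flat_spot(stable, 0.05))  # (100, 150): one wide plateau
    print(widest_flat_spot(spiky, 0.05))   # (100, 100): no plateau at all
```

The tolerance is a judgment call, and, as noted above, the same check should be run on measures of success other than returns.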

Putting robusticity into practice in live trading:

While the concept of robusticity is vitally important in the testing phase, we can extend it into the implementation phase as well to reduce volatility. By breaking up the portfolio into tranches and running each tranche with a different parameter value from across the parameter value curve, we can increase our consistency in live trading and thus make the system more comfortable to trade and easier to stick with in real time.

We want to do this because there is a tremendous amount of noise and randomness in markets. Before we started trading using systems, we always knew there was a fair amount of randomness to asset prices, but we never truly grasped how random markets were until we were able to test rules and vary parameters systematically.

So while running a system with the 250 day moving average may provide results that are indistinguishable from a system that uses a 200 day moving average or a 300 day moving average over a 10 or 20 year period, in the short run, the results will likely be radically different.

The same goes when you substitute, say, the Dow for the S&P 500 in a multi-asset system that uses US stocks as one component. Over the long term, the difference will be minimal, but results in any given year may look surprisingly different. A bad run in a good system may fool a trader, perhaps causing him to change the parameters, only to have the original parameters then start outperforming the new ones! There are many such cases.

Imagine the following scenario:

You design a system that generates signals based on a stock’s relation to its moving average. You test the parameterization of the moving average for robusticity and find that over a 20 year period, the moving average “works” for lookbacks of 100 days to 200 days. In the spirit of simplicity, you decide to split the difference and use a 150 day lookback for your moving average in live trading.

Three months go by and you check the performance of your system. Unfortunately, performance is not that great: the system is flat. You look at how the system would have done had you used the 200 day moving average and find that it would have been up 10%.

Disappointed, you decide that going forward, you are going to use the 200 day moving average instead of the 150 day moving average.

You check back three months later and performance is still flat. You play with your parameters and find that the 150 day moving average system was up 10% since you switched to the 200 day moving average. Had you stuck with the 150 day moving average, or had you traded the 200 day moving average from the start, you would be up 10%. Instead, you are flat. The system was robust, but your execution was not, and you did not have the emotional fortitude to stick with your system throughout (perhaps a topic for a future blog post!).
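The arithmetic of the switch is easy to check by compounding the scenario's hypothetical quarterly returns. This short Python sketch ignores trading costs and treats each quarter's return as a single number.

```python
def compound(period_returns):
    """Total return from chaining period returns: (1 + r1)(1 + r2)... - 1."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


# Hypothetical quarterly returns taken from the scenario above.
ma150 = [0.00, 0.10]             # flat in quarter 1, +10% in quarter 2
ma200 = [0.10, 0.00]             # +10% in quarter 1, flat in quarter 2
switched = [ma150[0], ma200[1]]  # traded the 150 day, then switched to the 200 day

# prints +10.0%, +10.0% and +0.0%
for name, path in [("stuck with 150 day", ma150), ("stuck with 200 day", ma200), ("switched", switched)]:
    print(f"{name:>18}: {compound(path):+.1%}")
```

Either parameter, held throughout, ends up 10%; the switcher catches each system's flat quarter and misses both good ones.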

To be clear, this was a behavioral error. You designed a robust system, but you didn't consider how a short stretch of underperformance, driven by the randomness tied to your choice of parameter, might affect your mindset and emotions. As a result, you overrode your strategy and created slippage relative to how the model should have performed.

Using the concept of robusticity, you could have avoided this situation. How?

Back to our starting point: remember, you tested lookback values from 100 to 200 days and found that, over a 20 year period, the results were pretty much indistinguishable no matter which lookback you used for your moving average. Now you realize, however, that market randomness can cause a system that uses the 150 day moving average to greatly underperform a system that uses the 200 day moving average over the short term (or vice versa).

A solution? Chop up your portfolio into tranches and use a different lookback for each tranche. For instance, you could split your portfolio into three tranches, trading the first with the 100 day moving average, the second with the 150 day moving average, and the third with the 200 day moving average. You have effectively diversified away some of the randomness tied to any single choice of lookback, and you are now capturing the signal provided by the indicator more reliably. Institutional quant funds are known for running lots of essentially identical strategies with various lookbacks side by side for exactly this reason.
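The three-tranche idea can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: each tranche runs the same hypothetical long/flat moving average rule with its own lookback, tranches are reset to equal weight every day, and costs are ignored. The function names are our own, not from any library.

```python
def tranche_exposure(prices, t, lookbacks=(100, 150, 200)):
    """Fraction of capital invested for day t+1 when the portfolio is split into
    equal tranches, each running the same moving average rule with its own
    lookback: a tranche is long if day t closes above its SMA, else in cash."""
    if t < max(lookbacks) - 1:
        raise ValueError("not enough history for the longest lookback")
    longs = sum(prices[t] > sum(prices[t - lb + 1 : t + 1]) / lb for lb in lookbacks)
    return longs / len(lookbacks)


def tranche_returns(prices, lookbacks=(100, 150, 200)):
    """Daily portfolio returns, assuming (for simplicity) the tranches are reset
    to equal weight every day."""
    start = max(lookbacks) - 1
    return [
        tranche_exposure(prices, t, lookbacks) * (prices[t + 1] / prices[t] - 1.0)
        for t in range(start, len(prices) - 1)
    ]


if __name__ == "__main__":
    # Hypothetical path: a long plateau, a deep drop, then a partial rebound.
    prices = [200.0] * 100 + [50.0] * 100 + [100.0]
    # The 100 and 150 day tranches are long, the 200 day tranche is in cash.
    print(tranche_exposure(prices, len(prices) - 1))
```

Instead of being all in or all out, the portfolio now steps through exposures of 0, 1/3, 2/3 and 1, which is where the smoothing comes from.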

The downside of this solution is that you have added complexity to your process. Here it is important for the systematic trader to weigh the positive and negative effects of this added complexity, and these decisions will be personal: they should be based on the individual’s emotional makeup and tolerance for short-term swings in performance relative to the underlying signal.

A Real Application:

Our Alpha Momentum Strategy, which we track on a bi-monthly basis, is essentially the synthesis of two systems - one using medium-term lookbacks, and one using long-term lookbacks.

We generally find that, for our type of trading, long-term lookbacks tend to do better than short- and medium-term lookbacks over a long investment horizon. That said, we recognize the risk of short-term underperformance from relying on a single system. Therefore, we integrate the medium-term lookbacks into our system to help smooth returns and make the system as a whole easier to stick with. To be sure, our performance over the long run will likely be inferior to that of a system using solely the long-term lookbacks, but we sacrifice some performance for peace of mind, reduced volatility, and thus a better chance of sticking with our system. There is also the chance that when we look back on the next 60 years, we’ll find that medium-term lookbacks outperformed long-term lookbacks over that period.
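Why does blending smooth returns? For a portfolio holding weight w in sub-strategy a and 1 - w in sub-strategy b, the variance is w^2 * sigma_a^2 + (1 - w)^2 * sigma_b^2 + 2 * w * (1 - w) * rho * sigma_a * sigma_b, where rho is the correlation between the two. Because rho is at most 1, the blend's volatility can never exceed the weighted average of the two volatilities, and it falls further the less correlated the sub-strategies are. The Python sketch below checks this on hypothetical daily returns and assumes daily rebalancing to fixed weights; `blend` and `ann_vol` are illustrative helpers.

```python
import math
import statistics


def blend(a, b, weight_a=0.5):
    """Daily returns of a portfolio holding `weight_a` in sub-strategy a and the
    rest in b, assumed rebalanced to those weights every day."""
    if len(a) != len(b):
        raise ValueError("return series must be aligned day by day")
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


def ann_vol(rets, days_per_year=252):
    """Annualized volatility from daily returns (population standard deviation)."""
    return statistics.pstdev(rets) * math.sqrt(days_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns for a long-term and a medium-term sub-strategy.
    long_term = [0.012, -0.008, 0.004, -0.010, 0.015, -0.003]
    medium_term = [0.002, 0.006, -0.004, 0.003, 0.005, -0.001]
    mixed = blend(long_term, medium_term)
    for name, r in [("long-term", long_term), ("medium-term", medium_term), ("50/50 blend", mixed)]:
        print(f"{name:>12}: annualized vol {ann_vol(r):.1%}")
```

The price of that smoothing is the return given up by not holding only the stronger sub-strategy, which is exactly the trade-off described above.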

This approach is precisely the opposite of curve-fitting. We consciously choose parameters that make our headline simulation numbers look worse than they could, which is a small price to pay for increased robusticity.

We’re already seeing the benefit of this choice in real time. Since we started tracking our performance live on our website back in March of this year, our medium-term sub-strategy has substantially outperformed our long-term sub-strategy, even though the long-term sub-strategy showed clearly better returns in our test period going back all the way to the 1950s. The table below shows the breakdown of live performance of our two underlying sub-strategies.

Robusticity and diversification at work!

Thanks for following along!