Black Swans, Stress Testing, and the Plausibility Trap

A black swan refers to a highly improbable but impactful event. The metaphor can apply to a wide range of events, even baseball records. It is most frequently applied to financial risk management, including regulatory capital and stress testing. The Global Financial Crisis (GFC) exposed some of the gaps in probability-based capital frameworks. Stress testing allows banks and their supervisors to focus on specific scenarios but remains hampered by a desire to ensure the stress scenarios remain “plausible.” Proposed changes to capital stress testing are likely to make the problem worse.

Until the 17th century, Europeans believed swans were only white. That changed with the discovery of black swans in Australia. Until then, Europeans had no way of knowing that black swans even existed. They probably would also have had a hard time imagining koala bears and duck-billed platypuses. Author Nassim Nicholas Taleb popularized the term in his book of the same name. A good way to think of an event in black swan terms is to look at its improbability ex ante and not with 20/20 hindsight.

Basel II and the GFC

During the 1980s, global regulators developed Basel I, which based capital requirements on risk-weighted assets (RWAs). The framework was relatively simple, using risk buckets for specific asset types. As implemented in the United States, Basel I assigned a risk weight of zero for U.S. Treasuries, 50% for residential mortgages, and 100% for commercial loans. Some regulators believed the Basel I risk weights were insufficiently risk sensitive and developed a model-based approach. This Basel II (B2) approach had three key components: exposure at default (EAD); probability of default (PD); and loss given default (LGD).

The PD models could be quite complex, but they boiled down to looking at historical losses as well as a systematic risk parameter. The latter made assumptions about an asset class’s correlation with the overall market, with a low correlation translating to a lower risk weight. Basel II assigned a correlation of only 15% for residential mortgages, suggesting that loss experience was principally idiosyncratic rather than cyclical.

How did the B2 risk parameters translate into risk weights? Calem and Follain provided some useful examples in their 2003 paper. A prime loan (750 FICO; 70% LTV) would have a risk weight of 3%. An Alt-A (low doc) mortgage came in at 19% RWA. A subprime loan with a 620 FICO and 95% LTV translated to 60% RWA. It’s worth noting that these are risk weights; the actual capital charge would be one-tenth as much. For example, $100 in Alt-A loans would need just $1.90 in capital to meet the “well capitalized” threshold.
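The arithmetic behind these figures is simple enough to sketch. The snippet below is an illustration only: it takes the three risk weights quoted above and assumes a flat 10% ratio of capital to RWA, matching the "one-tenth" rule of thumb in the text. The function and variable names are illustrative, not part of any regulatory formula.

```python
# Risk weights as quoted in the text from Calem and Follain (2003).
RISK_WEIGHTS = {"prime": 0.03, "alt_a": 0.19, "subprime": 0.60}

# Assumption: capital is held at 10% of risk-weighted assets,
# matching the "one-tenth" rule of thumb in the text.
CAPITAL_RATIO = 0.10


def required_capital(exposure, risk_weight, capital_ratio=CAPITAL_RATIO):
    """Capital needed against an exposure: exposure x risk weight x ratio."""
    return exposure * risk_weight * capital_ratio


if __name__ == "__main__":
    for loan_type, weight in RISK_WEIGHTS.items():
        capital = required_capital(100.0, weight)
        print(f"{loan_type}: ${capital:.2f} of capital per $100 of loans")
```

On these assumptions, $100 of prime loans needs about $0.30 of capital and $100 of subprime loans about $6.00.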

Although B2 capital rules were supposed to cover once-in-a-thousand-year events, they proved grossly inadequate during the GFC. For example, according to CoreLogic, seriously delinquent Alt-A loans climbed from 2% to 26% between 2007 and 2010. Delinquencies for conventional conforming mortgages rose from 1.4% to 7.2%. Despite the poor performance, the B2 risk parameters for residential mortgages remain unchanged. The Basel Endgame (RIP) sought to move away from a model-based approach, but those efforts are unlikely to go anywhere in the foreseeable future.

Stress Testing as an Alternative

Rather than making changes to Basel II, regulators developed an alternative framework for assessing capital adequacy. Supervisors designed stress tests less focused on probability and more focused on specific, adverse macroeconomic scenarios. Banks have also developed their own stress test frameworks that usually follow a similar approach. While largely an improvement over the model-based approach, supervisory stress testing has its share of blind spots.

As Bad as it Gets?

Stress scenarios often assume the GFC as a starting point. The GFC was obviously bad for a wide range of asset classes. But some asset classes fared much worse than others. Using the FORECAST function in Excel, I compared an upper bound delinquency forecast (99.99% confidence interval) to actual results during the GFC for certain asset classes. The forecast used data going back to the early 1990s. As shown below, actual credit card delinquencies during the crisis were roughly in line with the upper bound forecast. Bad but not catastrophic.

Now look at residential mortgage loans. Even using a 99.99% confidence interval, the upper bound forecast doesn’t even come close to actual delinquencies. Unfortunately, both banks and supervisors treat 2008 as a once-in-a-millennium event, across the board. The performance of mortgages certainly looks like a black swan, but for some other asset classes, the experience was merely bad.
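The upper bound comparison described above can be approximated outside Excel. The sketch below is not the spreadsheet calculation behind the figures in this article; it assumes a plain linear trend fitted by least squares, normally distributed errors, and a one-sided bound. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def upper_bound_forecast(history, steps_ahead=1, confidence=0.9999):
    """Return (point forecast, upper bound) from a linear trend.

    Assumptions: equally spaced observations, a straight-line trend,
    and normally distributed errors (a normal quantile is used in
    place of Student's t, which is reasonable only for long histories).
    """
    n = len(history)
    if n < 3:
        raise ValueError("need at least three observations")

    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(history)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, history))
    resid_se = sqrt(sse / (n - 2))

    # Standard error of a prediction for a new observation at x0.
    x0 = n - 1 + steps_ahead
    pred_se = resid_se * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)

    z = NormalDist().inv_cdf(confidence)  # one-sided quantile
    point = intercept + slope * x0
    return point, point + z * pred_se
```

Comparing the returned upper bound with the realized peak gives the kind of gap discussed above. A series with fat tails will breach a bound like this far more often than the nominal confidence level suggests.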

Supervisory stress scenarios also tend to tie adverse events to broad, macroeconomic variables, when the reality is more complex. The huge increase in mortgage delinquencies reflected not just an economic downturn, but also a bursting housing bubble and a breakdown in underwriting standards. Securitization and derivatives made things worse.

Short Memories

While a true black swan should reflect an unprecedented event, some events merely look that way if you focus too much on recent data. Consider the jump in mortgage rates starting in 2022. Mortgage rate data are available on a weekly basis, so you can compile a lot of observations quickly. The graph below shows forecast vs. actual 30-year mortgage rates based on data from 2010 to 2021. Actual rates fall way outside the 99.99% confidence level. Rates peaked at 7.79%, roughly double the upper bound estimate.

But what happens if we go back further? Using the entire time series (going back to 1971) leads to a different result. As shown below, peak rates jump well above the upper bound estimate, but the upper bound eventually catches up.

Even projections based on long-term data overstate the extent to which mortgage rate changes during 2022 and 2023 represented extreme tail events. Mortgage rates aren’t especially high by historical standards. Moreover, we saw considerably larger rate changes during the early 1980s. Mortgage rates rose by 410 basis points between November 2021 and November 2022. But they rose by 587 basis points between April 1979 and April 1980. The difference in 2-year rate changes is even larger. Sure, 1980 predates most of our careers, but it wasn’t 10,000 years ago.
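One way to check whether a rate move is unprecedented is to scan the full history for the largest rise over the same horizon. A minimal sketch, assuming a list of equally spaced rate observations in percent; the function name and the choice of window are illustrative (with weekly data, a 52-step window approximates one year).

```python
def largest_rise_bp(rates_pct, window):
    """Largest increase, in basis points, over any `window`-step span.

    `rates_pct` holds rates in percent (e.g. 6.5 for 6.5%), equally spaced.
    Returns (rise_in_bp, start_index); ties go to the later start index.
    """
    if window < 1 or len(rates_pct) <= window:
        raise ValueError("series must be longer than the window")
    changes = [
        (round((rates_pct[i + window] - rates_pct[i]) * 100, 1), i)
        for i in range(len(rates_pct) - window)
    ]
    return max(changes)
```

Running this over the full series rather than only the recent past is what separates a genuine outlier from a move that merely exceeds recent memory.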

Reality Gets in the Way

During Congressional testimony on the stress test scenarios, Bank Policy Institute (BPI) President Greg Baer observed that the Severely Adverse scenario for unemployment “has about a 50-50 chance of occurring once in 10,000 years.” Baer was so proud of this factoid that he used it three times in his prepared remarks. Assuming a normal distribution, once-in-ten-thousand-year odds correspond to roughly a 4 standard deviation event. For reference, mortgage delinquencies during the GFC represented a twenty standard deviation event.
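The conversion from "once in 10,000 years" to standard deviations can be checked with Python's standard library. This is a sketch that assumes a standard normal distribution, a one-tailed event, and independent years; the helper names are illustrative. It shows two readings of the quote: a flat 1-in-10,000 annual probability, and an annual probability that gives a 50-50 chance of at least one occurrence over 10,000 years.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def annual_prob_from_horizon(chance, years):
    """Annual probability p such that P(at least one event in `years`) = chance.

    Assumes independent, identically likely years: 1 - (1 - p)**years = chance.
    """
    return 1.0 - (1.0 - chance) ** (1.0 / years)


def sigmas(annual_prob):
    """One-tailed standard deviations matching a given annual probability."""
    return std_normal.inv_cdf(1.0 - annual_prob)


if __name__ == "__main__":
    p_simple = 1.0 / 10_000                          # flat 1-in-10,000 per year
    p_quote = annual_prob_from_horizon(0.5, 10_000)  # 50-50 over 10,000 years
    print(f"1-in-10,000 per year: {sigmas(p_simple):.2f} standard deviations")
    print(f"50-50 over 10,000 years: {sigmas(p_quote):.2f} standard deviations")
```

Both readings land a little under four standard deviations, which is why a normal-distribution yardstick makes the scenario look so remote.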

Better still, let’s compare this “extraordinarily improbable” scenario with what really happened. The Severely Adverse scenario covered the period from Q1 2018 to Q1 2021. It assumed that the U.S. unemployment rate would increase by 600 basis points over 18 months and peak at 10.0%. In fact, the unemployment rate rose by 1,040 basis points over one month (March to April 2020) and peaked at 14.8%.

Expect the Unexpected

Black swans happen. Individual black swans are, by definition, highly improbable. That some highly improbable stress will hit the financial system in the foreseeable future is not improbable at all. Over the past 25 years, we’ve had the 9/11 terrorist attacks, a global financial crisis, a global pandemic that killed millions worldwide, and the highest U.S. inflation in 40 years. While each of these events had warning signs, they were much more severe than prior experience would suggest.
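The gap between "each event is improbable" and "some event is likely" follows from basic probability. A minimal sketch, assuming the risks are independent and equally likely each year; the inputs in the example are hypothetical and chosen only to illustrate the effect.

```python
def prob_at_least_one(annual_prob, n_risks, years):
    """P(at least one event) across independent risks and years.

    Assumes every risk has the same annual probability and that
    risks and years are independent of one another.
    """
    return 1.0 - (1.0 - annual_prob) ** (n_risks * years)


if __name__ == "__main__":
    # Hypothetical: ten distinct 1-in-100-year risks over a 25-year span.
    chance = prob_at_least_one(annual_prob=0.01, n_risks=10, years=25)
    print(f"Chance of at least one event: {chance:.0%}")
```

With these illustrative inputs, the chance of seeing at least one such event is roughly nine in ten. Real-world risks are rarely independent, so the figure is indicative at best.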

One of the biggest weaknesses in the current stress testing approach is what I’ve called the plausibility trap. Focusing only on seemingly plausible stress scenarios can provide a false sense of security. Stress testing can identify a bank’s vulnerability to certain scenarios, but don’t assume it’s the worst case.

What Regulators Can Do

Changes to the stress testing regime over the past several years have magnified these weaknesses. Regulators reduced supervisory stress scenarios from two to one. The selected scenarios each year have focused exclusively on credit risk exposures, when interest rate risk played a much more important role in 2023’s large bank failures. Getting rid of company-run stress tests meant less focus on bank-specific vulnerabilities, such as concentrations of uninsured deposits. Banks with between $100 billion and $250 billion in assets face supervisory stress tests only every other year. As a result, a bank like SVB could pass the $100 billion threshold in 2021 and not be subject to a stress test until 2024, more than a year after it failed.

Using multiple stress scenarios and restoring bank-specific stresses would help to make supervisory stress tests more meaningful. We should also add a healthy dose of humility that acknowledges the limitations of both internal and supervisory stress tests. Unfortunately, it looks like we’re headed in the opposite direction.

The BPI, ABA, and other trade associations filed a lawsuit in federal court challenging the supervisory stress tests, including both the scenarios and the supervisory models. Loper Bright and other recent Supreme Court decisions make it more likely banks will receive a sympathetic ear from courts when challenging regulatory actions. While the lawsuit focused mainly on some arcane process issues, much of the public discussion revolved around results. The BPI claimed the current framework produces “capital charges that are inaccurate, volatile and excessive.” Using the Main Street pretense that characterized its astroturf campaign to kill the Basel Endgame, the trade groups claim the current approach leads to “reduced lending and economic growth.” The ABA Banking Journal gets closer to the truth when noting the stress tests’ impact on “capital distributions and discretionary bonus payments.”

The Federal Reserve responded to the threatened litigation by announcing plans to make changes to the stress tests. The press release provides little detail but suggests some changes under consideration. These changes include making supervisory stress models and scenarios open for public comment and averaging results over two years. The press release did not indicate the use of multiple stress scenarios but rather plans to retain the current “exploratory analysis,” which announces only aggregate results, not results for individual banks.

The Fed claims “these proposed changes are not designed to materially affect overall capital requirements.” Note the focus on “materially” and “overall.” Stress tests apply to individual banks rather than the overall banking system. Comments on the stress scenarios, as they always do, will focus on “plausibility,” even though supposedly implausible events occur every several years. Moreover, averaging the results over two years may make capital requirements more predictable but likely less risk-focused. Let’s return to our SVB example. Using a two-period average would mean that stress tests wouldn’t have affected SVB’s capital requirements until 2026.

And then there’s the fire and ice nature of black swans. Say a bank is exposed to excessive interest rate risk and the Fed decides to run a rising rate stress scenario. The results are bad. But wait, they get to average these results with an earlier year that focused on credit risk and assumed falling rates. Of course, some banks are exposed to both risks, but more often it’s a matter of picking your poison when deciding between credit and interest rate risk.
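The dilution effect of averaging is easy to see with numbers. The sketch below assumes a simple arithmetic mean of two annual results, since the press release does not spell out the mechanics; the capital declines are hypothetical figures, not actual stress test results.

```python
def averaged_result(current_year, prior_year):
    """Simple two-year average of stress test capital declines."""
    return (current_year + prior_year) / 2.0


# Hypothetical peak-to-trough capital ratio declines, in percentage points.
rising_rate_year = 6.0  # scenario that hits the bank's interest rate exposure
credit_year = 2.0       # earlier credit-focused scenario with falling rates

blended = averaged_result(rising_rate_year, credit_year)
print(f"Single-year result: {rising_rate_year:.1f} pp")
print(f"Two-year average:   {blended:.1f} pp")
```

In this illustration, the year that actually tested the bank's main vulnerability counts for only half of the final figure.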

Both regulatory capital and stress testing approaches routinely underpredict the frequency and severity of extreme events. Unfortunately, public policy appears headed in a direction that makes underprediction more likely. It would be like appointing someone who underpredicted U.S. COVID deaths by 97% as head of the National Institutes of Health. Oh wait.