Models play an essential role in financial decision making. Financial institutions and their regulators use models to forecast, identify exposures, and quantify risks. Whether you’re a treasurer, a risk manager, or a regulator, it’s important to understand how models work and some of their pitfalls. The idea isn’t to turn you into a modeling expert, but you should at least be an educated consumer.
Reactions to models can vary considerably. Some dismiss them as mere black boxes and prefer a more qualitative approach. Others react like the Kevin Spacey character in Margin Call (at 3:04): “Oh Jesus, you know I can’t @&$#! read these things. Just speak to me in English.” At the other extreme, some become so enamored with models that they neglect their limitations.
Much of the regulatory guidance on model risk management comes from OCC 2011-12 (SR 11-7). The Bank Policy Institute (BPI), the lobbying arm for large banks, has essentially declared war on this guidance. The BPI’s post includes its usual greatest hits, describing the guidance as a check-the-box exercise and citing its length and that of its accompanying section in the Comptroller’s Handbook. With respect to AI, BPI complains that the guidance imposes requirements for “extensive documentation and comprehensive testing from model owners and model validators that could cover bias and explainability, which can delay the time to release into the production and the cost.” In other words, large banks are upset that they need to demonstrate that models work before putting them into production.
Incoming regulators have repeated many of BPI’s talking points practically word for word, though I have yet to see them make public statements about model risk management specifically. News reports do indicate that the OCC’s team of modeling experts has experienced especially heavy staffing cuts. A lack of resources and expertise will make it more difficult to assess model risk, whether or not the guidance stays in place. But a change in emphasis won’t make modeling risk go away.
Here are some tips for evaluating models.
It’s not just about the math. Risk managers, validators, and regulators tend to equate a model’s mathematical complexity with its overall complexity and risk. A good example comes from the 2000s, when regulators started to evaluate internal risk models for capital purposes under Basel II. For, say, residential mortgage loans, there were two main components: probability of default (PD) and loss given default (LGD). The capital formula is shown below:

K = LGD × N[ (1 − R)^−0.5 × G(PD) + (R / (1 − R))^0.5 × G(0.999) ] − PD × LGD

where N(·) is the standard normal cumulative distribution function, G(·) is its inverse, and R is the assumed asset correlation (0.15 for residential mortgages).
The PD formula isn’t super complex as models go, but it’s certainly not for the faint of heart. It doesn’t merely estimate the probability of default; it estimates the PD over one year, at the 99.9th percentile, assuming a 15% correlation with the overall economy. In contrast, LGD involves little more than elementary school arithmetic. Banks’ PD estimates received a great deal of scrutiny. LGD, on the other hand, attracted little interest. Paradoxically, assumed LGD rates tended to move the needle considerably more than assumed PDs: capital scales one for one with LGD, so that simple assumption flows straight through to the requirement.
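To make that concrete, here is a minimal sketch of the formula above in Python. The PDs and LGDs are illustrative, not supervisory figures, and the printed values are approximate:

```python
from scipy.stats import norm

R = 0.15  # supervisory asset correlation for residential mortgages

def basel_k(pd: float, lgd: float) -> float:
    """Basel II IRB capital requirement: the one-year, 99.9th percentile
    loss rate, net of expected loss (PD x LGD)."""
    stressed_pd = norm.cdf(
        (norm.ppf(pd) + R**0.5 * norm.ppf(0.999)) / (1 - R)**0.5
    )
    return lgd * stressed_pd - pd * lgd

print(basel_k(pd=0.010, lgd=0.25))  # ~0.025
print(basel_k(pd=0.010, lgd=0.35))  # ~0.035: 40% higher LGD, 40% more capital
print(basel_k(pd=0.014, lgd=0.25))  # ~0.031: 40% higher PD, ~25% more capital
```

All the hard math sits inside the PD term, yet the back-of-the-envelope LGD input moves the result at least as much.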
Modeling nonmaturity deposits provides another case in point. As noted in an earlier post, NMDs present significant modeling challenges. There is a range of modeling approaches for these deposits, but most aren’t statistically complex. The complexity lies elsewhere, mostly in behavioral assumptions about how depositors will respond to changing rates.
Calibrate to the market, if possible. Accounting employs a hierarchy when assigning values to assets and liabilities. Mark-to-market compares carrying values to observed market prices. In effect, other market participants are validating (or falsifying) your assumptions. Mark-to-model derives its value primarily from the model itself and can become untethered from actual market conditions. That’s why some refer to mark-to-model approaches as mark-to-make-believe. A lack of market liquidity can make calibration to the market virtually impossible. In those cases, it’s best to at least recognize the higher model risk.
A good case in point involves mortgage servicing rights (MSRs). In a typical MSR transaction, a bank originates a loan and sells it on the secondary market but continues to service the loan for a fee. This transaction creates a mortgage servicing “asset” equal to the present value of the servicing cash flows. The key point to remember is that the bank didn’t buy the MSR; it derived the value from assumptions about prepayments, costs, and required rates of return. MSR purchases and sales do occur, but they tend to be few and far between. In the meantime, there’s only limited market data available to tie that asset’s value to reality.
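A stylized sketch of the valuation, with purely illustrative assumptions (a 25-basis-point servicing fee, 5 basis points of servicing cost, 10% annual prepayments, and a 10% discount rate):

```python
def msr_value(balance: float, fee: float = 0.0025, cost: float = 0.0005,
              cpr: float = 0.10, discount: float = 0.10,
              years: int = 30) -> float:
    """Present value of net servicing fees on a pool that shrinks each
    year as borrowers prepay (scheduled amortization ignored)."""
    value = 0.0
    for t in range(1, years + 1):
        balance *= 1 - cpr  # pool runs off through prepayments
        value += (fee - cost) * balance / (1 + discount) ** t
    return value

# The "asset" exists only as discounted assumptions:
print(msr_value(100_000_000))            # ~$0.90M at a 10% CPR
print(msr_value(100_000_000, cpr=0.20))  # ~$0.53M if prepayments double
```

Double the assumed prepayment speed and roughly 40% of the asset’s value disappears, with no deep market to arbitrate which assumption is right.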
Think the unthinkable. Risk modeling, especially for stress scenarios, can fall into a plausibility trap. Black swans occur more often, and with greater severity, than many expect. In early 2020, a major bank started to incorporate COVID scenarios into its stress testing framework. While some lauded the bank for its forward-looking approach, the scenario itself proved far too optimistic: it assumed some short-term disruption in Asian markets and that was about it. The scenario significantly underestimated the worldwide impact, and an understated stress scenario can create a false sense of security.
The problem can go beyond scenario selection. Models themselves can break down under more extreme conditions. Regulators tend to overuse the term “robust” as a synonym for “good,” but robustness is a genuinely important feature: it asks whether the model holds up with more extreme values or after introducing outliers. Test for a model’s robustness and adjust as necessary.
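One simple version of such a test, sketched here with simulated data: fit a model, inject a few extreme observations, and see how far the fit moves.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(0, 1, 200)  # true slope is 2.0

slope_clean = np.polyfit(x, y, 1)[0]

# Inject three extreme outliers and re-fit.
x_out = np.append(x, [9.5, 9.7, 9.9])
y_out = np.append(y, [80.0, 85.0, 90.0])
slope_dirty = np.polyfit(x_out, y_out, 1)[0]

# A big swing in the fitted slope from a handful of points signals fragility.
print(f"slope without outliers: {slope_clean:.2f}")
print(f"slope with outliers:    {slope_dirty:.2f}")
```

If three data points out of two hundred can move a key parameter materially, expect trouble when real extremes arrive.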
Have a long memory. By their nature, black swans can be difficult to predict. But extreme changes aren’t necessarily unprecedented if you look back far enough. The spike in mortgage rates in the early 2020s looked extreme compared to the past ten or even twenty years; it looks much more modest compared to the early 1980s. When it comes to credit risk, banks and their supervisors often treat 2008 as a once-in-a-millennium event, even though it happened just 17 years ago. And some, including the current Comptroller of the Currency, seem to want us to forget about 2008 as well.
Consider inconvenient facts. Risk managers shouldn’t get too attached to their models or the model’s underlying theory. In 2006, Professor Dean Keith Simonton developed estimates of the IQs of U.S. presidents from Washington to George W. Bush. These estimates primarily relied on biographical descriptions and were validated through, among other things, previous IQ estimates. Simonton estimated John F. Kennedy’s IQ to be 159.8. [1] The trouble is that Kennedy took an actual IQ test in prep school and scored only 119. A score of 119 would place Kennedy in the 90th percentile while a score of 159.8 would place him in the 99.995th percentile, or about one in 20,000. One score on one test is hardly conclusive, but it does seem to shoot a hole in the model.
MSR valuation provides another example. MSR sales are rare, but they do occur. A large difference between the market price in an actual transaction and the mark-to-model valuation should raise questions about the valuation. One transaction may not be conclusive, but a reassessment of the modeling methodology seems to be in order.
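Continuing the stylized MSR sketch from earlier (the observed price here is invented): when a sale does occur, you can back out the prepayment speed the price implies and compare it with the model’s assumption.

```python
from scipy.optimize import brentq  # SciPy root-finder

observed_price = 650_000  # hypothetical price from an actual MSR sale
model_cpr = 0.10          # prepayment speed assumed by the model

# Find the CPR at which the model reproduces the observed price,
# reusing msr_value() from the sketch above.
implied_cpr = brentq(
    lambda cpr: msr_value(100_000_000, cpr=cpr) - observed_price,
    0.01, 0.60,
)
print(f"model CPR: {model_cpr:.0%}, market-implied CPR: {implied_cpr:.0%}")
# ~16% here: the market prices in far faster prepayments than the model does.
```

A gap that wide between assumed and implied behavior is exactly the kind of inconvenient fact the valuation shouldn’t shrug off.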
Consider the model’s purpose when evaluating performance metrics. A key element of model risk management is to look at the model’s predictive accuracy, a process sometimes known as outcomes analysis or back-testing. Are the model’s forecasts accurate? Outcomes analysis can be less straightforward than you might think. Are you looking for precision, or mainly to avoid major misses? This can become especially tricky when looking at stress testing models. Stress events are, practically by definition, unlikely. The nature of many financial risks makes it difficult to merely extrapolate from small changes to large changes. For example, a prepayment model may be accurate for a 10-basis-point change in interest rates but fall apart if rates rise by, say, 200 basis points.
Outcomes analysis can still be useful. Major misses early on, in direction as well as magnitude, are an obvious red flag. However, it may make sense to adjust performance thresholds to the situation. You want models to be accurate, neither understating nor overstating risk, but some errors matter more than others. For example, depositors react slowly, if at all, to small rate changes. Their behavior changes when rates move up more sharply and the opportunity cost of holding a low-rate deposit becomes too large. In that case, you may be willing to tolerate model misses for small rate changes if the model captures exposure to larger ones, as in the sketch below.
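Here is what scenario-dependent tolerances might look like for a deposit-runoff model; every number is invented for illustration:

```python
import numpy as np

rate_shock_bps  = np.array([10, 25, 50, 100, 200, 300])
forecast_runoff = np.array([0.5, 1.0, 2.0, 5.0, 12.0, 20.0])  # % of NMDs
actual_runoff   = np.array([0.2, 0.8, 2.5, 6.0, 18.0, 29.0])

error = forecast_runoff - actual_runoff
# Generous absolute band for small shocks; large shocks must land
# within 25% of the actual outcome.
tolerance = np.where(rate_shock_bps <= 50, 1.0, 0.25 * actual_runoff)

for bps, e, tol in zip(rate_shock_bps, error, tolerance):
    verdict = "FAIL" if abs(e) > tol else "pass"
    print(f"{bps:>3} bps shock: error {e:+.1f} pct pts (band {tol:.2f}) -> {verdict}")
```

Here the model sails through the small shocks but badly understates runoff at 200 and 300 basis points, which is precisely where the misses are least tolerable.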
Don’t be intimidated by models. Model developers and validators usually have strong quantitative backgrounds and may consider mathematical equations a helpful shorthand. Those equations can prove intimidating to the rest of us. While risk managers, executives, and regulators may not appreciate every nuance of a model, there are ways to make it less of a black box, and they boil down to asking the right questions. Model validations usually subject models to specific statistical tests: understand what those tests mean and how failing them might make the model’s output less reliable. Break the model into its component parts. And try to convert the abstract to the concrete. In other words, what happens when I plug in actual numbers? That makes it easier to judge what the model is doing. You don’t need to be a rocket scientist to make informed assessments of models; you just need an inquisitive mind.
[1] Kennedy’s assigned IQ of 159.8 is the figure most cited in the popular press, but Simonton’s journal article provides a range between 138.9 and 159.8.