Using explainable machine learning techniques, the insights team at Cherre has identified key demographic drivers of housing prices in major metropolitan areas.
In residential real estate investing, we often look at broad economic drivers in a region, like population growth, wage growth, housing supply and demand, employment industry mix, as well as growth constraints like regulations, policy, topography, and land scarcity. Quality of life metrics like traffic, schools, climate, healthcare, and non-residential asset classes like retail and groceries can also factor into the equation. This analysis is critical to understanding the shape of some markets and their prospects.
More recently, firms and investors, with greater access to data, have turned to models to better understand markets and submarkets. The academic and industry models we have seen tend to be linear regressions or simple machine learning models, which are fairly good at anticipating upper and lower bounds for pricing and movement in a market, making market entry and other investment decisions easier. These dynamics are reflected in a dramatic rise in institutional investment in single family housing.
As more institutional money moves into housing, there is a growing appetite to diversify market economic modeling and model building, with respect to model features, market conditions (bullish or bearish), and longer time horizons.
As the world’s leading real estate data connection platform, Cherre has put together a Data Kit for Single Family Residences (SFR), which allows us and our clients to dramatically accelerate research and model building in the SFR space. As part of that effort, we are experimenting with explainable modeling techniques that allow us to understand the influence of specific economic changes in markets over time. For this, we used Shapley values to dissect which economic drivers are associated with market changes over time. While our models do not assume causality, the work gives us a better look at several markets and some interesting insights into what makes them tick.
We looked at Atlanta, Boston, Chicago, and New York. Some of our surprising takeaways are below.
Data covering 1990 to 2020 was obtained from the following sources:
Cherre’s Data Kit for SFR combines many of Cherre’s foundation layer datasets. To derive the data in the Data Kit for SFR, we combined county tax assessor, recorder, owner unmasking, and several data boundaries, then applied selective filtering to get on-market transaction numbers for each property in the foundation layer. For example, we filtered out non-market transactions² from the recorder data and selected single family properties³ from the tax assessor data. From this combined master residential table, we derived time series statistics for metropolitan statistical areas (MSAs). This provides monthly and quarterly MSA-level statistics for median and mean transaction prices per square foot from 1990 to 2020.
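As a rough illustration of the filtering and aggregation described above, the sketch below groups market sales of single family properties by MSA and month, then computes median and mean price per square foot. The records, field names (`arms_length`, `property_type`, and so on), and filter rules are hypothetical and do not reflect Cherre's actual schema or pipeline.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical transaction records; field names and values are illustrative only.
transactions = [
    {"msa": "Boston", "month": "2019-01", "price": 500_000, "sqft": 2000,
     "arms_length": True, "property_type": "SFR"},
    {"msa": "Boston", "month": "2019-01", "price": 600_000, "sqft": 2000,
     "arms_length": True, "property_type": "SFR"},
    {"msa": "Boston", "month": "2019-01", "price": 900_000, "sqft": 2500,
     "arms_length": True, "property_type": "SFR"},
    {"msa": "Boston", "month": "2019-01", "price": 1, "sqft": 1800,
     "arms_length": False, "property_type": "SFR"},    # non-market transfer
    {"msa": "Boston", "month": "2019-01", "price": 400_000, "sqft": 900,
     "arms_length": True, "property_type": "CONDO"},   # not single family
    {"msa": "Boston", "month": "2019-02", "price": 440_000, "sqft": 2000,
     "arms_length": True, "property_type": "SFR"},
]


def monthly_price_per_sqft(records):
    """Summarize price per square foot by (msa, month) for market SFR sales."""
    groups = defaultdict(list)
    for r in records:
        # Keep only market sales of single family properties with a usable area.
        if not r["arms_length"] or r["property_type"] != "SFR" or r["sqft"] <= 0:
            continue
        groups[(r["msa"], r["month"])].append(r["price"] / r["sqft"])
    return {
        key: {"median": median(vals), "mean": mean(vals), "n": len(vals)}
        for key, vals in sorted(groups.items())
    }


stats = monthly_price_per_sqft(transactions)
for key, summary in stats.items():
    print(key, summary)
```

Quarterly statistics would follow the same pattern with a quarter key in place of the month.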
Oxford Economics data is provided on an annual basis. To derive quarterly and monthly values, we linearly interpolated between consecutive annual values. To join Oxford Economics data to the public market transactions, we fuzzy matched MSA names for each year-quarter and year-month of data. There are slight variations in MSA definitions⁴ between the two datasets, but for this analysis we treat them as the same.
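The two steps above, interpolation and name matching, can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the pipeline used here: the annual series is made up, each annual value is assumed to sit at Q1 of its year, the MSA names are illustrative, and `difflib.get_close_matches` with a 0.6 cutoff stands in for whatever fuzzy matcher was actually used.

```python
from difflib import get_close_matches


def interpolate_annual_to_quarterly(annual):
    """Linearly interpolate {year: value} into {(year, quarter): value}.

    Assumption: each annual value is pinned to Q1 of its year.
    """
    years = sorted(annual)
    quarterly = {}
    for y0, y1 in zip(years, years[1:]):
        steps = (y1 - y0) * 4  # number of quarters between the two anchors
        for i in range(steps):
            value = annual[y0] + (i / steps) * (annual[y1] - annual[y0])
            quarterly[(y0 + i // 4, i % 4 + 1)] = value
    quarterly[(years[-1], 1)] = annual[years[-1]]  # final anchor point
    return quarterly


def match_msa(name, candidates, cutoff=0.6):
    """Return the candidate MSA name closest to `name`, or None if none is close."""
    lowered = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None


# Made-up annual series and illustrative MSA names.
hypothetical_series = {2018: 100.0, 2019: 108.0, 2020: 104.0}
quarterly = interpolate_annual_to_quarterly(hypothetical_series)

other_dataset_names = [
    "Atlanta-Sandy Springs-Roswell, GA",
    "Boston-Cambridge-Quincy, MA-NH",
    "Chicago-Naperville-Elgin, IL-IN-WI",
    "New York-Newark-Jersey City, NY-NJ-PA",
]
matched = match_msa("Boston-Cambridge-Newton, MA-NH", other_dataset_names)
print(quarterly[(2018, 3)], matched)
```

Monthly values follow the same scheme with 12 steps per year. Because fuzzy matching can pair names that differ in substance, the matched pairs are worth reviewing by hand when the list is as short as four MSAs.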
Shapley values are a common post-hoc explainable machine learning technique for measuring the marginal contribution of each feature to a model’s predictions. Using Shapley values and a decision tree regressor⁵, we compared various Oxford Economics features against each other to see which metrics contributed the most to changes in single family detached home prices from 1990 to 2020 for four MSAs with varying economic profiles.
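To make the idea of a marginal contribution concrete, here is a brute-force sketch of exact Shapley values for a single prediction, measured against a baseline input. It is meant only to illustrate the definition. The feature names, split thresholds, and the hand-written `toy_tree` function (a stand-in for a fitted decision tree regressor) are all hypothetical, and in practice a dedicated library would usually compute these values far more efficiently.

```python
from itertools import combinations
from math import factorial


def toy_tree(f):
    """Hand-written stand-in for a fitted decision tree (hypothetical splits)."""
    if f["population_growth"] > 1.0:
        return 320.0 if f["wage_growth"] > 2.0 else 260.0
    return 210.0 if f["unemployment"] < 5.0 else 180.0


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition take their baseline value. The cost grows
    exponentially with the number of features, so this suits small examples only.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical feature values for one observation and a baseline.
x = {"population_growth": 1.5, "wage_growth": 3.0, "unemployment": 4.0}
baseline = {"population_growth": 0.5, "wage_growth": 1.0, "unemployment": 6.0}

phi = shapley_values(toy_tree, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes Shapley values convenient for ranking features against each other.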
As we build out Cherre’s Residential Data Kit with more Oxford Economics data, we will continue to observe the highest contributing factors to home prices in these MSAs. Other features we plan to incorporate include:
These additional metrics will provide more comprehensive economic analysis of each region, and will thus be more informative about historic trends.
Alyce Ge is an ML engineer at Cherre.