Impact of Demographics on Historical Housing Prices

Using explainable machine learning techniques, the insights team at Cherre has identified key demographic drivers of housing prices in major metropolitan areas.


In residential real estate investing, we often look at broad economic drivers in a region, like population growth, wage growth, housing supply and demand, employment industry mix, as well as growth constraints like regulations, policy, topography, and land scarcity. Quality of life metrics like traffic, schools, climate, healthcare, and non-residential asset classes like retail and groceries can also factor into the equation. This analysis is critical to understanding the shape of some markets and their prospects.

More recently, firms and investors, with greater access to data, have turned to models to better understand markets and submarkets. Academic and industry models we have seen tend to be linear regressions or simple machine learning models, which are fairly good at anticipating upper and lower bounds for pricing and movement in a market, making market entry and decisions easier. These dynamics are reflected in a dramatic rise in institutional investment in single family housing.

As more institutional money moves into housing, there is a growing appetite to diversify market economic modeling and model building, with respect to model features, market conditions (bullish or bearish), and longer time horizons.

Cherre, the world’s leading real estate data connection platform, has put together our Data Kit for Single Family Residences (SFR), which allows us and our clients to dramatically accelerate research and model building in the SFR space. As part of that effort, we are experimenting with explainable modeling techniques that allow us to understand the influence of specific economic changes in markets over time. For this, we used Shapley values to dissect which economic drivers are associated with market changes over time. While our models do not assume causality, the work gives us a closer look at several markets and some interesting insights into what makes them tick.

We looked at Atlanta, Boston, Chicago, and New York. Some of our surprising takeaways are below.


Atlanta

  • Out of the four metros, Atlanta had the biggest squeeze on construction labor, with housing demand outstripping the construction labor supply and driving labor wages up.
  • Growth in the middle-aged population, especially early-to-mid-career (ages 30-34) and mid-career (ages 40-44), was the biggest population contributor to housing value growth.
  • Wage growth in technology, logistics, video and film (entertainment), and finance has contributed equally, while the region has also seen price increases due to pressure on blue collar segments.


Boston

  • Wage growth in education and the scientific field drove a large portion of home price increases, in line with Boston’s status as a scientific and academic mecca.
  • Growth in mid-career populations (ages 35-39) and retirees (ages 75-84) contributed to housing value growth more than early- and late-career groups.


Chicago

  • Mid- to late-career earners (ages 35-44 and 55-59) drove most of the growth of property prices in the Chicago area in the 1990s and early 2000s. The region had a harder time recovering from the Great Recession in 2008, which resulted in slower home starts and a decline in these earners in the 2010s. [1]
  • Surprisingly, the population of young children (ages 0-9), a proxy for families with young children, was inversely associated with home prices: historical home prices tended to be higher when the population of young children declined.
  • Given that our Shapley values are fairly low for the factors we explored, we suspect that there are additional demographic and economic factors that impact home prices in Chicago (see additional factors in the Future Research section).


New York

  • Unsurprisingly, early- and mid-career cohorts (ages 15-24 and 30-39) drove increases in housing prices as young workers moved to New York to start or advance their careers. On the other side of the working population, near-retirees (ages 60-64) also drove increases in housing prices as they retired to the suburbs outside the city.
  • Wages for scientific and technical workers affected single family home prices disproportionately relative to other industries, contributing to New York’s moniker of Silicon Alley.

Data & Methodology

Data covering 1990 to 2020 was obtained from the following sources:

  • Oxford Economics historical data on nominal wages and population counts by age
  • County tax assessor and recorder data from Cherre’s Data Kit for Single Family Residential (SFR)

Cherre’s Data Kit for SFR combines many of Cherre’s foundation layer datasets. To derive the data in the Data Kit for SFR, we combined county tax assessor, recorder, owner unmasking, and several boundary datasets, with selective filtering, to get on-market transaction numbers for each property in the foundation layer. For example, we filtered out non-market transactions from the recorder data [2] and selected single family properties [3] from the tax assessor data. From this combined master residential table, we derived time series statistics for metropolitan statistical areas (MSAs). This provides monthly and quarterly MSA-level statistics for median and mean transaction price per square foot from 1990 to 2020.
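As a rough illustration of the aggregation step described above, the sketch below groups a handful of made-up transactions by MSA and year-month and computes the median and mean price per square foot. The record layout, field names (msa, date, price, sqft), and values are assumptions for illustration only and do not reflect the actual schema of the Data Kit for SFR.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical transaction records; field names and values are illustrative only.
transactions = [
    {"msa": "Atlanta", "date": "1990-01-15", "price": 100_000, "sqft": 1_000},
    {"msa": "Atlanta", "date": "1990-01-28", "price": 150_000, "sqft": 1_000},
    {"msa": "Atlanta", "date": "1990-02-03", "price": 120_000, "sqft": 1_200},
    {"msa": "Boston", "date": "1990-01-09", "price": 200_000, "sqft": 1_000},
]


def monthly_price_per_sqft(rows):
    """Group rows by (MSA, year-month) and summarize price per square foot."""
    groups = defaultdict(list)
    for row in rows:
        year_month = row["date"][:7]  # "YYYY-MM" prefix of an ISO date string
        groups[(row["msa"], year_month)].append(row["price"] / row["sqft"])
    return {
        key: {"median": median(vals), "mean": mean(vals), "n": len(vals)}
        for key, vals in sorted(groups.items())
    }


stats = monthly_price_per_sqft(transactions)
for (msa, year_month), summary in stats.items():
    print(msa, year_month, summary)
```

A quarterly version would only change the grouping key, for example by mapping each month to its quarter before grouping.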

Oxford Economics data is provided on an annual basis. To derive quarterly and monthly values, we linearly interpolated between each pair of consecutive annual values. To join Oxford Economics data to public market transactions, we fuzzy matched MSA names for each year-quarter and year-month of data. There are slight variations in MSA definitions [4] between the two datasets, but the definitions can be treated as equivalent for this analysis.
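The two steps above, interpolation and name matching, can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the helper names (interpolate_annual, match_msa), the choice to place each annual value at the first period of its year, the similarity cutoff, and the sample MSA names and numbers are all illustrative and are not taken from the actual pipeline.

```python
from difflib import get_close_matches


def interpolate_annual(annual, steps_per_year):
    """Linearly interpolate between consecutive annual values.

    annual: dict of year -> value; steps_per_year: 4 (quarterly) or 12 (monthly).
    Assumption: each annual value sits at the first period of its year.
    Returns {(year, period): value}, with period counted from 1.
    """
    out = {}
    years = sorted(annual)
    for y0, y1 in zip(years, years[1:]):
        total_steps = (y1 - y0) * steps_per_year
        for i in range(total_steps):
            frac = i / total_steps
            year = y0 + i // steps_per_year
            period = i % steps_per_year + 1
            out[(year, period)] = annual[y0] + frac * (annual[y1] - annual[y0])
    out[(years[-1], 1)] = annual[years[-1]]  # final annual observation
    return out


def match_msa(name, candidates, cutoff=0.6):
    """Return the closest candidate MSA name, or None if nothing is close enough."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative inputs only.
quarterly = interpolate_annual({1990: 100.0, 1991: 120.0}, steps_per_year=4)
candidates = [
    "Atlanta-Sandy Springs-Alpharetta, GA",
    "Boston-Cambridge-Newton, MA-NH",
    "Chicago-Naperville-Elgin, IL-IN-WI",
]
matched = match_msa("Atlanta-Sandy Springs-Roswell, GA", candidates)
print(quarterly)
print(matched)
```

Returning None below the cutoff, rather than always taking the nearest name, makes unmatched MSAs visible so they can be reviewed by hand.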

Shapley values are a common post-hoc explainable machine learning technique for measuring the marginal contribution of each feature to a model's output. Using Shapley values and a decision tree regressor [5], we compared various Oxford Economics features against each other to see which metrics contributed the most to changes in single family detached home prices from 1990 to 2020 for four MSAs with varying economic profiles.
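To make the idea of a marginal contribution concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a tiny hand-written, tree-like model. It is not the model or tooling used in this analysis: the function toy_tree, its split thresholds, the feature names, and the baseline values are all hypothetical, and features outside a coalition are simply set to a baseline value, which is one common simplification.

```python
from itertools import combinations
from math import factorial


def toy_tree(x):
    """Hand-written stand-in for a fitted decision tree (hypothetical splits)."""
    if x["wage_growth"] > 0.03:
        return 300.0 if x["pop_30_34_growth"] > 0.01 else 250.0
    return 200.0


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction by enumerating coalitions."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        # Coalition features take the instance's value; the rest take the baseline.
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return model(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Illustrative inputs only.
instance = {"wage_growth": 0.05, "pop_30_34_growth": 0.02}
baseline = {"wage_growth": 0.01, "pop_30_34_growth": 0.00}
phi = shapley_values(toy_tree, instance, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the instance and the prediction for the baseline, which is the property that lets Shapley values be read as a per-feature breakdown of a prediction. Exhaustive enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.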

Future Research

As we build out Cherre’s Residential Data Kit with more Oxford Economics data, we will continue to observe the highest contributing factors to home prices in these MSAs. Other features we plan to incorporate include:

  • Historical employment;
  • Real incomes and disposable incomes; 
  • Housing permits; and
  • Local real GDP numbers

These additional metrics will provide a more comprehensive economic analysis of each region, and will thus be more informative about historical trends.

Alyce Ge is an ML engineer at Cherre.