Owner Unmasking v2.0

Cherre’s Owner Unmasking v2.0 is out! This release follows half a year of phenomenal teamwork by Cherre’s data scientists, engineers, and product managers. Owner Unmasking v2.0 improves significantly over its predecessor, which we released two years ago.

The Limitations of Owner Unmasking v1

As we described in one of our previous blog posts, true owners of properties are unmasked by applying a neighborhood traversal algorithm to Cherre’s Knowledge Graph: the knowledge graph neighborhood of the property node is traversed to search for a node corresponding to a real estate investor, and the investor is then inferred to be the true owner of the property. But how do we define a real estate investor? In a broad sense, any entity that owns real estate can be considered an “investor.” However, many properties belong to “masking entities” (LLC companies), which in turn belong to real estate investors. The complexity of the ownership structure is why we needed to solve the owner unmasking problem in the first place. But how do we distinguish between LLCs and true owners? Owner Unmasking v1 did not adequately address this question so it was the main focus when working on Owner Unmasking v2.0.

Big Garden

In order to clearly separate real estate investors and masking entities, we decided to build a comprehensive, curated dataset of large real estate investors in the United States, which we refer to as “Big Garden.” To construct Big Garden, we built a series of information extraction methods and applied them to Cherre’s aggregated and cleaned real estate data repository to create a list of candidate entities. We then filtered these candidates to only include those with a web presence, since we assume that if an entity does not have a web presence, then it is likely not a large player in the real estate space. As a post-processing step, we developed an entity resolution methodology to aggregate multiple name versions of the same owner into one canonical name.

Our main goal in the creation of Big Garden was to optimize for coverage: we aimed to put an owner name behind each meaningfully large portfolio of properties. We identified these portfolios using a community detection algorithm on Cherre’s Knowledge Graph. We estimated the size of a portfolio as the sum of total assessed values of its properties. Total assessed values are used in property tax calculation and are usually the lower bound of a property’s market value. Taking into account the fact that the actual market value of a portfolio might be substantially higher than its total assessed value, we decided to use a relatively low threshold to classify a portfolio as large, and considered all portfolios with over $10M in total assessed value. As a result, most owners of commercial portfolios (even fairly small ones) and of bundles of residential properties (from dozens all the way up to dozens of thousands) are all on our radar.

As of now, Big Garden consists of over 100,000 large portfolio owners in the US. To the best of our knowledge, Big Garden is the largest dataset of US real estate owners ever constructed. If you can think of a real estate owner, they are most likely in our dataset already. But we are obviously not magicians: if an owner takes particular precautions in order not to get exposed, we would probably miss them. We are actively working on expanding the Big Garden dataset further.

Owner Unmasking Pipeline

One of the drawbacks of Owner Unmasking v1 was that the unmasking process was not easy to execute, resulting in a low cadence of refreshing unmasked ownership data. We addressed this issue in Owner Unmasking v2.0 by constructing a fully automated pipeline for the Owner Unmasking process. This pipeline starts by constructing Cherre’s Knowledge Graph from numerous data sources, including Big Garden. It then applies the fully parallelized Owner Unmasking algorithm and processes the results to incorporate owner contact information. The final product is then published so that our clients can access the data via Cherre’s API. This brand-new pipeline can be executed at any cadence we wish; in practice, the cadence is conditioned on the frequency of the input data updates.

The Owner Unmasking pipeline is unique in its technological depth, incorporating pretty much the entire technology stack currently used at Cherre. One particular challenge we overcame was to seamlessly integrate the SQL-based data transformation functionality used for the knowledge graph creation and post-processing stages with the PySpark implementation of the Owner Unmasking algorithm. The success of this integration opens the door to the hassle-free productization of many other analytical tools which we will develop in the future.

Quality of Owner Unmasking v2.0

We measure the quality of owner unmasking results in terms of accuracy and coverage. To an extent, measuring coverage is quite straightforward: we simply need to count the number of properties whose ownership we manage to unmask and divide it by the overall number of properties monitored by Cherre. In reality, though, this evaluation metric would be an over-simplification, because the majority of real estate parcels in the US are single-family residences, an asset class where ownership is easy to unmask (in most cases, the owner resides at the property). Keeping this observation in mind, we developed a range of coverage measures for a variety of asset classes, geographies, and portfolio sizes. Using these measures, our owner unmasking results achieve over 90% coverage across the board.

Measuring accuracy of owner unmasking is extremely challenging because of the lack of ground truth data for unmasked owners. If unmasked ownership information had been available to serve as the ground truth for our system, we would not have needed to build an unmasking system in the first place! We prepared a small test set consisting of properties with known ownership. We then compared Owner Unmasking v1 and Owner Unmasking v2.0 on this test set, defining the accuracy as the number of properties whose ownership we managed to unmask correctly divided by the number of properties whose ownership we attempted to unmask. Using this accuracy measure, version 2.0 is about twice as accurate as version 1.0.

While Owner Unmasking v2.0 is a great improvement over Owner Unmasking v1, we have still not resolved a number of issues. One example is that our Owner Unmasking algorithm is tuned to consider the taxpayer of a property as its true owner, which is incorrect in the case of net leases. Another example is law offices that represent true owners; currently, our algorithm does not treat law offices as a special case, but rather takes them for the true owners. We are actively at work correcting these issues.

To conclude

Owner Unmasking is one of the most sought after features provided by Cherre, since it enables many important use cases in modeling, planning, forecasting, executing, and monitoring real estate investments. The strongest value proposition of Cherre’s Owner Unmasking is its high coverage, both in terms of the number of properties and the number of owners. This unprecedented coverage allows our customers to focus on any asset class and any locality, starting from the state level, down to the county, city/town, and submarket/neighborhood level, and even further down to a small radius around a property of interest. At each resolution, we are able to provide information on most prominent owners and their portfolios.

Dr. Ron Bekkerman is the CTO of Cherre, overseeing Cherre’s emerging technologies. From 2013 to 2018, Ron was Assistant Professor and Director of the Big Data Science Lab at the University of Haifa, Israel.