Experimental Single Family Sales Comps Data

Introduction

“The Cook County Assessor just re-assessed my property value. I know that they produce my home’s property value based on the sales values of other similar, or comparable, homes. What comparable sales did the Cook County Assessor use to value my property? What comparable sales should I look at to know whether my home was assessed fairly? How can I find comparable sales so that I can appeal my assessments? How do I know whether the comparable sales I provide on appeal will be considered valid by the Assessor?”
These are questions our office gets asked on a regular basis, and they boil down to one hard question: what makes two homes comparable? This is difficult to answer because a) comparable sales are technically difficult to define and communicate, b) other offices may have their own definitions of what constitutes a comparable sale, and c) there are already many different tools and systems that the public and CCAO employees can use to find comparable sales.
In order to move towards a clearer and more cohesive approach to using comparable sales to value property, the Cook County Assessor’s Office is building a tool to identify comparable sales so that the public has access to the same sales sets that CCAO analysts use to value their property and/or make decisions on appeal. This tool would automatically identify the set of comparable sales for each property in the county based on a fixed set of algorithms and rules. It would be another way that we work to institute uniformity and reduce bias in property values, during both the initial assessment phase and the appeal process.
The first step in building such a tool, however, is to define what constitutes a comparable sale. We tested many different methods, described below. We picked the best combination of methods and used it to create this data set: a list of comparable properties for some sample PINs. The goal in publishing this experimental data set is feedback: we’d like to know whether people think these properties are comparable, and whether they match the intuitions of people with professional experience in finding comparable home sales (like those in real estate).

How should I use this data?

First, understand that this data was not directly used to value any property in Cook County. In the future, however, a similar data set may be used for valuation purposes. We would like to gather feedback from industry professionals about the accuracy and usability of this data. In order to find comparable sales for a given property, you can filter this data on either “Subject PIN” or “Subject Address,” and the resulting rows will show you comparable sales for the property in question.
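As a rough illustration of that filtering step, here is a minimal Python sketch. Only the “Subject PIN” and “Subject Address” column names come from this data set's description; the other column names, the PINs, the addresses, and the prices are made-up placeholders, and comps_for is a hypothetical helper.

```python
import csv
import io

# Hypothetical sample rows. Only "Subject PIN" and "Subject Address" are
# column names taken from the text; everything else here is an assumption.
SAMPLE_CSV = """Subject PIN,Subject Address,Comp PIN,Comp Sale Price
01-01-100-001-0000,100 W EXAMPLE ST,01-01-100-014-0000,250000
01-01-100-001-0000,100 W EXAMPLE ST,01-01-102-007-0000,265000
02-02-200-002-0000,200 N SAMPLE AVE,02-02-201-011-0000,410000
"""


def comps_for(rows, subject_pin=None, subject_address=None):
    """Return the rows whose subject matches the given PIN or address."""
    matches = []
    for row in rows:
        if subject_pin is not None and row["Subject PIN"] == subject_pin:
            matches.append(row)
        elif (subject_address is not None
              and row["Subject Address"].strip().upper()
              == subject_address.strip().upper()):
            matches.append(row)
    return matches


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
my_comps = comps_for(rows, subject_pin="01-01-100-001-0000")
for row in my_comps:
    print(row["Comp PIN"], row["Comp Sale Price"])
```

The same filter can be applied in a spreadsheet or directly in the data portal; the code only shows that each remaining row is one comparable sale for the chosen subject.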

What is a comparable sale?

Your subject property is the property you want to value, the one you are finding comparable sales for. A comparable sale is a recent sale of a property in a location similar to the subject property’s location, with characteristics similar to those of the subject property. But what defines recent, similar location, and similar characteristics? This is a difficult question, particularly when the definition needs to be interpretable by a computer rather than a human analyst. Even among human appraisers, however, there may be disagreement about whether a given property is comparable to the subject property. In such cases, how would the CCAO consistently decide between two sets of comparable sales?

Choosing between comparable sale identification algorithms

Suppose you have two methods of finding comparable sales, each supplying a different set of sales comps for a set of subject properties. How do you know which comp set is ‘better’? Bear in mind that we are trying to decide between methods of finding comparable sales for roughly 800 thousand residential properties. It is not feasible to manually inspect each set of comparable sales for each subject property and determine which is more accurate.
To choose between different sets of comparable sales, we take the average sale price of each property’s set of comparable sales as a "prediction" of the subject property’s value. We then scored these predictions according to three criteria:
  • Coverage: Does the method produce a sufficient number of comparable sales for each subject property?
  • Uniformity: Does the method produce similar predictions for subject properties that have similar sale prices?
  • Vertical Equity: Are predictions unbiased, so that the method performs the same way regardless of sale price?
This scoring framework allows us to develop a number of different approaches to finding comparable sales, apply them to a random set of properties, and then rank them in terms of predictive accuracy. It also gives developers the opportunity to propose alternative comparable-finding tools and compare them objectively and directly to our current comparable set.
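The scoring idea can be sketched in a few lines of Python. The text above does not name the exact statistics used, so this sketch assumes two standard ratio-study measures: the coefficient of dispersion (COD) for uniformity and the price-related differential (PRD) for vertical equity. The min_comps threshold and the sample figures are also assumptions.

```python
from statistics import mean, median


def score_method(comp_prices_by_subject, sale_price_by_subject, min_comps=3):
    """Score one comp-finding method.

    comp_prices_by_subject: dict of subject id -> list of comp sale prices
    sale_price_by_subject:  dict of subject id -> the subject's own sale price
    min_comps is an assumed threshold for "a sufficient number" of comps.
    """
    covered = [s for s in sale_price_by_subject
               if len(comp_prices_by_subject.get(s, [])) >= min_comps]
    coverage = len(covered) / len(sale_price_by_subject)
    if not covered:
        return {"coverage": coverage, "cod": None, "prd": None}

    # The prediction for each covered subject is the average of its comps.
    predictions = {s: mean(comp_prices_by_subject[s]) for s in covered}
    ratios = [predictions[s] / sale_price_by_subject[s] for s in covered]

    # COD: mean absolute deviation from the median ratio, as a percentage.
    med = median(ratios)
    cod = 100 * mean([abs(r - med) for r in ratios]) / med

    # PRD: mean ratio divided by the sale-weighted mean ratio.
    weighted = (sum(predictions.values())
                / sum(sale_price_by_subject[s] for s in covered))
    prd = mean(ratios) / weighted
    return {"coverage": coverage, "cod": cod, "prd": prd}


comps = {"A": [100, 110, 90], "B": [200, 220, 210], "C": [300]}
sales = {"A": 100, "B": 200, "C": 310}
scores = score_method(comps, sales)
print(scores)
```

A lower COD indicates more uniform predictions, and a PRD close to 1 indicates that predictions are neither systematically high for low-priced homes nor low for high-priced ones. Two methods can then be ranked by comparing these three numbers on the same set of subject properties.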
In developing our experimental data set of comparable sales, we deployed the highest ranked method first. If that method failed to find comparable sales for a set of properties, we then used the second best method to fill out the rest of the data. We completed the comparable set with two methods.
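The fallback logic described here can be sketched as follows. The function name, the shape of the inputs, and the toy lookups standing in for real comp-finding methods are all hypothetical.

```python
def fill_with_fallback(subjects, methods):
    """Assign comps to each subject using the first method that finds any.

    methods: list of (name, find_comps) pairs ordered from best to worst;
    find_comps(subject) returns a list of comparable sales (possibly empty).
    """
    results = {}
    for subject in subjects:
        for name, find_comps in methods:
            comps = find_comps(subject)
            if comps:
                results[subject] = (name, comps)
                break
        else:
            # No method produced any comps for this subject.
            results[subject] = (None, [])
    return results


# Toy lookups standing in for the two ranked methods.
method_1 = {"PIN-A": ["sale-1", "sale-2"]}
method_2 = {"PIN-A": ["sale-9"], "PIN-B": ["sale-3"]}
methods = [
    ("method 1", lambda s: method_1.get(s, [])),
    ("method 2", lambda s: method_2.get(s, [])),
]
filled = fill_with_fallback(["PIN-A", "PIN-B", "PIN-C"], methods)
print(filled)
```

Recording which method supplied each subject's comps makes it possible to review the fallback results separately from the first-choice results.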

Finding comparable sales – a two-step approach with multiple methods

Our most successful methods relied on a two-step approach to finding comparable sales. The first step was to define the geographic areas where we looked for comparable properties. The second was to define which characteristics determined whether an individual property was similar enough to the subject property to be considered ‘comparable.’ We tested 17 different methods that combined different approaches to each step, and identified the two best methods, which together produced comparable sales for the entire set of subject properties.
Method 1
Our most successful algorithm looked for comparable sales within the subject property’s neighborhood code. We then ran a gradient boosting regression on sale price using the universe of predictive variables to identify which variables were valuable in predicting properties’ sale prices. These turned out to be limited: rooms, bedrooms, building square feet, and improvement age. We counted a sale as comparable if it was within +/- 20% of the subject property on each of these four dimensions.
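A minimal sketch of the method 1 filter is shown below. It covers only the matching step, not the gradient boosting regression. It assumes that +/- 20% is measured relative to the subject property's value on each dimension, which the description does not spell out. The field names, PINs, and neighborhood codes are illustrative only.

```python
# Hypothetical field names for the four characteristics named in the text.
FEATURES = ("rooms", "bedrooms", "building_sqft", "age")


def within_tolerance(subject_value, comp_value, tolerance=0.20):
    """True if comp_value is within +/- tolerance of subject_value."""
    return abs(comp_value - subject_value) <= tolerance * abs(subject_value)


def method_1_comps(subject, sales, tolerance=0.20):
    """Sales in the subject's neighborhood that match on all four features."""
    return [
        sale for sale in sales
        if sale["nbhd_code"] == subject["nbhd_code"]
        and sale["pin"] != subject["pin"]
        and all(within_tolerance(subject[f], sale[f], tolerance)
                for f in FEATURES)
    ]


subject = {"pin": "S", "nbhd_code": "10011", "rooms": 6, "bedrooms": 3,
           "building_sqft": 1500, "age": 50}
sales = [
    # Same neighborhood, all four features within 20%: comparable.
    {"pin": "C1", "nbhd_code": "10011", "rooms": 7, "bedrooms": 3,
     "building_sqft": 1700, "age": 45},
    # Four bedrooms is more than 20% above three: not comparable.
    {"pin": "C2", "nbhd_code": "10011", "rooms": 6, "bedrooms": 4,
     "building_sqft": 1500, "age": 50},
    # Identical features but a different neighborhood code: not comparable.
    {"pin": "C3", "nbhd_code": "20022", "rooms": 6, "bedrooms": 3,
     "building_sqft": 1500, "age": 50},
]
found = method_1_comps(subject, sales)
print([sale["pin"] for sale in found])
```

Under this reading, small integer features behave strictly: for a three-bedroom subject, only three-bedroom sales fall within 20%.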
Method 2
Once we exhausted the comparable sales identified by method 1, we used method 2. For method 2, we used the same characteristics as method 1, but we expanded the search range to any neighborhood with a median assessed value similar to that of the subject PIN. That is, we looked for comparable sales in neighborhoods that were similar in value to the subject PIN.
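The neighborhood-expansion step of method 2 might look like the sketch below. The description does not say how "similar" is measured, so two things here are assumptions: each neighborhood's median is compared against the median of the subject's own neighborhood, and the 10% tolerance is a placeholder. The neighborhood codes and values are invented.

```python
from statistics import median


def neighborhood_medians(assessed_values):
    """assessed_values: list of (nbhd_code, assessed_value) pairs."""
    by_nbhd = {}
    for code, value in assessed_values:
        by_nbhd.setdefault(code, []).append(value)
    return {code: median(values) for code, values in by_nbhd.items()}


def similar_neighborhoods(subject_nbhd, medians, tolerance=0.10):
    """Neighborhood codes whose median is within tolerance of the subject's."""
    target = medians[subject_nbhd]
    return sorted(code for code, m in medians.items()
                  if abs(m - target) <= tolerance * target)


values = [
    ("A", 200_000), ("A", 220_000), ("A", 240_000),
    ("B", 210_000), ("B", 230_000), ("B", 250_000),
    ("C", 400_000), ("C", 500_000), ("C", 600_000),
]
medians = neighborhood_medians(values)
search_area = similar_neighborhoods("A", medians)
print(search_area)
```

The same characteristic filter used in method 1 would then be applied to sales from every neighborhood in the returned search area, rather than only the subject's own neighborhood.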

Adjusting sale prices for time
Older sales typically have lower prices, since housing prices trend upwards over time. To adjust for this, we used a hedonic housing price index developed by the Institute for Housing Studies at DePaul University. By adjusting prices forward in time, we can use a longer timeframe for comparable sales, which increases the predictive power of our comparable set.
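The arithmetic behind a time adjustment of this kind is a simple ratio of index values. The sketch below uses invented index numbers and quarterly period labels; it does not reproduce the actual index values or format.

```python
def adjust_price(sale_price, sale_period, target_period, index):
    """Scale a sale price by the change in the index between two periods."""
    return sale_price * index[target_period] / index[sale_period]


# Hypothetical index values keyed by quarter (not real index data).
index = {"2016Q1": 100.0, "2017Q1": 104.0, "2018Q1": 110.0}

adjusted = adjust_price(200_000, "2016Q1", "2018Q1", index)
print(adjusted)
```

In this example the index rose 10% between the sale quarter and the target quarter, so a 200,000 sale is treated as a 220,000 sale in the target quarter's terms.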
This property looks nothing like my home!
Users will quickly notice two things about this set of comparable sales. First, properties selected as comparable may not ‘look like’ the subject property. In particular, we did not include stories, exterior wall construction, or land area in the criteria used to determine whether a sale was comparable to a subject PIN. This was done because these features add very little predictive power to the comparable sales model. That is, knowing these things about a building does little to help you predict its value.
Second, some properties have comparable sales many miles away. As discussed above, method 2 is insensitive to the physical distance between subject and comparable property. This may strike some as odd – how can a property 15 miles away be considered comparable? If you consider, however, that housing markets do not necessarily follow geographic lines, but instead cluster around similar local amenities, you can imagine that a sale in Barrington might be considered comparable to a subject PIN in Arlington Heights, or that a sale in the Uptown neighborhood in Chicago could be comparable to a subject PIN in the Bronzeville neighborhood in Chicago.
These features are part of the experimental nature of this data set, and we’d love to hear your feedback on them.
Send us your thoughts
If you’d like to give us your feedback, send us a message: communications@cookcountyassessor.com or DataScience@cookcountyassessor.com.