Cook County Assessor Model & Valuation Data Release

Last Update: 2/13/2020

February 2020 Update
As we enter the second year of Fritz Kaegi's administration, the Data Science department has worked with the Cook County Bureau of Technology to add context to the data we have already published. This will help users navigate and use our different open data assets. Below is a summary of our current open data assets and some example use cases for each.
Cook County Assessor's Residential Sales Data. This is a data set of residential sales from 2013-2019 in Cook County. Since this is a data set of sales, not properties, an individual PIN (Property Index Number) may appear more than once in a given year. We have joined this sales data with characteristic data from the year of the sale, so the same property selling in two different years may have different characteristics.
One use case for this data would be a residential property owner who wants to see nearby sales of properties like their home. Another use case would be a journalist who wants to see whether the CCAO's published ratio statistics are supported by the sales data.
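As an illustration of the first use case, the Python sketch below filters a list of sales down to those in the same neighborhood with a similar building size. The field names (pin, year, sale_price, nbhd, bldg_sf), the 15% size tolerance, and the sample records are all hypothetical; check the data dictionary for the actual column names. PIN-A appears twice because the data set has one row per sale, not per property.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    pin: str          # Property Index Number
    year: int         # year of the sale
    sale_price: float
    nbhd: str         # hypothetical name for the assessor neighborhood code
    bldg_sf: float    # hypothetical name for building square footage


def similar_nearby_sales(sales, nbhd, bldg_sf, tolerance=0.15):
    """Return sales in neighborhood nbhd whose building size is within
    +/- tolerance (a fraction) of bldg_sf.

    The data set has one row per sale, so a PIN that sold more than once
    appears more than once; every sale is kept."""
    low, high = bldg_sf * (1 - tolerance), bldg_sf * (1 + tolerance)
    return [s for s in sales if s.nbhd == nbhd and low <= s.bldg_sf <= high]


sales = [
    Sale("PIN-A", 2018, 250_000, "100", 1_500),
    Sale("PIN-A", 2018, 262_000, "100", 1_500),  # same PIN, second sale that year
    Sale("PIN-B", 2019, 310_000, "100", 2_100),  # building too large
    Sale("PIN-C", 2019, 240_000, "200", 1_450),  # different neighborhood
]
comps = similar_nearby_sales(sales, nbhd="100", bldg_sf=1_400)
print([(s.pin, s.sale_price) for s in comps])
```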
Cook County Assessor's Residential Property Characteristics. This is a data set of all residential properties in Cook County and their characteristics. This data is unique by PIN and year, except for properties that have multiple buildings on them. In those cases, there will be multiple lines for a PIN in a given year. One use case for this data would be a residential property owner looking for a set of similar, nearby properties.
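Because rows are unique by PIN and year except for multi-building properties, counting rows per (PIN, year) key is one simple way to flag those properties. A minimal Python sketch with hypothetical PINs:

```python
from collections import Counter


def multi_building_keys(rows):
    """rows: one (pin, year) tuple per record in the characteristics data.

    The data is unique by PIN and year except for multi-building properties,
    so any (pin, year) key that occurs more than once flags such a property."""
    counts = Counter(rows)
    return {key: n for key, n in counts.items() if n > 1}


rows = [("PIN-1", 2019), ("PIN-2", 2019), ("PIN-2", 2019), ("PIN-1", 2018)]
print(multi_building_keys(rows))
```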

NEW: Cook County Assessor's Residential Assessments. This is a data set of all residential property assessments. The 'Assessment Roll' is produced over roughly 12 months. First, our statistical modeling pipeline produces initial estimates of home values. Then, analysts review these values and make corrections where appropriate, and the values are mailed to taxpayers. Taxpayers then have the opportunity to appeal to the Assessor, after which the Assessor certifies the Assessment Roll. Taxpayers may then appeal to the Board of Review, after which the Board certifies the roll and tax bills are sent to taxpayers.

In this data, we have published not just the final, Board of Review certified assessments, but assessments in each major stage of the production process: modeling results, mailed values, appeals and appealed values, and Board of Review values. There are a couple of features of this data users should be aware of:
  • Our new modeling framework allows for the separate reporting of modeling and mailed values, but the legacy process does not. This is why the modeling values field is blank for years prior to 2019.
  • The Board of Review has not fully certified the 2019 assessment roll, so the Board of Review field will be blank for many PINs. We will update this data after the Board certifies the full assessment roll.
  • For the sake of reporting, we have counted both appeals and re-reviews as appeals, so the maximum number of appeals a property can have is 3.
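Because later stages can be blank, a user who wants the most current value for each PIN has to fall back to the latest stage that is filled in. The Python sketch below assumes hypothetical key names for the four value columns (model, mailed, assessor_certified, bor_certified) and treats a missing key or an empty string as blank.

```python
from typing import Optional, Tuple

# Production stages in order. These key names are hypothetical stand-ins for
# the modeling, mailed, Assessor-certified, and Board of Review columns.
STAGES = ["model", "mailed", "assessor_certified", "bor_certified"]


def latest_value(record: dict) -> Tuple[Optional[str], Optional[float]]:
    """Return (stage, value) for the most advanced stage that is not blank.

    Blank means missing or an empty string, e.g. a 2019 PIN the Board of
    Review has not yet certified."""
    for stage in reversed(STAGES):
        value = record.get(stage)
        if value not in (None, ""):
            return stage, float(value)
    return None, None


pin_2019 = {
    "model": "262000",
    "mailed": "250000",
    "assessor_certified": "245000",
    "bor_certified": "",  # not yet certified by the Board
}
print(latest_value(pin_2019))
```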
This data can be joined on PIN against the sales data to facilitate sales ratio studies, or against the property characteristic data to facilitate describing the properties in given areas.
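A sales ratio study compares estimated values to sale prices. The Python sketch below joins the two on PIN and computes the median ratio, the coefficient of dispersion (COD), and the price-related differential (PRD), three standard ratio-study statistics. It assumes one sale per PIN (the sales data can hold several, so pick one first, for example the most recent) and that the value being compared is an estimated market value rather than an assessed value; the PINs and dollar amounts are made up.

```python
from statistics import mean, median


def ratio_study(values: dict, sale_prices: dict) -> dict:
    """values: {pin: estimated market value}; sale_prices: {pin: sale price}.

    Joins the two on PIN (an inner join) and summarizes the sales ratios."""
    pins = values.keys() & sale_prices.keys()
    if not pins:
        raise ValueError("no PINs in common")
    ratios = [values[p] / sale_prices[p] for p in pins]
    med = median(ratios)
    # Coefficient of dispersion: mean absolute deviation from the median
    # ratio, expressed as a percentage of the median.
    cod = 100 * mean(abs(r - med) for r in ratios) / med
    # Price-related differential: mean ratio divided by the weighted mean
    # ratio (total value over total sale price).
    weighted = sum(values[p] for p in pins) / sum(sale_prices[p] for p in pins)
    prd = mean(ratios) / weighted
    return {"n": len(ratios), "median_ratio": med, "cod": cod, "prd": prd}


values = {"PIN-A": 90_000, "PIN-B": 100_000, "PIN-C": 110_000, "PIN-D": 50_000}
sale_prices = {"PIN-A": 100_000, "PIN-B": 100_000, "PIN-C": 100_000}
print(ratio_study(values, sale_prices))
```

PIN-D has no sale, so the inner join drops it from the study.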
Experimental Single Family Sales Comparables. This is currently an experimental data set for testing a number of potential use cases. The public should not rely on this data for any serious purpose.
January 2020 Update
We changed the names of two data sets. 'Modeling Data' is now 'Sales Data,' and 'Assessment Data' is now 'Property Characteristics.' This was done in anticipation of a number of use cases of the data. For example, taxpayers may want to search for sales of properties with similar characteristics in a particular area using the Residential Sales Data. Taxpayers may want to search for PINs with similar characteristics using the 'Residential Property Characteristics Data.'
August 2019 Update
We have expanded the number of fields in Model Data and increased its coverage. Model Data now includes sales and characteristics for the last five years for all of Cook County. Assessment Data only includes PINs in the North Triennial, but includes additional fields and more accurate characteristics. We have also posted a new data set of Single Family sales comps, which is described in more detail in a separate narrative.
We have also made replication simpler. In order to minimize the extent to which you will have to alter our scripts to make them run with the data we have made available here, we recommend using version f07f2975 of Residential, master branch, and version 4e7cbf6a of Utility, master branch.
To find the correct version of the Residential repository, search the history for the commit message ‘Merge branch '116-july-open-data-update' into 'master'’. In Utility, search the history for ‘added hanover recodes’.
If you are interested in working on a project with the Assessor’s Office, please see our contribution guide.

What is the purpose of this data?

One of the goals Assessor Kaegi set was the publication of residential Computer Assisted Mass Appraisal (CAMA) code and data. These data sets fulfill part of that goal.
This data, in conjunction with our published code, will allow any technically proficient member of the public to reconstruct our first-pass residential valuation process. We have intentionally avoided using expensive software; our entire modeling process is done in R and RStudio, which are free statistical software. This helps minimize the barriers between our internal process and scrutiny by journalists and academics.
Our modeling process draws data from a number of storage locations, which are not accessible to the public. In lieu of such access, we have replicated this data to publish through the County’s Open Data Portal. These data act as replacements for the queries in the CAMA code that cannot be used outside the office, allowing the code to function in any environment.
Complete and consistent data is central to successfully producing accurate, fair assessments. In order to give the public the clearest picture of the state of our data, we have re-created the data as it exists in our production databases.
At the time of publication, the current assessor had been in office for four months. As such, we offer two disclaimers:
  • First, in instances where this data conflicts with the taxpayer’s assessment notice, the taxpayer’s notice takes precedence.
  • Second, the data we are publishing may contain errors, so it is provided as-is. For this first publication, we have attempted to document instances of ambiguity and inaccuracy.
One example of a corrupted field is OT_IMPR. There is a disconnect between the paper property inspection cards used by CCAO field staff and the data system into which that information is entered. The field cards have three separate fields to record “Other Improvements.” Such improvements include pools, private tennis courts, yoga sheds, etc. The AS400, the system of record in the office, has only a single field in which to record other improvements. Where a property has multiple improvements, all of them were entered into this field without a delimiter. This has made it impossible to determine algorithmically whether a 12 is a 1 & 2 or a 12, rendering this field mostly useless for modeling purposes.
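The OT_IMPR problem can be shown directly: given a delimiter-free field and a set of valid codes, enumerate every way the field could be split. The code set {1, 2, 12} below is hypothetical. Whenever more than one reading comes back, the original entries cannot be recovered from the field alone.

```python
def possible_readings(field, valid_codes):
    """Return every way to split a delimiter-free field into valid codes.

    More than one reading means the original entries cannot be recovered."""
    if field == "":
        return [[]]
    readings = []
    for cut in range(1, len(field) + 1):
        head = field[:cut]
        if head in valid_codes:
            # Recursively split whatever remains after this code.
            for rest in possible_readings(field[cut:], valid_codes):
                readings.append([head] + rest)
    return readings


codes = {"1", "2", "12"}  # hypothetical code list, for illustration only
print(possible_readings("12", codes))
```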
Another example is the AGE field. In a more advanced data system, you might capture multiple age characteristics: age of original structure, age of interior, age of bathrooms, age of kitchen, etc. In the AS400 system, one of the CCAO’s legacy systems, we can only store a single field, AGE. This means that this field is mixed-use, sometimes capturing original structure age, sometimes effective age, and sometimes age since the most recent major renovation. While this field is predictive of property value, it is not well defined.
Other examples are documented in our data dictionary.

How should I use this data?

Our residential modeling code can be downloaded and modified to run on your local PC or Mac using the data published through the Open Data Portal. Whether you are an academic, a journalist, or a property tax professional, we hope that this portal and our code are useful to you. Please be aware that the CCAO’s code is published under the GNU Affero General Public License, and you should not use the CCAO’s code in any manner that conflicts with this license.
The CCAO is currently working on a collaborator policy. When a final policy is published, the CCAO will welcome suggestions on code, modeling, and/or data improvements.

What shouldn't I try to do with this data?

Don’t use this data to look up basic information about your property – there is an easier and quicker way to do that. If you are a taxpayer looking for explanations about your property’s values, appeal status, or other questions pertaining to your assessment, please visit www.cookcountyassessor.com, or call (312) 443-7550 to speak with a taxpayer information specialist.

Where does this data fit in the assessment system?

When we think about predicting market values for residential properties, we must answer two basic questions: what data do we use to characterize values in each submarket area, and what is the universe of properties that we need to value? Table 1, Model Data (since renamed Sales Data), answers the first question, and Table 2, Assessment Data (since renamed Property Characteristics), answers the second. Table 3, First Pass Values, contains the results of the process at each step. These steps are outlined in our code.

What is Model Data?

Model Data contains every valid arm’s-length transaction in a specified geographic area and time period. We define a valid arm’s-length sale as one where the buyer and seller act independently and have no relationship to each other. We have included property characteristics at the time of sale, as well as location and property attributes for context. Property characteristics include the number of rooms and bathrooms, the size of the garage, the exterior construction of the property, and whether the home has a finished basement. Property attributes include census tract, assessor neighborhood code, a geographically determined location factor, and street address. We use Model Data to estimate a wide range of predictive models that help us characterize home values in a given area. We then select the best-performing models to value properties in Assessment Data.

What is Assessment Data?

Where Model Data is a data set of sales, Assessment Data is a data set of properties, even ones that have not sold in a long time. Because these properties still need to be valued, we use the best performing models from Model Data to estimate the market value for the properties contained within the Assessment Data table.
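The two-table workflow described above (estimate models on sales, then value every property) can be sketched in Python as follows. This is a hedged illustration, not the CCAO's pipeline, which is written in R and compares a much wider range of models. The two candidate models, the field names (nbhd, bldg_sf, price, pin), the sample records, and the use of mean absolute error on held-out sales as the selection criterion are all assumptions for illustration.

```python
from statistics import mean, median


def fit_nbhd_median(train):
    """Candidate 1: predict the median sale price of the property's neighborhood."""
    prices = {}
    for r in train:
        prices.setdefault(r["nbhd"], []).append(r["price"])
    medians = {nbhd: median(p) for nbhd, p in prices.items()}
    overall = median(r["price"] for r in train)  # fallback for unseen areas
    return lambda r: medians.get(r["nbhd"], overall)


def fit_linear_sf(train):
    """Candidate 2: least-squares line of sale price on building square feet."""
    xs = [r["bldg_sf"] for r in train]
    ys = [r["price"] for r in train]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda r: intercept + slope * r["bldg_sf"]


def select_and_value(train, holdout, properties, candidates):
    """Fit each candidate on training sales, keep the one with the lowest mean
    absolute error on held-out sales, then value every property with it."""
    best_name, best_model, best_err = None, None, float("inf")
    for name, fit in candidates.items():
        model = fit(train)
        err = mean(abs(model(r) - r["price"]) for r in holdout)
        if err < best_err:
            best_name, best_model, best_err = name, model, err
    return best_name, {p["pin"]: best_model(p) for p in properties}


train = [
    {"nbhd": "A", "bldg_sf": 1000, "price": 150_000},
    {"nbhd": "A", "bldg_sf": 2000, "price": 250_000},
    {"nbhd": "B", "bldg_sf": 1500, "price": 200_000},
    {"nbhd": "B", "bldg_sf": 3000, "price": 350_000},
]
holdout = [
    {"nbhd": "A", "bldg_sf": 1200, "price": 170_000},
    {"nbhd": "B", "bldg_sf": 2500, "price": 300_000},
]
# Properties need not have sold; they only need the same characteristics.
properties = [{"pin": "PIN-1", "nbhd": "A", "bldg_sf": 1800}]
candidates = {"nbhd_median": fit_nbhd_median, "linear_sf": fit_linear_sf}
print(select_and_value(train, holdout, properties, candidates))
```

The key design point mirrors the text: models are scored only on sales, because only sales have an observed price, but the winning model is applied to every property.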

What are first pass values?

First Pass Values are the values on which reassessment notices are based. They are the product of our modeling process and post-modeling adjustments. In this data, we have provided the value at each step in the valuation process. Each post-modeling adjustment produces a new set of estimated values, numbered 2 through 7. These values, and the resulting ratios, are stored in Table 3, First Pass Values, which reports the estimated market values of properties at each stage in the process.
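The stage-by-stage structure of First Pass Values can be sketched as a function that applies each adjustment in turn and records the resulting value and, for sold properties, its ratio to the sale price. The actual adjustments are defined in the CCAO's code; the two used here (rounding and a cap) are hypothetical placeholders, and treating stage 1 as the raw model estimate is an assumption.

```python
def first_pass_stages(initial_value, adjustments, sale_price=None):
    """Apply post-modeling adjustments in order and record every stage.

    Returns (stage number, value, ratio) tuples. Stage 1 is assumed to be the
    raw model estimate; each adjustment yields the next numbered stage.
    Ratios are only defined for sold properties, so they are None otherwise."""

    def ratio(value):
        return None if sale_price is None else value / sale_price

    stages = [(1, initial_value, ratio(initial_value))]
    value = initial_value
    for number, adjust in enumerate(adjustments, start=2):
        value = adjust(value)
        stages.append((number, value, ratio(value)))
    return stages


# Two hypothetical adjustments, for illustration only.
adjustments = [
    lambda v: round(v, -3),     # round to the nearest $1,000
    lambda v: min(v, 250_000),  # cap at a hypothetical ceiling
]
for stage in first_pass_stages(251_400, adjustments, sale_price=250_000):
    print(stage)
```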
First Pass Values are not final assessments. After first-pass notices are mailed, the assessor finalizes assessments in township order and sends them to the Board of Review (BOR); from there they may proceed to the Property Tax Appeal Board. Later, Certificates of Error and similar corrections may also change assessments.

I emailed the database owner over a week ago - why haven't they responded?

The CCAO has limited human resources. We really want to answer all of your questions, but the central mission of the office is, first and foremost, the production of assessments for taxpayers. We will respond to all questions in due course.

A note about replication

This data was published on April 16, 2019. Since then, we may have changed the valuation scripts we publish on GitLab. In order to minimize the extent to which you will have to alter our scripts to make them run with the data we have made available here, we recommend using version f07f2975 of Residential, master branch, and version 4e7cbf6a of Utility, master branch.