Cook County Assessor Model & Valuation Data Release
Last Update: 2/13/2020
February 2020 Update
As we enter the second year of Fritz Kaegi's administration, the Data Science department has worked with the Cook County Bureau of Technology to put more context on the data we have already published. This will help users navigate and use our different open data assets. Below is a summary of our current open data assets, and some example use-cases of those assets.
Cook County Assessor's Residential Sales Data. This is a data set of residential sales from 2013-2019 in Cook County. Since this is a data set of sales, not properties, an individual PIN may appear more than once in a given year. We have joined this sales data with characteristic data from the year of the sale, so the same property selling in two different years may have different characteristics.
One use case for this data would be a residential property owner who wants to see nearby sales of properties like their home. Another use case would be a journalist who wants to see whether the CCAO's published ratio statistics are supported by the sales data.
Cook County Assessor's Residential Property Characteristics. This is a data set of all residential properties in Cook County and their characteristics. This data is unique by PIN and year, except for properties that have multiple buildings on them. In those cases, there will be multiple lines for a PIN in a given year. One use case for this data would be a residential property owner looking for a set of similar, nearby properties.
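Because multi-building properties produce more than one row per PIN and year, it can help to count rows per key before joining this data to anything else. The following is a minimal sketch of that check. The rows are made up, and the column names `pin`, `year`, and `bldg_sf` are hypothetical; the published field names may differ.

```python
from collections import Counter

# Made-up rows; `pin`, `year`, and `bldg_sf` are hypothetical column names.
rows = [
    {"pin": "01011000010000", "year": 2019, "bldg_sf": 1200},
    {"pin": "01011000010000", "year": 2019, "bldg_sf": 600},  # second building
    {"pin": "01011000020000", "year": 2019, "bldg_sf": 1500},
]


def multi_building_keys(rows):
    """Return the (pin, year) keys that appear on more than one row."""
    counts = Counter((r["pin"], r["year"]) for r in rows)
    return sorted(key for key, n in counts.items() if n > 1)


multi = multi_building_keys(rows)
```

Any key returned here would need to be aggregated or handled separately before a one-row-per-property comparison.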
NEW: Cook County Assessor's Residential Assessments. This is a data set of all residential property assessments. The 'Assessment Roll' is produced over roughly a 12-month period. First, our statistical modeling pipeline produces initial estimates of home values. Then, analysts review these values and make corrections where appropriate, and the values are mailed to taxpayers. Taxpayers then have the opportunity to appeal, after which the Assessor certifies the Assessment Roll. Taxpayers may then appeal to the Board of Review, after which the Board certifies the roll, and tax bills are sent to taxpayers.
In this data, we have published not just the final, Board of Review certified assessments, but assessments in each major stage of the production process: modeling results, mailed values, appeals and appealed values, and Board of Review values. There are a couple of features of this data users should be aware of:
- Our new modeling framework allows for the separate reporting of modeling and mailed values, but the legacy process does not. This is why the modeling values field is blank for years prior to 2019.
- The Board of Review has not fully certified the 2019 assessment roll, so that field will be blank for many PINs. We will update this data at some point after the Board certifies the full assessment roll.
- For the sake of reporting, we have counted appeals and re-reviews both as appeals, so the maximum number of appeals a property can have is 3.
This data can be joined on PIN against the sales data to facilitate sales ratio studies, or against the property characteristic data to facilitate describing the properties in given areas.
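As an illustration of the sales ratio use case, the sketch below joins made-up assessment and sales records and computes each sale's ratio of estimated value to sale price, along with the median ratio and the coefficient of dispersion commonly reported in ratio studies. All field names are hypothetical. Adding the year to the join key is also an assumption, made here because the same PIN can appear in several years.

```python
from statistics import median

# Made-up records; all field names are hypothetical.
assessments = [
    {"pin": "A", "year": 2019, "estimated_value": 190_000},
    {"pin": "B", "year": 2019, "estimated_value": 330_000},
    {"pin": "C", "year": 2019, "estimated_value": 250_000},
]
sales = [
    {"pin": "A", "year": 2019, "sale_price": 200_000},
    {"pin": "B", "year": 2019, "sale_price": 300_000},
    {"pin": "D", "year": 2019, "sale_price": 410_000},  # no matching assessment
]


def sales_ratios(assessments, sales):
    """Join on (pin, year) and return estimated value / sale price per sale."""
    values = {(a["pin"], a["year"]): a["estimated_value"] for a in assessments}
    ratios = []
    for s in sales:
        key = (s["pin"], s["year"])
        if key in values and s["sale_price"] > 0:
            ratios.append(values[key] / s["sale_price"])
    return ratios


def coefficient_of_dispersion(ratios):
    """Mean absolute deviation from the median ratio, as a percent of the median."""
    med = median(ratios)
    return 100 * sum(abs(r - med) for r in ratios) / len(ratios) / med


ratios = sales_ratios(assessments, sales)
median_ratio = median(ratios)
cod = coefficient_of_dispersion(ratios)
```

Sales with no matching assessment are dropped rather than imputed, so the number of ratios can be smaller than the number of sales.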
Experimental Single Family Sales Comparables. This is currently an experimental data set to test a number of potential use cases. The public should not use this data for any serious purposes.
January 2020 Update
We changed the names of two data sets. 'Modeling Data' is now 'Sales Data,' and 'Assessment Data' is now 'Property Characteristics.' This was done in anticipation of a number of use cases of the data. For example, taxpayers may want to search for sales of properties with similar characteristics in a particular area using the Residential Sales Data. Taxpayers may want to search for PINs with similar characteristics using the 'Residential Property Characteristics Data.'
August 2019 Update
We have expanded the number of fields in Model Data, and also
increased its coverage. Model Data now includes sales and characteristics for
the last five years for all of Cook County. Assessment Data only includes PINs
in the North Triennial, but includes additional fields and more accurate
characteristics. We posted a new data set of Single Family sales comps, which
are described more deeply in a different narrative.
We have also made replication simpler. In order to minimize the
extent to which you will have to alter our scripts to make them run with the
data we have made available here, we recommend using version f07f2975 of
Residential, main branch, and version 4e7cbf6a of Utility, main branch.
To find the correct version of the Residential repository,
search the history for the commit message ‘Merge branch
'116-july-open-data-update' into 'master'’. In Utility, search the
history for ‘added hanover recodes’.
If you are interested in working on a project with the
Assessor’s Office, please see our contribution
guide.
What is the purpose of this data?
One of the goals Assessor Kaegi set was the publication of
residential Computer Assisted Mass Appraisal (CAMA) code and data. These data
sets fulfill part of that goal.
This data, in conjunction with our published code, will
allow any technically proficient member of the public to reconstruct our
first-pass residential valuation process. We have intentionally avoided using
expensive software; our entire modeling process is done in R and RStudio, which are free.
This helps minimize the barriers between our internal process and scrutiny by
journalists and academics.
Our modeling process draws data from a number of storage
locations, which are not accessible to the public. In lieu of such access, we
have replicated this data to publish through the County’s Open Data Portal.
These data act as replacements for the queries in the CAMA code that cannot be used
outside the office, allowing the code to function in any environment.
Complete and consistent data is central to successfully
producing accurate, fair assessments. In order to give the public the clearest
picture of the state of our data, we have re-created the data as it exists in
our production databases.
At the time of publication, the current assessor had held office for
four months. As such, we offer two disclaimers:
- First, in instances where this data conflicts with the
taxpayer’s assessment notice, the taxpayer’s notice takes precedence.
- Second, we are publishing data that may contain errors, so this
data is provided as-is. For this first publication, we have attempted to document
instances of ambiguity and inaccuracy.
One example of a corrupted field is OT_IMPR. There is a
disconnect between the paper property inspection cards used by CCAO field staff
and the data system that information is entered into. The field cards have
three separate fields to record “Other Improvements.” Such improvements
include pools, private tennis courts, yoga sheds, etc. The AS400, the system
of record in the office, only has a single field in which to record other
improvements. In instances where a property has multiple improvements, all
of them were entered into this field without a
delimiter. This has made it impossible to determine algorithmically whether
a 12 is a 1 & 2, or a 12, rendering this field mostly useless for modeling
purposes.
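The ambiguity described above can be shown with a short sketch that lists every way a delimiter-free string could be split into valid codes. The code list used here is hypothetical; the real OT_IMPR codes are not given in this narrative.

```python
def possible_readings(raw, valid_codes):
    """List every way to split `raw` into a sequence of valid codes."""
    if raw == "":
        return [[]]
    readings = []
    for code in valid_codes:
        if raw.startswith(code):
            # Keep this code and split whatever remains after it.
            for rest in possible_readings(raw[len(code):], valid_codes):
                readings.append([code] + rest)
    return readings


# Hypothetical code list: the real OT_IMPR codes are not given here.
codes = ["1", "2", "12"]
readings = possible_readings("12", codes)
# readings == [["1", "2"], ["12"]]: two equally valid interpretations
```

Whenever more than one reading comes back, the stored value cannot be decoded without going back to the paper field card.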
Another example is the AGE field. In a more advanced data
system, you might capture multiple age characteristics: age of original
structure, age of interior, age of bathrooms, age of kitchen, etc. In the AS400
system, one of the CCAO’s legacy systems, we can only store a single field,
AGE. This means that this field is mixed-use, sometimes capturing original
structure age, sometimes capturing effective
age, or age from the most recent major renovation. While this field is
predictive of property value, it is not well defined.
Other examples are documented in our data dictionary.
How should I use this data?
Our residential modeling code can be downloaded and modified to
run on your local PC or Mac using the data published through the Open Data
Portal. Whether you are an academic, a journalist, or a property tax
professional, we hope that this portal and our code are useful to you. Please
be aware that the CCAO’s code is published under the GNU Affero General
Public License, and you should not use the CCAO’s code in any manner that
conflicts with this license.
The CCAO is currently
working on a collaborator policy. When a final policy is published, the CCAO
will welcome suggestions on code, modeling, and/or data improvements.
What shouldn't I try to do with this data?
Don’t use this data to look up basic information about your
property – there is an easier and quicker way to do that. If you are a taxpayer
looking for explanations about your property’s values, appeal status, or other
questions pertaining to your assessment, please visit www.cookcountyassessor.com, or
call (312) 443-7550 to
speak with a taxpayer information specialist.
Where does this data fit in the assessment system?
When we think about predicting market values for residential
properties, we must answer two basic questions: what data do we use to
characterize values for each submarket area; and what is the universe of
properties that we need to value? Table 1, Model
Data is the answer to the first question, and Table 2, Assessment Data is the answer to the second. First Pass Values contains the results of the process at each step.
These steps are outlined in our code.
What is Model Data?
Model
Data contains every valid arm’s length transaction in a specified geographic
area and time period. We define valid
arm’s length as a sale where the buyer and seller act independently and do not
have any relationship to each other. We have included property characteristics at
the time of sale, as well as location and property attributes for
contextualization. Property characteristics include the number of rooms,
bathrooms, size of garage, exterior construction of the property, and whether
the home has a finished basement. Property Attributes include census tract,
assessor neighborhood code, a geographically determined location factor, and
street address. We use Model Data to
estimate a wide range of predictive models that help us characterize home
values in a given area. We then select the best performing models to use to
value properties in Assessment Data.
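The idea of estimating several models on sales and keeping the best performer can be sketched as follows. This is only an illustration under stated assumptions, not the CCAO's method: the two toy models, the field names, the made-up sales, the hold-out split, and the error measure are all hypothetical. The office's actual models live in its published R code.

```python
from statistics import median

# Made-up sales; field names are hypothetical.
train = [
    {"nbhd": "10", "sqft": 1000, "price": 200_000},
    {"nbhd": "10", "sqft": 1500, "price": 300_000},
    {"nbhd": "20", "sqft": 1000, "price": 400_000},
    {"nbhd": "20", "sqft": 2000, "price": 800_000},
]
holdout = [
    {"nbhd": "10", "sqft": 1200, "price": 240_000},
    {"nbhd": "20", "sqft": 1500, "price": 600_000},
]


def fit_countywide(sales):
    """Toy model: one median price per square foot for every property."""
    rate = median(s["price"] / s["sqft"] for s in sales)
    return lambda prop: rate * prop["sqft"]


def fit_by_neighborhood(sales):
    """Toy model: median price per square foot within each neighborhood."""
    by_nbhd = {}
    for s in sales:
        by_nbhd.setdefault(s["nbhd"], []).append(s["price"] / s["sqft"])
    rates = {n: median(v) for n, v in by_nbhd.items()}
    fallback = median(s["price"] / s["sqft"] for s in sales)
    return lambda prop: rates.get(prop["nbhd"], fallback) * prop["sqft"]


def median_abs_pct_error(model, sales):
    """Median of |predicted - actual| / actual over the given sales."""
    return median(abs(model(s) - s["price"]) / s["price"] for s in sales)


candidates = {"countywide": fit_countywide, "by_neighborhood": fit_by_neighborhood}
fitted = {name: fit(train) for name, fit in candidates.items()}
errors = {name: median_abs_pct_error(m, holdout) for name, m in fitted.items()}
best = min(errors, key=errors.get)  # the model with the lowest hold-out error
```

The selected model would then be applied to every row of Assessment Data, including properties that have not sold recently.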
What is Assessment Data?
Where Model Data is a
data set of sales, Assessment Data
is a data set of properties, even ones that have not sold in a long
time. Because these properties still need to be valued, we use the best
performing models from Model Data to estimate the market value for the
properties contained within the Assessment
Data table.
What are first pass values?
First Pass Values are the values upon which re-assessment
notices are based. They are the product of our modeling process and
post-modeling adjustments. In this data, we have provided each value at each
step in the valuation process. Each post-modeling adjustment produces a new set
of estimated values (values 2 through 7). These values, and the resulting ratios, are
stored in Table 3, First Pass Values,
which reports the estimated market values of properties at each stage in the
process.
First Pass Values are not final assessments. After first-pass
notices are mailed, the assessor finalizes assessments in township order, and
sends those assessments to the Board of Review (BOR), and then to the Property
Tax Appeal Board. Later, changes from things like Certificates of Error may
also change assessments.
I emailed the database owner over a week ago - why haven't they responded?
The CCAO has limited human resources. We really want to answer
all of your questions, but the central mission of the office is, first and
foremost, the production of assessments for taxpayers. We will respond to all
questions in due course.
A note about replication
This data was published on April 16, 2019. Since then, we may
have made changes to our valuation scripts available
on GitLab. In order to minimize the extent to which you will have to alter
our scripts to make them run with the data we have made available here, we
recommend using version f07f2975 of Residential, main branch, and version
4e7cbf6a of Utility, main branch.