
The Art and Science of Selecting the Right Data

In assessing community well-being, it’s important to use the most meaningful indicators, built on data that are recent, granular, and disaggregated for as many demographic groups as possible. This is easier said than done! Deciding on the best data for your use case can be daunting; that’s where IP3 comes in. With over a decade of experience working with communities and community indicators, IP3 helps clients select the data that power community change work.



Selecting the Right Data

Clients come to IP3 for support with community health needs assessments (CHNAs) and broader community assessments for many reasons: prioritizing investment and action in geographic areas impacted by a disaster, inspiring collective action among collaboratives, advancing policy advocacy, and more. Projects differ in their objectives and context, and recommended indicators are not one-size-fits-all; selecting the right data to fit a project’s needs is part science, part art. Below, we explore considerations for selecting community data indicators to inform your work.


The science of it (the data science, wink wink) has us evaluate proposed indicators across several dimensions, including data methodology, recency, frequency of update, geographic coverage, and granularity. Assessing these technical elements helps identify the strongest candidate indicators with respect to data quality and availability.
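For teams that want to make this screening systematic, the technical dimensions can be recorded in a simple, structured profile for each candidate indicator. The Python sketch below is illustrative only; the field names and comparison are ours, not IP3’s internal tooling, and the example values come from the vacant housing comparison later in this post.

```python
from dataclasses import dataclass

@dataclass
class IndicatorProfile:
    """Technical profile of one candidate indicator (illustrative fields only)."""
    name: str
    source: str
    latest_data: str             # recency
    update_frequency: str        # e.g., "quarterly", "annual"
    first_year: int              # time series
    smallest_geography: str      # granularity
    coverage: str
    demographic_breakouts: bool  # stratification

candidates = [
    IndicatorProfile(
        name="Vacant residential addresses",
        source="HUD-aggregated USPS administrative data",
        latest_data="December 2023",
        update_frequency="quarterly",
        first_year=2010,
        smallest_geography="census tract",
        coverage="entire U.S.",
        demographic_breakouts=False,
    ),
    IndicatorProfile(
        name="Vacant housing units",
        source="American Community Survey (U.S. Census Bureau)",
        latest_data="2022",
        update_frequency="annual",
        first_year=2010,
        smallest_geography="census tract",
        coverage="entire U.S.",
        demographic_breakouts=False,
    ),
]

# Print a side-by-side summary so the technical trade-offs are easy to scan.
for c in candidates:
    print(f"{c.source}: latest {c.latest_data}, updated {c.update_frequency}, "
          f"back to {c.first_year}, down to the {c.smallest_geography}")
```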


The art of it makes space for context, interpretation, and learning. We consider: the lived experience and preferences of stakeholders; project elements and decision-making; alignment with community, regional, and/or state goals, standards, or initiatives; political and other external factors that affect data or data source fidelity; and the experiences and lessons learned from peer communities or efforts. All of this information is invaluable, alongside the technical elements, when selecting indicators to inform and support a project.


Explore an Example: Vacant Housing

IP3 recently helped a client decide on the best measure of housing vacancy for their project, which aims to advance partnership and neighborhood improvement in lower-resourced urban neighborhoods of a major U.S. city. Vacant housing, defined here as the percentage of residential addresses that are vacant, is associated with neighborhood disinvestment, violence and crime, and other deleterious health and well-being outcomes. There are two strong potential sources of vacant housing data: administrative United States Postal Service (USPS) data on vacant residential addresses and American Community Survey (ACS) data. We evaluated both datasets (see below); while both are recent, granular, complete, and from trusted sources, the client opted for the USPS data. For their community partners, the administrative data had more fidelity because they are “ground-truthed”: the dataset is built from postal workers’ direct observations of vacancies rather than modeled survey estimates.
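To make the measure itself concrete, here is a minimal sketch of the underlying arithmetic. The tract IDs, address counts, and field names are made up for illustration; an actual HUD/USPS extract has its own layout and documentation, which should govern the exact denominator used.

```python
# Illustrative only: made-up counts shaped like a HUD/USPS aggregated extract.
# Vacancy measure: share of residential addresses flagged "vacant" or "no-stat".
tracts = [
    {"tract": "000100", "res_total": 1850, "res_vacant": 212, "res_nostat": 34},
    {"tract": "000200", "res_total": 1420, "res_vacant": 57,  "res_nostat": 9},
]

for t in tracts:
    pct = 100 * (t["res_vacant"] + t["res_nostat"]) / t["res_total"]
    print(f"Tract {t['tract']}: {pct:.1f}% of residential addresses vacant or no-stat")
```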


Assessing Prospective Indicators for Vacant Housing


The two candidate sources are HUD Aggregated USPS Administrative Data on Address Vacancies (USPS) and the American Community Survey (ACS). Each evaluation question below is answered for both.

Data definition: What are the defining criteria of the measure?
  • USPS: Percentage of residential addresses identified by the USPS as having been “vacant” or “no-stat”
  • ACS: Estimated percentage of housing units that are vacant; a housing unit is considered vacant if no one is living in it at the time of the interview

Methods: How are the data collected and analyzed?
  • USPS: USPS provides aggregate vacancy and no-stat counts of residential and business addresses, collected by postal workers and submitted to HUD on a quarterly basis
  • ACS: The ACS is an ongoing survey conducted by the United States Census Bureau that collects detailed demographic, social, economic, and housing information from a sample of U.S. households every year

Fidelity of source: How credible are the source and their data? Do their methods meet quality standards?
  • USPS: The data are good enough for HUD’s purposes!
  • ACS: Highly credible source for demographic, social, economic, and housing information

Recency: When were the data collected or published?
  • USPS: Data are available up to December 2023
  • ACS: Data are available up to 2022

Frequency of update: How often are new data published?
  • USPS: Data are released quarterly
  • ACS: Data are released annually

Time series: How many years of data are available?
  • USPS: Data go back to 2010
  • ACS: Data go back to 2010

Granularity: What is the smallest usable geography available from the dataset?
  • USPS: Census tract
  • ACS: Census tract

Coverage: What is the geographic coverage of the data? Are there any gaps in geographic availability?
  • USPS: Entire U.S.A.
  • ACS: Entire U.S.A.

Stratification: Are demographic breakouts of the data available?
  • USPS: No/NA
  • ACS: No/NA

Adoption in the field: Where have these data been used by others in the field? What can we learn from their experiences?
  • USPS: Data are created for and used by HUD, in addition to other known community measurement efforts
  • ACS: Unknown

Strengths and limitations: What are the strengths of this dataset? Are there limitations that make the data more or less desirable than the alternatives?
  • USPS: Represents the universe of all addresses in the United States; updated every three months; observational data collection carries its own strengths and limitations
  • ACS: With good survey technique and modeling, survey data can provide extraordinarily accurate and reliable results; estimates are normalized across multiple years
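For the ACS route, the vacancy measure can be pulled directly from the Census Bureau’s API using table B25002 (Occupancy Status). The sketch below is a minimal example, not the client’s actual workflow; it assumes the requests library, uses Wayne County, MI purely as a stand-in geography, and notes that a free Census API key is recommended for anything beyond light use.

```python
import requests

# ACS 5-year estimates, table B25002 (Occupancy Status):
#   B25002_001E = total housing units, B25002_003E = vacant housing units
# Tract-level pull for Wayne County, MI (FIPS 26/163), used here only as an
# example geography. Low-volume requests work without an API key; append
# &key=... for heavier use.
url = (
    "https://api.census.gov/data/2022/acs/acs5"
    "?get=NAME,B25002_001E,B25002_003E"
    "&for=tract:*"
    "&in=state:26%20county:163"
)

rows = requests.get(url, timeout=30).json()
header, records = rows[0], rows[1:]

for rec in records:
    row = dict(zip(header, rec))
    total = int(row["B25002_001E"])
    vacant = int(row["B25002_003E"])
    if total:
        print(f"{row['NAME']}: {100 * vacant / total:.1f}% of housing units vacant")
    else:
        print(f"{row['NAME']}: no housing units")
```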


Explore an Example: Learning Proficiency

In another example, we helped a client decide on the best measures of student learning outcomes for their project, which aims to inspire collective action for well-being among collaboratives across their state. Student learning outcomes, like reading and math proficiency at the 4th and 8th grade levels, are important measures that inform education policy and investment. There are several sources for this kind of data: states publish student educational assessment data, and sources such as County Health Rankings and Kids Count merge student education data from around the country into a single repository. One limitation of student learning outcome data is that results from different states are not directly comparable, because the assessment instruments (i.e., standardized tests) differ from state to state. The Stanford Education Data Archive overcomes this challenge with its sophisticated analytic approach. After considering their options, the client decided to use their state-specific data because partners across the state would be more familiar with the source, making alignment and acceptance simpler.
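As with the vacancy example, the mechanics are straightforward once a source is chosen. The sketch below aggregates a hypothetical state assessment download into district-level proficiency rates; the file name and column names are invented for illustration, since every state publishes these files in its own layout.

```python
import csv
from collections import defaultdict

# Hypothetical file and column names for a state assessment download; adjust
# to the actual file layout documented by the state education agency.
totals = defaultdict(lambda: {"tested": 0, "proficient": 0})

with open("state_assessment_grade4_ela.csv", newline="") as f:
    for row in csv.DictReader(f):
        d = totals[row["district_name"]]
        d["tested"] += int(row["students_tested"])
        d["proficient"] += int(row["students_proficient"])

for district, counts in sorted(totals.items()):
    pct = 100 * counts["proficient"] / counts["tested"] if counts["tested"] else 0
    print(f"{district}: {pct:.1f}% proficient in grade 4 reading")
```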


When using data becomes daunting, IP3 is here to help. Our customized data selection process works within the constraints, and takes advantage of the opportunities, of the available indicators and datasets. We help our clients understand the full data landscape so they can select the best option for their project. If you’re interested in more robust support around identifying, accessing, and using community health data in your work, contact us today to learn more!

