Better predictive modeling requires bigger, more varied, higher quality data sets

In a previous blog post about predictive analytics, we discussed how comprehensive health care data is necessary for a high degree of predictive accuracy. In this post, we’ll discuss the variables that increase that accuracy.

The larger the better. As the sample size behind a predictive model grows, the model’s uncertainty level and degree of bias decrease. By increasing sample size, health care organizations also increase the chances of seeing all likely events and the full range of patient variation.
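To put rough numbers on the uncertainty point, here is a minimal sketch in Python. It assumes a simple random sample and a hypothetical 15 percent readmission rate (the rate and the sample sizes are illustrative, not taken from any study), and it uses the standard formula for the standard error of a proportion.

```python
import math


def standard_error(rate: float, n: int) -> float:
    """Standard error of a proportion observed in a simple random sample of size n."""
    return math.sqrt(rate * (1.0 - rate) / n)


def margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error (normal approximation)."""
    return z * standard_error(rate, n)


# Hypothetical readmission rate, chosen only for illustration.
ASSUMED_RATE = 0.15

for n in (100, 1_000, 10_000, 100_000):
    moe = margin_of_error(ASSUMED_RATE, n)
    print(f"n = {n:>7,}   margin of error = +/- {moe:.4f}")
```

Each fourfold increase in sample size cuts the margin of error in half. Note that this only covers random error: a bigger sample drawn from the same narrow population is still narrow, so size alone does not make a sample representative.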

On the other hand, a study that uses a population limited to a single health system with similar demographics can’t account for all of the independent variables that could lead to a dependent event. Use a large and diverse sample of patients from many health systems, geographic regions and demographics to reduce the homogeneity of your sample as well as the likelihood of skewing your study.

Variety is not only the spice of life; it’s a key to predictive accuracy. Using relevant and varied data sets, organizations will uncover the most predictive variables. Health care data sets should start with clinical and claims data. But socioeconomic and care management data should also be integrated into the predictive data set.

When the Veterans Health Administration reviewed risk prediction models for hospital readmission in 2011, it found that social and environmental factors contributed to readmission risk. Those factors included access to care, social support and substance abuse. Care management factors such as discharge follow-up and coordination of care with primary care physicians were also identified as likely elements of readmission risk.

In predictive analytics, data quality is job one. Even with data aggregated from diverse sources, settings and organizations, data is relatively useless for prediction if it hasn’t been cleaned, normalized and validated. Each data set must be prepared the same way, using a single consistent ontology (or clinical classification), or statistical validity will suffer.
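As a rough illustration of what preparing each data set the same way can look like, the Python sketch below maps source-specific diagnosis codes onto one shared vocabulary and sets aside anything it can’t map. The source names, local codes and concept labels are all hypothetical; a real pipeline would use an established clinical classification and far more validation.

```python
# Hypothetical crosswalks from each source's local codes to one shared vocabulary.
CROSSWALK = {
    "system_a": {"DM2": "diabetes_type_2", "HTN": "hypertension"},
    "system_b": {"DIAB-T2": "diabetes_type_2", "HIGH-BP": "hypertension"},
}


def normalize(source, raw_code):
    """Clean a raw code and map it to the shared vocabulary; return None if unmapped."""
    table = CROSSWALK.get(source, {})
    return table.get(raw_code.strip().upper())


def prepare(records):
    """Split (source, patient_id, raw_code) records into normalized and rejected lists."""
    clean, rejected = [], []
    for source, patient_id, raw_code in records:
        concept = normalize(source, raw_code)
        if concept is None:
            # Validation step: unmapped codes are held back for review, not guessed.
            rejected.append((source, patient_id, raw_code))
        else:
            clean.append((patient_id, concept))
    return clean, rejected


sample = [
    ("system_a", "p1", " dm2 "),
    ("system_b", "p2", "HIGH-BP"),
    ("system_b", "p3", "???"),
]
clean, rejected = prepare(sample)
print(clean)
print(rejected)
```

Holding back unmapped records, rather than dropping them silently or guessing, keeps the validation step visible and makes gaps in the crosswalk easy to find.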

For more on the variables that determine the accuracy of predictive analytics, download this Optum white paper.
