Population-disease metrics are crucial in guiding the distribution of resources to health interventions and in implementing public health initiatives but are not always available for all diseases. Google, a popular search engine, could be a viable source of population-disease metrics when other sources are lacking. This study analyses the ability of Google to predict cancer mortality rates.
The internet has become people’s number one source of information on a variety of topics, from home repairs to health concerns. Google, one of the most popular search engines, collects information on the searches performed by internet users and compiles these data into reports called Google Trends. Google Trends data are normalized for total Google search volume; data points are “divided by the total searches of the geography and time range it represents” in a comparison of relative popularity. Results are then presented as Search Volume Indices (SVIs) on a scale ranging from 0 to 100; for instance, the state with the highest number of searches for a topic relative to its total number of searches would receive an SVI score of 100, and other states with lower relative search numbers for that topic would receive lower SVI scores. These data can serve multiple purposes, depending on the objectives of the interested party. If accurate, they could be used to estimate the impact of a disease when other validated population-disease metrics are unavailable.
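The normalization described above can be illustrated with a minimal sketch. It assumes the SVI is simply each region's share of searches for a topic, rescaled so that the region with the highest share scores 100. The function name, the state labels, and all counts are hypothetical, and Google's actual pipeline (which may involve sampling and rounding rules not described here) is not modeled.

```python
def search_volume_index(topic_searches, total_searches):
    """Rescale each region's relative search share to a 0-100 index.

    Both arguments map a region name to a search count. This is an
    illustrative assumption, not Google's published implementation.
    """
    # Relative popularity: topic searches divided by all searches in the region.
    shares = {
        region: topic_searches[region] / total_searches[region]
        for region in topic_searches
    }
    top_share = max(shares.values())
    if top_share == 0:
        # No region searched for the topic at all.
        return {region: 0 for region in shares}
    # The region with the highest share is assigned 100; others scale down.
    return {region: round(100 * share / top_share) for region, share in shares.items()}


# Hypothetical counts, for illustration only.
topic = {"State A": 500, "State B": 300, "State C": 100}
total = {"State A": 10_000, "State B": 12_000, "State C": 1_000}

svi = search_volume_index(topic, total)
for region, score in svi.items():
    print(region, score)
```

In this made-up example, State C has the fewest searches for the topic in absolute terms but the highest share of its own total searches, so it receives the score of 100. This is why SVIs reflect relative rather than absolute interest.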
Wehner and colleagues investigated the association between internet search volumes for common forms of cancer and published cancer incidence and mortality rates in the United States, state by state, to test the hypothesis that the two are positively correlated. Their findings were published in JAMA Dermatology. They collected Google search volume data through Google Trends and used them to estimate the relative search volume for eight types of cancer in all 50 states and the District of Columbia from 2009 to 2013. The cancers were breast, bladder, colorectal, lung, non-Hodgkin lymphoma, melanoma, prostate, and thyroid, which are considered the most common cancers in the United States by the Centers for Disease Control and Prevention’s National Program of Cancer Registries. These data were compared with age-adjusted cancer incidence and mortality rates for 2009-2013 in all 50 states and the District of Columbia, obtained from the National Program of Cancer Registries.
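A comparison of this kind can be sketched as a correlation between per-state SVI scores and per-state rates. The example below uses a Pearson correlation coefficient purely as an illustration; the summary above does not say which statistic the authors used, and every number in the example is invented rather than taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for five states (not data from the study):
# SVI for one cancer type, and an age-adjusted mortality rate per 100,000.
svi_scores = [100, 80, 65, 50, 40]
mortality_rates = [48.0, 45.5, 41.0, 39.5, 36.0]

r = pearson_r(svi_scores, mortality_rates)
print(f"r = {r:.2f}")
```

A coefficient near 1 would indicate that states with relatively more searches also tend to have higher rates, while a value near 0 would indicate no linear relationship. A real analysis would repeat this for each cancer type across all 50 states and the District of Columbia.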
State-specific relative Google search volumes had a positive association with state-specific cancer incidence and mortality rates for several cancers, including colon, lung, lymphoma, and melanoma. These findings support the potential use of internet search data, and other publicly available information on what the population searches for on health topics, to estimate disease characteristics such as incidence and mortality rates. This information may be particularly valuable when national registry data on a disease are lacking, or when real-time information on a condition is needed, as registry data are often several years old by the time they are published.
Wehner and colleagues cite a few limitations of the study. First, disease estimates based on Google search data may not be generalizable, because the data are restricted to internet users who use Google. Second, the findings may not be generalizable to rare diseases or to diseases lacking a common unifying search term, as these may not have recorded search volumes in Google Trends. Lastly, because search volume may vary independently of disease metrics (as could be the case with public health campaigns, such as awareness or screening initiatives targeted at specific diseases), internet search data may be inappropriate for comparing incidence and mortality rates between diseases.
Although imperfect, Google Trends showed some ability to predict cancer mortality rates. This information has the potential to inform policy and funding choices in the absence of other, more reliable sources of current information.
Written by Sara Alvarado BSc, MPH
Wehner, M., Nead, K., & Linos, E. (2017). Correlation among cancer incidence and mortality rates and internet searches in the United States. JAMA Dermatology. DOI: 10.1001/jamadermatol.2017.1870