We use publicly available, anonymized, and nationally aggregated data from Google’s Symptom Search Dataset (SSD), which reports the relative frequency of Internet searches for 420 signs, symptoms, and health conditions and has well-documented privacy protections31. For comparison, we use data from: (1) the Centers for Disease Control and Prevention (CDC) National Syndromic Surveillance Program (NSSP), which tracks emergency department (ED) visits for various conditions at facilities in 48 US states6, and (2) the US Census Bureau’s Household Pulse Survey (HPS), which assesses the social and economic impact of the pandemic7. The key features of these data sets are summarized in Table 1.
SSD is publicly available30 and provides daily and weekly time series of the relative volume of searches in the United States, in English or Spanish, for common symptoms and conditions. Data are available at the national, state, and county levels in the US and in five other English-speaking countries. Search queries related to each symptom are aggregated and anonymized using differential privacy32 and then normalized by the total search volume in that region, as detailed elsewhere31.
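To make the anonymize-then-normalize step concrete, the sketch below adds Laplace noise to a raw symptom count and divides by the region's total search volume. This is a hypothetical illustration only: the Laplace mechanism, the `epsilon` and `sensitivity` parameters, the zero clamp, and the function names are assumptions, and SSD's actual privacy mechanism and scaling are those described in the documentation cited above.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two i.i.d. exponentials."""
    # expovariate takes a rate (1 / mean); both draws have mean `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_relative_frequency(
    symptom_count: int,
    total_count: int,
    epsilon: float,
    sensitivity: float = 1.0,
    seed: int = 0,
) -> float:
    """Hypothetical privatized share of one region's searches for one symptom.

    Assumption: Laplace noise with scale sensitivity / epsilon is added to the
    raw count, negative results are clamped to zero, and the noisy count is
    divided by the region's total search volume.
    """
    rng = random.Random(seed)
    noisy = symptom_count + laplace_noise(sensitivity / epsilon, rng)
    noisy = max(noisy, 0.0)  # post-processing: a count cannot be negative
    return noisy / total_count
```

Smaller `epsilon` means more noise (stronger privacy); dividing by the regional total is what makes values comparable across regions with different overall search volume.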
SSD was built by leveraging Google’s web search tools that map queries to Knowledge Graph33,34 entities by continuously learning the associations between the words in user queries and the entities described in web pages viewed after those queries. The 420 symptoms and conditions included in SSD are the most frequently searched entities (by query volume). Each entity (symptom or condition) is associated with tens or hundreds of thousands of individual queries issued by Google users on desktop or mobile devices. Quotes and capitalization in queries are ignored, and spelling errors are automatically corrected. Example queries include [lexapro], [depression test], and [signs of depression] for depression; [trazodone], [agoraphobia], and [panic attack] for anxiety; and [I want to die], [how to die], and [I want to kill myself] for suicidal ideation.
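The query handling described above (quotes and capitalization ignored) can be illustrated with a toy lookup built only from the example queries listed in the text. This is not Google's learned Knowledge Graph mapping: the dictionary, `normalize_query`, and `map_to_entity` are hypothetical, and spelling correction is deliberately not attempted.

```python
from typing import Optional

# Toy lookup containing only the example queries quoted in the text; the real
# SSD mapping is learned and covers far more queries per entity.
EXAMPLE_QUERIES = {
    "lexapro": "depression",
    "depression test": "depression",
    "signs of depression": "depression",
    "trazodone": "anxiety",
    "agoraphobia": "anxiety",
    "panic attack": "anxiety",
    "i want to die": "suicidal ideation",
    "how to die": "suicidal ideation",
    "i want to kill myself": "suicidal ideation",
}

# Straight and typographic double quotes are dropped before matching.
_QUOTES = str.maketrans("", "", '"\u201c\u201d')


def normalize_query(query: str) -> str:
    """Remove quotes, case-fold, and collapse whitespace (no spelling correction)."""
    return " ".join(query.translate(_QUOTES).casefold().split())


def map_to_entity(query: str) -> Optional[str]:
    """Return the toy entity for a query, or None if it is not in the lookup."""
    return EXAMPLE_QUERIES.get(normalize_query(query))
```

For example, `map_to_entity('"Panic   Attack"')` normalizes to `panic attack` and returns `"anxiety"`.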
For the current study, we focused on SSD search queries related to anxiety, depression, and suicidal ideation between January 1, 2018, and December 31, 2020. We chose these entities a priori because they are common, frequently searched conditions and because of their high relevance to population mental health. We also considered searches related to motion sickness as a putative negative control in a subset of our analyses.
We compared weekly nationwide data on Internet searches measured by SSD with nationwide data on ED visits as reported by the NSSP. The NSSP is a CDC-led collaboration to collect, analyze, and share electronic health data from approximately 3,500 emergency departments, urgent care and outpatient centers, inpatient health care settings, and laboratories (collectively referred to hereafter as urgent care facilities) in 48 states (all except Hawaii and Wyoming) and Washington, DC6. These facilities represent approximately 70% of all EDs in the US. The data used in this analysis were previously used by Holland et al. (2021)20 and were reused in this study with the permission of the authors.
We focus on two variables reported by Holland et al. (2021)20: (1) national counts of weekly ED visits for mental health conditions associated with natural or man-made disasters, such as stress, anxiety, symptoms consistent with acute stress disorder or post-traumatic stress disorder, and panic, and (2) national counts of weekly suicide attempts. The data set included weekly ED visit counts from December 30, 2018, to October 10, 2020.
In addition, we compared Internet search data with HPS data. The HPS is a national survey designed to measure the impacts of the COVID-19 pandemic on the economic, physical, and mental health of American households7. Phase 1 of the survey took place from April 23, 2020, to July 21, 2020; Phase 2 from August 19, 2020, to October 26, 2020; and Phase 3 from October 28, 2020, to March 29, 2021. Although the survey is ongoing, the current analysis uses HPS data from these three phases35.
Questions about anxiety and depression symptoms were administered in all phases of the survey, while questions about mental health care were included in Phases 2 and 3. The anxiety and depression items comprised 4 questions adapted from the two-item Patient Health Questionnaire (PHQ-2) and the two-item Generalized Anxiety Disorder questionnaire (GAD-2). For each question, responses covered the last 7 days and were coded as follows: not at all = 0, several days = 1, more than half the days = 2, and nearly every day = 3. Anxiety and depression scores were obtained by summing the responses to the two questions for each construct (possible range 0 to 6). Analyses of the survey results use the percentage of respondents with a summed score of 3 or more. Items on mental health care assessed the percentage of adults who, in the past 4 weeks, reported taking prescription medication, receiving counseling or therapy from a mental health professional, or needing counseling or therapy from a mental health professional but not receiving it (i.e., unmet need).
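The scoring rule above can be written out directly: each construct is the sum of two items coded 0 to 3, and respondents count as positive when the sum is 3 or more. The sketch below is illustrative; the function names are hypothetical, and the optional `weights` argument is an assumption standing in for survey weights (published HPS estimates are weighted, and this sketch does not reproduce the Census Bureau's weighting).

```python
from typing import Optional, Sequence, Tuple


def construct_score(item1: int, item2: int) -> int:
    """Sum two items coded 0-3 into a 0-6 score (PHQ-2 or GAD-2 style)."""
    for item in (item1, item2):
        if item not in (0, 1, 2, 3):
            raise ValueError(f"responses must be coded 0-3, got {item!r}")
    return item1 + item2


def percent_at_or_above(
    responses: Sequence[Tuple[int, int]],
    weights: Optional[Sequence[float]] = None,
    cutoff: int = 3,
) -> float:
    """(Weighted) percentage of respondents whose summed score is >= cutoff.

    `weights` is a hypothetical per-respondent weight; None means equal weights.
    """
    if weights is None:
        weights = [1.0] * len(responses)
    if not responses or len(weights) != len(responses):
        raise ValueError("need at least one respondent and one weight per respondent")
    total = sum(weights)
    positive = sum(
        w for (a, b), w in zip(responses, weights) if construct_score(a, b) >= cutoff
    )
    return 100.0 * positive / total
```

For instance, four respondents answering (0, 0), (1, 2), (3, 3), and (1, 1) have scores 0, 3, 6, and 2, so the unweighted percentage at or above the cutoff is 50%.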
We first used graphical approaches and descriptive statistics to identify temporal patterns in Internet searches related to anxiety, depression, and suicidal ideation. We then fit a generalized linear model with a log link function to quantify changes in relative search volume associated with the weeks of the Christmas and Thanksgiving holidays and with the onset of the COVID-19 pandemic (defined as the first 4 weeks of March 2020), adjusting for calendar year and season.
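A minimal sketch of this kind of model is shown below. The text does not state the error family, so this assumes a quasi-Poisson model with a log link (its point estimates equal the Poisson ones), fit by iteratively reweighted least squares in plain Python; in R this would roughly correspond to `glm(..., family = quasipoisson(link = "log"))`. The design-row layout, reference levels, season labels, and all function names are hypothetical. Under a log link, a coefficient `b` corresponds to a `100 * (exp(b) - 1)` percent change in expected search volume.

```python
import math
from typing import List, Sequence

YEARS = (2018, 2019, 2020)                        # assumption: 2018 is the reference year
SEASONS = ("winter", "spring", "summer", "fall")  # assumption: winter is the reference season


def design_row(year: int, season: str, holiday_week: bool, pandemic_onset: bool) -> List[float]:
    """Hypothetical row: intercept, holiday flag, pandemic-onset flag, year and season dummies."""
    row = [1.0, float(holiday_week), float(pandemic_onset)]
    row += [float(year == y) for y in YEARS[1:]]
    row += [float(season == s) for s in SEASONS[1:]]
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_log_link_glm(X: Sequence[Sequence[float]], y: Sequence[float],
                     iters: int = 50, tol: float = 1e-10) -> List[float]:
    """Quasi-Poisson log-link GLM by IRLS; column 0 of X must be the intercept."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / n)  # start from the log of the overall mean
    for _ in range(iters):
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
        # Log link with variance mu: working response z, working weights w = mu.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        xtwx = [[sum(w[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(w[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        if max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


def percent_change(coef: float) -> float:
    """Convert a log-link coefficient into a percentage change."""
    return 100.0 * (math.exp(coef) - 1.0)
```

Every column of the design must vary in the data (for example, at least one holiday week and one pandemic-onset week), otherwise the weighted cross-product matrix is singular.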
Second, we quantified the change in search volume associated with the pandemic by calculating, for each week between January 1, 2020, and December 31, 2020, the percentage change in search frequency for each topic relative to the same week 1 year earlier. We estimated changes in the rates of ED visits for mental health conditions and suicide attempts from the NSSP in the same way.
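The year-over-year comparison can be sketched as follows. The sketch assumes weeks are keyed by their start date and that "the same week 1 year earlier" means exactly 52 weeks (364 days) before, which keeps the weekday aligned; the function name and this alignment rule are assumptions, not necessarily the authors' exact convention.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def yoy_percent_change(weekly: Dict[date, float]) -> Dict[date, Optional[float]]:
    """Percent change of each week versus the week starting 52 weeks earlier.

    Weeks with no comparison week, or with a zero baseline, map to None.
    """
    result: Dict[date, Optional[float]] = {}
    for week_start, value in weekly.items():
        base = weekly.get(week_start - timedelta(weeks=52))
        result[week_start] = None if not base else 100.0 * (value - base) / base
    return result
```

For example, a relative volume of 12 in the week starting Sunday, January 5, 2020, compared with 10 in the week starting Sunday, January 6, 2019, gives a 20% increase.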
Third, we computed pairwise Pearson correlation coefficients between contemporaneous measures derived from SSD, NSSP, and HPS; the results were not materially different when Spearman’s correlation coefficients were used instead. We also used scatter plots to examine the relationships between specific pairs of measures in more depth. In sensitivity analyses, we considered the possibility of a 1- or 2-week lag between changes in search volume and changes in rates of ED visits for mental health conditions or suicide attempts. Specifically, we used a generalized linear model with a log link function to quantify the relative change in ED visits associated with searches in the same week, the previous week, and 2 weeks earlier, fitting a separate model for each search concept. All analyses were performed using R (version 4.0.2). The code to replicate these analyses is publicly available on GitHub at https://github.com/anthonysun95/Google_SSD_and_Mental_Health.
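The correlation and lag steps can be sketched from first principles. Pearson's coefficient is computed directly, Spearman's is Pearson's applied to average ranks (ties share the mean of their positions), and `lagged_covariates` builds the same-week, previous-week, and 2-weeks-earlier search values that could serve as predictors in a log-link model of weekly ED visits. All function names are hypothetical; in R, `cor(x, y, method = "spearman")` offers the rank-based version directly.

```python
import math
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation as Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def lagged_covariates(searches: Sequence[float], max_lag: int = 2) -> List[Optional[List[float]]]:
    """For week t, return [s_t, s_(t-1), ..., s_(t-max_lag)]; None if history is too short."""
    rows: List[Optional[List[float]]] = []
    for t in range(len(searches)):
        if t < max_lag:
            rows.append(None)
        else:
            rows.append([searches[t - lag] for lag in range(max_lag + 1)])
    return rows
```

Because Spearman's coefficient depends only on ranks, agreement between the two coefficients suggests the associations are not driven by a few extreme weeks.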