Public dataset highlights: Antibiotic resistance

Friday, June 16, 2017

This is a big one and has been getting a lot more attention ever since the UN met last September to wring their hands over it. This week's Data is Plural features a publicly available dataset on antibiotic resistance genes:
Antibiotic resistance. ResistoMap is an interactive visualization of antibiotic drug resistance, based on more than 1,500 bacteria genome samples from people’s intestinal tracts. The data behind the visualization is available to download. It’s partly based on two prior datasets: McMaster University’s Comprehensive Antibiotic Resistance Database (“a bioinformatic database of resistance genes, their products and associated phenotypes”) and the University of Gothenburg’s BacMet (“an easy-to-use bioinformatics resource of antibacterial biocide- and metal-resistance genes”).

Public dataset highlights: Migrating scientists, workplace injuries, and beach bacteria

Wednesday, June 7, 2017

This week's Data is Plural features three datasets of public health and scientific interest - including one that includes yours truly! I have an ORCID, which means my research profile is included in the first dataset.
Millions of scientists, and their migrations. ORCID is a nonprofit organization that provides unique identifiers for researchers — mostly scientists so far — to make it easier to distinguish between them. It has issued more than 3 million IDs so far, and provides annual bulk downloads of all researchers’ public profiles. In many cases, the researchers have supplied their education and employment histories. That enabled Science magazine to analyze the migrations of more than 110,000 researchers who’ve listed multiple countries in these public CVs. (The data and code underlying the analysis are also available to download.)

Severe workplace injuries. Beginning in January 2015, the Occupational Safety and Health Administration began requiring U.S. employers to report “all severe work-related injuries, defined as an amputation, in-patient hospitalization, or loss of an eye.” You can download a spreadsheet of these injuries — some 20,000 in 2015 and 2016 combined. It contains the injury dates, descriptions, and outcomes, as well as the employers’ names and locations. Previously: OSHA’s more detailed (but slightly more cumbersome) inspection data and API (DIP 2016.07.13).

E. coli at Ocean Beach. The San Francisco Public Utilities Commission’s Beach Water Quality Monitoring Program measures bacteria levels at fifteen locations on the city’s shoreline. You can download the measurements by clicking the “raw data” link below this map. The data powers the (unsurprisingly) unofficial @BeachPooBot account on Twitter.

Things I loved this week: #LegalEpidemiology, @HepVu, and @CDCgov's Healthy Behavior Data Challenge

Friday, May 19, 2017

Part of why I love my new gig at Cadence Group is that, in my responsibility to be informed and up-to-date on all things public health, I am constantly nerding out on new and exciting topics in my favorite fields. I had the chance to watch a webinar on one of those emerging areas - legal epidemiology - earlier this week. It's basically exactly what it sounds like: "the scientific study of law as a factor in the cause, distribution, and prevention of disease in a population." Despite its potential as a complex and fruitful area of study, there isn't much literature out there on the topic (though CDC's Public Health Law Program appears to be the best place to start). Lucky for me (and anyone else who is curious), the National Environmental Health Association is hosting a three-part webinar series on the topic this summer. The first webinar was held last week, with the recording and slides posted. The second installment is on June 14th, and the third on August 16th.

Data visualizations are one of my favorite things, a perfect marriage between my love of data and my experience leading the Communications Committee for APHA's International Health Section. Naturally this meant I got super excited when AIDSVu launched, which happened just before I began working as an epidemiologist with the Texas HIV prevention program. Today I discovered that the initiative has launched a similar site, HepVu, which (as the name implies) makes hepatitis surveillance data available via interactive maps and data visualizations.

Finally, I stumbled across the Healthy Behavior Data Challenge, a call by CDC "for new ways to address the challenges and limitations of self-reported health surveillance information and tap into the potential of innovative data sources and alternative methodologies for public health surveillance":
The Healthy Behavior Data (HBD) Challenge will support the development and implementation of prototypes to use these novel methodologies and data sources (e.g., wearable devices, mobile applications, and/or social media) to enhance traditional healthy behaviors surveillance systems in the areas of nutrition, physical activity, sedentary behaviors, and/or sleep among the adult population aged 18 years and older in the US and US territories.

The collection of health data through traditional surveillance modes including telephone and in-person interviewing, however, is becoming increasingly challenging and costly with declines in participation and changes in personal communications. In addition, the self-reported nature of responses particularly in the areas of nutrition, physical activity, sedentary behaviors, and sleep has been a major limitation in these surveillance systems, since self-reported data are subject to under/over reporting and recall bias. Meanwhile, the advent of new technologies and data sources including wearable devices (such as: smart watches, activity trackers, sleep monitors, etc.), mobile health applications on smartphones or tablets, and data from social media represents an opportunity to enhance the ability to monitor health-related information and potentially adjust for methodological limitations in traditional self-reported data.

The Healthy Behavior Data (HBD) Challenge will be conducted concurrently with a similar challenge proposed by the Public Health Agency of Canada. This will enable the two countries to learn from their respective challenges and leverage information. We expect increased efficiency with a dual challenge.
It struck me as pretty reminiscent of the Data for Climate Action challenge by UN Global Pulse.

Happy Friday!

Public dataset highlights: The cost of food

Wednesday, May 17, 2017

This week's Data is Plural features a dataset on global food prices:
Global food prices. The UN World Food Programme’s vulnerability analysis group collects and publishes food price data for more than 1,000 towns and cities in more than 70 countries. The dataset, which goes back more than a decade, covers basic staples, such as wheat, rice, milk, oil, and more. It’s updated monthly and feeds into (among other things) the UNWFP’s price-spike indicators. Related: The Humanitarian Data Exchange, which hosts the dataset for the UN. Also: The Economist’s Big Mac Index. [h/t Andrew McCartney]

Spatial epidemiology on @NPR @MorningEdition: #Malaria and gold mining

Thursday, May 11, 2017

I've unexpectedly found myself in hog heaven since moving to the Maryland side of DC for a new position at the beginning of this month. I'm staying with a friend while I look for my own place and, while I have a much longer commute than I am used to, I am enjoying all 40 minutes of it because I am spending all of them listening to WAMU, the DC-area NPR station out of American University. I've always liked NPR, not only because they provide (I feel) balanced coverage of major news items, but also because they feature so many interesting stories that wouldn't normally get much press, including engaging pieces on public health and human rights.

Case in point: Yesterday's Morning Edition featured a story on how illegal gold mining has been linked to malaria in Colombia. The segment featured an interview with Sandra Rozo, an economist with USC's Marshall School of Business, whose recent work has focused on providing an evidence base for qualitative data suggesting a link between alluvial gold mining and higher incidence of malaria:
As illegal gold mining is mainly performed in open sky mines that are commonly located inside or close to water surfaces where large pits are dug, it is plausible to conceive that these pits are later filled with water, which would make them ideal breed sites for Anopheles mosquito larva. Because these mines do not follow any protocols or rules and are not registered with local authorities, it is likely that illegal miners have limited knowledge of the need for or methods of malaria prevention. They are likely to leave the pits open and do not take any measures to protect themselves against malaria. Finally, illegal gold miners are also a population that sustains high migration rates, which could also help to propagate the parasite incidence to other areas. Due to data limitations, however, at present, the existing evidence has been concentrated on qualitative studies or on documenting correlations between malaria incidence and gold exploitation.
As she points out in her paper, however, there are a number of other factors that could contribute to the correlation without the relationship being causal per se - hence the value of supporting quantitative epidemiological analysis. Rozo is not the first to explore this from an epidemiological perspective, either. This paper by Castellanos et al. finds a strong correlation between gold production rates and malaria cases using malaria surveillance data and government data on legal mining activities. One major limitation, though, was that they could only examine correlations with legal mining activities - that is, mining operations that are registered with, and regulated by, the government. The paper also notes that between the two types of mining (traditional alluvial vs. the more modern "open sky"), the "open sky" technique is less regulated and more likely to be performed illegally.

Rozo's analysis is interesting for a couple of reasons. She combines satellite data identifying mining operations with geographic data on geochemical anomalies of gold and matches that to malaria surveillance data to determine the relationship between gold mining activity and malaria incidence. She also controls for several potential confounders, including poverty levels, presence of government and health institutions, climatic factors, and chronic disease.
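
For anyone curious what "controlling for a confounder" looks like in practice, here's a minimal sketch in Python. To be clear, this is not Rozo's model or her data: the numbers are simulated, the variable names (`poverty`, `mining`, `malaria`) and the effect sizes are made up for illustration, and I'm assuming a single confounder and plain least squares. It just shows how a naive association can overstate an effect, and how adjusting for the confounder can pull the estimate back toward the truth.

```python
import random


def slope_intercept(x, y):
    """Ordinary least-squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = slope_intercept(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def adjusted_slope(exposure, outcome, confounder):
    """Effect of exposure on outcome, holding the confounder fixed.

    Residualizes both exposure and outcome on the confounder, then
    regresses residual on residual. This gives the same coefficient as
    a regression of outcome on exposure and confounder together.
    """
    exposure_res = residuals(confounder, exposure)
    outcome_res = residuals(confounder, outcome)
    return slope_intercept(exposure_res, outcome_res)[0]


# Simulated (hypothetical) data: poverty drives both mining and malaria.
rng = random.Random(0)
TRUE_EFFECT = 2.0  # assumed effect of mining on malaria in this simulation
poverty = [rng.gauss(0, 1) for _ in range(2000)]
mining = [0.8 * p + rng.gauss(0, 1) for p in poverty]
malaria = [TRUE_EFFECT * m + 3.0 * p + rng.gauss(0, 1)
           for m, p in zip(mining, poverty)]

naive = slope_intercept(mining, malaria)[0]
adjusted = adjusted_slope(mining, malaria, poverty)
print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this toy setup the naive estimate absorbs part of poverty's effect on malaria, while the adjusted one should land close to the simulated effect of 2.0. A real analysis would of course involve many more controls and a lot more care.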

This is the type of analytical application that makes spatial epidemiology so exciting and demonstrates how epidemiology can be used to build or strengthen the case for policy change to benefit public health. It also spotlights why political and economic forces that we don't typically think of as explicitly health related are still very much relevant to public health researchers and policy makers. Less than a month before Rozo's paper was posted, the New York Times ran a story on how malaria has come back with a vengeance in Venezuela since the economic crisis. As many professionals had turned to gold mining to survive, they were repeatedly getting sick with malaria - and taking it back to the cities with them.