I am now caught up on #opendata highlights - with infectious diseases and genetics!

Thursday, November 30, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by highlighting public health-related data sets going back through Data is Plural, one edition at a time.

This is the last one - I'm finally caught up!

The June 28 edition of Data is Plural features two health-related data sets - one on infectious diseases in Europe and another that contains people's self-published results of commercial genetics tests:
Infectious diseases in Europe. The European Centre for Disease Prevention and Control’s Surveillance Atlas of Infectious Diseases lets you browse, map, and download data on the historical incidence of several dozen diseases — from anthrax to Zika — in each of the European Economic Area’s countries. Related: Keila Guimarães’s recent investigation into penicillin shortages, which uses the Centre’s data on syphilis cases.

People’s genes. OpenSNP is a website that lets people publish the results of their genetic tests (such as those sold by 23andMe, deCODEme, FamilyTreeDNA), “find others with similar genetic variations, [get] the latest primary literature on their variations, and help scientists find new associations.” Since 2012, users have uploaded more than 3,000 sets of genetic variants, which you can download individually or in bulk or access via OpenSNP’s API. Users can also list various personal traits, such as eye color, height, coffee consumption, and lactose intolerance. Useful primer: SNP stands for “single nucleotide polymorphism,” the NIH explains. They’re “the most common type of genetic variation”; each one “represents a difference in a single DNA building block, called a nucleotide.”

More #opendata highlights: Brain scans

Wednesday, November 29, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by highlighting public health-related data sets going back through Data is Plural, one edition at a time.

The August 16 edition of Data is Plural contains a data set with two collections of MRI brain scans:
Brain scans. The Open Access Series of Imaging Studies (OASIS) project is “aimed at making MRI data sets of the brain freely available to the scientific community,” with the goal of “[facilitating] future discoveries in basic and clinical neuroscience.” So far, the project has published two collections: a cross-sectional dataset of scans from 416 people, ages 18 to 96; and a longitudinal dataset, based on 150 people aged 60 to 96, each of whom were scanned at least two different times. [h/t Andrew Beam]

Last week in @CDCMMWR: #Smoking policies in airports

Monday, November 27, 2017

Last week's edition of MMWR featured an article on the smoking policies in the world's 50 busiest airports. Not surprisingly (to those of us who have traveled in Asia, at least), very few of the airports on the list in Asia have smoke-free policies. However, I was surprised to discover that three large US airports (Las Vegas, Atlanta, and Denver) still have indoor smoking rooms. The only other North American airport on the list that still allows indoor smoking is Mexico City.
There is no risk-free level of exposure to secondhand smoke. Eliminating smoking in indoor spaces fully protects nonsmokers from exposure to secondhand smoke. An overwhelming majority of large-hub airports in the United States prohibit smoking indoors.
Among the 50 busiest airports worldwide, 23 airports (46%), including five of the 10 busiest airports, prohibit smoking in all indoor areas. While smoke-free airports among the 50 busiest are common in North America (14 of 18), few airports in Asia (4 of 22) have implemented smoke-free policies.
This issue also features an article on the status of polio eradication in Pakistan.
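For a sense of scale, the regional counts quoted above can be turned into shares. This is just a quick sketch in Python that uses only the numbers from the MMWR summary; the variable and function names are my own.

```python
# Counts quoted in the MMWR summary above: (smoke-free airports, airports on the list).
smoke_free_counts = {
    "Worldwide (50 busiest)": (23, 50),
    "North America": (14, 18),
    "Asia": (4, 22),
}


def share_percent(smoke_free: int, total: int) -> float:
    """Return the smoke-free share as a percentage, rounded to one decimal place."""
    return round(100 * smoke_free / total, 1)


# Map each region to its smoke-free share.
shares = {region: share_percent(*counts) for region, counts in smoke_free_counts.items()}

for region, pct in shares.items():
    print(f"{region}: {pct}% smoke-free")
```

By this arithmetic, roughly three in four of the listed North American airports are smoke-free, compared with fewer than one in five of the listed airports in Asia.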

More #opendata highlights: #malaria mosquitoes

Wednesday, November 22, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by highlighting public health-related data sets going back through Data is Plural, one edition at a time.

The August 23 edition of Data is Plural (there were no data sets directly related to health in any of the September editions) features geospatial data on Anopheles mosquitoes, which is the type that carries malaria:
A century of malarial mosquitoes. A team of researchers has compiled “the largest ever geo-coded database of anophelines in Africa.” [...] The database covers 1898 to 2016 and includes more than 13,400 observations of mosquitoes in specific locations. For each observation, the dataset lists the country, administrative region(s), and latitude/longitude, as well as the time period, the species identified, the sampling method, and the source of the information. [h/t Michael Chew]

All kinds of good stuff in @LancetGH's December issue: injection drug use, maternal mortality data, and antimicrobial resistance

Tuesday, November 21, 2017

The December issue of Lancet Global Health features articles and commentary on several hot-button issues in global health. I was quite pleased to see two systematic reviews related to injection drug use: one on the prevalence of IDU worldwide and the rates of HIV, HBV, and HCV among IDU, and another on interventions to address HIV and HCV risk among PWID (including syringe exchange programs). The accompanying commentary is a great read:
However, the coverage of NSP [needle and syringe programs], OST [opioid substitution therapy], and HIV services for PWID...is very limited. Of the countries and territories with evidence of IDU, only 52% reported the presence of NSP and 48% reported the presence of OST. The situation is even worse for the uptake of comprehensive harm reduction programmes: the authors estimate that, globally, less than 1% of PWID live in countries with a high coverage of both NSP and OST. It is important to note that many countries in the most affected regions criminalise drug use (with some still having death penalties for drug offences), do not allow access to harm reduction services, or both.
Although the two systematic reviews show that some progress has been made in the estimation of IDU and infection prevalence, they also brutally underscore the absence of significant improvement in the scaling-up of increasingly well documented, evidence-based interventions to prevent new infections among PWID in countries and regions with expanding epidemics.
The headlining editorial looks at the effort to combat antimicrobial resistance, and the issue also features a piece on missing the forest for the trees when trying to classify maternal mortality data. And if you're into vision loss, cerebral palsy, or malnutrition, there's something for you, too.

Last week in @CDCMMWR: Global routine #vaccination coverage, rubella, and #opioid reports

Monday, November 20, 2017

Last week's edition of MMWR featured two global health-focused articles. The one that caught my eye was an update on coverage of routine vaccinations for children around the world. While progress has been substantial since the WHO launched the Expanded Program on Immunization in 1974, it appears to have stalled in the last few years:
Since then, global coverage with vaccines to prevent tuberculosis, diphtheria, tetanus, pertussis, poliomyelitis, and measles has increased from less than 5% to 85% or greater and additional vaccines against hepatitis B, Haemophilus influenzae type B, Streptococcus pneumoniae, rotavirus, and rubella have been included in vaccine recommendations introduced in multiple countries.
Global coverage with the third dose of diphtheria and tetanus toxoids and pertussis–containing vaccine, the third dose of polio vaccine, and first dose of measles-containing vaccine coverage has remained unchanged at 84%–86% since 2010. Among new or underused vaccines, global coverage increased during 2010–2016 for completed vaccine series against rotavirus (8% to 25%), Streptococcus pneumoniae (11% to 42%), rubella (35% to 47%), Haemophilus influenzae type B (42% to 70%) and hepatitis B vaccine (74% to 84%).
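To compare those gains side by side, here is a small Python sketch that computes the percentage-point change for each of the newer vaccines. It uses only the 2010 and 2016 figures quoted above; the dictionary layout and names are my own.

```python
# Global coverage (percent) quoted above: (2010, 2016).
coverage_2010_2016 = {
    "rotavirus": (8, 25),
    "Streptococcus pneumoniae": (11, 42),
    "rubella": (35, 47),
    "Haemophilus influenzae type B": (42, 70),
    "hepatitis B": (74, 84),
}

# Percentage-point gain = 2016 coverage minus 2010 coverage.
gains = {vaccine: end - start for vaccine, (start, end) in coverage_2010_2016.items()}

# Print the vaccines from largest to smallest gain.
for vaccine, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{vaccine}: +{gain} percentage points")
```

The largest jump in these figures is for Streptococcus pneumoniae, while hepatitis B, which started from the highest baseline, gained the least.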
There is also an article on the progress of rubella elimination worldwide. As of December 2016, "Elimination of rubella and congenital rubella syndrome was verified in the WHO Region of the Americas in 2015, and 33 (62%) of 53 countries in the European Region have now eliminated endemic rubella and congenital rubella syndrome."

Bonus: The journal has compiled a list of all articles and reports on opioids published since 2000.

More #opendata highlights: X-rays and air quality

Monday, November 13, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by highlighting public health-related data sets going back through Data is Plural, one edition at a time.

The October 4 edition of Data is Plural featured air quality data and chest X-rays from the EPA and the NIH, respectively:
Four decades of U.S. air quality. The Environmental Protection Agency collects air quality samples from thousands of monitoring stations across the country. The resulting datasets, which go back to the 1980s, are available as daily files, annual files, and via an API. The monitored pollutants include ozone, carbon monoxide, sulfur dioxide, nitrogen dioxide, particulate matter, volatile organic compounds, and more. You can also download daily Air Quality Index ratings and information about each monitoring station. Previously: Global air pollution datasets from Berkeley Earth (DIP 2017.03.22) and from the World Health Organization (DIP 2016.06.15). [h/t Swier Heeres]

Chest x-rays. Last week, the National Institutes of Health released a dataset containing more than 100,000 anonymized chest x-rays, from 30,000 patients, “including many with advanced lung disease.” For each image, the associated metadata includes the patient’s age, gender, and diagnosis labels. (The dataset’s authors used natural language processing to extract those labels from radiological reports; they estimate that fewer than 10% of the labels are incorrect.) Related: Andrew L. Beam’s list of medical datasets for machine learning. [h/t Chris Hamby]

This week in @CDCMMWR: Assessing Kenya's and Ghana's immunization information systems

Saturday, November 11, 2017

I'm trying to get back into blogging regularly by doing some regular, manageable features. Since I read CDC's MMWR every week and it often contains articles relevant to global health and/or data quality, I am going to try to feature articles of interest here.

This week's MMWR has an article on a recently revamped data quality assessment tool that is intended to measure immunization information systems in low- and middle-income countries. The WHO partnered with the CDC to develop updated assessment guidelines in 2014, as the original guidelines developed in 2001 were missing the mark. The article presents the results of using the updated assessment tool in Kenya in 2015 and in Ghana in 2016:
The availability, quality, and use of immunization data are widely considered to form the foundation of successful national immunization programs. Lower- and middle-income countries have used systematic methods for the assessment of administrative immunization data quality since 2001, when the World Health Organization (WHO) developed the Data Quality Audit methodology. WHO adapted this methodology for use by national programs as a self-assessment tool, the Data Quality Self-Assessment. This methodology was further refined by WHO and CDC in 2014 as an immunization information system assessment (IISA).
The experience gained from implementing assessments using updated IISA guidance in Kenya and Ghana provides an opportunity to inform other countries interested in best practices for assessing their data quality and creating actionable data quality improvement plans. Data quality improvement is important to provide the most accurate and actionable evidence base for future decision-making and investments in immunization programs. This review provides best practice experiences and recommendations for countries to use an IISA to assess data quality from national administrative structure down to the facility level. This methodology also meets the requirements for use by Gavi, the Vaccine Alliance, for monitoring national immunization data quality at a minimum interval of every 5 years in conjunction with funding decisions.
The issue also has articles on tobacco use and waterborne disease outbreaks in the U.S. - including in drinking water (which is scary, since most of us in the States take safe drinking water for granted).

More public data set highlights: Wildfires, vehicle safety, and water quality

Thursday, November 9, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by highlighting public health-related data sets going back through Data is Plural, one edition at a time.

The October 11 edition of Data is Plural featured data sets on wildfires, vehicle safety, and the water quality of the San Francisco Bay:
Wildfires. Monitoring Trends in Burn Severity (MTBS) is “an interagency program whose goal is to consistently map the burn severity and extent of large fires across all lands of the United States”; the most recent release contains more than 20,000 fires from 1984 to 2015. You can explore the data online, or download it in bulk. For more recent data, see GeoMAC, which aims to map all current wildfires; NOAA’s Hazard Mapping System, which uses satellites to detect fire locations and smoke plumes; and NASA’s MODIS and VIIRS datasets, which provide satellite-based detections for the entire globe. Previously: National Fire Incident Reporting System, which also includes structure fires and vehicle fires (DIP 2016.07.20). [h/t Max Joseph]

Commercial vehicle safety. The Federal Motor Carrier Safety Administration helps to regulate the United States’ large trucks and passenger buses. The datasets available through its Safety Measurement System include a census of all regulated carriers, the results of safety inspections, and reported crashes. The crash files list the number of injuries and fatalities; the weather, light, and road conditions; the involved vehicle’s VIN and license plate number; and more. [h/t Dan Brady]

San Francisco Bay water. The U.S. Geological Survey has been measuring water quality in the San Francisco Bay for nearly 50 years. The agency recently published 210,826 of these measurements, collected from dozens of monitoring stations between April 1969 and December 2015. (It’s “one of the longest records of water-quality measurements in a North American estuary,” according to a recent academic article describing the data.) Each row specifies the measurement’s date, station, depth, temperature, and salinity; many rows include levels of chlorophyll, oxygen, nitrate, ammonium, and other matter.

More public data set highlights: Puerto Rico's disaster recovery

Saturday, November 4, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by highlighting public health-related data sets going back through Data is Plural, one edition at a time.

The October 18 edition of Data is Plural featured a data set with different metrics related to Puerto Rico's disaster recovery efforts:
Puerto Rico’s recovery. Since shortly after Hurricane Maria hit Puerto Rico, the territory’s government has been publishing a dashboard of recovery statistics. The website tracks a couple dozen metrics, including the percent of homes with electricity, number of people in shelters, and the number of open hospitals. For several of the main metrics, researcher Michael A. Johansson has been scraping daily figures from the dashboard and publishing them as a CSV file. Related: The Washington Post has been charting the recovery, and published a deep dive into the island’s ongoing power outages.

Public data set highlights: Deepwater Horizon and cardiovascular epidemiology

Friday, November 3, 2017

This week's Data is Plural newsletter features two health-related datasets: one with NOAA data on the effects of the Deepwater Horizon explosion and one on cardiovascular mortality from IHME at the University of Washington. Hooray epidemiology!
Deepwater Horizon’s effects. For years, the National Oceanic & Atmospheric Administration has been working to assess the damage done to natural resources by the April 2010 Deepwater Horizon explosion and oil spill. As part of that effort, they’ve collected and compiled several dozen related datasets, including toxicity studies, plankton samples, necropsies of stranded turtles, dolphin health assessments, and a “backyard boater” survey. [h/t Sebastian Kraus]

County-level cardiovascular deaths. Researchers at the University of Washington’s Institute for Health Metrics and Evaluation have estimated cardiovascular mortality rates for each U.S. county, for every year between 1980 and 2014. The findings, based on 32 million de-identified death records, population data from the Census, and other sources, are also broken down by particular disease (e.g., aortic aneurysm, ischemic stroke, etc.) and gender. Related: The researchers’ JAMA article describing their methodology and findings. Previously: The Global Burden of Disease dataset, published by the same institute (DIP 2016.07.27). [h/t Michael A. Rice, a teacher at Ingraham High School in Seattle]
Bonus: The newsletter also has a public data set on all the sexual assault allegations for recent high-profile cases, including Cosby, Weinstein, and Trump.