More #opendata highlights: X-rays and air quality

Monday, November 13, 2017

Because I've had a gap in blogging over the last few months, I thought I would ease back into things by working back through past editions of Data is Plural, one at a time, and highlighting the public health-related datasets in each.

The October 4 edition of Data is Plural featured air quality data and chest X-rays from the EPA and the NIH, respectively:
Four decades of U.S. air quality. The Environmental Protection Agency collects air quality samples from thousands of monitoring stations across the country. The resulting datasets, which go back to the 1980s, are available as daily files, annual files, and via an API. The monitored pollutants include ozone, carbon monoxide, sulfur dioxide, nitrogen dioxide, particulate matter, volatile organic compounds, and more. You can also download daily Air Quality Index ratings and information about each monitoring station. Previously: Global air pollution datasets from Berkeley Earth (DIP 2017.03.22) and from the World Health Organization (DIP 2016.06.15). [h/t Swier Heeres]
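The daily Air Quality Index ratings are derived from pollutant concentrations. EPA's published method interpolates linearly within a breakpoint table: I = (I_hi − I_lo) / (C_hi − C_lo) × (C − C_lo) + I_lo, where C is the measured concentration and [C_lo, C_hi] is the breakpoint range that contains it. The sketch below applies this to 24-hour PM2.5. The breakpoint values are my own transcription of the table in effect in 2017, and the function name `pm25_aqi` is hypothetical. EPA revised the PM2.5 breakpoints in 2024, so check the current technical documentation before reusing these numbers.

```python
from decimal import Decimal, ROUND_DOWN

# Assumption: 24-hour PM2.5 breakpoints (µg/m³) as published by EPA around 2017.
# Each tuple is (C_lo, C_hi, I_lo, I_hi). These were revised in 2024.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_aqi(concentration: float) -> int:
    """Hypothetical helper: map a 24-hour PM2.5 concentration to an AQI value."""
    # Truncate to one decimal place (EPA's convention for PM2.5) without float error.
    c = float(Decimal(str(concentration)).quantize(Decimal("0.1"), rounding=ROUND_DOWN))
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            # Linear interpolation within the matching breakpoint range.
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError(f"PM2.5 concentration {concentration} is outside the AQI table")


print(pm25_aqi(35.9))
```

Truncating with `Decimal` keeps a value like 35.95 from being nudged across a breakpoint boundary by binary floating-point error.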

Chest x-rays. Last week, the National Institutes of Health released a dataset containing more than 100,000 anonymized chest x-rays, from 30,000 patients, “including many with advanced lung disease.” For each image, the associated metadata includes the patient’s age, gender, and diagnosis labels. (The dataset’s authors used natural language processing to extract those labels from radiological reports; they estimate that fewer than 10% of the labels are incorrect.) Related: Andrew L. Beam’s list of medical datasets for machine learning. [h/t Chris Hamby]
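Because a single image can carry several diagnosis labels, a natural first step with this kind of metadata is tallying how often each label appears. The sketch below assumes the labels sit in one CSV column joined by "|" characters. The column names and the three rows are made up for illustration and are not taken from the NIH release, so adjust them to the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical metadata rows; column names and the "|" separator are assumptions.
SAMPLE_CSV = """Image Index,Finding Labels,Patient Age,Patient Gender
img_001.png,Effusion|Infiltration,58,M
img_002.png,No Finding,34,F
img_003.png,Infiltration,71,F
"""


def count_labels(csv_text: str, sep: str = "|") -> Counter:
    """Count how often each diagnosis label appears across all images (multi-label)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # One image may list several labels; count each of them once.
        counts.update(label.strip() for label in row["Finding Labels"].split(sep))
    return counts


print(count_labels(SAMPLE_CSV).most_common())
```

Counting per label rather than per row matters here: the label totals can exceed the number of images, which is worth keeping in mind when reading prevalence figures from a multi-label dataset.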
