Challenge Restricted Access

WiDS (Women in Data Science) Datathon 2020: ICU Mortality Prediction

Meredith Lee, Jesse Raffa, Marzyeh Ghassemi, Tom Pollard, Sharada Kalanidhi, Omar Badawi, Karen Matthys, Leo Anthony Celi

Published: Jan. 22, 2020. Version: 1.0.0


When using this resource, please cite:
Lee, M., Raffa, J., Ghassemi, M., Pollard, T., Kalanidhi, S., Badawi, O., Matthys, K., & Celi, L. A. (2020). WiDS (Women in Data Science) Datathon 2020: ICU Mortality Prediction (version 1.0.0). PhysioNet. https://doi.org/10.13026/vc0e-th79.

Please include the standard citation for PhysioNet:
Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220.

Abstract

WiDS (Women in Data Science) Datathon 2020: ICU Mortality Prediction focuses on patient health through data from MIT’s GOSSIS (Global Open Source Severity of Illness Score) initiative. Brought to you by the Global WiDS team, the West Big Data Innovation Hub, and the WiDS Datathon Committee, this year’s datathon is launching on Kaggle: bit.ly/WiDSdatathon2020kaggle


Objective

The challenge is to create a model that uses data from the first 24 hours of intensive care to predict patient survival. MIT's GOSSIS community initiative, with privacy certification from the Harvard Privacy Lab, has provided a dataset of more than 130,000 hospital Intensive Care Unit (ICU) visits from patients, spanning a one-year timeframe. This data is part of a growing global effort and consortium spanning Argentina, Australia, New Zealand, Sri Lanka, Brazil, and more than 200 hospitals in the United States.

Training data are provided for model development; you will then upload your predictions for unlabeled test data to Kaggle and these predictions will be used to determine the leaderboard rankings.

Tutorials, sample code, other resources, and updates will be posted throughout the competition at widsconference.org/datathon and on the Kaggle Discussion Forum. Winners will be announced at the WiDS Conference at Stanford University and via livestream, reaching a community of 100,000+ data enthusiasts across more than 70 countries.


Participation

We invite anyone, from those new to data science to veterans of the field, to participate. For those who have never tried machine learning or worked with health data before, the WiDS Datathon Committee will be releasing a series of tutorials and webinars to help participants get started.

The WiDS Datathon aims to inspire women worldwide to learn more about data science, and to create a supportive environment for women to connect with others in their community who share their interests. Toward these ends, we open the datathon to individuals or teams of up to 4; at least half of each team must be women (individuals identifying as female participants). Participants can include students, faculty, and individuals with various roles in non-profit, academic, government, and industry organizations.

This year's datathon will begin with a Kaggle leaderboard. To help encourage deeper exploration and the development of collaborative data science innovations, WiDS and the National Science Foundation Big Data Innovation Hubs are collaborating to host the first-ever WiDS Datathon Extension Excellence in Research Award. The overall timeline will include the following milestones:

January 13, 2020: Official start of the Datathon. Registration is required at bit.ly/WiDSdatathon2020.

February 21, 2020: Team merger deadline. This is the last day competitors may join (or merge) teams.

February 24, 2020: Entry deadline. Participants must accept the competition rules on Kaggle by this date, and choose up to 2 final submissions to be included in the Kaggle leaderboard. 

March 2, 2020: Leaderboard winners will be announced at the WiDS Conference.

March 31, 2020: Deadline for participants to submit their one-page research paper to be eligible for the first-ever WiDS Datathon 2020 Excellence in Research Award.

May 2020: WiDS and the National Science Foundation Big Data Innovation Hubs will announce the Datathon Extension Excellence in Research Award.


Data Description

MIT's GOSSIS community initiative, with privacy certification from the Harvard Privacy Lab, has provided a dataset of more than 130,000 hospital Intensive Care Unit (ICU) visits from patients, spanning a one-year timeframe. This data is part of a growing global effort and consortium spanning Argentina, Australia, New Zealand, Sri Lanka, Brazil, and more than 200 hospitals in the United States.

The data includes:

  • Training data for 91,713 encounters. 
  • Unlabeled test data for 39,308 encounters, which includes all the information in the training data except for the values for hospital_death. 
  • WiDS Datathon 2020 Dictionary with supplemental information about the data, including the category (e.g., identifier, demographic, vitals), unit of measure, data type (e.g., numeric, binary), description, and examples. 
  • Sample submission files.

To learn more about the data, and to accept the conditions for access, please register at bit.ly/WiDSdatathon2020 and visit bit.ly/WiDSdatathon2020kaggle.
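As a first look at the training data, a sketch like the following can tally the hospital_death labels and count missing values per column. It is an illustration under stated assumptions, not part of the official materials: the file name training_v2.csv, the columns "age" and "bmi", and the convention that missing values appear as empty strings or "NA" are all hypothetical. Only encounter_id and hospital_death are names taken from this description.

```python
import csv
import io
from collections import Counter


def summarize(rows, target="hospital_death", missing=("", "NA")):
    """Return (label counts, per-column missing counts) for an iterable of dict rows."""
    labels = Counter()
    n_missing = Counter()
    for row in rows:
        labels[row[target]] += 1
        for column, value in row.items():
            # DictReader yields None for fields absent from a short row.
            if value is None or (isinstance(value, str) and value.strip() in missing):
                n_missing[column] += 1
    return labels, n_missing


# Tiny in-memory stand-in for the real file; "age" and "bmi" are made-up columns.
sample = io.StringIO(
    "encounter_id,hospital_death,age,bmi\n"
    "1,0,68,22.7\n"
    "2,1,77,\n"
    "3,0,NA,31.9\n"
)
labels, n_missing = summarize(csv.DictReader(sample))
death_rate = labels["1"] / sum(labels.values())

# For the real data (the file name is an assumption):
# with open("training_v2.csv", newline="") as f:
#     labels, n_missing = summarize(csv.DictReader(f))
```

Checking the label balance early is useful because it affects how a model's scores should be validated before anything is uploaded.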


Evaluation

You are asked to explore the columns of data (for example, patient laboratory results, demographics, and vital signs) and create a model that predicts, for each encounter_id in the test set, the probability that the patient dies in hospital (the hospital_death target); patient survival is the complementary outcome.

A hospital_death value of 1 corresponds to patient death and a value of 0 corresponds to survival.

Your submission file on the Kaggle platform should contain a header and have the following format:

encounter_id,hospital_death
1,0.814
2,0.01
3,0.5

etc.
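A file in the format above could be produced with a short helper such as the one below. This is a minimal sketch: the function name write_submission and the range check on each probability are assumptions added for illustration, and the three rows simply repeat the example values shown above.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (encounter_id, probability) pairs as a header line plus one row each."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["encounter_id", "hospital_death"])
    for encounter_id, prob in predictions:
        # Sanity check (an assumption of this sketch): values are probabilities.
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {encounter_id}: {prob}")
        writer.writerow([encounter_id, prob])


# Write to an in-memory buffer here; pass an open file object to create a real file.
buffer = io.StringIO()
write_submission([(1, 0.814), (2, 0.01), (3, 0.5)], buffer)
submission_text = buffer.getvalue()
```

To write to disk, open the target with open(path, "w", newline="") and pass the file object as out.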

Submissions will be evaluated on the area under the Receiver Operating Characteristic (ROC) curve between the predicted mortality and the observed target (hospital_death), resulting in a public and a private leaderboard on Kaggle. By default, a user's best-scoring public submissions will be used for the final Kaggle leaderboard, but participants can select up to two eligible submissions to be evaluated on the final private leaderboard. Tutorials and additional resources are included at bit.ly/WiDSdatathon2020kaggle.
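For local validation on held-out training rows, the area under the ROC curve can be computed directly from ranks. The sketch below uses the standard rank-sum (Mann-Whitney) formulation with tied scores given their average rank. It is not Kaggle's scoring code, and the function name roc_auc is chosen here for illustration.

```python
def roc_auc(y_true, y_score):
    """Probability that a random death outscores a random survivor (ties count 1/2)."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the block of tied scores occupying 1-based ranks i+1 .. j.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    # Subtract the smallest possible rank sum for the positives, then normalise.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Hypothetical labels (1 = death, 0 = survival) and predicted probabilities.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Because only the ordering of the predictions enters the calculation, any strictly increasing transformation of the scores leaves the result unchanged.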

In addition to the standard leaderboard evaluation, to help encourage deeper exploration and the development of collaborative data science innovations, WiDS and the National Science Foundation Big Data Innovation Hubs are hosting the first-ever WiDS Datathon Extension Excellence in Research Award. Submissions in the form of one-page research papers will be reviewed for their potential for real-world impact by subject matter experts. To be eligible for the award, entrants must participate in the first phase of the WiDS Datathon 2020 on Kaggle. Individuals and teams should submit their papers with the same names and contact emails used for WiDS Datathon 2020 registration.


Acknowledgements

The WiDS Datathon 2020 is a collaboration led by the Global WiDS team at Stanford, the West Big Data Innovation Hub, and the WiDS Datathon Committee. Special thanks to the MIT GOSSIS Initiative, the University of Toronto, and the Harvard Data Privacy Lab, as well as our growing community of sponsors and supporters.


Conflicts of Interest

The authors have no conflicts of interest to declare.


Access

Access Policy:
Only registered users who sign the specified data use agreement can access the files.

License (for files):
PhysioNet Restricted Health Data License 1.5.0

Data Use Agreement:
PhysioNet Restricted Health Data Use Agreement 1.5.0
