{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python demo for the 2018 BHI & BSN Data Challenge\n", "\n", "This notebook provides a simple introduction to analysing the MIMIC-III database. It was created as a demonstrator for the [2018 BHI & BSN Data Challenge](https://mimic.physionet.org/events/bhibsn-challenge/), which explores the following question:\n", "\n", "> Are patients admitted to the intensive care unit (ICU) on a weekend more likely to die in the hospital than those admitted on a weekday?\n", "\n", "We have provided an example slide template for final presentations (`slide-template.pptx`) at: https://github.com/MIT-LCP/bhi-bsn-challenge. There is no obligation to use it!\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Background on MIMIC-III\n", "\n", "MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. \n", "\n", "Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. \n", "\n", "For details, see: https://mimic.physionet.org/. The data is downloaded as 26 CSV files, which can then be loaded into a database system. Scripts for loading the data into Postgres are provided in the [MIMIC Code Repository](https://mimic.physionet.org/gettingstarted/dbsetup/). A demo dataset is also available: https://mimic.physionet.org/gettingstarted/demo/\n", "\n", "Points to note:\n", "\n", "- A patient-level shift has been applied to dates. Day of week is retained. \n", "- Patients aged >89 years on first admission have been reassigned an age of ~300 years.\n", "- Patients may have multiple hospital admissions. Each hospital admission may comprise multiple ICU stays (e.g. 
a patient may visit the ICU, leave for surgery, and then return to the ICU for recovery, all within a single hospital admission).\n", "\n", "If you need help getting set up with access to MIMIC-III, please contact `data-challenge@physionet.org`.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Import libraries" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python2.7/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.\n", " from pandas.core import datetools\n" ] } ], "source": [ "# Data processing libraries\n", "import pandas as pd\n", "import numpy as np\n", "import itertools\n", "\n", "# Database libraries\n", "import psycopg2\n", "\n", "# Stats libraries\n", "from tableone import TableOne\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf\n", "import scipy.stats\n", "\n", "# Image libraries\n", "# https://jakevdp.github.io/pdvega/\n", "# jupyter nbextension enable vega3 --py --sys-prefix\n", "import matplotlib.pyplot as plt\n", "import pdvega \n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Connect to the MIMIC-III database\n", "\n", "If you have created an instance of the MIMIC-III database, then you should be able to connect with the following settings (or similar). If you need help getting set up with access to MIMIC-III, please contact `data-challenge@physionet.org`." 
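] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The settings cell below hard-codes `user` and `password`. As a minimal sketch (not part of the original workflow), the credentials can instead be read from environment variables. If no password variable is set, the sketch falls back to an interactive prompt from Python's standard `getpass` module, so no password is saved in the notebook. The variable names `MIMIC_USER` and `MIMIC_PASSWORD` and the helper `connection_params` are hypothetical; adapt them to your own setup. If `MIMIC_USER` is unset, the user falls back to your operating-system user name, which is also the default that Postgres client libraries use." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import os\n",
"import getpass\n",
"\n",
"def connection_params(dbname='mimic', host='localhost'):\n",
"    # Hypothetical helper: collect keyword arguments for psycopg2.connect()\n",
"    # MIMIC_USER / MIMIC_PASSWORD are assumed environment variable names\n",
"    user = os.environ.get('MIMIC_USER') or getpass.getuser()\n",
"    password = os.environ.get('MIMIC_PASSWORD')\n",
"    if password is None:\n",
"        # Prompt instead of storing the password in the notebook\n",
"        password = getpass.getpass('Password:')\n",
"    return dict(dbname=dbname, user=user, host=host, password=password)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Usage, assuming the database is reachable: `con = psycopg2.connect(**connection_params())`. The `search_path` still needs to be set on the cursor, as in the connection cell below."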
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Create a database connection\n", "user = 'XXX'\n", "password = 'XXX'\n", "host = 'localhost'\n", "dbname = 'mimic'\n", "schema = 'public, mimiciii_demo'" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Password:········\n" ] } ], "source": [ "# Connect to the database\n", "con = psycopg2.connect(dbname=dbname, user=user, host=host, \n", " password=password)\n", "cur = con.cursor()\n", "cur.execute('SET search_path to {}'.format(schema))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Extract data from MIMIC-III and assign to a Pandas DataFrame\n", "\n", "The following query extracts a simple dataset from the MIMIC-III database, comprising demographics, hospital and ICU admission times, and a severity of illness score ([OASIS](https://www.ncbi.nlm.nih.gov/pubmed/23660729)).\n", "\n", "Before running this query, you must first build the `icustay_detail` and `oasis` materialized views. 
Code for building these views is available in the MIMIC Code Repository:\n", "\n", "- `icustay_detail`: https://github.com/MIT-LCP/mimic-code/tree/master/concepts/demographics\n", "- `oasis`: https://github.com/MIT-LCP/mimic-code/tree/master/concepts/severityscores\n", "\n", "Note that our example restricts the analysis to:\n", "\n", "- first hospital admissions\n", "- patients aged 16 years or older (`age >= 16`) at the time of hospital admission\n", "- the first ICU stay of each hospital admission (patients may move to the ICU multiple times within a hospital stay)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Run query and assign the results to a Pandas DataFrame\n", "# Requires the icustay_detail view from:\n", "# https://github.com/MIT-LCP/mimic-code/tree/master/concepts/demographics\n", "# And the OASIS score from:\n", "# https://github.com/MIT-LCP/mimic-code/tree/master/concepts/severityscores\n", "query = \\\n", "\"\"\"\n", "WITH first_icu AS (\n", " SELECT i.subject_id, i.hadm_id, i.icustay_id, i.gender, i.admittime admittime_hospital, \n", " i.dischtime dischtime_hospital, i.los_hospital, i.age, i.admission_type, \n", " i.hospital_expire_flag, i.intime intime_icu, i.outtime outtime_icu, i.los_icu, \n", " s.first_careunit\n", " FROM icustay_detail i\n", " LEFT JOIN icustays s\n", " ON i.icustay_id = s.icustay_id\n", " WHERE i.hospstay_seq = 1\n", " AND i.icustay_seq = 1\n", " AND i.age >= 16\n", ")\n", "SELECT f.*, o.icustay_expire_flag, o.oasis, o.oasis_prob\n", "FROM first_icu f\n", "LEFT JOIN oasis o\n", "ON f.icustay_id = o.icustay_id;\n", "\"\"\"\n", "\n", "data = pd.read_sql_query(query, con)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Check the extracted data\n", "\n", "It is always a good idea to inspect the data after you have extracted it. We will list the columns, look at the first five patients (rows), and get summary statistics for the dataset, including the number of rows."
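] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Beyond eyeballing the output, a few assertions can confirm that the extracted cohort matches the restrictions in the query. The helper `check_cohort` is a hypothetical sketch, not part of the original analysis. The query keeps only the first hospital admission and the first ICU stay, so we assume there should be exactly one row per `subject_id` and per `icustay_id`, and every `age` should be at least 16. The helper also counts patients carrying the shifted age of ~300 years. The cut-off of 200 years is an arbitrary value chosen to separate real ages from the shifted ones." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import pandas as pd\n",
"\n",
"def check_cohort(df):\n",
"    # Hypothetical sanity checks; raises AssertionError if a check fails\n",
"    # First admission + first ICU stay => one row per patient and per stay\n",
"    assert df['subject_id'].is_unique, 'duplicate subject_id'\n",
"    assert df['icustay_id'].is_unique, 'duplicate icustay_id'\n",
"    # Inclusion criterion from the query: age >= 16\n",
"    assert (df['age'] >= 16).all(), 'patient younger than 16'\n",
"    # Ages > 89 were reassigned to ~300 years; 200 is an arbitrary cut-off\n",
"    n_shifted = int((df['age'] > 200).sum())\n",
"    return {'rows': len(df), 'shifted_age': n_shifted}" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Usage: `check_cohort(data)`. The shifted ages pull up the mean age reported by `describe()`, so compare the mean age with the median when reading the summary statistics."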
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index([u'subject_id', u'hadm_id', u'icustay_id', u'gender',\n", " u'admittime_hospital', u'dischtime_hospital', u'los_hospital', u'age',\n", " u'admission_type', u'hospital_expire_flag', u'intime_icu',\n", " u'outtime_icu', u'los_icu', u'first_careunit', u'icustay_expire_flag',\n", " u'oasis', u'oasis_prob'],\n", " dtype='object')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.columns" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " subject_id hadm_id icustay_id gender admittime_hospital \\\n", "0 10076 198503 201006 M 2107-03-21 21:16:00 \n", "1 42321 114648 201204 F 2121-12-07 20:49:00 \n", "2 10045 126949 203766 F 2129-11-24 00:31:00 \n", "3 10104 177678 204201 F 2120-08-24 17:39:00 \n", "4 10017 199207 204881 F 2149-05-26 17:19:00 \n", "\n", " dischtime_hospital los_hospital age admission_type \\\n", "0 2107-03-30 12:00:00 8.6139 68.8636 EMERGENCY \n", "1 2121-12-12 16:40:00 4.8271 80.5627 EMERGENCY \n", "2 2129-12-01 01:45:00 7.0514 68.6669 EMERGENCY \n", "3 2120-08-31 13:12:00 6.8146 70.5196 EMERGENCY \n", "4 2149-06-03 18:42:00 8.0576 73.6792 EMERGENCY \n", "\n", " hospital_expire_flag intime_icu outtime_icu los_icu \\\n", "0 1 2107-03-24 04:06:14 2107-03-31 06:55:09 7.1173 \n", "1 0 2121-12-07 20:50:36 2121-12-09 18:43:58 1.9121 \n", "2 1 2129-11-24 22:46:57 2129-12-01 06:03:55 6.3034 \n", "3 0 2120-08-24 23:47:23 2120-08-25 15:41:49 0.6628 \n", "4 0 2149-05-29 18:52:29 2149-05-31 22:19:17 2.1436 \n", "\n", " first_careunit icustay_expire_flag oasis oasis_prob \n", "0 MICU 1 42 0.305849 \n", "1 CSRU 0 37 0.188911 \n", "2 MICU 1 48 0.486353 \n", "3 MICU 0 29 0.077479 \n", "4 CCU 0 30 0.087098 " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min \\\n", "subject_id 99.0 26324.373737 16202.635658 10006.000000 \n", "hadm_id 99.0 151749.454545 28975.138680 100375.000000 \n", "icustay_id 99.0 249167.949495 27983.121905 201006.000000 \n", "los_hospital 99.0 10.006785 14.103571 0.038200 \n", "age 99.0 89.077015 64.855919 17.192000 \n", "hospital_expire_flag 99.0 0.323232 0.470091 0.000000 \n", "los_icu 99.0 4.586355 6.677401 0.105900 \n", "icustay_expire_flag 99.0 0.222222 0.417855 0.000000 \n", "oasis 99.0 34.747475 8.657123 12.000000 \n", "oasis_prob 99.0 0.194093 0.162787 0.009522 \n", "\n", " 25% 50% 75% \\\n", "subject_id 10068.000000 40124.000000 42278.000000 \n", "hadm_id 127326.000000 157466.000000 174868.000000 \n", "icustay_id 224042.000000 246080.000000 271795.500000 \n", "los_hospital 3.543050 6.830600 11.922250 \n", "age 65.709250 76.931900 85.233050 \n", "hospital_expire_flag 0.000000 0.000000 1.000000 \n", "los_icu 1.123700 2.014100 4.597000 \n", "icustay_expire_flag 0.000000 0.000000 0.000000 \n", "oasis 29.000000 34.000000 39.000000 \n", "oasis_prob 0.077479 0.137099 0.231102 \n", "\n", " max \n", "subject_id 44228.000000 \n", "hadm_id 199395.000000 \n", "icustay_id 298685.000000 \n", "los_hospital 123.984700 \n", "age 300.003400 \n", "hospital_expire_flag 1.000000 \n", "los_icu 35.406500 \n", "icustay_expire_flag 1.000000 \n", "oasis 56.000000 \n", "oasis_prob 0.724202 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Add day of week to DataFrame\n", "\n", "To examine the weekend effect, we first need to derive the day of the week, because the dataset above only contains dates. We define the weekend as any time from Saturday 00:00:00 to Sunday 23:59:59. The dates are shifted, which is why they look odd, but the shift preserves the day of the week."
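] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The weekend definition above can be written as a small reusable function. This is a sketch, not the original code: the helper `add_weekend_flag` and its output column names are hypothetical. It relies on pandas' `Series.dt.dayofweek` (Monday=0, ..., Sunday=6) and `Series.dt.day_name()`. In recent pandas versions, `day_name()` replaces the removed `dt.weekday_name`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [
"import pandas as pd\n",
"\n",
"def add_weekend_flag(df, col='intime_icu', prefix='inday_icu'):\n",
"    # Hypothetical helper: returns a copy with day-name and weekend-flag columns\n",
"    out = df.copy()\n",
"    dow = out[col].dt.dayofweek  # Monday=0, ..., Saturday=5, Sunday=6\n",
"    out[prefix + '_name'] = out[col].dt.day_name()\n",
"    # Weekend = Saturday 00:00:00 up to and including Sunday 23:59:59\n",
"    out[prefix + '_weekend'] = dow >= 5\n",
"    return out" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Usage: `add_weekend_flag(data)['inday_icu_weekend'].value_counts()`. The flag depends only on the calendar day, so the time of day plays no role. This matches the Saturday 00:00:00 to Sunday 23:59:59 definition."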
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " subject_id hadm_id icustay_id gender admittime_hospital \\\n", "0 27513 163557 200003 M 2199-08-02 17:02:00 \n", "1 20707 129310 200007 M 2109-02-17 10:02:00 \n", "2 9514 127229 200014 M 2105-02-16 23:15:00 \n", "3 21789 112486 200019 F 2178-07-08 09:02:00 \n", "4 41710 181955 200028 M 2133-10-29 10:00:00 \n", "\n", " dischtime_hospital los_hospital age admission_type \\\n", "0 2199-08-22 19:00:00 20.0819 48.2960 EMERGENCY \n", "1 2109-02-20 15:47:00 3.2396 43.3450 EMERGENCY \n", "2 2105-02-21 13:46:00 4.6049 84.7300 EMERGENCY \n", "3 2178-07-11 06:45:00 2.9049 82.8831 EMERGENCY \n", "4 2133-11-01 14:54:00 3.2042 64.8677 ELECTIVE \n", "\n", " hospital_expire_flag ... los_icu first_careunit \\\n", "0 0 ... 5.8884 SICU \n", "1 0 ... 1.2914 CCU \n", "2 0 ... 1.7338 SICU \n", "3 1 ... 3.0594 CCU \n", "4 0 ... 2.9038 CCU \n", "\n", " icustay_expire_flag oasis oasis_prob admitday_hospital \\\n", "0 0 35 0.152892 Friday \n", "1 0 26 0.054187 Sunday \n", "2 0 56 0.724202 Monday \n", "3 1 47 0.454600 Wednesday \n", "4 0 35 0.152892 Thursday \n", "\n", " dischday_hospital inday_icu inday_icu_seq outday_icu \n", "0 Thursday Friday 4 Thursday \n", "1 Wednesday Sunday 6 Monday \n", "2 Saturday Monday 0 Wednesday \n", "3 Saturday Wednesday 2 Saturday \n", "4 Sunday Thursday 3 Sunday \n", "\n", "[5 rows x 22 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# dt.day_name() replaces dt.weekday_name, which was removed in pandas 1.0\n", "data['admitday_hospital'] = data.admittime_hospital.dt.day_name()\n", "data['dischday_hospital'] = data.dischtime_hospital.dt.day_name()\n", "data['inday_icu'] = data.intime_icu.dt.day_name()\n", "# Numeric day of week: Monday=0, ..., Sunday=6\n", "data['inday_icu_seq'] = data.intime_icu.dt.weekday\n", "data['outday_icu'] = data.outtime_icu.dt.day_name()\n", "data.head()" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Friday 6263\n", "Tuesday 6141\n", "Monday 6097\n", "Wednesday 5985\n", "Thursday 5876\n", "Saturday 4235\n", "Sunday 3960\n", "Name: inday_icu, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data['inday_icu'].value_counts()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "weekday 30362\n", "weekend 8195\n", "Name: inday_icu_wkd, dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a weekday vs. weekend column for intime_icu (dt.weekday: Monday=0, ..., Sunday=6)\n", "data['inday_icu_wkd'] = np.where(data.intime_icu.dt.weekday <= 4, \n", " 'weekday','weekend')\n", "data['inday_icu_wkd'].value_counts()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# Summary statistics by day of week and by weekday vs. weekend\n", "\n", "Next, it is good to look at some basic summaries of the data. For each variable we have extracted, we compute the mean (standard deviation) of continuous variables and the count (percentage) of each level of categorical variables. We group these first by day of week and then by weekday vs. weekend." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index([u'subject_id', u'hadm_id', u'icustay_id', u'gender',\n", " u'admittime_hospital', u'dischtime_hospital', u'los_hospital', u'age',\n", " u'admission_type', u'hospital_expire_flag', u'intime_icu',\n", " u'outtime_icu', u'los_icu', u'first_careunit', u'icustay_expire_flag',\n", " u'oasis', u'oasis_prob', u'admitday_hospital', u'dischday_hospital',\n", " u'inday_icu', u'inday_icu_seq', u'outday_icu', u'inday_icu_wkd'],\n", " dtype='object')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.columns" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Grouped by inday_icu \\\n", " Friday Monday \n", "variable level \n", "n 6263 6097 \n", "admission_type ELECTIVE 1016 (16.22) 1265 (20.75) \n", " EMERGENCY 5118 (81.72) 4687 (76.87) \n", " URGENT 129 (2.06) 145 (2.38) \n", "age 74.56 (53.22) 73.13 (51.37) \n", "first_careunit CCU 838 (13.38) 918 (15.06) \n", " CSRU 1416 (22.61) 1632 (26.77) \n", " MICU 2139 (34.15) 1940 (31.82) \n", " SICU 1044 (16.67) 865 (14.19) \n", " TSICU 826 (13.19) 742 (12.17) \n", "gender F 2662 (42.5) 2559 (41.97) \n", " M 3601 (57.5) 3538 (58.03) \n", "hospital_expire_flag 0 5576 (89.03) 5468 (89.68) \n", " 1 687 (10.97) 629 (10.32) \n", "icustay_expire_flag 0 5768 (92.1) 5650 (92.67) \n", " 1 495 (7.9) 447 (7.33) \n", "inday_icu_wkd weekday 6263 (100.0) 6097 (100.0) \n", " weekend \n", "los_hospital 10.23 (11.61) 9.71 (9.92) \n", "los_icu 4.15 (6.19) 3.83 (5.52) \n", "oasis 31.17 (9.02) 31.21 (8.77) \n", "oasis_prob 0.14 (0.14) 0.14 (0.14) \n", "\n", " \\\n", " Saturday Sunday Thursday \n", "variable level \n", "n 4235 3960 5876 \n", "admission_type ELECTIVE 162 (3.83) 101 (2.55) 999 (17.0) \n", " EMERGENCY 3852 (90.96) 3681 (92.95) 4746 (80.77) \n", " URGENT 221 (5.22) 178 (4.49) 131 (2.23) \n", "age 73.92 (58.58) 75.26 (60.66) 75.51 (55.46) \n", "first_careunit CCU 695 (16.41) 621 (15.68) 850 (14.47) \n", " CSRU 237 (5.6) 194 (4.9) 1282 (21.82) \n", " MICU 1765 (41.68) 1706 (43.08) 2020 (34.38) \n", " SICU 743 (17.54) 743 (18.76) 996 (16.95) \n", " TSICU 795 (18.77) 696 (17.58) 728 (12.39) \n", "gender F 1857 (43.85) 1736 (43.84) 2603 (44.3) \n", " M 2378 (56.15) 2224 (56.16) 3273 (55.7) \n", "hospital_expire_flag 0 3658 (86.38) 3388 (85.56) 5202 (88.53) \n", " 1 577 (13.62) 572 (14.44) 674 (11.47) \n", "icustay_expire_flag 0 3811 (89.99) 3548 (89.6) 5399 (91.88) \n", " 1 424 (10.01) 412 (10.4) 477 (8.12) \n", "inday_icu_wkd weekday 5876 (100.0) \n", " weekend 4235 (100.0) 3960 (100.0) \n", "los_hospital 10.00 (10.84) 9.82 (10.70) 9.88 (10.90) \n", "los_icu 4.42 
(6.58) 4.44 (6.29) 3.96 (6.12) \n", "oasis 31.49 (9.27) 32.07 (9.02) 31.10 (8.82) \n", "oasis_prob 0.15 (0.15) 0.16 (0.15) 0.14 (0.14) \n", "\n", " \n", " Tuesday Wednesday isnull \n", "variable level \n", "n 6141 5985 \n", "admission_type ELECTIVE 1292 (21.04) 1243 (20.77) 0 \n", " EMERGENCY 4704 (76.6) 4600 (76.86) \n", " URGENT 145 (2.36) 142 (2.37) \n", "age 75.48 (55.22) 74.16 (53.88) 0 \n", "first_careunit CCU 919 (14.96) 851 (14.22) 0 \n", " CSRU 1575 (25.65) 1268 (21.19) \n", " MICU 2019 (32.88) 2020 (33.75) \n", " SICU 933 (15.19) 1038 (17.34) \n", " TSICU 695 (11.32) 808 (13.5) \n", "gender F 2671 (43.49) 2636 (44.04) 0 \n", " M 3470 (56.51) 3349 (55.96) \n", "hospital_expire_flag 0 5491 (89.42) 5350 (89.39) 0 \n", " 1 650 (10.58) 635 (10.61) \n", "icustay_expire_flag 0 5673 (92.38) 5514 (92.13) 0 \n", " 1 468 (7.62) 471 (7.87) \n", "inday_icu_wkd weekday 6141 (100.0) 5985 (100.0) 0 \n", " weekend \n", "los_hospital 9.86 (10.58) 9.77 (10.22) 0 \n", "los_icu 3.80 (5.65) 4.04 (5.94) 2 \n", "oasis 30.90 (8.77) 30.58 (9.11) 0 \n", "oasis_prob 0.14 (0.14) 0.14 (0.14) 0 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns = ['gender', 'los_hospital', 'age', 'admission_type', 'hospital_expire_flag', \n", " 'los_icu','icustay_expire_flag', 'oasis', 'oasis_prob', 'first_careunit',\n", " 'inday_icu_wkd']\n", "\n", "groupby = 'inday_icu'\n", "\n", "pval = False\n", "\n", "categorical = ['gender','admission_type','hospital_expire_flag','icustay_expire_flag',\n", " 'first_careunit','inday_icu_wkd']\n", "\n", "t = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, pval=pval)\n", "t.tableone" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Grouped by inday_icu_wkd \\\n", " isnull weekday \n", "variable level \n", "n 30362 \n", "admission_type ELECTIVE 0 5815 (19.15) \n", " EMERGENCY 23855 (78.57) \n", " URGENT 692 (2.28) \n", "age 0 74.56 (53.84) \n", "first_careunit CCU 0 4376 (14.41) \n", " CSRU 7173 (23.62) \n", " MICU 10138 (33.39) \n", " SICU 4876 (16.06) \n", " TSICU 3799 (12.51) \n", "gender F 0 13131 (43.25) \n", " M 17231 (56.75) \n", "hospital_expire_flag 0 0 27087 (89.21) \n", " 1 3275 (10.79) \n", "icustay_expire_flag 0 0 28004 (92.23) \n", " 1 2358 (7.77) \n", "los_hospital 0 9.89 (10.67) \n", "los_icu 2 3.96 (5.89) \n", "oasis 0 30.99 (8.90) \n", "oasis_prob 0 0.14 (0.14) \n", "\n", " \n", " weekend \n", "variable level \n", "n 8195 \n", "admission_type ELECTIVE 263 (3.21) \n", " EMERGENCY 7533 (91.92) \n", " URGENT 399 (4.87) \n", "age 74.57 (59.59) \n", "first_careunit CCU 1316 (16.06) \n", " CSRU 431 (5.26) \n", " MICU 3471 (42.36) \n", " SICU 1486 (18.13) \n", " TSICU 1491 (18.19) \n", "gender F 3593 (43.84) \n", " M 4602 (56.16) \n", "hospital_expire_flag 0 7046 (85.98) \n", " 1 1149 (14.02) \n", "icustay_expire_flag 0 7359 (89.8) \n", " 1 836 (10.2) \n", "los_hospital 9.91 (10.77) \n", "los_icu 4.43 (6.44) \n", "oasis 31.77 (9.15) \n", "oasis_prob 0.15 (0.15) " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columns = ['gender', 'los_hospital', 'age', 'admission_type', 'hospital_expire_flag', \n", " 'los_icu','icustay_expire_flag', 'oasis', 'oasis_prob', 'first_careunit']\n", "\n", "groupby = 'inday_icu_wkd'\n", "\n", "pval = False\n", "\n", "categorical = ['gender','admission_type','hospital_expire_flag','icustay_expire_flag',\n", " 'first_careunit']\n", "\n", "t = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, pval=pval)\n", "t.tableone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like there's a higher rate of hospital mortality (14.0% vs 10.8%) and ICU 
mortality (10.2% vs 7.8%) on weekends when compared to weekdays. There are also differences in several other important variables, suggesting that these groups may be fundamentally different in some way. These include admission type (elective admissions make up 19.2% of weekday but only 3.2% of weekend ICU admissions), disease severity (OASIS), and the patient's first care unit. Note that p-values were not computed here (`pval = False`), so none of these differences has been formally tested for statistical significance. Let's explore this a little further." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Plot the data\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "admission_type ELECTIVE EMERGENCY URGENT\n", "inday_icu_seq \n", "0 0.020553 0.124387 0.137931\n", "1 0.019350 0.130315 0.082759\n", "2 0.026549 0.127391 0.112676\n", "3 0.027027 0.133165 0.114504\n", "4 0.026575 0.126612 0.093023\n", "5 0.080247 0.137072 0.162896\n", "6 0.089109 0.146971 0.123596" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Pivot data to summarise by day\n", "dat_dow = data.groupby(['admission_type',\n", " 'inday_icu_seq'])['hospital_expire_flag'].mean().reset_index()\n", "\n", "dat_dow = dat_dow.pivot(index='inday_icu_seq', \n", " columns='admission_type', values='hospital_expire_flag')\n", "\n", "dat_dow" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n" ] }, "metadata": { "jupyter-vega3": "#947caf96-c17f-4466-ad0c-6634cc666d2a" }, "output_type": "display_data" }, { "data": { "application/javascript": [ "var spec = {\"selection\": {\"grid\": {\"bind\": \"scales\", \"type\": \"interval\"}}, \"encoding\": {\"y\": {\"field\": \"Hospital mortality rate\", \"type\": \"quantitative\"}, \"x\": {\"field\": \"inday_icu_seq\", \"type\": \"quantitative\"}, \"color\": {\"field\": \"variable\", \"type\": \"nominal\"}}, \"height\": 300, \"width\": 450, \"$schema\": \"https://vega.github.io/schema/vega-lite/v2.json\", \"mark\": \"line\", \"data\": {\"values\": [{\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 0, \"Hospital mortality rate\": 0.020553359683794466}, {\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 1, \"Hospital mortality rate\": 0.01934984520123839}, {\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 2, \"Hospital mortality rate\": 0.02654867256637168}, {\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 3, \"Hospital mortality rate\": 0.02702702702702703}, {\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 4, \"Hospital mortality rate\": 0.0265748031496063}, {\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 5, \"Hospital mortality rate\": 0.08024691358024691}, {\"variable\": \"ELECTIVE\", \"inday_icu_seq\": 6, \"Hospital mortality rate\": 0.0891089108910891}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 0, \"Hospital mortality rate\": 0.12438660123746532}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 1, \"Hospital mortality rate\": 0.13031462585034015}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 2, \"Hospital mortality rate\": 0.12739130434782608}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 3, \"Hospital mortality rate\": 0.13316477033291194}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 4, \"Hospital mortality rate\": 0.12661195779601406}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 5, \"Hospital mortality rate\": 0.13707165109034267}, {\"variable\": \"EMERGENCY\", \"inday_icu_seq\": 
6, \"Hospital mortality rate\": 0.1469709318120076}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 0, \"Hospital mortality rate\": 0.13793103448275862}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 1, \"Hospital mortality rate\": 0.08275862068965517}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 2, \"Hospital mortality rate\": 0.11267605633802817}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 3, \"Hospital mortality rate\": 0.11450381679389313}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 4, \"Hospital mortality rate\": 0.09302325581395349}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 5, \"Hospital mortality rate\": 0.16289592760180996}, {\"variable\": \"URGENT\", \"inday_icu_seq\": 6, \"Hospital mortality rate\": 0.12359550561797752}]}};\n", "var selector = \"#947caf96-c17f-4466-ad0c-6634cc666d2a\";\n", "var type = \"vega-lite\";\n", "\n", "var output_area = this;\n", "require(['nbextensions/jupyter-vega3/index'], function(vega) {\n", " vega.render(selector, spec, type, output_area);\n", "}, function (err) {\n", " if (err.requireType !== 'scripterror') {\n", " throw(err);\n", " }\n", "});\n" ] }, "metadata": { "jupyter-vega3": "#947caf96-c17f-4466-ad0c-6634cc666d2a" }, "output_type": "display_data" }, { "data": { "image/png": "" }, "metadata": { "jupyter-vega3": "#947caf96-c17f-4466-ad0c-6634cc666d2a" }, "output_type": "display_data" } ], "source": [ "# day_map = {0:'Mon', 1:'Tue', 2:'Wed', 3:'Thu', 4:'Fri', 5:'Sat', 6:'Sun'}\n", "dat_dow.vgplot.line(value_name='Hospital mortality rate')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "admission_type ELECTIVE EMERGENCY URGENT\n", "inday_icu_wkd \n", "weekday 0.023732 0.128359 0.108382\n", "weekend 0.083650 0.141909 0.145363" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dat_wkd = data.groupby(['admission_type','inday_icu_wkd'])['hospital_expire_flag'].mean().reset_index()\n", "dat_wkd = dat_wkd.pivot(index='inday_icu_wkd', columns='admission_type', values='hospital_expire_flag')\n", "dat_wkd.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n" ] }, "metadata": { "jupyter-vega3": "#5ee9966a-ba11-4902-afdb-475556325e20" }, "output_type": "display_data" }, { "data": { "application/javascript": [ "var spec = {\"selection\": {\"grid\": {\"bind\": \"scales\", \"type\": \"interval\"}}, \"encoding\": {\"y\": {\"field\": \"Hospital mortality rate\", \"type\": \"quantitative\"}, \"x\": {\"field\": \"inday_icu_wkd\", \"type\": \"nominal\"}, \"color\": {\"field\": \"variable\", \"type\": \"nominal\"}}, \"height\": 300, \"width\": 450, \"$schema\": \"https://vega.github.io/schema/vega-lite/v2.json\", \"mark\": \"line\", \"data\": {\"values\": [{\"variable\": \"ELECTIVE\", \"Hospital mortality rate\": 0.023731728288907995, \"inday_icu_wkd\": \"weekday\"}, {\"variable\": \"ELECTIVE\", \"Hospital mortality rate\": 0.08365019011406843, \"inday_icu_wkd\": \"weekend\"}, {\"variable\": \"EMERGENCY\", \"Hospital mortality rate\": 0.1283588346258646, \"inday_icu_wkd\": \"weekday\"}, {\"variable\": \"EMERGENCY\", \"Hospital mortality rate\": 0.14190893402362936, \"inday_icu_wkd\": \"weekend\"}, {\"variable\": \"URGENT\", \"Hospital mortality rate\": 0.10838150289017341, \"inday_icu_wkd\": \"weekday\"}, {\"variable\": \"URGENT\", \"Hospital mortality rate\": 0.14536340852130325, \"inday_icu_wkd\": \"weekend\"}]}};\n", "var selector = \"#5ee9966a-ba11-4902-afdb-475556325e20\";\n", "var type = \"vega-lite\";\n", "\n", "var output_area = this;\n", "require(['nbextensions/jupyter-vega3/index'], function(vega) {\n", " vega.render(selector, spec, type, output_area);\n", "}, function (err) {\n", " if (err.requireType !== 'scripterror') {\n", " throw(err);\n", " }\n", "});\n" ] }, "metadata": { "jupyter-vega3": "#5ee9966a-ba11-4902-afdb-475556325e20" }, "output_type": "display_data" }, { "data": { "image/png": "" }, "metadata": { "jupyter-vega3": "#5ee9966a-ba11-4902-afdb-475556325e20" }, "output_type": "display_data" } ], "source": [ "dat_wkd.vgplot.line(value_name='Hospital mortality rate')" ] }, { "cell_type": 
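"markdown", "metadata": {}, "source": [ "Before fitting any models, the crude odds ratio can also be computed by hand from the death counts in the TableOne output above. There were 1149 deaths and 7046 survivors among weekend ICU admissions, and 3275 deaths and 27087 survivors among weekday admissions. The minimal sketch below uses only the Python standard library. `crude_odds_ratio` is a hypothetical helper written for illustration (it is not part of tableone or statsmodels). Its approximate 95% confidence interval uses Woolf's standard error for the log odds ratio: the square root of the sum of the reciprocals of the four cell counts.

```python
import math

def crude_odds_ratio(a, b, c, d, z=1.96):
    '''Crude odds ratio of a 2x2 table with an approximate 95% Wald CI.

    a, b: deaths / survivors in the exposed group (weekend ICU admission)
    c, d: deaths / survivors in the unexposed group (weekday ICU admission)
    Returns (log_or, se, odds_ratio, (lower, upper)).
    '''
    # Convert to float so the divisions below also work on Python 2
    a, b, c, d = float(a), float(b), float(c), float(d)
    log_or = math.log((a / b) / (c / d))
    # Woolf standard error of the log odds ratio
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Build the CI on the log-odds scale, then exponentiate
    ci = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return log_or, se, math.exp(log_or), ci

# Counts copied from the TableOne output (hospital_expire_flag 1 / 0)
log_or, se, odds_ratio, ci = crude_odds_ratio(1149, 7046, 3275, 27087)
print('log(OR) = %.4f, SE = %.4f' % (log_or, se))
print('OR = %.2f (95%% CI %.2f to %.2f)' % (odds_ratio, ci[0], ci[1]))
```

With these counts, the log odds ratio and its standard error should agree with the weekend coefficient (0.2992) and standard error (0.0368) of the unadjusted logistic regression below. This is because a logistic model with a single binary predictor estimates exactly the crude log odds ratio.
" ] }, { "cell_type": 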
"markdown", "metadata": {}, "source": [ "# Model building\n", "\n", "Let's try to incorporate what we saw above into a very simple model. We will use logistic regression with hospital mortality as our outcome, first fitting an unadjusted model and then adjusting for admission type.\n", "\n", "The unadjusted analysis should closely mirror what we saw in the TableOne summary above. The odds ratio corresponding to 14.0% and 10.8% mortality in the weekend and weekday groups, respectively, is about 1.35 (odds of 1149/7046 ≈ 0.163 versus 3275/27087 ≈ 0.121). Performing logistic regression on the same data:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Results: Generalized linear model\n", "=============================================================================\n", "Model: GLM AIC: 27416.8526 \n", "Link Function: logit BIC: -379723.8199\n", "Dependent Variable: hospital_expire_flag Log-Likelihood: -13706. \n", "Date: 2018-03-02 10:45 LL-Null: -13738. \n", "No. Observations: 38557 Deviance: 27413. \n", "Df Model: 1 Pearson chi2: 3.86e+04 \n", "Df Residuals: 38555 Scale: 1.0000 \n", "Method: IRLS \n", "-----------------------------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975]\n", "-----------------------------------------------------------------------------\n", "Intercept -2.1127 0.0185 -114.2000 0.0000 -2.1490 -2.0765\n", "C(inday_icu_wkd)[T.weekend] 0.2992 0.0368 8.1288 0.0000 0.2270 0.3713\n", "=============================================================================\n", "\n", "\"\"\"" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# R style syntax\n", "simple_glm = smf.glm('hospital_expire_flag ~ C(inday_icu_wkd)', \n", " data=data, family=sm.families.Binomial()).fit()\n", "simple_glm.summary2()\n", "\n", "# Alternative syntax (code weekend explicitly as 1 so the coefficient sign matches the formula version)\n", "# y = data.hospital_expire_flag\n", "# X = sm.add_constant((data.inday_icu_wkd == 'weekend').astype(int))\n", "# simple_glm = sm.GLM(y, X, family=sm.families.Binomial()).fit()\n", "# simple_glm.summary2()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "...yields the same results. The coefficient shown above for weekend is on the log-odds scale, so exponentiating it gives the odds ratio: `exp(0.2992) = 1.35`. Looking at these crude rates and odds ratios, we can see that patients admitted to the ICU on a weekend have about 35% higher odds of dying in the hospital than those admitted on a weekday. This effect is statistically significant (p<0.001). \n", "\n", "Are we done?\n", "\n", "I hope not. 
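Before going further, it is worth seeing how the columns of the summary table fit together. The short sketch below recomputes three things for the weekend coefficient: the Wald z statistic, the two-sided p-value, and the 95% confidence interval, putting the interval on the odds-ratio scale. It uses only the Python standard library. `wald_summary` is a hypothetical helper, not a statsmodels function. The coefficient and standard error are the rounded values copied from the summary above, and 1.96 approximates the exact normal quantile. As a result, the last digits may differ slightly from the printed table (z comes out at about 8.13 rather than 8.1288).

```python
import math

def wald_summary(coef, se, z_crit=1.96):
    '''Wald z, two-sided p-value and 95% CI (odds-ratio scale) for one
    logistic-regression coefficient given on the log-odds scale.'''
    z = coef / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    odds_ratio = math.exp(coef)
    # Exponentiate the CI limits computed on the log-odds scale
    ci = (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))
    return z, p, odds_ratio, ci

# Rounded weekend coefficient and standard error from the unadjusted model
z, p, odds_ratio, ci = wald_summary(0.2992, 0.0368)
print('z = %.2f, p = %.1e' % (z, p))
print('OR = %.2f (95%% CI %.2f to %.2f)' % (odds_ratio, ci[0], ci[1]))
```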
As we saw in the tables and figures above, there is likely some confounding and perhaps even effect modification at work. Next, let's look at admission type and weekend ICU admission in the same model. There are two such models we could consider. \n", "\n", "The first adjusts for admission type, but assumes that the effect of weekend admission is the same regardless of the patient's admission type. The second also adjusts for admission type, but allows the effect of weekend ICU admission to vary across the levels of admission type. \n", "\n", "The first type of model can account for confounding (when a nuisance variable is associated with both the outcome and the exposure/variable of interest), while the second permits what is called effect modification, or a statistical interaction. \n", "\n", "Interactions are sometimes difficult to understand, but if ignored, they can lead to incorrect conclusions about the effect of one or more of the variables. In this example, we fit both models, output estimates of the log-odds ratios, and then assess whether the interaction term is needed by comparing the two models. Below is the resulting output:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Results: Generalized linear model\n", "===============================================================================\n", "Model: GLM AIC: 26729.0141 \n", "Link Function: logit BIC: -380394.5386\n", "Dependent Variable: hospital_expire_flag Log-Likelihood: -13361. \n", "Date: 2018-03-02 10:45 LL-Null: -13738. \n", "No. Observations: 38557 Deviance: 26721. \n", "Df Model: 3 Pearson chi2: 3.85e+04 \n", "Df Residuals: 38553 Scale: 1.0000 \n", "Method: IRLS \n", "-------------------------------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975]\n", "-------------------------------------------------------------------------------\n", "Intercept -3.6173 0.0801 -45.1364 0.0000 -3.7743 -3.4602\n", "C(inday_icu_wkd)[T.weekend] 0.1444 0.0372 3.8871 0.0001 0.0716 0.2172\n", "C(admission_type)[T.EMERGENCY] 1.6944 0.0822 20.6095 0.0000 1.5333 1.8555\n", "C(admission_type)[T.URGENT] 1.5881 0.1231 12.9034 0.0000 1.3469 1.8293\n", "===============================================================================\n", "\n", "\"\"\"" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Without effect modification\n", "adj_glm = smf.glm('hospital_expire_flag ~ C(inday_icu_wkd) + C(admission_type)', \n", " data=data, family=sm.families.Binomial()).fit()\n", "adj_glm.summary2()\n", "# drop1(adj.glm,test=\"Chisq\")" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "\n", " \n", "\n", "
" ], "text/plain": [ "\n", "\"\"\"\n", " Results: Generalized linear model\n", "===========================================================================================================\n", "Model: GLM AIC: 26712.4403 \n", "Link Function: logit BIC: -380393.9926\n", "Dependent Variable: hospital_expire_flag Log-Likelihood: -13350. \n", "Date: 2018-03-02 10:45 LL-Null: -13738. \n", "No. Observations: 38557 Deviance: 26700. \n", "Df Model: 5 Pearson chi2: 3.86e+04 \n", "Df Residuals: 38551 Scale: 1.0000 \n", "Method: IRLS \n", "-----------------------------------------------------------------------------------------------------------\n", " Coef. Std.Err. z P>|z| [0.025 0.975]\n", "-----------------------------------------------------------------------------------------------------------\n", "Intercept -3.7169 0.0862 -43.1428 0.0000 -3.8858 -3.5481\n", "C(inday_icu_wkd)[T.weekend] 1.3232 0.2388 5.5409 0.0000 0.8551 1.7912\n", "C(admission_type)[T.EMERGENCY] 1.8014 0.0883 20.4002 0.0000 1.6283 1.9744\n", "C(admission_type)[T.URGENT] 1.6095 0.1496 10.7598 0.0000 1.3164 1.9027\n", "C(inday_icu_wkd)[T.weekend]:C(admission_type)[T.EMERGENCY] -1.2071 0.2418 -4.9913 0.0000 -1.6812 -0.7331\n", "C(inday_icu_wkd)[T.weekend]:C(admission_type)[T.URGENT] -0.9872 0.3036 -3.2521 0.0011 -1.5822 -0.3922\n", "===========================================================================================================\n", "\n", "\"\"\"" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# With effect modification\n", "adj_glm_int = smf.glm('hospital_expire_flag ~ C(inday_icu_wkd) * C(admission_type)', \n", " data=data, family=sm.families.Binomial()).fit()\n", "adj_glm_int.summary2()\n", "# drop1(adj.glm,test=\"Chisq\")" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " admission_type inday_icu_wkd\n", "0 ELECTIVE weekday\n", "1 ELECTIVE weekend\n", "2 EMERGENCY weekday\n", "3 EMERGENCY weekend\n", "4 URGENT weekday\n", "5 URGENT weekend" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Grid of every weekend/admission-type combination, to hold the predicted risk of hospital death\n", "def expand_grid(data_dict):\n", " rows = itertools.product(*data_dict.values())\n", " return pd.DataFrame.from_records(rows, columns=list(data_dict.keys()))\n", "\n", "weekend_grid = expand_grid({'inday_icu_wkd': ['weekday', 'weekend'],\n", " 'admission_type': ['ELECTIVE', 'EMERGENCY', 'URGENT']})\n", "\n", "weekend_grid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the first model (no interaction), the weekend coefficient is almost halved (from 0.2992 to 0.1444), but it remains statistically significant after adjusting for admission type (p<0.001).\n", "\n", "In the second model, we are primarily interested in the significance of the interaction. A likelihood-ratio test comparing the two models shows that the interaction between `inday_icu_wkd` and `admission_type` is statistically significant (p<0.001). This test is the analogue of R's `drop1` with a chi-squared test for the interaction term: it takes twice the difference in log-likelihoods, `2 * (adj_glm_int.llf - adj_glm.llf)`, and compares it with a chi-squared distribution on 5 - 3 = 2 degrees of freedom. The result suggests that the effect of weekend admission may differ depending on the patient's admission type. How exactly should we interpret this?\n", "\n", "One way of looking at this complexity is to compute the odds ratio within each level of admission type. We can do this using the model's `predict` method, which by default returns predicted probabilities of death rather than log-odds. We therefore convert the predictions to log-odds and, for each hospital admission type, calculate the log-odds of death on weekdays and on weekends:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def prob2logodds(prob):\n", " odds = prob / (1 - prob)\n", " logodds = np.log(odds)\n", " return logodds" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " predict log_odds\n", "inday_icu_wkd admission_type \n", "weekday ELECTIVE 0.023732 -3.716925\n", "weekend ELECTIVE 0.083650 -2.393754\n", "weekday EMERGENCY 0.128359 -1.915548\n", "weekend EMERGENCY 0.141909 -1.799525\n", "weekday URGENT 0.108382 -2.107381\n", "weekend URGENT 0.145363 -1.771439" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weekend_grid['predict'] = adj_glm_int.predict(weekend_grid[['inday_icu_wkd','admission_type']])\n", "weekend_grid['log_odds'] = prob2logodds(weekend_grid['predict'])\n", "weekend_grid.set_index(['inday_icu_wkd','admission_type'], inplace=True)\n", "weekend_grid" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now compute the log odds ratio, $\\log(\\mathrm{OR}) = \\log(\\mathrm{odds}_{weekend}) - \\log(\\mathrm{odds}_{weekday})$, and exponentiate it to get the odds ratio:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "admission_type\n", "ELECTIVE 3.755307\n", "EMERGENCY 1.123022\n", "URGENT 1.399257\n", "Name: log_odds, dtype: float64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "diff_grid = weekend_grid.loc['weekend']['log_odds'] - weekend_grid.loc['weekday']['log_odds']\n", "np.exp(diff_grid)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This mirrors what we saw above. Because the interaction model has one parameter for each of the six combinations of admission type and weekend, its predicted probabilities reproduce the observed mortality rates exactly. While there may be differences between EMERGENCY and URGENT admission types, an ELECTIVE admission occurring on a weekend has odds of mortality almost four times those of an ELECTIVE admission on a weekday. This seems particularly odd, since patients are rarely admitted to hospital electively on a weekend (only 263 of the weekend ICU admissions were elective).\n", "\n", "What do you think?\n", "\n", "- Do patients admitted on a weekend have a higher rate of mortality than those admitted during the week?\n", "- Who is most affected, if at all?\n", "- Which factors can you rule out as causes of this effect? 
e.g., is it because the patients are simply sicker on a weekend? Are they more likely to have complications?\n", "\n", "Looking forward to seeing what you come up with!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 2 }