MIMIC-IV, a freely accessible electronic health record dataset

Access to MIMIC-IV is provided via PhysioNet [19]. The hosp and icu modules are available in the MIMIC-IV project on PhysioNet [20]. The note module is available from the MIMIC-IV-Note: Deidentified free-text clinical notes project on PhysioNet [21].

Table 1 summarizes the demographics for ICU patients in MIMIC-IV. Figure 3 visualizes a patient who is admitted to an ICU for a cardiac arrest, discharged to a general ward, admitted to the operating room, has a planned readmission to an ICU after their operation, and is ultimately discharged home.

Table 1 Demographics for patients admitted to an intensive care unit (ICU) in MIMIC-IV v2.2.
Fig. 3

Visualization of data within MIMIC-IV for a single patient’s hospitalization: hadm_id 28503629. Three vertically stacked panels highlight the variety of information available. Vital signs are shown in the top panel: note that the frequency of data collection for temperature is much higher at the start of the ICU stay due to the use of targeted temperature management. Procedures from multiple sources are shown in the middle panel, including billing information, the provider order entry system, and the ICU information system. The bottom panel displays patient laboratory measurements. Note that while frequent vital signs are only available when the patient is in the ICU, laboratory measures are available throughout their hospitalization.

Hospital module (hosp)

The hosp module stores information regarding patient transfers, billed events, medication prescription, medication administration, laboratory values, microbiology measurements, and provider orders. The subject_id column is present in all tables and allows linkage to patient demographics in the patients table. The hadm_id column is also present in all tables and represents a single hospitalization; rows without an hadm_id pertain to data collected outside of an inpatient encounter. Most tables may be interpreted without cross-linking to other tables. Tables which contain item_ids are an exception: they must be linked to a dimension table prefixed with d_ in order to acquire a human-interpretable description of the item_ids. Other tables, such as emar and poe, may be linked with “detail” tables (emar_detail and poe_detail) which provide additional information for each row.
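
The dimension-table lookup described above can be sketched with an in-memory SQLite database. This is a minimal illustration, not the database's actual schema: the column names itemid, label, and valuenum are assumptions, and every row is invented toy data.

```python
import sqlite3

# Toy rows invented for illustration; they are not real MIMIC-IV records.
# Column names (itemid, label, valuenum) are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE d_labitems (itemid INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE labevents (
        subject_id INTEGER, hadm_id INTEGER, itemid INTEGER, valuenum REAL
    );
    INSERT INTO d_labitems VALUES
        (1, 'Example analyte A'),
        (2, 'Example analyte B');
    INSERT INTO labevents VALUES
        (100, 9001, 1, 4.2),   -- measured during a hospitalization
        (100, NULL, 2, 7.5);   -- no hadm_id: outside an inpatient encounter
""")


def labelled_events(connection):
    """Return lab rows with the human-readable label from the d_ table."""
    query = """
        SELECT le.subject_id, le.hadm_id, di.label, le.valuenum
        FROM labevents AS le
        JOIN d_labitems AS di ON le.itemid = di.itemid
        ORDER BY le.itemid
    """
    return connection.execute(query).fetchall()


labelled_rows = labelled_events(conn)
```

The second toy row has a NULL hadm_id, which SQLite returns as None; under the convention described above, such a row would correspond to data collected outside an inpatient encounter.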

Patient tracking

Patient demographics and in-hospital movement are described in three tables: patients, admissions, and transfers. Each distinct patient is assigned a subject_id, and the patients table has a unique subject_id for each row. The patients table provides the patient’s administrative gender, their age, and their date of death.

In order to appropriately deidentify the exact date of patient stays, the patients table contains an anchor_year column. This column “anchors” data stored in the patients table to a year occurring in their deidentified timeline (e.g. 2150). At this deidentified year, the patient’s age is provided in the anchor_age column and the approximate true year of admission is provided in the anchor_year_group column. For example, if a patient’s anchor_age is 50 and their anchor_year is 2150, then they were 50 years old in the year 2150. Continuing the example, if this patient’s anchor_year_group is 2011–2013, then we know that any hospitalizations occurring in the deidentified year 2150 (i.e. the anchor_year) actually occurred sometime between 2011 and 2013, and that they were 50 years old during this time period. The anchor_year_group column was added to MIMIC-IV to allow analyses which incorporate changes in medical practice over time.
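
The anchoring arithmetic in this example can be written out as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, the age estimate ignores the unknown birthday and is therefore only accurate to within a year, and the anchor_year_group string is assumed to be two years separated by a hyphen or en dash.

```python
from typing import Tuple


def approximate_age(anchor_age: int, anchor_year: int, event_year: int) -> int:
    """Estimate age in a deidentified year by offsetting from the anchor.

    Approximate only: the exact birthday is unknown, so the result may be
    off by one year.
    """
    return anchor_age + (event_year - anchor_year)


def real_year_range(anchor_year_group: str) -> Tuple[int, int]:
    """Parse a group such as '2011–2013' into (2011, 2013).

    Assumes two years separated by a hyphen or an en dash.
    """
    start, end = anchor_year_group.replace("–", "-").split("-")
    return int(start.strip()), int(end.strip())


# The example from the text: anchor_age 50, anchor_year 2150.
age_in_anchor_year = approximate_age(50, 2150, 2150)
true_years = real_year_range("2011–2013")
```

For an event in a later deidentified year, such as 2153, the same offset gives an approximate age of 53.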

Finally, a patient’s date of death is available in the dod column. Dates of death are censored at one year after the patient’s last hospital discharge. As a result, a null date of death indicates that the patient was alive at least up to that time point. Inferences regarding patient death beyond one year cannot be made using MIMIC-IV. The majority of patient death information is acquired from hospital records when the individual dies within the BIDMC or an affiliated institute.
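
One way to make the censoring rule explicit in analysis code is sketched below. The function name is hypothetical, and approximating “one year” as 365 days is an assumption of this sketch, not a documented property of the dataset.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "one year" is approximated as 365 days in this sketch.
FOLLOW_UP = timedelta(days=365)


def mortality_status(dod: Optional[date], last_discharge: date) -> str:
    """Describe what can be inferred from dod under one-year censoring."""
    if dod is not None:
        return f"died on {dod.isoformat()}"
    # A null dod only tells us the patient survived the follow-up window.
    censor_date = last_discharge + FOLLOW_UP
    return f"alive through {censor_date.isoformat()}; unknown afterwards"


status_alive = mortality_status(None, date(2150, 3, 1))
status_died = mortality_status(date(2150, 6, 1), date(2150, 3, 1))
```

Keeping the censoring date in the returned description discourages treating a null dod as evidence of long-term survival.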


Three tables in the hosp module provide administration-related information: services, poe, and poe_detail. The services table provides information on the hospital service under which a patient is hospitalized. The poe and poe_detail tables contain orders made in the provider order entry (POE) system. The POE system is used within the hospital to make orders related to diagnoses, imaging, consultation, and treatment. Typically the poe tables provide the date and time of an order such as an x-ray study, medication order, or nutrition order, but provide limited detail about the order itself.


Billing information is stored in the diagnoses_icd, procedures_icd, drgcodes, and hcpcsevents tables. The diagnoses_icd table contains coded diagnoses representing the hospitalization as determined by trained professionals after reviewing signed patient notes. The ontology of the diagnoses_icd table is the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnoses and the ICD Tenth Revision, Clinical Modification (ICD-10-CM) diagnoses. Definitions for ICD codes are provided in the d_icd_diagnoses table. A maximum of 39 diagnoses may be billed for a single hospital encounter, and seq_num provides an approximate ordering of diagnoses. There are few incentives for the billing department to ensure seq_num is a perfect rank ordering of diagnosis importance, however, and caution should be taken when using seq_num for research purposes. A similar table structure is adopted for billed procedures, which are stored in the procedures_icd table with descriptions of codes provided in the d_icd_procedures table.

Diagnoses are recorded with the ICD-9-CM or ICD-10-CM ontologies, while procedures are recorded with the ICD-9-PCS or ICD-10-PCS ontologies. As these ontologies were updated throughout the data collection period of MIMIC-IV, the d_icd_diagnoses and d_icd_procedures tables contain all codes which were valid at any point during the 2008–2019 time period.
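
Because two ICD revisions coexist, a code alone does not identify a diagnosis; the revision must be part of the lookup key. The sketch below illustrates this with in-memory toy rows. The column names icd_code and icd_version, and the storage of codes without a decimal point, are assumptions of this sketch; the hospitalization identifiers are invented.

```python
from typing import Dict, List, Tuple

# Dimension table keyed by (icd_code, icd_version); column names are assumed.
d_icd_diagnoses: Dict[Tuple[str, int], str] = {
    ("4280", 9): "Congestive heart failure, unspecified",
    ("4019", 9): "Unspecified essential hypertension",
    ("I10", 10): "Essential (primary) hypertension",
}

# Invented billing rows: (hadm_id, seq_num, icd_code, icd_version).
diagnoses_icd = [
    (9001, 2, "4019", 9),
    (9001, 1, "4280", 9),
    (9002, 1, "I10", 10),
]


def describe_diagnoses(hadm_id: int) -> List[Tuple[int, str]]:
    """Return (seq_num, description) pairs for one stay, ordered by seq_num."""
    rows = sorted(
        (r for r in diagnoses_icd if r[0] == hadm_id), key=lambda r: r[1]
    )
    return [(seq, d_icd_diagnoses[(code, ver)]) for _, seq, code, ver in rows]


stay_diagnoses = describe_diagnoses(9001)
```

The ordering by seq_num is shown for convenience only; as noted above, it should not be read as a strict ranking of diagnosis importance.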

Diagnosis Related Groups (DRGs) are billable codes used to assign an overall cost to a hospitalization. Many ontologies for DRG codes exist, and the drg_type column stores the ontology for the given row. The final billing tables are hcpcsevents and its associated dimension table d_hcpcsevents. The hcpcsevents table records billing by the hospital for provided services such as mechanical ventilation or provision of ICU care.


Measurements sourced from patient-derived specimens are available in microbiologyevents and labevents, with the d_labitems table providing definitions for concepts present in the labevents table. Laboratory measurements have a simpler structure than microbiology measurements, though both relate to patient-derived specimens such as blood. Multiple measurements are often taken for a single specimen, delineated by the specimen_id column in the labevents table and the micro_specimen_id column in microbiologyevents. For example, blood gas measurements made on the same sample will share the same specimen_id, with one concept specifying the type of specimen (arterial, venous, etc.).
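
Grouping rows by specimen_id recovers the set of measurements made on one sample. The following is a minimal sketch with invented blood gas rows; the label and value keys and the specimen type strings are placeholders, not the database's actual encoding.

```python
from collections import defaultdict

# Invented rows; "label" and "value" are placeholder keys for this sketch.
labevents = [
    {"specimen_id": 501, "label": "Specimen type", "value": "ART"},
    {"specimen_id": 501, "label": "pH", "value": 7.38},
    {"specimen_id": 501, "label": "pO2", "value": 95},
    {"specimen_id": 502, "label": "Specimen type", "value": "VEN"},
    {"specimen_id": 502, "label": "pH", "value": 7.33},
]


def group_by_specimen(rows):
    """Collect all measurements sharing a specimen_id into one mapping."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["specimen_id"]][row["label"]] = row["value"]
    return dict(grouped)


specimens = group_by_specimen(labevents)
```

With the rows grouped this way, the specimen type recorded for a sample can be used to interpret the other values from the same sample, for example to separate arterial from venous blood gases.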

Microbiology measurements are stored in a single table with columns dedicated to domain-specific concepts. Measurements follow a directed hierarchy of specimen, organism, isolate, antibiotic, and dilution. To simplify analysis of this data, elements higher in the hierarchy such as specimen and organism are repeated for elements lower in the hierarchy such as antibiotic and dilution. Microbiology cultures often have interim results reported to the care providers. This information is not captured in this table, which only provides the final interpretation of a microbiology culture as it was documented at storetime.
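
Since higher levels of the hierarchy are repeated on every row, the nested structure can be rebuilt from the flat table when needed. The sketch below uses invented rows whose keys simply mirror the hierarchy levels named above; apart from micro_specimen_id, the key names are hypothetical and may not match the real column names.

```python
# Invented rows; keys other than micro_specimen_id are hypothetical and
# simply mirror the hierarchy levels (specimen, organism, isolate, ...).
flat_rows = [
    {"micro_specimen_id": 7001, "specimen": "Example specimen",
     "organism": "Organism X", "isolate": 1,
     "antibiotic": "Antibiotic A", "dilution": "<=1"},
    {"micro_specimen_id": 7001, "specimen": "Example specimen",
     "organism": "Organism X", "isolate": 1,
     "antibiotic": "Antibiotic B", "dilution": "2"},
]


def nest_microbiology(rows):
    """Rebuild specimen -> organism -> isolate -> [(antibiotic, dilution)]."""
    tree = {}
    for r in rows:
        specimen = tree.setdefault((r["micro_specimen_id"], r["specimen"]), {})
        isolates = specimen.setdefault(r["organism"], {})
        isolates.setdefault(r["isolate"], []).append(
            (r["antibiotic"], r["dilution"])
        )
    return tree


micro_tree = nest_microbiology(flat_rows)
```

The flat form is usually easier to filter and aggregate, so nesting is only worth doing when an analysis needs to iterate per isolate.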

Finally, the omr table provides information from the Online Medical Record (OMR) for the patient. OMR is a general system used for documenting patient information from visits at BIDMC-affiliated institutes. As of MIMIC-IV v2.2, the omr table contains data for five measurements: blood pressure, height, weight, body mass index, and the Estimated Glomerular Filtration Rate (eGFR). These values are available from both inpatient and outpatient visits, and in many cases a “baseline” value from before a patient’s hospitalization is available.


There are four tables in the hosp module which track medication prescription and administration: prescriptions, pharmacy, emar, and emar_detail. The prescriptions and pharmacy tables are intended to be used together: prescriptions contains the order made by a provider and pharmacy stores detailed information regarding the compound prescribed. Not all prescribed compounds are associated with an entry in the pharmacy table.

The other two medication-related tables, emar and emar_detail, are sourced from the electronic Medicine Administration Record (eMAR). The eMAR system requires barcode scanning of a patient wristband and the medication at the time of administration, and was deployed throughout the BIDMC between 2014 and 2016. By 2016, all units of the hospital had the eMAR system deployed, and thus all hospitalizations from 2016 onward would be expected to have records within eMAR. Unlike the prescriptions table, which stores medication requests, the eMAR system records administration. The emar table has one row per administration, with emar_id uniquely identifying rows and emar_seq being a monotonically increasing integer ordering events chronologically. The poe_id and pharmacy_id columns allow linking to the poe and pharmacy tables, respectively. Importantly, every row in emar links to multiple rows in emar_detail. As each formulary dose must be scanned as a part of the workflow, an administration of 200 mg with formulary doses of 100 mg will result in three rows in emar_detail: one row for the overall administration (with a missing value for parent_field_ordinal), and two rows, one for each scanned formulary dose (with increasing values of parent_field_ordinal). Columns which describe the entire administration event, such as complete_dose_not_given and dose_due, are only present for the primary row. Most columns refer to individual formulary doses (dose_given, product_description, and so on), and are only present for the formulary dose rows.
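
The 200 mg example can be made concrete by generating the three emar_detail rows it describes. This is an illustrative sketch only: the function name is hypothetical, the integer ordinals 1 and 2 stand in for whatever increasing values the real parent_field_ordinal column holds, and only a few columns are modelled.

```python
def emar_detail_rows(emar_id, dose_due_mg, formulary_dose_mg):
    """Build one primary row plus one row per scanned formulary dose.

    Sketch only: ordinals are illustrative integers, and only doses that
    divide evenly into formulary doses are handled.
    """
    if dose_due_mg % formulary_dose_mg != 0:
        raise ValueError("sketch only handles whole numbers of formulary doses")
    # Primary row: describes the whole administration, no ordinal.
    rows = [{
        "emar_id": emar_id,
        "parent_field_ordinal": None,
        "dose_due": dose_due_mg,
        "dose_given": None,
    }]
    # One row per scanned formulary dose, with increasing ordinals.
    for ordinal in range(1, dose_due_mg // formulary_dose_mg + 1):
        rows.append({
            "emar_id": emar_id,
            "parent_field_ordinal": ordinal,
            "dose_due": None,
            "dose_given": formulary_dose_mg,
        })
    return rows


detail_rows = emar_detail_rows("example-emar-id", 200, 100)
```

In practice this means that summing dose_given over the rows with a non-missing parent_field_ordinal, rather than over all rows, gives the amount administered.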

Figure 4 visualizes the complementary information present in the emar, emar_detail, prescriptions, and inputevents tables for a single patient.

Fig. 4

Visualization of medication information documented within MIMIC-IV for a single patient’s hospitalization: hadm_id 28503629. The annotated grey line indicates care units for the patient throughout their stay. Bolus medications are indicated by markers, continuous infusions as lines, and range doses as filled boxes. For example, on day 5 of their hospital stay, the patient had two active prescriptions for heparin (one for 1600–3500 units of heparin, brown filled box, and one for 1000 units of heparin, pink line with triangles). Additionally on day 5, the patient continued to receive heparin according to emar (orange circle), and was imminently transferred to the medicine/cardiology intermediate ward.

ICU module (icu)

The MetaVision clinical information system (iMDsoft, Israel) is the source of data for patients admitted to the ICU. MetaVision was the only clinical information system used in the ICU for the time period of data collection for MIMIC-IV. Tables in the icu module include chartevents, d_items, datetimeevents, icustays, inputevents, outputevents, and procedureevents. The icu module adopts a star schema, with all event tables referencing d_items for defining itemid and icustays for defining stay_id.

The stay_id column is a primary key for the icustays table, and as such is unique for each row. ICU stays are defined using the administrative record of patient movement within the hospital, i.e. the icustays table is derived from the transfers table in the hosp module. ICU stays are identified using a lookup table matching physical location to an ICU cost center. Each transfer that corresponds to an ICU stay is assigned a stay_id, and consecutive transfers are merged into a single stay_id. The time of admission (intime) and discharge (outtime) are available in the icustays table for each ICU stay. Importantly, if a transfer to a non-ICU ward occurs between two ICU stays, a unique stay_id will be assigned to each of the two stays.
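
The merging rule described above can be sketched as a single pass over a patient's transfers. This is an illustration of the stated rule, not the code used to build icustays: the function name, the care unit names, and the set standing in for the cost-center lookup table are all hypothetical.

```python
from datetime import datetime

# Hypothetical stand-in for the lookup table of ICU locations.
ICU_UNITS = {"Example ICU A", "Example ICU B"}

# Invented transfers: (careunit, intime, outtime).
example_transfers = [
    ("Example ICU A", datetime(2150, 1, 1, 8), datetime(2150, 1, 2, 8)),
    ("Example ICU B", datetime(2150, 1, 2, 8), datetime(2150, 1, 3, 8)),
    ("Example ward", datetime(2150, 1, 3, 8), datetime(2150, 1, 5, 8)),
    ("Example ICU A", datetime(2150, 1, 5, 8), datetime(2150, 1, 6, 8)),
]


def derive_icu_stays(transfers, icu_units):
    """Merge consecutive ICU transfers into (intime, outtime) stays."""
    stays = []
    previous_was_icu = False
    for careunit, intime, outtime in sorted(transfers, key=lambda t: t[1]):
        if careunit in icu_units:
            if previous_was_icu:
                # Consecutive ICU transfer: extend the current stay.
                stays[-1] = (stays[-1][0], outtime)
            else:
                stays.append((intime, outtime))
            previous_was_icu = True
        else:
            # A non-ICU ward ends the current stay.
            previous_was_icu = False
    return stays


icu_stays = derive_icu_stays(example_transfers, ICU_UNITS)
```

In the toy data, the first two ICU transfers merge into one stay, while the ward transfer in between causes the final ICU transfer to become a separate stay.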

Each documented item is colloquially referred to as an “event” in the icu module, and events are grouped into tables based on the underlying data type. Events which correspond to dates, such as the time of last dialysis, are stored in datetimeevents with the value column corresponding to the date. Continuous and bolus infusions are provided in the inputevents table with a starttime, endtime, rate, and amount. Patient outputs are documented in the outputevents table with a single numeric value occurring at a single charttime. The procedureevents table captures processes which have a starttime and endtime, including organ support treatments such as mechanical ventilation. Finally, the chartevents table is the largest of all the events tables, and acts as a catch-all for documentation at the bedside. Each row in chartevents has a charttime, indicating the time at which the measurement was relevant, and a value column storing the value documented. All events tables contain patient subject_id, hadm_id, and stay_id, as well as a storetime indicating the time at which the measurement was validated by bedside staff.

Notes module (note)

The note module contains free-text, deidentified clinical notes. The notes are organized into two tables: discharge and radiology. Discharge summaries, stored in the discharge table, are in-depth notes which overview a patient’s history and course throughout a given hospitalization. Discharge summaries are organized into sections including chief complaint, history of present illness, past medical history, brief hospital course, physical exams, and discharge diagnoses. As a part of the deidentification process, the Social History and Discharge Instructions sections have been removed. These sections often contained social and logistical information which was irrelevant for clinical care but introduced a greater risk of reidentification as compared to other sections. Auxiliary information associated with each note has been stored in entity-attribute-value tables with the “_detail” suffix. For the discharge summaries these data are available in the discharge_detail table.
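
Entity-attribute-value tables store one attribute per row, so a common first step is to pivot them into one record per note. The sketch below shows that pivot on invented rows; the three-column layout (note_id, field_name, field_value) and the field names are assumptions of this sketch, not a description of the actual discharge_detail columns.

```python
from collections import defaultdict

# Invented EAV rows: (note_id, field_name, field_value). The layout and the
# field names are assumptions for illustration.
discharge_detail = [
    ("note-1", "example_field_a", "value 1"),
    ("note-1", "example_field_b", "value 2"),
    ("note-2", "example_field_a", "value 3"),
]


def pivot_eav(rows):
    """Turn entity-attribute-value rows into one dict of fields per note."""
    notes = defaultdict(dict)
    for note_id, field_name, field_value in rows:
        notes[note_id][field_name] = field_value
    return dict(notes)


note_fields = pivot_eav(discharge_detail)
```

Notes that lack a given attribute simply have no corresponding key after the pivot, which mirrors the absence of a row in the entity-attribute-value table.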

The radiology table contains radiologist reports for imaging studies performed. Radiology reports cover a wide set of imaging modalities including x-ray, computed tomography, magnetic resonance imaging (MRI), and ultrasound. Radiology reports follow structured reporting best practices and have a dedicated section for the indication, comparison, findings, and impression of the imaging study. For more in-depth imaging scans such as full body MRIs, reports may describe findings organized according to the body system examined. The associated radiology_detail table provides a coded ontology for radiology examinations as well as current procedural terminology (CPT) codes for each study. If an addendum for a report exists, the radiology_detail table provides the associated note_id.

Building on previous versions of MIMIC

MIMIC-IV is similar in many ways to previous versions of MIMIC. Tables with identical names in MIMIC-III and MIMIC-IV will be broadly compatible. The similarities help to ensure that code and analyses can be carried over from studies developed on previous versions of the database. Table 2 summarizes notable changes for users transitioning from MIMIC-III to MIMIC-IV.

Table 2 Major changes between MIMIC-III v1.4 and MIMIC-IV v2.2.

Importantly, all patient identifiers have been regenerated for MIMIC-IV. As a result, it is not possible to link patients across the databases using an identifier such as subject_id, even though MIMIC-III and MIMIC-IV have an overlap in their data collection periods (specifically the years 2008–2012). To support research that spans the periods of MIMIC-III (2002–2008) and MIMIC-IV (2008–2019), we have published the MIMIC-III Clinical Database CareVue subset [22]. The CareVue subset contains only those patients from MIMIC-III who are not in MIMIC-IV.