Consultation on External Data Sources
This service is provided by the Biomedical Informatics Resource (BMIR) and the CD2H Hub Liaison Team.
Overview
Our Biomedical Informatics Resource and Network Capacity teams can guide researchers at Columbia University Irving Medical Center (CUIMC) to request access to clinical health data from several external sources. Learn more about each of these sources below.
IBM MarketScan Data Sets/Commercial Claims and Medicare Supplemental Databases
BMIR, in conjunction with the Department of Biomedical Informatics (DBMI), negotiated a MarketScan data use license agreement with IBM. The license gives researchers access to longitudinal claims and clinical data, facilitating epidemiological studies and benefiting the entire research community. Through an outreach process, individual investigators and departments were contacted to determine interest. Current users include the Cancer Center and the Department of Digestive and Liver Diseases.
These files were produced using IBM MarketScan® databases. The MarketScan databases reflect the healthcare experience of employees and dependents covered by the health benefit programs of large employers. These claims data are collected from approximately 100 different insurance companies, Blue Cross Blue Shield plans, and third party administrators. These data represent the medical experience of insured employees and their dependents for active employees, early retirees, COBRA continuees and Medicare-eligible retirees with employer-provided Medicare Supplemental plans. Coverage is provided under a variety of fee-for-service, fully capitated, and partially capitated health plans, including preferred provider organizations, point of service plans, indemnity plans, and health maintenance organizations.
The IBM MarketScan® databases include the MarketScan Commercial Database; the MarketScan Medicare Database; and the MarketScan Multi-State Medicaid Database.
If you would like more information, please contact Linda Busacca at lb103@cumc.columbia.edu.
All of Us Researcher Workbench
The Researcher Workbench platform and its suite of custom tools are available to approved researchers. The Researcher Workbench provides access to Registered Tier data. Its powerful tools support data analysis and collaboration. The workbench also provides integrated help and educational resources through the Workbench User Support Hub. Our team can assist you in becoming an approved researcher. Submit a request to begin the approval process.
INSIGHT Clinical Research Network (CRN)
INSIGHT Clinical Research Network (CRN) is the largest urban clinical network in the nation. Bringing together five top academic medical centers across New York City, INSIGHT collects comprehensive clinical records for 12 million unique patients. This robust dataset reflects the racial, ethnic, and socioeconomic diversity of the population as well as the extensive set of healthcare services offered across a fragmented healthcare landscape. The Network also provides important research services, such as clinical trial support, patient engagement, and a centralized IRB, to advance and streamline research. Columbia University participates in the INSIGHT consortium and researchers can make use of INSIGHT data for a fee. Submit a request to learn more about accessing this data.
National COVID Cohort Collaborative (N3C)
The National COVID Cohort Collaborative (N3C) is a collaboration among the NCATS-supported Clinical and Translational Science Awards (CTSA) Program hubs, the National Center for Data to Health (CD2H), distributed clinical data networks (PCORnet, OHDSI, ACT, TriNetX), and other partner organizations, with overall stewardship by NIH’s National Center for Advancing Translational Sciences (NCATS). The N3C aims to improve the efficiency and accessibility of analyses with COVID-19 clinical data, expand our ability to analyze and understand COVID, and demonstrate a novel approach for collaborative pandemic data sharing. Submit a request to learn more about accessing this data.
Examples of published papers from Columbia researchers using these data include:
- Bennett TD, Moffitt RA, Hajagos JG, et al. The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction. medRxiv [Preprint]. 2021 Jan 13:2021.01.12.21249511. doi: 10.1101/2021.01.12.21249511. PMID: 33469592; PMCID: PMC7814838.
- Sharafeldin N, Bates B, Song Q, Madhira V, Yan Y, Dong S, Lee E, Kuhrt N, Shao YR, Liu F, Bergquist T, Guinney J, Su J, Topaloglu U. Outcomes of COVID-19 in Patients With Cancer: Report From the National COVID Cohort Collaborative (N3C). J Clin Oncol. 2021 Jun 4:JCO2101074. doi: 10.1200/JCO.21.01074. Epub ahead of print. PMID: 34085538.
Columbia Open Health Data (COHD)
Columbia Open Health Data (COHD) provides access to counts and patient prevalence (i.e., prevalence from electronic health records) of conditions, procedures, drug exposures, and patient demographics, and the co-occurrence frequencies among them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database including inpatient and outpatient data. Counts are the number of patients with the concept, e.g., diagnosed with a condition, exposed to a drug, or who had a procedure. Frequencies are the number of patients with the concept divided by the total number of patients in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts where the count ≤ 10 were excluded, and counts were randomized by the Poisson distribution.
Two released data sets and one beta data set are available:
- 5-year non-hierarchical dataset: Includes clinical data from 2013-2017
- Lifetime non-hierarchical dataset: Includes clinical data from all dates
- BETA! 5-year hierarchical dataset: Counts for each concept include patients from descendant concepts. Includes clinical data from 2013-2017.
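The count, frequency, and privacy steps described above for COHD can be illustrated with a short sketch. This is not the COHD pipeline itself: the function names (`poisson_sample`, `summarize`), the input layout, the order of the steps (exclude first, then perturb), and the choice to compute frequencies from the perturbed counts are all assumptions made for illustration.

```python
import random
from collections import Counter

MIN_COUNT = 10  # concepts with count <= 10 are excluded, as stated in the text


def poisson_sample(lam, rng):
    """Draw one value from Poisson(lam) by counting exponential
    inter-arrival times that fall within one unit of time."""
    n = 0
    t = rng.expovariate(lam)
    while t < 1.0:
        n += 1
        t += rng.expovariate(lam)
    return n


def summarize(patient_concepts, total_patients, rng):
    """patient_concepts maps a patient ID to the set of OMOP concept IDs
    recorded for that patient (hypothetical input layout)."""
    counts = Counter()
    for concepts in patient_concepts.values():
        counts.update(set(concepts))  # each patient counts once per concept

    result = {}
    for concept_id, count in counts.items():
        if count <= MIN_COUNT:
            continue  # suppressed to protect patient privacy
        # Assumption: the released count is a Poisson draw whose mean is
        # the true count, and the frequency is derived from that draw.
        perturbed = poisson_sample(count, rng)
        result[concept_id] = {
            "count": perturbed,
            "frequency": perturbed / total_patients,
        }
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # 20 toy patients with concept 1177480; 5 of them also have a made-up concept 111.
    patients = {f"p{i}": {1177480} for i in range(20)}
    for i in range(5):
        patients[f"p{i}"].add(111)
    print(summarize(patients, total_patients=20, rng=rng))
```

In this toy run the made-up concept 111 is dropped because only 5 patients have it, while the concept with 20 patients is kept with a randomized count.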
Columbia Open Health Data for COVID (COVID-COHD)
Columbia Open Health Data (COHD) for COVID-19 Research provides access to counts and visit prevalence (i.e., prevalence from electronic health records) of conditions, procedures, drug exposures, and the co-occurrence frequencies between them. Count and frequency data were derived from the Columbia University Irving Medical Center's OHDSI database including inpatient data. Counts are the number of visits with the concept, e.g., diagnosed with a condition, exposed to a drug, or a procedure was performed. Frequencies are the number of visits with the concept divided by the total number of visits in the dataset. Clinical concepts (e.g., conditions, procedures, drugs) are coded by their standard concept ID in the OMOP Common Data Model. To protect patient privacy, all concepts and pairs of concepts where the count ≤ 10 were excluded, and counts were randomized by the Poisson distribution.
Datasets from three primary cohorts are available:
- COVID-19: Hospitalized patients aged 18 or older with a COVID-19 related condition diagnosis and/or a confirmed positive COVID-19 test during their hospitalization period or within the prior 21 days. Date range: March 1, 2020 to September 1, 2020. This cohort is also further stratified by sex (male and female) and age (adult: 18-64, senior: 65+).
- General inpatient: All hospitalized patients aged 18 or older. Date range: January 1, 2014 to December 31, 2019.
- Influenza: Hospitalized patients aged 18 or older who had at least one occurrence of influenza conditions or pre-coordinated positive measurements or positive influenza testing in the prior 21 days or during their hospitalization period. Date range: January 1, 2014 to December 31, 2019.
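The COVID-19 cohort definition above can be read as a simple inclusion rule. The sketch below is one possible reading, not the actual cohort query: the function names, the single list of qualifying event dates (diagnoses or positive tests), and the assumption that the date range applies to the admission date are all hypothetical.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=21)      # "within the prior 21 days"
COHORT_START = date(2020, 3, 1)    # date range given for the COVID-19 cohort
COHORT_END = date(2020, 9, 1)


def in_covid_cohort(age, admit, discharge, event_dates):
    """Return True if a hospitalization qualifies for the COVID-19 cohort.

    event_dates: dates of COVID-19 related diagnoses or confirmed positive
    tests for the patient (hypothetical input layout).
    """
    if age < 18:
        return False
    # Assumption: the cohort date range is applied to the admission date.
    if not (COHORT_START <= admit <= COHORT_END):
        return False
    window_start = admit - LOOKBACK
    return any(window_start <= d <= discharge for d in event_dates)


def age_stratum(age):
    """Age strata from the text: adult is 18-64, senior is 65+."""
    return "adult" if age <= 64 else "senior"
```

For example, a 70-year-old admitted on April 10, 2020 with a positive test on March 25, 2020 would qualify under this reading and fall in the senior stratum.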
Both hierarchical and non-hierarchical datasets are available for each cohort. In the hierarchical datasets, the counts for each concept include the visits from all descendant concepts. For example, the count for ibuprofen (ID 1177480) includes visits with Ibuprofen 600 MG Oral Tablet (ID 19019073), Ibuprofen 400 MG Oral Tablet (ID 19019072), Ibuprofen 20 MG/ML Oral Suspension (ID 19019050), etc.
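The hierarchical roll-up described above can be sketched as follows. This is an illustration only: the function name, the `parent_of` mapping, and the toy visits are hypothetical, and the direct parent links shown here simplify the real OMOP vocabulary hierarchy. The sketch collects distinct visits per concept rather than summing child counts, so a visit that records two descendant concepts is counted only once for the ancestor.

```python
from collections import defaultdict


def hierarchical_counts(visit_concepts, parent_of):
    """Count distinct visits per concept, including visits from descendants.

    visit_concepts: visit ID -> set of concept IDs recorded in that visit.
    parent_of: concept ID -> iterable of parent concept IDs (hypothetical).
    """
    visits_by_concept = defaultdict(set)
    for visit_id, concepts in visit_concepts.items():
        stack = list(concepts)
        seen = set()
        while stack:
            concept = stack.pop()
            if concept in seen:
                continue
            seen.add(concept)
            visits_by_concept[concept].add(visit_id)
            stack.extend(parent_of.get(concept, ()))  # walk up to ancestors
    return {concept: len(visits) for concept, visits in visits_by_concept.items()}


if __name__ == "__main__":
    IBUPROFEN = 1177480
    # Simplified hierarchy: each product points directly at the ingredient.
    parent_of = {
        19019073: [IBUPROFEN],  # Ibuprofen 600 MG Oral Tablet
        19019072: [IBUPROFEN],  # Ibuprofen 400 MG Oral Tablet
        19019050: [IBUPROFEN],  # Ibuprofen 20 MG/ML Oral Suspension
    }
    visits = {
        "v1": {19019073},
        "v2": {19019072},
        "v3": {19019073, 19019050},
        "v4": {IBUPROFEN},
    }
    print(hierarchical_counts(visits, parent_of)[IBUPROFEN])  # prints 4
```

In a non-hierarchical dataset, the same toy data would give ibuprofen a count of 1, since only visit v4 records the ingredient concept directly.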
COVID-19 Collaboration Platform
The COVID-19 Collaboration Platform (covidcp.org) is a method for finding effective treatments for COVID-19 by sharing protocols for randomized clinical trials. This platform is a home for RCT protocols that are available for collaboration.
If protocols are public and open for collaboration, an RCT can be picked up in different regions as the outbreak moves across the country. CovidCP publicizes protocols whose PIs are open to various levels of collaboration: joining forces with other research teams to create a core protocol; admitting new sites under the existing PI and IRB; sharing anonymized interim and/or final data (through Vivli) with other sites that choose to independently operate a trial under a similar protocol.
For more information, email contact@covidcp.org.
Eligibility
This service is available to all researchers, clinical trialists, study team members, students and faculty at Columbia University.
When to Request This Service
This service should be requested at least one month prior to any grant or protocol submission deadlines.
Cost
There is no cost for this service.
Cite it, Submit it, Share it!
If your research has benefited from one or more Irving Institute resources, please remember to:
- Cite our CTSA grant, UL1 TR001873, in any relevant publications, abstracts, chapters, and/or posters.
- Submit your publications to PubMed Central (PMC) for compliance with the NIH Public Access Policy.
- Share your research updates with us by sending an email to: irving_institute@cumc.columbia.edu