Walkthrough of the OMOP CDM (Part 1)

Defining core concepts in the OMOP CDM

Posted on December 30, 2022 • Tags: omop databases health IT ehrs healthcare

In Part 1 of this series on the OMOP CDM, we explain what the OMOP CDM is, define key terms such as “concept”, “source value”, “vocabulary”, and “domain”, and describe how they relate to each other.

This post is organized as follows:

  1. Motivation
  2. Technical Overview of OMOP CDM
  3. Terminology + Key Concepts
  4. Domains, Vocabularies, and Concepts
  5. Field Naming Conventions
  6. Read / Write Permissions

Motivation

Problem: Healthcare Data Lacks Standardization

Healthcare data is messy, and electronic health records (EHR) are notoriously incompatible.

Making health records “interoperable” has been a multi-decade battle.

Almost every EHR has a unique database set-up, and the specific values stored within those databases often differ across hospitals (e.g. some use ICD-10 codes for billing, while others use ICD-9 codes).

As a concrete example, the below image shows five different EHR datasets, where each row represents the same exact type of event being recorded:

[Image: five EHR datasets, each recording the same type of event in a different format]

Note that each database has a different number of columns, column names, and cell values, even though they all encode the same event!

Solution: A “Common Data Model” (CDM)

The OMOP CDM solves this problem by creating a standard format for working with healthcare data.

More specifically, OMOP (Observational Medical Outcomes Partnership) Common Data Model (CDM) is a standardized EHR database schema defined by the OHDSI (Observational Health Data Science & Informatics) consortium.

Let’s see what the five aforementioned tables look like once they’ve been converted into the OMOP CDM format:

[Image: the same five tables after conversion to the OMOP CDM format]

Much cleaner!

This enables research done at one hospital to easily transfer to any other hospital that also uses the OMOP CDM.

OMOP CDM creates a standard schema (“common data model”) and a standard set of medical terms (“common representation”) to improve healthcare data interoperability.

As the OHDSI community page outlines:

“The OMOP Common Data Model allows for the systematic analysis of disparate observational databases [i.e. different hospitals’ EHRs].

The concept behind this approach is to transform data contained within those databases into a common format (data model) as well as a common representation (terminologies, vocabularies, coding schemes), and then perform systematic analyses using a library of standard analytic routines that have been written based on the common format.”

Technical Overview of OMOP CDM

The OMOP CDM is a person-centric, relational database schema that contains 39 tables split over 7 main table categories.

The below image (taken from here) shows the structure of the OMOP CDM.

[Image: overview diagram of the OMOP CDM tables, color-coded by category]

Let’s break down the key terms in the above sentence piece by piece.

39 tables

Each white box in the above image is a unique table. Each table contains a set of fields (i.e. columns) which have a specifically defined datatype, semantic meaning, and relationship to the other tables. The columns for all 39 tables are defined here.

7 main table categories

The tables are categorized into the following main groups:

| Table Category | Color | # of Tables |
| --- | --- | --- |
| Clinical data | Blue | 16 |
| Vocabularies | Orange | 10 |
| Health system | Red | 4 |
| Derived elements | Light purple | 3 |
| Health economics | Green | 2 |
| Metadata | Turquoise | 2 |
| Results | Dark purple | 2 |

The Results tables are the only ones that are writable by end-users.

Person-centric

We take a person-centric view of healthcare, where the individual patient is our primary unit of analysis.

Every table containing clinical events (i.e. everything in the blue Clinical data square above) must be linked to a specific person in the PERSON table.
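As a rough illustration of what “person-centric” means in practice, here is a minimal sketch using Python’s built-in sqlite3 module. The two tables are heavily trimmed stand-ins for PERSON and CONDITION_OCCURRENCE (the real tables have many more columns), and the row values are made up for the example. It shows that a clinical event pointing at a non-existent person is rejected once the foreign key is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Trimmed-down versions of two OMOP tables (assumption: only the columns needed here).
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person (person_id),
    condition_concept_id INTEGER NOT NULL
);
""")

# A clinical event linked to an existing person is accepted.
conn.execute("INSERT INTO person (person_id) VALUES (1)")
conn.execute("INSERT INTO condition_occurrence VALUES (100, 1, 313217)")

# A clinical event for a person that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO condition_occurrence VALUES (101, 999, 313217)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

linked_events = conn.execute(
    "SELECT COUNT(*) FROM condition_occurrence WHERE person_id = 1"
).fetchone()[0]
print(orphan_rejected, linked_events)
```

Whether your own OMOP instance enforces these constraints at the database level depends on how it was deployed, so treat this as a sketch of the intent rather than a guarantee.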

Relational Database Schema

The OMOP CDM is simply a blueprint (i.e. “schema”) for how data should be structured.

OMOP CDM does not require any particular software implementation – any relational database will suffice (e.g. MySQL, Postgres, SQL Server, or Oracle).

A more detailed view of the OMOP schema, with all foreign keys (required in solid black lines, optional in dashed lines) between tables, is shown below (using the alternative grouping described here):

[Image: detailed OMOP CDM schema showing foreign keys between tables]

Terminology + Key Concepts

tl;dr

A “concept” is any sort of entity in your data (diseases, diagnoses, procedures, medical terms, etc.).

The relationship between source values, concepts, domains, and vocabularies is as follows:

  1. A source value is the raw clinical event extracted from the EHR (“011”)
  2. The source value gets transformed into a source concept (“44828631” for ICD9CM/011 for pulmonary tuberculosis), which has associated with it one domain (e.g. “Condition”) and one vocabulary (e.g. “ICD9CM”)
    1. If a source value does not have a corresponding code in an existing vocabulary, it may be defined de-novo as a custom “OMOP” code, and would then just be referred to as a concept (since there’s no “source” vocab for that concept)
  3. If this is a non-standard concept, it then gets transformed into a standard concept (253954 for pulmonary tuberculosis, taken from SNOMED/154283005).

Here, we define each of these terms and underline their relationship to each other.

Source Value

A source value is the raw form in which a clinical fact was represented in the original source EHR.

  • Usage
    • The [EVENT]_SOURCE_VALUE column of tables
  • Example: 011 – This could be ICD9CM code for “Pulmonary Tuberculosis”, the DRG code for “Nervous System Neoplasms without Complications, Comorbidities”, or the UB04 code for “Hospital Inpatient (Including Medicare Part A)”
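To make the ambiguity of a raw source value concrete, here is a small sketch using Python’s sqlite3 module with a cut-down CONCEPT table. The ICD9CM concept ID comes from this post; the DRG and UB04 rows use placeholder concept IDs and simplified vocabulary names that are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept ("
    " concept_id INTEGER PRIMARY KEY,"
    " concept_name TEXT, vocabulary_id TEXT, concept_code TEXT)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (44828631, "Pulmonary tuberculosis", "ICD9CM", "011"),
        # Placeholder IDs and vocabulary names below (hypothetical, not real OMOP values).
        (900000001, "Nervous System Neoplasms without Complications, Comorbidities", "DRG", "011"),
        (900000002, "Hospital Inpatient (Including Medicare Part A)", "UB04", "011"),
    ],
)

# The raw code alone matches several concepts from different vocabularies.
ambiguous = conn.execute(
    "SELECT concept_id FROM concept WHERE concept_code = ?", ("011",)
).fetchall()

# Adding the vocabulary pins it down to a single source concept.
resolved = conn.execute(
    "SELECT concept_id FROM concept WHERE concept_code = ? AND vocabulary_id = ?",
    ("011", "ICD9CM"),
).fetchall()
print(len(ambiguous), resolved)
```

This is why a source value only becomes meaningful once you know which vocabulary it was written in.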

A source value gets transformed into a... source concept

Source Concept

A source concept is an OMOP-specific entity that “normalize[s] the meaning of a clinical fact.” In other words, it is a fully standardized, uniform representation of a clinical fact from the source. (Note: This occurs before conversion to a standard concept, as described below.)

  • Definition
    • Table: CONCEPT
    • Fields (shown in table below)
  • Usage
    • In the [EVENT]_SOURCE_CONCEPT_ID column of tables as foreign keys to the CONCEPT table
  • Example
    • 44828631 – This is unambiguously ICD9CM/011 for “Pulmonary Tuberculosis”, and belongs to the domain “Condition”

| Field | Example Value | Note |
| --- | --- | --- |
| CONCEPT_ID | 313217 | Primary key (globally unique) |
| CONCEPT_NAME | Atrial fibrillation | English description |
| DOMAIN_ID | Condition | Single alphanumeric string for its one domain |
| VOCABULARY_ID | SNOMED | Vocab this concept came from |
| CONCEPT_CLASS_ID | Clinical finding | Class in original vocab (SNOMED) |
| STANDARD_CONCEPT | S | S if standard concept; otherwise NULL (or C for a classification concept) |
| CONCEPT_CODE | 49436004 | Code in original vocab (SNOMED) |
| VALID_START_DATE | 01-Jan-1970 | Start of the time interval in which the code is valid |
| VALID_END_DATE | 31-Dec-2099 | End of the time interval in which the code is valid |
| INVALID_REASON | (empty) | Reason the concept is no longer valid; NULL while it is still valid |
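The fields listed in the table above can be sketched as a table definition. This is a simplified version in Python’s sqlite3: the column names follow the table, but the column types, the NOT NULL constraints, and the ISO date strings are assumptions made for the example rather than the official DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

# Simplified CONCEPT table (assumption: types and constraints are illustrative only).
conn.execute("""
CREATE TABLE concept (
    concept_id       INTEGER PRIMARY KEY,
    concept_name     TEXT NOT NULL,
    domain_id        TEXT NOT NULL,
    vocabulary_id    TEXT NOT NULL,
    concept_class_id TEXT NOT NULL,
    standard_concept TEXT,
    concept_code     TEXT NOT NULL,
    valid_start_date TEXT NOT NULL,
    valid_end_date   TEXT NOT NULL,
    invalid_reason   TEXT
)
""")

# The example row from the table above.
conn.execute(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    (313217, "Atrial fibrillation", "Condition", "SNOMED", "Clinical finding",
     "S", "49436004", "1970-01-01", "2099-12-31", None),
)

row = conn.execute(
    "SELECT * FROM concept WHERE concept_id = ?", (313217,)
).fetchone()
is_standard = row["standard_concept"] == "S"
print(row["concept_name"], row["vocabulary_id"], row["concept_code"], is_standard)
```

In a real deployment you would load the vocabulary files downloaded from Athena rather than inserting rows by hand.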

A source concept gets transformed into a... standard concept (sometimes).

Standard Concept

A standard concept is the single concept chosen for usage across the OMOP CDM to represent a clinical fact, when multiple source concepts from different vocabularies are synonymous. This ensures normalization across analyses.

  • Definition
    • Table: CONCEPT
    • Fields:
      • Have an S value in the column STANDARD_CONCEPT
  • Usage
    • In the [EVENT]_CONCEPT_ID column of tables as foreign keys to the CONCEPT table
  • Example
    • 253954 – This is unambiguously the standard concept for “Pulmonary Tuberculosis.” It comes from the concept SNOMED/154283005, and is used instead of 44828631 (which is the ICD9CM/011 code for pulmonary tuberculosis)

Each concept belongs to a single... domain

Domain

A domain is a collection of concepts. It is a generic classification of the type of a concept (e.g. “Drug”, “Device”, “Measurement”).

  • Domains appear in the DOMAIN_ID column of the CONCEPT table
    • Each domain is a short case-sensitive alphanumeric string
  • One-to-many relationship from a domain -> concepts

Each concept also belongs to a single... vocabulary

Vocabulary

A vocabulary is also a collection of concepts. It represents the source where the corresponding concept was defined (e.g. “ICD9CM”). Thus, concepts in the same vocabulary can span many different domains.

  • Vocabularies appear in the VOCABULARY_ID column of the CONCEPT table
    • Each vocabulary is a short case-sensitive alphanumeric string
  • One-to-many relationship from a vocabulary -> concepts
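The two one-to-many relationships above can be checked with a couple of GROUP BY queries. Below is a minimal sketch in Python’s sqlite3 on a cut-down CONCEPT table. The first three rows reuse concept IDs mentioned in this post; the “Hypothetical procedure” row and its ID are invented purely to show one vocabulary spanning two domains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept ("
    " concept_id INTEGER PRIMARY KEY, concept_name TEXT,"
    " domain_id TEXT NOT NULL, vocabulary_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (313217, "Atrial fibrillation", "Condition", "SNOMED"),
        (253954, "Pulmonary tuberculosis", "Condition", "SNOMED"),
        (44828631, "Pulmonary tuberculosis", "Condition", "ICD9CM"),
        (900000001, "Hypothetical procedure", "Procedure", "SNOMED"),  # made-up row
    ],
)

# One domain -> many concepts.
concepts_per_domain = dict(conn.execute(
    "SELECT domain_id, COUNT(*) FROM concept GROUP BY domain_id"
).fetchall())

# One vocabulary -> many concepts, possibly spread over several domains.
domains_per_vocabulary = dict(conn.execute(
    "SELECT vocabulary_id, COUNT(DISTINCT domain_id) FROM concept GROUP BY vocabulary_id"
).fetchall())

print(concepts_per_domain)
print(domains_per_vocabulary)
```

Because DOMAIN_ID and VOCABULARY_ID are single columns on each CONCEPT row, a concept can never belong to more than one domain or more than one vocabulary.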

Domains, Vocabularies, and Concepts

I’ve included more detail below on domains, vocabularies, and concepts for the curious reader.

Domains

There are a total of 30 domains in OMOP CDM.

They are defined in the DOMAIN table.

  • DOMAIN_ID = primary key – unique alphanumeric string of <=20 chars (e.g. “Drug”, “Device”)
  • Linked to concepts as a foreign key on the DOMAIN_ID field in the CONCEPT table

A list of all domains and their associated concept counts is reproduced below (taken from here):

| Domain | Concept Count | Domain (cont.) | Concept Count (cont.) |
| --- | --- | --- | --- |
| Drug | 1731378 | Route | 183 |
| Device | 477597 | Currency | 180 |
| Procedure | 257000 | Payer | 158 |
| Condition | 163807 | Visit | 123 |
| Observation | 145898 | Cost | 51 |
| Measurement | 89645 | Race | 50 |
| Spec Anatomic Site | 33759 | Plan Stop Reason | 13 |
| Meas Value | 17302 | Plan | 11 |
| Specimen | 1799 | Episode | 6 |
| Provider Specialty | 1215 | Sponsor | 6 |
| Unit | 1046 | Meas Value Operator | 5 |
| Metadata | 944 | Spec Disease Status | 3 |
| Revenue Code | 538 | Gender | 2 |
| Type Concept | 336 | Ethnicity | 2 |
| Relationship | 194 | Observation Type | 1 |

Vocabularies

There are a total of 111 vocabularies currently supported by OHDSI, of which 78 are externally developed and 33 were internally developed by OHDSI.

They are defined in the VOCABULARY table.

  • VOCABULARY_ID = primary key – unique alphanumeric string of <=20 chars (e.g. “SNOMED”, “ICD9CM”)
  • Linked to concepts as a foreign key on the VOCABULARY_ID field in the CONCEPT table

Each domain has one or more standardized vocabularies, which take primacy over all other vocabularies for that domain. All of the concepts within a domain will be mapped to a concept from one of that domain’s standardized vocabularies.

The table below shows the standardized vocabularies for some domains, taken from here:

| Domain | Vocabularies for Standard Concepts |
| --- | --- |
| Condition | SNOMED, ICDO3 |
| Procedure | SNOMED, CPT4, HCPCS, ICD10PCS, ICD9Proc, OPCS4 |
| Measurement | SNOMED, LOINC |
| Drug | RxNorm, RxNorm Extension, CVX |
| Device | SNOMED |
| Observation | SNOMED |
| Visit | CMS Place of Service, ABMT, NUCC |
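As a quick way to work with the table above in code, here is a small Python sketch that stores it as a lookup. The function name is made up for this example, and the check only says whether a vocabulary can supply standard concepts for a domain; whether a specific concept is standard is still decided by its STANDARD_CONCEPT flag.

```python
# Standardized vocabularies per domain, copied from the table above.
STANDARD_VOCABULARIES = {
    "Condition": {"SNOMED", "ICDO3"},
    "Procedure": {"SNOMED", "CPT4", "HCPCS", "ICD10PCS", "ICD9Proc", "OPCS4"},
    "Measurement": {"SNOMED", "LOINC"},
    "Drug": {"RxNorm", "RxNorm Extension", "CVX"},
    "Device": {"SNOMED"},
    "Observation": {"SNOMED"},
    "Visit": {"CMS Place of Service", "ABMT", "NUCC"},
}


def can_hold_standard_concepts(domain_id, vocabulary_id):
    """Return True if the vocabulary is listed as standardized for the domain.

    Hypothetical helper: domains missing from the lookup simply return False.
    """
    return vocabulary_id in STANDARD_VOCABULARIES.get(domain_id, set())


print(can_hold_standard_concepts("Condition", "SNOMED"))
print(can_hold_standard_concepts("Condition", "ICD9CM"))
```

This matches the earlier example: a Condition recorded in ICD9CM has to be mapped to a concept from SNOMED (or ICDO3) before it can be used as a standard concept.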

Concepts

The below image, taken from here, shows the IDs of all concepts in the CONCEPT table:

[Image: concept IDs in the CONCEPT table]

Note: Athena is a useful web GUI provided by OHDSI for searching through all OMOP concepts.

Relationships Between Concepts

Relationships between two concepts can exist across domains/vocabularies.

They are defined in the RELATIONSHIP table.

  • RELATIONSHIP_ID = primary key – unique alphanumeric string of <=20 chars, defines the type of relationship (e.g. “Maps to”, “Equivalent concepts”)
  • Linked to concepts as a foreign key on the RELATIONSHIP_ID field in the CONCEPT_RELATIONSHIP table

Mapping Source Concepts -> OMOP Standard Concepts

Here, we’ll walk through an example taken from the OHDSI training materials.

Goal: We’re given the ICD-9 concept 427.31 (atrial fibrillation). How do we get its corresponding standard OMOP concept?

Steps:

First, find the CONCEPT_ID in the CONCEPT table that corresponds to ICD-9 code 427.31

SELECT * FROM concept WHERE concept_code = '427.31' AND vocabulary_id = 'ICD9CM';

[Image: query result for concept code 427.31]

Here, we see that its corresponding CONCEPT_ID = 44821957

Second, map the CONCEPT_ID to its corresponding standard concept ID using the CONCEPT_RELATIONSHIP table.

Note that the RELATIONSHIP_ID = 'Maps to' maps a non-standard concept (labeled as concept_id_1 below) to its OMOP standard concept (labeled as concept_id_2)

SELECT * FROM concept_relationship WHERE concept_id_1 = 44821957 AND relationship_id = 'Maps to';

[Image: query result from the CONCEPT_RELATIONSHIP table]

That’s it! We now know that the OMOP standard concept ID 313217 corresponds to ICD-9 code 427.31 (atrial fibrillation).
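The two steps above can also be collapsed into a single query by joining CONCEPT to CONCEPT_RELATIONSHIP and back to CONCEPT. Here is a minimal sketch in Python’s sqlite3, with both tables trimmed to the columns needed and populated only with the rows from this walkthrough. The table aliases src and std are just names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT,
    concept_code TEXT,
    standard_concept TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
INSERT INTO concept VALUES (44821957, 'Atrial fibrillation', 'ICD9CM', '427.31', NULL);
INSERT INTO concept VALUES (313217, 'Atrial fibrillation', 'SNOMED', '49436004', 'S');
INSERT INTO concept_relationship VALUES (44821957, 313217, 'Maps to');
""")

# Step 1 (find the source concept) and step 2 (follow 'Maps to') in one query.
standard = conn.execute(
    """
    SELECT std.concept_id, std.concept_name
    FROM concept AS src
    JOIN concept_relationship AS cr
      ON cr.concept_id_1 = src.concept_id
     AND cr.relationship_id = 'Maps to'
    JOIN concept AS std
      ON std.concept_id = cr.concept_id_2
    WHERE src.vocabulary_id = ?
      AND src.concept_code = ?
    """,
    ("ICD9CM", "427.31"),
).fetchone()
print(standard)
```

Filtering on vocabulary_id as well as concept_code matters because the same code string can appear in more than one vocabulary.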

Generalizing:

Note that this process works for every concept and vocabulary.

We started with the ICD-9 code for atrial fibrillation (427.31), but we could also have used the Read code for atrial fibrillation (G573000), or the SNOMED code for atrial fibrillation (49436004).

The magic of OMOP is that all of these concepts will map to the same standard concept ID of 313217, which makes sense since they all refer to the same thing!
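Here is a small sketch of that generalization in Python’s sqlite3: one helper function that takes any (vocabulary, code) pair and follows “Maps to”. Several things are assumptions for the example: the function name, the placeholder concept ID used for the Read code (the real ID is not given in this post), the use of “Read” as the vocabulary ID, and the self-referencing “Maps to” row for the standard concept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    vocabulary_id TEXT,
    concept_code TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
INSERT INTO concept VALUES (44821957, 'ICD9CM', '427.31');
INSERT INTO concept VALUES (900000001, 'Read', 'G573000');  -- placeholder ID (hypothetical)
INSERT INTO concept VALUES (313217, 'SNOMED', '49436004');
INSERT INTO concept_relationship VALUES (44821957, 313217, 'Maps to');
INSERT INTO concept_relationship VALUES (900000001, 313217, 'Maps to');
INSERT INTO concept_relationship VALUES (313217, 313217, 'Maps to');  -- standard concept maps to itself
""")


def to_standard_concept_ids(connection, vocabulary_id, concept_code):
    """Return the standard concept IDs reachable via 'Maps to' (hypothetical helper)."""
    rows = connection.execute(
        """
        SELECT cr.concept_id_2
        FROM concept AS src
        JOIN concept_relationship AS cr
          ON cr.concept_id_1 = src.concept_id
         AND cr.relationship_id = 'Maps to'
        WHERE src.vocabulary_id = ? AND src.concept_code = ?
        ORDER BY cr.concept_id_2
        """,
        (vocabulary_id, concept_code),
    ).fetchall()
    return [concept_id for (concept_id,) in rows]


codes = [("ICD9CM", "427.31"), ("Read", "G573000"), ("SNOMED", "49436004")]
results = {code: to_standard_concept_ids(conn, *code) for code in codes}
print(results)
```

The helper returns a list rather than a single ID because, as discussed in the next section, one source concept can map to more than one concept.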

Mapping Lab Tests

One source concept can map to multiple concepts.

This is often the case for lab tests – the source concept will contain an attribute (e.g. the name of the test) and an associated value (e.g. the test result).

The source concept will have a “Maps to” relationship to the standard concept for the type of lab test, while the value will have a “Maps to Value” relationship to the standard concept for the test result.

The below image illustrates this mapping and is taken from here:

[Image: “Maps to” and “Maps to Value” mappings for a lab test]
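The two relationships described above can be sketched with Python’s sqlite3 as follows. Every concept ID and name in this example is hypothetical, and the relationship string is written as 'Maps to value'; check the exact spelling and casing against the RELATIONSHIP table in your own vocabulary release.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT
);
CREATE TABLE concept_relationship (
    concept_id_1 INTEGER,
    concept_id_2 INTEGER,
    relationship_id TEXT
);
-- All rows below are made up for illustration.
INSERT INTO concept VALUES (900000001, 'Source lab code: test X positive');
INSERT INTO concept VALUES (900000002, 'Standard concept: test X');
INSERT INTO concept VALUES (900000003, 'Standard concept: Positive');
INSERT INTO concept_relationship VALUES (900000001, 900000002, 'Maps to');
INSERT INTO concept_relationship VALUES (900000001, 900000003, 'Maps to value');
""")

# One source concept fans out into a test concept and a value concept.
targets = dict(conn.execute(
    """
    SELECT cr.relationship_id, c.concept_name
    FROM concept_relationship AS cr
    JOIN concept AS c ON c.concept_id = cr.concept_id_2
    WHERE cr.concept_id_1 = ?
    """,
    (900000001,),
).fetchall())
print(targets)
```

The single source concept ends up contributing both the test and its result, each resolved through its own relationship.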

Field Naming Conventions

OMOP tables follow a standardized naming convention for some of their columns. This makes it simple to understand the relationships between tables.

The table below is taken from here and defines the meaning of columns in a given table named [Event]:

| Notation | Description |
| --- | --- |
| [Event]_ID | Unique primary key. Usage: serves as a foreign key in other Event tables. Example: PERSON_ID for patients, VISIT_OCCURRENCE_ID for visits |
| [Event]_CONCEPT_ID | Foreign key to a Standard Concept in the CONCEPT table. Usage: maps the event to a normalized, unambiguous representation. Example: CONDITION_CONCEPT_ID = 31967 references the SNOMED concept “Nausea” |
| [Event]_SOURCE_CONCEPT_ID | Foreign key to a Concept in the CONCEPT table. This is the concept equivalent of the Source Value, and it may happen to be a Standard Concept, in which case it would be identical to the [Event]_CONCEPT_ID. Example: CONDITION_SOURCE_CONCEPT_ID = 45431665 references the Read concept “Nausea” |
| [Event]_TYPE_CONCEPT_ID | Foreign key to a Concept in the CONCEPT table. Represents the capture mechanism that created this record. Example: DRUG_TYPE_CONCEPT_ID = "Pharmacy dispensing" if Drug was derived from a dispensing in the pharmacy, or "Prescription written" if derived from e-prescribing |
| [Event]_SOURCE_VALUE | Verbatim code or free text of this record in the source data. Example: CONDITION_SOURCE_VALUE = 78702 might correspond to ICD-9 code “787.02” written without a dot |
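To see the naming convention at work on a single record, here is a minimal sketch in Python’s sqlite3 using a trimmed CONDITION_OCCURRENCE table. The two “Nausea” concept IDs come from the table above; the type concept row, the person ID, and the source value string are placeholders invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    concept_name TEXT,
    vocabulary_id TEXT
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,   -- [Event]_ID
    person_id INTEGER,
    condition_concept_id INTEGER,                  -- [Event]_CONCEPT_ID
    condition_source_concept_id INTEGER,           -- [Event]_SOURCE_CONCEPT_ID
    condition_type_concept_id INTEGER,             -- [Event]_TYPE_CONCEPT_ID
    condition_source_value TEXT                    -- [Event]_SOURCE_VALUE
);
INSERT INTO concept VALUES (31967, 'Nausea', 'SNOMED');
INSERT INTO concept VALUES (45431665, 'Nausea', 'Read');
INSERT INTO concept VALUES (900000001, 'Hypothetical capture mechanism', 'Hypothetical');
INSERT INTO condition_occurrence
VALUES (1, 1, 31967, 45431665, 900000001, '<raw Read code>');
""")

# Resolve each *_CONCEPT_ID column against CONCEPT; the source value stays verbatim.
record = conn.execute(
    """
    SELECT std.concept_name, std.vocabulary_id,
           src.concept_name, src.vocabulary_id,
           co.condition_source_value
    FROM condition_occurrence AS co
    JOIN concept AS std ON std.concept_id = co.condition_concept_id
    JOIN concept AS src ON src.concept_id = co.condition_source_concept_id
    WHERE co.condition_occurrence_id = ?
    """,
    (1,),
).fetchone()
print(record)
```

Analyses would normally filter on the standard concept column, while the source columns are kept so the original coding can still be traced.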

Read / Write Permissions

End-users can only write to the COHORT and COHORT_DEFINITION tables (from the Results category).

All of the other OMOP tables are read-only.

The only user that can write to these other tables is the ETL pipeline that the hospital uses to extract data out of their EHR and into OMOP.

References

  1. Book of OHDSI
  2. OMOP Common Data Model Docs
  3. OMOP CDM Tutorial