ehrQL: electronic health record query language

Danger

This page discusses the new OpenSAFELY Data Builder for accessing OpenSAFELY data sources.

Use OpenSAFELY cohort-extractor, unless you are specifically involved in the development or testing of Data Builder.

OpenSAFELY Data Builder and its documentation are still undergoing extensive development. We will announce when Data Builder is ready for general use on the Platform News page.

ehrQL, or electronic health record query language, is the language used by Data Builder to extract datasets for analysis. It was developed with two aims: to help researchers write simple, unambiguous queries, and to let them write queries that ehrQL's developers did not anticipate.

The first stage of carrying out research with the OpenSAFELY platform is for the researcher to write a dataset definition in ehrQL. A dataset definition specifies the criteria for selecting the population and the characteristics to report for each patient in it. Data Builder uses the dataset definition to extract a dataset from an EHR system, where a dataset is a tabular data structure with one row per patient and one column per characteristic.

The following dataset definition, which is written in ehrQL, is from the tutorial. It specifies a single criterion for selecting the population; namely, that the population should consist of all patients who were born in or after 2000. It also specifies a single characteristic of the population; namely, each patient's year of birth.

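from databuilder.ehrql import Dataset
from databuilder.tables.beta.smoketest import patients

# Each patient's year of birth, derived from their date of birth.
year_of_birth = patients.date_of_birth.year

dataset = Dataset()
# Selection criterion: include only patients born in or after 2000.
dataset.define_population(year_of_birth >= 2000)
# Characteristic: one output column holding each patient's year of birth.
dataset.year_of_birth = year_of_birth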
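
To make the shape of the extracted dataset concrete, the following is a minimal plain-Python sketch of what the dataset definition above asks for. It does not use Data Builder and is not a description of how Data Builder works internally. The `toy_patients` records, the `extract_dataset` function, and the `patient_id` column are all hypothetical, invented here for illustration only.

```python
from datetime import date

# Hypothetical toy records; real data would come from an EHR system.
toy_patients = [
    {"patient_id": 1, "date_of_birth": date(1985, 6, 1)},
    {"patient_id": 2, "date_of_birth": date(2000, 1, 1)},
    {"patient_id": 3, "date_of_birth": date(2012, 9, 1)},
]


def extract_dataset(patients):
    """Return one row per selected patient and one column per characteristic."""
    rows = []
    for patient in patients:
        year_of_birth = patient["date_of_birth"].year
        # Selection criterion: born in or after 2000.
        if year_of_birth >= 2000:
            rows.append(
                {
                    "patient_id": patient["patient_id"],
                    "year_of_birth": year_of_birth,
                }
            )
    return rows


rows = extract_dataset(toy_patients)
for row in rows:
    print(row)
```

In this sketch, the patient born in 1985 is excluded by the selection criterion, and each remaining patient contributes exactly one row with a single `year_of_birth` characteristic.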

You can find out more about ehrQL in the tutorial and the examples. For in-depth technical documentation, there is also the reference.
