Cohort Discovery

Cohort Discovery is a Gateway feature that allows users to carry out a more specific search and assessment on a subset of datasets listed in the Gateway, without having access to the underlying data.

Users can specify characteristics relevant to their proposed analysis (e.g. non-smoking patients aged 18 to 30 who live in England) through the Cohort Discovery user interface. These criteria are then sent as a query to multiple pseudonymised datasets, and the results are returned as a numerical count of the individuals who meet them.

Undertaking Cohort Discovery before requesting access to the relevant data improves research productivity: it avoids requests for data that does not suit the research need, and it makes later data access requests more specific once a cohort of sufficient size has been identified.


About Cohort Discovery

Cohort Discovery has the potential to save researchers time in finding datasets that are suitable for their research, and to save data custodians time by reducing enquiries about the content of the datasets they hold.

Cohort Discovery sends a query to run against pseudonymised (de-identified) datasets that are hosted and managed by a data custodian and remain behind that custodian's firewall. The query looks for matches to a set of characteristics defined by the user, and each dataset returns a numerical response (count) showing the number of people in it who meet the selected characteristics.
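As a rough illustration of the idea, the sketch below runs one set of criteria against two small made-up datasets and returns only a count per dataset. It is not the Cohort Discovery implementation: the Person record, the field names, the dataset names and the sample data are all hypothetical, and the real query runs behind each data custodian's firewall.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Person:
    """Hypothetical pseudonymised record; the field names are assumptions."""
    age: int
    smoker: bool
    country: str


def non_smokers_18_to_30_in_england(person: Person) -> bool:
    """Example criteria: does not smoke, aged 18 to 30, lives in England."""
    return (not person.smoker) and 18 <= person.age <= 30 and person.country == "England"


def run_query(
    datasets: Dict[str, List[Person]],
    criteria: Callable[[Person], bool],
) -> Dict[str, int]:
    """Return one count per dataset; no record-level data leaves this function."""
    return {
        name: sum(1 for person in records if criteria(person))
        for name, records in datasets.items()
    }


# Made-up sample data, for illustration only.
datasets = {
    "dataset_a": [
        Person(25, False, "England"),
        Person(40, False, "England"),
        Person(19, True, "England"),
    ],
    "dataset_b": [
        Person(18, False, "England"),
        Person(30, False, "England"),
        Person(22, False, "Wales"),
    ],
}

counts = run_query(datasets, non_smokers_18_to_30_in_england)
```

Here counts holds a single number for each dataset name, which is the only information a user of this sketch would see.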

Researchers can then see whether a dataset contains a cohort (group) of interest and, if so, submit data access request(s) to the appropriate data custodian(s).

Each data custodian has statistical disclosure control policies in place, so low numbers are excluded from query results and results may also be rounded to guard against any potential risk of identification.
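A minimal sketch of how low-number suppression and rounding could work is shown below. The threshold of 10 and the rounding base of 5 are assumptions chosen for illustration only; each data custodian sets its own policy, and the function name is hypothetical.

```python
from typing import Optional


def apply_disclosure_control(
    count: int,
    threshold: int = 10,      # assumed value; set by each custodian's policy
    rounding_base: int = 5,   # assumed value; set by each custodian's policy
) -> Optional[int]:
    """Suppress low counts and round the rest to the nearest multiple of rounding_base.

    Returns None when the count is below the threshold, meaning the result
    is excluded from what the user sees.
    """
    if count < threshold:
        return None
    # Integer arithmetic avoids floating-point rounding surprises.
    return ((count + rounding_base // 2) // rounding_base) * rounding_base


# Illustrative raw counts mapped to the values a user would see.
raw_counts = [7, 12, 13, 48]
reported = [apply_disclosure_control(c) for c in raw_counts]
```

With these assumed settings, a raw count of 7 is withheld, while 12, 13 and 48 are reported as 10, 15 and 50.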

This functionality has been developed as part of the CO-CONNECT programme and further information can be found on their website.

The summary metadata of the current datasets that are available to query via Cohort Discovery can be found in this Collection on the Gateway.

How you can request access to Cohort Discovery

In line with the UK Health Data Research Alliance principles for participation, we use a proportionate governance approach based on the Five Safes Framework. For access to Cohort Discovery, we focus on Safe People and Safe Projects, because Safe Setting, Safe Data and Safe Outputs are managed by the data and technology partners.

To access Cohort Discovery you must demonstrate your Safe People status as a researcher, NHS analyst or equivalent. This will be assessed from your Gateway registered user profile, including your institutional email address, role description and ORCID record.

Please ensure you are a registered user of the Gateway and that your profile is up to date and includes your institutional email address and role description before you submit a request to access Cohort Discovery.

To satisfy a proportionate assessment of Safe Project, you will also need to explain why you are requesting access. Your explanation will be reviewed to ensure there is potential for public benefit. Access, if granted, will be for a period of 6 months, after which you will need to renew.

If, after your application, your Safe People status or the potential public benefit remains unclear, we will contact you for further information. We reserve the right not to provide access.

Sign up / Sign in to the Gateway.

If your request to access Cohort Discovery has been approved, you can access the tool by navigating to the dataset search page.

Scroll down past the filters on the left of the page until the Cohort Discovery component is in view.

Click on the green button “Search using Cohort Discovery”.