Home
This Wiki describes how the Ontario Laboratories Information System (OLIS) Coronavirus Disease 2019 (COVID-19) laboratory results data are received, processed, used and interpreted at ICES – an independent, non-profit research institute in Ontario, Canada. OLIS is the provincial repository of laboratory tests and results, and is provided to ICES by eHealth Ontario (now part of Ontario Health Digital Services, OHDS).
COVID-19 is a highly infectious disease caused by the SARS-CoV-2 virus. It was first reported in China in December 2019, and the first known case in Ontario was on January 25, 2020. Coronaviruses are transmitted through respiratory droplets and cause respiratory infections. Symptoms can range from mild fever and cough to more severe breathing difficulties, pneumonia, and death. COVID-19 infection is confirmed through laboratory testing: an upper or lower respiratory tract specimen (e.g., a nasopharyngeal swab) is collected and a real-time polymerase chain reaction (PCR) test is performed. For more information on COVID-19, refer to the Public Health Agency of Canada website – Coronavirus disease (COVID-19). All Ontario labs reporting COVID-19 tests must adhere to reporting guidelines set out by OHDS, including ensuring that all results are reported to OLIS[1].
Since March 2020, the ICES OLIS COVID-19 Data Working Group (DWG), in collaboration with teams from the Health Analytics and Insights and Health Data Science branches of Ontario’s Ministry of Health (MOH), has worked swiftly to understand, interpret, and validate the OLIS COVID-19 data so that they can be used for reporting and research to address immediate public health and health system questions.
On April 7th, 2020, ICES began receiving daily feeds from OHDS of COVID-19 test orders recorded in OLIS. The data received are a minimum dataset extracted from lab orders with COVID-19-specific test request (TR) codes or Logical Observation Identifiers Names and Codes (LOINC), and other TR/LOINC codes indicative of viral or respiratory virus testing. See section iv) Codes, scale or range of values for the full list of TR and LOINC codes used.
The goal was to create an efficient method to accurately interpret large amounts of incoming COVID-19 data so that they can be used for research as rapidly as possible. A set of scripts was created at ICES to process the original OHDS minimum dataset into two research-ready datasets by: 1) parsing lab results pertaining to SARS-CoV-2 and other respiratory viruses, and 2) rolling up the COVID-19 lab results into more clinically relevant “testing episodes”. We used Jupyter Notebook (with the Python libraries pandas, numpy, nltk, and re) and designed the algorithms based on laboratory data from OLIS, but respiratory virus test results are likely to be reported in a similar fashion elsewhere.
The original minimum dataset received from OHDS contains identifiers that allow for linkage (e.g., patient IDs and order IDs) as well as test result information such as lab names, TR codes, LOINC codes, test result release times, test result statuses, and free-text test results. The free-text test result is the main focus of the cleaning performed by the COVID19_processing.ipynb script developed by ICES. This script first cleans the text field using string manipulation and regular expressions, then uses tokenization to split the strings into smaller units (tokens).
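A minimal sketch of the clean-then-tokenize idea is shown below. It is not the ICES COVID19_processing.ipynb code: the function names `clean_result_text` and `tokenize` and the specific normalization rules are hypothetical, and it uses `re` and `str.split` instead of nltk's `word_tokenize` so that it runs with the standard library alone.

```python
import re


def clean_result_text(text: str) -> str:
    """Normalize a free-text result: lowercase, unify a target name, strip punctuation."""
    text = text.lower()
    # Hypothetical normalization: collapse spellings like "SARS-CoV-2" / "sars cov 2"
    text = re.sub(r"sars[\s-]*cov[\s-]*2", "sarscov2", text)
    # Replace anything that is not a letter, digit or space with a space
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Collapse runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split a cleaned result string into word tokens."""
    return clean_result_text(text).split()
```

For example, `tokenize("SARS-CoV-2: NOT Detected.")` returns `['sarscov2', 'not', 'detected']`. Normalizing multi-word target names before splitting keeps them as a single token, which simplifies the keyword matching that follows.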
The script assigns test results for COVID-19 and other respiratory viruses at the observation level and appends them to the original minimum dataset. The original free-text test results and other identifying variables are removed before this dataset is posted.
The following variables are appended to the original minimum dataset:
- covid – SARS-CoV-2
- adenovirus – Adenovirus
- bocavirus – Bocavirus
- coronavirus – Seasonal coronavirus, which is distinct from COVID-19
- flu – Influenza
- flu_a – Influenza A
- flu_a_h1 – Influenza A/H1N1
- flu_a_h3 – Influenza A/H3N2
- flu_b – Influenza B
- entero_rhino – Enterovirus / Rhinovirus
- hmv – Human metapneumovirus
- para – Parainfluenza
For each virus, one of the following values is assigned: Positive [P], Presumptive-positive [S], Indeterminate [I], Negative [N], Pending [D], Cancelled [C], or Rejected [R]. If the value is null, the individual is assumed not to have been tested for that specific virus on that date.
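Mapping cleaned text to these codes can be sketched with ordered keyword rules. The `RULES` list and `classify` function below are hypothetical and much simpler than the lab-specific logic in the ICES script; they assume text already lowercased with single spaces. Rule order is the key design choice: negative and presumptive patterns are checked before the generic "detected/positive" pattern, so "not detected" is never read as positive.

```python
import re

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    (r"\b(cancelled|canceled)\b", "C"),
    (r"\b(rejected|invalid)\b", "R"),
    (r"\bpending\b", "D"),
    (r"\bnot detected\b|\bnegative\b", "N"),  # must precede the "detected" rule
    (r"\bindeterminate\b|\binconclusive\b", "I"),
    (r"\bpresumptive\b", "S"),                 # must precede the "positive" rule
    (r"\bdetected\b|\bpositive\b", "P"),
]


def classify(cleaned_text: str):
    """Return a one-letter result code, or None when no rule matches (treated as not tested)."""
    for pattern, code in RULES:
        if re.search(pattern, cleaned_text):
            return code
    return None
```

With these rules, `classify("sarscov2 not detected")` gives `"N"` and `classify("presumptive positive")` gives `"S"`.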
The analysis here is applied at the TEST RESULT level, and each observation that is part of the test result will have the same virus interpretations. Before any downstream analysis, the output file of this script should be "rolled up" into more clinically relevant units of analysis (e.g., test result -> test request -> lab report -> testing episode). See Step 2 below.
This script is modified and refined daily as new data come in and as COVID-19 testing expands to more laboratories. Please refer to the Overview section of the script for a more detailed description.
The recommended analytical dataset is derived by aggregating information from the minimum dataset (created in Step 1) by unique individual and testing date (i.e., testing episode) to represent the overall COVID-19 testing result status for each tested individual on each day, using the COVID19_rollup.ipynb script created by ICES.
Table 1. Variables included in the recommended analytical dataset.
Variable | Label |
---|---|
prov | Province based on the postal code from OLIS record |
pstlcode | Postal code from OLIS record |
bdate | Patient Date of Birth from OLIS record |
sex | Patient Sex from OLIS record |
ikn | ICES Key Number |
valikn | Valid IKN? |
observationdate | Specimen collection date |
observationreleasedate | Result release date (the most recent observationreleasedate is selected among the ORDERSIDs with the same final COVIDRESULT) |
ordersid1-n | This ordersid variable can be linked back to the original dataset to get additional laboratory information (e.g., performing/reporting labs, specimen source [e.g., NASOPHARYNGEAL, THROAT], TR/LOINC codes) if needed. If there is >1 ordersid on the same observation date, each will be listed from 1-n |
numordersid | Total # ORDERSIDs for this OBSERVATIONDATE (specimen collection date) |
covidtest | Is it a COVID test (T/F)? T if COVIDRESULT = Positive/Presumptive, Indeterminate, Negative, or Pending; F if COVIDRESULT = Cancelled or Rejected/Invalid |
covidresult | COVID19 test result assigned at the testing episode-level based on the hierarchy of results among all ordersid on the same date (Positive/Presumptive > Indeterminate > Negative > Pending > Cancelled > Rejected). |
final_result_ordersids | All ORDERSIDs within the same testing episode that have the same final COVIDRESULT (assigned after roll-up) |
The COVID19_rollup.ipynb script rolls up interpretations from test results into "testing episodes", which we define as each unique combination of patientid and observationdate (i.e., specimen collection date). We created hierarchies so that the most relevant COVID-related test results take priority in the roll-up. Please refer to the Overview section of the script for a more detailed description.
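The core of the roll-up can be sketched in pandas using the result hierarchy and variable definitions from Table 1. This is a simplified illustration, not the COVID19_rollup.ipynb logic: it assumes an input with one row per ordersid whose covidresult is already one of the hierarchy labels, uses ikn as the person key, and returns final_result_ordersids as a list rather than the wide ordersid1-n columns. `roll_up` is a hypothetical name.

```python
import pandas as pd

# Result hierarchy from Table 1, highest priority first.
HIERARCHY = ["Positive/Presumptive", "Indeterminate", "Negative",
             "Pending", "Cancelled", "Rejected"]
RANK = {result: rank for rank, result in enumerate(HIERARCHY)}
EPISODE = ["ikn", "observationdate"]  # one testing episode = person + collection date


def roll_up(orders: pd.DataFrame) -> pd.DataFrame:
    """Collapse order-level COVID-19 results into one row per testing episode."""
    df = orders.assign(rank=orders["covidresult"].map(RANK))
    # Lowest rank number = highest-priority result within the episode
    df["best_rank"] = df.groupby(EPISODE)["rank"].transform("min")
    winners = df[df["rank"] == df["best_rank"]].sort_values("ordersid")

    episodes = (
        winners.groupby(EPISODE)
        .agg(
            covidresult=("covidresult", "first"),
            # Most recent release date among orders sharing the final result
            observationreleasedate=("observationreleasedate", "max"),
            final_result_ordersids=("ordersid", list),
        )
        .reset_index()
    )
    # Total number of orders on the collection date, whatever their result
    counts = df.groupby(EPISODE).size().rename("numordersid").reset_index()
    episodes = episodes.merge(counts, on=EPISODE)
    # T for completed or pending tests; F for Cancelled or Rejected
    episodes["covidtest"] = (
        episodes["covidresult"].isin(HIERARCHY[:4]).map({True: "T", False: "F"})
    )
    return episodes
```

Ranking results numerically turns "pick the highest-priority result" into a per-group minimum, and keeping every order that ties on that minimum yields final_result_ordersids and the latest matching release date.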
The following is a list of COVID-19-specific TR or LOINC codes and other TR/LOINC codes indicative of respiratory virus testing (Table 2). These codes, together with keywords such as “COVID”, “SARS-CoV-2”, “Novel coronavirus”, or “nCOV”, and microorganism SNOMED codes (840533007 [SARS-CoV-2], 168209000 [No Virus Identified]), were used to define the data pull criteria from OLIS.
Table 2. Test Request (TR) codes and Logical Observation Identifiers Names and Codes (LOINC) indicative of respiratory virus testing.
TR or LOINC code | Description |
---|---|
TR12936-1 | 2019 Novel Coronavirus PCR |
TR12937-9 | 2019 Novel coronavirus RNA panel |
41461-5 | VIRUS IDENTIFIED:PRID:PT:XXX:NOM |
94314-2 | 2019 Novel coronavirus RdRp gene:PrThr:Pt:XXX:Ord:Probe.amp.tar |
94315-9 | 2019 Novel coronavirus E gene:PrThr:Pt:XXX:Ord:Probe.amp.tar |
94316-7 | 2019 Novel coronavirus N gene:PrThr:Pt:XXX:Ord:Probe.amp.tar |
XON10842-3 | INTERPRETATION:IMP:Pt:XXX:NAR |
XON13512-9 | Coronavirus RdRp gene:PrThr:Pt:XXX:Ord:probe.amp.tar |
XON13527-7 | XON13527-7 / COVID-19 virus PCR Interpretation:Imp:PT:XXX:Nar |
XON13528-5 | SARS coronavirus 1:PrThr:Pt:XXX:Ord:Probe.amp.tar |
XON13529-3 | SARS coronavirus 2 ORF1ab:PrThr:Pt:XXX:Ord:Probe.amp.tar |
XON13531-9 | SARS coronavirus 2 S gene RNA:PrThr:Pt:XXX:Ord:Probe.amp.tar |
These criteria were developed by OHDS, in consultation with ICES and the Ontario MOH, after review of early data feeds. Standardized reporting of COVID-19 results is encouraged, but variability between laboratories has been observed. The DWG monitors and communicates changes as they are identified.
To quantify the degree of OLIS under-reporting, ICES determined the overlap between OLIS and other sources of COVID-19 laboratory test data.
Methodology:
Other sources of COVID-19 laboratory test data were integrated to create a comprehensive (‘gold standard’) dataset, including:
- Distributed Testing Labs (DL): Testing data from labs within the COVID-19 Diagnostic Network (only up to April 13, 2020).
- Public Health Ontario (PHO) Labware: All COVID-19 tests performed by a PHO laboratory (only up to May 21, 2020).
- Integrated Public Health Information System (iPHIS): Confirmed COVID-19 cases with additional risk factor and outcome information (cumulative dataset received daily) from all Ontario Public Health Units. This is the case data reported by the Ontario MOH.
- OLIS.
Challenges included:
- no common identifier for tests or specimens across all 4 datasets;
- dates are used differently in each dataset (e.g., specimen collection date [OLIS] vs. login date [DL, Labware]);
- iPHIS is an individual-level dataset containing only positive cases, whereas the others are at the testing-episode level (positive and negative).
To avoid double counting testing episodes, we identified the “same” testing event across datasets by matching on dates (±3 days); results for the “same” event were >99% concordant between datasets. To avoid double counting positive cases, we identified the “closest” testing episode for each iPHIS case, using the specimen date if available, otherwise the collection date, otherwise the case reported date. The relative contributions of each dataset were compared after accounting for overlap. Cases created in iPHIS underwent deterministic and probabilistic linkage with the Ontario MOH’s Registered Persons Database (RPDB), which contains encrypted health card numbers and demographic information for people ever registered with the Ontario Health Insurance Plan.
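The two matching steps can be sketched with pandas, under loudly stated assumptions: records from every source are already linked to a common person identifier (ikn), and each source has been reduced to a hypothetical `event_date` column (for iPHIS, built from the date fallback described above). The column names and helper functions are illustrative and are not the ICES linkage code. `pd.merge_asof` with `direction="nearest"` and a `tolerance` finds, per person, the closest event in the other source within the window.

```python
import pandas as pd


def iphis_reference_date(cases: pd.DataFrame) -> pd.Series:
    """Specimen date if present, else collection date, else case reported date."""
    return (
        cases["specimen_date"]
        .fillna(cases["collection_date"])
        .fillna(cases["reported_date"])
    )


def match_events(left: pd.DataFrame, right: pd.DataFrame, days: int = 3) -> pd.DataFrame:
    """Attach to each left-hand event the nearest right-hand event for the same ikn
    within +/- `days`; unmatched rows get NaT in `matched_date`."""
    left = left.sort_values("event_date")
    right = right.rename(columns={"event_date": "matched_date"}).sort_values("matched_date")
    return pd.merge_asof(
        left,
        right,
        left_on="event_date",
        right_on="matched_date",
        by="ikn",
        direction="nearest",
        tolerance=pd.Timedelta(days=days),
    )
```

Rows with a non-missing `matched_date` would be treated as the “same” testing event and counted once.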
Results:
Figure 1. Comparison of OLIS and iPHIS data up to August 26th, PHO Labware data up to May 21st, and DL data up to April 13th, showing the relative contribution of each dataset after accounting for overlap. At the time of this analysis, 10,616, 5,366, and 2,645 records from DL, PHO Labware, and iPHIS, respectively, were unlinked. Cases created in iPHIS up to August 23rd underwent deterministic and probabilistic linkage.
Figure 2. Cases in iPHIS as of August 26th (n=41,813) by diagnosing public health unit.
The number of individuals who tested positive in OLIS is around 90% of the number of cases reported on the Ontario MOH website (i.e., iPHIS). Under-reporting may be due to a number of factors, including:
- early tests were performed at the National Microbiology Laboratory (in addition to PHO);
- some hospital labs did not contribute all or part of their data to OLIS (e.g., the Hospital for Sick Children performed tests in March but was not a reporting lab in OLIS until April 20, so OLIS likely underestimates the number of children tested);
- changes in responsibility for reporting lab results into OLIS (e.g., miscommunication between the ordering facility and the performing laboratory about who reports the final result to OLIS);
- unconsented records (~500 individuals);
- lab requisitions entered into OLIS without a health card number (HCN) or a medical record number (MRN) that would permit linkage.
OHDS links HCNs to test results submitted to OLIS with an MRN where possible; however, as of July 31, 2020, OLIS records for ~4,000 individuals were not in the ICES data feed because of unresolved HCNs.
Start date
We recommend using the following criteria to analyze COVID-19 episodes with a valid result:
- ObservationDate ≥ January 15, 2020
- covidtest = ‘T’
- valikn = ‘V’
NOTE: In the ICES OLIS data, the first positive COVID-19 case was detected in mid-February 2020. Earlier positive cases are suspected either not to have been reported in OLIS or, if reported, to have arrived as faxed documents, which ICES currently does not have access to. Tests dated before January 2020 are likely data entry errors in the ObservationDate field. ICES did not modify any dates from the original OLIS feed.
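The three recommended criteria can be applied to the recommended analytical dataset as a single boolean filter. This minimal pandas sketch assumes the lowercase column names from Table 1 (observationdate, covidtest, valikn) with observationdate already parsed as a datetime; `valid_episodes` is a hypothetical helper name.

```python
import pandas as pd


def valid_episodes(episodes: pd.DataFrame) -> pd.DataFrame:
    """Keep testing episodes that meet the recommended start-date and validity criteria."""
    mask = (
        (episodes["observationdate"] >= pd.Timestamp("2020-01-15"))  # drop likely date-entry errors
        & (episodes["covidtest"] == "T")  # exclude Cancelled / Rejected episodes
        & (episodes["valikn"] == "V")     # keep only valid IKNs for linkage
    )
    return episodes.loc[mask]
```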
Reported date
The reported date (ObservationReleaseDate) is when the COVID-19 result was completed and released. Each daily dataset contains nearly all results reported up to the day before the dataset was received. The time between specimen collection and the reported date varies. Which date to use as the index date depends on the research question. For example, the specimen collection date (ObservationDate) is closest to the time of infection, whereas the reported date is when the result was “finalized” and reported to public health units for case management follow-up.
Unit of analysis
Each row in the recommended analytic dataset is a testing episode. When counting testing episodes per day, interpret the count as “Number of unique individuals tested per day”, not “Number of tests completed”. Individuals can have more than one testing episode (multiple rows) over time (e.g., Day 1 = Negative; Day 10 = re-tested and positive). Depending on the research objective, you may need to select one testing episode per individual as your index.
- E.g., to compare individuals ever tested positive vs. negative, choose the first positive testing episode.
- E.g., to study health care use after COVID-19 testing, choose the earliest testing episode.
Postal code
The pstlcode recorded in OLIS on a testing episode date might not be the same as the pstlcode in the most recent Ontario MOH's RPDB. The address information in OLIS is the patient’s address (“whatever was recorded by the lab”). Based on our current knowledge, it is unclear whether the address can represent other locations (e.g., setting of test, workplace).
Concurrent testing for other respiratory viruses
In some hospitals, or for some types of patients (e.g., inpatients), other respiratory viruses are tested for at the same time as COVID-19, as part of the same lab order or test request. Where possible, the Python script assigned values for each respiratory virus. These variables are only available in the minimum dataset. If it is important to know which other respiratory viruses were tested for on the same testing episode date, the ordersid(s) of the testing episode can be linked back to the minimum dataset. However, test requests for other respiratory viruses that were part of the same ordersid or testing episode as a COVID-19 test may not have been included in the data cut from OHDS. Thus, the absence of results for other respiratory viruses does not imply that the individual tested negative for them.
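Linking back means reshaping the wide ordersid1-n columns into one row per ordersid and joining on ordersid. A minimal pandas sketch, assuming the minimum dataset has an `ordersid` column alongside the virus variables (`link_to_minimum` is a hypothetical name). A missing virus value after the join should still not be read as negative.

```python
import re

import pandas as pd


def link_to_minimum(episodes: pd.DataFrame, minimum: pd.DataFrame) -> pd.DataFrame:
    """One row per (testing episode, ordersid, minimum-dataset observation)."""
    # Only the numbered columns ordersid1..ordersidN, not numordersid or final_result_ordersids
    id_cols = [c for c in episodes.columns if re.fullmatch(r"ordersid\d+", c)]
    long = episodes.melt(
        id_vars=["ikn", "observationdate"], value_vars=id_cols, value_name="ordersid"
    ).dropna(subset=["ordersid"])
    # Unused ordersidN slots are NaN, which turns the column float; restore the key dtype
    long["ordersid"] = long["ordersid"].astype(minimum["ordersid"].dtype)
    return long.drop(columns="variable").merge(minimum, on="ordersid", how="left")
```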
Location of specimen collection
The lab that performed the test and generated the result (PerformingLabOrgName) and the organization that reported the results into OLIS (ReportingLabOrgName) in the minimum dataset may not be accurate proxies for where the specimen was collected. Other available variables include OrderingFacilityOrgName and SpecimenCollectionOrgName; however, these values are missing for a high proportion of observations.
Reporting delay
Because of the time required to transport and process specimens, it takes up to 6 days for approximately 95% of results for a given testing date to be finalized and reported. In other words, the longer the interval between the specimen collection date (observationdate) and the date the dataset was received, the more likely it is that all results for that specimen collection date are present.
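Two quick checks follow from this: the empirical turnaround time (release date minus collection date) and whether a collection date is old enough, relative to the date the dataset was received, to be considered mostly complete. This sketch assumes the Table 1 column names; the helper names and the `received` parameter are hypothetical, and the 6-day default is the approximate figure quoted here, which is worth re-estimating from the current data rather than treating as fixed.

```python
import pandas as pd


def turnaround_quantile(episodes: pd.DataFrame, q: float = 0.95) -> float:
    """Quantile of days between specimen collection and result release."""
    days = (episodes["observationreleasedate"] - episodes["observationdate"]).dt.days
    return days.quantile(q)


def likely_complete(episodes: pd.DataFrame, received: str, lag_days: int = 6) -> pd.Series:
    """True where the collection date is at least `lag_days` before the dataset was received."""
    cutoff = pd.Timestamp(received) - pd.Timedelta(days=lag_days)
    return episodes["observationdate"] <= cutoff
```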
Uncertain how to define unique “tests” in OLIS
The Ontario MOH website reports “Total tests completed”, which is the total number of tests completed by the COVID-19 Clinical Lab network, as reported by the laboratories in that network, rather than the number of persons tested. How unique tests are recorded in OLIS depends on the performing and reporting laboratories. For example, a unique “test” may be defined by testing event, test request, lab order, or target gene.
Under-reporting in OLIS
Please refer to the v) Validation section for more information.
- Ontario Health Digital Services. Guidance to reporting COVID-19 results: Ontario Laboratories Information System (OLIS) Requirements Version 3.1. July 16, 2020. [Retrieved September 23, 2020]
Please email Branson Chen [branson.chen@ices.on.ca] for any questions about the COVID19_processing script, Kinwah Fung [kinwah.fung@ices.on.ca] and Hannah Chung [hannah.chung@ices.on.ca] for any questions about the COVID19_rollup script, and Mahmoud Azimaee [mahmoud.azimaee@ices.on.ca] for any other inquiries.
ICES is an independent, nonprofit research institute. As a prescribed entity under Ontario’s privacy legislation, ICES is authorized to collect and use health care data for the purposes of health system analysis, evaluation and decision support. Secure access to these data is governed by policies and procedures that are approved by the Information and Privacy Commissioner of Ontario. ICES research provides measures of health system performance, a clearer understanding of the shifting health care needs of Ontarians, and a stimulus for discussion of practical solutions to optimize scarce resources.