The FHIR to CDM tool can be used to create an Azure Data Factory (ADF) pipeline that exports data from a FHIR server, rectangularizes it based on user configurations, and moves it to a Common Data Model (CDM) folder in Azure Data Lake Storage Gen2.
A CDM folder is a folder in a data lake that conforms to specific, well-defined, and standardized metadata structures and self-describing data. These folders facilitate metadata discovery and interoperability between data producers and data consumers.
The FHIR to CDM pipeline acts as the data producer in this scenario, while services such as Azure Synapse, Power BI, Azure Data Factory, Azure Databricks, and Azure Machine Learning act as data consumers.
The FHIR to CDM tool has three components, shown in green in the diagram below:
The table configuration generator takes YAML instructions from the user and generates a table configuration folder. The schema viewer, described later, helps visualize the schema of the generated tables.
First, make sure Node.js and npm are installed. If they are not, see Downloading and installing Node.js and npm.
Then clone this FHIR-Analytics-Pipelines repo to your local machine, browse to the FhirToCdm\Configuration-Generator directory, and run the command below to install the dependencies:
```
npm ci
```
Next, run the following command from the Configuration-Generator folder to generate the table configuration. You can use the sample YAML files, resourcesConfig.yml and propertiesGroupConfig.yml, provided in the project. To write YAML instructions for your own needs, see the YAML instructions format document.
```
Configuration-Generator> node .\generate_from_yaml.js -r {resource configuration file} -p {properties group file} -o {output folder}
```
Example:
```
Configuration-Generator> node .\generate_from_yaml.js -r resourcesConfig.yml -p propertiesGroupConfig.yml -o tableConfig
```
Option | Name | Optionality | Default | Description |
---|---|---|---|---|
-r | resourcesConfigFile | Optional | resourcesConfig.yml | Name of the input resource configuration file |
-p | propertiesConfigFile | Optional | propertiesGroupConfig.yml | Name of the input propertiesGroup file |
-o | output | Required | | The output folder to which the configuration will be generated |
-h | help | Optional | | Shows help |
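The options above map directly onto a command line. As a minimal sketch, assuming Python 3 and a `node` executable on `PATH`, the helpers below build and run that command. The function names `build_generate_command` and `generate_table_config` are hypothetical and not part of the project.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_generate_command(output: str,
                           resources_config: Optional[str] = None,
                           properties_config: Optional[str] = None) -> List[str]:
    """Build the generate_from_yaml.js argument list from the documented options."""
    if not output:
        # -o is the only required option.
        raise ValueError("an output folder (-o) is required")
    cmd = ["node", "generate_from_yaml.js"]
    # Omit -r / -p when not given so the script applies its own defaults
    # (resourcesConfig.yml and propertiesGroupConfig.yml).
    if resources_config:
        cmd += ["-r", resources_config]
    if properties_config:
        cmd += ["-p", properties_config]
    cmd += ["-o", output]
    return cmd


def generate_table_config(generator_dir: str, output: str, **options: str) -> None:
    """Run the generator from the Configuration-Generator folder; raise on failure."""
    cmd = build_generate_command(output, **options)
    subprocess.run(cmd, cwd=Path(generator_dir), check=True)
```

For example, `generate_table_config(r"FhirToCdm\Configuration-Generator", "tableConfig")` reproduces the example command using the default YAML files.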
The table configuration folder contains metadata describing the structure of tables, and processing instructions for converting FHIR data to those tables. You can use the Schema Viewer tool to see the schema of a given table.
```
Configuration-Generator> node .\program.js show-schema -d {output folder} -t {Table Name} -maxDepth 4
```
Example:
```
Configuration-Generator> node .\program.js show-schema -d tableConfig -t Patient -maxDepth 4
```
Option | Name | Optionality | Default | Description |
---|---|---|---|---|
-h | help | Optional | | Shows help |
-t | tableName | Required | | Name of the table whose schema to show. The table name may differ from the file name, for example PatientAddress instead of Patient_Address. |
-d | destination | Required | | Name of the configuration folder |
-maxDepth | maxDepth | Optional | 3 | Max recursion depth to travel in the configuration file |
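The sketch below builds a show-schema invocation from the options above. The `table_name_from_file` helper is hypothetical: it assumes, based only on the single PatientAddress / Patient_Address example, that a table name is the file name without its extension and underscores. Check the result against your generated folder before relying on it.

```python
from pathlib import Path
from typing import List


def table_name_from_file(file_name: str) -> str:
    """HYPOTHETICAL rule: drop the extension and underscores (Patient_Address -> PatientAddress)."""
    return Path(file_name).stem.replace("_", "")


def build_show_schema_command(config_dir: str, table_name: str, max_depth: int = 3) -> List[str]:
    """Build the show-schema argument list; 3 mirrors the documented -maxDepth default."""
    if max_depth < 1:
        raise ValueError("maxDepth must be a positive integer")
    return ["node", "program.js", "show-schema",
            "-d", config_dir, "-t", table_name, "-maxDepth", str(max_depth)]
```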
The pipeline generator uses the contents of the table configuration folder, along with a few other configuration values, to generate an ADF pipeline. When triggered, this pipeline exports data from the FHIR server using the $export API, rectangularizes it, and writes it to a CDM folder along with the associated CDM metadata.
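The pipeline issues the $export call for you. To illustrate the protocol it relies on, here is a minimal Python sketch of a system-level FHIR Bulk Data kick-off request, assuming you already hold an OAuth access token for the FHIR service. Per the Bulk Data specification, the server replies 202 Accepted with a Content-Location status URL; polling that URL returns 202 until the export finishes, then 200 with a JSON manifest whose `output` array lists the ndjson files. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Dict, List, Optional, Sequence


def build_export_kickoff(base_url: str, access_token: str,
                         resource_types: Optional[Sequence[str]] = None) -> urllib.request.Request:
    """Build a system-level $export kick-off request per the FHIR Bulk Data spec."""
    url = base_url.rstrip("/") + "/$export"
    if resource_types:
        # _type restricts the export to the listed resource types.
        url += "?" + urllib.parse.urlencode({"_type": ",".join(resource_types)}, safe=",")
    return urllib.request.Request(url, method="GET", headers={
        "Accept": "application/fhir+json",
        "Prefer": "respond-async",  # required: bulk export is asynchronous
        "Authorization": f"Bearer {access_token}",
    })


def output_files_by_type(manifest: dict) -> Dict[str, List[str]]:
    """Group the ndjson file URLs listed in a completed export's manifest by resource type."""
    files: Dict[str, List[str]] = {}
    for entry in manifest.get("output", []):
        files.setdefault(entry["type"], []).append(entry["url"])
    return files
```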
Use the following steps to create the FHIR to CDM pipeline:
Follow the FHIR export configuration document to enable $export on your FHIR server, if it is not already enabled.
The ADF pipeline uses an Azure Batch service to perform the transformation, so you need to register an Azure AD application for the Batch service. Follow the documentation to create an Azure AD application and service principal.
Take note of the service principal ID and the client ID of the application by navigating to Azure Portal => Azure Active Directory => Enterprise applications => your app.
Create a client secret by navigating to Azure Portal => Azure Active Directory => App Registrations => your app => Certificates & secrets => New client secret. Take note of the client secret value; the portal shows it only once, right after creation.
In the Access control (IAM) of the export storage account, grant the Storage Blob Data Contributor role to the Azure AD application created above.
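To confirm that the client ID and secret you noted work before deploying, you can request a token with the OAuth 2.0 client-credentials flow. This minimal sketch targets the Microsoft identity platform v2.0 token endpoint and assumes you also know your Azure AD tenant ID, which this document does not otherwise ask for. The default resource `https://storage.azure.com` matches the Storage Blob Data Contributor role granted above. The function name is hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        resource: str = "https://storage.azure.com") -> urllib.request.Request:
    """Build an OAuth 2.0 client-credentials token request for the Microsoft identity platform."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # v2.0 endpoint: request the app's statically granted permissions on the resource.
        "scope": resource.rstrip("/") + "/.default",
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST", headers={
        "Content-Type": "application/x-www-form-urlencoded",
    })
```

Send it with `urllib.request.urlopen(req)` and read `access_token` from the JSON response. A successful response only proves the credentials are valid; it does not show that the role assignment has taken effect.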
Use the button below to deploy the egress pipeline through the Azure Portal.
Alternatively, download and save the fhirServiceToCdm.json deployment template and use it for a custom deployment on Azure.
Parameter | Description | Example |
---|---|---|
Region | The Azure region where the ADF pipeline will be deployed | East US 2 |
Pipeline Name | Name of the ADF pipeline to be created. Keep it shorter than 17 characters. | fhir2cdm |
FHIR service url | Base URL of the FHIR server from where the data will be exported | https://myfhirserver.azurehealthcareapis.com |
Principal Id | The service principal ID of the Azure AD application registered above | aa1decb5-7c11-4000-916b-ac7abd4f135b |
Client Id | The client ID of the same application | cafa1d08-b71c-42b2-8fb7-61e6790f241f |
Client Secret | The client secret created for the application | 7A6-89_BpM1d7.P34H_StR_fKKa_uTJjbU |
Configuration Container | The name of the container in the storage account where you want to keep the table configuration | myconfigcontainer |
Batch Pool VM Size | Size of the VM to use for Azure batch | STANDARD_A1 |
Batch Pool Node Count | Number of nodes in the Batch Pool. Different resource types can be processed in parallel using this pool | 3 |
Package Link | The link to the binary used for transformation | (Do not change this value) |
The deployment creates the following Azure resources:
- An ADF pipeline with the name {pipelinename}-df.
- A key vault with the name {pipelinename}-kv to store the client secret.
- A batch account with the name {pipelinename}batch to run the transformation.
- A storage account with the name {pipelinename}storage. This account serves several purposes: it is used to run the Batch job, it is the destination storage for the CDM data, and it is where you keep the table configuration.
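Because every resource name is derived from the pipeline name, it can help to check a candidate name up front. The sketch below applies the 17-character limit from the parameter table; the rule that the name use only lowercase letters and digits and start with a letter is an assumption drawn from Azure's naming rules for storage accounts, Batch accounts, and key vaults, not from this document. The function names are hypothetical.

```python
import re
from typing import Dict, List


def derived_resource_names(pipeline_name: str) -> Dict[str, str]:
    """Names the deployment derives from the pipeline name, as listed above."""
    return {
        "data_factory": f"{pipeline_name}-df",
        "key_vault": f"{pipeline_name}-kv",
        "batch_account": f"{pipeline_name}batch",
        "storage_account": f"{pipeline_name}storage",
    }


def pipeline_name_problems(pipeline_name: str) -> List[str]:
    """Return the problems found with a candidate pipeline name (empty if none)."""
    problems = []
    if len(pipeline_name) >= 17:
        problems.append("must be shorter than 17 characters")
    # Assumption based on Azure naming rules: storage and Batch account names allow
    # only lowercase letters and digits, and Key Vault names must start with a letter.
    if not re.fullmatch(r"[a-z][a-z0-9]*", pipeline_name):
        problems.append("use lowercase letters and digits only, starting with a letter")
    return problems
```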
In the Access control (IAM) of the FHIR service, grant the FHIR Data Exporter and FHIR Data Reader roles to the data factory {pipelinename}-df created in the previous step.
Upload the contents of the table configuration folder to the configuration container you specified when deploying the pipeline. The contents of this folder govern the pipeline's rectangularization behavior, so you can update them to change that behavior.
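One way to upload the folder is with the Azure Storage Blob client library for Python. This is a minimal sketch, assuming the `azure-storage-blob` package is installed and that you authenticate with the storage account's connection string; the helper names are hypothetical. Files go to the container root with their relative paths preserved, and existing blobs are overwritten so that re-running the upload picks up your edits. It does not delete blobs for files you removed locally.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_config_blobs(config_dir: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local file, blob name) pairs, keeping paths relative to the folder root."""
    root = Path(config_dir)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Blob names use forward slashes regardless of the local OS.
            yield path, path.relative_to(root).as_posix()


def upload_table_config(config_dir: str, connection_string: str, container_name: str) -> None:
    """Upload the folder's contents to the container root, overwriting existing blobs."""
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(connection_string, container_name)
    for path, blob_name in iter_config_blobs(config_dir):
        with path.open("rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
```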
Go to {pipelinename}-df and trigger the pipeline. Once the pipeline run completes, you should see the exported data in the CDM folder in the {pipelinename}storage account, with one folder per table, each containing a CSV file.
If the pipeline run succeeds but you do not see data in the CDM folder, go to the adfjobs container in the {pipelinename}storage account, find the latest run folder (its name is a GUID), and check its stderr.txt file for details.
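If you check these logs often, the lookup can be scripted. The sketch below assumes that each run's logs sit under a GUID-named folder somewhere in the blob path (possibly in a subfolder) and treats the most recently modified stderr.txt as the latest run. It uses the `azure-storage-blob` package with the storage account's connection string; the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

GUID = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")


def latest_stderr_blob(blobs: Iterable[Tuple[str, datetime]]) -> Optional[str]:
    """Pick the most recently modified stderr.txt that sits under a GUID-named folder."""
    latest: Optional[Tuple[str, datetime]] = None
    for name, modified in blobs:
        *folders, file_name = name.split("/")
        if file_name != "stderr.txt" or not any(GUID.fullmatch(f) for f in folders):
            continue
        if latest is None or modified > latest[1]:
            latest = (name, modified)
    return latest[0] if latest else None


def read_latest_stderr(connection_string: str) -> Optional[str]:
    """Download the newest run's stderr.txt from the adfjobs container, if any."""
    # Requires: pip install azure-storage-blob
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(connection_string, "adfjobs")
    name = latest_stderr_blob((b.name, b.last_modified) for b in container.list_blobs())
    if name is None:
        return None
    return container.download_blob(name).readall().decode("utf-8", errors="replace")
```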
We also provide a local tool that converts FHIR data to CDM. It uses the contents of the table configuration folder to generate CDM metadata and then converts the input FHIR ndjson data to CDM.
Build the Microsoft.Health.Fhir.Transformation.Cdm.Tool project, then run Microsoft.Health.Fhir.Transformation.Cdm.Tool.exe as follows:
```
./Microsoft.Health.Fhir.Transformation.Cdm.Tool.exe --config {Table config folder} --input {Input folder that contains FHIR ndjson data} --output {CDM output folder}
```
Option | Optionality | Default | Description |
---|---|---|---|
--config | Required | | Name of the table configuration folder |
--input | Required | | Name of the input folder that contains the FHIR ndjson data |
--output | Required | | Name of the CDM output folder |
--maxDepth | Optional | 3 | Max recursion depth to generate CDM |
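To call the tool from a script, you can wrap the documented options. This minimal sketch assumes you pass the path to the built executable; `--maxDepth` is added only when you set it, so the tool otherwise applies its own default. The function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_cdm_tool_command(exe: str, config_dir: str, input_dir: str, output_dir: str,
                           max_depth: Optional[int] = None) -> List[str]:
    """Build the local tool's argument list from the options in the table above."""
    cmd = [exe, "--config", config_dir, "--input", input_dir, "--output", output_dir]
    if max_depth is not None:
        # Omitted by default so the tool applies its own default depth.
        cmd += ["--maxDepth", str(max_depth)]
    return cmd


def convert_locally(exe: str, config_dir: str, input_dir: str, output_dir: str,
                    max_depth: Optional[int] = None) -> None:
    """Run the conversion; raise CalledProcessError if the tool exits with a non-zero status."""
    subprocess.run(build_cdm_tool_command(exe, config_dir, input_dir, output_dir, max_depth),
                   check=True)
```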
Once the data is in a CDM folder, it can be consumed by several Microsoft services, such as Azure Synapse Analytics, ADF, Azure Databricks, Azure Machine Learning, Azure SQL, and Power BI. See the instructions for moving the data from a CDM folder to Azure Synapse Analytics.
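As an illustration of the self-describing metadata a consumer relies on, the sketch below lists each entity's attributes and data partition locations. It assumes the CDM folder carries a `model.json` file at its root in the CDM folder format (an `entities` array, each entity with `attributes` and `partitions`). If your output uses a different metadata file, such as a `*.manifest.cdm.json` manifest, the structure differs. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_model(cdm_folder: str) -> dict:
    """Read the folder's model.json (assumed to sit at the folder root)."""
    # utf-8-sig tolerates an optional byte-order mark.
    return json.loads(Path(cdm_folder, "model.json").read_text(encoding="utf-8-sig"))


def describe_entities(model: dict) -> Dict[str, Dict[str, List[str]]]:
    """Map each entity to its attribute names and the locations of its data partitions."""
    result = {}
    for entity in model.get("entities", []):
        result[entity["name"]] = {
            "attributes": [a["name"] for a in entity.get("attributes", [])],
            "partitions": [p["location"] for p in entity.get("partitions", [])],
        }
    return result
```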