FHIR to CDM tool

The FHIR to CDM tool creates an Azure Data Factory (ADF) pipeline that exports data from a FHIR server, rectangularizes it based on user configuration, and moves it to a CDM folder in Azure Data Lake Storage Gen2.

A CDM folder is a folder in a data lake that conforms to specific, well-defined, and standardized metadata structures and self-describing data. These folders facilitate metadata discovery and interoperability between data producers and data consumers.

The FHIR to CDM pipeline acts as a data producer. Azure Synapse, Power BI, Azure Data Factory, Azure Databricks, Azure Machine Learning, and similar services act as data consumers in this scenario.

The FHIR to CDM tool has three components, shown in green in the diagram below:

1. Table configuration generator

The table configuration generator takes YAML instructions from the user and generates a table configuration folder. The schema viewer, described later, helps visualize the schema of the generated tables.

First, make sure Node.js and npm are installed. If they are not, see Downloading and installing Node.js and npm.

Then clone the FHIR-Analytics-Pipelines repo to your local machine, browse to the FhirToCdm\Configuration-Generator directory, and run the command below to install the dependencies.

npm ci

Next, run the following command from the Configuration-Generator folder to generate the table configuration. You can use the sample YAML files provided in the project, resourcesConfig.yml and propertiesGroupConfig.yml. To write YAML instructions for your own needs, see the YAML instructions format document.

Configuration-Generator> node .\generate_from_yaml.js -r {resource configuration file} -p {properties group file} -o {output folder}

Example:

Configuration-Generator> node .\generate_from_yaml.js -r resourcesConfig.yml -p propertiesGroupConfig.yml -o tableConfig

Option	Name	Optionality	Default	Description
-r	resourcesConfigFile	Optional	resourcesConfig.yml	Name of the input resource configuration file
-p	propertiesConfigFile	Optional	propertiesGroupConfig.yml	Name of the input propertiesGroup file
-o	output	Required		The output folder to which the configuration will be generated
-h	Help	Optional		Shows help
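If you want to script the generator, for example from a build step, the options in the table above can be assembled programmatically. The following is a minimal Python sketch, not part of the tool itself. The function names `build_generate_command` and `run_generator` are hypothetical; only the `-r`, `-p`, and `-o` flags and their defaults come from the table.

```python
import subprocess
from pathlib import Path


def build_generate_command(output_dir,
                           resources="resourcesConfig.yml",
                           properties="propertiesGroupConfig.yml"):
    """Return the argument list for generate_from_yaml.js using the documented flags."""
    if not output_dir:
        # -o is the only required option in the table above.
        raise ValueError("-o (output folder) is required")
    return [
        "node", "generate_from_yaml.js",
        "-r", str(resources),
        "-p", str(properties),
        "-o", str(output_dir),
    ]


def run_generator(generator_dir, output_dir, **kwargs):
    """Run the generator from the Configuration-Generator folder (requires Node.js)."""
    cmd = build_generate_command(output_dir, **kwargs)
    # check=True raises CalledProcessError if node exits with a non-zero status.
    subprocess.run(cmd, cwd=Path(generator_dir), check=True)
```

Calling `run_generator("FhirToCdm/Configuration-Generator", "tableConfig")` would be equivalent to the example command above, assuming the dependencies were already installed with npm ci.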

2. Schema viewer

The table configuration folder contains metadata describing the structure of tables, and processing instructions for converting FHIR data to those tables. You can use the Schema Viewer tool to see the schema of a given table.

Configuration-Generator> node .\program.js show-schema -d {output folder} -t {Table Name} -maxDepth 4

Example:

Configuration-Generator> node .\program.js show-schema -d tableConfig -t Patient -maxDepth 4

Option	Name	Optionality	Default	Description
-h	help	Optional		Shows help
-t	tableName	Required		Name of the table whose schema to show. The table name may differ from the file name, for example PatientAddress instead of Patient_Address.
-d	destination	Required		Name of the configuration folder
-maxDepth	maxDepth	Optional	3	Maximum recursion depth to traverse in the configuration file
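Because the table name can differ from the configuration file name, a small helper can save some guesswork when inspecting many tables. This Python sketch assumes that the table name is simply the file name without its extension and without underscores, which matches the PatientAddress example above but is not confirmed for every table. Both function names are hypothetical.

```python
from pathlib import PurePath


def table_name_from_file(file_name):
    """Guess a table name from a configuration file name.

    Assumption: the table name is the file stem with underscores removed,
    as in Patient_Address -> PatientAddress.
    """
    return PurePath(file_name).stem.replace("_", "")


def build_show_schema_command(config_dir, table_name, max_depth=3):
    """Return the argument list for the show-schema command with the documented flags."""
    if int(max_depth) < 1:
        raise ValueError("maxDepth must be a positive integer")
    return [
        "node", "program.js", "show-schema",
        "-d", str(config_dir),
        "-t", table_name,
        "-maxDepth", str(max_depth),
    ]
```

If the guessed name is rejected by the schema viewer, check the table name recorded inside the configuration file itself.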

3. Pipeline generator

The pipeline generator uses the content of the table configuration folder, along with a few other settings, to generate an ADF pipeline. When triggered, this pipeline exports the data from the FHIR server using the $export API, rectangularizes it, and writes it to a CDM folder along with the associated CDM metadata.

Use the following steps to create the FHIR to CDM pipeline.

3.1. Ensure that $export is enabled on Azure API for FHIR

Follow the FHIR export configuration document to enable $export on your FHIR server if needed.
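To check whether $export is accepted by your server, it can help to know what a kick-off request looks like. The Python sketch below builds, but does not send, a system-level export request following the FHIR Bulk Data Access conventions (an asynchronous GET with a `Prefer: respond-async` header). The function name `build_export_request` is hypothetical, and the access token is a placeholder you would obtain yourself.

```python
import urllib.request


def build_export_request(fhir_base_url, access_token):
    """Build (but do not send) a system-level $export kick-off request."""
    url = fhir_base_url.rstrip("/") + "/$export"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "application/fhir+json",
            # Bulk export is asynchronous, so the client asks for an async response.
            "Prefer": "respond-async",
            "Authorization": f"Bearer {access_token}",
        },
    )
```

Sending this request with `urllib.request.urlopen` starts a real export job, so only do that against a server where this is acceptable. Under the Bulk Data Access specification, an accepted request is answered with 202 Accepted and a Content-Location header for polling the job status.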

3.2. Create an Azure AD application and service principal.

The ADF pipeline uses Azure Batch to run the transformation, so you need to register an Azure AD application for the Batch service. Follow the documentation to create an Azure AD application and service principal.

Note the service principal ID and client ID of the application by navigating to Azure Portal => Azure Active Directory => Enterprise applications => your app.

Create a client secret by navigating to Azure Portal => Azure Active Directory => App Registrations => your app => Certificates & secrets => New client secret. Take note of the client secret.

3.3. Grant access of export storage location to the service principal

In the access control settings of the export storage account, grant the Storage Blob Data Contributor role to the Azure AD application created above.

3.4. Deploy egress pipeline

Use the button below to deploy the egress pipeline through the Azure Portal.

Or you can download and save the fhirServiceToCdm.json deployment template. Use this template to do a custom deployment on Azure.

Parameter	Description	Example
Region	The Azure region where the ADF pipeline will be deployed	East US 2
Pipeline Name	Name of the ADF Pipeline to be created.	fhir2cdm (Keep the length less than 17 characters)
FHIR service url	Base URL of the FHIR server from where the data will be exported	https://myfhirserver.azurehealthcareapis.com
Principal Id	The service principal ID of the application created in step 3.2	aa1decb5-7c11-4000-916b-ac7abd4f135b
Client Id	The client ID from step 3.2	cafa1d08-b71c-42b2-8fb7-61e6790f241f
Client Secret	The client secret from step 3.2	7A6-89_BpM1d7.P34H_StR_fKKa_uTJjbU
Configuration Container	The name of container on the storage account where you want to keep table configurations.	myconfigcontainer
Batch Pool VM Size	Size of the VM to use for Azure batch	STANDARD_A1
Batch Pool Node Count	Number of nodes in the Batch Pool. Different resource types can be processed in parallel using this pool	3
Package Link	The link to the binary used for transformation	(Do not change this value)

The deployment creates the following Azure resources:

An ADF pipeline with the name {pipelinename}-df.
A key vault with the name {pipelinename}-kv to store the client secret.
A batch account with the name {pipelinename}batch to run the transformation.
A storage account with the name {pipelinename}storage. This storage will be used for different purposes such as running the batch job and the destination storage for the CDM data. This is also where you will keep the table configuration.
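Since every resource name is derived from the pipeline name, it can be useful to preview the names before deploying. The Python sketch below applies the naming patterns listed above and the "less than 17 characters" limit from the parameter table. The lowercase-letters-and-digits check is an assumption added here, based on the fact that Azure storage account names allow only those characters; the deployment template itself may be more or less strict. The function name is hypothetical.

```python
import re

# "Keep the length less than 17 characters" per the parameter table.
MAX_PIPELINE_NAME_LENGTH = 16


def derived_resource_names(pipeline_name):
    """Return the resource names the deployment derives from the pipeline name."""
    if not 1 <= len(pipeline_name) <= MAX_PIPELINE_NAME_LENGTH:
        raise ValueError("pipeline name must be 1 to 16 characters long")
    # Assumption: the name is reused in a storage account name, which may
    # contain only lowercase letters and digits.
    if not re.fullmatch(r"[a-z0-9]+", pipeline_name):
        raise ValueError("pipeline name should use lowercase letters and digits only")
    return {
        "data_factory": f"{pipeline_name}-df",
        "key_vault": f"{pipeline_name}-kv",
        "batch_account": f"{pipeline_name}batch",
        "storage_account": f"{pipeline_name}storage",
    }
```

For the example name fhir2cdm, this yields fhir2cdm-df, fhir2cdm-kv, fhir2cdmbatch, and fhir2cdmstorage.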

3.5. Grant access of the FHIR service to the Azure Data Factory

In the access control settings of the FHIR service, grant the FHIR Data Exporter and FHIR Data Reader roles to the data factory, {pipelinename}-df, created in the previous step.

3.6. Upload the table configurations to the blob container

Upload the content of the table configuration folder to the configuration container that you specified in step 3.4. The rectangularization behavior of the pipeline is governed by the content of this folder, so you can update the content to change that behavior.

3.7. Trigger the ADF pipeline

Go to the data factory {pipelinename}-df and trigger the pipeline. Once the pipeline execution is complete, you should see the exported data in the CDM folder on the storage account {pipelinename}storage, with one folder per table, each containing a CSV file.
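To verify the result without clicking through every folder, you can download the CDM folder and summarize it locally. The Python sketch below assumes a local copy of the CDM folder in which each table is a direct subfolder holding its CSV files. That layout is inferred from the description above, and both function names are hypothetical.

```python
from pathlib import Path


def csv_files_per_table(cdm_root):
    """Map each table folder under cdm_root to the sorted names of its CSV files."""
    summary = {}
    for folder in sorted(Path(cdm_root).iterdir()):
        if folder.is_dir():
            summary[folder.name] = sorted(p.name for p in folder.glob("*.csv"))
    return summary


def tables_without_data(cdm_root):
    """Return the table folders that contain no CSV file."""
    return [table for table, files in csv_files_per_table(cdm_root).items() if not files]
```

An empty list from `tables_without_data` suggests that every table folder received data. Any table names it returns are a good starting point for troubleshooting.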

Troubleshooting the pipeline

If the pipeline run succeeds but you do not see data in the CDM folder, go to the adfjobs container in the {pipelinename}storage account, find the latest run folder (it has a GUID name), and check its stderr.txt file for details.
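If you download the adfjobs container to your machine, the lookup described above can be automated. This Python sketch assumes a local copy of the container and treats the most recently modified GUID-named folder as the latest run. That is an approximation, because download tools do not always preserve modification times. The function names are hypothetical.

```python
import uuid
from pathlib import Path


def is_guid(name):
    """Return True if the folder name parses as a GUID."""
    try:
        uuid.UUID(name)
        return True
    except ValueError:
        return False


def latest_stderr(adfjobs_root):
    """Return (run_folder_name, stderr_text) for the newest GUID-named run folder.

    Returns None when no GUID-named folder exists. The text is empty when the
    run folder has no stderr.txt anywhere beneath it.
    """
    runs = [p for p in Path(adfjobs_root).iterdir() if p.is_dir() and is_guid(p.name)]
    if not runs:
        return None
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    # Search recursively, since the exact nesting of stderr.txt is not documented here.
    stderr_file = next(latest.rglob("stderr.txt"), None)
    text = stderr_file.read_text(encoding="utf-8", errors="replace") if stderr_file else ""
    return latest.name, text
```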

4. FHIR to CDM local tool

We also provide a local tool that converts FHIR data to CDM. It uses the content of the table configuration folder to generate CDM metadata, and then converts the input FHIR ndjson data to CDM.

Build the Microsoft.Health.Fhir.Transformation.Cdm.Tool project, then run Microsoft.Health.Fhir.Transformation.Cdm.Tool.exe as follows:

./Microsoft.Health.Fhir.Transformation.Cdm.Tool.exe --config {Table config folder} --input {Input folder that contains FHIR ndjson data} --output {CDM output folder}

Option	Optionality	Default	Description
--config	Required		Name of the table configuration folder
--input	Required		Name of the input folder that contains FHIR ndjson data
--output	Required		Name of the CDM output folder
--maxDepth	Optional	3	Max recursion depth to generate CDM
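Before running the local tool, it can be worth checking that the input folder really contains well-formed ndjson, meaning one JSON resource per line. The Python sketch below counts resources by their resourceType field. It assumes the input files use the .ndjson extension, which the tool itself may not require, and the function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_resources(input_dir):
    """Count FHIR resources per resourceType across all .ndjson files in input_dir."""
    counts = Counter()
    for path in sorted(Path(input_dir).glob("*.ndjson")):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines, such as a trailing newline
                try:
                    resource = json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(f"{path.name}:{line_number}: invalid JSON") from exc
                if not isinstance(resource, dict):
                    raise ValueError(f"{path.name}:{line_number}: expected a JSON object")
                counts[resource.get("resourceType", "<missing resourceType>")] += 1
    return counts
```

Comparing these counts with the row counts of the generated CSV files gives a rough sanity check, keeping in mind that one resource can produce rows in several tables.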

Next Steps

Once the data is in a CDM folder, it can be consumed by several Microsoft services such as Synapse Analytics, ADF, Azure Databricks, Azure Machine Learning, Azure SQL, and Power BI. See the instructions for moving the data from a CDM folder to Synapse Analytics.
