The DICOM to Synapse Sync Agent lets you query and run analytics directly on DICOM metadata by moving that metadata to Azure Data Lake in near real time and making it available to a Synapse workspace.
It is an Azure Container App that extracts data from a DICOM server using the DICOM Change Feed APIs, converts it to hierarchical Parquet files, and writes them to Azure Data Lake in near real time. The solution also contains a script that creates an External Table in a Synapse Serverless SQL pool pointing to the DICOM metadata Parquet files. For more information about DICOM External Tables, see Data mapping from DICOM to Synapse.
This solution lets you query the entire DICOM metadata set with SQL tools such as Synapse Studio. You can also access the Parquet files directly from a Synapse Spark pool.
Note: An API usage charge will be incurred on the DICOM server if you use this tool to copy data from the DICOM server to Azure Data Lake.
- An instance of a DICOM server, or the DICOM service in Azure Healthcare APIs. The pipeline syncs data from this DICOM server.
- A Synapse workspace.
- Deploy the pipeline to an Azure Container App using the given ARM template.
- Grant the Container App deployed in the previous step access to the DICOM service.
- Verify that data is copied to the Storage Account. If it is, the pipeline is working successfully.
- Grant your account access to the Storage Account and the Synapse workspace so you can run the PowerShell script mentioned below.
- Grant the Synapse workspace access to the Storage Account so the data can be accessed from Synapse.
- Run the provided PowerShell script, which creates the following artifacts:
- DICOM specific folder in the Azure Storage Account.
- A database in Synapse serverless pool with External Table pointing to the DICOM Parquet files in the Storage Account.
- Query data from Synapse Studio.
To deploy the DICOM to Data Lake sync pipeline, use the button below to deploy through the Azure portal.
Alternatively, browse to the Custom deployment page in the Azure portal, select Build your own template in the editor, copy the content of the provided ARM template into the edit box, and click Save.
The deployment page should open the following form.
Fill in the form based on the table below, and click Review and Create to start the deployment.
| Parameter | Description |
|---|---|
| Resource Group | Name of the resource group where you want the pipeline-related resources to be created. |
| Location | The location to deploy the DicomToDatalake pipeline. |
| Pipeline Name | A name for the DicomToDatalake pipeline; it needs to be unique in your subscription. |
| Dicom Server Url | The URL of the DICOM server. If the baseUri has relative parts (like http://www.example.org/r4), then the relative part must be terminated with a slash (like http://www.example.org/r4/). |
| Dicom Api Version | Version of the DICOM server. Currently only V1 is supported. |
| Server Authentication | The authentication method to access the DICOM server. |
| Container Name | A name for the Storage Account container to which Parquet files will be written. A Storage Account with an autogenerated name is created automatically during installation. You need not change this. |
| Job Concurrency | Number of concurrent jobs executing in parallel. |
| Customized Schema Image Reference | The customized schema image reference for the image on Container Registry. Refer to TemplateManagement for how to manage your template images. |
| Image | DicomToDatalake container image to deploy. You need not change this. |
| Max Instance Count | Maximum number of replicas running for the pipeline Container App. |
| Storage Account Type | Azure Storage Account type to deploy. |
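If you'd rather script the deployment than use the portal form, the sketch below shows one way to do it with New-AzResourceGroupDeployment. The template path and the ARM parameter names are assumptions for illustration; check the ARM template provided in the repo for the exact names.

```powershell
# Illustrative sketch only: the template file name and parameter names below
# are assumptions; verify them against the ARM template in the repo.
New-AzResourceGroupDeployment `
    -ResourceGroupName "my-pipeline-rg" `
    -TemplateFile ".\DicomToDatalake.json" `
    -TemplateParameterObject @{
        pipelineName   = "mydicompipeline"   # must be unique in your subscription
        dicomServerUrl = "https://<workspace>-<dicom-service>.dicom.azurehealthcareapis.com"
    }
```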
Refer here for more information about using a customized schema to sync DICOM metadata to the data lake.
Make a note of the names of the Storage Account and the Azure Container App created during the deployment.
If you are using the DICOM service in Azure Healthcare APIs, assign the DICOM Data Reader role to the Azure Container App deployed above.
If you are using the DICOM server for Azure with anonymous access, then you can skip this step.
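For reference, here is a minimal sketch of making this role assignment from PowerShell. It assumes the Az.App module, a system-assigned managed identity on the Container App, and placeholder resource names throughout:

```powershell
# Sketch: grant the Container App's managed identity the DICOM Data Reader role
# on the DICOM service. Assumes Az.App is installed and the Container App was
# deployed with a system-assigned managed identity.
$app = Get-AzContainerApp -ResourceGroupName "my-pipeline-rg" -Name "<pipeline-app-name>"
New-AzRoleAssignment `
    -ObjectId $app.IdentityPrincipalId `
    -RoleDefinitionName "DICOM Data Reader" `
    -Scope "/subscriptions/<sub-id>/resourceGroups/<dicom-rg>/providers/Microsoft.HealthcareApis/workspaces/<ahds-workspace>/dicomservices/<dicom-service>"
```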
The Azure Container App runs automatically. You can monitor its progress in the Azure portal. The time taken to write the data to the Storage Account depends on the amount of metadata in the DICOM server. After the Azure Container App execution completes, you should have Parquet files in the Storage Account. Browse to the results folder inside the container; you should see a folder corresponding to DICOM metadata.
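Instead of browsing the portal, you can also list the blobs from PowerShell. A minimal sketch, assuming the container name you chose at deployment (dicom by default) and that your account has read access to the Storage Account:

```powershell
# List the Parquet files written by the pipeline; the container name follows
# the Container Name deployment parameter, and "results" is the folder the
# pipeline writes to as described above.
$ctx = New-AzStorageContext -StorageAccountName "<storage-account>" -UseConnectedAccount
Get-AzStorageBlob -Container "dicom" -Context $ctx |
    Where-Object Name -like "results*" |
    Select-Object Name, Length, LastModified
```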
You must assign the following roles to your account to run the PowerShell script in the next step. You may revoke these roles after the installation is complete.
- In your Synapse workspace, select Synapse Studio > Manage > Access Control, and then provide the Synapse Administrator role to your account.
- In the Storage Account created during the pipeline installation, select the Access Control (IAM) and assign the Storage Blob Data Contributor role to your account.
To enable Synapse to read the data from the Storage Account, assign the Storage Blob Data Contributor role to it. You can do this by selecting Managed identity while adding members to the role; you should be able to pick your Synapse workspace instance from the list of managed identities shown in the portal.
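A minimal PowerShell equivalent of this assignment, assuming the Az and Az.Synapse modules and placeholder resource names:

```powershell
# Sketch: grant the Synapse workspace's system-assigned managed identity the
# Storage Blob Data Contributor role on the pipeline's Storage Account.
$synapse = Get-AzSynapseWorkspace -Name "<synapse-workspace>"
$storage = Get-AzStorageAccount -ResourceGroupName "<pipeline-rg>" -Name "<storage-account>"
New-AzRoleAssignment `
    -ObjectId $synapse.Identity.PrincipalId `
    -RoleDefinitionName "Storage Blob Data Contributor" `
    -Scope $storage.Id
```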
Next, run the PowerShell script that creates the following artifacts:
- DICOM specific folder in the Azure Storage Account.
- A database in Synapse serverless SQL pool with External Tables pointing to the DICOM metadata Parquet files in the Storage Account.
To run the PowerShell script, perform the following steps:
- Clone this FHIR-Analytics-Pipelines repo to your local machine.
- Open a PowerShell console and ensure that you have the latest version of PowerShell 7 or PowerShell 5.1.
- Install the PowerShell Az and Az.Synapse modules if they aren't already installed.
```powershell
Install-Module -Name Az
Install-Module -Name Az.Synapse
```
- Install the PowerShell SqlServer module if it isn't already installed.
```powershell
Install-Module -Name SqlServer
```
- Sign in to your Azure account and select the subscription where Synapse is located.
```powershell
Connect-AzAccount -SubscriptionId 'yyyy-yyyy-yyyy-yyyy'
```
- Browse to the scripts folder under this path (..\FhirToDataLake\scripts).
- Run the following PowerShell script.
For more details, refer to the complete syntax in Set-SynapseEnvironment Syntax.
```powershell
./Set-SynapseEnvironment.ps1 -SynapseWorkspaceName "{Name of your Synapse workspace instance}" -StorageName "{Name of your storage account where Parquet files are written}" -Container dicom -Database dicomdb -DataSourceType DICOM
```
Go to the serverless SQL pool in your Synapse workspace. You should see a new database named dicomdb. Expand External Tables to see the entities. Your DICOM metadata, in Parquet format, is now ready to be queried.
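As a quick smoke test from the same PowerShell session, you can list the external tables through the workspace's serverless SQL endpoint. This sketch assumes the SqlServer module installed earlier and Azure AD access to the workspace; the endpoint name is a placeholder:

```powershell
# Sketch: query the serverless SQL endpoint. With older Az.Accounts versions
# Get-AzAccessToken returns a plain-text token; newer versions may return a
# SecureString instead.
$token = (Get-AzAccessToken -ResourceUrl "https://database.windows.net").Token
Invoke-Sqlcmd -ServerInstance "<synapse-workspace>-ondemand.sql.azuresynapse.net" `
    -Database "dicomdb" `
    -AccessToken $token `
    -Query "SELECT name FROM sys.external_tables"
```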
As you add more data to the DICOM server, it is fetched into the Data Lake automatically and becomes available for querying.