The DAF-Research IG describes four capabilities (C1, C2, C3 and C4), each of which is intended to help improve the data infrastructure for PCORnet and, in the larger context, a Learning Health System. This part of the IG provides additional guidance for the implementation of each of the four capabilities. This section is not normative and is only intended to provide guidance to implementers.
Implementing C1 capability involves four steps.
Note: The Data Source actor can be implemented natively by a system such as an EMR, or alternatively as a layer (an additional software module) on top of an EMR system. The partitioning of which resources, profiles and capability statements are implemented natively by an EMR versus by an external module is left to the implementers. In either case the implementation has to meet the Data Source Conformance requirements. The next few paragraphs provide details for each step above.
Instantiation of a Task for extraction at the Data Source
The Data Source actor has to support the creation of a DAF-Task resource instance. This can be achieved using the POST operation specified by FHIR. This task instance has to have the following data:
This Task instance would then be persisted for execution. The actual execution of the task can be controlled using a scheduled timer or a manual kick off. Note: If this is a task that is set up to repeat at a regular frequency, this step can be skipped after the first time.
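As a rough illustration of the create step, the sketch below builds a Task body and prepares a FHIR create interaction (`POST [base]/Task`) without sending it. The base URL, the owner reference and the choice of elements (`status`, `intent`, `description`, `owner`) are assumptions made for illustration; the DAF-Task profile defines the data actually required.

```python
import json
import urllib.request


def build_extraction_task(owner_ref: str, description: str) -> dict:
    """Build a minimal Task body (element choices are assumptions)."""
    return {
        "resourceType": "Task",
        "status": "requested",  # initial state, before execution starts
        "intent": "order",
        "description": description,
        "owner": {"reference": owner_ref},  # system expected to run the task
    }


def build_post_request(base_url: str, task: dict) -> urllib.request.Request:
    """Prepare (but do not send) the create interaction: POST [base]/Task."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/Task",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )


# Hypothetical server and owner, used only to exercise the helpers.
task = build_extraction_task(
    "Organization/data-source-1", "Extract data for PCORnet CDM load"
)
request = build_post_request("https://example.org/fhir", task)
```

A real client would pass `request` to `urllib.request.urlopen` and add whatever authorization the Data Source requires.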
Execution of the Task to extract data from the Data Source
The Task created earlier in Step 1 is executed at some point in time, either automatically or manually, and the following actions are expected to happen.
NOTE: These extraction tasks could be inefficient and the initial extraction may take a long time. Implementers have to be aware of these inefficiencies in extracting data especially if they choose to use extraction of the data one patient at a time. Extraction tasks may return identifiable patient information or de-identified patient information. The task to load the Data Mart supporting PCORnet CDM has to appropriately address de-identification requirements prior to loading the data. This is further discussed in Step 4 below and has to be accomplished prior to loading and using the data.
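To make the one-patient-at-a-time approach concrete, here is a minimal sketch that issues one `Patient/$everything` request per patient. The `fetch` callable is a stand-in for a real HTTP client and the base URL is hypothetical; the point is that the number of round trips grows with the number of patients, which is where the inefficiency noted above comes from.

```python
from typing import Callable, Iterable, List


def everything_url(base_url: str, patient_id: str) -> str:
    """URL of the FHIR Patient/$everything operation for one patient."""
    return f"{base_url.rstrip('/')}/Patient/{patient_id}/$everything"


def extract_patients(
    base_url: str,
    patient_ids: Iterable[str],
    fetch: Callable[[str], dict],
) -> List[dict]:
    """Fetch one Bundle per patient; `fetch` is an injected HTTP GET helper."""
    bundles = []
    for patient_id in patient_ids:
        bundles.append(fetch(everything_url(base_url, patient_id)))
    return bundles


def fake_fetch(url: str) -> dict:
    """Offline stand-in for an HTTP GET so the sketch runs without a server."""
    return {
        "resourceType": "Bundle",
        "type": "searchset",
        "link": [{"relation": "self", "url": url}],
    }


bundles = extract_patients("https://example.org/fhir", ["p1", "p2"], fake_fetch)
```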
NOTE: Also in the case of mappings from one data model to another such as FHIR to PCORnet CDM or FHIR to OMOP etc., there is always a potential for data loss. In these cases where there is not an exact mapping between local codes and standardized codes the extraction process is encouraged to include the actual raw values as part of the coding element.
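One possible way to keep raw values is sketched below: the local code is always retained as a coding, the original text is kept in `CodeableConcept.text`, and a standardized coding is added only when a mapping exists. The lookup table and the local code system URI are hypothetical placeholders.

```python
from typing import Dict

# Hypothetical local-to-standard lookup; real mappings are site specific.
LOCAL_TO_STANDARD: Dict[str, Dict[str, str]] = {
    "GLU": {
        "system": "http://loinc.org",
        "code": "2345-7",
        "display": "Glucose [Mass/volume] in Serum or Plasma",
    },
}
LOCAL_SYSTEM = "http://example.org/fhir/local-codes"  # placeholder URI


def to_codeable_concept(local_code: str, raw_text: str) -> dict:
    """Keep the raw local value alongside any standardized code."""
    codings = [{"system": LOCAL_SYSTEM, "code": local_code}]
    standard = LOCAL_TO_STANDARD.get(local_code)
    if standard is not None:
        codings.insert(0, dict(standard))  # standardized coding listed first
    return {"coding": codings, "text": raw_text}


mapped = to_codeable_concept("GLU", "Glucose, serum")
unmapped = to_codeable_concept("XYZ-17", "In-house panel 17")
```

With this shape, a downstream loader can still see the source value even when no standardized code was found.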
The PCORnet CDM is a consensus artifact that has been adopted by PCORnet as a model for Data Marts which can then be queried by Researchers. Since this is a different data model than FHIR the following guidance can be used to extract data so that PCORnet CDM can be appropriately populated. However data extraction programs have to be aware that vendors may be supporting just US-Core or a subset of US-Core for their initial implementation and hence may not have all the PCORnet CDM data elements available. Implementers should prepare for significant heterogeneity in source data and budget time and resources accordingly not only for data extraction, but for transformation and loading depending on approaches used for extraction.
The Profiles, Operations and Mappings page provides the necessary mapping between FHIR Resources and PCORnet CDM.
Note: As is the case in all data mapping exercises, there could be data loss in the mapping. There is a large amount of legacy data already captured, so the mapping problem applies to existing data. In addition, many systems capturing data do not use standards for data capture, so the problem will persist for new data as well. Researchers should be made aware of these data losses so that they have an idea of the data quality.
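The sketch below shows one way a mapping step could make data loss visible: it converts a FHIR Patient into a DEMOGRAPHIC-style row and also reports which source elements were not carried over. The gender-to-SEX table and the set of mapped elements are illustrative assumptions; the published mapping tables remain the reference.

```python
from typing import List, Tuple

# Illustrative gender mapping (an assumption, not the published mapping).
GENDER_TO_SEX = {"male": "M", "female": "F", "other": "OT", "unknown": "UN"}
MAPPED_FIELDS = {"resourceType", "id", "gender", "birthDate"}


def patient_to_demographic(patient: dict) -> Tuple[dict, List[str]]:
    """Return a DEMOGRAPHIC-style row plus the source elements left behind."""
    row = {
        "PATID": patient["id"],
        "BIRTH_DATE": patient.get("birthDate"),
        # "NI" (no information) is used when gender is absent or unrecognized.
        "SEX": GENDER_TO_SEX.get(patient.get("gender"), "NI"),
    }
    dropped = sorted(set(patient) - MAPPED_FIELDS)
    return row, dropped


row, dropped = patient_to_demographic(
    {
        "resourceType": "Patient",
        "id": "p1",
        "gender": "female",
        "birthDate": "1980-04-02",
        "maritalStatus": {"text": "Married"},
    }
)
```

Logging the `dropped` list per load is one simple way to give Researchers an idea of what was lost.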
Some PCORnet sites are using OMOP CDM as a source or destination, and hence a mapping between FHIR and OMOP CDM would be useful for these sites. The following is a mapping that was developed by the DAF pilot sites and can be a starting point for the implementation of C1 capability. Please note that this mapping is not bi-directional (i.e., FHIR to OMOP), but it could be a good starting point for such a mapping. Profiles which are not US-Core are annotated accordingly in the table below. For PCORnet, further enhancing the OMOP mapping will bring in additional data sources to contribute data to the network, along with analytical tools that may also be provided by the various OMOP implementers.
Instantiation of a Task for loading of data at the Data Mart
The Data Mart actor has to support the creation of a DAF-Task resource instance. This can be achieved using a FHIR API using the POST operation or using a graphical user interface which allows an end user to create the Task instance. This task instance has to have the following data:
This Task instance would then be persisted for execution. The actual execution of the task can be controlled using a scheduled timer or a manual kick off. Note: If this is a task that is set up to repeat at a regular frequency, this step can be skipped after the first time.
Population of the Data Mart with the extracted data
A Bundle returned from Step 2 will conform to FHIR and US-Core or other specific IG requirements. This Bundle may have to go through additional transformations, mappings and other processing before it is loaded into a destination Data Mart. These additional actions may include de-identifying the data (discussed next) as well as other steps beyond the scope of this IG, such as pseudo-anonymization, data standardization and patient matching as needed. Implementers also need to be aware that the data extraction task must have completed before the data loading starts, to ensure the integrity of the extracted data. The status of the Task instance can be used to verify whether a Task has been completed or is still pending.
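A minimal sketch of the status check follows. It assumes the extraction Task uses the standard FHIR Task.status codes; the three decision labels (`load`, `wait`, `abort`) are names invented for this example.

```python
# Task.status codes treated here as "will never complete" (an assumption).
TERMINAL_FAILURES = {"failed", "rejected", "cancelled", "entered-in-error"}


def load_decision(extraction_task: dict) -> str:
    """Decide whether the load step may start, based on Task.status."""
    status = extraction_task.get("status")
    if status == "completed":
        return "load"
    if status in TERMINAL_FAILURES:
        return "abort"
    return "wait"  # requested, ready, in-progress, and so on


decisions = [
    load_decision({"resourceType": "Task", "status": s})
    for s in ("in-progress", "completed", "failed")
]
```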
In cases where the data extracted from a Data Source contains identifiable patient information, an external process has to de-identify the data prior to loading the Data Mart with the extracted data. It is expected that most vendors supporting the ONC 2015 Edition CCDS APIs or the Patient/$everything operation would be returning identifiable patient information as part of the API. Since PCORnet requires de-identified data, the de-identification has to be performed subsequently. Implementations can choose internally approved mechanisms or HHS de-identification guidance for de-identifying the data and populating the PCORnet CDM.
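The following sketch only illustrates where a de-identification step could sit in the pipeline. The list of removed elements is illustrative and not exhaustive, and the function is not a compliant implementation of any de-identification method; an approved mechanism would also have to handle resource ids, free text, dates on other resources and more.

```python
import copy

# Direct identifiers dropped in this sketch; illustrative, not exhaustive.
IDENTIFYING_FIELDS = ("identifier", "name", "telecom", "address", "photo", "contact")


def deidentify_patient(patient: dict) -> dict:
    """Return a copy with direct identifiers removed and birthDate cut to the year."""
    result = copy.deepcopy(patient)
    for field in IDENTIFYING_FIELDS:
        result.pop(field, None)
    birth_date = result.get("birthDate")
    if birth_date:
        result["birthDate"] = birth_date[:4]  # FHIR dates allow year-only precision
    return result


original = {
    "resourceType": "Patient",
    "id": "p1",
    "name": [{"family": "Example"}],
    "gender": "male",
    "birthDate": "1975-11-30",
}
cleaned = deidentify_patient(original)
```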
One of the value propositions of standardizing the data extract is eliminating the need for a separate mapping from each Data Source. As long as a Data Source has performed the right mapping to its FHIR Resources and profiles, the extracted data can be directly mapped to a destination model of choice such as the PCORnet CDM. Conformance of a Data Source to US-Core can be verified by using open source automated test tools and the US-Core conformance statements. The [FHIR wiki] provides a listing of many of these tools.
The following is a mapping of FHIR to PCORnet CDM and from OMOP to FHIR developed by DAF working with PCORnet community and data experts.
DAF-Research Mappings between data models
Using the above mapping the task to load the data would be executed as follows.
Implementing C2 capability involves three steps.
The next few paragraphs will provide details for each step above.
Instantiation of the CapabilityStatement
The Data Mart has to instantiate a CapabilityStatement Resource instance to declare its characteristics that would help a Researcher to compose queries. In addition the CapabilityStatement Resource should also help a Data Mart administrator to manage the data within the Data Mart. The CapabilityStatement Resource declares the various profiles, operations and other specifics about the implementation. For the DAF Data Mart actor the following data is expected to be present within the CapabilityStatement resource instance.
CapabilityStatement.rest.mode - Populate with “Server”
For each Operation that is supported by the Server, a DAF-OperationDefinition instance should be created with the appropriate data, and then CapabilityStatement.rest.operation.definition should point to the instance that has been created.
The following extensions should be populated for the CapabilityStatement resource instance:
* PCORnet Data Mart Active Flag - This indicates if the Data Mart is still active and is accepting queries.
The DAF-OperationDefinition profile has been created to help servers declare conformance to the various DAF-Research operations. In order to declare support for various operations, an implementation would create an instance of DAF-OperationDefinition and then point to it by the CapabilityStatement.rest.operation part of the CapabilityStatement resource.
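The sketch below assembles a partial CapabilityStatement with the server mode, an active-flag extension and one `rest.operation` entry per supported operation. The extension URL is a hypothetical placeholder, the `definition` element is shown as a Reference (later FHIR versions use a canonical URL instead), and the result is not a complete or validated resource.

```python
from typing import Dict

# Hypothetical extension URL; the real one is defined by the IG.
ACTIVE_FLAG_URL = "http://example.org/fhir/StructureDefinition/data-mart-active-flag"


def build_capability_statement(operations: Dict[str, str], active: bool) -> dict:
    """`operations` maps an operation name to its OperationDefinition reference."""
    return {
        "resourceType": "CapabilityStatement",
        "kind": "instance",
        "extension": [{"url": ACTIVE_FLAG_URL, "valueBoolean": active}],
        "rest": [
            {
                "mode": "server",
                "operation": [
                    {"name": name, "definition": {"reference": ref}}
                    for name, ref in sorted(operations.items())
                ],
            }
        ],
    }


statement = build_capability_statement(
    {"daf-execute-query": "OperationDefinition/daf-execute-query"}, active=True
)
```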
The following data elements are expected to be populated for each DAF-OperationDefinition that is instantiated.
The following Extensions have to be populated as part of the OperationDefinition
Population and Update of the CapabilityStatement
Once published, the CapabilityStatement is updated less frequently than clinical resources. However, the CapabilityStatement will need to be updated when the following data elements change:
Making the CapabilityStatement available to Researchers
The CapabilityStatement resource just like other FHIR resources can be queried by researchers.
CapabilityStatement resources should be available for querying without requiring additional authorization.
The CapabilityStatement resource will be published at the well-known FHIR URL for capability statements, [base]/metadata.
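As a small illustration of how a Researcher-side client might use the published statement, the sketch below builds the `[base]/metadata` URL and lists the operation names a server declares. The base URL and the sample statement are hypothetical, and no network call is made.

```python
from typing import List


def capabilities_url(base_url: str) -> str:
    """URL of the FHIR capabilities interaction: GET [base]/metadata."""
    return base_url.rstrip("/") + "/metadata"


def supported_operations(capability_statement: dict) -> List[str]:
    """Names of the operations declared under rest entries with mode 'server'."""
    names = []
    for rest in capability_statement.get("rest", []):
        if rest.get("mode") != "server":
            continue
        for operation in rest.get("operation", []):
            names.append(operation["name"])
    return names


# Sample statement standing in for a response retrieved from a Data Mart.
sample = {
    "resourceType": "CapabilityStatement",
    "rest": [{"mode": "server", "operation": [{"name": "daf-execute-query"}]}],
}
operation_names = supported_operations(sample)
```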
Capability C3 implementation involves two steps
Instantiation of a Task for executing a query
In PCORnet and most research environments, queries submitted to access data are asynchronous in nature, are repeated frequently and may involve humans in the workflow performing approvals, rejections etc. In order to support these requirements, an instantiation of a Task is performed. In order to track the Tasks across multiple Data Marts and states, the following Task hierarchy is implemented.
A Task (known as the Root Task) would be created based on the query composed by the Researcher. For each Data Mart that the query will be sent to, a new Task instance (Data Mart specific Task) would be created using the data from the Root Task. The parent of the Data Mart specific Task would be the Root Task. Each Data Mart, when it executes its Task, would create an instance of the Task for the execution (Execution specific Task) from the Data Mart specific Task and then populate it with the results of the execution. This hierarchy allows the Researcher to retrieve data specific to one execution within a Data Mart, across all executions to date within a Data Mart, or across all the Data Marts.
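The three-level hierarchy can be sketched with plain dictionaries as below. The parent link is represented with `Task.partOf`, which is an assumption; the DAF-Task profile may use a different element for the parent. The task ids are invented for the example.

```python
import copy
from typing import List


def child_task(parent: dict, new_id: str) -> dict:
    """Copy a Task and link the copy to its parent (partOf is an assumption)."""
    child = copy.deepcopy(parent)
    child.pop("output", None)  # results belong to the task that produced them
    child["id"] = new_id
    child["partOf"] = [{"reference": f"Task/{parent['id']}"}]
    return child


def descendants(root_id: str, tasks: List[dict]) -> List[str]:
    """Ids of all tasks below `root_id`, depth first."""
    found = []
    for task in tasks:
        parents = [p["reference"] for p in task.get("partOf", [])]
        if f"Task/{root_id}" in parents:
            found.append(task["id"])
            found.extend(descendants(task["id"], tasks))
    return found


root = {"resourceType": "Task", "id": "root", "status": "requested", "intent": "order"}
mart_a = child_task(root, "mart-a")          # Data Mart specific Task
mart_b = child_task(root, "mart-b")
run_1 = child_task(mart_a, "mart-a-run-1")   # Execution specific Task
all_tasks = [root, mart_a, mart_b, run_1]
```

Calling `descendants("root", all_tasks)` walks every Data Mart and execution, while `descendants("mart-a", all_tasks)` is limited to one Data Mart.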
The Root Task instance created will have the following data
The following extensions need to be populated on the Task
The following is the list of inputs to the daf-execute-query operation which would be populated on the Task.input data element.
Optionally, the query can indicate the type of data expected in the results by using the queryResultsPhiDisclosureLevel input.
Submitting the query to multiple Data Marts
In order for the Researcher to execute the query against multiple Data Marts, the Research Query Requester system has to create a copy of the Root Task created in Step 1 for each Data Mart. In order to make Tasks specific to a Data Mart, the following Task data elements would be set.
All the other data elements would be replicated from the root task. Once the Data Mart specific Tasks are created, Research Query Responders can access these tasks via the search mechanism on the Task.owner data element.
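A minimal sketch of the fan-out and of the owner-based search follows. Setting `owner` and `partOf` on each copy is an assumption about which elements make a Task Data Mart specific; the organization references and base URL are hypothetical.

```python
import copy
from typing import List
from urllib.parse import urlencode


def fan_out(root_task: dict, data_mart_refs: List[str]) -> List[dict]:
    """Create one Data Mart specific Task per Data Mart from the Root Task."""
    tasks = []
    for ref in data_mart_refs:
        task = copy.deepcopy(root_task)
        task.pop("id", None)  # the server assigns a new id on create
        task["partOf"] = [{"reference": f"Task/{root_task['id']}"}]
        task["owner"] = {"reference": ref}
        tasks.append(task)
    return tasks


def owner_search_url(base_url: str, owner_ref: str) -> str:
    """Search URL a Research Query Responder could use to find its Tasks."""
    return f"{base_url.rstrip('/')}/Task?{urlencode({'owner': owner_ref})}"


root_task = {"resourceType": "Task", "id": "root", "status": "requested", "intent": "order"}
mart_tasks = fan_out(root_task, ["Organization/mart-a", "Organization/mart-b"])
search_url = owner_search_url("https://example.org/fhir", "Organization/mart-a")
```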
Implementing C4 capability involves the following three steps
Retrieving the query specific to the Data Mart
Each Research Query Responder can access the queries that it needs to execute by performing a GET on the Task where Task.owner would be itself. This GET operation on the Task resource may cross firewall boundaries and might require appropriate authorization before the resources can be accessed.
The Research Query Responder would then duplicate the task with all the data for the specific execution. This new Task instance would have the Data Mart specific Task as its parent. The Research Query Responder would set the Task.status to “Received”, “Accepted” or “Ready” as appropriate.
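The status changes on the execution specific Task can be guarded with a small transition table, sketched below using the lowercase FHIR Task.status codes. The table itself is a simplified assumption about which moves a Research Query Responder would allow, not a rule taken from this IG.

```python
# Simplified transition table (an assumption for illustration).
ALLOWED_TRANSITIONS = {
    "requested": {"received", "rejected"},
    "received": {"accepted", "rejected"},
    "accepted": {"ready", "cancelled"},
    "ready": {"in-progress", "cancelled"},
    "in-progress": {"completed", "failed"},
}


def advance(task: dict, new_status: str) -> dict:
    """Return a copy of the Task with the new status, or raise if not allowed."""
    current = task["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new_status}")
    updated = dict(task)
    updated["status"] = new_status
    return updated


execution_task = {"resourceType": "Task", "status": "requested"}
for status in ("received", "accepted", "ready", "in-progress"):
    execution_task = advance(execution_task, status)
```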
Executing the query and returning the query results
The Research Query Responder would start the execution specific Task instance by updating the Task.status to “In-Progress”. The Research Query Responder would then translate the incoming query to native execution language based on the following parameters
The query would then be executed and the results would be created using the DAF-QueryResults Observation profile. The data would be represented as follows
Observation.code - Set this to the types of things being aggregated. It can be Patient, Encounter, Observation etc.
The Observation.component is sliced and there are two different occurrences. The first Observation.component is to capture the Aggregate Results. The second Observation.component is to capture the filters or stratifiers that were applied.
For example, the first occurrence of Observation.component would contain `{ "code": "Patient", "valueQuantity": "10", "interpretation": "Count" }`.
The second occurrence of Observation.component needs to be repeated multiple times, once for each filter or stratifier: `[ { "code": "Sex", "valueString": "M", "interpretation": "None" }, { "code": "Race", "valueString": "09", "interpretation": "None" } ]`.
Once these results are created the Research Query Responder should create the Bundle and set the execution specific Task.output to the Bundle instance. The Task.status should be set to “Completed”. In case of failures the data is returned as part of the OperationOutcome element. These execution specific Task instances are now available for retrieval by the Researcher.
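Putting these pieces together, the sketch below builds a result Observation with one aggregate component followed by stratifier components, wraps it in a Bundle and marks the Task completed. Representing codes and interpretations as `{"text": ...}` and pointing `Task.output` at the Bundle with a reference are simplifying assumptions; the DAF-QueryResults profile defines the actual bindings.

```python
from typing import Dict, List, Tuple


def build_query_result(subject_type: str, count: int, stratifiers: Dict[str, str]) -> dict:
    """Observation with the aggregate first, then one component per stratifier."""
    components = [
        {
            "code": {"text": subject_type},
            "valueQuantity": {"value": count},
            "interpretation": {"text": "Count"},
        }
    ]
    for name, value in stratifiers.items():
        components.append(
            {"code": {"text": name}, "valueString": value, "interpretation": {"text": "None"}}
        )
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"text": subject_type},
        "component": components,
    }


def complete_task(task: dict, observations: List[dict], bundle_id: str) -> Tuple[dict, dict]:
    """Wrap results in a Bundle and return (bundle, completed copy of the task)."""
    bundle = {
        "resourceType": "Bundle",
        "id": bundle_id,
        "type": "collection",
        "entry": [{"resource": obs} for obs in observations],
    }
    done = dict(task)
    done["status"] = "completed"
    done["output"] = [
        {"type": {"text": "query-results"}, "valueReference": {"reference": f"Bundle/{bundle_id}"}}
    ]
    return bundle, done


observation = build_query_result("Patient", 10, {"Sex": "M", "Race": "09"})
bundle, finished = complete_task(
    {"resourceType": "Task", "status": "in-progress"}, [observation], "results-1"
)
```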
Retrieving query results from multiple Data Marts
In order for a Researcher to get a complete picture of the population based on the submitted query, the query results from multiple Data Marts have to be retrieved. For this purpose the Research Query Requester has to query each of the Data Marts for execution specific Task instances whose parent is the Data Mart specific Task that was created during the initiation of the query. Once these Task instances are retrieved, the Task.output would contain the result of each query execution for each Data Mart. These results would then be made available to the Researcher for further analysis.
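Once the results are retrieved, a Research Query Requester might combine them as in the sketch below. It assumes each result Observation carries its aggregate value in the first component as a `valueQuantity`, and that one Observation per Data Mart is being combined. A plain sum can double count patients who appear in more than one Data Mart, so it is only a first approximation.

```python
from typing import Dict, List, Tuple


def aggregate_value(observation: dict) -> float:
    """Value of the first component, assumed to hold the aggregate result."""
    return observation["component"][0]["valueQuantity"]["value"]


def combine_counts(results_by_mart: Dict[str, List[dict]]) -> Tuple[Dict[str, float], float]:
    """Return per Data Mart totals and the total across all Data Marts."""
    per_mart = {
        mart: sum(aggregate_value(obs) for obs in observations)
        for mart, observations in results_by_mart.items()
    }
    return per_mart, sum(per_mart.values())


def _result(count: int) -> dict:
    """Tiny helper producing a result Observation for the example."""
    return {"resourceType": "Observation", "component": [{"valueQuantity": {"value": count}}]}


per_mart, network_total = combine_counts({"mart-a": [_result(10)], "mart-b": [_result(7)]})
```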
For more examples of the various resources implemented by DAF-Pilots, please refer to the [DAF-Pilots] presentations.
[DAF-Pilots]: https://oncprojectracking.healthit.gov/wiki/display/TechLabSC/DAF+Pilots