Structured Data Capture
3.0.0 - STU 3 International flag

This page is part of the Structured Data Capture FHIR IG (v3.0.0: STU 3) based on FHIR R4. This is the current published version. For a full list of available versions, see the Directory of published versions

Form Data Extraction

Page standards status: Informative

Questionnaires are excellent tools for data capture. They allow tight control over what data is gathered and ensure information is gathered consistently across multiple users. However, data gathered using different questionnaires - or even different versions of the same questionnaire - is often not comparable. It is also not very searchable or easily integrated with discrete data sources. Because of this, the general recommendation in FHIR is to use questionnaires for raw data capture but then to convert the resulting QuestionnaireResponse instances into other FHIR resources - Observations, MedicationStatements, FamilyMemberHistories, etc. This allows the data gathered to then be easily combined with other data into FHIR documents and messages and exposed over FHIR REST interfaces.

Such conversion can be done with custom code written on a Questionnaire by Questionnaire basis. However, it makes the process much easier if it's possible to write generic software that can convert any arbitrary QuestionnaireResponse into appropriate FHIR resources leveraging metadata embedded in the Questionnaire. This portion of the SDC guide defines mechanisms for doing so.

Caveats, considerations and rules with form data extraction

  • Extraction is a step that only makes sense to occur once a QuestionnaireResponse is completed. Prior to that, the information to create valid resource instances may not be available and conversion logic is likely to fail. The types of data produced will vary by Questionnaire.
  • Some questionnaires might result in a single resource. Others will produce a Bundle of resources. In some cases, the result might be a transaction intended to create some resources and update others.
  • If the Questionnaire is being designed with conversion to a resource in mind, conversion will be a straight-forward process because the questions will align well with how FHIR stores data - and the questions can even be tuned to align with particular profiles around the use of terminology, which elements are mandatory, etc. However, if converting data from questionnaires designed without FHIR in mind, mapping may be more challenging. Required elements may need to be inferred, codes may need to be transformed and other transformations may be necessary to ensure that converted data meets expectations for use, such as aligning with country-specific implementation guides. In some cases, alignment may not be possible.
  • Once a QuestionnaireResponse has been converted, it might not be necessary to retain the QuestionnaireResponse any longer, as all data access and subsequent maintenance will occur through the 'traditional' resources. However, in many environments, the QuestionnaireResponse will be retained anyhow to keep a record of the original source-of-truth and for traceability reasons.
  • As with population, 'extraction' can happen in one of two ways: The SDC Form Filler can invoke the $extract operation on a SDC Form Manager to generate the resource or Bundle containing the information expressed in the QuestionnaireResponse. Alternatively, the SDC Form Filler can perform the extraction process locally without relying on an external process. Some Form Fillers might rely on the operation approach for certain extraction techniques and perform others internally.
  • When resources have been generated from a QuestionnaireResponse, the Provenance instance associated with the creation of the resource instance(s) can and SHOULD include an entity reference of type 'source' that points back to the original QuestionnaireResponse. If Observations are generated, they can also have an explicit derivedFrom link pointing back to the QuestionnaireResponse.
  • In theory, it is possible to use Questionnaire as a user-facing interface to allow maintenance of one or more resources. The source data can be used to populate the QuestionnaireResponse and once the response is 'submitted', the data can then be extracted and used to update the existing resources. For this to work, the id of the resource must be retained through the round-trip process to allow for the update. This can be supported by storing the id as a hidden question not shown to the user. This question item would have a type of 'string' and, if using the definition-based extraction approach, a definition that corresponded to the 'id' element of the relevant resource type (e.g. http://hl7.org/fhir/AllergyIntolerance#AllergyIntolerance.id). Note that the inclusion of the id as a hidden question is only relevant for the definition-based and StructureMap-based approaches. It is not needed for the Observation-based approach.
  • Sometimes the author interested in making a Questionnaire "extractable" does not have authority to make changes to the "official" Questionnaire. In other cases, there might be one official Questionnaire, but a need to create extracted resources that comply with different sets of profiles - and thus a need for different metadata in the Questionnaire to support the extraction process. In this case, rather than basing the extraction on the original Questionnaire, it can be based on a derived Questionnaire - one that has a Questionnaire.derivedFrom relationship to the same canonical URL the QuestionnaireResponse refers to. The derived Questionnaire would contain the same content as the base Questionnaire, but would have additional extensions inserted to support data extraction.
  • When capturing quantities, it's common for questionnaires to prompt for the numeric value and to note the 'fixed' unit as part of the question. For example, "Please specify the patient's weight in kilograms". When extracting the value for representation in a resource, a unit will be added. The questionnaire-unit extension SHOULD be included on the question to support the extraction process.
  • When performing extraction, the considerations around errors and lack of support for expressions may come into play. In addition, similar considerations can also be relevant in terms of what 'extraction' approach the Questionnaire supports, as opposed to what level of extraction capability the Form Filler has. The desired behavior of alerting the user that extraction won't be possible is the same whether the issue is "handles FHIRPath vs. CQL" or "uses Observation-based vs. definition-based".
  • When generating or updating resources based on information in a QuestionnaireResponse, there may be a desire to capture where that information was sourced from, what software was used to perform conversion, etc. The way to represent this information in FHIR is using the Provenance resource. The QuestionnaireResponse would be referenced using Provenance.entity.what where the Provenance.entity.role would be 'source'. Software information would be captured in Provenance.agent. If the produced records are Observations, the resulting Observation(s) can also directly point to the QuestionnaireResponse using Observation.derivedFrom.
  • A single Questionnaire can sometimes include data from multiple subjects. The extraction process needs to be aware of what the subject for a given section of the Questionnaire is and take that into account when creating resources based on the QuestionnaireResponse. (E.g. don't create a heart-rate observation on the mother if that part of the Questionnaire is capturing information about their child. In SDC, the fact that a particular 'group' in a Questionnaire has a distinct subject is communicated using the isSubject extension. An equivalent extension flags the element within the QuestionnaireResponse.

Extraction service

Like Questionnaire population, extracting data from a QuestionnaireResponse is a complex process involving querying existing FHIR data and using more advanced technologies such as FHIRPath and StructureMap. It's therefore a function that systems may also wish to offload to a separate system. The QuestionnaireResponse extract has been created for this purpose. It takes in a completed QuestionnaireResponse and returns either an individual FHIR resource or a Bundle of resources, depending on the type of Questionnaire. The operation does not post the created resources to a server. It's up to the client system to determine what action(s) to take with the created content.

NOTE: It's the responsibility of the client system to ensure that any generated resources are valid against necessary profiles, etc. before using content produced by this operation.

Designing Questionnaires to support data extraction

This specification defines three different mechanisms to embed information in Questionnaires to support subsequent resource extraction:

Systems are free to experiment with other extraction mechanisms but cannot expect support for those from other SDC-conformant systems.

Each mechanism has its own profile that includes the additional resource elements or extensions relevant for supporting a particular mechanism: SDC Questionnaire Extract - Observation, SDC Questionnaire Extract - Definition, and SDC Questionnaire Extract - Structure Map profiles. Each profile identifies specific 'must support' elements and extensions that systems that claim to support a specific SDC extraction mechanism SHALL be capable of extracting data, as befits the CapabilityStatement(s) they claim conformance to. Each system should choose which approach(es) it wishes to use and support based on the elements specified in that profile.

Some of these mechanisms make use of FHIR-based queries, FHIRPath and/or CQL as well as extensions that include expressions in one of these languages. Implementers should read the Using Expressions page for background and guidance on these technologies and extensions.

Observation-based extraction

This is the simplest of the extraction mechanisms. It leverages the same data elements as are used for the Observation-based population mechanism. It takes advantage of the fact that most questions in the healthcare space typically correspond to the value element of an Observation. It also takes advantage of the Questionnaire.item.code element that identifies what a concept each question or group corresponds to. The SDC Questionnaire Extract - Observation profile has been created to support this mechanism. An example for this profile can be found here.

To use this method:

  1. Include the item.code element on each question to be extracted. Typically, this will be a LOINC code, but in some jurisdictions/environments, SNOMED CT or other codes may be relevant.
  2. Groups can also have an item.code present - this might represent the code of the panel or the Observation.code of an Observation with no value but with multiple Observation.component elements. Child question items can then assert the item.code of the "member-of" Observations or the Observation.component.code values.
  3. To signal that the item.code is intended for use in extraction (as opposed to just providing metadata about the Questionnaire item), the questionnaire-observationExtract extension must also be included (and set to true). This extension can be specified either at the root Questionnaire or on an individual question or group item (not a display item) that indicates that the observation-based approach should be used to extract either that particular item (based on the code present) or all items in the questionnaire (if they have a code present).
  4. If the extension appears on specific item.code elements rather than the item as a whole, then only the specified codes should be propagated rather than all. For example, an item might list codes for "Body weight", "Body weight (clothed)" and "Body weight (unclothed)". In that case, only the "Body weight" code would be appropriate to include in the extracted content, even though all three might be appropriate for population. If any item.codes are tagged with 'true', then only the tagged Codings will propagate. I.e. any sibling item.codes with no extension are considered to have an extension of 'false'. If no extension appears on any of the item.codes but an extension appears on the item, an ancestor item or the Questionnaire as a whole, then all codes are considered to be roughly equivalent translations and to be appropriate to list as sibling Codings within Observation.code.

    If an item has the extension flag set to 'true', some descendant items may have the extension with a value of 'false'. If this occurs, then the item tagged with 'false' and its descendants will be excluded from the extraction process.

  5. Following is the conceptual algorithm for observation-based extraction mechanism:
    • Start at the root of the Questionnaire and progress down through the items.
    • For a particular item, if the item itself or any of its codings have observationExtract set to 'true', then that item is to be extracted (if possible) and the presumption is that descendant items will also be extracted.
    • The presumption of "will extract" will change to false only if an item explicitly sets observationExtract = false at the 'item' level. It is still possible that descendant items might turn extraction back on again. I.e. the full depth must be traversed to catch all extraction candidates.
    • If an item is presumed to be extracted, it will extract if there are item.codes for that item that aren't explicitly marked with observationExtract=false.
    • The observation-extract-category extension only comes into play if a determination has been made that extraction will occur. The observation-extract-category on an item will override that declared on the base questionnaire or ancestor items. If there is a desire to add multiple categories, all must be declared on an item.

For example:

    
	  <item>
		<extension url="http://hl7.org/fhir/uv/sdc/StructureDefinition/sdc-questionnaire-observationExtract">
		  <valueBoolean value="true" />
		</extension>
		<linkId value="code-pop-demo"/>
		<code>
		  <system value="http://loinc.org"/>
		  <code value="29463-7"/>
		  <display value="Body weight"/>
		</code>
		<code>
		  <system value="http://loinc.org"/>
		  <code value="3141-9"/>
		  <display value="Body weight Measured"/>
		</code>
		<code>
		  <system value="http://loinc.org"/>
		  <code value="8341-0"/>
		  <display value="Dry body weight Measured"/>
		</code>
		<text value="What is your current weight?"/>
		<type value="quantity"/>
		<answerOption>
		  <valueCoding>
			<system value="http://unitsofmeasure.org"/>
			<code value="kg"/>
		  </valueCoding>
		</answerOption>
		<answerOption>
		  <valueCoding>
			<system value="http://unitsofmeasure.org"/>
			<code value="[lb_av]"/>
		  </valueCoding>
		</answerOption>
	  </item>
	
  

When performing the extraction process, the system will create a batch that will contain creates or updates of Observation instances. It will go through the QuestionnaireResponse and identify all answers marked for extraction (if the corresponding Questionnaire item or the root Questionnaire has a questionnaire-observationExtract extension). For each of those it will then determine whether to create a new observation, update an existing observation or do nothing. Guidelines for making this decision are as follows:

Take no action if:
  • the answer was populated from an existing Observation;
  • the system rendering the QuestionnaireResponse can retain context and knows the 'id' of Observation to update;
  • the author of the original Observation is the same as the current author of the QuestionnaireResponse;
  • the context of the questionnaire and the use of it is one where updates are appropriate - as opposed to asserting a new Observation with a new performer and date; and
  • the answer has not changed from the populated value;
Update if:
  • all of the conditions above apply with the exception that the value has changed.
Create a new Observation if: the conditions in the preceding two rows are not met

If updating, the original Observation SHALL be adjusted to have the new value or component.value and the status changed to "amended", then PUT to the source system. If creating, data elements SHOULD be populated as follows:

  • Observation.basedOn and Observation.partOf - copy from QuestionnaireResponse elements of the same name
  • Observation.status - set to 'final'
  • Observation.category - if this can be inferred from any of the Questionnaire.item.code values or from known context of the Questionnaire itself, then fill it in. It can also be asserted using the observation-extract-category extension.
  • Observation.code - add all the Questionnaire.item.code values as Observation.code.coding instances
  • Observation.subject - set to QuestionnaireResponse.subject
  • Observation.encounter - set to QuestionnaireResponse.encounter (if an Encounter)
  • Observation.effectiveDateTime - set to QuestionnaireResponse.authored.

    Note, this is an inference. It is important that the question text implies that the value is 'current' not 'historical' for this to be safe - otherwise do not include the questionnaire-observationExtract extension that marks the question as appropriate for extraction.
  • Observation.issued - set to QuestionnaireResponse.authored
  • Observation.performer - set to QuestionnaireResponse.author
  • Observation.value[x] - set to QuestionnaireResponse.item.answer.value[x]
  • Observation.derivedFrom - set to a reference to the QuestionnaireResponse
  • Observation.interpretation and Observation.referenceRange - if these can be inferred from the QuestionnaireResponse.item.code (and for interpretation the answer value too), they can be populated, otherwise omit

If the Questionnaire.item that is linked to an Observation contains child items that are also linked to Observations, then things get more complex as a determination will need to be made on whether to link the parent to child as Observation.component or as Observation.hasMember. In the ideal situation, the system will recognize the Observation.item.code and know which approach is correct for that type of Observation. If not, then the system could query for other records of the child type and see if they appear as components anywhere. If unsure, systems should use "hasMember".

Considerations and rules when using this approach:

  • If a questionnaire item has the questionnaire-unit extension, the Observation.value SHOULD be a valueQuantity rather than integer or decimal and the units should be taken from the extension value.
  • If a question is skipped (no answer) or cleared, no Observation should be created. Existing Observations SHALL NOT be deleted.
  • If a question has multiple answers, each answer SHALL be a separate Observation instance.
  • There is no mechanism to support items that are mapped to Observation codes which then have nested items without codes - e.g. to capture the text description for an "other - please specify" code - one of the other extraction mechanisms will need to be used.
  • Implementers are free to try combining this mechanism with the Definition-based approach. If they do, they should take care that a given item (and its children) are only handled by one approach or the other - not both.
  • This approach does not allow for observations where Observation.focus is relevant or for capturing Observation.dataAbsentReason.
  • Where an Observation is known to directly correlate to another resource element value (e.g. LOINC 21112-8 corresponds to Patient.birthDate), systems MAY take advantage of this knowledge to update the value of resources other than Observations, however such use is discouraged - using one of the other extraction techniques is likely better and safer.
  • Obviously, this mechanism only works for questionnaire items that correspond to Observation values.

Definition-based extraction

This approach to extraction is more generic. It supports extracting data into any type of FHIR resource rather than being limited to only Observation. It also supports more Observation data elements than can be gathered using the Observation element - for example, explicit effective time ranges, interpretations, comments, etc. The SDC Questionnaire Extract - Definition profile has been created to support this mechanism.

To use this method:

  1. Include the questionnaire-itemExtractionContext extension either on the Questionnaire root or on 'group' items within the Questionnaire to identify the resource that will serve as the context for any extraction. The itemExtractionContext is used to set the context for the item.definition paths. If the itemExtractionContext is empty, then the Questionnaire is being used to create a resource. If the itemExtractionContext has a resource (or set of resources), then the Questionnaire is being used to update the resource(s).
  2. On descendant items of that element, fill in the Questionnaire.item.definition to point to the resource or profile element that Questionnaire item corresponds to. (Profiles may be relevant for data that is sliced or has fixed values for some properties.). The definition SHALL have the full canonical URL of the resource (or profile) followed by '#' followed by the snapshot.element.id of the element the Questionnaire item corresponds to.
    NOTE: Rather than specifying the id of an element that actually appears in a snapshot, it is also possible to 'extend' an id to walk further into the data types of the specified element. For example, a definition might be http://hl7.org/fhir/Patient#Patient.name.given even though the referenced profile's snapshot only contains Patient.name, not Patient.name.given. However, for more complex references (e.g. referring to a particular extension or a particular repetition), it will be necessary to define a distinct profile where slicing ensures that a single element id corresponds to the desired element. I.e. you would need to use http://somewhere.org/SomePatient#Patient.name.given:foo where 'foo' was a slice that referred to the first middle name in order to tie to a definition that specific.
    If a definition is specified on a Group or a Question with nested questions, then nested items might have definitions that refer to a deeper path along the same id structure. For example, a group element might have a definition of http://hl7.org/fhir/Patient#Patient.name while nested items would have definitions of http://hl7.org/fhir/Patient#Patient.name.given and http://hl7.org/fhir/Patient#Patient.name.family.
  3. If necessary, define questionnaire-hidden items that have Questionnaire.item.initial.value[x] or that use the questionnaire-initialExpression extension to define their content to use to populate resource elements that the user will not be filling in. (The initialExpressions might in turn depend on variable and questionnaire-launchContext extensions, used as described in the Expression-based population section.

To perform the extraction process, first determine whether the context resource should be updated or created:

  • If the element has a questionnaire-itemExtractionContext that corresponds to an existing resource or set of resources, then the intention is that the extracted content will replace those existing resources. (Typically in this situation, the same expression would have appeared on the same element as a questionnaire-itemPopulationContext expression. On the other hand if the itemExtractionContext simply refers to a resource type, then the intention is to create new instances. When updating, the Questionnaire will typically 'populate' hidden questions with the 'id' elements of the relevant respources for update to ensure that updates are applied to the appropriate instances. If some repetitions end up not having id elements (e.g. because the user has added additional group repetitions), those additional repetitions will also be handled as creates.
  • In other cases, the record will be created.

If updating, the answer values from the questionnaire (including hidden, calculated answers) will be propagated to the context resource. If creating, then for each occurrence of each item asserting a questionnaire-itemExtractionContext, a resource of the context type will be defined. It will be populated with answers to questions (hidden or not) that have a definition that matches the resource type of the context. As well, if the definitions refer to a profile, any child elements of the profile that have fixed values or patterns declared will also be included in the instance with values matching the fixed or pattern values.

An example of a questionnaire using this approach can be found here.

Considerations and rules when using this approach:

  • FHIR queries found in the questionnaire-initialExpression, questionnaire-calculatedExpression and variable extensions may contain embedded FHIRPath expressions (surrounded by double curly-braces). Systems SHALL evaluate and substitute the results of such queries before executing them
  • If the result of evaluating the FHIRPath expressions is an invalid query, that is an error. Systems SHOULD log it and continue with extraction as if the query had returned no data.
  • When crafting queries, be sure to filter all relevant elements. For example, ensuring status excludes entered-in-error elements, practitioners are active, etc.
  • In some cases, the context of an element will switch. For example, an item might have a definition of "http://hl7.org/fhir/MedicationStatement#MedicationStatement.medicationReference" and a questionnaire-itemExtractionContext extension that shifts the context to "Medication". Subsequent children would then have definitions based on the "http://hl7.org/fhir/Medication" resource.
  • Note that only one context can be in play at the same time. When a new context is declared, it takes the place of the old context.
  • If the same context are used for both questionnaire-itemPopulationContext and questionnaire-itemExtractionContext, then the value will be repeated for both extensions.
  • The items in the Questionnaire might not have the same order as those in the resource. When serializing into XML, the official order must still be respected.

StructureMap-based extraction

The StructureMap approach is the most sophisticated approach of the three - and the most powerful. It allows significant transformation of data, including code translations when generating output resources. It also allows the conversion process between data and Questionnaire to be maintained independently and to draw on shared sources across Questionnaires. This can be an advantage in certain environments where the content of the questionnaire may need tight control, but the data environment can be more dynamic. This comes at the cost of requiring expertise in the FHIR mapping language, which is not (yet?) a common skill. The SDC Questionnaire Extract - StructureMap profile has been created to support this mechanism. An example for this profile can be found here.

To use this method:

  1. Include the questionnaire-targetStructureMap extension. This SHALL define a transform between the QuestionnaireResponse and either a single resource or a transaction Bundle containing the set of resources extracted from the QuestionnaireResponse.

To extract data from the completed QuestionnaireResponse, simply invoke the StructureMap on it. A sample Questionnaire and associated StructureMap that shows this approach can be found here.

Considerations when using this approach:

  • This mode has the drawback that if the StructureMap execution fails, there will generally not be any data extracted from the Questionnaire. With the other approaches, if one Observation or context fails, the others might still work. As a result, the StructureMap must be designed to be very robust in the face of missing or potentially 'bad' data.
  • The ability of StructureMaps to reference other StructureMaps allows for the possibility of re-use if certain sections of multiple questionnaires are consistent.

Output expectations

Following are the output expectations of $extract:

  • Observation-based will always produce a 'transaction' Bundle, even if it's only creating or updating a single Observation.
  • Definition-based will always produce a 'transaction' Bundle that indicates what resources are to be created or updated. The resources within the transaction can be anything (including Bundle resources such as FHIR documents)
  • StructureMap-based may produce either a 'transaction' Bundle or a single resource instance (though the instance might itself be a different type of Bundle, such as a document or collection). It is possible for StructureMaps to map to non-FHIR data structures, however there's no "built-in" mechanism to serialize such models, so additional information would need to be passed into the $extract process to guide such serialization. Support for non-FHIR extracts is outside the scope of this specification.