R6 Ballot (2nd Draft)

FHIR Infrastructure Work Group. Maturity Level: N/A. Standards Status: Informative.

This tutorial introduces the FHIR mapping language.

To start with, we're going to consider a very simple case: mapping between two structures that have the same definition, a single element with the same name and the same primitive type:

Source Structure Target Structure
    TLeft
      a : string [0..1]
  
    TRight
      a : string [0..1]
  
The left instance is transformed to the right instance by copying a to a

Note that for clarity in this tutorial, all the types are prefixed with T.

The first task is to set up the mapping context on a default group. All mappings are divided up into a set of groups. For now, we just set up a group named "tutorial" - the same as the name of the mapping. For this tutorial, we also declare the source and target models, and specify that an application invokes this with a copy of the left (source) instance, and also an empty copy of the right (target) instance:

/// url = "http://hl7.org/fhir/StructureMap/tutorial"
/// name = "Tutorial"

uses "http://hl7.org/fhir/StructureDefinition/tutorial-left" as source
uses "http://hl7.org/fhir/StructureDefinition/tutorial-right" as target

group tutorial(source src : TLeft, target tgt : TRight) {

// rules go here

}

Note that the way the input variables are set up is a choice: we choose to provide the underlying type definitions on which both source and target models are based, and we choose to specify that the invoking application must provide both the source and the target instance trees. Other options are possible; these are discussed further below. The rest of the tutorial examples use the same setup for the group.

Having set up the context, we now need to define the relationships between the source and target structures:

src.a as a -> tgt.a = a "rule_a";

This simple statement says that:

  • for every source src (there'll only be one)
  • for any element 'a' in the source
  • if there isn't any element 'a', then don't do anything
  • if there is one, call it variable 'a'
  • the value of property 'a' of the target will be a copy of variable a - that is, src.a

"rule_a" is a purely arbitrary name associated with the rule that appears in logs, error messages, trace files, etc. It has no other meaning in the mapping statements. Mostly, in fact, it is simply automatically generated by the engine. It will not be specified anymore in this tutorial.
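The reading of the rule given in the list above can be pictured with a small Python sketch. This is only an illustration of the semantics, not how any mapping engine is implemented; the use of dictionaries for the instances and the function name apply_rule_a are assumptions made for this example.

```python
def apply_rule_a(src: dict, tgt: dict) -> dict:
    """Emulate the rule: src.a as a -> tgt.a = a"""
    # A missing (or None) 'a' stands for "no element a": the rule does nothing.
    a = src.get("a")
    if a is not None:
        # The target property 'a' becomes a copy of the source value.
        tgt["a"] = a
    return tgt


# Example use: one source with 'a' present, one without.
with_a = apply_rule_a({"a": "hello"}, {})
without_a = apply_rule_a({}, {})
```

Here with_a holds the copied value and without_a stays empty, which matches the "don't do anything" case in the list.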

Note that there are no explicit types in this mapping statement, but if the underlying system has types, then the types will have to be correct. If the underlying source and target trees are strongly typed, and the mapping groups have explicit types, then a shorthand form is possible:

src.a -> tgt.a;

How this works is described below.

Now consider the case where the elements have different names:

Source Structure Target Structure
    TLeft
      a1 : string [0..1]
  
    TRight
      a2 : string [0..1]
  
The left instance is transformed to the right instance by copying a1 to a2

This relationship is a simple variation of the last:

src.a1 as b -> tgt.a2 = b;

Note that the choice of variable name is purely arbitrary. It does not need to be the same as the element name.

Still sticking with very simple mappings, let's consider the case where there is a length restriction on the target model that is shorter than the one on the source model - in this case, 20 characters.

Source Structure Target Structure
    TLeft
      a2 : string [0..1]
  
    TRight
      a2 : string [0..1] {maxlength = 20}
  
The left instance is transformed to the right instance by copying a2 to a2, but tgt.a2 can only be 20 characters long

There are 3 different ways to express this mapping, depending on what should happen when the length of src.a2 is greater than 20 characters:

src.a2 as a -> tgt.a2 = truncate(a, 20); // just cut it off at 20 characters
src.a2 as a where a.length() <= 20 -> tgt.a2 = a; // ignore it
src.a2 as a check a.length() <= 20 -> tgt.a2 = a; // error if it's longer than 20 characters

Note that it is implicit here that the transformation engine is not required or expected to validate the output against the underlying structure definitions that may apply to it. An application may - and usually should - validate the outputs after the transforms, but the transform engine itself does not automatically validate the output (e.g. it does not assume that it's the final step in the process).
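The three length-handling rules differ only in what happens to a value that is too long. The following Python sketch illustrates the three outcomes under the assumption that instances are plain dictionaries; the function names are hypothetical and are not part of the mapping language.

```python
MAX_LEN = 20  # the maxlength on the target element


def rule_truncate(src: dict, tgt: dict) -> dict:
    # truncate(a, 20): keep at most the first 20 characters
    if "a2" in src:
        tgt["a2"] = src["a2"][:MAX_LEN]
    return tgt


def rule_where(src: dict, tgt: dict) -> dict:
    # where: the rule silently does not apply when the condition fails
    if "a2" in src and len(src["a2"]) <= MAX_LEN:
        tgt["a2"] = src["a2"]
    return tgt


def rule_check(src: dict, tgt: dict) -> dict:
    # check: a failing condition is an error rather than a skipped rule
    if "a2" in src:
        if len(src["a2"]) > MAX_LEN:
            raise ValueError("a2 is longer than 20 characters")
        tgt["a2"] = src["a2"]
    return tgt
```

None of these functions validates the result beyond its own condition, which mirrors the point that validation of the output is left to the application.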

Now for the case where there is a simple type conversion between the primitive types on the left and right, in this case from a string to an integer.

Source Structure Target Structure
    TLeft
      a21 : string [0..1]
  
    TRight
      a21 : integer [0..1]
  
The left instance is transformed to the right instance by copying a21 to a21, but a21 is converted to an integer

There are 3 different ways to express this mapping, depending on what should happen when a is not an integer:

src.a21 as a -> tgt.a21 = cast(a, "integer"); // error if it's not an integer
src.a21 as a where a.convertsToInteger() -> tgt.a21 = cast(a, "integer"); // ignore it
src.a21 as a where a.convertsToInteger().not() -> tgt.a21 = 0; // just assign it 0

More than one of these mapping rules may be present to handle all possible cases - e.g. the second rule (convert the value when it is an integer) combined with the third rule (assign 0 when it is not).
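Combining the second and third rules covers every input: integers are converted and everything else becomes 0. A minimal Python sketch of that combination follows. The regular expression is only an approximation of FHIRPath's convertsToInteger() for strings, and the names converts_to_integer and map_a21 are assumptions for this example.

```python
import re

# Approximation: an optional sign followed by one or more ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


def converts_to_integer(value: str) -> bool:
    return _INTEGER.fullmatch(value) is not None


def map_a21(src: dict, tgt: dict) -> dict:
    a = src.get("a21")
    if a is None:
        return tgt  # no source element: neither rule applies
    if converts_to_integer(a):
        tgt["a21"] = int(a)  # second rule: convert when possible
    else:
        tgt["a21"] = 0       # third rule: assign 0 otherwise
    return tgt
```

Because the two conditions are exact opposites, exactly one of the two rules applies to any a21 that is present.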

Note that the mapping language does not itself define which primitive types exist. Typically, primitive types are defined by the underlying type system for the source and target trees, and the implementation layer makes these types available to the mapping language using the FHIRPath primitive types. The mapping language uses the FHIRPath syntax for primitive constants.

Back to the simple case where src.a22 is copied to tgt.a22, but in this case, a22 can repeat (in both source and target):

Source Structure Target Structure
    TLeft
      a22 : string [0..*]
  
    TRight
      a22 : string [0..*]
  
The left instance is transformed to the right instance by copying a22 to a22, once for each copy of a22

The transform rule simply asserts that a22 maps to a22. The engine will apply the rule once for each instance of a22:

src.a22 as a -> tgt.a22 = a;

This will create one a22 in TRight for each a22 in TLeft.

A more difficult case is where the source allows multiple repeats, but the target doesn't:

Source Structure Target Structure
    TLeft
      a23 : string [0..*]
  
    TRight
      a23 : string [0..1]
  
The left instance is transformed to the right instance by copying a23 to a23, but there can only be one copy of a23

Again, there are multiple ways to write this, depending on our desired outcome if there is more than one copy of a23:

src.a23 as a -> tgt.a23 = a;  // leave it to the transform engine
src.a23 only_one as a -> tgt.a23 = a;  // transform engine throws an error if there is more than one
src.a23 first as a -> tgt.a23 = a;  // Only use the first one
src.a23 last as a -> tgt.a23 = a;  // Only use the last one

Leaving the outcome to the transform engine is not recommended; it might not always know whether a property is confined to a single value, and exactly what happens is unpredictable. However, there are some circumstances where the appropriate action is to defer resolution, so this is allowed.

Most transformations involve nested content. Let's start with a simple case, where element aa contains ab:

Source Structure Target Structure
    TLeft
      aa : [0..*]
        ab : string [1..1]
  
    TRight
      aa : [0..*]
        ab : string [1..1]
  
The left instance is transformed to the right instance by copying aa to aa, and within aa, ab to ab

Note that there is no specified type for the element aa. Some structure definitions (FHIR resources) do leave these elements as anonymously typed, while others explicitly type them. However, since the mapping does not refer to the type, its literal type is not important.

src.aa as s_aa -> tgt.aa as t_aa then { // make aa exist
  s_aa.ab as ab -> t_aa.ab = ab; // copy ab inside aa
};

This situation is handled by a pair of rules: the first rule establishes the relationship between src.aa and tgt.aa, and assigns 2 variable names to them. Then, the rule contains an additional set of rules (though only one in this example) to map within the context of s_aa and t_aa.

An alternate approach is to move the dependent rules to their own group:

src.aa as s_aa -> tgt.aa as t_aa then ab_content(s_aa, t_aa); // make aa exist

group ab_content(source src, target tgt) {
  src.ab as ab -> tgt.ab = ab; // copy ab inside aa
}

Note that variables are divided into source and target; source variables are read-only, and cannot have their properties changed. Variable names may be reused in different contexts - they are only valid within the group or rule that defines them, and any dependent rules or groups.

A common pattern is to translate a value, e.g. from one set of codes to another:

Source Structure Target Structure
    TLeft
      d : code [0..1]
  
    TRight
      d : code [0..1]
  
The left instance is transformed to the right instance by translating src.d from one set of codes to another

The key to this transformation is the ConceptMap resource, which actually specifies the mapping from one set of codes to the other:

src.d as d -> tgt.d = translate(d, 'uri-of-concept-map', 'code');

This asks the mapping engine to use the $translate operation on the terminology server to translate the code using a specified concept map, and then to put the code value of the return translation in tgt.d.

Another common translation is where the target mapping for one element depends on the value of another element.

Source Structure Target Structure
    TLeft
      i : string [0..1]
      m : integer [1..1]
  
    TRight
      j : [0..1]
      k : [0..1]
  
How the left instance is transformed to the right instance depends on the value of m: if m < 2, then i maps to j, else it maps to k

This is managed using FHIRPath conditions on the mapping statements:

src.i as i where m < 2 -> tgt.j = i;
src.i as i where m >= 2 -> tgt.k = i;
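The two conditional rules route the same source element to different targets depending on m. A small Python sketch of that routing follows, assuming dictionary instances; the function name map_i is hypothetical.

```python
def map_i(src: dict, tgt: dict) -> dict:
    i = src.get("i")
    if i is None:
        return tgt  # i is optional; with no i, neither rule applies
    # m is mandatory [1..1] in the source structure
    if src["m"] < 2:
        tgt["j"] = i  # first rule: where m < 2
    else:
        tgt["k"] = i  # second rule: where m >= 2
    return tgt
```

Since the two conditions are complementary, each i ends up in exactly one of j or k.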

Many/most trees are fully and strongly typed. In these cases, the mapping language can make use of the typing system to simplify the mapping statements.

Source Structure Target Structure
    TLeft
      aa : TLeftInner [0..*]

    TLeftInner
      ab : string [1..1]
  
    TRight
      aa : TRightInner [0..*]

    TRightInner
      ab : string [1..1]
  
The left instance is transformed to the right instance by copying aa to aa, and within aa, ab to ab

This is the same case as the nested content example above (aa containing ab), but the mapping statements take advantage of the types:

src.aa -> tgt.aa;

group ab_content(source src : TLeftInner, target tgt : TRightInner) <<types>> {
  src.ab -> tgt.ab;
}

There are 2 different things happening in this short form:

  1. <<types>> - this marker indicates that this group is the default group to apply any time an element of type TLeftInner is mapped to a TRightInner
  2. Both the rules take advantage of the fact that the types of both source and target are known, and compatible, and instruct the mapping execution engine to make the target appropriately

In the case of the first rule (src.aa -> tgt.aa), the engine finds a need to map aa to aa and determines that it must map from a TLeftInner to a TRightInner. Since a group is defined for this purpose, it creates a TRightInner in tgt.aa, and then applies the discovered group as a dependent rule. Inside that group, the rule instructs the mapping engine to make tgt.ab from src.ab. The engine knows that both are primitive types, and compatible, and can apply this correctly. This short form is only applicable when there is only one source and target, when the types of both are known, and when no other dependent rules are nominated.

If the target element is polymorphic (can have more than one type), then the correct type of the target can only be inferred from the source type:

group ab_content(source src : TLeftInner, target tgt : TRightInner) <<type+>> {
  src.ab -> tgt.ab;
}

Not only is this group the default for (TLeftInner:TRightInner), if the engine has a TLeftInner with an unknown target type, it should create a TRightInner, and proceed as above.

It is an error if the engine locates more than one group of rules claiming to be the correct group for a type pair, or for a single source type.
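One way to picture default-group lookup is as a registry keyed by type pair, with an extra index by source type for groups marked <<type+>>. The Python sketch below is an assumption about how such a lookup could be organised, not a description of any real engine; the class GroupRegistry and its methods are hypothetical.

```python
class GroupRegistry:
    def __init__(self):
        self._by_pair = {}    # (source type, target type) -> group name
        self._by_source = {}  # source type -> (target type, group name)

    def register(self, name, src_type, tgt_type, type_plus=False):
        pair = (src_type, tgt_type)
        # Two groups claiming the same pair (or source type) is an error.
        if pair in self._by_pair:
            raise ValueError(f"duplicate default group for {pair}")
        if type_plus and src_type in self._by_source:
            raise ValueError(f"duplicate default group for source {src_type}")
        if type_plus:
            self._by_source[src_type] = (tgt_type, name)
        self._by_pair[pair] = name

    def find(self, src_type, tgt_type=None):
        """Return (target type, group name) for the default group."""
        if tgt_type is None:
            # Unknown target type: only a <<type+>> group can decide it.
            return self._by_source[src_type]
        return (tgt_type, self._by_pair[(src_type, tgt_type)])


registry = GroupRegistry()
registry.register("ab_content", "TLeftInner", "TRightInner", type_plus=True)
```

With this registration, a lookup for TLeftInner with no known target type yields TRightInner and the group ab_content, and registering a second group for the same pair raises an error.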

It's now time to start moving away from relatively simple cases to some of the harder ones to manage mappings for. The first mixes list management, and converting from a specific structure to a general structure:

Source Structure Target Structure
    TLeft
      e : string [0..*]
      f : string [1..1]
  
    TRight
      e : [0..*]
        f : string [1..1]
        g : code [1..1]
  
The left instance is transformed to the right instance by adding one instance of tgt.e for each src.e, where the value goes into tgt.e.f, and the value of tgt.e.g is 'g1'. src.f is also transformed into the same structure, but the value of tgt.e.g is 'g2'. As an added complication, the value for src.f must come first

This leads to some more complex mapping statements:

src.e as s_e -> tgt.e as t_e then {
  s_e -> t_e.f = s_e, t_e.g = 'g1';
};

src.f as s_f -> tgt.e as t_e first then {
  s_f -> t_e.f = s_f, t_e.g = 'g2';
};
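The combined effect of the two rules, including the requirement that the entry for src.f comes first, can be sketched in Python as follows. Dictionaries and lists stand in for the instances, and the function name map_e_and_f is an assumption for this example.

```python
def map_e_and_f(src: dict) -> dict:
    tgt_e = []
    # First rule: one tgt.e per src.e, tagged with g = 'g1'
    for s_e in src.get("e", []):
        tgt_e.append({"f": s_e, "g": "g1"})
    # Second rule: src.f becomes a tgt.e tagged 'g2'; the 'first' list
    # mode places it at the start of the list rather than the end.
    if "f" in src:
        tgt_e.insert(0, {"f": src["f"], "g": "g2"})
    return {"e": tgt_e}


result = map_e_and_f({"e": ["x", "y"], "f": "z"})
```

In result, the entry built from src.f (value z, code g2) precedes the two entries built from src.e.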

The second example for reworking structure moves cardinality around the hierarchy. In this case, the source has an optional structure that contains a repeating structure, while the target puts the cardinality at the next level up:

Source Structure Target Structure
    TLeft
      az1 : [0..1]
        az2 : string [1..1]
        az3 : string [0..*]
  
    TRight
      az1 : [0..*]
        az2 : string [1..1]
        az3 : string [0..1]
  
The left instance is transformed to the right instance by creating one tgt.az1 for every src.az1.az3, and then populating each az1 with the matching value of az3, and copying the value of az2 to each instance

The key to setting this mapping up is to create a variable context for src.az1, and then carry it down, performing the actual mappings at the next level down:

// setting up a variable for the parent
src.az1 as s_az1 then {

  // one tgt.az1 for each az3
  s_az1.az3 as s_az3 -> tgt.az1 as t_az1 then {
    // value for az2. Note that this refers to a previous context in the source
    s_az1.az2 as az2 -> t_az1.az2 = az2;

    // value for az3
    s_az3 -> t_az1.az3 = s_az3;
  };
};
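The effect of carrying the parent context down can be pictured in Python as a loop over az3 that still has access to the enclosing az1. This is a sketch under the assumption that instances are dictionaries; the function name map_az1 is hypothetical.

```python
def map_az1(src: dict) -> dict:
    tgt = {"az1": []}
    s_az1 = src.get("az1")
    if s_az1 is None:
        return tgt  # az1 is optional in the source
    # One tgt.az1 for each az3 inside the single source az1
    for s_az3 in s_az1.get("az3", []):
        tgt["az1"].append({
            "az2": s_az1["az2"],  # read from the outer (parent) context
            "az3": s_az3,         # the value driving this iteration
        })
    return tgt


result = map_az1({"az1": {"az2": "shared", "az3": ["p", "q"]}})
```

In result there are two az1 entries, each carrying the same az2 value and one of the az3 values.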

Simple mappings, such as we've dealt with so far, where the source and target structure both have the same scope, and there is only one of each, are all well and good, but there are many mappings where this is not the case. There is a set of complications when dealing with multiple instances:

  • If there are multiple source inputs, how does the application know what they are? Sometimes, they are just independent inputs, but more often, the inputs are dependent on references in the source input, and therefore which source inputs are required depends on the mapping rules
  • If there are multiple output instances, how are they identified as they are created, and how do the target models reference each other? Mostly, the answer is that it depends on the context; the actual identification details are not part of the mapping
  • It may even be the case that the kind of output structure to produce depends on the mapping rules, so the application can't create the target structure before invoking the map

For our first example, we're going to look at creating multiple output structures from a single input structure.

Source Structure Target Structure
    TLeft
      f1 : String [0..*];
  
    TRight
      ptr : Resource(TRight2) [0..*]

    TRight2
      f1 : String [1..1];
  
The left instance is transformed to the right instance creating a copy of TRight2 for each f1 in the source, and then putting the value of src.f1 in TRight2.f1

The key to setting this mapping up is to create a new TRight2 for each src.f1, add a reference to it in tgt.ptr, and then populate the new instance in a dependent rule:

src.f1 as s_f1 -> create("TRight2") as rr, tgt.ptr = reference(rr) then {
  s_f1 -> rr.f1 = s_f1;
};

In this mapping statement, create("TRight2") is not attached to any existing target context, so the created object of type "TRight2" doesn't get added to any existing target element. Instead, it is only available (as the variable rr) as a context in which to perform further mappings - as the dependent rule does.

The mapping engine passes the create request through to the host application that is using the mapping. The host must create a valid instance of TRight2, and identify it as appropriate for the technical context in which the mapping is being used. The reference transform is also passed back to the host application for it to determine how to represent the reference - but this is usually some kind of URL.
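The division of labour between the engine and the host application can be sketched in Python as two callbacks supplied by the host. Everything here is an assumption made for illustration: the class HostApplication, the id scheme, and the "type/id" reference format are hypothetical, and a real host chooses its own.

```python
class HostApplication:
    """Hypothetical host: decides how objects are created and referenced."""

    def __init__(self):
        self.created = []

    def create(self, type_name: str) -> dict:
        # The host assigns an identity suited to its own technical context.
        obj = {"type": type_name, "id": f"{type_name}-{len(self.created) + 1}"}
        self.created.append(obj)
        return obj

    def reference(self, obj: dict) -> str:
        # The host decides how a reference is represented.
        return f"{obj['type']}/{obj['id']}"


def map_f1(src: dict, tgt: dict, host: HostApplication) -> dict:
    for s_f1 in src.get("f1", []):
        rr = host.create("TRight2")                            # create("TRight2") as rr
        tgt.setdefault("ptr", []).append(host.reference(rr))   # tgt.ptr = reference(rr)
        rr["f1"] = s_f1                                        # dependent rule fills rr.f1
    return tgt
```

The created TRight2 objects live only with the host (here in host.created); the target instance holds just the references to them.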

For our second example, we're going to look at the reverse: where multiple input structures create a single output structure.

Source Structure Target Structure
    TLeft
      ptr : Resource(TLeft2) [0..*]

    TLeft2
      f2 : String [0..*];
  
    TRight
      f2 : String [1..*];
  
The left instance is transformed to the right instance by finding each ptr reference, getting its value for f2, and adding this to tgt.f2

The first task of the map is to ask the application host to find the structure identified by src.ptr, and create a variable for it:

  src.ptr as t then {
    t.f2 -> tgt.f2;
  };

This specification includes transforms that map between version R4B and this version (R5). These map files exercise quite a bit of the mapping language grammar, and can be found at Transforms between R4 and R5.