Genomics Reporting Implementation Guide
2.0.0 - trial-use

This page is part of the Genetic Reporting Implementation Guide (v2.0.0: STU 2) based on FHIR R4. This is the current published version in its permanent home (it will always be available at this URL). For a full list of available versions, see the Directory of published versions

OperationDefinition: Find Subject Variants

Description

Use this operation to retrieve variants with precise endpoints from a specified genomic region for a specified patient. If the range in question has been studied, the operation returns a FHIR Parameters resource containing variants overlapping the region. If the patient or the specified region has not been studied, the operation returns a 404 error.

IN Parameters

  • The 'region' parameter is specified as a 0-based interval range of integers. Variants that overlap the range (variants 1-5 in the picture) would be returned, as would a point mutation at position x, but not a point mutation at position y. (Note that an insertion that starts before x, regardless of length, would not overlap x..y, and would therefore not be returned).
  • The 'genomicRefSeq' parameter is a genomic reference sequence specified as a valid NCBI chromosome-level ('NC_') build 37 or build 38 identifier, or a valid mitochondrion identifier (NC_012920.1, NC_001807.4).
[Figure: region of interest]
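The overlap rule described above can be sketched as a half-open interval test. This is a minimal sketch, not part of the operation definition: the function name `overlaps` is hypothetical, both the query region and each variant are assumed to be 0-based half-open intervals [start, end), and an insertion is modelled as a zero-length interval at its start position. The text only states that an insertion starting before x is not returned; treating an insertion that starts inside the region as overlapping is an assumption.

```python
def overlaps(var_start: int, var_end: int, region_start: int, region_end: int) -> bool:
    """Return True if variant [var_start, var_end) overlaps region [region_start, region_end).

    All coordinates are 0-based; intervals are half-open (end is exclusive).
    """
    if var_start == var_end:
        # Zero-length variant (an insertion): assumed to overlap only when its
        # start position falls inside the region. An insertion that starts
        # before region_start is never returned, matching the text.
        return region_start <= var_start < region_end
    # Standard half-open interval overlap test.
    return var_start < region_end and var_end > region_start
```

With x = 100 and y = 200, a point mutation at x is the interval [100, 101) and overlaps, while a point mutation at y is [200, 201) and does not.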

OUT Parameters

  • Response shall include 0..1 observation (RegionStudied), 0..* observation (Variant), 0..* observation (SequencePhaseRelationship).
  • The 'regionStudied' parameter is an instance of the observation (RegionStudied) profile, and can optionally be used to reflect back those studied regions that overlap with the query range. For WGS, the entire region of the request may have been studied and can be represented as a single component:ranges-examined. For WES, each examined exon overlapping the query range can be represented in its own component:ranges-examined. For targeted panels, each examined region overlapping the query range can be represented in its own component:ranges-examined.
  • The 'variant' parameter instantiates the observation (Variant) profile, there being one instance for each identified variant. Variants must be represented using a combination of all of these components: genomic-ref-seq; ref-allele; alt-allele; coordinate-system (valued with '0-based interval counting'); exact-start-end. Additional components can optionally be included.
  • Implicit in this service is that variants in the requested range, regardless of how they are formatted/represented/stored in a server, are returned. See 'Variant normalization' section below for guidance on variant normalization. Furthermore, where a server is storing variants aligned to multiple builds, it may be necessary for the server to translate or 'lift over' the specified region into corresponding regions in other builds. See 'Variant liftover' section below for more details.
  • If the region hasn't been studied, return a 404 response code.

FindSubjectVariants

OPERATION: FindSubjectVariants

The official URL for this operation definition is:

http://hl7.org/fhir/uv/genomics-reporting/OperationDefinition/find-subject-variants

Parameters

Use | Name | Cardinality | Type | Binding | Documentation
IN | subject | 1..1 | string (reference) | | The subject of interest.
IN | region | 1..1 | Range | | Region of interest is specified as a 0-based integer interval range. Variants that overlap the range are returned.
IN | genomicRefSeq | 1..1 | string (token) | | Genomic reference sequence is a valid NCBI chromosome-level ('NC_') build 37 or build 38 identifier, or a valid mitochondrion identifier (NC_012920.1, NC_001807.4).
OUT | regionStudied | 0..* | canonical | | [Profile: http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/region-studied] Must include 1..* component:ranges-examined; 1..1 component:coordinate-system (valued with '0-based interval counting'); 1..1 component:genomic-ref-seq.
OUT | variant | 0..* | canonical | | [Profile: http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant] Must include valueCodeableConcept; component:genomic-source-class; component:genomic-ref-seq; component:allelic-state; component:ref-allele; component:alt-allele; component:coordinate-system (valued with '0-based interval counting'); component:exact-start-end.
OUT | sequencePhaseRelationship | 0..* | canonical | | [Profile: http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/sequence-phase-relationship] Must include valueCodeableConcept; 2..2 derivedFrom:variant.
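As an illustration of the IN parameters, the following sketch assembles a FHIR Parameters body for this operation. The helper name `build_in_parameters` and the regular expression are assumptions made for this example; the pattern only checks that the identifier looks like an 'NC_' accession with a version suffix and does not verify that it belongs to build 37 or build 38.

```python
import re

# Assumed shape of a chromosome-level or mitochondrial RefSeq accession,
# e.g. NC_000001.10 or NC_012920.1. The build itself is not checked.
_NC_PATTERN = re.compile(r"^NC_\d{6}\.\d+$")


def build_in_parameters(subject: str, start: int, end: int, genomic_ref_seq: str) -> dict:
    """Build the IN Parameters resource (as a dict) for FindSubjectVariants."""
    if not _NC_PATTERN.match(genomic_ref_seq):
        raise ValueError(f"not an NC_ accession: {genomic_ref_seq!r}")
    if start < 0 or end < start:
        raise ValueError("region must be a 0-based interval with start <= end")
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "subject", "valueString": subject},
            {
                "name": "region",
                "valueRange": {"low": {"value": start}, "high": {"value": end}},
            },
            {"name": "genomicRefSeq", "valueString": genomic_ref_seq},
        ],
    }
```

Calling `build_in_parameters("HG00403", 1, 500, "NC_000001.10")` yields a dict that can be serialized with `json.dumps` and sent as the request body.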

Notes:

Variant normalization

Optionality in the Variant profile allows for different implementations to represent variants in different ways. For instance, the following variant representations are synonymous:

  • component:variation-code: ClinVar ID = 237860
  • component:coding-hgvs: HGVS = NM_001195798.2:c.12G>A
  • component:genomic-hgvs: HGVS = NC_000019.9:g.11200236G>A
  • component:genomic-hgvs: HGVS = NC_000019.10:g.11089560G>A
  • Multiple components:
    • component:genomic-ref-seq: NC_000019.10
    • component:ref-allele: G
    • component:alt-allele: A
    • component:coordinate-system: 0-based interval counting
    • component:exact-start-end: start = 11089559

While the specific implementation of an Operation is outside the scope of HL7, we do provide guidance on how the above representations might be normalized such that any would be found and returned in a request for variants overlapping, say, the region of the LDLR gene (region=11,089,362-11,133,830; genomicRefSeq=NC_000019.10). There are likely other effective normalization strategies beyond what is described here.

One approach to normalization is to convert all representations to a canonical form, such as the NCBI's Sequence Position Deletion Insertion (SPDI) format. Variant queries then only need to query a single format.

'SPDI' is the NCBI's variation notation for variants with known breakpoints. The notation represents an observed variant sequence using deleted and inserted sequences at a given position in a reference sequence. SPDI notation uses four fields and is written out as four elements delimited by colons S:P:D:I, where S=SequenceId; P=Position, a 0-based coordinate for where the Deleted Sequence starts; D=DeletedSequence, and I=InsertedSequence. Variation Services only support variants where the coordinates of both the upstream and downstream breakpoints are known (e.g. single nucleotide change, deletions at precise coordinates). Such variants can be encoded precisely using the SPDI notation.
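The four-field structure of SPDI can be sketched as a small data class. The names `Spdi` and `parse_spdi` are hypothetical, and the sketch assumes the D and I fields are written out as literal sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    sequence_id: str  # S: reference sequence identifier
    position: int     # P: 0-based start of the deleted sequence
    deleted: str      # D: deleted sequence (may be empty for an insertion)
    inserted: str     # I: inserted sequence (may be empty for a deletion)

    @property
    def end(self) -> int:
        """Exclusive 0-based end of the deleted sequence on the reference."""
        return self.position + len(self.deleted)

    def __str__(self) -> str:
        return f"{self.sequence_id}:{self.position}:{self.deleted}:{self.inserted}"


def parse_spdi(text: str) -> Spdi:
    """Split an S:P:D:I string into its four fields."""
    parts = text.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected four colon-delimited fields, got {len(parts)}")
    sequence_id, position, deleted, inserted = parts
    return Spdi(sequence_id, int(position), deleted, inserted)
```

Because the position is 0-based and the deleted sequence is explicit, the reference interval a variant occupies is simply [position, end).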

NCBI Variation Services provide a rich set of APIs that can be used to normalize variants from many formats (e.g. HGVS, VCF) into SPDI, and to normalize variants in SPDI into a canonical SPDI format. The variant above, in canonical SPDI format, resolves to this: NC_000019.10:11089559:G:A, where it can easily be determined that it overlaps the requested region (NC_000019.10:11,089,362-11,133,830).
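To make the coordinate arithmetic concrete, the sketch below converts a simple genomic HGVS substitution into SPDI and tests it against a query region. It is not a substitute for NCBI Variation Services: it handles only single-nucleotide substitutions of the form `NC_...:g.<pos><ref>><alt>`, performs no canonicalization, and the function names are hypothetical. The only transformation is the shift from HGVS's 1-based position to SPDI's 0-based position.

```python
import re

# Matches only simple genomic substitutions such as NC_000019.10:g.11089560G>A.
_HGVS_SUB = re.compile(
    r"^(?P<seq>NC_\d+\.\d+):g\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def hgvs_substitution_to_spdi(hgvs: str) -> str:
    """Convert a genomic HGVS single-nucleotide substitution to an SPDI string."""
    m = _HGVS_SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported HGVS expression: {hgvs!r}")
    # HGVS positions are 1-based; SPDI positions are 0-based.
    return f"{m['seq']}:{int(m['pos']) - 1}:{m['ref']}:{m['alt']}"


def spdi_overlaps_region(spdi: str, ref_seq: str, start: int, end: int) -> bool:
    """True if the SPDI variant lies on ref_seq and overlaps the 0-based [start, end)."""
    seq_id, pos_text, deleted, _inserted = spdi.split(":")
    if seq_id != ref_seq:
        return False  # a different sequence or build never overlaps directly
    pos = int(pos_text)
    if not deleted:
        # Pure insertion: assumed to overlap when it starts inside the region.
        return start <= pos < end
    return pos < end and pos + len(deleted) > start
```

Applied to the LDLR example, `hgvs_substitution_to_spdi("NC_000019.10:g.11089560G>A")` gives `NC_000019.10:11089559:G:A`, which overlaps 11089362-11133830 on NC_000019.10. The build 37 expression converts to an SPDI on NC_000019.9 and therefore does not match a build 38 query without liftover.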

Variant liftover

LiftOver is a process whereby a genome position is converted from one genome assembly to another genome assembly. It is the process that, among other things, allows us to determine that these two variants are the same:

  • component:genomic-hgvs: HGVS = NC_000019.9:g.11200236G>A (build 37 representation)
  • component:genomic-hgvs: HGVS = NC_000019.10:g.11089560G>A (build 38 representation)

Several groups have identified edge cases that pose genome assembly conversion challenges (e.g. see PharmGKB's PharmCAT posting; Biostars posting). For example, NC_000001.11:145923295:C:C (build 38 representation) does not convert to a corresponding build 37 representation using NCBI Variation Services. As a result, there is no requirement that servers normalize all variants against a single build.

Rather, where a server is storing variants aligned to multiple builds (and hasn't normalized all variants against a single build), it will be necessary for the server to lift over the query region into corresponding regions in other builds. For example, a query for variants in NC_000001.11:145507556-145513536 (build 38 range) will also need to query for variants in NC_000001.10:145921556-145927537 (build 37 range) in order to gather variants expressed against build 38 and build 37, respectively.
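One way a server might organize this is to expand the incoming region into one region per stored build before querying. The sketch below assumes a hypothetical `LiftOver` callable that returns the translated region or `None` on failure; the stub shown contains only the example pair from the text and stands in for a real liftover tool.

```python
from typing import Callable, List, NamedTuple, Optional


class Region(NamedTuple):
    ref_seq: str
    start: int
    end: int


# Hypothetical signature: translate a region to another build, or None on failure.
LiftOver = Callable[[Region], Optional[Region]]


def regions_to_query(query: Region, lift_overs: List[LiftOver]) -> List[Region]:
    """Return the original region plus its counterpart in each other stored build."""
    regions = [query]
    for lift in lift_overs:
        lifted = lift(query)
        if lifted is not None and lifted not in regions:
            regions.append(lifted)
    return regions


def stub_build38_to_build37(region: Region) -> Optional[Region]:
    """Stand-in for a real tool; knows only the example from the text."""
    known = {
        Region("NC_000001.11", 145507556, 145513536):
            Region("NC_000001.10", 145921556, 145927537),
    }
    return known.get(region)
```

The server would then run the variant query once per returned region and merge the results.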

Many efficient, open source liftover tools exist. As with variant liftover, translating a region between builds can also fail. For example, attempting to lift over NC_000001.11:145923295-145923296 (build 38 range) into a build 37 range with the UCSC LiftOver tool fails, because the region is partially deleted in build 37. In the (very uncommon) case of a failed liftover, a server should widen the query region as necessary until the liftover succeeds. For example, the widened build 38 range NC_000001.11:145923285-145923306 will successfully translate into the build 37 range NC_000001.10:145511787-145511807.
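The widening strategy can be sketched as a retry loop. Everything here is illustrative: `lift_over` is a hypothetical callable returning `None` on failure, and the step of 10 positions per side and the attempt limit are assumptions chosen so that the example above is reproduced.

```python
from typing import Callable, NamedTuple, Optional, Tuple


class Region(NamedTuple):
    ref_seq: str
    start: int
    end: int


def lift_over_with_widening(
    region: Region,
    lift_over: Callable[[Region], Optional[Region]],
    step: int = 10,
    max_widenings: int = 5,
) -> Tuple[Region, Region]:
    """Lift a region to another build, widening both ends by `step` after each failure.

    Returns (source region actually used, lifted region).
    """
    attempt = region
    for _ in range(max_widenings + 1):
        lifted = lift_over(attempt)
        if lifted is not None:
            return attempt, lifted
        # Widen on both sides; never go below coordinate 0.
        attempt = Region(attempt.ref_seq, max(0, attempt.start - step), attempt.end + step)
    raise LookupError(f"liftover failed after {max_widenings} widenings: {region}")


def stub_lift_over(region: Region) -> Optional[Region]:
    """Stand-in for a real tool; succeeds only for the widened range in the text."""
    if region == Region("NC_000001.11", 145923285, 145923306):
        return Region("NC_000001.10", 145511787, 145511807)
    return None
```

Because the widened region is larger than the requested one, the server may return variants outside the original range unless it filters the results afterwards.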

Error Codes

NOTE: Imprecise implementations are allowed, where results contain some records outside the requested range. This is necessary to support many bioinformatics indexing schemes.

Valid response codes are shown in the following figure and described further in the table. Additional response codes (e.g. 5xx server error) may also be encountered.

[Figure: error codes]

Response Code | Description
200 | Successfully executed request (region was studied, variants may or may not have been found)
400 | ERROR: Invalid query parameters
404 | ERROR: Data not found (e.g. no data found for patient, no data present for requested region for patient)
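The response codes above amount to a short decision sequence. The sketch below is one possible ordering of the checks (validating parameters before looking up data is an assumption); the function and argument names are hypothetical.

```python
def response_code(params_valid: bool, patient_has_data: bool, region_studied: bool) -> int:
    """Pick the HTTP status for a FindSubjectVariants request."""
    if not params_valid:
        return 400  # invalid query parameters
    if not patient_has_data or not region_studied:
        return 404  # no data for the patient, or the region was not studied
    # Region was studied; the variant list may still be empty.
    return 200
```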

Examples

Scenario: Retrieve all variants for patient HG00403 that overlap NC_000001.10:1-500.


IN parameters

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "subject",
      "valueString": "HG00403"
    },
    {
      "name": "region",
      "valueRange": {
        "low": {
          "value": 1
        },
        "high": {
          "value": 500
        }
      }
    },
    {
      "name": "genomicRefSeq",
      "valueString": "NC_000001.10"
    }
  ]
}

OUT parameters

{
  "resourceType": "Parameters",
  "parameter": [
    {
      "name": "regionStudied",
      "resource": {
        "resourceType": "Observation",
        "id": "rs-a43751ad52c94",
        "meta": {
          "profile": [
            "http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/region-studied"
          ]
        },
        "status": "final",
        "category": [
          {
            "coding": [
              {
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "laboratory"
              }
            ]
          }
        ],
        "code": {
          "coding": [
            {
              "system": "http://loinc.org",
              "code": "53041-0",
              "display": "DNA region of interest panel"
            }
          ]
        },
        "subject": {
          "reference": "Patient/HG00403"
        },
        "component": [
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "51959-5",
                  "display": "Range(s) of DNA sequence examined"
                }
              ]
            },
            "valueRange": {
              "low": {
                "value": 1
              },
              "high": {
                "value": 500
              }
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "92822-6",
                  "display": "Genomic coord system"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "LA30100-4",
                  "display": "0-based interval counting"
                }
              ]
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "48013-7",
                  "display": "Genomic Reference Sequence"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://www.ncbi.nlm.nih.gov/nuccore",
                  "code": "NC_000001.10"
                }
              ]
            }
          }
        ]
      }
    },
    {
      "name": "variant",
      "resource": {
        "resourceType": "Observation",
        "id": "dv-5a7f781e83514",
        "meta": {
          "profile": [
            "http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/variant"
          ]
        },
        "status": "final",
        "category": [
          {
            "coding": [
              {
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "laboratory"
              }
            ]
          }
        ],
        "code": {
          "coding": [
            {
              "system": "http://loinc.org",
              "code": "69548-6",
              "display": "Genetic variant assessment"
            }
          ]
        },
        "subject": {
          "reference": "Patient/HG00403"
        },
        "valueCodeableConcept": {
          "coding": [
            {
              "system": "http://loinc.org",
              "code": "LA9633-4",
              "display": "Present"
            }
          ]
        },
        "component": [
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "48002-0",
                  "display": "Genomic source class [Type]"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "LA6683-2",
                  "display": "Germline"
                }
              ]
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "62374-4",
                  "display": "Human Reference Sequence Assembly"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "LA14029-5",
                  "display": "GRCh37"
                }
              ]
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "48013-7",
                  "display": "Genomic Reference Sequence"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://www.ncbi.nlm.nih.gov/nuccore",
                  "code": "NC_000001.10"
                }
              ]
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "53034-5",
                  "display": "Allelic State"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "LA6706-1",
                  "display": "heterozygous"
                }
              ]
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "69547-8",
                  "display": "Genomic Ref Allele [ID]"
                }
              ]
            },
            "valueString": "A"
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "69551-0",
                  "display": "Genomic Alt Allele [ID]"
                }
              ]
            },
            "valueString": "G"
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "92822-6",
                  "display": "Genomic coord system"
                }
              ]
            },
            "valueCodeableConcept": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "LA30100-4",
                  "display": "0-based interval counting"
                }
              ]
            }
          },
          {
            "code": {
              "coding": [
                {
                  "system": "http://loinc.org",
                  "code": "81254-5"
                }
              ]
            },
            "valueRange": {
              "low": {
                "value": 300
              }
            }
          }
        ]
      }
    },
    {
      "name": "sequencePhaseRelationship",
      "resource": {
        "resourceType": "Observation",
        "id": "sid-dc2a23f0322c4",
        "meta": {
          "profile": [
            "http://hl7.org/fhir/uv/genomics-reporting/StructureDefinition/sequence-phase-relationship"
          ]
        },
        "status": "final",
        "category": [
          {
            "coding": [
              {
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "laboratory"
              }
            ]
          }
        ],
        "code": {
          "coding": [
            {
              "system": "http://loinc.org",
              "code": "82120-7",
              "display": "Allelic phase"
            }
          ]
        },
        "subject": {
          "reference": "Patient/HG00403"
        },
        "valueCodeableConcept": {
          "coding": [
            {
              "system": "http://hl7.org/fhir/uv/genomics-reporting/CodeSystem/seq-phase-relationship",
              "code": "Cis",
              "display": "Cis"
            }
          ]
        },
        "derivedFrom": [
          {
            "reference": "#dv-5a7f781e83514"
          },
          {
            "reference": "#dv-5a7f781e83514"
          }
        ]
      }
    }
  ]
}
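A client consuming the OUT parameters might reduce each returned variant to a compact S:P:D:I-style string for comparison or filtering. The sketch below is an assumption-laden illustration: the function names are hypothetical, the LOINC component codes are the ones used in the example above, and the resulting string is built directly from the reported components without any canonicalization. FHIR Parameters carries a resource under the parameter's `resource` element; the code also tolerates payloads that inline the Observation fields in the parameter itself.

```python
from typing import List, Optional

LOINC = "http://loinc.org"


def _component(observation: dict, loinc_code: str) -> Optional[dict]:
    """Return the first component whose code carries the given LOINC code."""
    for comp in observation.get("component", []):
        for coding in comp.get("code", {}).get("coding", []):
            if coding.get("system") == LOINC and coding.get("code") == loinc_code:
                return comp
    return None


def variants_as_spdi(parameters: dict) -> List[str]:
    """Summarize each 'variant' parameter as 'refSeq:start:refAllele:altAllele'."""
    summaries = []
    for param in parameters.get("parameter", []):
        if param.get("name") != "variant":
            continue
        # Prefer the 'resource' wrapper; fall back to inlined Observation fields.
        obs = param.get("resource", param)
        ref_seq = _component(obs, "48013-7")    # genomic-ref-seq
        ref = _component(obs, "69547-8")        # ref-allele
        alt = _component(obs, "69551-0")        # alt-allele
        position = _component(obs, "81254-5")   # exact-start-end
        if None in (ref_seq, ref, alt, position):
            continue  # skip variants missing a required component
        seq_id = ref_seq["valueCodeableConcept"]["coding"][0]["code"]
        start = position["valueRange"]["low"]["value"]
        summaries.append(f"{seq_id}:{start}:{ref['valueString']}:{alt['valueString']}")
    return summaries
```

For the response shown above, this yields a single entry, `NC_000001.10:300:A:G`.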