HL7.FHIR.UV.GENOMICS-REPORTING\Appendix D: External Coding Systems

This page is part of the Genetic Reporting Implementation Guide (v1.1.0: STU 2 Ballot 1) based on FHIR R4. The current version which supercedes this version is 2.0.0. For a full list of available versions, see the Directory of published versions

Appendix D: External Coding Systems

The following is an alphabetical table of external coding systems often used in the genomics space, with FHIR CodeSystem URIs and OIDs where available. See the core specification guidance for more details.

Coding System Name (Source)	Description	Homepage (FHIR CodeSystem URI)	OID	LRI Code

ClinicalTrials.gov (ClinicalTrials.gov/NLM)		http://clinicaltrials.gov (http://clinicaltrials.gov)	2.16.840.1.113883.3.1077
ClinVar Variant ID (NCBI/NLM)	ClinVar processes submissions reporting variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar includes simple and complex variants composed of multiple small variants. However, it now also includes large structural variants, which have a known clinical implication. So now simple, complex and many structural variants can all be found in ClinVar. The ClinVar records have a field for Allele ID and for Variant ID. We focus on the variant ID in this guide. This coding system uses the variant ID as the code and the variant name from NCBI’s "variant_summary.txt.gz" file as the code’s print string. The "variant_summary.txt.gz" file caries more than 20 useful fields, including the separate components of the variant name, the cytogenetic location, the genomic reference, etc. So based on the Variant ID, you can use ClinVar to find most you would ever want to know about the variant. In the LHC Clinical Table Search Service and LHC-Forms, we have indexed many of these attributes to assist users and applications that need to find the ID for a particular variant.	https://www.ncbi.nlm.nih.gov/clinvar (http://www.ncbi.nlm.nih.gov/clinvar)	not assigned yet	CLINVAR-V
ClinVar Accesion ID (RCV) (NCBI/NLM)	RCV' refers to the first 3 letters of the accession calculated by ClinVar to aggregate information from all submissions interpreting the same phenotype relative to the same variant or set of variants.	https://www.ncbi.nlm.nih.gov/clinvar (not assigned yet)	not assigned yet
COSMIC (Wellcome Trust Sanger Institute)	This table includes only simple somatic (cancer) mutations, one per unique mutation ID. The code is the COSMIC mutation ID, and the name is constructed from Ensembl transcript reference sequences and p.HGVS that use the single letter codes for amino acids. It carries fields analogous to most of the key fields in ClinVar, but its referencesequences are Ensembl transcript reference sequences with prefixes of ENST; it specifies amino acid changeswith the older HGVS single letter codes and it carries examples of primary cancers and primary tissues - fields thatare not in ClinVar.COSMIC's source table includes multiple records per mutation - one per submission. The COSMIC- SimpleVariants table that we have extracted from the original file includes only one record per unique mutation – a total ofmore than 3 million records.These contents are copyright COSMIC (http://cancer.sanger.ac.uk/cosmic/license). LHC has produced a look uptable for these records, and for users to look up particular mutation IDs, both with permission from COSMIC. However, interested parties must contact COSMIC directly for permission to download these records.	http://cancer.sanger.ac.uk/cancergenome/projects/cosmic (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic)	2.16.840.1.113883.3.912	COSMIC-Smpl, COSMIC-Stru
Cytogenetic (chromosome) location (NCBI/NLM)	Chromosome location (AKA chromosome locus or cytogenetic location), is the standardized syntax for recording the position of genes and large variants. It consists of three parts: the Chromosome number (e.g. 1-22, X, Y), an indicator of which arm – either “p” for the short or “q” for the long, and then generally a series of numbers separated by dots that indicate the region and any applicable band, sub-band, and sub-sub-band of the locus (e.g. 2p16.3). There are other conventions for reporting ranges and locations at the ends of the chromosomes. The table of these chromosome locations was loaded with all of the locations found in NCBI’s ClinVar variation tables. It will expand as additional sources become available. This does not include all finely grained chromosome locations that exist. Users can add to it as needed.	https://ghr.nlm.nih.gov/primer/howgeneswork/genelocation (not assigned yet)	2.16.840.1.113883.6.335	Chrom-Loc
dbSNP (DBSNP : Single Nucleotide Polymorphism database )	The Short Genetic Variations database (dbSNP) is a public-domain archive maintained by NCBI for a broad collection of short genetic polymorphisms. The SNP ID is unique for each position and length of DNA change. For example, a change of 3 nucleotides willhave a different SNP ID than a change of 4 nucleotides at the same locus, but the code will be the same for all changes at the same locus and with the same length. So to specify a variation in relation to a referece sequence, both the alt allele and the SNP code must be included.	http://www.ncbi.nlm.nih.gov/projects/SNP (http://www.ncbi.nlm.nih.gov/projects/SNP)	2.16.840.1.113883.6.284	dbSNP
dbVar - Germline (NCBI/NLM)	dbVar is NCBI's database of genomic structural variations (including copy number variants) that are larger than 50 contiguous base pairs. It is the complement of dbSNP, which identifies variants occurring in 50 or fewer contiguous base pairs. dbVar contains insertions, deletions, duplications, inversions, multi-nucleotide substitutions, mobile element insertions, translocations, and complex chromosomal rearrangements. dbVar carries structured Germline and Somatic variants in separate files. Accordingly, we have divided the coding system for dbVar the same way. This coding system represents the Germline dbVar variants. Its record ID may begin with one of four prefixes: nsv, nssv, esv and essv. These are accession prefixes for variant regions (nsv) and variant calls (or instances, nssv), respectively. Typically, one or more variant instances (nssv – variant calls based directly on experimental evidence) are merged into one variant region (nsv – a pair of start-stop coordinates reflecting the submitters’ assertion of the region of the genome that is affected by the variant instances). The “n” preceding sv or indicates that the variants were submitted to NCBI (dbVar). The prefix, “e” for esv and essv represent variant entities (corresponding to NCBI’s nsvand nssv) that were submitted to EBI (DGVa). The relation between variant call, and variant region, instances is many to one. The LHC lookup table for dbVar germline variations includes both variant instances (essv or nssv) and the variant region records (nsv, esv). Users can sub-select by searching on the appropriate prefix.	https://www.ncbi.nlm.nih.gov/dbvar/ (not assigned yet)		dbVar-GL, dbVar-Som
ENSEMBL genomic reference sequence (European Bioinformatics Institute )	Set of Ensembl gene reference sequences whose identifiers have a prefix of "ENSG." It only includes genomic sequences associated with genes and uses the whole build plus the chromosome number to identify chromosome reference sequences, rather than a separate set of reference sequence identifier as NCBI does. LHC has not yet produced a convenient look up table for these files.	http://www.ensembl.org (http://www.ensembl.org)	not assigned yet	Ensembl-G
ENSEMBL protein reference sequence (European Bioinformatics Institute )	Set of Ensembl gene reference sequences whose identifiers have a prefix of "ENSP" and correspond to NCBI's "NP_" reference sequence identifiers. LHC has not yet produced a convenient look up table for these files.	http://www.ensembl.org (not assigned yet)		Ensembl-P
ENSEMBL transcript reference sequence (European Bioinformatics Institute )	Set of Ensembl protein reference sequences. Their identifiers are distinguished by the prefix of "ENST," and correspond to NCBI's "NM_" reference sequence identifiers. LHC has not yet produced a convenient look up table for these files.	http://www.ensembl.org (not assigned yet)		Ensembl-T
Gene Code (NCBI/NLM)	When applicable, this variable identifies the gene on which the variant is located. The gene identifier is also carriedin the transcript reference sequence database, and is part of a full HGVS expression. Not all genes have HGNCnames and codes so NCBI has created gene IDs that cover the genes that are not registered by HGNC.	https://www.ncbi.nlm.nih.gov/gene/ (not assigned yet)		NCBI-gene code
GL String	Genotype List String grammar is a way of formatting strings for describing HLA and KIR genotyping results.	https://glstring.org (not assigned yet)
HGNC - GeneID (HGNC: HUGO Gene Nomenclature Committee )	The HGNC gene table carries the gene ID, gene symbol and full gene name. Gene IDs must be sent as codes, and begin with "HGNC:". gene symbols should be sent as display. GENE IDs are specific to the species, whereas gene symbol and name are shared by all species with the same gene. The HGNC-Symb table provided by HUGO carries only human genes and is available in a table by LHC. The code for this coding system is the HGNC gene code, the "name" or print string is the HGNC gene symbol. More than 28,000 human gene symbols and names have been assigned so far, including almost all of the protein coding genes. NCBI creates what might be thought of as interim codes. HGNC codes may be sent without the "HGNC:" prefix in older systems so caution must be taken to confirm alignment.	http://www.genenames.org (http://www.genenames.org/geneId)		HGNC-Symb
HGNC - GeneGroup (HGNC: HUGO Gene Nomenclature Committee )	HGNC also provides an index on gene families/groups. GeneGroup IDs do not begin with "HGNC:", so care must be made to ensure alignment of concepts when viewing an HGNC ID from an older system that may be referring to the GeneID and not a gene group.	http://www.genenames.org (http://www.genenames.org/genegroup)		HGNC-Symb
HGVS - Genomic syntax (HGVS: Human Genome Variation Society)	HGVS syntax that describes the variations (mutations) at the genome level (the DNA before it is spliced to removeintrons). The genomic syntax statements which can describe simple or structural variants are distinguished by a leading "g."	http://varnomen.hgvs.org (http://varnomen.hgvs.org)		HGVS.g, HGVS.p, HGVS.c
HLA Nomenclature (European Bioinformatics Institute )	Human leukocyte antigen (HLA) complex contains more than 220 genes that encode for the proteins of the immune system. HLA alleles are most commonly used for histocompatibility testing for stem cell and solid organ transplantation. The WHO Nomenclature Committee for Factors of the HLA System is responsible for a commonnomenclature of HLA alleles, allele sequences, and quality control, to communicate histocompatibility typing information to match donors and recipients. An HLA allele is defined as any set of variations found on a sequence of DNA comprising a HLA gene. So, if thereare five variations found in this one gene sequence, this set is defined as one allele (vs. the definition of an allele being the variation found between the test specimen and the reference along a contiguous stretch of DNA). In the case of HLA, the contiguous stretch of DNA represents the entire gene, and the variations do not need to be contiguous within the gene sequence. Each HLA allele name has a unique name consisting of the gene name followed by up to four fields, each containing at least two digits, separated by colons. There are also optional suffixes added to indicate expressionstatus. For the full specification, please go to this website: http://hla.alleles.org/nomenclature/naming.html.HLA nomenclature can also be used to represent sets of alleles that share sequence identity in the Antigen Recognition Site (ARS). G-groups are alleles that have identical DNA sequences in the ARS, while P-groups arealleles that have identical protein sequences in the ARS. These are described, respectively, inhttp://hla.alleles.org/alleles/g_groups.html and http://hla.alleles.org/alleles/p_groups.html .	http://www.ebi.ac.uk/ipd/imgt/hla (http://www.ebi.ac.uk/ipd/imgt/hla)	2.16.840.1.113883.6.341	HLA-Allele
Human Phenotype Ontology (HPO - Human Phenotype Ontology)	The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect.	https://hpo.jax.org/app/ (not assigned yet)		HPO
ISCN (Interational System for Human Cytogenetic Nomenclature)	Like HGVS, The International System for Human Cytogenetic Nomenclature (ISCN) is a syntax. It came out of cytopathology and deals with reporting karyotypes down to the chromosome fusions and many types of small copy number variants. However, cytogenetics is out of the scope in this guide. ISCN syntax is often used to report large deletion-duplications in structural variants and to include other variants that have been observed.	https://www.karger.com/Book/Home/271658 (not assigned yet)		ISCN
LOINC (Regenstrief Institute)	Logical Observation Identifiers Names and Codes (LOINC®) provides a set of universal codes and names for identifying laboratory and other clinical observations. One of the main goals of LOINC is to facilitate the exchange and pooling of results for clinical care, outcomes management, and research. LOINC was initiated by Regenstrief Institute research scientists who continue to develop it with the collaboration of the LOINC Committee.	https://loinc.org (https://loinc.org)		LN
LRG : Locus Reference Genomic Sequences (LRG : Locus Reference Genomic)	LRG is a manually curated record that contains stable, and thus un-versioned, reference sequences designed specifically for reporting sequence variants with clinical implications. It provides a genomic DNA sequence representation of a single gene that is idealized, has a permanent ID (with no versioning), and core content that never changes. Their database includes maps to NCBI, Ensembl and UCSC reference sequences. It contained sequences for a total of 1073 genes as of April 2016, with identifiers of the form: "LRG_####", where#### can be from 1 to N, and N is the last gene processed. See PMIDs: 24285302, 20398331, and 20428090 for more information.	http://www.lrg-sequence.org (http://www.lrg-sequence.org)	2.16.840.1.113883.6.283	LRG-RefSeq
OMIM : Online Mendelian Inheritance in Man (OMIM : Online Mendelian Inheritance in Man)		http://www.omim.org (http://www.omim.org)	2.16.840.1.113883.6.174
PHARMGKB : Pharmacogenomic Knowledge Base (PHARMGKB : Pharmacogenomic Knowledge Base )		http://www.pharmgkb.org (http://www.pharmgkb.org)	2.16.840.1.113883.3.913
PubMed (PubMed )		http://www.ncbi.nlm.nih.gov/pubmed (http://www.ncbi.nlm.nih.gov/pubmed)	2.16.840.1.113883.13.191
RefSeq (NCBI/NLM)	Subset of NCBI Human RefSeqs with prefix of NC_ or NG_. Those prefixed with "NC_" represent the whole genomic RefSeq for individual chromosomes. Those prefixed with"NG_" represent genes with all of their introns and flanking regions and other larger or smaller genomic sequences.These are available separately in the NCBI source data file, which includes all human RefSeqs (including those with prefix of NR_ or XM_): ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens	http://www.ncbi.nlm.nih.gov/refseq (http://www.ncbi.nlm.nih.gov/refseq)	2.16.840.1.113883.6.280	RefSeq-G, RefSeq-P, RefSeq-T
Sequence Ontology (Sequence Ontology)		http://sequenceontology.org (http://sequenceontology.org)	not assigned yet
Pharmacogenomic Star Alleles (PharmVar Pharmacogene Variation Consortium)	The star allele nomenclature is commonly used in pharmacogenomics as shorthand to specify one or more specific variants in a gene that is known to impact drug metabolism or response. A star allele can identify either a single variant or a group of variants found in cis, and therefore it usually represents a haplotype. A star allele name is composed of the gene symbol and an allele number, separated by an asterisk, e.g. TPMT2. By convention, the 1 allele represents the allele that contains the "reference" sequence, although that is not true in all cases. Closely related alleles may be assigned a common number and be differentiated by a unique letter that specifies the suballele (e.g., TPMT3A, TPMT3B). Pharmacogenomics tests commonly report patient phenotypes as diplotypes, i.e. 1/3A.The star nomenclature system is inadequately defined and inconsistently adopted. Therefore, although the system we are proposing supports the inclusion of pharmacogenomics star alleles as a legacy syntax, we strongly encourage messages that include star alleles to rigorously define those alleles through other means.	https://pharmvar.org/genes		Star-Allele