This page is part of the Application Data Exchange Assessment Framework and Functional Requirements for Mobile Health (v0.1.0: STU 1 Ballot 1) based on FHIR R4. For a full list of available versions, see the Directory of published versions.
This section describes how an Actor should be assessed for conformance to this guide.
The conformity assessment must be readily reproducible across different assessors. It should be measurable against each requirement and against groups of requirements organized by functional areas and/or categories. However, pass/fail conformity assessment (e.g., as is done for UL Listed or CE Mark) is generally insufficient when there is great diversity in end-user requirements. This guide takes the approach that there are multiple levels of “conformance”, as has been done with other specifications.
Some Internet Engineering Task Force (IETF) specifications (known as Requests for Comments, or RFCs) distinguish between conforming and fully conforming to the specification. A conforming system implements all SHALL requirements, while a fully conforming system implements all SHALL and SHOULD requirements. The distinction is useful, but the terminology is subtle: it appears in only a few commonly used RFCs, and its meaning is easily lost on readers unfamiliar with standards conformity assessment.
This guide describes a user-friendly stars rating system based on an ordinal scale of 1 to 5. Unlike some stars rating systems which use aggregated ratings of subjective assessors, this guide defines a rating system based on objective criteria readily reproduced by different assessors.
This section explains the scoring metrics where 0 stars is fully non-conforming, 1 star meets some requirements, 2 stars meets most of the requirements, 3 stars meets all required criteria, 4 stars meets all required criteria and some degree of recommended criteria, and 5 stars meets all required criteria, and most or all of the recommended criteria.
- (0 stars) The device or system under test (SUT) does not meet any SHALL criteria.
- (1 star) The SUT meets some (but not most) of the SHALL criteria.
- (2 stars) The SUT meets most of the SHALL criteria.
- (3 stars) The SUT meets all of the SHALL criteria.
- (4 stars) The SUT meets all of the SHALL criteria, and some (but not most) of the SHOULD criteria.
- (5 stars) The SUT meets all of the SHALL criteria and most or all of the SHOULD criteria.
For the purposes of this guide, most is defined as 50% or more. [should this be higher? –KWB]
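The star-scale rules above can be expressed as a short function. This is a minimal sketch: the function name and signature are illustrative, not defined by this guide, and the handling of a group with zero SHOULD criteria is an assumption (the guide does not address that edge case).

```python
def star_rating(shall_passed, shall_total, should_passed, should_total):
    """Compute the 0-5 star rating from pass counts.

    "Most" is interpreted as 50% or more, per this guide's definition.
    """
    def most(passed, total):
        return total > 0 and passed / total >= 0.5

    if shall_passed == shall_total:  # all SHALL criteria met
        # Assumption: with no SHOULD criteria at all, the rating is 5.
        if should_total == 0 or most(should_passed, should_total):
            return 5                 # most or all SHOULD criteria met
        if should_passed > 0:
            return 4                 # some (but not most) SHOULD criteria met
        return 3                     # all SHALL, no SHOULD criteria met
    if most(shall_passed, shall_total):
        return 2                     # most SHALL criteria met
    if shall_passed > 0:
        return 1                     # some (but not most) SHALL criteria met
    return 0                         # no SHALL criteria met
```

For example, a SUT passing 3 of 3 SHALL criteria and 1 of 3 SHOULD criteria would rate 4 stars, since 1/3 falls short of the 50% threshold for "most".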
Each feature is individually measured on a pass/fail basis for a SUT and includes one or more scenarios containing GIVEN/WHEN/THEN statements describing how to perform the test. Only SHALL and SHOULD criteria are used in this assessment. Criteria using MAY or NEED NOT may be present to clarify allowed behavior and describe optional behaviors for which there is no value judgment. These criteria are often provided to clarify for implementers nuances that might be confusing without further explanation.
Features, Rules and Scenarios are referred to collectively as requirements.
A feature can have one or more rules or scenarios. Rules will have one or more scenarios. Each scenario represents an individual test. If the test fails, the enclosing rule and/or feature fails. A feature fails if one or more of its applicable tests fails.
Requirements are applicable to Apps, Devices or Infrastructure based on the presence of the @App, @Device or @Infra tags in the requirement. When a requirement begins with IF, this is a precondition that must be true for the requirement to be applicable. For example, an App may allow for manual entry of data. When it does, certain requirements become applicable.
Only applicable tests are performed, and only applicable features are reported in a result.
A system is tested against all applicable requirements, and the results of conformance testing for a given SUT report a status for each requirement (e.g., passed, failed, not applicable, or not tested).
A given requirement may not be applicable to a given SUT, or the assessor may not have performed a specified test on the SUT (e.g., because the test sponsor did not require those tests be performed).
For example, there is a requirement that the results of a blood pressure measurement be displayed to the user. This is a requirement of an App, but it is NOT a requirement of a Device. Thus, the results for this test would not be reported as they are Not Applicable for the device.
In another case, there are requirements on devices to be able to report blood pressure, heart rate, and respiration rate. However, these three observations are not reported by all devices, and aren’t always needed for every use case. Thus, a clinic evaluating devices for use in blood pressure monitoring may elect to Not Test a device against the sub-category containing requirements on respiration rate. In this case, the test was not performed.
For clarity: the inability of a device to record respiration rate does not make the test “not applicable” when the category for basic device operations is chosen; in that case, the device simply fails the test. This does NOT indicate a flaw in the device. It merely reports the device’s inability to support that requirement.
As in laboratory diagnostics, tests provide objective evidence, and assessments perform computations on those results to enable interpretation. The 0-5 star rating of a system is the assessment; the pass/fail statuses of the individual requirements are the tests.
The procedure for computing assessments is as follows. For a group of requirements (e.g., a category or sub-category):
To report the assessment, the following values should be provided:
These values provide the interpretation of the result (the star ranking) and allow sub-category results to be aggregated upwards to category results. Assessments can be computed at a category even when some of the sub-category tests have not been performed. In this case, the testing for the category must be recorded as being incomplete (and this status will propagate upwards to the next category and so on).
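The upward aggregation described above can be sketched as follows. This is an illustrative sketch, not an API defined by this guide: the class and field names are assumptions, chosen to mirror the pass/total counts and the incomplete status that propagates upward when some tests were not performed.

```python
from dataclasses import dataclass

@dataclass
class GroupResult:
    """Pass/total counts for one category or sub-category."""
    shall_passed: int
    shall_total: int
    should_passed: int
    should_total: int
    incomplete: bool = False  # True when some tests were not performed

def aggregate(subgroups):
    """Roll sub-category results up into a category result.

    Counts are summed; if any sub-category is incomplete, the
    category is recorded as incomplete, and that status propagates
    upward to the next category, and so on.
    """
    return GroupResult(
        shall_passed=sum(g.shall_passed for g in subgroups),
        shall_total=sum(g.shall_total for g in subgroups),
        should_passed=sum(g.should_passed for g in subgroups),
        should_total=sum(g.should_total for g in subgroups),
        incomplete=any(g.incomplete for g in subgroups),
    )
```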
Technical specifications such as FHIR or CDA Implementation Guides, regulatory and other requirements, and even clinical guidelines can be assessed against this guide. However, because these are specifications, not systems, they require additional considerations when computing and reporting assessments and results. What is important for implementers of a specification is to be able to determine the difference between what is minimally required, what can be supported with additional effort, and what simply cannot be done.
Two sets of results must be computed and assessed. The first set of results reports what the specification under test requires (the minimal results). The second set of results reports what the specification allows (the maximal results). An assessment is reported for each of these possible results. The minimal results show how the assessment would perform against an actual implementation conforming to the specification without any additional work. The maximal results show how the assessment would perform against an actual implementation that started off by conforming to the specification being tested, but was augmented to ensure conformance to this guide.
Consider evaluating the FHIR Observation Vital Signs and the AMA IHMI Observation Blood Pressure requirements against both the Basic and Clinical Blood Pressure Observation requirements of this guide.
A FHIR Observation Resource conforming to either of these two guides will get 5 stars when evaluated against the Basic Blood Pressure Observation. One can safely choose an implementation meeting either one and expect that it will meet those requirements.
However, a FHIR Observation resource meeting only the minimum necessary requirements of the FHIR Observation Vital Signs profile will NOT meet the Clinical Blood Pressure requirements of this guide; it will likely get two stars. But one conforming to the AMA IHMI Observation Blood Pressure requirements will meet the more stringent guidelines in the Clinical Blood Pressure requirements (it will receive 3 or more stars). Thus, if your application needs to meet the clinical blood pressure requirements of this guide, and you have access to an implementation meeting the AMA IHMI Observation Blood Pressure requirements, then that is the better choice (when other considerations are not relevant).
Even so, the FHIR Observation Vital Signs profile does not prohibit one from creating a FHIR Observation resource that also conforms to the Clinical Blood Pressure Observation requirements of this guide. The distance between these two represents a certain amount of effort that must be expended to go that final step. That effort may need to be weighed against the cost of acquiring an implementation supporting the more demanding profile.
Given two requirements S and G, where S describes the implementation guide being tested, and G describes a scenario in this guide:
The assessment results for the maximal case will always be at least as good as the assessment results for the minimal case. Reporting both assessments enables users who may have access to an implementation supporting S to also support G, and to assess how much additional work is needed to make the output of the implementation of S support G.
For a given criteria group in this guide, the reporting recommendations are as follows:
Display the category name, followed by a number of filled stars given by the assessment ranking in gold (or light gray for B&W images). Follow that with a number of open stars outlined in black necessary to ensure that there are always 5 stars displayed. After the stars, include four numbers reported in the following form: {SHALL-passed}/{SHALL-total}+{SHOULD-passed}/{SHOULD-total}
Indent subcategories under categories if they are displayed in the same area.
An example report is given below:
Physical Activity and Sleep (0 stars) 0/3+1/3
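The display form above (filled stars, open stars padding to five, then the four pass/total numbers) could be rendered as follows. This is a minimal sketch with an illustrative function name; ★ and ☆ stand in for the gold and open-black stars, since plain text carries no color.

```python
def format_report(name, stars, shall_passed, shall_total,
                  should_passed, should_total):
    """Render one category line: filled stars for the assessment
    ranking, open stars to pad out to five, then the four numbers
    in the form {SHALL-passed}/{SHALL-total}+{SHOULD-passed}/{SHOULD-total}.
    """
    star_bar = "★" * stars + "☆" * (5 - stars)
    return (f"{name} {star_bar} "
            f"{shall_passed}/{shall_total}+{should_passed}/{should_total}")
```

For the example above, `format_report("Physical Activity and Sleep", 0, 0, 3, 1, 3)` yields `"Physical Activity and Sleep ☆☆☆☆☆ 0/3+1/3"`.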
Display the category name, followed by a number of stars filled in gold (or light gray for B&W images) given by the minimal assessment ranking. Follow that with a number of open stars in gold (or light gray for B&W images) necessary to bring the total up to the maximal assessment ranking (this may be 0). Follow that with a number of open stars outlined in black necessary to ensure that there are always 5 stars displayed. After the stars, include four to six numbers reported in the form: {SHALL-min-passed}-{SHALL-max-passed}/{SHALL-total}+{SHOULD-min-passed}-{SHOULD-max-passed}/{SHOULD-total}
When min-passed and max-passed are the same value, they should be reported as a single number.
Indent subcategories under categories if they are displayed in the same area.
An example report is given below:
IG Name
Physical Activity and Sleep (1 star) 1/6+2/6
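The min/max number format described above, including the collapse of equal min and max values into a single number, could be sketched as follows (illustrative names, not defined by this guide):

```python
def passed_span(min_passed, max_passed, total):
    """Format {min-passed}-{max-passed}/{total}, collapsing the
    min-max pair to a single number when the two values are equal."""
    if min_passed == max_passed:
        return f"{min_passed}/{total}"
    return f"{min_passed}-{max_passed}/{total}"

def spec_counts(shall_min, shall_max, shall_total,
                should_min, should_max, should_total):
    """Join the SHALL and SHOULD portions with '+'."""
    return (passed_span(shall_min, shall_max, shall_total)
            + "+" + passed_span(should_min, should_max, should_total))
```

For the example above, `spec_counts(1, 1, 6, 2, 2, 6)` yields `"1/6+2/6"` (all min and max values are equal), while a specification whose maximal results exceed its minimal results might yield `"1-3/6+2-5/6"`.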
NOTE: The colors of Gold and Black provide a color-blind safe palette.
Gherkin is a language used for testing applications. The core of the language is made up of three keywords, GIVEN, WHEN, and THEN, plus the conjunction AND, structured into scenarios (or examples) to test a specific feature.
This guide describes a requirement using the Feature: keyword in Gherkin.
Requirements can further be broken down into specific business rules using the Rule: keyword to describe a business rule under test. When rules are used, if any rule fails, the entire feature fails.
Each test is provided using the Scenario: keyword to describe how the feature can be tested.
Descriptive text will follow the Feature, Rule, or Scenario to provide more detailed information. This guide uses Rationale: as a keyword to explain the reason why a particular feature is important. Requirements are not always obvious, and including the rationale for a particular requirement aids in communicating the need for a feature.
The keywords have the following meanings:

- GIVEN establishes the preconditions for a scenario.
- WHEN describes the action or event being tested.
- THEN describes the expected outcome.
- A condition can be continued using AND at the start of the next line (and subsequent lines), or the logic of the condition can be reversed using BUT.

This guide uses tags to identify the actors for which a feature is applicable. The form of these tags is @{Actor-Name}[-{Shall|Should}]
When only the actor name is given, the feature has Shall and Should requirements and these are described in more detail for rules or scenarios within the feature.
Specific rules or scenarios can also be marked with these tags to create rules specific to the actors to which they are applied.
An example requirement is provided below.
@App @Device-Should
Feature: User data SHALL be hidden after a period of inactivity.
This example illustrates the form of a Functional Requirement. Each requirement will be recorded as a feature in Gherkin. The heading preceding the requirement provides the requirement identifier and name. Requirement identifiers are numbered in sequence and are preceded by a short mnemonic that identifies the requirement category. This example requirement has the identifier EX-1 and the name “Example Feature”.

Rationale: User data should not be exposed when a user is not interacting with the device or application. Hiding the screen prevents user data from being exposed.
Rule: A screen saver must be present that SHALL hide the user’s data after a configurable period of time has elapsed.
Rule: If the user does not configure the screen saver, then the default timeout period SHALL be used.