Introduction

In the clinical diagnosis of dementia, structural MRI plays a key role in excluding other pathologies, as well as revealing patterns of brain atrophy [1, 2]. These patterns can act as imaging biomarkers to assist nosological diagnosis and differentiation between subtypes of dementia [3]. In clinical neuroradiology, visual assessment of brain atrophy patterns is commonly supported through the use of visual rating scales, such as the global cortical atrophy (GCA) or medial temporal atrophy (MTA) scale [4]. These semi-quantitative measures have shown good diagnostic accuracy in distinguishing dementia from normal ageing and can help mitigate the subjectivity of visual assessment [5]. However, they are sensitive to the experience and perspective of the clinician and can be limited by their relatively coarse measurement of atrophy and by floor and/or ceiling effects [6, 7]. These qualities make it difficult to use such scales to identify subtle volumetric abnormalities in younger patients, and their sensitivity to abnormalities in prodromal dementia patients remains limited [7]. With the focus on developing prophylactic and disease-modifying treatments for dementia, robust methods for distinguishing between healthy ageing and dementia in its early stages are increasingly important [8].

These needs can potentially be addressed through the implementation of automated quantitative image analysis in the clinic. Volumetry is widely used in the research setting and has been used to effectively index morphological change arising from a variety of clinical interventions in phased and randomized controlled trials [9,10,11,12,13,14,15,16,17]. Quantitative volumetric reporting tools (QReports), which automatically quantify an individual patient’s regional brain volumes and compare them to healthy, age-specific reference populations, can potentially help neuroradiologists interpret the severity and distribution of brain atrophy and contextualize their findings by referencing normative brain volumes in healthy populations [18,19,20,21,22,23]. The limitations of routine visual assessment delineate the area of clinical need in which such tools can be integrated. Quantitative assessment of MRIs can provide more objective imaging biomarkers, contribute to the earlier identification of atrophy [24,25,26] and might improve the accuracy of radiological diagnosis of Alzheimer’s disease (AD) and other subtypes of dementia [18,19,20,21,22,23]. However, there remains a large discrepancy between the routine use of visual rating scales and the uptake of QReports in the clinic. In a study of dementia imaging practices in Europe, 81.3% of the 193 centres surveyed reported routine use of the MTA scale, compared to only 5.7% regularly implementing QReports [27]. Respondents identified limited availability and concerns about time and interpretation difficulties as the main barriers to the use of these tools. Importantly, the survey also recognized additional obstacles to implementation, including the lack of standardization or clinical validation of proprietary tools and the difficulty of translating normative group-level quantitative data to the interpretation of individual patient data.

With the surge of commercial QReports for application in dementia clinics, general radiologists and neuroradiologists must decide whether to start implementing these methods in their clinical practice. However, there is a scarcity of evidence regarding the clinical application of QReports, especially relating to their impact on clinical management. It is important to clarify their technical and clinical validity as well as the best practices for responsibly integrating these tools into the existing clinical workflow. To this end, the quantitative neuroradiology initiative (QNI) was developed as a framework for the technical and clinical validation necessary to embed automated image quantification software into the clinical neuroradiology workflow. The QNI framework comprises the following steps: (1) establishing an area of clinical need and identifying the appropriate proven imaging biomarker(s) for the disease in question; (2) developing a method for automated analysis of these biomarkers, by designing an algorithm and compiling reference data; (3) communicating the results via an intuitive and accessible quantitative report; (4) technically and clinically validating the proposed tool pre-use; (5) integrating the developed analysis pipeline into the clinical reporting workflow; and (6) performing in-use evaluation [2].

The aim of this review is to increase transparency by assessing the evidence surrounding the use of QReports according to these six steps. Evidence for step 1 has been outlined above: the area of clinical need we are addressing is dementia and the analysis of its associated volumetric biomarkers. Using steps 2–6 of the QNI framework as guidance, we present a systematic search methodology for finding (i) vendors of dementia- and MRI-specific QReports that are either Conformité Européenne (CE) marked or certified by the Food and Drug Administration (FDA) and (ii) published evidence covering their technical/clinical evaluation and workflow/in-use evaluation. Furthermore, we present an unbiased narrative synthesis of the available evidence regarding the validation of volumetric tools applied in the memory clinic. In doing so, we aim to help neuroradiologists make informed decisions regarding these tools in their clinics.

Methods

The methods used to find relevant companies and QReports are outlined below. The vendor and product names identified were subsequently used as the search terms for an extensive search of the technical/clinical validation and workflow/in-use evaluation studies in the literature. We have followed Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [28,29,30] and our methodology has been registered with the Prospective Register of Systematic Reviews (PROSPERO): number CRD42021233510.

Vendor and product search

Inclusion and exclusion criteria

The following inclusion criteria for proprietary QReports were used: (i) FDA or CE clearance, i.e. tool meets regulatory standards to be used clinically; (ii) target disorder of dementia/neurodegeneration, specified by companies for use in dementia MRI assessment; (iii) uses automated brain segmentation software (step 2 of the QNI framework); (iv) uses normative reference data for single-subject comparison; (v) MRI-based input and (vi) visualizes volumetry and atrophy-specific results presented in a structured report format (step 3 of the QNI framework).

Our exclusion criteria for proprietary products were (i) research tools that are not currently certified for clinical use via CE or FDA approval; (ii) non-MRI-based tools, e.g. for PET, EEG or CT only; (iii) generates a QReport focusing on results other than volumetry/atrophy, e.g. white matter lesions, vasculature, electrophysiology, tractography, brain tumour analysis or PET/spectroscopy; (iv) lack of normative reference data for single-subject comparison.

Search methodology: FDA-cleared product identification

Key word screening

We used the FDA database search function to download basic information for each approved application (https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMN/pmn.cfm). A total of 82,003 premarket 510(k) FDA notification clearances dating from 1996 to the present were downloaded in a text file from https://www.fda.gov/medical-devices/510k-clearances/downloadable-510k-files. By searching within this list using the keywords below, 828 “medical devices” were identified for further review (a sketch of this filtering step follows the keyword list). Please note that words marked with an * are “wild-cards”, covering relevant suffixes of each word stem; for example, “Radiolog*” covers “Radiology”, “Radiologist” and “Radiological”:

  • Neuro*

  • Brain

  • Quant*

  • MRI

  • Hippocamp*

  • Radiolog*

  • Atroph*

  • Cortical

  • Cortex

  • Dementia

  • Volume

  • Alzheimer*

  • Memory

  • Lobar

  • Lobe

  • Structur*

  • Segment*

  • Automat*
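For illustration, a minimal sketch of how such a keyword screen could be scripted is shown below. The file name, delimiter and device-name column are assumptions for illustration, not details of the exact script used in this review; the word stems act as wild-cards by matching the start of a word.

```python
import re

# Hypothetical sketch of the keyword screen described above: filter the
# downloadable 510(k) records for device names matching any keyword stem.
# The file name, pipe delimiter and column index are assumptions.
KEYWORDS = [
    "neuro", "brain", "quant", "mri", "hippocamp", "radiolog", "atroph",
    "cortical", "cortex", "dementia", "volume", "alzheimer", "memory",
    "lobar", "lobe", "structur", "segment", "automat",
]
# Stems act as wild-cards: matching at a word boundary means "radiolog"
# covers "Radiology", "Radiologist" and "Radiological".
pattern = re.compile(r"\b(" + "|".join(KEYWORDS) + r")", flags=re.IGNORECASE)

hits = []
with open("pmn96cur.txt", encoding="latin-1") as f:  # assumed file name
    for line in f:
        fields = line.rstrip("\n").split("|")
        device_name = fields[6] if len(fields) > 6 else ""  # assumed column
        if pattern.search(device_name):
            hits.append(fields)

print(f"{len(hits)} candidate devices flagged for manual review")
```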

Eligibility screening

After manual checks of company name, date of approval, product name and description, 86 tools were deemed relevant for further examination. Tools were excluded at this stage if their description mentioned other body parts (for example, “wrist array coil”) or if they were hardware rather than software. After investigating their intended uses on the FDA application and company website, 28 tools required further checking. After removing older versions of the same software, 16 relevant tools were assessed against our inclusion criteria, after which 9 companies/QReports remained (see Fig. 1 for the PRISMA flowchart).

Fig. 1

Research flowchart showing the systematic and extensive search for CE-marked and FDA-cleared QReports. Websites of companies exhibiting at the most recent ISMRM, ESMRMB, RSNA, ECR, ESR AIX, ASNR, SIIM and ESNR were searched, and the website https://grand-challenge.org/aiforradiology/ was cross-checked

Search methodology: CE-marked product identification

Unfortunately, there is not yet a freely available and searchable database of CE-marked medical devices, although plans are underway to deploy one this year (EUDAMED) [31]. Therefore, the comprehensive method used for FDA-cleared products could not be applied. In lieu of this, a detailed review of the websites of companies exhibiting at the most recent relevant medical imaging conferences (ISMRM, ESMRMB, RSNA, ECR, ESR AIX, ASNR, SIIM and ESNR) was used to find CE-marked quantitative tools. The website https://grand-challenge.org/aiforradiology/ was also used to cross-check the results. One hundred and nine companies were identified for further investigation; after checking the information on their websites against our inclusion criteria and following up with direct email contact where necessary, 8 were included.

Company and product features

Given the large number of companies and the wide range of features, one aim of this review is to provide an unbiased repository of technical features and characteristics to help clinicians and researchers select the most appropriate QReports for their individual investigations. After establishing a list of companies that met our inclusion criteria, all vendors were contacted to provide relevant information that was unavailable on their websites. The following features, deemed to be most relevant to clinicians and researchers, were decided in advance and then sought through website research and direct vendor contact:

  • CE/FDA approval status

  • Date of approval

  • Target disorder

  • Segmentation/volumetry method

  • Lobar and sub-lobar parcellation/volumetry

  • Cross-sectional only or also longitudinal analyses available

  • Report processing time

  • Details of a normative reference population

  • Provision of segmentation overlays/atrophy heat maps

  • Strategies to account for inter-scanner variability

  • Image quality control method

  • Report deployment/PACS integration procedure

When all information had been collected, we contacted vendors again for final confirmation of their individual details prior to publication.

Literature search on technical and clinical validation of identified products

The results of this systematic review are intended to help inform potential users of QReports, assumed mainly to be clinicians. Given the health-related implications of the results and in the interest of reproducibility, the methodology has been registered with PROSPERO — Registration Number: CRD42021233510. In line with the PRISMA guidelines [28,29,30], a detailed search was conducted using the identified company and associated QReport names as search terms. Both names were searched in order to cover the full breadth of technical and clinical validation papers in the literature, including research conducted pre-branding or product naming. PubMed, Scopus and Ovid Medline “All fields” were accessed (latest search on 15 March 2021) using the search terms below; brackets indicate that a term consisting of multiple words was used as a single search term (a sketch of running such queries programmatically follows the list):

  1. (ADM diagnostics) OR (Corinsights MRI)

  2. Brainminer OR diadem

  3. Brainreader OR neuroreader

  4. Combinostics OR cNeuro

  5. CorTechs OR NeuroQuant

  6. Corticometrics OR THINQ

  7. Icometrix OR (Icobrain dm)

  8. (JLK Inc.) OR JAD-02 K OR Atroscan

  9. (Jung diagnostics) OR biometrica

  10. mediaire OR mdbrain

  11. Pixyl OR Neuro.BV

  12. Quantib OR (Quantib ND)

  13. Quibim OR (Quibim Precision)

  14. Qynapse OR QYscore

  15. (Siemens Healthineers) OR (AI-Rad Companion)

  16. SyntheticMR OR (syMRI neuro)

  17. Vuno OR (Vuno Med)
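For illustration, the PubMed arm of such a search can be scripted against the public NCBI E-utilities esearch endpoint. The sketch below runs a few of the queries above and is an illustration of the approach rather than the exact procedure used; Scopus and Ovid Medline were searched via their own interfaces, and the quoted phrases mirror the bracketed multi-word terms above.

```python
import json
import urllib.parse
import urllib.request

# Sketch: run a subset of the vendor/product queries against PubMed via
# the public NCBI E-utilities esearch endpoint.
QUERIES = [
    '"ADM diagnostics" OR "Corinsights MRI"',
    'Brainminer OR diadem',
    'Brainreader OR neuroreader',
    'CorTechs OR NeuroQuant',
]

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

for query in QUERIES:
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": query, "retmode": "json", "retmax": 200}
    )
    with urllib.request.urlopen(f"{EUTILS}?{params}") as resp:
        result = json.load(resp)["esearchresult"]
    # The returned PMIDs would then be screened manually against the
    # study inclusion criteria described below.
    print(query, "->", result["count"], "records")
```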

In conjunction, further relevant papers were searched through PubMed’s “related articles” function and cross-checking references from the initially identified studies and company websites. Finally, in order to capture studies published pre-branding, all vendors were contacted to provide further technical and clinical validation publications covering their QReports.

Study inclusion criteria

Following steps 2–6 of the QNI six-step framework, the search terms described above were used to find peer-reviewed research covering technical and clinical validation, workflow integration and in-use evaluation for each QReport. Papers were reviewed for relevance and inclusion in our analysis on the basis that (i) they involve automated brain segmentation and volumetry results; (ii) they were published as original research in peer-reviewed academic journals or conference proceedings (conference posters were excluded) and (iii) they fit into one of these four categories:

Technical validation

Papers presenting validation of the technical performance of brain segmentation technique and subsequent volumetric results, for example test–retest studies, standalone receiver operating characteristics or those comparing results (spatially and/or volumetrically) to manual segmentation and/or other state-of-the-art segmentation software, such as Freesurfer [32] or FSL-FIRST [33], regardless of disease area.
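As a concrete illustration of the spatial comparison element of such studies, below is a minimal sketch of the Dice similarity coefficient, the overlap metric most commonly reported when comparing automated and manual segmentations; the toy masks are illustrative only.

```python
import numpy as np

def dice_coefficient(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks:
    2|A ∩ B| / (|A| + |B|); 1.0 means perfect spatial overlap."""
    a = auto_mask.astype(bool)
    b = manual_mask.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: define as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom

# Toy example with two overlapping "hippocampus" masks
auto = np.zeros((10, 10, 10), dtype=bool); auto[2:6, 2:6, 2:6] = True
manual = np.zeros((10, 10, 10), dtype=bool); manual[3:7, 2:6, 2:6] = True
print(f"Dice = {dice_coefficient(auto, manual):.2f}")  # 0.75
```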

Clinical validation (dementia)

Testing the use of a QReport (tool meeting our inclusion criteria in “Vendor and product search” section) by clinicians (including but not limited to radiologists, neurologists, psychiatrists, neuropsychologists) on a dementia/memory clinic population within one or more of the following settings: (i) aiming to assess the QReport’s effect and impact on clinical management (i.e. usability and prognostic value); (ii) determining diagnostic accuracy, confidence, differential diagnoses vs. “ground truth” clinician-rated diagnoses, i.e. using receiver operating characteristics; (iii) percentage agreement or inter-rater reliability metrics; (iv) determining the correlation between automated volumetry and clinician-rated visual rating scales (e.g. MTA/Scheltens scale) and (v) clinical drug trials using the QReport’s results as an outcome measure in dementia trials.
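To illustrate the receiver operating characteristic analyses referred to in (ii), the sketch below computes an AUC and an operating point for a hypothetical QReport-derived marker (a hippocampal volume percentile) separating dementia patients from controls; all numbers are simulated, not study data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Simulated data: 50 dementia patients (lower normative percentiles)
# and 50 controls. Purely illustrative.
rng = np.random.default_rng(0)
labels = np.concatenate([np.ones(50), np.zeros(50)])  # 1 = dementia
percentiles = np.concatenate([
    rng.uniform(0, 40, 50),    # patients
    rng.uniform(20, 100, 50),  # controls
])
# Lower percentile should indicate disease, so use -percentile as the score
auc = roc_auc_score(labels, -percentiles)
fpr, tpr, thresholds = roc_curve(labels, -percentiles)
youden = np.argmax(tpr - fpr)  # operating point maximizing Youden's J
print(f"AUC = {auc:.2f}; cut-off ≈ {-thresholds[youden]:.0f}th percentile "
      f"(sensitivity {tpr[youden]:.2f}, specificity {1 - fpr[youden]:.2f})")
```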

Clinical validation (other neurological disease)

As above, but testing the use of a quantitative diagnostic report by clinicians in neurological diseases other than dementia or clinical drug trials using the QReport’s results as an outcome measure in trials of other neurological diseases.

While the focus of this review is dementia, it is also relevant to document other instances where volumetric analysis methods from the identified vendors have been tested by clinician end-users, as this is ultimately the most critical phase of validation. Therefore, a few such examples found in the literature have been included in our analyses. It is also of interest to see how the various QReports have been used for research purposes alongside technical and clinical validation. However, such studies have not been included in the final results of our literature search because the focus of this review is validation, which should be most relevant to clinical use, rather than the current range of research applications.

Workflow integration and in-use evaluation

Papers analysing any of (i) benefit to patients; (ii) the effect on radiologist reporting time; (iii) clinical and population perception or (iv) the overall socioeconomic effect of using QReports in the clinic.

Data extraction

All full-text articles evaluated that met the inclusion criteria were split into “Technical Validation”, “Clinical Validation—Dementia”, “Clinical Validation—Other” and “Workflow integration and in-use evaluation”, and were blindly assessed by two raters. The search and categorization were replicated and verified by an independent researcher and no critical issues were detected. All relevant studies were catalogued along with general information such as title, authors, year of publication, journal, associated tool and website. The technical information and features of the tools were also entered into a database and are documented in Table 1.

Table 1 A high-level database of the vendors and the features of each of their QReports, presented in alphabetical order of vendor name. We have outlined information from publications and direct contact with vendors for readers to assess according to their individual needs. All information was checked and confirmed with vendors in advance of publication. The differing amount of information between vendors is due to variation in how much the vendors were willing/able to share. Due to the proprietary nature of the reports, it was not possible to independently verify all details from vendors, but they were confirmed against sample reports where possible

Results

Company and product search

Following the methods described above, 17 companies were identified that met our inclusion criteria, each with one qualifying QReport; see Fig. 1 for a research flow diagram summarizing the search for relevant products.

Excluded tools

According to PRISMA guidelines, exclusion criteria were decided in advance of the systematic search and are listed in the “Methods” section. The various brain-related software tools that were excluded at the eligibility screening phase have been summarized below.

Tools not currently certified for clinical use included Imagilys (https://www.imagilys.com/), a previously CE-marked tool whose licence recently expired, and VEObrain (https://www.veobrain.com/), which produces a visual neuroradiological volumetry report but has not yet received FDA/CE approval. Veganbagel (https://github.com/BrainImAccs/veganbagel) and volBrain (https://www.volbrain.upv.es/) are open-source software for estimation of regional brain volume changes and have been tested alongside visual rating scales [18, 21, 81]; veganbagel also has a PACS- and workflow-integrated user interface. Freesurfer [32], FSL [33], VBM [66, 67] and SIENAX [82] are all well-established and widely used brain research software but lack clinical certification.

Tools requiring non-MRI input included eVox (https://evoxbrainmap.com/evox-brain-map/), which uses EEG to provide a map of brain function, and Syntermed (https://www.syntermed.com/neuroq) and DOSISOFT (https://www.dosisoft.com/products/planet-neuro/), which use FDG-PET to provide amyloid deposition maps.

Tools producing non-volumetric reports or focusing on other neurological diseases included Advantis (https://advantis.io/), which offers 2D/3D visualization and post-processing workflows for DTI/tractography, DSC perfusion and fMRI.

Tools lacking normative reference data included QMENTA (https://www.qmenta.com/), a cloud-based application which accepts a broad range of MRI modalities and performs various statistical analyses. However, it provides no structured report or procedure for single-subject comparison to a normative reference population.

Included tools

The companies and QReports identified through the search strategy detailed in the Methods section and illustrated in Fig. 1 are summarized in Table 1 along with technical details and features.

Company and product features

Relevant information was compiled into Table 1, a structured database of the various information and features in each report. To complement Table 1, a general summary and some insight into the range of features recorded are outlined below.

CE/FDA approval status

All companies included in this review have received either CE class I/II marking or FDA 510(k) clearance as “software as a medical device”.

Date of approval

The first company (CorTechs.ai) received FDA clearance in 2006 and the most recent was certified in December 2020 (ADM diagnostics). Unsurprisingly, the older companies have generally published more peer-reviewed validation studies. It should be noted that all vendors have carried out internal technical validation processes, including the necessary steps for CE and/or FDA clearance. All companies contacted, and especially the younger ones, claimed to be planning further peer-reviewed validation studies.

Report processing time

Reported QReport processing times varied widely across vendors, ranging from a few seconds to a few hours, depending largely on local versus cloud-based deployment. It should be noted that we were unable to verify the reported times without access to each of the software packages.

Segmentation/volumetry method

The vast majority of companies use proprietary methods developed “in house”, of which five claim to use deep learning. Several companies have used modified versions of previously reported research methods, such as geodesic information flows (GIF) [34, 83], Freesurfer [32] and VBM [66].

Sub-regional volumetry

All vendors provide lobar and hippocampal volumetry as a minimum. Beyond these regions, companies range from adding only ventricular information to providing over 100 sub-lobar regions as part of their structured reports. Some companies reported excluding various sub-lobar regions due to reproducibility issues and others claimed extensive reporting of such regions was not of interest to their users.

Cross-sectional and longitudinal analyses

Ten companies provide both cross-sectional and longitudinal analyses. Longitudinal comparisons broadly used indirect approaches, i.e. computing the difference in volume/percentile per structure between two independently segmented visits, rather than direct approaches such as the boundary shift integral [84,85,86] or SIENA [82].
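As a minimal illustration of the indirect approach, the sketch below computes an annualized percent volume change from two independently segmented visits; the values are illustrative only. Direct methods such as the boundary shift integral instead register the two scans and measure change from the images themselves.

```python
# Indirect longitudinal comparison: each visit is segmented on its own,
# and change is computed afterwards from the two volumes.
def annualized_percent_change(vol_baseline_ml: float,
                              vol_followup_ml: float,
                              interval_years: float) -> float:
    """Percent volume change per year between two independently
    segmented visits."""
    return 100.0 * (vol_followup_ml - vol_baseline_ml) / (
        vol_baseline_ml * interval_years
    )

# e.g. a hippocampus shrinking from 3.1 ml to 2.9 ml over 1.5 years
print(f"{annualized_percent_change(3.1, 2.9, 1.5):.1f} %/year")  # -4.3
```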

Details of a normative reference population

Some of the most notable variation across companies is seen in the number, age range and breadth of subjects/data used in the normative reference population. The vast majority of vendors reported a mix of genders, scanner types and field strengths, achieved through the use of both private and public datasets. However, the size of the dataset varied greatly, from ~100 to ~8000 subjects. The age ranges were more consistent and broadly covered 20–90 years.
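To illustrate the single-subject comparison that these reference populations support, below is a minimal sketch of converting a patient’s regional volume into a normative percentile under a Gaussian model. The normative mean and SD are illustrative numbers; real tools typically model volume as a function of age (and often sex and head size) across their reference dataset.

```python
import math

# Sketch of single-subject comparison to a normative reference
# population, the core operation behind the QReports in Table 1.
def normative_percentile(patient_vol_ml: float,
                         norm_mean_ml: float,
                         norm_sd_ml: float) -> float:
    """Percentile of the patient's volume within a Gaussian model of
    the age-matched normative distribution."""
    z = (patient_vol_ml - norm_mean_ml) / norm_sd_ml
    # Gaussian CDF via the error function, scaled to a percentile
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# e.g. hippocampal volume 2.6 ml vs. an assumed normative
# 3.2 ± 0.35 ml for a 70-year-old
print(f"{normative_percentile(2.6, 3.2, 0.35):.1f}th percentile")  # ~4th
```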

Target disorder

All companies reported dementia as a target disorder. Eleven tools were said to be aimed at multiple disorders, including epilepsy, traumatic brain injury and MS, in addition to dementia.

Provision of cortical overlays/atrophy heat maps

All companies provide some form of cortical overlay back to the user. These were either segmentation examples for accuracy confirmation, atrophy-based heat maps or both.

Image quality control (QC) method

Techniques for image QC before report processing varied greatly, ranging from specific acquisition protocol requirements to automated artefact checks and automated flagging for manual QC.

Strategies to account for inter-scanner variability

All companies informed us that harmonization measures were in place, although some declined to provide proprietary details. The strategies varied considerably, including ensuring an equal mix of field strengths, scanner vendors and acquisition parameters in the reference dataset; vendor-specific acquisition parameters and site qualification procedures; and adopting validated variation-agnostic segmentation algorithms.

PACS integration/report deployment procedure

All companies claimed to provide PACS integration of their tools; some also offer web-based, cloud-based or standalone hardware solutions.

Peer-reviewed technical and clinical validation

The number and category of studies found during this systematic literature review are presented in Fig. 2 and the “Literature Search” section.

Fig. 2

PRISMA flowchart documenting the studies searched and selected for inclusion in this review

Literature search

The literature search, screening, final selection and categorization were conducted in line with the PRISMA guidelines [28,29,30]; the results are outlined in a PRISMA workflow diagram (Fig. 2) and documented further below. A total of 62 original studies covering technical (39) or clinical validation (23, dementia = 15, other neurological diseases = 8) were identified from 11 of the 17 companies/products assessed. For 6 products, no publications meeting our inclusion criteria were identified. Only 4 vendors have published clinical validation of their reports in a dementia population.

The distribution of studies identified is shown in Fig. 3. As expected, there was considerable variation amongst the vendors in the number and type of validation studies performed. However, all companies claimed to be planning further peer-reviewed validation studies.

Fig. 3

The distribution of papers meeting our inclusion criteria for each of the companies identified. The vendors are listed in chronological order according to the date of their first CE/FDA approval

Validation studies identified

Of the 17 companies assessed, 11 have published some form of technical validation of their segmentation methods; only 4 have published clinical validation of their QReport in a dementia population and 3 in other neurodegenerative disorders, totalling 62 studies. It should be noted that all QReports identified have satisfied the validation requirements for FDA clearance and/or CE marking. However, these markings do not guarantee diagnostic value; further rigorous independent validation studies should be conducted and published in peer-reviewed journals to assist potential users’ decision-making between available tools. In order to remain unbiased, a narrative synthesis of the studies found for each company is provided and referenced below (in alphabetical order). In general, more technical than clinical validation has been published by companies and research groups using proprietary QReports. Technical validation studies broadly reported strong agreement between automated segmentations and those of manual raters or state-of-the-art research tools, such as Freesurfer. Clinical validation studies of quantitative reports on dementia patients, albeit scarce, reported improved diagnostic accuracy [38, 58], prognostic value [39, 57], differential diagnosis [19] and confidence [42] amongst clinicians or versus clinician diagnoses, as well as strong correlation with the diagnostic potential of visual rating scales [43, 59, 87].

Brainminer:

DIADEM uses the geodesic information flows (GIF) methodology for brain segmentation and volumetry, which has been tested [34] against the MAPER segmentation technique [88]. GIF has also previously been tested against manual segmentations [35, 36].

Brainreader:

Volumetry results from the Neuroreader report have been compared to manual segmentations [37]. Clinical: Automated hippocampal volumes were compared to NeuroQuant’s in terms of predicting conversion from mild cognitive impairment (MCI) to AD [39]. Radiologists have tested the validity of Neuroreader for detecting mesial temporal sclerosis in epilepsy patients [89] and dementia diagnosis in a memory clinic cohort [38].

Combinostics:

Combinostics’ segmentation method has been compared to manual segmentations [40] and tested for standalone disease classification [90]. Clinical: The performance of their automatically generated MTA and GCA rating scales has been compared to radiologists’ assessment [43]. The PredictND tool for prognostic assessment has been tested by a clinician [42].

CorTechs.ai:

Automated segmentations have been both manually checked and compared to manual segmentations [44, 45, 47, 52, 55], FreeSurfer [46, 50,51,52, 56, 57], FSL-FIRST [47, 53], SIENAX [48] and other FDA/CE-marked tools: MSmetrix [48]. One study also assessed the difference in results following a version update [49]. Furthermore, a new MR volumetry software (Inbrain—https://www.inbrain.co.kr/) recently compared its results to NeuroQuant [54]. Clinical: NeuroQuant has been used by radiologists in the context of traumatic brain injury [25, 91], temporal lobe epilepsy [92,93,94] and AD [58, 59, 87]. The prognostic value of NeuroQuant has been assessed in MCI patients [39, 57]. NeuroQuant’s volumetry results have been used as an outcome measure in a number of dementia-related clinical trials, covering immunoglobulin [12], Aβ immunotherapy CAD106 [13], resveratrol [14], 8-OH quinoline [15] and adipose-derived stromal vascular fraction [16].

Corticometrics:

The THINQ report uses the segmentation and volumetry method samseg, which has been tested in one study [60] alongside multi-atlas likelihood fusion (PICSL-MALF) [95], Brainfuse [96], majority voting [97] and Freesurfer.

Icometrix:

Volumetric results from icobrain dm were recently compared to Freesurfer [62]. The longitudinal comparison tool, icobrain long, has also been tested against SIENAX with real-world MS data [98]. Their MS-specific report, MSmetriX, which uses the same volumetry technique, has been tested intercontinentally [65] and validated against SIENA on MS [63] and AD patients [64].

jung diagnostics:

The Biometrica platform uses the widely validated SPM for volumetry [70] and has been compared to the SIENA and FSL tools [71, 72]. Hippocampal segmentations have previously been verified by radiologists [69]. Clinical: The Biometrica report’s effect on dementia diagnosis has also been tested by neuroradiologists [19].

Quantib:

Quantib’s segmentation method has previously been compared with manual segmentations [73, 99].

Qynapse:

The Qynapse segmentation method has been tested against manual segmentations [75, 76].

SyntheticMR:

SyMRI’s volumetry results have been assessed in repeatability and manual segmentation studies [100]. The automated brain parenchymal fraction generator has been compared with manual techniques, VBM8 and SPM12, in MS patients [79] and healthy controls [78]. Clinical: The SyMRI report results were used in a clinical trial of rituximab in MS patients [17].

Vuno:

Vuno’s deep learning segmentation methods have been tested for standalone disease classification [80].

Discussion

In this systematic review, we have identified a broad range of companies offering CE-marked or FDA-cleared QReports for use in dementia populations. The available publications concerning technical and clinical validation of these tools were categorized to increase the transparency of evidence. However, product rankings or recommendations have been avoided due to variation in the needs of each purchaser and user. Beyond regulatory body approval, QReports on the market vary widely in how they have been technically and clinically validated for use in clinical practice. Of the 17 companies assessed, 11 have published some form of technical validation of their segmentation methods; only 4 have published clinical validation of their QReports in a dementia population and 3 in other neurodegenerative disorders. For 6 products, no publications were found that met our inclusion criteria. We found no published evidence for any regulatory-approved QReport on workflow integration or in-use evaluation, as recommended in steps 5 and 6 of the QNI framework. However, all vendors informed us that they are planning (further) validation studies. It is worth noting that the European Medical Devices Regulation has recently introduced “post-market clinical follow-up” in conjunction with its “post-market surveillance” and “clinical evaluation reporting” (https://ec.europa.eu/health/md_sector/overview_en). This will require vendors to gather, record and analyse clinical performance and safety data throughout the lifecycle of their product in order to achieve certification or re-certification. Hopefully, this will stimulate the publication of external peer-reviewed validation studies by vendors.

Previously published reviews covering quantitative radiological tools have either focused purely on AI-driven image analysis software for broader radiology [101,102,103] or only covered a limited number of tools available on the market focused on neuropsychiatry [104, 105]. In recent years, there has been a considerable rise in companies providing both AI and non-AI-based automated quantitative analysis methods: 12 of the 17 identified in this study are less than 3 years old. This growth recently prompted the FDA to produce an “action plan” for AI/machine learning-based software as a medical device—https://www.fda.gov/media/145022/download. In this paper, they outline plans to update current regulatory frameworks, strengthen the harmonized development of “good machine learning practice”, support a patient-centred approach and, most relevant to this review, support the development of methods for evaluating and improving machine learning algorithms and promote real-world performance studies, in other words, technical and clinical validation. The ECLAIR guidelines were also published very recently aiming to provide guidance and informed decision-making when evaluating commercial AI solutions in radiology before purchase [106].

Using structured and validated QReports could provide considerable improvements in diagnostic accuracy, reliability, confidence and efficiency across a neuroradiological service, but this is predicated upon technical and clinical validation [2, 8, 21, 107, 108]. Previous research has shown that these diagnostic improvements could be achieved by providing region-specific volumetric differences between single subjects and an age-matched normative population [18,19,20,21,22,23, 91, 109,110,111]. Work to this effect has been underway for some time, but there is currently no rigorously validated platform for automated quantification and display of volumetric data in widespread use for radiology reporting. There are several hurdles to clinical implementation of volumetric analysis, such as the discrepancy in quality between research and clinical data, the need for automated detection of image artefacts, inter-scanner variability and the requirement of full automation. Indeed, only 23% of 193 centres assessed in a recent European survey performed volumetric analysis, and only 5.7% reported using it regularly [27]. Of the 23% using volumetry, only around half used normative reference data for single-subject comparison. The majority of centres reported using FreeSurfer (43.5%) for volumetric processing, followed by CorTechs.ai’s NeuroQuant (17.4%), AppMRI hippocampus volume analyser (15.2%) and Icometrix (4.3%). It is notable that the highest reported use of a clinical proprietary tool (17.4%) was for NeuroQuant, which is also the tool that has been most widely validated thus far. It follows that extensive technical and clinical validation of the tools described in this review will likely increase user confidence and facilitate the adoption of quantitative methods in the clinic.

The features offered by the QReports identified vary widely; see Table 1. No “one-size-fits-all” approach exists for the complex requirements of each clinician, department or patient population. The same applies to the degree and type of validation in the peer-reviewed literature: studies relevant to one population may be less so to another. In order to remain unbiased, a summary of QReport features and validation studies in the literature has been provided, but detailed study results and product recommendations are avoided due to the variation in the needs of each purchaser and user. Indeed, the selection of QReports depends on several factors, such as resources, experience and expertise already available in a clinical group, product regulation, technical and clinical validation, generalisability to the patient population seen in clinic, integration of software into the clinical workflow, customer support, data security requirements and cost/return on investment/reimbursement eligibility. It was not possible to gather purchase costs for this review, but a recent overview of volumetric quantification in neurocognitive disorders reported costs averaging USD 82.68 per patient [105]. However, the actual costs of implementing these tools in a clinic may vary by country, with the healthcare system, reimbursement regulations and healthcare costs all playing a role.

What evidence would an ideal QReport exhibit on the way to clinical integration?

A six-step framework for the translation of clinical reporting tools has been previously set out by the QNI [2]. Here we discuss some of the most important milestones in the development of a dementia-specific QReport. The main aspect and the focus of this review is the transparency of technical and clinical validation as this should be of the utmost importance to end-users and critical to ensuring patient benefit.

Technical validation vs industry standards

Any QReport intended for use as a diagnostic aid in neurodegenerative diseases should communicate both patient and normative volumetric results via a visually intuitive and clinically relevant report. Ideally, we suggest that this should include automated quality control metrics, cortical overlays of the segmentation for sanity checking by the end-user and visual representation of the quantitative data in a graph or chart and/or atrophy-based heat maps for easy reference. The automated segmentation method should undergo rigorous technical validation in repeatability studies and against industry standards such as expert manual segmentation, Freesurfer, FSL or VBM, with the results published in peer-reviewed journals. All the vendors assessed in this review have produced quantitative reports to assist volumetric MRI analysis. However, the younger companies generally have not yet published technical validation of their reports, although all claimed to be planning to do so.

Clinical validation by end-users

Several papers assess the predictive capability of tools for automated group-level differential diagnosis amongst dementia subtypes in a research setting [59, 80, 90, 112,113,114,115]. However, the purpose of this review is to help clinicians select the most appropriate tools for their individual investigations in everyday clinical practice, and automated group-level diagnosis studies without intervention and testing by end-users are far less relevant to the clinic. QReports should be tested by the end-users, usually clinicians, on multi-centre clinical data from the patient populations expected to benefit most from more accurate and faster diagnoses, for example, patients screened for subjective memory concerns and those with younger-onset dementia. These patient populations may have more subtle patterns of atrophy, and QReports are likely to provide the greatest benefit to raters by flagging patients who require more regular follow-up and by reducing inter-rater variability. The results of diagnostic accuracy studies should ideally be published in peer-reviewed journals [19,20,21, 116,117,118]. Several companies provide lists of publications on their websites. While this is both positive and helpful, direct references to technical and clinical validation of QReports are scarce. For the greatest impact and widest adoption of these tools, peer-reviewed validation studies should be clearly highlighted and championed by vendors. While technical validation has been covered by 11 of 17 vendors, only 4 have published clinical validation of their tools on a dementia or memory clinic population. We have identified a major lack of clinical validation studies for volumetric neuroradiological tools in the literature.

Proven generalisability

Analysis methods should ideally be robust to variation in acquisition parameters, scanner/vendor differences and field strength, although this is a difficult standard to achieve in practice. Single-subject results should be contextualized against a large and generalizable reference population of mixed field strengths, scanner vendors and age- and gender-matched controls, ideally transferrable to the demographic of patients that will be seen in each clinic. For example, a tool using a reference population comprised purely of data from an Asian hospital might not translate well for use in a clinic based in Europe or the Americas. Limited evidence so far suggests that mean subcortical volumes are reasonably interchangeable across normative reference populations [111], though this needs further support from studies with multi-ethnic populations covering more brain regions. In general, vendors have compiled sufficiently large and diverse normative reference populations and should continue to be transparent about the source and composition of these cohorts. However, as documented in the Results section, there is wide variation in the generalisability procedures adopted by companies. There is no single universally accepted or correct method, but companies should be fully transparent regarding the measures they have in place to account for the variability of input data.

Full automation and workflow integration

This covers step 5 in the QNI framework. Vendors should be able to provide clear methods for PACS and workflow integration and ideally full automation of sending scans for processing and receiving results. Furthermore, a system for integrating QReport results into the radiologist’s report would save time and reduce copying errors. Customer support operations must also be in place to deal with errors in sending and processing. While many tools reviewed here do include methods to accommodate workflow integration, we found no research evidence regarding the integration of QReports into the clinical reporting workflow.

In-use evaluation

This covers step 6 in the QNI framework but, like step 5, the literature review did not uncover any evidence of in-use evaluation of the QReports included in this paper. However, work has been presented to map out the relevance of automated software for radiology in general [119,120,121]. While the benefit to patients should be the key factor in using automated volumetry to assist diagnosis, the socioeconomic impact, while heavily associated with patient benefit, should also be assessed. Multi-centre studies evaluating clinical and population perception and cost-effectiveness of quantitative report use should be conducted in clinics that have been regularly using reports for a sufficient period of time.

Limitations

Some limitations of the current review need to be considered. In order to find as many companies providing QReports as possible, an extensive FDA/CE approval search was conducted. However, without a fully searchable database of CE-marked products, this approach may not be fully exhaustive and some vendors could have been missed. Furthermore, some products may have received regulatory approval during the publication process of this manuscript or have been approved for other markets. Despite this, our overall conclusion remains unchanged: there is a need for more clinical validation of such tools to facilitate optimal clinical adoption, especially since we found that the younger vendors were most lacking in technical validation, clinical validation and in-use evaluation. Finally, much of the information on the features of each company (see Table 1) was provided by the vendors themselves. As such, these details could not all be independently verified by the authors or the reviewers.

Future developments

While we have focused primarily on evidence of technical and clinical validation of QReports, we also observed wide variation in capabilities across tools and in the information presented. Conducting in-use evaluations, as recommended in step six of the QNI framework, will help optimize the functions, features and design of QReports based on how they foster clinical efficacy. Another natural progression would be a side-by-side comparison of the reports and their results, including interpretation by radiologists and assessment of clinical impact, using a common test set of subjects from the same dataset, such as ADNI, or a real-world dataset reflecting everyday clinical practice. Eleven of the 17 companies covered in this study told us that they would be willing to participate in such a project.

Conclusions

In this review, we reveal a significant evidence gap in the clinical validation of QReports for use in dementia diagnosis and memory clinic settings. Only 4 of the 17 companies assessed have so far published some form of clinical validation, and there is not yet any published evidence of workflow integration or in-use evaluation. From this, we conclude and recommend that more research should be done to validate these QReports in clinical settings, to develop a more robust understanding of how each tool contributes to the diagnostic workflow in memory clinics. This will not only support optimal clinical integration of quantitative tools but will also help neuroradiologists to make informed decisions regarding the use of quantitative assessment in their clinics. Clinicians interested in incorporating quantitative reporting software into their diagnostic workflow should note that, while 4 companies have published clinical validation studies, the large variation in available quantitative reporting features and the lack of comparative validation on standardized imaging cohorts leave little scope for recommending between them as diagnostic tools in the clinic. We hope this review encourages such validation studies from the developers of these quantitative tools, and we recommend caution from clinicians when examining claims of the tools’ clinical performance.