INTRODUCTION

When choosing a test or treatment, clinicians and patients consider its potential benefits and harms.1 Ideally, these are known from studies of similar patients in comparable care settings with sufficient follow-up. The clinician assesses this evidence, incorporates the patient's values and preferences, and decides with the patient whether to proceed with testing or treatment.2 This process of patient-centered, evidence-based care depends on the availability of evidence, preferably at the point of care, to inform decisions.

While evidence about an intervention’s efficacy or effectiveness is frequently available, rigorous evidence about the broad range of harms a patient might experience is scant. Fewer than half of randomized trial reports include all captured harm data,3 and the data captured may themselves be limited. Further, trial data are often insufficient because evidence of rarer harms frequently emerges only over time.4 This one-sided evidence impedes fully informed, shared decision-making. In this paper, we explore the problem of the limited scope of harms considered in assessments of tests and treatments. Our goal was to generate recommendations for how researchers might more comprehensively evaluate the potential harms of healthcare interventions.

CURRENT UNDERSTANDING OF HARMS OF HEALTH SERVICES

While patient safety is addressed by many groups, including departments of health and the Joint Commission, the assessment of harms of drugs and devices is generally the domain of medical product regulators. The US Food and Drug Administration (FDA) refers to harms of interventions as “risks,” which are product-associated adverse events and unfavorable effects. The FDA also recognizes that the treatment burden imposed by an intervention can affect patients’ health, functioning, or well-being.5 Similarly, health technology assessment principles adopted by many countries include safety as a prime consideration for supporting coverage, alongside effectiveness, economic impact, and ethical and social considerations.6 However, the current framing of safety issues is insufficient to capture the full breadth of potential harms.

The harms that should be considered when designing research have not been comprehensively described. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines discuss appropriate methods for reporting both benefits and harms but do not specifically categorize harms;7 they note that harms information is under-reported in systematic reviews, even when available in the original studies.8 The US Agency for Healthcare Research and Quality (AHRQ) has described best practices for collecting data on harms in systematic reviews, including comparative effectiveness reviews,9, 10 emphasizing the importance of identifying, selecting, and prioritizing harms based on severity and frequency, and of focusing on harms that are important to patients. While the AHRQ recommendations allude to non-physical harms, such as overdiagnosis and labeling, their framework focuses primarily on physical effects, despite increasing emphasis on non-physical harms within the patient safety movement.11 Academic groups have created frameworks focusing on harms of screening, a taxonomy of harms from diagnostic testing, and a conceptual model of documented harms of overuse;12,13,14 we sought to build on this work.

In August 2019, 32 diverse experts gathered for a one-day meeting to discuss harms specifically related to healthcare overuse. During the meeting, a subgroup of health services researchers with experience working to reduce low-value care discussed the broader topic of harms of interventions; the authors continued working together afterward to develop recommendations. We then conducted an iterative literature review, held follow-up virtual meetings, and obtained input from four patient representatives on the broader topic of harms of interventions. We compiled the information and developed this paper based on our findings, resolving disagreements through discussion. Here we propose categories of harms to consider when evaluating any healthcare service and provide examples of data sources and measurement tools to enable evidence generation about harms in each domain.

DOMAINS OF HARMS

We recognize seven domains of potential harms to a patient receiving a test or treatment, including harms experienced beyond the period immediately following receipt of the service: (1) physical impairment; (2) psychological distress; (3) social disruption, defined as interference with relationships or altered social identity or status; (4) disruption in connection to healthcare; (5) labeling, the impact of being assigned a diagnosis; (6) financial impact; and (7) treatment burden, the workload of managing healthcare conditions (Table 1).

Table 1 Key Domains of Harms to Measure in Studies of Tests or Treatments

These domains should be considered by clinicians reviewing evidence about harms and by researchers designing studies and establishing clinical registries to evaluate the impact of a test or treatment on patients. In every domain, the intensity of harms should be measured in terms of frequency, timing, duration, and severity. Within each domain, intensity may range from largely inconsequential (bruising from phlebotomy or a small prescription copay) to devastating (a life-threatening complication or bankruptcy), and harms may be short-lived or long-lasting. Researchers should recognize that harms may arise directly from the intervention or from the cascade of care that follows the original service.15 Such cascades can be critically important. For example, computed tomography (CT) of the lung with intravenous contrast might cause physical harms such as an allergic reaction to the contrast agent. The same test might incidentally reveal a nodule that is later biopsied as part of a cascade of care, resulting in a pneumothorax (an additional physical harm), social disruption, and psychological distress. Because harms vary in both their timing and the directness of their relationship to the culprit intervention, researchers must deliberately specify when harms will be measured, and appropriately attributing harms to the original healthcare service can be challenging. Whether the culprit intervention was appropriate in the first place adds further nuance to the discussion of harms.
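For researchers building a registry or study database, one way to make the intensity attributes above concrete is to store each harm as a structured record tagged with its domain, severity, onset, duration, and whether it arose from the index service or a downstream cascade. The Python sketch below is illustrative only: the class and field names and the four-level severity scale are hypothetical, not a published standard, and the example records are invented to mirror the lung CT scenario rather than drawn from any data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class HarmDomain(Enum):
    """The seven harm domains proposed in the text."""
    PHYSICAL = "physical impairment"
    PSYCHOLOGICAL = "psychological distress"
    SOCIAL = "social disruption"
    CONNECTION_TO_CARE = "disruption in connection to healthcare"
    LABELING = "labeling"
    FINANCIAL = "financial impact"
    TREATMENT_BURDEN = "treatment burden"


class Severity(Enum):
    """Hypothetical ordinal scale; a real study would adopt a validated grading system."""
    MILD = 1
    MODERATE = 2
    SEVERE = 3
    LIFE_THREATENING = 4


@dataclass(frozen=True)
class HarmRecord:
    """One observed harm, carrying the intensity attributes named in the text."""
    patient_id: str
    domain: HarmDomain
    description: str
    severity: Severity
    onset_days: int                     # timing: days from the index service to onset
    duration_days: Optional[int]        # duration; None if unresolved at last follow-up
    via_cascade: bool                   # True if caused by downstream care, not the index service
    cascade_step: Optional[str] = None  # the downstream service implicated, if any


def frequency_by_domain(records: Iterable[HarmRecord], n_patients: int) -> Dict[HarmDomain, float]:
    """Frequency: share of patients with at least one harm in each domain."""
    affected = {domain: set() for domain in HarmDomain}
    for record in records:
        affected[record.domain].add(record.patient_id)
    return {domain: len(ids) / n_patients for domain, ids in affected.items()}


# Hypothetical records mirroring the lung CT example (10 patients scanned).
records = [
    HarmRecord("p1", HarmDomain.PHYSICAL, "allergic reaction to contrast",
               Severity.MILD, onset_days=0, duration_days=1, via_cascade=False),
    HarmRecord("p2", HarmDomain.PSYCHOLOGICAL, "anxiety about incidental nodule",
               Severity.MODERATE, onset_days=2, duration_days=None,
               via_cascade=True, cascade_step="nodule work-up"),
    HarmRecord("p2", HarmDomain.PHYSICAL, "pneumothorax after biopsy",
               Severity.SEVERE, onset_days=30, duration_days=5,
               via_cascade=True, cascade_step="lung biopsy"),
]
freq = frequency_by_domain(records, n_patients=10)
print(freq[HarmDomain.PHYSICAL], freq[HarmDomain.PSYCHOLOGICAL])  # 0.2 0.1
```

Keeping cascade harms as separately flagged records, rather than folding them silently into the index service, lets an analyst report harms with and without cascade effects and makes the attribution choice explicit.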

SOURCES OF DATA FOR UNDERSTANDING HARMS ACROSS DOMAINS

Researchers evaluating tests and treatments have several options for collecting data to detect and quantify harms in these seven domains. While all domains should be considered during study design, researchers may prioritize some domains over others in a single study. Data sources for generating evidence about harms of healthcare services can be primary (collected actively by the investigator) or secondary (existing data leveraged for research, including registries); both may be needed to capture the full range of potential harms (Table 2). Evidence captured during usual care, possibly within a pragmatic trial, may be the best source of data on harms among patients who receive the intervention outside a tightly controlled efficacy trial or cohort study. This type of evidence is especially valuable because harms experienced in usual care may differ substantially from those experienced by patients enrolled in studies. Patients outside of studies are often more diverse and medically complex, the population of clinicians providing care is broader, treatment interactions are more likely, and monitoring for early predictors of harm may be less complete. Even for interventions evaluated in clinical trials, in which many harms are directly measured, registries or other longitudinal data collection may be necessary to understand the delayed effects of interventions and the effects of care cascades. Understanding these late-occurring harms is critical.

Table 2 Examples of Data Sources that Include Information on Harm Outcomes for Measurement

The seven harm domains may vary in importance at different points along the translational pathway, from phase I drug studies to evaluations of implementation efforts. For example, financial and labeling harms are typically unimportant or unmeasurable in early-phase clinical trials but may be essential for understanding the harmful effects of interventions used in practice. The research designs needed to adequately capture an intervention's harms may differ from those needed to capture its benefits; the choice of design should be informed by which outcomes are available in the data.

MEASUREMENT TOOLS FOR PRIMARY DATA COLLECTION

Many measurement tools are available for primary data collection about the harms of interventions. Learning directly from patients is essential. Recent emphasis on patient-reported outcomes has led to international efforts to define important disease outcomes and validate measurement tools. Examples include the Patient-Reported Outcomes Measurement Information System (PROMIS) from the US National Institutes of Health16 and the Patient-Reported Indicators Surveys (PaRIS), developed by the Organisation for Economic Co-operation and Development (OECD) for the collection of patient-reported health indicators.17 Other potentially valuable measures are those from the International Consortium for Health Outcomes Measurement (ICHOM), which has created tools to measure standard sets of outcomes associated with particular diseases.18 Many of these tools capture outcomes in several, although not all, of the seven harm domains. For many outcomes in these proposed domains, there are disease-specific measurement tools and disease-agnostic tools that are widely applicable (Supplemental Table). Qualitative approaches may also be helpful for exploring outcomes in some domains, particularly if no validated tools exist.

HETEROGENEITY CONSIDERATIONS AND REPORTING

Just as treatment benefit varies among patient subgroups, experiences of harm are likely to vary as well. Researchers should anticipate the need to evaluate harms in subpopulations, since harms may differ importantly by age, race, sex, social determinants of health, preferred language, comorbid illness, medication use, psychological state, or other characteristics. Further, the reporting and quantification of harms should parallel that of benefits, generally using both absolute measures (e.g., absolute risk increase, number needed to harm) and relative measures (e.g., relative risk, hazard ratio). In addition, harms are sometimes best understood as composite outcomes (e.g., the number of patients with any serious adverse event) and should be compared with the harms of alternative treatment options.19 Notably, a more robust literature on harms may uncover important biases that will require novel criteria to quantify.8
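As a worked illustration of the absolute and relative measures named above, the sketch below computes the risk of a harm in each arm, the absolute risk increase (ARI, the risk with the intervention minus the risk with the comparator), the number needed to harm (NNH = 1/ARI), and the relative risk (the ratio of the two risks), plus a simple composite count of patients with any qualifying event. The function names and the trial counts are hypothetical. Hazard ratios require time-to-event modeling and are not shown.

```python
from fractions import Fraction


def harm_measures(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int) -> dict:
    """Absolute and relative harm measures for a hypothetical two-arm comparison.

    Fractions keep the arithmetic exact, so 1/ARI is not distorted by
    floating-point rounding before it is reported.
    """
    risk_tx = Fraction(events_tx, n_tx)     # risk of the harm with the intervention
    risk_ctl = Fraction(events_ctl, n_ctl)  # risk of the harm with the comparator
    ari = risk_tx - risk_ctl                # absolute risk increase
    return {
        "risk_tx": float(risk_tx),
        "risk_ctl": float(risk_ctl),
        "absolute_risk_increase": float(ari),
        # NNH is defined only when the intervention increases the risk of harm.
        "number_needed_to_harm": float(1 / ari) if ari > 0 else None,
        # Relative risk is undefined when the comparator arm has no events.
        "relative_risk": float(risk_tx / risk_ctl) if risk_ctl > 0 else None,
    }


def patients_with_any_event(per_patient_events) -> int:
    """Composite outcome: number of patients with at least one qualifying event."""
    return sum(1 for events in per_patient_events if events)


# Hypothetical trial: 30 of 200 treated vs 10 of 200 control patients had a serious adverse event.
m = harm_measures(30, 200, 10, 200)
print(m["absolute_risk_increase"], m["number_needed_to_harm"], m["relative_risk"])  # 0.1 10.0 3.0
```

NNH is left unrounded here because rounding conventions differ; the exact fractions avoid cases in which floating-point error nudges 1/ARI just past a whole number.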

STAKEHOLDER INVOLVEMENT

Patients must be central to prioritizing harms outcomes across domains and, with other stakeholders, to refining the domain names so that they are widely accepted. Informed decision-making about health services requires an understanding of how patients value potential benefits and harms; research design should incorporate that understanding. Better understanding of harms across domains benefits multiple stakeholders. Knowledge of the breadth of potential harms will enable policy makers to make better, fully informed decisions about priority setting, funding, and availability of services. For example, harms information might inform decisions to restrict some services to centers of excellence, to sub-specialists who can deliver the intervention more safely, or to select patient sub-populations. At the organizational level, awareness of harms across domains enables a more complete understanding of patient safety through measurement of specific harms-related outcomes. Finally, at the level of the clinical encounter, knowledge of harms enables clinicians to appropriately counsel patients about benefits and harms to optimize and individualize care.

APPLICATIONS OF THE SEVEN DOMAINS OF HARMS

These recommendations can be implemented and operationalized in several ways. First, researchers should consider the broad range of harms when designing studies (including early clinical trials, comparative effectiveness studies, and systematic reviews) and when establishing registries. Attention to the range of domains could be encouraged through several mechanisms. Medical journals might require investigators publishing primary research to submit a checklist indicating which domains, and which outcomes within those domains, were included in the study. The CONSORT Harms Extension (for clinical trials) and the PRISMA Harms Checklist (for systematic reviews) already recommend best reporting practices, although they do not describe which types of harms should be included.7, 20 Harms domains could be incorporated into such checklists, or journals could require a statement about them, much as some journals now require transparency about patient partners.21 Being asked about the scope of harms considered may prompt investigators to consider the seven domains when defining outcomes of interest. Regulators and funders of all types can also play a role by requiring either reporting of harms in each domain or a justification for excluding domains. These groups can provide explicit a priori guidance to researchers designing studies or registries.
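The journal checklist and the regulator or funder requirement described above (report harms in each domain or justify the exclusion) amount to a simple completeness rule. A minimal Python sketch, assuming a hypothetical submission format in which authors list measured outcomes and exclusion reasons per domain, could flag domains that are neither measured nor justified. The domain labels follow the text; the data format, the function name, and the example submission are invented for illustration.

```python
from typing import Dict, List

# The seven domains proposed in the text.
HARM_DOMAINS = (
    "physical impairment",
    "psychological distress",
    "social disruption",
    "disruption in connection to healthcare",
    "labeling",
    "financial impact",
    "treatment burden",
)


def unaddressed_domains(measured: Dict[str, List[str]],
                        exclusion_reasons: Dict[str, str]) -> List[str]:
    """Return domains with neither a measured outcome nor a stated reason for exclusion."""
    # Catch misspelled or invented domain names rather than silently ignoring them.
    unknown = (set(measured) | set(exclusion_reasons)) - set(HARM_DOMAINS)
    if unknown:
        raise ValueError(f"unrecognized domain(s): {sorted(unknown)}")
    gaps = []
    for domain in HARM_DOMAINS:
        has_outcome = bool(measured.get(domain))
        has_reason = bool(exclusion_reasons.get(domain, "").strip())
        if not (has_outcome or has_reason):
            gaps.append(domain)
    return gaps


# Hypothetical early-phase trial submission.
measured = {
    "physical impairment": ["serious adverse events"],
    "psychological distress": ["anxiety questionnaire"],
}
exclusion_reasons = {
    "financial impact": "not measurable in an early-phase trial",
    "labeling": "healthy volunteers; no diagnosis assigned",
}
gaps = unaddressed_domains(measured, exclusion_reasons)
print(gaps)
```

In this example the check would flag social disruption, disruption in connection to healthcare, and treatment burden, prompting the authors either to report those outcomes or to explain their omission.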

The domains also have several applications outside of research. Implementation experts could use them to design data collection instruments that are attentive to late-occurring harms and to inform the design of interventions that minimize harms. Patient support organizations and funders could leverage awareness of these domains to encourage patients to report harms, especially drug-related harms, and to target limited resources. Health professionals reading the literature should use the domains to judge how fully the available evidence covers the breadth of potential harms. Perhaps most importantly, clinicians and patients making decisions about health services should be mindful of all domains of harms. Patient-physician conversations that acknowledge the wide range of harms, and the wide range of their severity within domains, will normalize a broad view of the impact of health services and ultimately allow for more fully informed decision-making.

CONCLUSIONS

Physicians tend to be biased toward action, to focus on physical issues, and to overestimate the benefits and underestimate the harms of health services.22, 23 Improving the measurement and reporting of the full range of patient harms from tests and treatments is vital to counter these biases and to enable more informed decision-making and more patient-centered care.