Nice overview pertaining to some of the issues which can influence the DSMB decision-making process:
February 5, 1998
The Role of a Data and Safety Monitoring Board in Phase III Clinical Trials
John M. Lachin
The George Washington University
Drug Information Association 6th Annual Biotechnology Workshop
February 5-6, Dana Point CA
History of Interim Monitoring
The impetus for interim monitoring of clinical trials evolved principally from some of the early multi-center clinical trials conducted in the 1960s by the National Institutes of Health in the US and the Medical Research Council in the UK. The University Group Diabetes Program (UGDP), one of the early NIH-sponsored clinical trials, was one of the first clinical trials to employ an independent data and safety monitoring board, or DSMB. This was also one of the first clinical trials in which the DSMB recommended termination of one of the arms of the study prematurely based on interim results. Other major clinical trials which followed, such as the Coronary Drug Project, also employed an independent data monitoring committee.
These and other trials stimulated the development of statistical methods for the sequential interim monitoring of emerging results. From the beginning, it was well known that the problem of repeated significance tests would distort the operating characteristics of basic statistical tests and confidence intervals. Various statistical approaches have been developed over the years to address this problem. Among the earliest was the work by Cornfield using a Bayesian approach, followed by the RST plans of Armitage and colleagues, which evolved into the now common group sequential procedures of Pocock, O'Brien-Fleming, and Slud-Wei, among others, and then the more general spending functions of Lan and DeMets and the stochastic curtailment procedures of Lan, Simon and Halperin. These methods have now been applied to numerous statistical procedures. We are now at the stage where it is possible to implement a statistical monitoring procedure for virtually any type of statistical analysis one might envision for a clinical trial.
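The distortion caused by repeated significance tests can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not a method taken from the talk: it assumes five equally spaced looks, a two-sided overall level of 0.05, and the commonly tabulated constants for five looks (2.413 for Pocock, 2.040 for O'Brien-Fleming). The function name type1_error is hypothetical.

```python
import math
import random


def type1_error(boundaries, n_sims=20000, seed=1):
    """Estimate P(|Z_k| >= boundaries[k-1] at some look k) when there is no effect.

    Assumes equally spaced looks, so the cumulative score after look k is a
    sum of k independent N(0, 1) increments and Z_k = S_k / sqrt(k).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        score = 0.0
        for k, bound in enumerate(boundaries, start=1):
            score += rng.gauss(0.0, 1.0)
            if abs(score) / math.sqrt(k) >= bound:
                rejections += 1  # the trial "stops" at the first crossing
                break
    return rejections / n_sims


K = 5  # assumed number of looks
naive = [1.96] * K                      # unadjusted test at every look
pocock = [2.413] * K                    # constant boundary (tabulated value)
obrien_fleming = [2.040 * math.sqrt(K / k) for k in range(1, K + 1)]

if __name__ == "__main__":
    for name, bounds in [("naive 1.96", naive),
                         ("Pocock", pocock),
                         ("O'Brien-Fleming", obrien_fleming)]:
        print(f"{name:16s} estimated type I error: {type1_error(bounds):.3f}")
```

Under these assumptions the unadjusted test rejects far more often than the nominal 5 percent, while the two group sequential boundaries stay close to it. The O'Brien-Fleming boundary is very strict at early looks and close to 1.96 at the final look, which is one reason it is often preferred when early termination should require overwhelming evidence.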
In 1978 the NIH issued guidelines which require that all NIH-funded clinical research should employ a procedure for safety monitoring. In 1988 the FDA issued guidelines which addressed issues related to the statistical analysis of clinical trials, including interim analyses. Largely in response to the FDA guidelines, the PMA issued guidelines for the implementation of interim analyses in industry-sponsored trials. More recently, in 1997, the International Conference on Harmonization (ICH) issued its Draft Guidelines on Statistical Principles for Clinical Trials, which included recommendations on the implementation of interim monitoring in clinical trials.
In this presentation I would like to contrast the objectives of interim monitoring in these settings and to offer some recommendations as to when it is appropriate to consider interim monitoring in a pharmaceutical industry-sponsored trial, and how it might best be implemented.
Interim Monitoring Objectives in Public and Industry-Sponsored Trials
The basic objectives of interim monitoring focus on 3 principal issues:
1. to protect the safety of the patients enrolled in the trial;
2. to terminate the trial as early as possible so that the best treatment may then be made available to all subjects. This is sometimes called the ethical argument for interim monitoring. And of course:
3. to reduce the cost of a study by terminating that study early if there is overwhelming evidence that the treatment is effective or that it is ineffective.
Interim monitoring has also been used to select one among many doses or different drugs, or also to re-evaluate the sample size. In my presentation, however, I am going to address only the 3 main objectives.
An NIH-sponsored trial is very different from an industry-sponsored trial. For an NIH-sponsored trial there is only one audience: the scientific and the clinical community. Such trials involve studies of non-pharmacologic interventions such as intensive treatment in diabetes, new surgical procedures such as laser treatment for diabetic eye disease, new uses of established agents such as ACE inhibition in diabetic kidney disease, the evaluation of competing agents such as various anti-arrhythmic drugs in the CAST trial, orphan drugs such as chenodiol for the dissolution of gallstones, and occasionally studies of novel new agents. In most NIH studies, the mechanism is to publish the results in a major medical journal and then implement treatments in eligible patients.
The industry model, however, is quite different. The audience consists foremost of the FDA and then the clinical community. The trials are designed to evaluate new pharmacologic agents or new devices, or new indications for established agents or devices. The mechanism requires that the results of the studies undergo FDA review and approval, in addition at times to publication of the results, prior to the treatment of appropriate patients. In this context, the role of interim monitoring in meeting the three principal objectives is different under these two models.
Both the NIH and industry models place a premium on patient safety. This is largely not a statistical issue. However, the mechanisms used to monitor adverse effects differ. In NIH trials, this responsibility is vested completely with the independent DSMB. In industry-sponsored trials, there are rigorous safeguards for patient safety due to the extensive laboratory screening and the immediate reporting of a wide range of potential adverse effects directly to the clinical monitor, who has access to the treatment code.
In my opinion, the ethical objective of offering the best treatment to all patients as soon as possible applies to the NIH model but does not necessarily apply to the industry model. If an industry-sponsored trial is stopped early due to overwhelming signs of effectiveness in the opinion of the DSMB, the drug still may not be made generally available outside of that clinical trial, especially to the general public, until after the FDA has had the opportunity to review and approve the complete New Drug Application. In fact, early termination in this case may backfire and lead to a delay of FDA approval if the FDA has concerns over the process of interim monitoring itself, or later concludes that the evidence provided by the trial is found lacking in some respect.
Similar considerations also apply to cost savings. In a government or industry-sponsored trial, there would be substantial savings in total costs if the study is stopped prematurely due to adverse events or if it is stopped early due to lack of effectiveness. However, in an industry-sponsored trial the potential for cost savings is questionable if a study is stopped early due to interim signs of effectiveness. There will be no cost savings if early termination in fact eventually delays FDA approval of the new drug.
Mechanism for Interim Monitoring of Public and Industry Sponsored Trials
As stated previously, the mechanisms for interim monitoring largely differ between NIH and industry-sponsored trials. These differences also involve other features of trial management. In the NIH trial, all trial data are promptly entered into the data management system as they are collected in the clinic, often daily but no less than weekly. All data are immediately edited for errors using computerized editing procedures. NIH trials generally do not employ CRAs to harvest the data. The principal advantage of this process is that all of the data are immediately available for analysis and review. Periodically, every 6 to 12 months, the DSMB reviews analyses of all outcome data to assess the differences between groups in overall effectiveness and adverse effects, so as to derive a judgment of the overall benefit/risk ratio. These analyses generally present risk ratios based on both numerators and denominators for each outcome. This approach relies heavily on an empirical evaluation of all outcome data, which requires statistical adjustments for the multiple sequential analyses of the data. Many of these procedures provide a “stopping boundary,” so called because termination of the trial may be justified when the boundary is crossed, which indicates that statistical significance has been achieved. However, there is some flexibility in acting on these boundaries. Basically, statistical procedures for computing “stopping rules” or boundaries have the same purpose as any statistical procedure: they allow one to assess the strength of evidence in the data. The multiple analyses are then used by the committee to form an overall opinion or Gestalt regarding the overall benefit to risk ratio.
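How much of the type I error a monitoring plan allows to be "spent" by each look can be sketched with the two spending functions usually attributed to Lan and DeMets. This is only an illustration under stated assumptions: a two-sided overall level of 0.05, information fractions t between 0 and 1, and hypothetical function names. It gives the cumulative error spent, not the boundary values themselves, which require numerical integration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def obf_type_spend(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t
    (O'Brien-Fleming-type): 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))."""
    if t <= 0:
        return 0.0
    t = min(t, 1.0)
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 * (1 - _STD_NORMAL.cdf(z / math.sqrt(t)))


def pocock_type_spend(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t
    (Pocock-type): alpha * ln(1 + (e - 1) * t)."""
    t = min(max(t, 0.0), 1.0)
    return alpha * math.log(1 + (math.e - 1) * t)


if __name__ == "__main__":
    # Hypothetical schedule of four looks at equal information increments.
    for t in (0.25, 0.50, 0.75, 1.00):
        print(f"t={t:.2f}  OBF-type={obf_type_spend(t):.5f}  "
              f"Pocock-type={pocock_type_spend(t):.5f}")
```

Both functions reach the full 0.05 at t = 1, but the O'Brien-Fleming-type function spends almost nothing early, so an early crossing reflects very strong evidence. Because the functions depend only on the information fraction, the number and timing of looks need not be fixed exactly in advance.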
Based on these comprehensive analyses, the DSMB may recommend termination of the trial. In general, some of the considerations which enter into that decision are whether the accumulated data provide compelling, conclusive results with respect to the principal outcomes in an intention-to-treat analysis. Here this is not purely a matter of statistical significance for any one outcome. Rather, it is a matter of whether all of the objectives of the trial have been met, or alternately whether no further gains are expected to accrue if the trial were to continue to its originally designed conclusion. The first trial which was modified on the recommendation of the DSMB was the University Group Diabetes Program. Here the DSMB met to review all accumulating data, but there were no formal stopping rules yet developed. The DSMB recommended termination of the tolbutamide arm of the trial when an excess of deaths was observed compared to the placebo group. As in all trials, mortality was monitored, but there was no advance concern that any of the treatments under study would affect mortality. When the results were nominally significant, without adjustment for repeated sequential analyses, the DSMB recommended termination. An outcry followed because many did not believe the result. Some have suggested that the tolbutamide group was in fact stopped too early. Another major NIH trial was the Beta-Blocker Heart Attack Trial which was terminated when a statistically significant reduction in mortality was manifest. This trial is of academic interest because it is through the process of developing monitoring boundaries for this trial that the idea of the alpha spending function emerged. Because there was a single dominant outcome, the considerations in stopping were simple. This is not always the case. One of the most complex trials in my experience was the Diabetes Control and Complications Trial (DCCT). 
This trial had one primary outcome but numerous secondary outcomes, some in fact being more serious clinical outcomes than the primary outcome. Thus, the trial was terminated over a year after the primary outcome analysis had “crossed the boundary” in order to accrue additional evidence that treatment had an impact on all of the complications of diabetes. In all of these instances, the mechanism was to publish, then treat. One of the few industry-sponsored clinical trials to have been terminated early was the study of AZT for AIDS in pregnancy. This actually was a co-sponsored trial by the NIH and industry, but it was designed and monitored from the perspective of the regulatory requirements for an NDA.
All of this is substantially different from the traditional interim monitoring procedure for safety in industry sponsored trials centered about the role of a clinical monitor. First, the data are often not entered in a timely manner. The data forms are periodically harvested by a CRA, usually every few months, and later the forms often are entered into a data base management system, in some cases towards the end of the study. The safety net is the clinical monitor appointed by the sponsor who monitors the adverse event reports submitted by the study investigators. However, there is no ongoing statistical analysis of the relative risks of adverse events between groups, in the sense of a computation of a risk ratio or a p-value. Also, there is no ongoing analysis of the benefits of therapy to allow the assessment of a benefit to risk ratio.
In part due to the successes of interim monitoring of NIH sponsored trials, and the proliferation of statistical methods for interim monitoring, the use of a DSMB or independent data monitoring committee (IDMC) has become more common in industry sponsored trials. To do so requires some changes in the way the data is collected and managed by the sponsor. In order to conduct accurate interim analyses for the DSMB, the data must be collected, entered into the data base management system, and edited for errors in a timely manner. It is critical that a clean data base be “locked” or closed out as of a fixed date prior to the preparation of analyses for the DSMB. However, in an industry-sponsored trial, the DSMB has a more limited role and less flexibility in the way it approaches its task. In an industry-sponsored trial, there must be precise assessment of the type I error probability for each analysis which may later be pivotal in the sponsor's application for marketing. This requires that the operations of the DSMB be pre-specified as much as possible with respect to the outcomes to which formal sequential boundaries are to be applied and the statistical methods to be employed. Among these, the principal decisions with regulatory implications concern the choice of the primary outcome and the choice of the primary analysis strategy, both for the final analysis and also for the interim analyses.
Some of these issues are illustrated by the recent clinical trial comparing zidovudine (AZT) vs. placebo, administered to a pregnant woman with HIV and her infant, in order to prevent the transmission of HIV to the infant. The trial was designed to assess the effect of AZT on the rate of transmission of HIV to the infant up to one year after birth. The usual approach to the analysis of such a study might be an analysis of cumulative incidence. However, the investigators wisely reasoned that since the study was designed to capture information up to one year in the majority of patients, the monitoring procedure should be based upon a landmark analysis of the cumulative incidence at one year. In this case, it would be undesirable to monitor the results using the logrank test for the hazard function of the events. In fact, in a 1984 paper, Fleming, Green and Harrington (Controlled Clinical Trials) pointed out that the logrank test or any rank test may be misleading in the context of interim monitoring. In this case it would be more appropriate to base the monitoring procedure on a Z-test for proportions at a fixed point in time, such as the overall proportion of events observed by one year in the cohort of patients so followed, or estimated from the survival curves at one year.
Although the results were described by a Kaplan-Meier cumulative incidence curve, the test statistic was based on the difference between the 72-week cumulative incidences, estimated from this curve, using the Greenwood-estimated variances. The 72-week cumulative incidence was 8.3% in the AZT group vs. 25.5% in the placebo group, with z = 4.03, greater than that specified by the monitoring boundary. At the FDA review (which I attended as a guest of the agency), this analysis was considered compelling. Had the study been monitored by a sequential logrank test, the results might have "crossed the boundary" sooner, without yielding a compelling difference at 72 weeks.
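A landmark Z-test of this kind can be sketched as follows. This is a minimal illustration, not the trial's actual analysis code: the function names and the follow-up data are hypothetical, and the standard Kaplan-Meier and Greenwood formulas are assumed. The cumulative incidence at the landmark is one minus the Kaplan-Meier estimate, and the Z statistic divides the difference between groups by the square root of the sum of the two Greenwood variances.

```python
import math


def km_landmark(times, events, landmark):
    """Return (cumulative incidence at `landmark`, Greenwood variance).

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0      # Kaplan-Meier survival estimate S(t)
    gw_sum = 0.0    # running Greenwood sum of d / (n * (n - d))
    i = 0
    while i < len(data) and data[i][0] <= landmark:
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            if n_events == n_at_risk:
                # S(t) drops to 0 and the Greenwood term is undefined.
                raise ValueError("everyone at risk had the event before the landmark")
            surv *= 1 - n_events / n_at_risk
            gw_sum += n_events / (n_at_risk * (n_at_risk - n_events))
        n_at_risk -= n_leaving
    return 1 - surv, surv ** 2 * gw_sum


def landmark_z(f_treated, v_treated, f_control, v_control):
    """Z statistic for the difference in cumulative incidence at the landmark."""
    return (f_control - f_treated) / math.sqrt(v_treated + v_control)


if __name__ == "__main__":
    # Hypothetical follow-up in weeks; these are NOT the trial's data.
    treated_t = [20, 40, 60, 75, 80, 80, 80, 80, 80, 80, 80, 80]
    treated_e = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    control_t = [8, 15, 30, 50, 65, 80, 80, 80, 80, 80, 80, 80]
    control_e = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    f_t, v_t = km_landmark(treated_t, treated_e, landmark=72)
    f_c, v_c = km_landmark(control_t, control_e, landmark=72)
    print(f"treated {f_t:.3f}, control {f_c:.3f}, z = {landmark_z(f_t, v_t, f_c, v_c):.2f}")
```

At an interim look the resulting z would be compared with the monitoring boundary for that look rather than with 1.96. Unlike a logrank statistic, it cannot become large unless the difference at the landmark itself is large relative to its precision.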
General Issues in Interim Monitoring
With this background, the following are some general issues which should be considered in establishing a DSMB for a Phase III clinical trial. The most important is to emphasize that the judgment to terminate a trial should not be based only on achieving a given p-value, but rather requires careful consideration of a variety of issues, some clinical, some statistical, and some regulatory.
The first issue concerns the nature of the outcome assessments. In all clinical trials, whether under the NIH or the industry model, a major consideration is the primacy of a single outcome versus the importance of other outcomes. Perhaps the only instance in which there is no ambiguity is where the principal outcome is all cause mortality. In almost any other situation, the importance of various study outcomes can be debated. If there is a difference of opinion as to the importance of the outcome which leads to early termination of a trial, then the impact of that trial may be jeopardized.
An equally important consideration is whether the safety or toxicity profile of the therapy has been adequately assessed. Phase III trials are conducted to evaluate the potential adverse effects of an agent, and to do so usually requires many more patient years of exposure to the agent than is required to establish clinical effectiveness. Therefore, any decision to terminate a trial due to a demonstration of effectiveness must be considered in terms of the adequacy of the safety profile so far established from this and other trials. The central issue is whether the benefit:risk ratio will still be adequately assessed if a trial is stopped prematurely.
The DSMB must consider that data collection is a dynamic process, and that yet-to-be observed features of the data may impact on the ultimate credibility of the study. Among the most important considerations is the impact of potential losses to follow-up at the time of an interim monitoring. One never has complete ascertainment of all events at any interim look due to the built in lags of data reporting and collection. This should be a major consideration in any decision to terminate a study prematurely.
The DSMB must also assess the totality of the evidence from the trial. To the extent possible, the DSMB and the statistical center serving the DSMB should conduct the same panels of analyses which would be employed in the final marketing application to describe the overall consistency of the trial results. These would include the consistency of the treatment effect among the various secondary or related outcome measures related to the natural history of the disease or to the mechanism of action of the agent. These would also include analyses of the consistency of the overall treatment effect across clinic populations and across subgroups of the population. The bottom line is that the DSMB must consider whether early termination would affect the overall precision with which the trial's results address the demonstration of effectiveness and of safety, and the overall credibility of the trial in the regulatory review process.
In fact, the ICH guidelines state that “Most clinical trials intended to support the efficacy and safety of an investigational product should proceed to full completion of planned sample size accrual; trials should be stopped early only for ethical reasons or if power is no longer acceptable.”
General Issues in FDA Review
Although the FDA and the ICH are open to the use of a DSMB in the monitoring of a clinical trial, from my experience there are a variety of issues that may be raised in the FDA review of a clinical trial in which interim monitoring was performed, whether or not it was terminated early. Some of these are addressed in the FDA, PMA and ICH guidelines, but many are not.
The first issue is the concern that the process of interim monitoring may appear to introduce bias into the study results. Here we must draw the distinction between clinical monitoring on the one hand versus data monitoring of the aggregate group data. By clinical monitoring I refer to monitoring the overall implementation of the protocol and the recording and classification of study events. If individuals involved in data monitoring are also involved in clinical monitoring, then it is possible that the entire study may be "tainted" due to concerns that the process of study management may introduce a bias in the study results. For this reason, there may be concerns raised if the sponsor plays any role in the interim monitoring process.
Another issue is the duration of exposure for the assessment of the safety or potential toxicity of an agent. If a trial is terminated early, then the duration of exposure in individual patients and the overall patient years of exposure in the cohort will be reduced. Both are important considerations, especially in indications where prolonged use of the agent is anticipated.
Another issue is whether or not an adequate number of outcome events have been observed, apart from considerations of p values or levels of significance. If the indication is a discrete event, such as total mortality, there may be concerns about approving an NDA in which fewer than 100 patients reach that outcome during the two or more pivotal trials, irrespective of the observed relative risks and p values.
A related consideration is the time course of the effects of the agent. Again to use mortality as an example, if the median survival time is one year, then it is reasonable to desire data showing the effects of the drug up to and beyond one year of treatment. If a study is terminated early based on the first six months of observation, then there will be few patients exposed for a year, thus jeopardizing the overall NDA. This issue must be addressed in the planning of the trial since the statistical approach to monitoring the trial will differ for a trial aimed at establishing a short term effect, for which a logrank test of significance may be appropriate, versus a trial aimed at establishing long-term effects, for which a proportions test at a specific point in time is more appropriate, as was the case in the AZT in pregnancy study mentioned earlier. Finally, all plans, procedures, analyses, meetings, data reviews and recommendations must be completely documented if the trial is to be credible to the regulatory agencies.
Recommendations
My overall recommendation is that interim monitoring should not be considered as routine practice in industry-sponsored trials. I feel that interim monitoring should principally be considered in instances where:
1. there are pre-existing safety concerns for which a data and safety monitoring board may provide an added measure of safety beyond that provided by the usual clinical monitor;
2. early termination for effectiveness would be so clear as to not jeopardize in any way the FDA review and approval of the resulting NDA; which in turn requires that
3. a single, dominant, unambiguous outcome measure is employed. In addition, due to potential concerns over the impact of the overall study management on the biases in the outcome results, the sponsor should not participate in the reviews by the DSMB in any way.
The DSMB should be completely external to the sponsor. The ICH guidelines state that “when there are sponsor representatives on the IDMC, their role should be clearly defined in the operating procedures of the committee... (and) the procedures should also address the control of dissemination of interim trial results within the sponsor organization.” If any employee of the sponsor participates in the review of the emerging data, it is difficult to provide complete assurance that the information was in fact not disseminated or that it had no effect on the conduct of the trial.
For this reason, I also recommend that the statistician member of the DSMB should not be associated with the study, and the operational statistician responsible for the conduct of the study and the final analyses of the study results should not attend the DSMB meetings. In order to maintain complete masking of the sponsor, it is preferable that the analyses for presentation to the DSMB should be conducted by an independent statistician outside of the company, possibly the statistician member of the DSMB. In this case, the data are provided to an external independent statistician who also has access to the unmasked treatment code and who then conducts the interim analyses for presentation only to the external DSMB in a closed meeting.
Finally, as stressed in the various guidelines, all criteria for early termination should be explicitly described in the protocol including the number of planned looks, the approximate information times, the outcomes to be monitored and the statistical techniques to be employed. The sponsor should also request that the DSMB maintain complete documentation of interim looks with archival of the interim database, the interim analyses, and the deliberations of the committee. With these recommendations, I believe it is possible to implement interim monitoring in appropriate industry-sponsored trials in a manner which will meet the trial objectives and also minimize the potential adverse effects that interim monitoring itself may have on the review of the study and its results.
biostat.bsc.gwu.edu:8000/~jml/download/dia298tx.html