From the ¹Ocular Oncology Service, Department of Ophthalmology, Helsinki University Central Hospital, Helsinki, Finland; and the ²Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota.
| Abstract |
|---|
METHODS. Simulation was performed on a population-based data set of 133 patients who had an eye enucleated because of uveal melanoma. One thousand bootstrap samples of 80 patients were drawn, with replacement, according to three sampling strategies: a random draw (conventional strategy), a draw limited to patients who died of the tumor or survived at least 10 years without metastasis ("late-censoring" strategy), and a draw modified so that 40 patients died of melanoma and the others survived at least 10 years without metastasis ("fifty-fifty" strategy). The bias in the Kaplan-Meier analysis and Cox proportional hazards regression was quantified.
RESULTS. The late-censoring strategy decreased the proportion of censored patients from 53% to 42%, whereas the fifty-fifty strategy assigned 50% of patients to this group. The former strategy overestimated mortality, the excess being 5.2% and 3.7% at 10 and 20 years, respectively. The latter strategy underestimated mortality, the bias being 1.6% and 4.6% at 10 and 20 years, respectively. The bias differed according to categories of explanatory variables so that the log-rank test statistic was inflated by a median of 1.08 times (range, 0.73–1.87) and 1.14 times (range, 0.87–1.84), and the Wald χ² statistic of the Cox regression was inflated by a median of 1.18 times (range, 0.79–2.13) and 1.16 times (range, 0.71–2.02), respectively, when the late-censoring and fifty-fifty strategies were applied.
CONCLUSIONS. Sampling strategies that exclude on purpose a proportion of patients who would be censored produce biased statistics, because they violate assumptions of survival analysis. Only random sampling from an underlying population produces unbiased survival estimates.
We used simulation to assess the bias that could be introduced by unconventional sampling into analysis of the mortality rate in uveal melanoma.
| Patients and Methods |
|---|
Independent Variables
A selection of categorical and continuous variables was used that were or were not significantly associated with melanoma-specific survival in the population data set.8 9 The variables chosen were gender; age at enucleation (divided in tertiles for Kaplan-Meier analysis); involvement of ciliary body by the tumor (not involved, involved); height and largest basal diameter of the tumor (LBD; divided in three categories for Kaplan-Meier analysis); presence of epithelioid cells (absent, present); grade of pigmentation (weak, strong); presence of microvascular loops and networks, consisting of at least three back-to-back loops,10 analyzed as an ordered categorical variable that considers networks to be an advanced stage of loops8 (no loops, loops without networks, loops forming networks); and microvascular density (MVD) obtained from the most highly vascularized area (hot spot) of the tumor (divided in quartiles for Kaplan-Meier analysis and square-root transformed for Cox regression).9 11 12
Bootstrapped Data Sets
Simulation was limited to the 133 (80%) patients for whom complete data were available. Three bootstrapped sets, each consisting of 1000 replications of data from 80 patients, were drawn at random with replacement. The first set consisted of replications drawn regardless of outcome and follow-up time (conventional strategy). The second set consisted of replications drawn from a reduced sample of 108 patients, which excluded 25 patients who were lost to follow-up or had died of causes other than uveal melanoma within the first 10 years after enucleation ("late-censoring" strategy). The third set was also drawn from the reduced data set, but the bootstrapping algorithm was modified so that 40 (50%) patients in each replication had died of uveal melanoma and 40 (50%) had survived for 10 years or more ("fifty-fifty" strategy).
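The three draws described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' Stata procedure: the record layout `(follow_up_years, status)`, the function names, and the toy cohort with its hazard rates are all hypothetical, and "survived for 10 years or more" is taken to mean a censored patient with at least 10 years of follow-up.

```python
import random

# Hypothetical record layout: (follow_up_years, status), where status is
# "melanoma" (event), "other" (death from another cause), or "alive".
def make_toy_cohort(n=133, seed=1):
    """Build a synthetic cohort; the rates are illustrative, not the study data."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        t_melanoma = rng.expovariate(0.05)  # assumed hazard of melanoma death
        t_other = rng.expovariate(0.03)     # assumed hazard of death from other causes
        t_end = rng.uniform(5, 30)          # assumed administrative end of follow-up
        t = min(t_melanoma, t_other, t_end)
        status = ("melanoma" if t == t_melanoma
                  else "other" if t == t_other else "alive")
        cohort.append((round(t, 2), status))
    return cohort

def draw_conventional(cohort, rng, size=80):
    """Random draw with replacement, regardless of outcome and follow-up."""
    return rng.choices(cohort, k=size)

def draw_late_censoring(cohort, rng, size=80, cutoff=10.0):
    """Draw only from patients who died of melanoma or were followed >= cutoff."""
    reduced = [p for p in cohort if p[1] == "melanoma" or p[0] >= cutoff]
    return rng.choices(reduced, k=size)

def draw_fifty_fifty(cohort, rng, size=80, cutoff=10.0):
    """Force half of each replication to be melanoma deaths, half late-censored."""
    deaths = [p for p in cohort if p[1] == "melanoma"]
    survivors = [p for p in cohort if p[1] != "melanoma" and p[0] >= cutoff]
    half = size // 2
    return rng.choices(deaths, k=half) + rng.choices(survivors, k=size - half)

rng = random.Random(0)
cohort = make_toy_cohort()
replications = [draw_fifty_fifty(cohort, rng) for _ in range(1000)]
```

In both unconventional draws, whether a patient can enter a replication depends on that patient's eventual outcome, which is the feature the simulation is designed to examine.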
Statistical Methods
The data sets were constructed and the data were analyzed on computer (Stata statistical software, ver. 7.0; Stata Corp., College Station, TX).
The numbers of events and censored patients during the early and late periods (before versus after 10 years from enucleation) were calculated for each strategy as the mean of all replications. The survivor function was analyzed using the Kaplan-Meier product-limit method.13 An average survival curve representing each sampling strategy was obtained by plotting the survival for all 80,000 patients in each bootstrapped set. It can be shown that this curve closely approximates the survival curve that would be obtained by calculating point by point the average of the 1000 individual survival curves. Patients judged to have died of causes unrelated to uveal melanoma were censored at the time of death. Confidence intervals were calculated according to Greenwood.14 Empiric confidence limits were obtained by plotting the individual survival curves for the first 100 replications in each bootstrapped set.
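For readers who want to see the estimator spelled out, the following is a minimal sketch of the product-limit method with Greenwood's standard error. It is not the Stata routine used in the study; the function name `kaplan_meier` and the input layout are assumptions made for illustration.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t), se)] at each distinct event time.

    times  -- follow-up time of each patient
    events -- True/1 if the patient died of the cause of interest at that
              time, False/0 if the observation was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, greenwood_sum = 1.0, 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect everyone whose follow-up ends at time t (ties included).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            # Greenwood: se(S) = S * sqrt(sum d / (n * (n - d)))
            curve.append((t, surv, surv * math.sqrt(greenwood_sum)))
        # Deaths and censored patients both leave the risk set after time t.
        n_at_risk -= removed
    return curve

# Toy data: deaths at 1, 3, and 5 years; censoring at 2 and 4 years.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```

Each censored patient contributes to the denominator only up to the time of censoring, so the estimate depends on who is counted as being at risk at each event time.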
Survival curves by categories of each independent variable were constructed to assess bias in estimating survival proportions, as described for the entire data set. Categories were compared with the log-rank test.13 Bias related to hypothesis testing was quantified by calculating the ratio of the average log-rank test χ² statistic, obtained as the mean of all replications in each set, to that of the conventional strategy. The probability corresponding to the average χ² statistic was obtained by computer (StaTable, ver. 1.0.1; Cytel Co., Cambridge, MA).
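The bias measure for hypothesis testing can be illustrated with a short sketch: a two-group log-rank χ² statistic and the ratio of its mean across replications to the mean under conventional sampling. The function names and the two-group restriction are assumptions made here for brevity; several of the study variables had three or four categories.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 df); groups hold 0 or 1."""
    data = sorted(zip(times, events, groups))
    n1 = sum(groups)
    at_risk = [len(groups) - n1, n1]
    obs_minus_exp = 0.0
    variance = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = [0, 0]
        leaving = [0, 0]
        while i < len(data) and data[i][0] == t:
            _, event, group = data[i]
            deaths[group] += event
            leaving[group] += 1
            i += 1
        d_tot = deaths[0] + deaths[1]
        n_tot = at_risk[0] + at_risk[1]
        if d_tot and n_tot > 1:
            # Observed minus expected deaths in group 1 at this event time.
            obs_minus_exp += deaths[1] - d_tot * at_risk[1] / n_tot
            # Hypergeometric variance of the number of deaths in group 1.
            variance += (d_tot * (at_risk[0] / n_tot) * (at_risk[1] / n_tot)
                         * (n_tot - d_tot) / (n_tot - 1))
        at_risk[0] -= leaving[0]
        at_risk[1] -= leaving[1]
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def inflation_ratio(stats_strategy, stats_conventional):
    """Ratio of the mean statistic under a strategy to the conventional mean."""
    mean_strategy = sum(stats_strategy) / len(stats_strategy)
    mean_conventional = sum(stats_conventional) / len(stats_conventional)
    return mean_strategy / mean_conventional

# Toy example: all deaths in group 0 precede all deaths in group 1.
chi2 = logrank_chi2([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1])
```

A ratio above 1 means that the sampling strategy makes the difference between categories look more significant than it would under random sampling.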
Cox proportional hazards regression was used to calculate the average hazard ratio (HR) and average Wald χ² statistic13 15 for each independent variable, obtained as the mean of all replications in each set. In addition, a previously derived multivariate model was fitted.9 Bias related to HR was quantified by calculating the ratio of the corresponding HR estimate to the estimate obtained from data on the 133 patients in the starting data set. Bias related to hypothesis testing was quantified by calculating the ratio of the corresponding Wald χ² statistic to that obtained by the conventional strategy.
| Results |
|---|
The proportions of events and censored observations predicted by the simulation were consistent with those reported in published studies in which unconventional sampling was applied (Table 1).
Survivorship Function
The Kaplan-Meier curve obtained by conventional sampling (Fig. 1A) was effectively indistinguishable from the observed survival of the 133 patients in the starting data set. The empiric bootstrapped confidence limits agreed with the 95% confidence interval calculated with the Greenwood method (Fig. 1A).
The plot obtained by the fifty-fifty strategy initially slightly overestimated mortality, because fewer patients to be censored were enrolled than with the conventional design (50 vs. 52) and a few more events (31 vs. 29) occurred during the early period (Table 1). Excess mortality was 0.4% (survival, 72.2% vs. 72.6%) by 5 years (Fig. 1C). The number of patients to be censored soon became inflated, however, reaching a maximum by the start of the late period (Table 1; 40 vs. 27, respectively). A smaller decrease occurred with each subsequent event, resulting in progressive underestimation of mortality. Bias was 1.6% (survival, 61.2% vs. 59.6%) by 10 years, 3.0% (55.3% vs. 52.3%) by 15 years, and 4.6% (50.1% vs. 45.5%) by 20 years (Fig. 1C). The empiric confidence interval converged toward 50% survival, dictated by the sampling design, and the Greenwood formula was invalid (Fig. 1C).
Effect Size and Hypothesis Testing in Kaplan-Meier Analysis
Survival curves by categories of independent variables (Fig. 2) revealed that the excess mortality related to the late-censoring strategy was more pronounced in categories with poor survival (in which early deaths predominated and fewer patients remained to be censored late), whereas the mortality underestimate related to the fifty-fifty strategy was more pronounced in categories with good survival (in which early deaths were rare and many patients remained to be censored late). In both instances, the difference between categories was inflated (Figs. 2A–2D). If the curves crossed, the difference between categories could decrease, however (Figs. 2E–2G).
The late-censoring strategy inflated the χ² statistic of the log-rank test by a median of 1.08 times (range, 0.73–1.87), and the fifty-fifty strategy inflated it by a median of 1.14 times (range, 0.87–1.84), with corresponding undue improvement in the probability in both cases (Table 2).
In Cox regression, the late-censoring strategy inflated the Wald χ² statistic by a median of 1.18 times (range, 0.79–2.13), and the fifty-fifty strategy inflated it by a median of 1.16 times (range, 0.71–2.02), with corresponding undue improvement in the probability compared with the conventional strategy (Table 3). With the multivariate model, the χ² statistics of explanatory variables were also generally inflated (Table 3).
| Discussion |
|---|
Based on the simulation, studies in which the late-censoring strategy was used have probably overestimated mortality by 4% to 5%.5 The bias was greater for subgroups with high mortality, and statistical significance probably was inflated. Similarly, studies that have used the fifty-fifty sampling design1 2 3 4 6 7 probably underestimated mortality by 3% to 5%. The bias for subgroups with low mortality was greater, which again inflated statistical significance to a variable extent. In addition, the fifty-fifty strategy is expected to increase the hazard ratio by a mean of 1.03 times. It is impossible to predict exact bias for individual covariates, however.
The use of unconventional sampling is not always clearly mentioned. In a study that evaluated insulin-like growth factor receptor as a predictor of metastasis, 36 patients were analyzed, of whom 18 (50%) died of uveal melanoma, 3 died of other diseases, and 15 were alive at the end of the follow-up.7 According to a table published in the report of that study, the follow-up times of those who died of the tumor (1150 months) and those who were censored from the analysis (186–245 months) were very disparate.7 These observations uncover a fifty-fifty sampling strategy.
Unconventional sampling is especially likely to lead to misinterpretation when the covariate is a weak predictor, sample size is small, or both. For example, the probable inflation of statistical significance associated with the fifty-fifty strategy casts doubt on the reported association between expression of insulin-like growth factor receptor and mortality in melanoma, which is of borderline significance (P = 0.035).7 The corresponding χ² statistic is 4.45. In the simulation, survival curves that similarly did not cross each other yielded χ² values that were 1.07 to 1.84 times larger than those obtained by conventional sampling. The correct χ² value consequently was probably between 2.42 and 4.16, corresponding to a true probability of 0.13 to 0.041.
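The correction above is a simple division of the reported statistic by the range of inflation factors seen in the simulation. The sketch below reproduces that arithmetic and converts χ² values to probabilities, assuming 1 degree of freedom (an assumption, consistent with a reported χ² of 4.45 giving P = 0.035). The variable and function names are illustrative, and computed probabilities may differ slightly from the rounded figures quoted in the text.

```python
import math

def p_value_chi2_1df(x):
    """Upper-tail probability of a chi-square variate with 1 df (assumed)."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

reported_chi2 = 4.45                        # statistic implied by P = 0.035
inflation_low, inflation_high = 1.07, 1.84  # range seen in the simulation

# Dividing by the largest inflation gives the smallest plausible statistic.
corrected_low = reported_chi2 / inflation_high
corrected_high = reported_chi2 / inflation_low

print(round(corrected_low, 2), round(corrected_high, 2))
print(p_value_chi2_1df(corrected_low), p_value_chi2_1df(corrected_high))
```

Under this reading, only the most favorable end of the range keeps the association below the conventional 0.05 threshold.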
In general, investigators who have used unorthodox sampling have not mentioned why these strategies were adopted,1 2 4 5 6 7 except once when they thought that variables associated with prognosis would be easier to spot and multiple covariates easier to compare.3 One might presume that they wanted to conduct a case–control type of study, which would be proper if the data were analyzed by logistic regression,16 a method that has been appropriately used to model survival in uveal melanoma.17 Another reason may be that a new assay was being tested on previously collected specimens and there were not enough resources to run the assay on every one.
Even though logistic regression is a proper way to compare survival by covariates between patients who had or had not died of melanoma by a given time point, it answers a different question than regression based on survival data.18 Logistic regression returns the log odds ratio of dying at a specified time point. Cox regression yields the relative hazard of death, which is assumed to be constant over time, unless time-dependent covariates are included in the model.15 18 Cox regression makes more efficient use of time-to-event data than does logistic regression, which disregards both data of patients who would be censored before the time point analyzed and the subsequent survival experience of patients who were alive at that point. Logistic regression is also indifferent to the exact time when death occurred. Survival analysis remains the most valid and the most efficient method to analyze time-to-event data.
A properly constructed Kaplan-Meier survival curve provides an unbiased estimate of the probability of survival, even in the extreme case of a single patient.19 Proper construction depends on three factors: correct recording of the time of entry, recording of the time of death or censoring, and the assumption that the chance of being censored is unrelated to the risk of dying.18 19 The survival curve is calculated on the basis of patients who are at risk of dying on each successive day.18 19 Censoring occurs when the time to death is unknown because of termination of the study, loss to follow-up, or withdrawal for other reasons.18 Censored observation should be incomplete only because of random factors, so that, conditionally on the values of all explanatory variables, the prognosis for any patient who has survived a certain time should not be affected if he or she is censored.15 20 In particular, censoring must be unrelated to future lifetime.21 Whether a patient is included in a survival study in the first place must also be determined before knowing the outcome.14
These principles are violated in the late-censoring and fifty-fifty strategies. In both strategies, enrollment and censoring depend on future lifetime. Moreover, both make censored patients initially "immortal," because they can die only during the late period. Only the patients who die of melanoma are truly at risk during the early period. The censored observations in these strategies bear resemblance to left-truncated data or delayed entry.15 21 These factors also distort the Cox regression analysis.
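The effect of making censored patients "immortal" during the early period can be demonstrated with a small simulation. This is a sketch on synthetic data, not a reanalysis of the study: the exponential hazards, the cohort size, and the function names are assumptions chosen for illustration. Removing patients censored before 10 years leaves every death in place but shrinks the risk set at each early death, so the Kaplan-Meier estimate of mortality can only rise.

```python
import random

def km_survival_at(records, t_star):
    """Kaplan-Meier survival at t_star; records are (time, died_of_melanoma)."""
    data = sorted(records)
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t_star:
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return surv

def simulate(n=5000, seed=7):
    """Synthetic cohort with independent censoring (illustrative rates only)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        t_death = rng.expovariate(0.05)  # assumed melanoma death hazard per year
        t_cens = rng.expovariate(0.04)   # assumed censoring hazard per year
        records.append((min(t_death, t_cens), t_death <= t_cens))
    return records

cohort = simulate()
# Late-censoring design: drop everyone censored before 10 years.
late_censoring = [r for r in cohort if r[1] or r[0] >= 10.0]

mortality_full = 1.0 - km_survival_at(cohort, 10.0)
mortality_biased = 1.0 - km_survival_at(late_censoring, 10.0)
print(mortality_full, mortality_biased)
```

With the assumed hazard of 0.05 per year, the true 10-year mortality is 1 − exp(−0.5), about 39%; the estimate from the full cohort should lie close to it, and the estimate from the reduced cohort should lie above it.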
We hope that our simulation will aid those interested in survival statistics in general and mortality in uveal melanoma in particular and that it will help reviewers to recognize the unconventional sampling strategies described so that they can guide authors to use unbiased designs. Finally, we note that our treatise has not addressed the problem of competing risks in the analysis of melanoma-specific survival.22 23
| Footnotes |
|---|
Submitted for publication December 27, 2002; revised February 9, 2003; accepted February 18, 2003.
Disclosure: T. Kivelä, None; P. Grambsch, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Tero Kivelä, Ocular Oncology and Vitreoretinal Service, Department of Ophthalmology, Helsinki University Central Hospital, PL 220, Haartmaninkatu 4 C, FIN-00029 HUS, Helsinki, Finland; tero.kivela{at}helsinki.fi.