Author: Annika Tillander, 2014-01-30
Edited: Andreas Karlsson, 2015-03-01, 2016-03-08
The aim of this exercise is to examine the effect of heavily grouped data (i.e., data with lots of ties) on estimates of survival made using the Kaplan-Meier method and the actuarial method.
For the patients diagnosed with localised skin melanoma, estimate the 10-year cause-specific survival proportion. Use both the Kaplan-Meier method and the actuarial method, and do this with survival time recorded both in completed years and in completed months. That is, you should obtain four separate estimates of the 10-year cause-specific survival proportion, one for each combination of method and time unit. The estimates illustrate the small differences that arise between the two methods when there are large numbers of ties.
In order to reproduce the results in the printed solutions you’ll need to restrict to localised stage and estimate cause-specific survival (“Dead: cancer” indicates an event). Look at the code in the previous questions if you are unsure.
You may have to install the required packages the first time you use them. To install a package, run install.packages("package_of_interest") for each package you require.
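The install step can be made conditional so that only missing packages are installed. This is a minimal sketch, assuming the four packages loaded in this exercise are the only ones needed. The names `needed`, `installed` and `missing_pkgs` are illustrative, and the `interactive()` guard is a choice made here so that a non-interactive run never triggers an installation:

```r
# Packages loaded in this exercise
needed <- c("foreign", "survival", "KMsurv", "dplyr")

# requireNamespace() returns FALSE, without an error, if a package is not installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

# Install only what is missing, and only in an interactive session
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```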
library(foreign)  # for reading the data set from Stata
library(survival) # for Surv and survfit
library(KMsurv)   # for lifetab (actuarial life tables)
library(dplyr)    # for data manipulation
(a) Of the two estimates (Kaplan-Meier and actuarial) made using time recorded in years, which do you think is the more appropriate, and why? [HINT: Consider how each of the methods handles ties.]
(b) Which of the two estimates (Kaplan-Meier or actuarial) changes more when using survival time in months rather than years? Why?
melanoma_raw <- read.dta("http://biostat3.net/download/melanoma.dta")
melanoma <- melanoma_raw %>%
  filter(stage == "Localised") %>%
  mutate(year = floor(surv_yy),
         month = floor(surv_mm),
         death_cancer = ifelse(status == "Dead: cancer", 1, 0))
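To see why recording time in completed years produces many ties, consider a toy example. The five survival times and statuses below are hypothetical and are not taken from the melanoma data. Only the column names `surv_mm`, `surv_yy` and `status` follow the data set, and deriving `surv_yy` as `surv_mm / 12` is an assumption made for this sketch. Base R is used so that the example runs without dplyr:

```r
# Hypothetical survival times in months for five patients (illustration only)
toy <- data.frame(
  surv_mm = c(12.5, 14.5, 20.5, 23.5, 30.5),
  status  = c("Dead: cancer", "Alive", "Dead: cancer", "Dead: other", "Dead: cancer")
)
toy$surv_yy <- toy$surv_mm / 12  # assumption: years derived from months

# Same transformations as in the data preparation step
toy$year  <- floor(toy$surv_yy)
toy$month <- floor(toy$surv_mm)
toy$death_cancer <- ifelse(toy$status == "Dead: cancer", 1, 0)

# Number of patients sharing each recorded time
table(toy$year)   # four of the five patients share year 1
table(toy$month)  # every recorded month is distinct
```

With time in completed years, events and censorings that were weeks or months apart become indistinguishable, and the two methods differ in how they order them.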
Actuarial method, using survival time in completed years.
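# Count events and censorings (nlost) within each completed year
melanomaByYear <- melanoma %>%
  group_by(year) %>%
  summarise(nevent = sum(death_cancer),
            nlost = n() - sum(death_cancer))
# lifetab() needs the interval boundaries, including the end of the last interval
with(melanomaByYear,
     lifetab(tis = c(year, tail(year, 1) + 1), ninit = nrow(melanoma), nlost = nlost, nevent = nevent))[, 1:7]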
## nsubs nlost nrisk nevent surv pdf hazard
## 0-1 5318 81 5277.5 71 1.0000000 0.013453340 0.013544449
## 1-2 5166 400 4966.0 228 0.9865467 0.045294531 0.046990932
## 2-3 4538 381 4347.5 202 0.9412521 0.043733854 0.047568586
## 3-4 3955 344 3783.0 138 0.8975183 0.032740556 0.037156704
## 4-5 3473 312 3317.0 100 0.8647777 0.026071080 0.030609122
## 5-6 3061 298 2912.0 80 0.8387066 0.023041391 0.027855153
## 6-7 2683 267 2549.5 56 0.8156652 0.017916162 0.022209003
## 7-8 2360 293 2213.5 35 0.7977491 0.012614058 0.015938069
## 8-9 2032 275 1894.5 34 0.7851350 0.014090573 0.018109188
## 9-10 1723 243 1601.5 16 0.7710445 0.007703223 0.010040791
## 10-11 1464 197 1365.5 18 0.7633412 0.010062352 0.013269443
## 11-12 1249 189 1154.5 17 0.7532789 0.011092023 0.014834206
## 12-13 1043 161 962.5 2 0.7421869 0.001542206 0.002080083
## 13-14 880 186 787.0 4 0.7406447 0.003764395 0.005095541
## 14-15 690 153 613.5 3 0.7368803 0.003603326 0.004901961
## 15-16 534 110 479.0 2 0.7332769 0.003061699 0.004184100
## 16-17 422 111 366.5 5 0.7302152 0.009962009 0.013736264
## 17-18 306 97 257.5 1 0.7202532 0.002797100 0.003891051
## 18-19 208 81 167.5 1 0.7174561 0.004283320 0.005988024
## 19-20 126 65 93.5 0 0.7131728 0.000000000 0.000000000
## 20-21 61 61 30.5 0 0.7131728 NA NA
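The nrisk and surv columns above can be reproduced by hand, which makes the actuarial treatment of censoring explicit: subjects lost during an interval are assumed to be at risk for half of it, so the effective number at risk is nsubs - nlost/2. This is a minimal sketch in base R, with the counts for the first ten intervals copied from the output above. The variable names are illustrative:

```r
# Counts for intervals 0-1, ..., 9-10, copied from the life table above
nsubs  <- c(5318, 5166, 4538, 3955, 3473, 3061, 2683, 2360, 2032, 1723)
nlost  <- c(  81,  400,  381,  344,  312,  298,  267,  293,  275,  243)
nevent <- c(  71,  228,  202,  138,  100,   80,   56,   35,   34,   16)

# Effective number at risk: censored subjects count for half the interval
nrisk <- nsubs - nlost / 2

# Conditional probability of surviving each interval, then the cumulative product
p_interval <- 1 - nevent / nrisk
surv_end   <- cumprod(p_interval)

# Survival at the end of interval 9-10, i.e. at 10 years
round(surv_end[10], 4)
```

The result should agree with the surv entry printed for the interval 10-11 (0.7633412), since the life table reports survival at the start of each interval.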
Actuarial method, using survival time in completed months. Only the intervals around the 10-year mark (109-110 to 129-130 months) are shown.
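# Count events and censorings (nlost) within each completed month
melanomaByMonth <- melanoma %>%
  group_by(month) %>%
  summarise(nevent = sum(death_cancer),
            nlost = n() - sum(death_cancer))
# Rows 110 to 130 correspond to the intervals 109-110 to 129-130
with(melanomaByMonth,
     lifetab(tis = c(month, tail(month, 1) + 1), ninit = nrow(melanoma), nlost = nlost, nevent = nevent))[110:130, 1:7]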
## nsubs nlost nrisk nevent surv pdf hazard
## 109-110 1699 27 1685.5 1 0.7701209 0.0004569095 0.0005934718
## 110-111 1671 16 1663.0 1 0.7696640 0.0004628166 0.0006015038
## 111-112 1654 26 1641.0 1 0.7692012 0.0004687393 0.0006095703
## 112-113 1627 27 1613.5 1 0.7687325 0.0004764379 0.0006199628
## 113-114 1599 19 1589.5 0 0.7682560 0.0000000000 0.0000000000
## 114-115 1580 21 1569.5 0 0.7682560 0.0000000000 0.0000000000
## 115-116 1559 26 1546.0 1 0.7682560 0.0004969315 0.0006470398
## 116-117 1532 20 1522.0 2 0.7677591 0.0010088819 0.0013149244
## 117-118 1510 14 1503.0 1 0.7667502 0.0005101465 0.0006655574
## 118-119 1495 14 1488.0 4 0.7662401 0.0020597852 0.0026917900
## 119-120 1477 12 1471.0 1 0.7641803 0.0005194971 0.0006800408
## 120-121 1464 11 1458.5 1 0.7636608 0.0005235933 0.0006858711
## 121-122 1452 9 1447.5 4 0.7631372 0.0021088420 0.0027672086
## 122-123 1439 13 1432.5 2 0.7610284 0.0010625178 0.0013971359
## 123-124 1424 15 1416.5 4 0.7599658 0.0021460384 0.0028278544
## 124-125 1405 25 1392.5 0 0.7578198 0.0000000000 0.0000000000
## 125-126 1380 15 1372.5 0 0.7578198 0.0000000000 0.0000000000
## 126-127 1365 16 1357.0 2 0.7578198 0.0011169046 0.0014749263
## 127-128 1347 25 1334.5 2 0.7567029 0.0011340620 0.0014998125
## 128-129 1320 15 1312.5 0 0.7555688 0.0000000000 0.0000000000
## 129-130 1305 16 1297.0 1 0.7555688 0.0005825511 0.0007713074
Kaplan-Meier estimates, using survival time in completed years.
mfit_years <- survfit(Surv(year, death_cancer) ~ 1, data = melanoma)
summary(mfit_years)
## Call: survfit(formula = Surv(year, death_cancer) ~ 1, data = melanoma)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0 5318 71 0.987 0.00157 0.984 0.990
## 1 5166 228 0.943 0.00320 0.937 0.949
## 2 4538 202 0.901 0.00420 0.893 0.909
## 3 3955 138 0.870 0.00483 0.860 0.879
## 4 3473 100 0.845 0.00530 0.834 0.855
## 5 3061 80 0.823 0.00571 0.811 0.834
## 6 2683 56 0.805 0.00603 0.794 0.817
## 7 2360 35 0.793 0.00627 0.781 0.806
## 8 2032 34 0.780 0.00657 0.767 0.793
## 9 1723 16 0.773 0.00675 0.760 0.786
## 10 1464 18 0.763 0.00703 0.750 0.777
## 11 1249 17 0.753 0.00737 0.739 0.768
## 12 1043 2 0.752 0.00743 0.737 0.766
## 13 880 4 0.748 0.00759 0.733 0.763
## 14 690 3 0.745 0.00779 0.730 0.760
## 15 534 2 0.742 0.00800 0.727 0.758
## 16 422 5 0.733 0.00882 0.716 0.751
## 17 306 1 0.731 0.00911 0.713 0.749
## 18 208 1 0.727 0.00972 0.709 0.747
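For comparison, the Kaplan-Meier estimate uses the full risk set at each distinct time: every subject with a tied time, censored or not, is counted as at risk when the events at that time occur. This is a minimal sketch in base R, with n.risk and n.event for times 0 to 9 copied from the output above. The variable names are illustrative:

```r
# Risk sets and events at times 0, ..., 9, copied from the survfit summary above
n_risk  <- c(5318, 5166, 4538, 3955, 3473, 3061, 2683, 2360, 2032, 1723)
n_event <- c(  71,  228,  202,  138,  100,   80,   56,   35,   34,   16)

# Kaplan-Meier: product of (1 - d/n) over the distinct event times
km <- cumprod(1 - n_event / n_risk)
round(km, 3)

# First interval only, under each method's handling of the 81 subjects
# censored in year 0 (count taken from the actuarial life table)
km_first        <- 1 - 71 / 5318             # censored subjects kept in the risk set
actuarial_first <- 1 - 71 / (5318 - 81 / 2)  # half of them removed from the risk set
c(km = km_first, actuarial = actuarial_first)
```

Because time 9 in completed years covers deaths occurring between 9 and 10 years after diagnosis, the value at time 9 is the natural one to read as the 10-year Kaplan-Meier estimate here. The larger denominator explains why Kaplan-Meier sits above the actuarial estimate when ties are common.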
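Kaplan-Meier estimates, using survival time in completed months. Only the event times around the 10-year mark are shown (rows 110 to 130 of the summary, which lists only the months in which at least one event occurred).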
mfit_months <- survfit(Surv(month, death_cancer) ~ 1, data = melanoma)
with(summary(mfit_months), data.frame(time, n.risk, n.event, surv, std.err, lower, upper))[110:130, ]
## time n.risk n.event surv std.err lower upper
## 110 110 1671 1 0.7699985 0.006847733 0.7566935 0.7835375
## 111 111 1654 1 0.7695330 0.006859399 0.7562056 0.7830953
## 112 112 1627 1 0.7690600 0.006871471 0.7557094 0.7826465
## 113 115 1559 1 0.7685667 0.006884747 0.7551906 0.7821797
## 114 116 1532 2 0.7675634 0.006912219 0.7541345 0.7812313
## 115 117 1510 1 0.7670551 0.006926307 0.7535992 0.7807512
## 116 118 1495 4 0.7650027 0.006983376 0.7514373 0.7788131
## 117 119 1477 1 0.7644848 0.006997829 0.7508916 0.7783241
## 118 120 1464 1 0.7639626 0.007012505 0.7503412 0.7778312
## 119 121 1452 4 0.7618580 0.007071699 0.7481231 0.7758451
## 120 122 1439 2 0.7607992 0.007101397 0.7470072 0.7748457
## 121 123 1424 4 0.7586621 0.007161389 0.7447551 0.7728288
## 122 126 1365 2 0.7575505 0.007193902 0.7435811 0.7717823
## 123 127 1347 2 0.7564257 0.007227054 0.7423927 0.7707239
## 124 129 1305 1 0.7558461 0.007244723 0.7417792 0.7701797
## 125 130 1288 2 0.7546724 0.007280853 0.7405362 0.7690784
## 126 132 1249 1 0.7540682 0.007300052 0.7398952 0.7685126
## 127 133 1231 1 0.7534556 0.007319778 0.7392448 0.7679395
## 128 134 1214 1 0.7528350 0.007340013 0.7385854 0.7673594
## 129 135 1193 2 0.7515729 0.007381761 0.7372432 0.7661810
## 130 137 1165 2 0.7502826 0.007425255 0.7358696 0.7649779
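The four estimates can now be collected in one place. The sketch below copies them from the outputs in this document. It assumes that the 10-year estimate is read at the start of interval 10-11 (years) or 120-121 (months) for the actuarial method, and at time 9 (years) or 119 (months) for Kaplan-Meier. The object names `est` and `change` are illustrative:

```r
# 10-year cause-specific survival estimates copied from the outputs above
est <- matrix(
  c(0.7633412, 0.7636608,   # actuarial: years, months
    0.773,     0.7644848),  # Kaplan-Meier: years (printed to 3 decimals), months
  nrow = 2, byrow = TRUE,
  dimnames = list(method = c("Actuarial", "Kaplan-Meier"),
                  time_unit = c("Years", "Months"))
)
est

# Change in each estimate when moving from years to months
change <- est[, "Months"] - est[, "Years"]
round(change, 4)
```

Under these assumptions the actuarial estimate barely moves, while the Kaplan-Meier estimate drops by just under 0.01 and ends up close to the actuarial values.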