Biostat III exercises in R

Laboratory exercise 4

Suggested solutions by

Author: Annika Tillander, 2014-01-30
Edited: Andreas Karlsson, 2015-03-01, 2016-03-08

Localised melanoma: Comparing actuarial and Kaplan-Meier approaches with discrete time data


The aim of this exercise is to examine the effect of heavily grouped data (i.e., data with lots of ties) on estimates of survival made using the Kaplan-Meier method and the actuarial method.
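To see why coarser time recording produces more ties, consider a small made-up example. The survival times below are hypothetical and are not taken from the melanoma data. Truncating times to completed years collapses values that are distinct in months onto the same year:

```r
# Hypothetical survival times in months (illustration only, not the melanoma data)
surv_mm <- c(2.5, 7.5, 11.5, 13.5, 20.5, 23.5, 30.5)

month <- floor(surv_mm)       # completed months
year  <- floor(surv_mm / 12)  # completed years

# Number of observations that share their time value with an earlier observation
ties_month <- sum(duplicated(month))
ties_year  <- sum(duplicated(year))

print(table(month))
print(table(year))
cat("tied observations, months:", ties_month, "\n")
cat("tied observations, years: ", ties_year, "\n")
```

In this toy vector no two patients share a completed month, but several share a completed year, which is the situation the exercise explores on real data.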

For the patients diagnosed with localised skin melanoma, estimate the 10-year cause-specific survival proportion using both the Kaplan-Meier method and the actuarial method. Do this twice: once with survival time recorded in completed years and once with survival time recorded in completed months. That is, you should obtain four separate estimates of the 10-year cause-specific survival proportion to complete the cells of the following table. The purpose of this exercise is to illustrate the small differences between the two methods when there are large numbers of ties.

In order to reproduce the results in the printed solutions you’ll need to restrict to localised stage and estimate cause-specific survival (“Dead: cancer” indicates an event). Look at the code in the previous questions if you are unsure.

You may have to install the required packages the first time you use them. To install a package, run install.packages("package_of_interest") once for each package you need.
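As a convenience, the installation step can be wrapped in a small helper that installs only what is missing. This is a minimal sketch; install_if_missing is a hypothetical name introduced here, not a function from any package:

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
    is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    missing <- pkgs[!is_installed]
    if (length(missing) > 0) {
        install.packages(missing)
    }
    invisible(missing)  # names of the packages that had to be installed
}

# Usage (not run here):
# install_if_missing(c("foreign", "survival", "KMsurv", "dplyr"))
```

The helper returns the names of the packages it had to install, so an empty result means everything was already in place.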

library(foreign)  # for reading the data set from Stata (read.dta)
library(survival) # for Surv and survfit
library(KMsurv)   # for lifetab (actuarial life table)
library(dplyr)    # for data manipulation

(a) Of the two estimates (Kaplan-Meier and actuarial) made using time recorded in years, which do you think is the more appropriate and why? [HINT: Consider how each of the methods handles ties.]

(b) Which of the two estimates (Kaplan-Meier or actuarial) changes most when using survival time in months rather than years? Why?

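# Read the Stata data set, keep localised melanoma only, and define the
# analysis variables
melanoma_raw <- read.dta("http://biostat3.net/download/melanoma.dta")
melanoma <- melanoma_raw %>%
    filter(stage == "Localised") %>%
    mutate(year = floor(surv_yy),   # survival time in completed years
           month = floor(surv_mm),  # survival time in completed months
           death_cancer = ifelse(status == "Dead: cancer", 1, 0))  # 1 = death due to cancer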

Actuarial method, using survival time in completed years.

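# Number of events (cancer deaths) and censorings in each completed year
melanomaByYear <- melanoma %>%
    group_by(year) %>%
    summarise(nevent = sum(death_cancer), nlost = n() - sum(death_cancer))
# lifetab needs the interval boundaries, i.e. one more value than there are intervals
with(melanomaByYear, lifetab(c(year, tail(year, 1) + 1), nrow(melanoma), nlost, nevent))[, 1:7]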
##       nsubs nlost  nrisk nevent      surv         pdf      hazard
## 0-1    5318    81 5277.5     71 1.0000000 0.013453340 0.013544449
## 1-2    5166   400 4966.0    228 0.9865467 0.045294531 0.046990932
## 2-3    4538   381 4347.5    202 0.9412521 0.043733854 0.047568586
## 3-4    3955   344 3783.0    138 0.8975183 0.032740556 0.037156704
## 4-5    3473   312 3317.0    100 0.8647777 0.026071080 0.030609122
## 5-6    3061   298 2912.0     80 0.8387066 0.023041391 0.027855153
## 6-7    2683   267 2549.5     56 0.8156652 0.017916162 0.022209003
## 7-8    2360   293 2213.5     35 0.7977491 0.012614058 0.015938069
## 8-9    2032   275 1894.5     34 0.7851350 0.014090573 0.018109188
## 9-10   1723   243 1601.5     16 0.7710445 0.007703223 0.010040791
## 10-11  1464   197 1365.5     18 0.7633412 0.010062352 0.013269443
## 11-12  1249   189 1154.5     17 0.7532789 0.011092023 0.014834206
## 12-13  1043   161  962.5      2 0.7421869 0.001542206 0.002080083
## 13-14   880   186  787.0      4 0.7406447 0.003764395 0.005095541
## 14-15   690   153  613.5      3 0.7368803 0.003603326 0.004901961
## 15-16   534   110  479.0      2 0.7332769 0.003061699 0.004184100
## 16-17   422   111  366.5      5 0.7302152 0.009962009 0.013736264
## 17-18   306    97  257.5      1 0.7202532 0.002797100 0.003891051
## 18-19   208    81  167.5      1 0.7174561 0.004283320 0.005988024
## 19-20   126    65   93.5      0 0.7131728 0.000000000 0.000000000
## 20-21    61    61   30.5      0 0.7131728          NA          NA
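The survival column of the life table can be reproduced by hand from the counts. The sketch below copies nsubs, nlost and nevent from the first ten rows of the output above and applies the actuarial assumption that censored patients are at risk for half of the interval, so that nrisk = nsubs - nlost/2. The variable names p_interval, surv_end and surv_10yr_actuarial are introduced here for illustration only:

```r
# Counts copied from the first ten rows (intervals 0-1 to 9-10) of the life table above
nsubs  <- c(5318, 5166, 4538, 3955, 3473, 3061, 2683, 2360, 2032, 1723)
nlost  <- c(  81,  400,  381,  344,  312,  298,  267,  293,  275,  243)
nevent <- c(  71,  228,  202,  138,  100,   80,   56,   35,   34,   16)

# Actuarial assumption: censored patients are at risk for half the interval
nrisk <- nsubs - nlost / 2

# Conditional probability of surviving each interval, given alive at its start
p_interval <- 1 - nevent / nrisk

# Cumulative survival to the END of each interval
surv_end <- cumprod(p_interval)

print(data.frame(interval = paste0(0:9, "-", 1:10), nrisk, p_interval, surv_end))

# Survival to the end of interval 9-10, i.e. the 10-year estimate
surv_10yr_actuarial <- surv_end[10]
print(surv_10yr_actuarial)
```

Because lifetab reports survival at the start of each interval, the value computed for the end of interval 9-10 corresponds to the surv entry in the row labelled 10-11.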

Actuarial method, using survival time in completed months. Only the 21 monthly intervals around 10 years (120 months) are shown.

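# Number of events (cancer deaths) and censorings in each completed month
melanomaByMonth <- melanoma %>%
    group_by(month) %>%
    summarise(nevent = sum(death_cancer), nlost = n() - sum(death_cancer))
# Rows 110 to 130 are the intervals 109-110 to 129-130 months
with(melanomaByMonth, lifetab(c(month, tail(month, 1) + 1), nrow(melanoma), nlost, nevent))[110:130, 1:7]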
##         nsubs nlost  nrisk nevent      surv          pdf       hazard
## 109-110  1699    27 1685.5      1 0.7701209 0.0004569095 0.0005934718
## 110-111  1671    16 1663.0      1 0.7696640 0.0004628166 0.0006015038
## 111-112  1654    26 1641.0      1 0.7692012 0.0004687393 0.0006095703
## 112-113  1627    27 1613.5      1 0.7687325 0.0004764379 0.0006199628
## 113-114  1599    19 1589.5      0 0.7682560 0.0000000000 0.0000000000
## 114-115  1580    21 1569.5      0 0.7682560 0.0000000000 0.0000000000
## 115-116  1559    26 1546.0      1 0.7682560 0.0004969315 0.0006470398
## 116-117  1532    20 1522.0      2 0.7677591 0.0010088819 0.0013149244
## 117-118  1510    14 1503.0      1 0.7667502 0.0005101465 0.0006655574
## 118-119  1495    14 1488.0      4 0.7662401 0.0020597852 0.0026917900
## 119-120  1477    12 1471.0      1 0.7641803 0.0005194971 0.0006800408
## 120-121  1464    11 1458.5      1 0.7636608 0.0005235933 0.0006858711
## 121-122  1452     9 1447.5      4 0.7631372 0.0021088420 0.0027672086
## 122-123  1439    13 1432.5      2 0.7610284 0.0010625178 0.0013971359
## 123-124  1424    15 1416.5      4 0.7599658 0.0021460384 0.0028278544
## 124-125  1405    25 1392.5      0 0.7578198 0.0000000000 0.0000000000
## 125-126  1380    15 1372.5      0 0.7578198 0.0000000000 0.0000000000
## 126-127  1365    16 1357.0      2 0.7578198 0.0011169046 0.0014749263
## 127-128  1347    25 1334.5      2 0.7567029 0.0011340620 0.0014998125
## 128-129  1320    15 1312.5      0 0.7555688 0.0000000000 0.0000000000
## 129-130  1305    16 1297.0      1 0.7555688 0.0005825511 0.0007713074

Kaplan-Meier estimates, using survival time in completed years.

mfit_years <- survfit(Surv(year, death_cancer) ~ 1, data = melanoma)
summary(mfit_years)
## Call: survfit(formula = Surv(year, death_cancer) ~ 1, data = melanoma)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     0   5318      71    0.987 0.00157        0.984        0.990
##     1   5166     228    0.943 0.00320        0.937        0.949
##     2   4538     202    0.901 0.00420        0.893        0.909
##     3   3955     138    0.870 0.00483        0.860        0.879
##     4   3473     100    0.845 0.00530        0.834        0.855
##     5   3061      80    0.823 0.00571        0.811        0.834
##     6   2683      56    0.805 0.00603        0.794        0.817
##     7   2360      35    0.793 0.00627        0.781        0.806
##     8   2032      34    0.780 0.00657        0.767        0.793
##     9   1723      16    0.773 0.00675        0.760        0.786
##    10   1464      18    0.763 0.00703        0.750        0.777
##    11   1249      17    0.753 0.00737        0.739        0.768
##    12   1043       2    0.752 0.00743        0.737        0.766
##    13    880       4    0.748 0.00759        0.733        0.763
##    14    690       3    0.745 0.00779        0.730        0.760
##    15    534       2    0.742 0.00800        0.727        0.758
##    16    422       5    0.733 0.00882        0.716        0.751
##    17    306       1    0.731 0.00911        0.713        0.749
##    18    208       1    0.727 0.00972        0.709        0.747

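Kaplan-Meier estimates, using survival time in completed months. Only the rows around 10 years (120 months) are shown; summary() lists only the times at which events occurred, so months without events do not appear.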

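mfit_months <- survfit(Surv(month, death_cancer) ~ 1, data = melanoma)
# Select the summary components by name rather than by position in the list
with(summary(mfit_months), data.frame(time, n.risk, n.event, surv, std.err, lower, upper))[110:130, ]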
##     time n.risk n.event      surv     std.err     lower     upper
## 110  110   1671       1 0.7699985 0.006847733 0.7566935 0.7835375
## 111  111   1654       1 0.7695330 0.006859399 0.7562056 0.7830953
## 112  112   1627       1 0.7690600 0.006871471 0.7557094 0.7826465
## 113  115   1559       1 0.7685667 0.006884747 0.7551906 0.7821797
## 114  116   1532       2 0.7675634 0.006912219 0.7541345 0.7812313
## 115  117   1510       1 0.7670551 0.006926307 0.7535992 0.7807512
## 116  118   1495       4 0.7650027 0.006983376 0.7514373 0.7788131
## 117  119   1477       1 0.7644848 0.006997829 0.7508916 0.7783241
## 118  120   1464       1 0.7639626 0.007012505 0.7503412 0.7778312
## 119  121   1452       4 0.7618580 0.007071699 0.7481231 0.7758451
## 120  122   1439       2 0.7607992 0.007101397 0.7470072 0.7748457
## 121  123   1424       4 0.7586621 0.007161389 0.7447551 0.7728288
## 122  126   1365       2 0.7575505 0.007193902 0.7435811 0.7717823
## 123  127   1347       2 0.7564257 0.007227054 0.7423927 0.7707239
## 124  129   1305       1 0.7558461 0.007244723 0.7417792 0.7701797
## 125  130   1288       2 0.7546724 0.007280853 0.7405362 0.7690784
## 126  132   1249       1 0.7540682 0.007300052 0.7398952 0.7685126
## 127  133   1231       1 0.7534556 0.007319778 0.7392448 0.7679395
## 128  134   1214       1 0.7528350 0.007340013 0.7385854 0.7673594
## 129  135   1193       2 0.7515729 0.007381761 0.7372432 0.7661810
## 130  137   1165       2 0.7502826 0.007425255 0.7358696 0.7649779
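The four estimates can be collected side by side to compare how much each method changes between years and months. The values below are read off the printed output above rather than recomputed from the data. They rest on one assumption about how to read the Kaplan-Meier output: with time recorded in completed units, the 10-year estimate is taken as the value after the events at time 9 (years) or time 119 (months). The object names estimates and change are introduced here for illustration:

```r
# Estimates read off the printed output above (not recomputed from the data).
# Assumption: with time in completed units, the 10-year Kaplan-Meier estimate
# is the value after the events at time 9 (years) or 119 (months).
estimates <- matrix(
    c(0.7633412, 0.7636608,   # actuarial: start of interval 10-11 / 120-121
      0.773,     0.7644848),  # Kaplan-Meier: time 9 (years) / time 119 (months)
    nrow = 2, byrow = TRUE,
    dimnames = list(method = c("Actuarial", "Kaplan-Meier"),
                    time_unit = c("Years", "Months"))
)
print(round(estimates, 4))

# Absolute change in each method's estimate when moving from years to months
change <- abs(estimates[, "Months"] - estimates[, "Years"])
print(round(change, 4))
```

Under this reading, the actuarial estimate changes very little between the two time scales, while the Kaplan-Meier estimate based on years is noticeably higher than the one based on months.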