Biostat III exercises in R

Laboratory exercise 12

Suggested solutions by

Author: Andreas Karlsson, 2015-03-06

Cox model for cause-specific mortality for melanoma (all stages)

Use Cox regression to model the cause-specific survival of patients with skin melanoma (including all stages).


You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest") for each package you require.

require(foreign)  # for reading data set from Stata
require(survival) # for Surv and survfit
require(dplyr)    # for data manipulation

Load the melanoma data and explore it.

melanoma_raw<- read.dta("http://biostat3.net/download/melanoma.dta")
melanoma <- melanoma_raw %>%
    mutate(death_cancer = ifelse( status == "Dead: cancer", 1, 0))

(a) First fit the model with sex as the only explanatory variable. Does there appear to be a difference in survival between males and females?

# Cox regression with time-since-entry as the timescale
# Note that R uses the Efron method for approximating the likelihood in the presence
# whereas Stata (and most other software) use the Breslow method
cox1 <- coxph(Surv(surv_mm, death_cancer) ~ sex, method=c("breslow"), data=melanoma)
summary(cox1)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex, data = melanoma, 
##     method = c("breslow"))
## 
##   n= 7775, number of events= 1913 
## 
##               coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale -0.46632   0.62731  0.04612 -10.11   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##           exp(coef) exp(-coef) lower .95 upper .95
## sexFemale    0.6273      1.594    0.5731    0.6867
## 
## Concordance= 0.562  (se = 0.006 )
## Rsquare= 0.013   (max possible= 0.985 )
## Likelihood ratio test= 103.2  on 1 df,   p=0
## Wald test            = 102.2  on 1 df,   p=0
## Score (logrank) test = 104.1  on 1 df,   p=0

(b) Is the effect of sex confounded by other factors (e.g. age, stage, subsite, period)? After controlling for potential confounders, does there still appear to a difference in survival between males and females?

cox2 <- coxph(Surv(surv_mm, death_cancer) ~ sex + agegrp + stage + subsite + year8594, method=c("breslow"), data=melanoma)
summary(cox2)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + agegrp + 
##     stage + subsite + year8594, data = melanoma, method = c("breslow"))
## 
##   n= 7775, number of events= 1913 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.28893   0.74907  0.04865 -5.938 2.88e-09 ***
## agegrp45-59              0.23787   1.26854  0.06745  3.527 0.000421 ***
## agegrp60-74              0.54856   1.73077  0.06510  8.426  < 2e-16 ***
## agegrp75+                1.02455   2.78585  0.07640 13.411  < 2e-16 ***
## stageLocalised           0.03761   1.03833  0.06869  0.548 0.584016    
## stageRegional            1.56266   4.77152  0.09145 17.088  < 2e-16 ***
## stageDistant             2.60170  13.48664  0.08141 31.959  < 2e-16 ***
## subsiteTrunk             0.33157   1.39315  0.07064  4.694 2.69e-06 ***
## subsiteLimbs             0.03152   1.03202  0.07435  0.424 0.671604    
## subsiteMultiple and NOS  0.26645   1.30532  0.10232  2.604 0.009214 ** 
## year8594Diagnosed 85-94 -0.23981   0.78677  0.04790 -5.006 5.55e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.7491    1.33499    0.6809    0.8240
## agegrp45-59                1.2685    0.78831    1.1115    1.4478
## agegrp60-74                1.7308    0.57778    1.5234    1.9663
## agegrp75+                  2.7858    0.35896    2.3984    3.2358
## stageLocalised             1.0383    0.96309    0.9075    1.1880
## stageRegional              4.7715    0.20958    3.9885    5.7082
## stageDistant              13.4866    0.07415   11.4977   15.8197
## subsiteTrunk               1.3932    0.71780    1.2130    1.6000
## subsiteLimbs               1.0320    0.96897    0.8921    1.1939
## subsiteMultiple and NOS    1.3053    0.76610    1.0681    1.5952
## year8594Diagnosed 85-94    0.7868    1.27101    0.7163    0.8642
## 
## Concordance= 0.755  (se = 0.007 )
## Rsquare= 0.21   (max possible= 0.985 )
## Likelihood ratio test= 1836  on 11 df,   p=0
## Wald test            = 2542  on 11 df,   p=0
## Score (logrank) test = 4023  on 11 df,   p=0

(c) Consider the hypothesis that there exists a class of melanomas where female sex hormones play a large role in the etiology. These hormone related cancers are diagnosed primarily in women and are, on average, less aggressive (i.e., prognosis is good). If such a hypothesis were true we might expect the effect of sex to be modified by age at diagnosis (e.g., pre versus post menopausal). Test whether this is the case.

cox3 <- coxph(Surv(surv_mm, death_cancer) ~ agegrp + agegrp * sex, method=c("breslow"), data=melanoma)
summary(cox3)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ agegrp + agegrp * 
##     sex, data = melanoma, method = c("breslow"))
## 
##   n= 7775, number of events= 1913 
## 
##                           coef exp(coef) se(coef)      z Pr(>|z|)    
## agegrp45-59            0.17990   1.19710  0.08501  2.116  0.03433 *  
## agegrp60-74            0.40366   1.49730  0.08462  4.770 1.84e-06 ***
## agegrp75+              0.84250   2.32216  0.10341  8.147 3.33e-16 ***
## sexFemale             -0.78129   0.45782  0.10444 -7.481 7.41e-14 ***
## agegrp45-59:sexFemale  0.18821   1.20709  0.13870  1.357  0.17479    
## agegrp60-74:sexFemale  0.44343   1.55804  0.13112  3.382  0.00072 ***
## agegrp75+:sexFemale    0.38839   1.47460  0.14855  2.614  0.00894 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                       exp(coef) exp(-coef) lower .95 upper .95
## agegrp45-59              1.1971     0.8354    1.0134    1.4141
## agegrp60-74              1.4973     0.6679    1.2685    1.7674
## agegrp75+                2.3222     0.4306    1.8961    2.8439
## sexFemale                0.4578     2.1843    0.3731    0.5618
## agegrp45-59:sexFemale    1.2071     0.8284    0.9198    1.5842
## agegrp60-74:sexFemale    1.5580     0.6418    1.2049    2.0146
## agegrp75+:sexFemale      1.4746     0.6782    1.1021    1.9730
## 
## Concordance= 0.622  (se = 0.007 )
## Rsquare= 0.042   (max possible= 0.985 )
## Likelihood ratio test= 331.1  on 7 df,   p=0
## Wald test            = 306.2  on 7 df,   p=0
## Score (logrank) test = 332.1  on 7 df,   p=0

(d) Decide on a ‘most appropriate’ model for these data. Be sure to evaluate the proportional hazards assumption.

cox4 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite + stage, method=c("breslow"), data=melanoma)
summary(cox4)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 + 
##     agegrp + subsite + stage, data = melanoma, method = c("breslow"))
## 
##   n= 7775, number of events= 1913 
## 
##                             coef exp(coef) se(coef)      z Pr(>|z|)    
## sexFemale               -0.28893   0.74907  0.04865 -5.938 2.88e-09 ***
## year8594Diagnosed 85-94 -0.23981   0.78677  0.04790 -5.006 5.55e-07 ***
## agegrp45-59              0.23787   1.26854  0.06745  3.527 0.000421 ***
## agegrp60-74              0.54856   1.73077  0.06510  8.426  < 2e-16 ***
## agegrp75+                1.02455   2.78585  0.07640 13.411  < 2e-16 ***
## subsiteTrunk             0.33157   1.39315  0.07064  4.694 2.69e-06 ***
## subsiteLimbs             0.03152   1.03202  0.07435  0.424 0.671604    
## subsiteMultiple and NOS  0.26645   1.30532  0.10232  2.604 0.009214 ** 
## stageLocalised           0.03761   1.03833  0.06869  0.548 0.584016    
## stageRegional            1.56266   4.77152  0.09145 17.088  < 2e-16 ***
## stageDistant             2.60170  13.48664  0.08141 31.959  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.7491    1.33499    0.6809    0.8240
## year8594Diagnosed 85-94    0.7868    1.27101    0.7163    0.8642
## agegrp45-59                1.2685    0.78831    1.1115    1.4478
## agegrp60-74                1.7308    0.57778    1.5234    1.9663
## agegrp75+                  2.7858    0.35896    2.3984    3.2358
## subsiteTrunk               1.3932    0.71780    1.2130    1.6000
## subsiteLimbs               1.0320    0.96897    0.8921    1.1939
## subsiteMultiple and NOS    1.3053    0.76610    1.0681    1.5952
## stageLocalised             1.0383    0.96309    0.9075    1.1880
## stageRegional              4.7715    0.20958    3.9885    5.7082
## stageDistant              13.4866    0.07415   11.4977   15.8197
## 
## Concordance= 0.755  (se = 0.007 )
## Rsquare= 0.21   (max possible= 0.985 )
## Likelihood ratio test= 1836  on 11 df,   p=0
## Wald test            = 2542  on 11 df,   p=0
## Score (logrank) test = 4023  on 11 df,   p=0
## Test proportional hazards assumption
cox.zph(cox4, transform="identity") #Stata appears to be using 'identity'
##                              rho   chisq        p
## sexFemale                0.03157  1.9330 1.64e-01
## year8594Diagnosed 85-94 -0.00805  0.1257 7.23e-01
## agegrp45-59             -0.00847  0.1387 7.10e-01
## agegrp60-74             -0.00901  0.1571 6.92e-01
## agegrp75+               -0.02301  1.0399 3.08e-01
## subsiteTrunk             0.01695  0.5764 4.48e-01
## subsiteLimbs             0.00398  0.0317 8.59e-01
## subsiteMultiple and NOS -0.00694  0.0901 7.64e-01
## stageLocalised           0.08211 12.8541 3.37e-04
## stageRegional           -0.01781  0.6033 4.37e-01
## stageDistant            -0.06603  7.9484 4.81e-03
## GLOBAL                        NA 82.2114 5.50e-13
## Stratified Cox model; separate baseline hazard functions are fit for each strata.
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite + strata(stage), method=c("breslow"), data=melanoma)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 + 
##     agegrp + subsite + strata(stage), data = melanoma, method = c("breslow"))
## 
##   n= 7775, number of events= 1913 
## 
##                              coef exp(coef)  se(coef)      z Pr(>|z|)    
## sexFemale               -0.299474  0.741208  0.048745 -6.144 8.06e-10 ***
## year8594Diagnosed 85-94 -0.238634  0.787703  0.047835 -4.989 6.08e-07 ***
## agegrp45-59              0.233805  1.263398  0.067460  3.466 0.000529 ***
## agegrp60-74              0.550794  1.734631  0.065125  8.457  < 2e-16 ***
## agegrp75+                1.013941  2.756441  0.076424 13.267  < 2e-16 ***
## subsiteTrunk             0.290084  1.336540  0.070570  4.111 3.95e-05 ***
## subsiteLimbs            -0.004979  0.995034  0.074198 -0.067 0.946503    
## subsiteMultiple and NOS  0.223498  1.250443  0.102597  2.178 0.029376 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.7412     1.3491    0.6737    0.8155
## year8594Diagnosed 85-94    0.7877     1.2695    0.7172    0.8651
## agegrp45-59                1.2634     0.7915    1.1069    1.4420
## agegrp60-74                1.7346     0.5765    1.5268    1.9708
## agegrp75+                  2.7564     0.3628    2.3730    3.2018
## subsiteTrunk               1.3365     0.7482    1.1639    1.5348
## subsiteLimbs               0.9950     1.0050    0.8604    1.1508
## subsiteMultiple and NOS    1.2504     0.7997    1.0227    1.5290
## 
## Concordance= 0.645  (se = 0.01 )
## Rsquare= 0.035   (max possible= 0.969 )
## Likelihood ratio test= 274.4  on 8 df,   p=0
## Wald test            = 277.8  on 8 df,   p=0
## Score (logrank) test = 284.7  on 8 df,   p=0
## Test proportional hazards assumption
cox.zph(cox5, transform="identity") #Stata appears to be using 'identity'
##                              rho  chisq     p
## sexFemale                0.03384 2.2184 0.136
## year8594Diagnosed 85-94 -0.00468 0.0424 0.837
## agegrp45-59             -0.01010 0.1970 0.657
## agegrp60-74             -0.01413 0.3862 0.534
## agegrp75+               -0.02716 1.4492 0.229
## subsiteTrunk             0.01953 0.7614 0.383
## subsiteLimbs             0.00551 0.0604 0.806
## subsiteMultiple and NOS -0.00673 0.0842 0.772
## GLOBAL                        NA 5.1703 0.739
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex * agegrp + year8594 + agegrp + subsite + strata(stage), method=c("breslow"), data=melanoma)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex * agegrp + 
##     year8594 + agegrp + subsite + strata(stage), data = melanoma, 
##     method = c("breslow"))
## 
##   n= 7775, number of events= 1913 
## 
##                              coef exp(coef)  se(coef)      z Pr(>|z|)    
## sexFemale               -0.491816  0.611515  0.105919 -4.643 3.43e-06 ***
## agegrp45-59              0.158709  1.171996  0.085332  1.860   0.0629 .  
## agegrp60-74              0.437778  1.549262  0.084960  5.153 2.57e-07 ***
## agegrp75+                0.895092  2.447562  0.104899  8.533  < 2e-16 ***
## year8594Diagnosed 85-94 -0.237908  0.788275  0.047824 -4.975 6.54e-07 ***
## subsiteTrunk             0.296690  1.345398  0.070678  4.198 2.70e-05 ***
## subsiteLimbs             0.002339  1.002342  0.074260  0.032   0.9749    
## subsiteMultiple and NOS  0.231783  1.260847  0.102802  2.255   0.0242 *  
## sexFemale:agegrp45-59    0.181340  1.198823  0.138902  1.306   0.1917    
## sexFemale:agegrp60-74    0.269202  1.308920  0.131729  2.044   0.0410 *  
## sexFemale:agegrp75+      0.266506  1.305395  0.149243  1.786   0.0741 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##                         exp(coef) exp(-coef) lower .95 upper .95
## sexFemale                  0.6115     1.6353    0.4969    0.7526
## agegrp45-59                1.1720     0.8532    0.9915    1.3854
## agegrp60-74                1.5493     0.6455    1.3116    1.8300
## agegrp75+                  2.4476     0.4086    1.9927    3.0062
## year8594Diagnosed 85-94    0.7883     1.2686    0.7177    0.8657
## subsiteTrunk               1.3454     0.7433    1.1714    1.5453
## subsiteLimbs               1.0023     0.9977    0.8666    1.1594
## subsiteMultiple and NOS    1.2608     0.7931    1.0308    1.5423
## sexFemale:agegrp45-59      1.1988     0.8342    0.9131    1.5739
## sexFemale:agegrp60-74      1.3089     0.7640    1.0111    1.6945
## sexFemale:agegrp75+        1.3054     0.7661    0.9743    1.7490
## 
## Concordance= 0.645  (se = 0.01 )
## Rsquare= 0.035   (max possible= 0.969 )
## Likelihood ratio test= 279.2  on 11 df,   p=0
## Wald test            = 273.9  on 11 df,   p=0
## Score (logrank) test = 285.7  on 11 df,   p=0