Author: Andreas Karlsson, 2015-03-06
Use Cox regression to model the cause-specific survival of patients with skin melanoma (including all stages).
You may have to install the required packages the first time you use them. You can install a package by install.packages("package_of_interest")
for each package you require.
require(foreign) # for reading data set from Stata
require(survival) # for Surv and survfit
require(dplyr) # for data manipulation
Load the melanoma data and explore it.
melanoma_raw<- read.dta("http://biostat3.net/download/melanoma.dta")
melanoma <- melanoma_raw %>%
mutate(death_cancer = ifelse( status == "Dead: cancer", 1, 0))
(a) First fit the model with sex as the only explanatory variable. Does there appear to be a difference in survival between males and females?
# Cox regression with time-since-entry as the timescale
# Note that R uses the Efron method for approximating the likelihood in the presence
# whereas Stata (and most other software) use the Breslow method
cox1 <- coxph(Surv(surv_mm, death_cancer) ~ sex, method=c("breslow"), data=melanoma)
summary(cox1)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex, data = melanoma,
## method = c("breslow"))
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.46632 0.62731 0.04612 -10.11 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.6273 1.594 0.5731 0.6867
##
## Concordance= 0.562 (se = 0.006 )
## Rsquare= 0.013 (max possible= 0.985 )
## Likelihood ratio test= 103.2 on 1 df, p=0
## Wald test = 102.2 on 1 df, p=0
## Score (logrank) test = 104.1 on 1 df, p=0
(b) Is the effect of sex confounded by other factors (e.g. age, stage, subsite, period)? After controlling for potential confounders, does there still appear to a difference in survival between males and females?
cox2 <- coxph(Surv(surv_mm, death_cancer) ~ sex + agegrp + stage + subsite + year8594, method=c("breslow"), data=melanoma)
summary(cox2)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + agegrp +
## stage + subsite + year8594, data = melanoma, method = c("breslow"))
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.28893 0.74907 0.04865 -5.938 2.88e-09 ***
## agegrp45-59 0.23787 1.26854 0.06745 3.527 0.000421 ***
## agegrp60-74 0.54856 1.73077 0.06510 8.426 < 2e-16 ***
## agegrp75+ 1.02455 2.78585 0.07640 13.411 < 2e-16 ***
## stageLocalised 0.03761 1.03833 0.06869 0.548 0.584016
## stageRegional 1.56266 4.77152 0.09145 17.088 < 2e-16 ***
## stageDistant 2.60170 13.48664 0.08141 31.959 < 2e-16 ***
## subsiteTrunk 0.33157 1.39315 0.07064 4.694 2.69e-06 ***
## subsiteLimbs 0.03152 1.03202 0.07435 0.424 0.671604
## subsiteMultiple and NOS 0.26645 1.30532 0.10232 2.604 0.009214 **
## year8594Diagnosed 85-94 -0.23981 0.78677 0.04790 -5.006 5.55e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.7491 1.33499 0.6809 0.8240
## agegrp45-59 1.2685 0.78831 1.1115 1.4478
## agegrp60-74 1.7308 0.57778 1.5234 1.9663
## agegrp75+ 2.7858 0.35896 2.3984 3.2358
## stageLocalised 1.0383 0.96309 0.9075 1.1880
## stageRegional 4.7715 0.20958 3.9885 5.7082
## stageDistant 13.4866 0.07415 11.4977 15.8197
## subsiteTrunk 1.3932 0.71780 1.2130 1.6000
## subsiteLimbs 1.0320 0.96897 0.8921 1.1939
## subsiteMultiple and NOS 1.3053 0.76610 1.0681 1.5952
## year8594Diagnosed 85-94 0.7868 1.27101 0.7163 0.8642
##
## Concordance= 0.755 (se = 0.007 )
## Rsquare= 0.21 (max possible= 0.985 )
## Likelihood ratio test= 1836 on 11 df, p=0
## Wald test = 2542 on 11 df, p=0
## Score (logrank) test = 4023 on 11 df, p=0
(c) Consider the hypothesis that there exists a class of melanomas where female sex hormones play a large role in the etiology. These hormone related cancers are diagnosed primarily in women and are, on average, less aggressive (i.e., prognosis is good). If such a hypothesis were true we might expect the effect of sex to be modified by age at diagnosis (e.g., pre versus post menopausal). Test whether this is the case.
cox3 <- coxph(Surv(surv_mm, death_cancer) ~ agegrp + agegrp * sex, method=c("breslow"), data=melanoma)
summary(cox3)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ agegrp + agegrp *
## sex, data = melanoma, method = c("breslow"))
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## agegrp45-59 0.17990 1.19710 0.08501 2.116 0.03433 *
## agegrp60-74 0.40366 1.49730 0.08462 4.770 1.84e-06 ***
## agegrp75+ 0.84250 2.32216 0.10341 8.147 3.33e-16 ***
## sexFemale -0.78129 0.45782 0.10444 -7.481 7.41e-14 ***
## agegrp45-59:sexFemale 0.18821 1.20709 0.13870 1.357 0.17479
## agegrp60-74:sexFemale 0.44343 1.55804 0.13112 3.382 0.00072 ***
## agegrp75+:sexFemale 0.38839 1.47460 0.14855 2.614 0.00894 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## agegrp45-59 1.1971 0.8354 1.0134 1.4141
## agegrp60-74 1.4973 0.6679 1.2685 1.7674
## agegrp75+ 2.3222 0.4306 1.8961 2.8439
## sexFemale 0.4578 2.1843 0.3731 0.5618
## agegrp45-59:sexFemale 1.2071 0.8284 0.9198 1.5842
## agegrp60-74:sexFemale 1.5580 0.6418 1.2049 2.0146
## agegrp75+:sexFemale 1.4746 0.6782 1.1021 1.9730
##
## Concordance= 0.622 (se = 0.007 )
## Rsquare= 0.042 (max possible= 0.985 )
## Likelihood ratio test= 331.1 on 7 df, p=0
## Wald test = 306.2 on 7 df, p=0
## Score (logrank) test = 332.1 on 7 df, p=0
(d) Decide on a ‘most appropriate’ model for these data. Be sure to evaluate the proportional hazards assumption.
cox4 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite + stage, method=c("breslow"), data=melanoma)
summary(cox4)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 +
## agegrp + subsite + stage, data = melanoma, method = c("breslow"))
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.28893 0.74907 0.04865 -5.938 2.88e-09 ***
## year8594Diagnosed 85-94 -0.23981 0.78677 0.04790 -5.006 5.55e-07 ***
## agegrp45-59 0.23787 1.26854 0.06745 3.527 0.000421 ***
## agegrp60-74 0.54856 1.73077 0.06510 8.426 < 2e-16 ***
## agegrp75+ 1.02455 2.78585 0.07640 13.411 < 2e-16 ***
## subsiteTrunk 0.33157 1.39315 0.07064 4.694 2.69e-06 ***
## subsiteLimbs 0.03152 1.03202 0.07435 0.424 0.671604
## subsiteMultiple and NOS 0.26645 1.30532 0.10232 2.604 0.009214 **
## stageLocalised 0.03761 1.03833 0.06869 0.548 0.584016
## stageRegional 1.56266 4.77152 0.09145 17.088 < 2e-16 ***
## stageDistant 2.60170 13.48664 0.08141 31.959 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.7491 1.33499 0.6809 0.8240
## year8594Diagnosed 85-94 0.7868 1.27101 0.7163 0.8642
## agegrp45-59 1.2685 0.78831 1.1115 1.4478
## agegrp60-74 1.7308 0.57778 1.5234 1.9663
## agegrp75+ 2.7858 0.35896 2.3984 3.2358
## subsiteTrunk 1.3932 0.71780 1.2130 1.6000
## subsiteLimbs 1.0320 0.96897 0.8921 1.1939
## subsiteMultiple and NOS 1.3053 0.76610 1.0681 1.5952
## stageLocalised 1.0383 0.96309 0.9075 1.1880
## stageRegional 4.7715 0.20958 3.9885 5.7082
## stageDistant 13.4866 0.07415 11.4977 15.8197
##
## Concordance= 0.755 (se = 0.007 )
## Rsquare= 0.21 (max possible= 0.985 )
## Likelihood ratio test= 1836 on 11 df, p=0
## Wald test = 2542 on 11 df, p=0
## Score (logrank) test = 4023 on 11 df, p=0
## Test proportional hazards assumption
cox.zph(cox4, transform="identity") #Stata appears to be using 'identity'
## rho chisq p
## sexFemale 0.03157 1.9330 1.64e-01
## year8594Diagnosed 85-94 -0.00805 0.1257 7.23e-01
## agegrp45-59 -0.00847 0.1387 7.10e-01
## agegrp60-74 -0.00901 0.1571 6.92e-01
## agegrp75+ -0.02301 1.0399 3.08e-01
## subsiteTrunk 0.01695 0.5764 4.48e-01
## subsiteLimbs 0.00398 0.0317 8.59e-01
## subsiteMultiple and NOS -0.00694 0.0901 7.64e-01
## stageLocalised 0.08211 12.8541 3.37e-04
## stageRegional -0.01781 0.6033 4.37e-01
## stageDistant -0.06603 7.9484 4.81e-03
## GLOBAL NA 82.2114 5.50e-13
## Stratified Cox model; separate baseline hazard functions are fit for each strata.
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex + year8594 + agegrp + subsite + strata(stage), method=c("breslow"), data=melanoma)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex + year8594 +
## agegrp + subsite + strata(stage), data = melanoma, method = c("breslow"))
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.299474 0.741208 0.048745 -6.144 8.06e-10 ***
## year8594Diagnosed 85-94 -0.238634 0.787703 0.047835 -4.989 6.08e-07 ***
## agegrp45-59 0.233805 1.263398 0.067460 3.466 0.000529 ***
## agegrp60-74 0.550794 1.734631 0.065125 8.457 < 2e-16 ***
## agegrp75+ 1.013941 2.756441 0.076424 13.267 < 2e-16 ***
## subsiteTrunk 0.290084 1.336540 0.070570 4.111 3.95e-05 ***
## subsiteLimbs -0.004979 0.995034 0.074198 -0.067 0.946503
## subsiteMultiple and NOS 0.223498 1.250443 0.102597 2.178 0.029376 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.7412 1.3491 0.6737 0.8155
## year8594Diagnosed 85-94 0.7877 1.2695 0.7172 0.8651
## agegrp45-59 1.2634 0.7915 1.1069 1.4420
## agegrp60-74 1.7346 0.5765 1.5268 1.9708
## agegrp75+ 2.7564 0.3628 2.3730 3.2018
## subsiteTrunk 1.3365 0.7482 1.1639 1.5348
## subsiteLimbs 0.9950 1.0050 0.8604 1.1508
## subsiteMultiple and NOS 1.2504 0.7997 1.0227 1.5290
##
## Concordance= 0.645 (se = 0.01 )
## Rsquare= 0.035 (max possible= 0.969 )
## Likelihood ratio test= 274.4 on 8 df, p=0
## Wald test = 277.8 on 8 df, p=0
## Score (logrank) test = 284.7 on 8 df, p=0
## Test proportional hazards assumption
cox.zph(cox5, transform="identity") #Stata appears to be using 'identity'
## rho chisq p
## sexFemale 0.03384 2.2184 0.136
## year8594Diagnosed 85-94 -0.00468 0.0424 0.837
## agegrp45-59 -0.01010 0.1970 0.657
## agegrp60-74 -0.01413 0.3862 0.534
## agegrp75+ -0.02716 1.4492 0.229
## subsiteTrunk 0.01953 0.7614 0.383
## subsiteLimbs 0.00551 0.0604 0.806
## subsiteMultiple and NOS -0.00673 0.0842 0.772
## GLOBAL NA 5.1703 0.739
cox5 <- coxph(Surv(surv_mm, death_cancer) ~ sex * agegrp + year8594 + agegrp + subsite + strata(stage), method=c("breslow"), data=melanoma)
summary(cox5)
## Call:
## coxph(formula = Surv(surv_mm, death_cancer) ~ sex * agegrp +
## year8594 + agegrp + subsite + strata(stage), data = melanoma,
## method = c("breslow"))
##
## n= 7775, number of events= 1913
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sexFemale -0.491816 0.611515 0.105919 -4.643 3.43e-06 ***
## agegrp45-59 0.158709 1.171996 0.085332 1.860 0.0629 .
## agegrp60-74 0.437778 1.549262 0.084960 5.153 2.57e-07 ***
## agegrp75+ 0.895092 2.447562 0.104899 8.533 < 2e-16 ***
## year8594Diagnosed 85-94 -0.237908 0.788275 0.047824 -4.975 6.54e-07 ***
## subsiteTrunk 0.296690 1.345398 0.070678 4.198 2.70e-05 ***
## subsiteLimbs 0.002339 1.002342 0.074260 0.032 0.9749
## subsiteMultiple and NOS 0.231783 1.260847 0.102802 2.255 0.0242 *
## sexFemale:agegrp45-59 0.181340 1.198823 0.138902 1.306 0.1917
## sexFemale:agegrp60-74 0.269202 1.308920 0.131729 2.044 0.0410 *
## sexFemale:agegrp75+ 0.266506 1.305395 0.149243 1.786 0.0741 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sexFemale 0.6115 1.6353 0.4969 0.7526
## agegrp45-59 1.1720 0.8532 0.9915 1.3854
## agegrp60-74 1.5493 0.6455 1.3116 1.8300
## agegrp75+ 2.4476 0.4086 1.9927 3.0062
## year8594Diagnosed 85-94 0.7883 1.2686 0.7177 0.8657
## subsiteTrunk 1.3454 0.7433 1.1714 1.5453
## subsiteLimbs 1.0023 0.9977 0.8666 1.1594
## subsiteMultiple and NOS 1.2608 0.7931 1.0308 1.5423
## sexFemale:agegrp45-59 1.1988 0.8342 0.9131 1.5739
## sexFemale:agegrp60-74 1.3089 0.7640 1.0111 1.6945
## sexFemale:agegrp75+ 1.3054 0.7661 0.9743 1.7490
##
## Concordance= 0.645 (se = 0.01 )
## Rsquare= 0.035 (max possible= 0.969 )
## Likelihood ratio test= 279.2 on 11 df, p=0
## Wald test = 273.9 on 11 df, p=0
## Score (logrank) test = 285.7 on 11 df, p=0